Addresses 12 of 13 findings from the Codex tech-debt + security review of versions 1.0.0-22 through 1.0.0-27. Item #5 (live pool re-check before start_job) deferred: it would add an SSH round-trip per start.

#1 Pool detection now treats zpool / lsblk / findmnt failures INDEPENDENTLY. Previously a single None blew away the whole map, so a host where lsblk lacks zfs_member info but zpool works would never lock pool members. Extended findmnt parser to recognise /dev/mapper/*, /dev/dm-*, /dev/md*, /dev/da*, /dev/ada* (LVM, devicemapper, MD RAID, FreeBSD CORE devnames).

#2 Admin role enforced on every settings mutation. New auth.require_admin() helper applied to GET /settings, POST /api/v1/settings, /test-smtp, /test-ssh. Previously any authenticated user (the CLI explicitly supports non-admin accounts) could rewrite SMTP/SSH/API secrets.

#3 First-user setup race closed. auth.create_user() now accepts bootstrap_only=True which wraps the existence check + insert in BEGIN IMMEDIATE so two concurrent /api/v1/auth/setup requests can't both create admin accounts during the bootstrap window.

#4 Case-insensitive uniqueness enforced via new `uniq_users_username_nocase` index. Login does NOCASE lookup so without this `Admin` and `admin` could coexist as distinct rows.

#6 New `session_cookie_secure` setting (default False for LAN/dev deploys, set True in production behind HTTPS) flips the session cookie's Secure flag. Defends against on-the-wire exposure when the dashboard is reachable over plain HTTP.

#7 Audit trail bound to authenticated identity. Burn-in start / cancel / unlock / drive reset all now use `_operator_for(request)` which reads `request.state.current_user.full_name|username` instead of the body's operator field. Logged-in users can no longer spoof attribution. Drive reset's literal-"operator" fallback (window._operator was never set) is also fixed by this.

#8 Login rate-limit race fixed. New `register_login_attempt()` is atomic check-AND-increment in synchronous code (no awaits inside), so a parallel burst can't slip past the threshold. `record_login_failure()` removed; `clear_login_failures()` now also drops any active lockout for a successful auth. Pre-existing bug where `tripped` was always False (so user_login_locked_out audit events never fired) also fixed.

#9 NVMe surface_validate post-format check now mirrors the SSH path: fails on FAILED health AND on real SMART attribute failures, soft-passes SSH-only failures (logged), surfaces warnings to the stage log without failing.

#10 retention.backup_db() now writes to `.tmp` then atomic-renames into the canonical daily slot; an interrupted backup leaves the tmp behind but doesn't corrupt the real snapshot. Scheduler marks last_run_date only on (prune AND backup) success so a transient failure gets retried within the 03:00 hour.

#11 /health DB probe now exercises the WRITE path via a temp-table INSERT/SELECT/COMMIT round-trip. Previously only read PRAGMA journal_mode + a row count, which silently passes on read-only mounts and broken-WAL conditions.

#12 security-scan.sh now fails loudly if `git fetch` or `git reset --hard origin/main` errors (was `|| true`, scanning stale code silently). pip-audit now runs in a throwaway python:3.12-slim container against requirements.txt instead of `docker exec`-ing into the live truenas-burnin container: cleaner separation, no transient package install on prod.

#13 Badblocks SSH stage no longer doubles its log_text. Previously appended every 20-line chunk during streaming AND the full accumulated output at end. Now only flushes the un-flushed tail (typically <20 lines). `result["output"]` stays in-memory only.

Verification: all 44 unit tests pass in container; /health 200; security scan returns 0 findings; deployed maple build is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
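
Items #3 and #4 can be sketched together with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not the project's auth.create_user(): the users table layout, the role column, and the init_schema helper are hypothetical. Only bootstrap_only, BEGIN IMMEDIATE, and the uniq_users_username_nocase index name come from the commit message.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical users table plus the case-insensitive unique index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " id INTEGER PRIMARY KEY,"
        " username TEXT NOT NULL,"
        " password_hash TEXT NOT NULL,"
        " role TEXT NOT NULL)"
    )
    # With this index, 'Admin' and 'admin' collide instead of coexisting.
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS uniq_users_username_nocase"
        " ON users (username COLLATE NOCASE)"
    )


def create_user(conn, username, password_hash, role="user", *, bootstrap_only=False):
    """Insert a user. With bootstrap_only=True, insert only if no user exists yet.

    Returns True if a row was inserted, False if bootstrap was refused.
    """
    # BEGIN IMMEDIATE takes the write lock before the existence check, so a
    # second concurrent caller waits here and then sees the first caller's row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        if bootstrap_only:
            (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
            if count:
                conn.execute("ROLLBACK")
                return False
        conn.execute(
            "INSERT INTO users (username, password_hash, role) VALUES (?, ?, ?)",
            (username, password_hash, role),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        # Covers e.g. sqlite3.IntegrityError from the NOCASE unique index.
        conn.execute("ROLLBACK")
        raise
```

The sketch assumes the connection was opened with `sqlite3.connect(path, isolation_level=None)`, so the module does not open implicit transactions and the explicit BEGIN IMMEDIATE / COMMIT / ROLLBACK statements control the transaction boundaries.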
116 lines
5.3 KiB
Python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
    )

    app_host: str = "0.0.0.0"  # nosec B104 - container deliberately binds all interfaces; nginx-proxy-manager fronts it.
    app_port: int = 8080
    db_path: str = "/data/app.db"

    truenas_base_url: str = "http://localhost:8000"
    truenas_api_key: str = "mock-key"
    truenas_verify_tls: bool = False

    poll_interval_seconds: int = 12
    stale_threshold_seconds: int = 45
    max_parallel_burnins: int = 2
    surface_validate_seconds: int = 45  # mock simulation duration
    io_validate_seconds: int = 25  # mock simulation duration

    # Logging
    log_level: str = "INFO"

    # Security - comma-separated IPs or CIDRs, e.g. "10.0.0.0/24,127.0.0.1"
    # Empty string means allow all (default).
    allowed_ips: str = ""

    # SMTP - daily status email at 8am local time
    # Leave smtp_host empty to disable email.
    smtp_host: str = ""
    smtp_port: int = 587
    smtp_user: str = ""
    smtp_password: str = ""
    smtp_from: str = ""
    smtp_to: str = ""  # comma-separated recipients
    smtp_report_hour: int = 8  # local hour to send (0-23)
    smtp_daily_report_enabled: bool = True  # set False to skip daily report without disabling alerts
    smtp_alert_on_fail: bool = True  # immediate email when a job fails
    smtp_alert_on_pass: bool = False  # immediate email when a job passes
    smtp_ssl_mode: str = "starttls"  # "starttls" | "ssl" | "plain"
    smtp_timeout: int = 60  # connection + read timeout in seconds

    # Webhook - POST JSON payload on every job state change (pass/fail)
    # Leave empty to disable. Works with Slack, Discord, ntfy, n8n, etc.
    webhook_url: str = ""

    # Stuck-job detection: jobs running longer than this are marked 'unknown'
    stuck_job_hours: int = 24

    # Temperature thresholds (°C) - drives table colouring + precheck gate
    temp_warn_c: int = 46  # orange warning
    temp_crit_c: int = 55  # red critical (precheck refuses to start above this)

    # Bad-block tolerance - surface_validate fails if bad blocks exceed this
    bad_block_threshold: int = 0

    # Surface-validate (badblocks) tunables - defaults match the Spearfoot
    # disk-burnin.sh community script's recommended geometry for large HDDs.
    #   block_size   : -b in bytes; aligned to AF (4 KiB) sectors. Bumping
    #                  to 8192 roughly halves badblocks runtime on multi-TB
    #                  drives at the cost of ~2x RAM in the test buffer.
    #   block_buffer : -c blocks held in memory per IO. 64 = badblocks
    #                  default. Higher values = larger buffer, faster IO,
    #                  more RAM (block_size * block_buffer bytes per pass).
    #   passes       : -p value. 1 = repeat until one consecutive clean
    #                  scan (current behavior). 2-3 for paranoid burn-in
    #                  that re-confirms after finding errors.
    surface_validate_block_size: int = 4096
    surface_validate_block_buffer: int = 64
    surface_validate_passes: int = 1

    # SSH credentials for direct TrueNAS command execution (Stage 7)
    # When ssh_host is set, burn-in stages use SSH for smartctl/badblocks instead of REST API.
    # Leave ssh_host empty to use the mock/REST API (development mode).
    ssh_host: str = ""
    ssh_port: int = 22
    ssh_user: str = "root"  # TrueNAS CORE default is root
    ssh_password: str = ""  # Password auth (leave blank if using key)
    ssh_key: str = ""  # PEM private key content (paste full key including headers)

    # Application version - used by the /api/v1/updates/check endpoint
    app_version: str = "1.0.0-28"

    # ---- Authentication (1.0.0-22) ----
    # session_secret: HMAC key for signing session cookies. Empty = generate
    # one and persist to /data/session_secret on first run (sessions survive
    # restarts but rotate if the file is deleted). Set explicitly via
    # SESSION_SECRET env var if you want to share secrets across replicas.
    session_secret: str = ""
    session_max_age_seconds: int = 60 * 60 * 24 * 7  # 7 days
    # Set to True when the dashboard is exclusively reachable over HTTPS
    # (typical when fronted by nginx-proxy-manager with TLS). Refuses to
    # send the session cookie on plain HTTP, eliminating the on-the-wire
    # exposure surface. Leaving False allows initial deploy + LAN testing.
    session_cookie_secure: bool = False
    # Initial admin bootstrap. If both env vars are set AND the users table
    # is empty at startup, create that account immediately. After that the
    # env vars are ignored - change passwords via the UI / database, not
    # by editing compose.yml.
    initial_admin_username: str = ""
    initial_admin_password: str = ""

    # ---- Retention + backup (1.0.0-23) ----
    #   log_days   : burnin_stages.log_text NULLed out after this many days
    #                (history rows themselves are preserved). Default keeps
    #                ~5 weeks; long-soak burn-ins typically finish in <2.
    #   backup_keep: number of nightly DB snapshots to keep in /data/backups.
    retention_log_days: int = 35
    retention_backup_keep: int = 14


settings = Settings()
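
The `allowed_ips` comment above describes a comma-separated list of IPs or CIDRs where an empty string allows everyone. How the application actually enforces this is not shown in this file; the following is a minimal sketch of one way to interpret the value using the standard-library ipaddress module. The function names `parse_allowed_ips` and `is_allowed` are hypothetical.

```python
import ipaddress


def parse_allowed_ips(raw: str):
    """Turn '10.0.0.0/24,127.0.0.1' into a list of network objects."""
    networks = []
    for part in raw.split(","):
        part = part.strip()
        if part:
            # A bare IP becomes a /32 (or /128) network; strict=False also
            # tolerates host bits being set, e.g. '10.0.0.5/24'.
            networks.append(ipaddress.ip_network(part, strict=False))
    return networks


def is_allowed(client_ip: str, raw: str) -> bool:
    """Return True if client_ip matches the allow-list (empty list allows all)."""
    networks = parse_allowed_ips(raw)
    if not networks:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

A caller would pass `settings.allowed_ips` as `raw`. Malformed entries raise ValueError from `ip_network`, which in this sketch is left to surface rather than being silently skipped.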
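
The surface-validate comments state that the test buffer costs block_size * block_buffer bytes and that the three tunables map to the badblocks `-b`, `-c`, and `-p` options. The sketch below works that arithmetic and builds just that portion of an argument list. The helper names are hypothetical, and the rest of the badblocks command line (test mode, device path) is deliberately left out because this file does not show it.

```python
def buffer_bytes(block_size: int, block_buffer: int) -> int:
    """Memory held per IO buffer: block_size * block_buffer bytes."""
    if block_size <= 0 or block_buffer <= 0:
        raise ValueError("block_size and block_buffer must be positive")
    return block_size * block_buffer


def badblocks_tunable_args(block_size: int, block_buffer: int, passes: int):
    """Return only the -b / -c / -p portion of a badblocks argument list."""
    return ["-b", str(block_size), "-c", str(block_buffer), "-p", str(passes)]


# Defaults from the settings above: 4096 * 64 = 262144 bytes (256 KiB).
default_buffer = buffer_bytes(4096, 64)
# Doubling block_size to 8192 doubles the buffer to 524288 bytes (512 KiB).
doubled_buffer = buffer_bytes(8192, 64)
```

With the defaults, `badblocks_tunable_args(4096, 64, 1)` yields `["-b", "4096", "-c", "64", "-p", "1"]`.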