nas-burnin

Brandon Walter 066fbbc403	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00

Addresses 12 of 13 findings from the Codex tech-debt + security review of versions 1.0.0-22 through 1.0.0-27. Item #5 (live pool re-check before start_job) deferred: it would add an SSH round-trip per start.

#1 Pool detection now treats zpool / lsblk / findmnt failures INDEPENDENTLY. Previously a single None blew away the whole map, so a host where lsblk lacks zfs_member info but zpool works would never lock pool members. Extended findmnt parser to recognise /dev/mapper/, /dev/dm-, /dev/md, /dev/da, /dev/ada* (LVM, devicemapper, MD RAID, FreeBSD CORE devnames).

#2 Admin role enforced on every settings mutation. New auth.require_admin() helper applied to GET /settings, POST /api/v1/settings, /test-smtp, /test-ssh. Previously any authenticated user (the CLI explicitly supports non-admin accounts) could rewrite SMTP/SSH/API secrets.

#3 First-user setup race closed. auth.create_user() now accepts bootstrap_only=True which wraps the existence check + insert in BEGIN IMMEDIATE so two concurrent /api/v1/auth/setup requests can't both create admin accounts during the bootstrap window.

#4 Case-insensitive uniqueness enforced via new `uniq_users_username_nocase` index. Login does NOCASE lookup so without this `Admin` and `admin` could coexist as distinct rows.

#6 New `session_cookie_secure` setting (default False for LAN/dev deploys, set True in production behind HTTPS) flips the session cookie's Secure flag. Defends against on-the-wire exposure when the dashboard is reachable over plain HTTP.

#7 Audit trail bound to authenticated identity. Burn-in start / cancel / unlock / drive reset all now use `_operator_for(request)` which reads `request.state.current_user.full_name|username` instead of the body's operator field. Logged-in users can no longer spoof attribution. Drive reset's literal-"operator" fallback (window._operator was never set) is also fixed by this.

#8 Login rate-limit race fixed. New `register_login_attempt()` is atomic check-AND-increment in synchronous code (no awaits inside), so a parallel burst can't slip past the threshold. `record_login_failure()` removed; `clear_login_failures()` now also drops any active lockout for a successful auth. Pre-existing bug where `tripped` was always False (so user_login_locked_out audit events never fired) also fixed.

#9 NVMe surface_validate post-format check now mirrors the SSH path: fails on FAILED health AND on real SMART attribute failures, soft-passes SSH-only failures (logged), surfaces warnings to the stage log without failing.

#10 retention.backup_db() now writes to `.tmp` then atomic-renames into the canonical daily slot; an interrupted backup leaves the tmp behind but doesn't corrupt the real snapshot. Scheduler marks last_run_date only on (prune AND backup) success so a transient failure gets retried within the 03:00 hour.

#11 /health DB probe now exercises the WRITE path via a temp-table INSERT/SELECT/COMMIT round-trip. Previously only read PRAGMA journal_mode + a row count, which silently passes on read-only mounts and broken-WAL conditions.

#12 security-scan.sh now fails loudly if `git fetch` or `git reset --hard origin/main` errors (was `|| true`, scanning stale code silently). pip-audit now runs in a throwaway python:3.12-slim container against requirements.txt instead of `docker exec`-ing into the live truenas-burnin container: cleaner separation, no transient package install on prod.

#13 Badblocks SSH stage no longer doubles its log_text. Previously appended every 20-line chunk during streaming AND the full accumulated output at end. Now only flushes the un-flushed tail (typically <20 lines). `result["output"]` stays in-memory only.

Verification: all 44 unit tests pass in container; /health 200; security scan returns 0 findings; deployed maple build is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
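Items #3 and #4 of the commit message above can be illustrated with a minimal sketch using Python's standard `sqlite3` module. This is not the repository's `auth.create_user()`: the table layout and the helper names `init_schema` and `create_first_admin` are hypothetical, and only the index name `uniq_users_username_nocase` comes from the commit message. The sketch assumes the connection was opened with `isolation_level=None` so that transactions are controlled by hand.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; only the index name is taken from the commit message.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS users (
            id            INTEGER PRIMARY KEY,
            username      TEXT NOT NULL,
            password_hash TEXT NOT NULL,
            is_admin      INTEGER NOT NULL DEFAULT 0
        );
        CREATE UNIQUE INDEX IF NOT EXISTS uniq_users_username_nocase
            ON users (username COLLATE NOCASE);
        """
    )


def create_first_admin(conn: sqlite3.Connection, username: str, password_hash: str) -> bool:
    """Create the first admin only if no user exists yet. Returns True if created.

    Assumes conn was opened with isolation_level=None (manual transactions).
    """
    # BEGIN IMMEDIATE takes the write lock before the existence check, so a
    # second concurrent caller waits (or times out) instead of also seeing
    # an empty table.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
        if count:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT INTO users (username, password_hash, is_admin) VALUES (?, ?, 1)",
            (username, password_hash),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        # Roll back only if the transaction is still open.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

With this schema, inserting `Admin` after `admin` raises `sqlite3.IntegrityError`, which is the collision the NOCASE index is meant to catch.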
..
static	feat: secret handling — status badges + redacted endpoint + rotation audit (1.0.0-26)	2026-05-02 18:15:57 -04:00
templates	feat: secret handling — status badges + redacted endpoint + rotation audit (1.0.0-26)	2026-05-02 18:15:57 -04:00
__init__.py	Initial commit — TrueNAS Burn-In Dashboard v0.5.0	2026-02-24 00:08:29 -05:00
auth.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
auth_cli.py	feat: app-level login + hardening sweep (1.0.0-22 -> 1.0.0-23)	2026-05-02 11:08:29 -04:00
burnin.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
config.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
database.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
logging_config.py	Initial commit — TrueNAS Burn-In Dashboard v0.5.0	2026-02-24 00:08:29 -05:00
mailer.py	feat: app-level login + hardening sweep (1.0.0-22 -> 1.0.0-23)	2026-05-02 11:08:29 -04:00
main.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
models.py	feat: pool-membership lock + cancellation hardening + smart_health refresh + tunables (1.0.0-13 -> 1.0.0-21)	2026-05-02 09:25:56 -04:00
notifier.py	Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish	2026-02-24 08:09:30 -05:00
poller.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
renderer.py	Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish	2026-02-24 08:09:30 -05:00
retention.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
routes.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
settings_store.py	feat: pool-membership lock + cancellation hardening + smart_health refresh + tunables (1.0.0-13 -> 1.0.0-21)	2026-05-02 09:25:56 -04:00
ssh_client.py	fix: address Codex audit findings (1.0.0-28)	2026-05-02 18:48:16 -04:00
terminal.py	chore: re-sync deployed work that pre-dates this session	2026-05-02 09:24:42 -04:00
truenas.py	chore: re-sync deployed work that pre-dates this session	2026-05-02 09:24:42 -04:00
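
The write-to-`.tmp`-then-rename pattern that item #10 of the commit message describes for `retention.py` might look roughly like the sketch below. It is an illustration under assumptions, not the repository's `retention.backup_db()`: the function signature, the use of `sqlite3.Connection.backup`, and the `demo` helper are all hypothetical. The rename is only atomic when the temporary file and the destination are on the same filesystem.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def backup_db(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database to dest_path without exposing a partial file."""
    dest = Path(dest_path)
    tmp = dest.with_name(dest.name + ".tmp")
    # Discard a leftover from an earlier interrupted run.
    tmp.unlink(missing_ok=True)

    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(tmp)
    try:
        # Online backup API: produces a consistent copy of the source.
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # os.replace swaps the finished copy into place in one step, so readers
    # see either the old snapshot or the new one, never a half-written file.
    os.replace(tmp, dest)


def demo() -> int:
    """Back up a throwaway database and return the row count read from the copy."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "live.db")
        dest = os.path.join(d, "backup.db")
        con = sqlite3.connect(src)
        con.execute("CREATE TABLE t (x INTEGER)")
        con.execute("INSERT INTO t VALUES (1)")
        con.commit()
        con.close()

        backup_db(src, dest)

        check = sqlite3.connect(dest)
        try:
            return check.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            check.close()
```

If the process dies during `src.backup(dst)`, only the `.tmp` file is affected and the previous snapshot at `dest_path` stays intact, which matches the behaviour the commit message describes.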