Commit graph

17 commits

Author SHA1 Message Date
Brandon Walter
c906ab15f7 feat: job-level Est. completion in drawer header (1.0.0-53)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
The drawer's per-stage Finish chip is the stage's finish, not the
whole burn-in's. Added a right-aligned "Est. completion" pill in
the drawer-job-header that uses the server-weighted burnin.percent
to extrapolate the whole job's finish time (covers precheck + SMART
+ surface_validate + final_check).

Suppressed under 0.5% job progress to avoid the early-sample
overshoot we saw earlier ("Finish: Sep 22" on a fresh start).

Bind-mount only (templates + static); no rebuild needed. Running
container reports 1.0.0-52 until next rebuild; this commit just
catches the source version up.
2026-05-10 22:45:04 -07:00
Brandon Walter
7f959e6f4c feat: prominent failure-reason block + heuristic in drawer (1.0.0-50)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
When a stage ends in failed/cancelled/unknown the drawer now shows
a coloured "Reason" pill at the top of that stage's section. Three
sources, in order of preference:

1. stage.error_text (the canonical, when set)
2. job.error_text (backfilled in the drawer endpoint when stage's
   own is empty — catches orphan rows from hard crashes like the
   pre-busy-timeout DB-locked failures)
3. Heuristic: if log_text is tiny (<500 bytes, just the START
   banner) AND no real badblocks progress was recorded, label as
   "Stopped without recording an error — likely cause: SSH
   connection drop or container restart while this stage was
   running." This catches the fingerprint of a deploy-during-burn-in
   killing the SSH session.

Otherwise: "No error message recorded." so there's never a blank
where the operator expects to see why something broke.

Red styling for failed, yellow for cancelled/unknown. Replaces the
inline stage-error-line for terminal states; the existing
stage-error-line still renders for non-terminal contexts.
2026-05-09 12:06:11 -07:00
Brandon Walter
f5c6b85402 feat: client-side column sorting with SSE re-apply (1.0.0-48)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
Clickable headers on Drive / Serial / Size / Temp / Health / Short
SMART / Long SMART / Burn-In. Click cycles asc → desc → cleared,
with a small ▲/▼ indicator next to the active column.

Sort state lives in localStorage so it survives reload AND every
SSE-driven tbody refresh (HTMX swaps `#drives-table-wrap` innerHTML
on each `drives-update` event). The htmx:afterSwap hook re-applies
the sort and re-paints indicators.

Sortable values are emitted as data-sort-* attributes on each <tr>:
- raw devname / serial / size_bytes / temperature_c
- numeric priority maps for SMART health, SMART test states, and
  burn-in state (so "running" sorts ahead of "passed" regardless
  of alphabetical order)

Empty values always sink to the bottom regardless of direction so
"sort by temp asc" doesn't put a missing-temp drive on top.
2026-05-08 23:48:04 -07:00
Brandon Walter
383258df97 feat: phase caption + bad-block badge + per-pattern history (1.0.0-47)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
Three additions to the surface_validate drawer block:

1. **Phase caption** below the meters: "Pattern 2 of 4 · Verify 0x55
   · 47% within phase". Pure JS — no schema change. Makes the
   visual grammar explicit without needing the operator to mentally
   map phase=4 to "verifying pattern 2".

2. **Bad-block badge** in the vitals row. Green at 0, red at >0.
   The number was already on the stage row but burying it in the
   log felt wrong — surfacing it next to temp/speed/ETA keeps it
   in eye-line during long runs.

3. **Per-pattern duration history** below the caption. New
   bb_phase_history JSON column (idempotent migration) maps
   {phase_num: ts}. Parser stamps the timestamp on every phase
   transition (and on stage entry for phase 1). Drawer diffs
   consecutive write-phase starts to derive "0xaa: 14h 22m"
   for completed patterns. Once one pattern is done you can
   predict the rest without leaving the drawer.

Persistence is idempotent: re-entry of the same phase keeps the
original timestamp so a transient parser reset doesn't blow away
history. JSON parse failures fail gracefully (no row rendered).
2026-05-08 23:23:02 -07:00
Brandon Walter
6b2367b892 feat: vital-signs strip above per-pattern meters (1.0.0-46)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
The drawer's surface_validate area now leads with a row of operator
vitals computed from data already in the response:

- Temp: drive temperature with cool/warm/hot colour (≥48 red, ≥42 yellow)
- Speed: live MB/s, NULL until second progress sample arrives
- Elapsed: time since stage started_at
- ETA: extrapolated from overall progress; suppressed under 0.5%
  to avoid the "47 days remaining" artefact early in pattern 1

Live MB/s comes from a new bb_mbps column on burnin_stages, computed
in the badblocks parser as (delta_overall_pct / 800) * size_bytes / dt.
Skipped on phase transitions (per-phase pct resets) and sub-second
samples (noisy).

Drawer endpoint now passes drive.temperature_c through; JS stashes
the latest drive object in _DRAWER_LAST_DRIVE so the burn-in renderer
can pull it for the vitals row without changing call signatures.

Tightened table CSS in this same session is unrelated and shipped
already in earlier rounds via the bind-mounted app.css.
2026-05-08 23:13:58 -07:00
Brandon Walter
30062affc2 feat: per-pattern badblocks meters in drive drawer (1.0.0-44)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
User asked for one meter per badblocks pattern. The drawer now shows
4 meters (one per pattern: 0xaa / 0x55 / 0xff / 0x00), each split
into write (left, blue) + verify (right, green) halves so a glance
shows both which pattern is current AND whether you're writing or
verifying within it.

Backend:
- New columns burnin_stages.bb_phase (1-8) + bb_phase_pct (0-100)
  via idempotent ALTER TABLE migration
- _update_stage_bb_phase() helper called from the badblocks parser
  on every tick (when phase or percent changes)
- /api/v1/drives/{id}/drawer SELECT now returns the new fields

Frontend (app.js + app.css):
- _drawerRenderBadblocksMeters(phase, phasePct) computes per-pattern
  fill state and emits 4-meter HTML with W/V sub-labels
- Conditional render: only shows when stage_name === 'surface_validate'
  AND bb_phase is set, so historical pre-1.0.0-44 stage rows render
  unchanged (single percent, no meters)

3 new tests cover the migration columns, single-tick persistence,
and overwrite-on-second-tick. Total suite: 75 tests.

Image rebuilt and tagged but NOT deployed — 4 burn-ins are running
right now and a recreate would SIGHUP them. Deploy with
`docker compose up -d` after the current batch finishes; the
migration runs at init and the meters light up for the next batch.
2026-05-08 22:34:35 -07:00
Brandon Walter
a8a7d99621 rename: TrueNAS Burn-In → NAS Burn-In (1.0.0-38)
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
Product display name only — page titles, headers, email, browser
notification, FastAPI app title. Repo, container_name, file paths,
and infrastructure identifiers (truenas-burnin everywhere) stay put
to avoid breaking deployment.
2026-05-03 14:01:40 -04:00
Brandon Walter
11218753ce feat: secret handling — status badges + redacted endpoint + rotation audit (1.0.0-26)
Closes #5 of the post-Codex hardening list:

* Settings UI now shows a `[set]` (green) or `[unset]` (gray) badge next
  to every password/key field. Tells the operator at a glance which
  secrets are configured without ever rendering the value.

* SSH key gets a granular source label: `set (environment variable)`,
  `set (mounted secret)`, or `set (stored in settings DB — prefer a
  mounted secret in production)`. Same hint copy in the field's help
  text now actively recommends `/run/secrets/ssh_key` over the textarea.

* New `GET /api/v1/settings/redacted` admin-only endpoint dumps every
  editable setting with secrets replaced by `***`, plus the per-secret
  status map. Useful for ops triage ("what's actually loaded?") without
  the secrets ever leaving the container or hitting a transcript.

* `POST /api/v1/settings` writes a `settings_secret_changed` audit event
  whenever a non-empty secret is rotated. Records field names, operator,
  source IP — never the value. Lets the audit page answer "who rotated
  the SMTP password last week?".

Internal: `_SECRET_FIELDS` constant in routes.py is now the single
source of truth for which fields get the redaction / audit treatment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:15:57 -04:00
Brandon Walter
d4c0770b9e feat: app-level login + hardening sweep (1.0.0-22 -> 1.0.0-23)
Two layered changes shipped in this branch:

== 1.0.0-22: app-level authentication ==

The dashboard previously had only an IP allowlist. Adds username +
bcrypt password auth, signed-cookie sessions, and a "first user setup"
flow.

* New app/auth.py: User dataclass, bcrypt hash/verify, get_user_by_id/
  username, create_user, touch_last_login, FastAPI `get_current_user`
  dependency. Session secret loaded from SESSION_SECRET env or persisted
  to /data/session_secret.
* New app/auth_cli.py: `python -m app.auth_cli list|reset|add` for
  out-of-band user management. Passwords always read from a TTY prompt.
* Schema: idempotent ALTER for `users` table (id, username unique,
  password_hash, full_name, is_admin, created_at, last_login_at).
* main.py: SessionMiddleware (HMAC-signed cookie, max-age 7 days,
  SameSite=strict — see hardening section) + _AuthGateMiddleware that
  populates request.state.current_user and bounces unauth'd HTML GETs
  to /login while returning 401 JSON for everything else.
* Routes: GET /login renders first-user-setup form when users table is
  empty otherwise sign-in form; POST /login; POST /api/v1/auth/setup
  (only works while empty); GET|POST /logout.
* Bootstrap: env vars INITIAL_ADMIN_USERNAME + INITIAL_ADMIN_PASSWORD
  create the first admin on startup if both set AND users table empty.
  Ignored thereafter — change passwords via UI or CLI.
* Layout: header shows current_user.full_name|username + Logout link.
  Modal operator field auto-fills from the logged-in user via
  <meta name="default-operator"> rendered in layout (replaces the
  localStorage-only previous behaviour).
* requirements.txt: pinned bcrypt>=4.0,<5.0, itsdangerous>=2.1,
  python-multipart>=0.0.7. First step toward addressing the
  unpinned-deps gotcha.
* New app/templates/login.html with first-user-setup variant.

== 1.0.0-23: hardening sweep ==

Closes the eight-item gap audit:

* DB retention + automated backup. New app/retention.py runs daily at
  03:00 local. Nulls burnin_stages.log_text on stages older than
  retention_log_days (default 35), VACUUMs to reclaim pages, then runs
  `sqlite3 .backup` to /data/backups/app-YYYY-MM-DD.db keeping the
  retention_backup_keep most recent (default 14). Wired into the
  lifespan supervisor next to mailer/poller.

* CSRF mitigation. SessionMiddleware bumped to SameSite=strict so the
  browser refuses to send the session cookie on cross-site POSTs —
  removes the actual CSRF vector. Trade-off: external links into the
  app require re-auth.

* Login rate limiting. In-memory per-username AND per-source-IP failure
  counters in auth.py. 10 failures within 10 min trips a 15-min lockout
  for both keys. Returns HTTP 429 with a clear "try again in N min"
  message. Cleared on successful login.

* Login audit events. New event types in audit_events: user_login,
  user_login_failed, user_login_locked_out, user_logout,
  user_password_changed. All include source IP. Recorded via
  auth.audit_auth_event().

* Password change UI. Header link "Change password" opens
  templates/components/modal_password.html (current/new/confirm).
  Posts to POST /api/v1/auth/change-password — bcrypt-verifies current,
  requires >=8 char new pw, writes audit event.

* NVMe burn-in path. _stage_surface_validate now detects nvme*
  devnames and routes to _stage_surface_validate_nvme() which runs
  `nvme format -s 1 --force` (cryptographic erase). Seconds vs hours
  of badblocks, exercises the controller's secure-erase. Falls back
  to badblocks if nvme-cli isn't installed. Post-format SMART check.

* Mounted-FS detection. ssh_client.get_mounted_drives() runs
  `findmnt -no SOURCE`, parses non-ZFS sources back to base devnames.
  Poller treats them as pool_name='(mounted)', pool_role='mounted'.
  Confirm token DESTROY MOUNTED FILESYSTEM, distinct purple styling,
  audit event mounted_drive_unlocked, daily-report banner picks it up.

* Deeper /health. Real readiness check — DB write probe (PRAGMA
  journal_mode), poller freshness (age <= 3x stale_threshold), SSH
  test_connection() when configured. Returns 503 when any check fails
  so a proxy/orchestrator can take the container out of rotation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:08:29 -04:00
Brandon Walter
5da1a1704f feat: pool-membership lock + cancellation hardening + smart_health refresh + tunables (1.0.0-13 -> 1.0.0-21)
Substantial feature + reliability sweep. Each version below was developed,
tested live against the maple/TrueNAS deployment, and Codex-reviewed
before bundling.

1.0.0-13 — asyncssh proc.kill() doesn't actually kill the remote process
  (sshd ignores SSH signal-channel requests by default), so a cancel of a
  long-running badblocks left the remote process running and proc.wait()
  hanging — pinning the asyncio.Semaphore slot forever.

  * Wrap long-lived commands in `sh -c 'echo PID:$$; exec <cmd>'` to
    capture the remote PID; store in burnin._remote_pids[job_id].
  * burnin._kill_remote_process(job_id) opens a fresh SSH session and
    issues `kill -9 <pid>` — sshd honours that.
  * Bound proc.wait() with asyncio.wait_for(timeout=15).
  * burnin._active_tasks tracks every _run_job task so cancel_job and
    check_stuck_jobs can actually cancel the asyncio task (was DB-only
    before). Also fixes the documented asyncio.create_task GC gotcha
    (weak refs only).
  * _run_job finalizer reads current state and skips the write if state
    != 'running' so cancelled/unknown aren't clobbered.

1.0.0-14 — poller._upsert_drive ON CONFLICT only refreshed temperature/
  health/poll timestamps; devname/serial/model/size_bytes were stuck at
  first-INSERT values forever. After kernel SCSI re-enumeration two
  drives could both show as `sda`. Fixed by updating all six fields.
  Also added 7-day stale filter to _DRIVES_QUERY so removed drives drop
  off the dashboard while audit/burnin_jobs FKs stay intact.

1.0.0-15/-16 — pool-membership lock.
  * ssh_client.get_pool_membership() runs `zpool list -vHP` and parses
    the flattened TrueNAS output (container vdevs + their device children
    both appear at depth 1; section markers cache/log/spare/special/dedup
    switch the role).
  * ssh_client.get_zfs_member_drives() runs `lsblk -no NAME,FSTYPE -l`
    to detect drives carrying ZFS labels not in any active pool — they
    get pool_name='(exported)', pool_role='exported'.
  * Three idempotent ALTER TABLE migrations on drives:
    pool_name/pool_role/pool_seen_at.
  * burnin.start_job raises PoolMemberError if pool_name IS NOT NULL and
    the drive isn't in burnin._unlock_grants. Routes layer maps to 409
    with structured detail {pool_name, pool_role, pool_locked: true} so
    the frontend can render an unlock affordance.
  * POST /api/v1/drives/{id}/unlock accepts {confirm_token, operator,
    reason}. Token is the pool name for active pools, "DESTROY BOOT POOL"
    for boot-pool, "DESTROY EXPORTED POOL" for exported. Reason >= 5
    chars. TTL = UNLOCK_TTL_SECONDS = 600. Audit event types:
    pool_drive_unlocked / boot_pool_drive_unlocked /
    exported_pool_drive_unlocked.
  * Grants are in-memory only — container restart wipes them.
  * UI: lock icon (yellow/red/orange), pool pill, conditional Unlock vs
    Burn-In button. modal_unlock.html with type-to-confirm field.
    Live unlock countdown via tickUnlockCountdowns() in app.js.
  * Daily report: red banner listing every unlock event from the last
    24h, with operator + reason + timestamp.

1.0.0-17 — Codex review fail-open + XSS + structured-error fixes.
  * ssh_client.get_pool_membership / get_zfs_member_drives now return
    None on failure (vs {} for 'definitely empty'). poller passes
    update_pool=False to _upsert_drive on detection failure, preserving
    existing pool columns instead of clearing them. Without this fix a
    1-second SSH blip silently unlocked every drive.
  * mailer._build_unlock_banner_html escapes every interpolated field
    via html.escape() (was '<' only). Time filter switched to
    julianday() — string >= against datetime('now', '-1 day') compared
    formats with different separators ('T' vs ' ') and timezone
    suffixes, causing subtle off-by-N-hour inclusion.
  * app.js submitStart/submitBatchStart now detect the structured
    pool_locked 409 detail and auto-open the unlock modal for the
    offending drive (was [object Object] in toast).

1.0.0-18 — Codex grant-binding + commit-ordering fixes.
  * Unlock grants bound to the (pool_name, pool_role) observed at unlock
    time. _UnlockGrant dataclass; _is_unlocked and unlock_expiry
    invalidate the grant if the live row's pool identity has changed.
    Prevents an 'exported' unlock from carrying over when the drive
    turns out to be in active 'tank' or 'boot-pool'.
  * grant_pool_unlock now writes to _unlock_grants only AFTER db.commit()
    succeeds — previously a failed audit insert left an unaudited grant
    armed.

1.0.0-19 — Codex race + cancellation classification + test scaffold.
  * Partial unique index uniq_active_burnin_per_drive ON burnin_jobs
    (drive_id) WHERE state IN ('queued','running'). INSERT now wraps in
    try/except aiosqlite.IntegrityError -> ValueError so the read-then-
    insert race in start_job can't produce two queued rows for the same
    drive.
  * _run_job tracks was_cancelled flag; on bare task.cancel() (shutdown,
    future code paths) where DB state is still 'running', finalizer
    writes 'unknown' instead of mis-classifying as 'failed'.
  * tests/ stdlib unittest scaffold:
    - test_pool_parser.py (21 tests): mirror/raidz/draid container vdevs,
      single-disk depth-1, plural section markers, partition stripping,
      sdaa-style names, multi-pool, role reset between pools.
    - test_unlock_flow.py (18 tests): token validation per pool kind,
      identity-binding invalidation, TTL expiry, audit-commit-then-arm
      ordering, unique-active-burnin partial index.
    Run via `python -m unittest discover tests/`. No new dependencies.

1.0.0-20 — Spearfoot-inspired badblocks tunables.
  * surface_validate_block_size (-b, default 4096), surface_validate_
    block_buffer (-c, default 64), surface_validate_passes (-p, default
    1) exposed in Settings UI; persist via settings_store.json.
    Validation: block size must be a power of 2 between 512 and
    1048576. Defaults preserve existing behaviour. Bumping to 8192/64/1
    roughly halves runtime on multi-TB HDDs at ~2x RAM cost.

1.0.0-21 — SMART overall-health column actually populated.
  * /api/v2.0/disk doesn't expose smart_health, so every drive defaulted
    to UNKNOWN forever (only burn-in stages ever wrote a real value).
  * ssh_client.get_smart_health_map([devnames]) runs `smartctl -H` for
    all drives in a single SSH session, deterministically delimited with
    @@devname@@ ... @@END@@ markers. Returns {devname: PASSED|FAILED|
    UNKNOWN} or None on SSH failure.
  * poller calls it every 5th cycle (~1 min at default 12s interval),
    caches in _state['smart_health_cache'] so transient failures preserve
    the previous values.
  * Dashboard CSS: col-smart min-width 150 -> 95, horizontal padding 14
    -> 6 so Short/Long SMART columns fit comfortably on a 13-inch
    display.
  * 5 additional parser tests (44 total, all passing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 09:25:56 -04:00
Brandon Walter
5a802bff2e feat: live SSH terminal in drawer (xterm.js + asyncssh WebSocket)
Adds a Terminal tab to the log drawer with a full PTY session bridged
over WebSocket to the TrueNAS SSH host. xterm.js loaded lazily on
first tab open. Supports resize, paste, full color, and reconnect.

- app/terminal.py: asyncssh PTY ↔ WebSocket bridge
- routes.py: @router.websocket("/ws/terminal")
- dashboard.html: Terminal tab + drawer panel
- app.js: xterm.js lazy load, init, onData, resize observer, reconnect
- app.css: terminal panel styles (no padding, overflow hidden)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 09:30:56 -05:00
Brandon Walter
70c26121a8 ui: move version badge next to title in header left side
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 09:23:10 -05:00
Brandon Walter
22ed2c6e12 fix: JS syntax error breaking all buttons; add settings restart banner
app.js: stages.forEach callback in _drawerRenderBurnin was missing its
closing });, causing a syntax error that prevented the entire script
from loading — all click handlers (Short/Long SMART, Burn-In, cancel)
were unregistered as a result.

settings.html: add a prominent yellow restart banner with the docker
command (docker compose restart app) that appears after saving any
system settings that require a container restart to take effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 08:57:57 -05:00
Brandon Walter
2dff58bd52 Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish
SSH (app/ssh_client.py — new):
- asyncssh-based client: start_smart_test, poll_smart_progress, abort_smart_test,
  get_smart_attributes, run_badblocks with streaming progress callbacks
- SMART attribute table: monitors attrs 5/10/188/197/198/199 for warn/fail thresholds
- Falls back to REST API / mock simulation when ssh_host is not configured

Burn-in stages updated (burnin.py):
- _stage_smart_test: SSH path polls smartctl -a, stores raw output + parsed attributes
- _stage_surface_validate: SSH path streams badblocks, counts bad blocks vs configurable threshold
- _stage_final_check: SSH path checks smartctl attributes; DB fallback for mock mode
- New DB helpers: _append_stage_log, _update_stage_bad_blocks, _store_smart_attrs,
  _store_smart_raw_output

Database (database.py):
- Migrations: burnin_stages.log_text, burnin_stages.bad_blocks,
  drives.smart_attrs (JSON), smart_tests.raw_output

Settings (config.py + settings_store.py):
- ssh_host, ssh_port, ssh_user, ssh_password, ssh_key — all runtime-editable
- SSH section in Settings UI with Test SSH Connection button

Webhook (notifier.py):
- Added bad_blocks and timestamp fields to payload per SPEC

Drive reset (routes.py + drives_table.html):
- POST /api/v1/drives/{id}/reset — clears SMART state, smart_attrs; audit logged
- Reset button visible on drives with completed test state (no active burn-in)

Log drawer (app.js):
- Burn-In tab: shows raw stage log_text (SSH output) with bad block highlighting
- SMART tab: shows SMART attribute table with warn/fail colouring + raw smartctl output

Polish:
- Version badge (v1.0.0-6d) in header via Jinja2 global
- Parallel burn-in warning when max_parallel_burnins > 8 in Settings
- Stats page: avg duration by drive size + failure breakdown by stage
- settings.html: SSH section with key textarea, parallel warn div

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 08:09:30 -05:00
Brandon Walter
4ab54d7ed8 Add temp thresholds, bad block threshold, editable system settings, check for updates, history completed time
- config.py: add temp_warn_c (46°C), temp_crit_c (55°C), bad_block_threshold (0), app_version
- settings_store.py: expose all new fields + system settings (truenas_base_url, api_key, poll_interval, etc.) as editable; save to JSON for persistence; add validation for log_level, poll/stale intervals, temp range
- renderer.py: _temp_class() now reads temp_warn_c/temp_crit_c from settings instead of hardcoded 40/50
- burnin.py: precheck uses settings.temp_crit_c; fix NameError bug (_execute_stages referenced 'profile' that was not in scope)
- routes.py: add GET /api/v1/updates/check (Forgejo releases API); settings_page passes new editable fields; save_settings skips empty truenas_api_key like smtp_password
- settings.html: move system settings from read-only card into editable form; add temp/bad-block fields to Burn-In Behavior; add Check for Updates button; restart-required indicator on save
- history.html: add Completed (finished_at) column next to Started
- app.css: toast container shifts up when drawer is open (body.drawer-open)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 07:43:23 -05:00
Brandon Walter
c0f9098779 feat: add log drawer (Stage 7a)
Click any drive row to slide up a drawer with three tabs:
- Burn-In: stage timeline with state icons, elapsed timers, error lines in red
- SMART: short and long test status, timestamps, progress
- Events: last 50 audit events for the drive (newest first)

Drawer auto-refreshes on every SSE poll cycle. Row highlights blue
while drawer is open. Clicking same row or pressing Esc closes it.
Auto-scroll toggle keeps burn-in tab pinned to bottom during active runs.

New API: GET /api/v1/drives/{id}/drawer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 07:22:53 -05:00
Brandon Walter
b73b5251ae Initial commit — TrueNAS Burn-In Dashboard v0.5.0
Full-stack burn-in orchestration dashboard (Stages 1–6d complete):
FastAPI backend, SQLite/WAL, SSE live dashboard, mock TrueNAS server,
SMTP/webhook notifications, batch burn-in, settings UI, audit log,
stats page, cancel SMART/burn-in, drag-to-reorder stages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 00:08:29 -05:00