TrueNAS Burn-In Dashboard v0.9.0 — Live mode, thermal monitoring, adaptive concurrency

Go live against real TrueNAS SCALE 25.10:
- Remove mock-truenas dependency; mount SSH key as Docker secret
- Filter expired disk records from /api/v2.0/disk (expiretime field)
- Route all SMART operations through SSH (SCALE 25.10 removed REST smart/test endpoint)
- Poll drive temperatures via POST /api/v2.0/disk/temperatures (SCALE-specific)
- Store raw smartctl output in smart_tests.raw_output for proof of test execution
- Fix percent-remaining=0 false jump to 100% on test start
- Fix terminal WebSocket: add mounted key file fallback (/run/secrets/ssh_key)
- Fix WebSocket support: uvicorn → uvicorn[standard] (installs websockets)

HBA/system sensor temps on dashboard:
- SSH to TrueNAS and run sensors -j each poll cycle
- Parse coretemp (CPU package) and pch_* (PCH/chipset — storage I/O proxy)
- Render as compact chips in stats bar, color-coded green/yellow/red
- Live updates via new SSE system-sensors event every 12s

Adaptive concurrency signal:
- Thermal pressure indicator in stats bar: hidden when OK; shows WARM/HOT when
  drives under active burn-in hit the temp_warn_c / temp_crit_c thresholds
- Thermal gate in burn-in queue: a job waits up to 3 min before acquiring a
  semaphore slot if running drives are already at warning temp; the gate times
  out and the job proceeds anyway so the queue never stalls indefinitely

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
echoparkbaby — 2026-02-27 06:33:36 -05:00
parent b1a0fe6bd5 · commit 3e0000528f
23 changed files with 3211 additions and 169 deletions


@ -1,7 +1,7 @@
# TrueNAS Burn-In Dashboard — Project Context
> Drop this file in any new Claude session to resume work with full context.
> Last updated: 2026-02-22 (Stage 6d)
> Last updated: 2026-02-24 (Stage 8)
---
@ -28,7 +28,8 @@ against a TrueNAS CORE instance. Deployed on **maple.local** (10.0.0.138).
| 6b | UX overhaul (stats bar, alerts, batch, notifications, location, print, analytics) | ✅ |
| 6c | Settings overhaul (editable form, runtime store, SMTP fix, stage selection) | ✅ |
| 6d | Cancel SMART tests, Cancel All burn-ins, drag-to-reorder stages in modals | ✅ |
| 7 | Cut to real TrueNAS | 🔲 future |
| 7 | SSH burn-in execution, SMART attr monitoring, drive reset, version badge, stats polish | ✅ |
| 8 | Live SSH terminal in drawer (xterm.js + asyncssh WebSocket PTY bridge) | ✅ |
---
@ -52,6 +53,8 @@ truenas-burnin/
├── database.py # schema, migrations, init_db(), get_db()
├── models.py # Pydantic v2 models; StartBurninRequest has run_surface/run_short/run_long + profile property
├── settings_store.py # runtime settings store — persists to /data/settings_overrides.json
├── ssh_client.py # asyncssh client: smartctl parsing, badblocks streaming, test_connection
├── terminal.py # WebSocket ↔ asyncssh PTY bridge for live terminal tab
├── truenas.py # httpx async client with retry (lambda factory pattern)
├── poller.py # poll loop, SSE pub/sub, stale detection, stuck-job check
├── burnin.py # orchestrator, semaphore, stages, check_stuck_jobs()
@ -68,12 +71,12 @@ truenas-burnin/
└── templates/
├── layout.html # header nav: History, Stats, Audit, Settings, bell button
├── dashboard.html # stats bar, failed banner, batch bar
├── dashboard.html # stats bar, failed banner, batch bar, log drawer (4 tabs: Burn-In/SMART/Events/Terminal)
├── history.html
├── job_detail.html # + Print/Export button
├── audit.html # audit event log
├── stats.html # analytics: pass rate by model, daily activity
├── settings.html # editable 2-col form: SMTP (left) + Notifications/Behavior/Webhook (right)
├── stats.html # analytics: pass rate by model, daily activity, duration by size, failures by stage
├── settings.html # editable 2-col form: SMTP + SSH (left) + Notifications/Behavior/Webhook/System (right)
├── job_print.html # print view with client-side QR code (qrcodejs CDN)
└── components/
├── drives_table.html # checkboxes, elapsed time, location inline edit
@ -129,10 +132,19 @@ burnin_jobs (id, drive_id FK, profile, state CHECK(queued/running/passed/
-- burnin_stages: one row per stage per job
burnin_stages (id, burnin_job_id FK, stage_name, state, percent,
started_at, finished_at, error_text)
started_at, finished_at, error_text,
log_text TEXT, -- raw smartctl/badblocks SSH output
bad_blocks INTEGER) -- bad sector count from surface_validate
-- audit_events: append-only log
audit_events (id, event_type, drive_id, job_id, operator, note, created_at)
-- drives columns added by migrations:
-- location TEXT, notes TEXT (Stage 6b)
-- smart_attrs TEXT -- JSON blob of last SMART attribute snapshot (Stage 7)
-- smart_tests columns added by migrations:
-- raw_output TEXT -- raw smartctl -a output (Stage 7)
```
---
@ -194,6 +206,15 @@ All read from `.env` via `pydantic-settings`. See `.env.example` for full list.
| `SMTP_ALERT_ON_FAIL` | `true` | Immediate email when a job fails |
| `SMTP_ALERT_ON_PASS` | `false` | Immediate email when a job passes |
| `WEBHOOK_URL` | `` | POST JSON on burnin_passed/burnin_failed. Works with ntfy, Slack, Discord, n8n |
| `TEMP_WARN_C` | `46` | Temperature warning threshold (°C) |
| `TEMP_CRIT_C` | `55` | Temperature critical threshold — precheck fails above this |
| `BAD_BLOCK_THRESHOLD` | `0` | Max bad blocks allowed before surface_validate fails (0 = any bad = fail) |
| `APP_VERSION` | `1.0.0-7` | Displayed in header version badge |
| `SSH_HOST` | `` | TrueNAS SSH hostname/IP — empty disables SSH mode (uses mock/REST) |
| `SSH_PORT` | `22` | TrueNAS SSH port |
| `SSH_USER` | `root` | TrueNAS SSH username |
| `SSH_PASSWORD` | `` | TrueNAS SSH password (use key instead for production) |
| `SSH_KEY` | `` | TrueNAS SSH private key PEM string — loaded in-memory, never written to disk |
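For reference, the Stage 7 additions above might look like this in `.env` (values illustrative; the IP matches the example used later in this doc):

```env
# Thermal / burn-in thresholds
TEMP_WARN_C=46
TEMP_CRIT_C=55
BAD_BLOCK_THRESHOLD=0

# SSH mode — leave SSH_HOST empty to stay in mock/REST mode
SSH_HOST=10.0.0.203
SSH_PORT=22
SSH_USER=root
SSH_KEY="-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----"
```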
---
@ -305,27 +326,166 @@ async def burnin_get(job_id: int, ...): ...
| First row clipped after Stage 6b | Stats bar added 70px but max-height not updated | `max-height: calc(100vh - 205px)` |
| SMTP "Connection unexpectedly closed" | `_send_email` used `settings.smtp_port` (587 default) even in SSL mode | Derive port from mode via `_MODE_PORTS` dict; SSL→465, STARTTLS→587, Plain→25 |
| SSL mode missing EHLO | `smtplib.SMTP_SSL` was created without calling `ehlo()` | Added `server.ehlo()` after both SSL and STARTTLS connections |
| `profile` NameError in `_execute_stages` | `_execute_stages` called `_recalculate_progress(job_id, profile)` but `profile` not in scope | Changed to `_recalculate_progress(job_id)` — profile param was unused |
| `app_version` Jinja2 global rendered as function | Set `templates.env.globals["app_version"] = _get_app_version` (callable) | Set to the static string value directly: `= _settings.app_version` |
| All buttons broken (Short/Long/Burn-In/Cancel) | `stages.forEach(function(s){` in `_drawerRenderBurnin` missing closing `});` — JS syntax error prevented entire IIFE from loading | Added missing `});` before `} else {` |
---
## Stage 7 — Cutting to Real TrueNAS (TODO)
## Feature Reference (Stage 7)
### SSH Burn-In Architecture
`ssh_client.py` provides an optional SSH execution layer. When `SSH_HOST` is set (and key or password is present), all burn-in stages run real commands over SSH against TrueNAS. When `SSH_HOST` is empty, stages fall back to mock/REST simulation.
**Dual-mode dispatch** — each stage checks `ssh_client.is_configured()`:
```python
if ssh_client.is_configured():
# run smartctl / badblocks over SSH
else:
# simulate with REST API or timed sleep (mock mode)
```
**SSH client capabilities** (`ssh_client.py`):
- `test_connection()` → `{"ok": bool, "error": str}` — used by Test SSH button
- `get_smart_attributes(devname)` → parse `smartctl -a`, return `{health, raw_output, attributes, warnings, failures}`
- `start_smart_test(devname, test_type)` → `smartctl -t short|long /dev/{devname}`
- `poll_smart_progress(devname)` → `smartctl -a` during test; returns `{state, percent_remaining, output}`
- `abort_smart_test(devname)` → `smartctl -X /dev/{devname}`
- `run_badblocks(devname, on_progress, cancelled_fn)` → streams `badblocks -wsv -b 4096 -p 1`; counts bad sectors from stdout (digit-only lines)
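The `poll_smart_progress` contract can be sketched as a pure parser over `smartctl -a` text (a minimal sketch — the function name and the exact phrases matched are assumptions based on common ATA smartctl wording, not the real `ssh_client.py` parser):

```python
import re

def parse_smart_progress(smartctl_output: str) -> dict:
    """Classify smartctl -a output into {state, percent_remaining}.

    Matches the common ATA self-test status phrases; "unknown" means
    keep polling (same convention as the SSH stage loop).
    """
    m = re.search(
        r"Self-test routine in progress.*?(\d+)%\s+of test remaining",
        smartctl_output, re.DOTALL,
    )
    if m:
        return {"state": "running", "percent_remaining": int(m.group(1))}
    if "completed without error" in smartctl_output:
        return {"state": "passed", "percent_remaining": 0}
    if re.search(r"previous self-test.*(failed|failure)",
                 smartctl_output, re.IGNORECASE):
        return {"state": "failed", "percent_remaining": 0}
    return {"state": "unknown", "percent_remaining": 0}
```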
**Key auth pattern** — key is stored as PEM string in settings, never written to disk:
```python
asyncssh.connect(host, ..., client_keys=[asyncssh.import_private_key(pem_str)], known_hosts=None)
```
**badblocks streaming** — uses `asyncssh.create_process()` with parallel stdout/stderr draining via `asyncio.gather`. Progress updates written to DB every 20 lines to avoid excessive writes.
### SMART Attribute Monitoring
Monitored attributes and their thresholds:
| ID | Name | Any non-zero → |
|----|------|----------------|
| 5 | Reallocated_Sector_Ct | FAIL |
| 10 | Spin_Retry_Count | WARN |
| 188 | Command_Timeout | WARN |
| 197 | Current_Pending_Sector | FAIL |
| 198 | Offline_Uncorrectable | FAIL |
| 199 | UDMA_CRC_Error_Count | WARN |
SMART attrs stored as JSON blob in `drives.smart_attrs`. Updated by `final_check` stage (SSH mode) or `short_smart`/`long_smart` REST mode. Displayed in drive drawer with colour-coded table + raw `smartctl -a` output.
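The table's FAIL/WARN policy reduces to a small classifier (a sketch; the function and the `{attr_id: raw_value}` snapshot shape are assumptions for illustration, not the actual `ssh_client.py` code):

```python
# Monitored SMART attributes from the table above: any non-zero raw value
# is a FAIL or WARN depending on the attribute ID.
FAIL_IDS = {5: "Reallocated_Sector_Ct",
            197: "Current_Pending_Sector",
            198: "Offline_Uncorrectable"}
WARN_IDS = {10: "Spin_Retry_Count",
            188: "Command_Timeout",
            199: "UDMA_CRC_Error_Count"}

def classify_attributes(attrs: dict[int, int]) -> tuple[list[str], list[str]]:
    """Return (failures, warnings) for a {attr_id: raw_value} snapshot."""
    failures = [f"{FAIL_IDS[i]} (id {i}) = {v}"
                for i, v in attrs.items() if i in FAIL_IDS and v > 0]
    warnings = [f"{WARN_IDS[i]} (id {i}) = {v}"
                for i, v in attrs.items() if i in WARN_IDS and v > 0]
    return failures, warnings
```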
### Drive Reset Action
- `POST /api/v1/drives/{drive_id}/reset` — clears `smart_tests` rows to idle, clears `drives.smart_attrs`, writes audit event, notifies SSE subscribers
- Button appears in the action column when `can_reset` is true: the drive has no active burn-in AND has a non-idle SMART test state or stored SMART attrs
- Burn-in history (burnin_jobs, burnin_stages) is preserved — reset only affects SMART test state
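The `can_reset` condition can be expressed as a small predicate (a sketch — the flattened drive-row dict and field names here are assumptions for illustration, not the actual query code):

```python
def can_reset(drive: dict) -> bool:
    """A drive is resettable when it has no active burn-in AND there is
    SMART state worth clearing (non-idle test state or stored attrs)."""
    no_active_burnin = drive.get("burnin_state") not in ("queued", "running")
    has_smart_state = (drive.get("smart_state", "idle") != "idle"
                       or bool(drive.get("smart_attrs")))
    return no_active_burnin and has_smart_state
```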
### New Routes (Stage 7)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/drives/{id}/reset` | Reset SMART state and attrs for a drive |
| `POST` | `/api/v1/settings/test-ssh` | Test SSH connection with current SSH settings |
| `GET` | `/api/v1/updates/check` | Check for latest release from Forgejo git.hellocomputer.xyz |
### Check for Updates
Settings page has a "Check for Updates" button that fetches:
```
GET https://git.hellocomputer.xyz/api/v1/repos/brandon/truenas-burnin/releases/latest
```
Compares tag name against `settings.app_version`; shows "up to date" or "v{tag} available".
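The comparison step can be sketched as a pure function (a sketch under the stated behaviour; the function name is hypothetical and the real handler may normalise tags differently):

```python
def compare_release(current: str, latest_tag: str) -> str:
    """Compare the app version against a release tag like 'v1.0.0-8'.

    Plain string equality after dropping the leading 'v' — mirrors the
    'up to date' / 'v{tag} available' messages described above.
    """
    tag = latest_tag[1:] if latest_tag.startswith("v") else latest_tag
    if tag == current:
        return "up to date"
    return f"v{tag} available"
```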
### Version Badge
`app_version` set as Jinja2 global in `renderer.py`:
```python
templates.env.globals["app_version"] = _settings.app_version
```
Displayed in header as `<span class="header-version">v{app_version}</span>` (right side, muted).
### Configurable Thresholds
`renderer.py` `_temp_class` now reads from settings instead of hardcoded values:
```python
if temp >= settings.temp_crit_c: return "temp-crit"
if temp >= settings.temp_warn_c: return "temp-warn"
```
`precheck` stage fails if `temperature_c >= settings.temp_crit_c`.
Surface validate fails if `bad_blocks > settings.bad_block_threshold` (default 0 = any bad sector = fail).
## Feature Reference (Stage 8)
### Live Terminal
A full PTY SSH terminal embedded in the log drawer as a fourth tab ("Terminal"). Requires SSH to be configured in Settings.
**Architecture:**
```
Browser (xterm.js) ──WS binary──▶ /ws/terminal (FastAPI WebSocket)
terminal.py handle()
asyncssh.connect() → create_process(term_type="xterm-256color")
asyncio tasks: ssh_to_ws() + ws_to_ssh()
```
**Message protocol** (client ↔ server):
- Client → server **binary**: raw keyboard input bytes forwarded to SSH stdin
- Client → server **text**: JSON control message — only `{"type":"resize","cols":N,"rows":N}` used currently
- Server → client **binary**: raw terminal output bytes from SSH stdout
**`app/terminal.py`** — `handle(ws)`:
1. Guard: `ssh_host` must be set; key or password must be present
2. `asyncssh.connect(known_hosts=None)` with key loaded via `import_private_key()` (never written to disk)
3. `conn.create_process(term_type="xterm-256color", term_size=(80,24), encoding=None)` — opens shell PTY
4. Two asyncio tasks bridging the streams; `asyncio.wait(FIRST_COMPLETED)` + cancel pending on disconnect
5. ANSI-formatted status messages for connect/error states
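The binary/text frame dispatch from the protocol above can be sketched as a pure helper (a sketch; `parse_ws_message` and the `(kind, payload)` return shape are hypothetical names, not the actual `terminal.py` code):

```python
import json

def parse_ws_message(msg):
    """Dispatch one WebSocket frame per the terminal protocol.

    Binary frames are raw SSH stdin bytes; text frames are JSON control
    messages, of which only 'resize' is currently used.
    """
    if isinstance(msg, bytes):
        return ("stdin", msg)
    data = json.loads(msg)
    if data.get("type") == "resize":
        return ("resize", (int(data["cols"]), int(data["rows"])))
    return ("ignore", None)
```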
**Frontend (app.js):**
- xterm.js 5.3.0 + xterm-addon-fit 0.8.0 loaded **lazily** on first Terminal tab click (CDN, ~300KB — not loaded until needed)
- `_termInit()` creates Terminal + FitAddon, opens into the panel div, registers `onData` once
- `ResizeObserver` on the panel → `fit()` + sends `resize` JSON to server
- `_termConnect()` called on init and by Reconnect button — guards against double-connect with `readyState <= 1` check
- `onData` always writes to current `_termWs` by reference — multiple reconnects don't add duplicate handlers
- Reconnect bar floats over terminal on `ws.onclose`; removed on `ws.onopen`
**Tab lifecycle:**
- Terminal tab click → `openTerminalTab()`: loads libs → `_termInit()` → `_termConnect()` on first open; just refits on subsequent opens
- Autoscroll label hidden when terminal tab is active (not applicable)
- WebSocket stays alive when drawer closes — shell persists until page unload or explicit disconnect
**New route:**
| Method | Path | Description |
|--------|------|-------------|
| `WS` | `/ws/terminal` | asyncssh PTY bridge |
**Config used:** `ssh_host`, `ssh_port`, `ssh_user`, `ssh_key`, `ssh_password` — same SSH settings as burn-in stages.
**xterm.js theme:** GitHub Dark color palette (matches app dark theme). `scrollback: 2000`. Font: SF Mono / Fira Code / Consolas.
### Cutting to Real TrueNAS (Next Steps)
When ready to test against a real TrueNAS CORE box:
1. In `.env` on maple.local, set:
```env
TRUENAS_BASE_URL=https://10.0.0.203 # or whatever your TrueNAS IP is
TRUENAS_API_KEY=your-real-key-here
TRUENAS_VERIFY_TLS=false # unless you have a valid cert
```
2. Comment out `mock-truenas` service in `docker-compose.yml` (or leave it running — harmless)
3. Verify TrueNAS CORE v2.0 API contract matches what `truenas.py` expects:
1. In Settings (or `.env`), set:
- **TrueNAS URL** → `https://10.0.0.X` (real IP)
- **API Key** → real API key
- **SSH Host** → same IP as TrueNAS
- **SSH User** → `root` (or sudoer with smartctl/badblocks access)
- **SSH Key** → paste PEM key into textarea
2. Click **Test SSH Connection** to verify before starting a burn-in
3. TrueNAS CORE uses `ada0`, `da0` device names (not `sda`). Mock drive names will differ.
4. Delete `app.db` before first real poll to clear mock drive rows
5. Comment out `mock-truenas` service in `docker-compose.yml` (optional — harmless to leave)
6. Verify TrueNAS CORE v2.0 REST API:
- `GET /api/v2.0/disk` returns list with `name`, `serial`, `model`, `size`, `temperature`
- `GET /api/v2.0/core/get_jobs` with filter `[["method","=","smart.test"]]`
- `POST /api/v2.0/smart/test` accepts `{disks: [devname], type: "SHORT"|"LONG"}`
4. Check that disk names match expected format (TrueNAS CORE uses `ada0`, `da0`, etc. — not `sda`)
- You may need to update mock drive names back or adjust poller logic
5. Delete `app.db` to clear mock drive rows before first real poll
---


@ -1,6 +1,6 @@
# TrueNAS Burn-In — Project Specification
**Version:** 0.5.0
**Version:** 1.0.0-8
**Status:** Active Development
**Audience:** Public / Open Source
@ -49,7 +49,7 @@ badblocks -wsv -b 4096 -p 1 /dev/sdX
```
This is a **destructive write test**. The UI must display a prominent warning before this stage begins, and again in the Settings page where the behavior is documented. The `-w` flag overwrites all data on the drive. This is intentional — these are new drives being validated before pool use.
**Failure threshold:** 2 or more bad blocks found triggers immediate abort and FAILED status. The threshold should be configurable in Settings (default: 2).
**Failure threshold:** Any bad blocks found triggers immediate abort and FAILED status by default. The threshold is configurable in Settings (`Bad Block Threshold`, default: 0 — meaning any bad sector = fail).
---
@ -97,10 +97,11 @@ A **Reset** action clears the test state for a drive so it can be re-queued. It
Slides up from the bottom of the page when a drive row is clicked. Does not navigate away — the table remains visible and scrollable above.
Three tabs:
- **badblocks** — live tail of badblocks stdout, including error lines with sector numbers highlighted in red.
- **SMART** — output of the last smartctl run for this drive, with monitored attribute values highlighted.
- **Events** — chronological timeline of everything that happened to this drive (test started, test passed, failure detected, alert sent, etc.).
Four tabs:
- **Burn-In** — stage-by-stage progress for the latest burn-in job; shows live elapsed time, raw SSH log output (smartctl / badblocks), and bad block count.
- **SMART** — output of the last smartctl run for this drive, with monitored attribute values highlighted (green/yellow/red). Raw `smartctl -a` output also shown when SSH mode is active.
- **Events** — chronological timeline of everything that happened to this drive (test started, test passed, failure detected, alert sent, reset, etc.).
- **Terminal** — live SSH PTY session (xterm.js). Opens an interactive shell on the TrueNAS host. Requires SSH to be configured in Settings. Supports full colour, resize, paste, and reconnect. xterm.js is loaded lazily on first use.
Features:
- Auto-scroll toggle (on by default).
@ -233,6 +234,7 @@ Key endpoints:
- `POST /api/v1/burnin/start` — start a burn-in job.
- `POST /api/v1/burnin/{job_id}/cancel` — cancel a burn-in job.
- `GET /sse/drives` — Server-Sent Events stream powering the real-time dashboard UI.
- `WS /ws/terminal` — WebSocket endpoint bridging xterm.js to an asyncssh PTY on TrueNAS.
- `GET /health` — health check endpoint.
The API makes this app a strong candidate for MCP server integration, allowing an AI assistant to query drive status, start tests, or receive alerts conversationally.
@ -280,9 +282,8 @@ To validate against real hardware:
## Version
- App version starts at **0.5.0**
- Displayed on the dashboard landing page header and in Settings.
- Update check in Settings queries GitHub releases API.
- App version: **1.0.0-8** (displayed in header next to the title, and in Settings).
- Update check in Settings queries Forgejo releases API (`git.hellocomputer.xyz`).
- API version tracked separately, currently **0.1.0**.
---


@ -206,10 +206,45 @@ async def cancel_job(job_id: int, operator: str) -> bool:
# Job runner
# ---------------------------------------------------------------------------
async def _thermal_gate_ok() -> bool:
"""True if it's thermally safe to start a new burn-in.
Checks the peak temperature of drives currently under active burn-in.
"""
try:
async with _db() as db:
cur = await db.execute("""
SELECT MAX(d.temperature_c)
FROM drives d
JOIN burnin_jobs bj ON bj.drive_id = d.id
WHERE bj.state = 'running' AND d.temperature_c IS NOT NULL
""")
row = await cur.fetchone()
max_temp = row[0] if row and row[0] is not None else None
return max_temp is None or max_temp < settings.temp_warn_c
except Exception:
return True # Never block on error
async def _run_job(job_id: int) -> None:
"""Acquire semaphore slot, execute all stages, persist final state."""
assert _semaphore is not None, "burnin.init() not called"
# Adaptive thermal gate: wait before competing for a slot if running drives
# are already at or above the warning threshold. This prevents layering a
# new burn-in on top of a thermally-stressed system. Gives up after 3 min
# and proceeds anyway so jobs don't queue indefinitely.
for _attempt in range(18): # 18 × 10 s = 3 min max
if await _thermal_gate_ok():
break
if _attempt == 0:
log.info(
"Thermal gate: job %d waiting — running drive temps at or above %d°C",
job_id, settings.temp_warn_c,
)
await asyncio.sleep(10)
else:
log.warning("Thermal gate timed out for job %d — proceeding anyway", job_id)
async with _semaphore:
if await _is_cancelled(job_id):
return
@ -303,6 +338,16 @@ async def _run_job(job_id: int) -> None:
)
job_row = await cur2.fetchone()
if job_row:
# Get bad_blocks count from surface_validate stage if present
bad_blocks = 0
async with _db() as db3:
cur3 = await db3.execute(
"SELECT bad_blocks FROM burnin_stages WHERE burnin_job_id=? AND stage_name='surface_validate'",
(job_id,)
)
bb_row = await cur3.fetchone()
if bb_row and bb_row[0]:
bad_blocks = bb_row[0]
asyncio.create_task(notifier.notify_job_complete(
job_id=job_id,
devname=devname,
@ -312,6 +357,7 @@ async def _run_job(job_id: int) -> None:
profile=job_row["profile"],
operator=job_row["operator"],
error_text=error_text,
bad_blocks=bad_blocks,
))
except Exception as exc:
log.error("Failed to schedule notifications: %s", exc)
@ -339,7 +385,7 @@ async def _execute_stages(job_id: int, stages: list[str], devname: str, drive_id
await _cancel_stage(job_id, stage_name)
else:
await _finish_stage(job_id, stage_name, success=ok)
await _recalculate_progress(job_id, profile)
await _recalculate_progress(job_id)
_push_update()
if not ok:
@ -352,15 +398,15 @@ async def _dispatch_stage(job_id: int, stage_name: str, devname: str, drive_id:
if stage_name == "precheck":
return await _stage_precheck(job_id, drive_id)
elif stage_name == "short_smart":
return await _stage_smart_test(job_id, devname, "SHORT", "short_smart")
return await _stage_smart_test(job_id, devname, "SHORT", "short_smart", drive_id)
elif stage_name == "long_smart":
return await _stage_smart_test(job_id, devname, "LONG", "long_smart")
return await _stage_smart_test(job_id, devname, "LONG", "long_smart", drive_id)
elif stage_name == "surface_validate":
return await _stage_timed_simulate(job_id, "surface_validate", settings.surface_validate_seconds)
return await _stage_surface_validate(job_id, devname, drive_id)
elif stage_name == "io_validate":
return await _stage_timed_simulate(job_id, "io_validate", settings.io_validate_seconds)
elif stage_name == "final_check":
return await _stage_final_check(job_id, devname)
return await _stage_final_check(job_id, devname, drive_id)
return True
@ -385,16 +431,25 @@ async def _stage_precheck(job_id: int, drive_id: int) -> bool:
await _set_stage_error(job_id, "precheck", "Drive SMART health is FAILED — refusing to burn in")
return False
if temp and temp > 60:
await _set_stage_error(job_id, "precheck", f"Drive temperature {temp}°C exceeds 60°C limit")
if temp and temp > settings.temp_crit_c:
await _set_stage_error(job_id, "precheck", f"Drive temperature {temp}°C exceeds {settings.temp_crit_c}°C limit")
return False
await asyncio.sleep(1) # Simulate brief check
return True
async def _stage_smart_test(job_id: int, devname: str, test_type: str, stage_name: str) -> bool:
"""Start a TrueNAS SMART test and poll until complete."""
async def _stage_smart_test(job_id: int, devname: str, test_type: str, stage_name: str,
drive_id: int | None = None) -> bool:
"""Start a SMART test. Uses SSH if configured, TrueNAS REST API otherwise."""
from app import ssh_client
if ssh_client.is_configured():
return await _stage_smart_test_ssh(job_id, devname, test_type, stage_name, drive_id)
return await _stage_smart_test_api(job_id, devname, test_type, stage_name)
async def _stage_smart_test_api(job_id: int, devname: str, test_type: str, stage_name: str) -> bool:
"""TrueNAS REST API path for SMART test (mock / dev mode)."""
tn_job_id = await _client.start_smart_test([devname], test_type)
while True:
@ -428,8 +483,349 @@ async def _stage_smart_test(job_id: int, devname: str, test_type: str, stage_nam
await asyncio.sleep(POLL_INTERVAL)
async def _stage_smart_test_ssh(job_id: int, devname: str, test_type: str, stage_name: str,
drive_id: int | None) -> bool:
"""SSH path for SMART test — runs smartctl directly on TrueNAS."""
from app import ssh_client
# Start the test
try:
startup = await ssh_client.start_smart_test(devname, test_type)
await _append_stage_log(job_id, stage_name, startup + "\n")
except Exception as exc:
await _set_stage_error(job_id, stage_name, f"Failed to start SMART test via SSH: {exc}")
return False
# Brief pause to let the test register in smartctl output
await asyncio.sleep(3)
# Poll until complete
while True:
if await _is_cancelled(job_id):
try:
await ssh_client.abort_smart_test(devname)
except Exception:
pass
return False
await asyncio.sleep(POLL_INTERVAL)
try:
progress = await ssh_client.poll_smart_progress(devname)
except Exception as exc:
log.warning("SSH SMART poll failed: %s", exc, extra={"job_id": job_id})
await _append_stage_log(job_id, stage_name, f"[poll error] {exc}\n")
continue
await _append_stage_log(job_id, stage_name, progress["output"] + "\n---\n")
if progress["state"] == "running":
pct = max(0, 100 - progress["percent_remaining"])
await _update_stage_percent(job_id, stage_name, pct)
await _recalculate_progress(job_id)
_push_update()
elif progress["state"] == "passed":
await _update_stage_percent(job_id, stage_name, 100)
# Run attribute check
if drive_id is not None:
try:
attrs = await ssh_client.get_smart_attributes(devname)
await _store_smart_attrs(drive_id, attrs)
await _store_smart_raw_output(drive_id, test_type, attrs["raw_output"])
if attrs["failures"]:
error = "SMART attribute failures: " + "; ".join(attrs["failures"])
await _set_stage_error(job_id, stage_name, error)
return False
if attrs["warnings"]:
await _append_stage_log(
job_id, stage_name,
"[WARNING] " + "; ".join(attrs["warnings"]) + "\n"
)
except Exception as exc:
log.warning("Failed to retrieve SMART attributes: %s", exc)
await _recalculate_progress(job_id)
_push_update()
return True
elif progress["state"] == "failed":
await _set_stage_error(job_id, stage_name, f"SMART {test_type} test failed")
return False
# "unknown" → keep polling
async def _badblocks_available() -> bool:
"""Check if badblocks is installed on the remote host (Linux/SCALE only)."""
from app import ssh_client
try:
async with await ssh_client._connect() as conn:
result = await conn.run("which badblocks", check=False)
return result.returncode == 0
except Exception:
return False
async def _stage_surface_validate(job_id: int, devname: str, drive_id: int) -> bool:
"""
Surface validation stage auto-routes to the right implementation:
1. SSH configured + badblocks available (TrueNAS SCALE / Linux):
runs badblocks -wsv -b 4096 -p 1 /dev/{devname} directly over SSH.
2. SSH configured + badblocks NOT available (TrueNAS CORE / FreeBSD):
uses TrueNAS REST API disk.wipe FULL job + post-wipe SMART check.
3. No SSH:
simulated timed progress (dev/mock mode).
"""
from app import ssh_client
if ssh_client.is_configured():
if await _badblocks_available():
return await _stage_surface_validate_ssh(job_id, devname, drive_id)
# TrueNAS CORE/FreeBSD: badblocks not available — use native wipe API
await _append_stage_log(
job_id, "surface_validate",
"[INFO] badblocks not found on host (TrueNAS CORE/FreeBSD) — "
"using TrueNAS disk.wipe API (FULL write pass).\n\n"
)
return await _stage_surface_validate_truenas(job_id, devname, drive_id)
return await _stage_timed_simulate(job_id, "surface_validate", settings.surface_validate_seconds)
async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int) -> bool:
"""Run badblocks over SSH, streaming output to stage log."""
from app import ssh_client
await _append_stage_log(
job_id, "surface_validate",
f"[START] badblocks -wsv -b 4096 -p 1 /dev/{devname}\n"
f"[NOTE] This is a DESTRUCTIVE write test. All data on /dev/{devname} will be overwritten.\n\n"
)
# Stream badblocks output directly over SSH: the drain coroutines below
# flush log chunks to the DB, update progress / bad-block counts, and
# check for cancellation inline.
result = {"bad_blocks": 0, "output": "", "aborted": False}
try:
# Drain stdout and stderr concurrently; stderr carries the "% done" progress lines
bad_blocks_total = 0
output_lines: list[str] = []
async with await ssh_client._connect() as conn:
cmd = f"badblocks -wsv -b 4096 -p 1 /dev/{devname}"
async with conn.create_process(cmd) as proc:
import re as _re
async def _drain(stream, is_stderr: bool):
nonlocal bad_blocks_total
async for raw in stream:
line = raw if isinstance(raw, str) else raw.decode("utf-8", errors="replace")
output_lines.append(line)
if is_stderr:
m = _re.search(r"([\d.]+)%\s+done", line)
if m:
pct = min(99, int(float(m.group(1))))
await _update_stage_percent(job_id, "surface_validate", pct)
await _update_stage_bad_blocks(job_id, "surface_validate", bad_blocks_total)
await _recalculate_progress(job_id)
_push_update()
else:
stripped = line.strip()
if stripped and stripped.isdigit():
bad_blocks_total += 1
# Append to DB log in chunks
if len(output_lines) % 20 == 0:
chunk = "".join(output_lines[-20:])
await _append_stage_log(job_id, "surface_validate", chunk)
# Abort on bad block threshold
if bad_blocks_total > settings.bad_block_threshold:
proc.kill()
output_lines.append(
f"\n[ABORTED] {bad_blocks_total} bad block(s) exceeded "
f"threshold ({settings.bad_block_threshold})\n"
)
return
if await _is_cancelled(job_id):
proc.kill()
return
await asyncio.gather(
_drain(proc.stdout, False),
_drain(proc.stderr, True),
return_exceptions=True,
)
await proc.wait()
# Flush remaining output
remainder = "".join(output_lines)
await _append_stage_log(job_id, "surface_validate", remainder)
result["bad_blocks"] = bad_blocks_total
result["output"] = remainder
result["aborted"] = bad_blocks_total > settings.bad_block_threshold
except asyncio.CancelledError:
return False
except Exception as exc:
await _append_stage_log(job_id, "surface_validate", f"\n[SSH error] {exc}\n")
await _set_stage_error(job_id, "surface_validate", f"SSH badblocks error: {exc}")
return False
await _update_stage_bad_blocks(job_id, "surface_validate", result["bad_blocks"])
if result["aborted"] or result["bad_blocks"] > settings.bad_block_threshold:
await _set_stage_error(
job_id, "surface_validate",
f"Surface validate FAILED: {result['bad_blocks']} bad block(s) found "
f"(threshold: {settings.bad_block_threshold})"
)
return False
return True
async def _stage_surface_validate_truenas(job_id: int, devname: str, drive_id: int) -> bool:
"""
Surface validation via TrueNAS CORE disk.wipe REST API.
Used on FreeBSD (TrueNAS CORE) where badblocks is unavailable.
Sends a FULL write-zero pass across the entire disk, polls progress,
then runs a post-wipe SMART attribute check to catch reallocated sectors.
"""
from app import ssh_client
await _append_stage_log(
job_id, "surface_validate",
f"[START] TrueNAS disk.wipe FULL — {devname}\n"
f"[NOTE] DESTRUCTIVE: all data on {devname} will be overwritten.\n\n"
)
# Start the wipe job
try:
tn_job_id = await _client.wipe_disk(devname, "FULL")
except Exception as exc:
await _set_stage_error(job_id, "surface_validate", f"Failed to start disk.wipe: {exc}")
return False
await _append_stage_log(
job_id, "surface_validate",
f"[JOB] TrueNAS wipe job started (job_id={tn_job_id})\n"
)
# Poll until complete
log_flush_counter = 0
while True:
if await _is_cancelled(job_id):
try:
await _client.abort_job(tn_job_id)
except Exception:
pass
return False
await asyncio.sleep(POLL_INTERVAL)
try:
job = await _client.get_job(tn_job_id)
except Exception as exc:
log.warning("Wipe job poll failed: %s", exc, extra={"job_id": job_id})
await _append_stage_log(job_id, "surface_validate", f"[poll error] {exc}\n")
continue
if not job:
await _set_stage_error(job_id, "surface_validate", f"Wipe job {tn_job_id} not found")
return False
state = job.get("state", "")
pct = int(job.get("progress", {}).get("percent", 0) or 0)
desc = job.get("progress", {}).get("description", "")
await _update_stage_percent(job_id, "surface_validate", min(pct, 99))
await _recalculate_progress(job_id)
_push_update()
# Log progress description every ~5 polls to avoid DB spam
log_flush_counter += 1
if desc and log_flush_counter % 5 == 0:
await _append_stage_log(job_id, "surface_validate", f"[{pct}%] {desc}\n")
if state == "SUCCESS":
await _update_stage_percent(job_id, "surface_validate", 100)
await _append_stage_log(
job_id, "surface_validate",
f"\n[DONE] Wipe job {tn_job_id} completed successfully.\n"
)
# Post-wipe SMART check — catch any sectors that failed under write stress
if ssh_client.is_configured() and drive_id is not None:
await _append_stage_log(
job_id, "surface_validate",
"[CHECK] Running post-wipe SMART attribute check...\n"
)
try:
attrs = await ssh_client.get_smart_attributes(devname)
await _store_smart_attrs(drive_id, attrs)
if attrs["failures"]:
error = "Post-wipe SMART check: " + "; ".join(attrs["failures"])
await _set_stage_error(job_id, "surface_validate", error)
return False
if attrs["warnings"]:
await _append_stage_log(
job_id, "surface_validate",
"[WARNING] " + "; ".join(attrs["warnings"]) + "\n"
)
await _append_stage_log(
job_id, "surface_validate",
f"[CHECK] SMART health: {attrs['health']} — no critical attributes.\n"
)
except Exception as exc:
log.warning("Post-wipe SMART check failed: %s", exc)
await _append_stage_log(
job_id, "surface_validate",
f"[WARN] Post-wipe SMART check failed (non-fatal): {exc}\n"
)
return True
elif state in ("FAILED", "ABORTED", "ERROR"):
error_msg = job.get("error") or f"Disk wipe failed (state={state})"
await _set_stage_error(
job_id, "surface_validate",
f"TrueNAS disk.wipe FAILED: {error_msg}"
)
return False
# RUNNING or WAITING — keep polling
async def _stage_timed_simulate(job_id: int, stage_name: str, duration_seconds: int) -> bool:
"""Simulate a timed stage (surface validation / IO validation) with progress updates."""
"""Simulate a timed stage with progress updates (mock / dev mode)."""
start = time.monotonic()
while True:
@ -449,9 +845,28 @@ async def _stage_timed_simulate(job_id: int, stage_name: str, duration_seconds:
await asyncio.sleep(POLL_INTERVAL)
async def _stage_final_check(job_id: int, devname: str, drive_id: int | None = None) -> bool:
"""
Verify drive passed all tests.
SSH mode: run smartctl -a and check critical attributes.
Mock mode: check SMART health field in DB.
"""
await asyncio.sleep(1)
from app import ssh_client
if ssh_client.is_configured() and drive_id is not None:
try:
attrs = await ssh_client.get_smart_attributes(devname)
await _store_smart_attrs(drive_id, attrs)
if attrs["health"] == "FAILED" or attrs["failures"]:
failures = attrs["failures"] or [f"SMART health: {attrs['health']}"]
await _set_stage_error(job_id, "final_check",
"Final check failed: " + "; ".join(failures))
return False
return True
except Exception as exc:
log.warning("SSH final_check failed, falling back to DB check: %s", exc)
# DB check (mock mode fallback)
async with _db() as db:
cur = await db.execute(
"SELECT smart_health FROM drives WHERE devname=?", (devname,)
@ -549,6 +964,57 @@ async def _cancel_stage(job_id: int, stage_name: str) -> None:
await db.commit()
async def _append_stage_log(job_id: int, stage_name: str, text: str) -> None:
"""Append text to the log_text column of a burnin_stages row."""
async with _db() as db:
await db.execute("PRAGMA journal_mode=WAL")
await db.execute(
"""UPDATE burnin_stages
SET log_text = COALESCE(log_text, '') || ?
WHERE burnin_job_id=? AND stage_name=?""",
(text, job_id, stage_name),
)
await db.commit()
async def _update_stage_bad_blocks(job_id: int, stage_name: str, count: int) -> None:
async with _db() as db:
await db.execute("PRAGMA journal_mode=WAL")
await db.execute(
"UPDATE burnin_stages SET bad_blocks=? WHERE burnin_job_id=? AND stage_name=?",
(count, job_id, stage_name),
)
await db.commit()
async def _store_smart_attrs(drive_id: int, attrs: dict) -> None:
"""Persist latest SMART attribute dict to drives.smart_attrs (JSON)."""
import json
# Convert int keys to str for JSON serialisation
serialisable = {str(k): v for k, v in attrs.get("attributes", {}).items()}
blob = json.dumps({
"health": attrs.get("health", "UNKNOWN"),
"attrs": serialisable,
"warnings": attrs.get("warnings", []),
"failures": attrs.get("failures", []),
})
async with _db() as db:
await db.execute("PRAGMA journal_mode=WAL")
await db.execute("UPDATE drives SET smart_attrs=? WHERE id=?", (blob, drive_id))
await db.commit()
async def _store_smart_raw_output(drive_id: int, test_type: str, raw: str) -> None:
"""Store raw smartctl output in smart_tests.raw_output."""
async with _db() as db:
await db.execute("PRAGMA journal_mode=WAL")
await db.execute(
"UPDATE smart_tests SET raw_output=? WHERE drive_id=? AND test_type=?",
(raw, drive_id, test_type.lower()),
)
await db.commit()
async def _set_stage_error(job_id: int, stage_name: str, error_text: str) -> None:
async with _db() as db:
await db.execute("PRAGMA journal_mode=WAL")

View file

@ -51,5 +51,24 @@ class Settings(BaseSettings):
# Stuck-job detection: jobs running longer than this are marked 'unknown'
stuck_job_hours: int = 24
# Temperature thresholds (°C) — drives table colouring + precheck gate
temp_warn_c: int = 46 # orange warning
temp_crit_c: int = 55 # red critical (precheck refuses to start above this)
# Bad-block tolerance — surface_validate fails if bad blocks exceed this
bad_block_threshold: int = 0
# SSH credentials for direct TrueNAS command execution (Stage 7)
# When ssh_host is set, burn-in stages use SSH for smartctl/badblocks instead of REST API.
# Leave ssh_host empty to use the mock/REST API (development mode).
ssh_host: str = ""
ssh_port: int = 22
ssh_user: str = "root" # TrueNAS CORE default is root
ssh_password: str = "" # Password auth (leave blank if using key)
ssh_key: str = "" # PEM private key content (paste full key including headers)
# Application version — used by the /api/v1/updates/check endpoint
app_version: str = "1.0.0-7"
settings = Settings()

View file

@ -82,6 +82,13 @@ CREATE INDEX IF NOT EXISTS idx_audit_events_job ON audit_events(burnin_job_id)
_MIGRATIONS = [
"ALTER TABLE drives ADD COLUMN notes TEXT",
"ALTER TABLE drives ADD COLUMN location TEXT",
# Stage 7: SSH command output + SMART attribute storage
"ALTER TABLE burnin_stages ADD COLUMN log_text TEXT",
"ALTER TABLE burnin_stages ADD COLUMN bad_blocks INTEGER DEFAULT 0",
"ALTER TABLE drives ADD COLUMN smart_attrs TEXT",
"ALTER TABLE smart_tests ADD COLUMN raw_output TEXT",
# Stage 8: track last reset time so dashboard burn-in col clears after reset
"ALTER TABLE drives ADD COLUMN last_reset_at TEXT",
]
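Each entry in `_MIGRATIONS` is a bare `ALTER TABLE`, so re-running it against an already-migrated database raises `duplicate column name`. A common pattern for applying such a list idempotently is to attempt each statement and swallow exactly that error; this is a sketch under that assumption, not the project's actual migration runner:

```python
import sqlite3

def apply_migrations(conn: sqlite3.Connection, migrations: list[str]) -> int:
    """Run add-column migrations idempotently; return how many applied."""
    applied = 0
    for stmt in migrations:
        try:
            conn.execute(stmt)
            applied += 1
        except sqlite3.OperationalError as exc:
            # Column already exists from a previous run -- safe to skip.
            if "duplicate column name" not in str(exc):
                raise
    conn.commit()
    return applied
```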

View file

@ -23,8 +23,10 @@ async def notify_job_complete(
profile: str,
operator: str,
error_text: str | None,
bad_blocks: int = 0,
) -> None:
"""Fire all configured notifications for a completed burn-in job."""
from datetime import datetime, timezone
tasks = []
if settings.webhook_url:
@ -38,6 +40,8 @@ async def notify_job_complete(
"profile": profile,
"operator": operator,
"error_text": error_text,
"bad_blocks": bad_blocks,
"timestamp": datetime.now(timezone.utc).isoformat(),
}))
if settings.smtp_host:

View file

@ -20,13 +20,15 @@ from app.truenas import TrueNASClient
log = logging.getLogger(__name__)
# Shared state read by the /health endpoint and dashboard template
_state: dict[str, Any] = {
"last_poll_at": None,
"last_error": None,
"healthy": False,
"drives_seen": 0,
"consecutive_failures": 0,
"system_temps": {}, # {"cpu_c": int|None, "pch_c": int|None}
"thermal_pressure": "ok", # "ok" | "warn" | "crit" — based on running burn-in drive temps
}
# SSE subscriber queues — notified after each successful poll
@ -208,6 +210,67 @@ async def _sync_history(
# Poll cycle
# ---------------------------------------------------------------------------
async def _poll_smart_via_ssh(db: aiosqlite.Connection, now: str) -> None:
"""
Poll progress for SMART tests started via SSH (truenas_job_id IS NULL).
Used on TrueNAS SCALE 25.10+ where the REST smart/test API no longer exists.
"""
from app import ssh_client
if not ssh_client.is_configured():
return
cur = await db.execute(
"""SELECT st.id, st.test_type, st.drive_id, d.devname, st.started_at
FROM smart_tests st
JOIN drives d ON d.id = st.drive_id
WHERE st.state = 'running' AND st.truenas_job_id IS NULL"""
)
rows = await cur.fetchall()
if not rows:
return
for row in rows:
test_id, ttype, drive_id, devname, started_at = row[0], row[1], row[2], row[3], row[4]
try:
progress = await ssh_client.poll_smart_progress(devname)
except Exception as exc:
log.warning("SSH SMART poll failed for %s: %s", devname, exc)
continue
state = progress["state"]
pct_remaining = progress.get("percent_remaining") # None = not yet in output
raw_output = progress.get("output", "")
if state == "running":
# pct_remaining=None means smartctl output doesn't have the % line yet
# (test just started) — keep percent at 0 rather than jumping to 100
if pct_remaining is None:
pct = 0
else:
pct = max(0, 100 - pct_remaining)
eta = _eta_from_progress(pct, started_at)
await db.execute(
"UPDATE smart_tests SET percent=?, eta_at=?, raw_output=? WHERE id=?",
(pct, eta, raw_output, test_id),
)
elif state == "passed":
await db.execute(
"UPDATE smart_tests SET state='passed', percent=100, finished_at=?, raw_output=? WHERE id=?",
(now, raw_output, test_id),
)
log.info("SSH SMART %s passed on %s", ttype, devname)
elif state == "failed":
await db.execute(
"UPDATE smart_tests SET state='failed', percent=0, finished_at=?, "
"error_text=?, raw_output=? WHERE id=?",
(now, f"SMART {ttype.upper()} test failed", raw_output, test_id),
)
log.warning("SSH SMART %s FAILED on %s", ttype, devname)
# state == "unknown" → keep polling, no update
await db.commit()
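`ssh_client.poll_smart_progress` (not shown in this hunk) derives `state` and `percent_remaining` from `smartctl` output; the `percent_remaining=None` case corresponds to output that lacks the remaining-percent line. A hedged sketch of that parsing, assuming smartctl's usual self-test status wording (the real parser may differ):

```python
import re

def parse_selftest_progress(output: str) -> dict:
    """Extract self-test state + percent remaining from smartctl text.

    Returns {"state": "running"|"passed"|"failed"|"unknown",
             "percent_remaining": int | None}.
    """
    m = re.search(r"(\d+)%\s+of\s+test\s+remaining", output)
    remaining = int(m.group(1)) if m else None
    low = output.lower()
    if "self-test routine in progress" in low:
        # remaining may still be None right after the test starts
        return {"state": "running", "percent_remaining": remaining}
    if "completed without error" in low:
        return {"state": "passed", "percent_remaining": 0}
    if "read failure" in low or "failed" in low:
        return {"state": "failed", "percent_remaining": remaining}
    return {"state": "unknown", "percent_remaining": remaining}
```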
async def poll_cycle(client: TrueNASClient) -> int:
"""Run one full poll. Returns number of drives seen."""
now = _now()
@ -215,6 +278,20 @@ async def poll_cycle(client: TrueNASClient) -> int:
disks = await client.get_disks()
running_jobs = await client.get_smart_jobs(state="RUNNING")
# Fetch temperatures via SCALE-specific endpoint.
# CORE doesn't have this endpoint — silently skip on any error.
try:
temps = await client.get_disk_temperatures()
except Exception:
temps = {}
# Inject temperature into each disk dict (SCALE 25.10 has no temp in /disk)
for disk in disks:
devname = disk.get("devname", "")
t = temps.get(devname)
if t is not None:
disk["temperature"] = int(round(t))
# Index running jobs by (devname, test_type)
active: dict[tuple[str, str], dict] = {}
for job in running_jobs:
@ -243,6 +320,9 @@ async def poll_cycle(client: TrueNASClient) -> int:
await db.commit()
# SSH SMART polling — for tests started via smartctl (no TrueNAS REST job)
await _poll_smart_via_ssh(db, now)
return len(disks)
@ -263,6 +343,39 @@ async def run(client: TrueNASClient) -> None:
_state["drives_seen"] = count
_state["consecutive_failures"] = 0
log.debug("Poll OK", extra={"drives": count})
# System sensor temps via SSH (non-fatal)
from app import ssh_client as _ssh
if _ssh.is_configured():
try:
_state["system_temps"] = await _ssh.get_system_sensors()
except Exception:
pass
# Thermal pressure: max temp of drives currently under burn-in
try:
async with aiosqlite.connect(settings.db_path) as _tdb:
_tdb.row_factory = aiosqlite.Row
await _tdb.execute("PRAGMA journal_mode=WAL")
_cur = await _tdb.execute("""
SELECT MAX(d.temperature_c)
FROM drives d
JOIN burnin_jobs bj ON bj.drive_id = d.id
WHERE bj.state = 'running' AND d.temperature_c IS NOT NULL
""")
_row = await _cur.fetchone()
_max_t = _row[0] if _row and _row[0] is not None else None
if _max_t is None:
_state["thermal_pressure"] = "ok"
elif _max_t >= settings.temp_crit_c:
_state["thermal_pressure"] = "crit"
elif _max_t >= settings.temp_warn_c:
_state["thermal_pressure"] = "warn"
else:
_state["thermal_pressure"] = "ok"
except Exception:
_state["thermal_pressure"] = "ok"
_notify_subscribers()
# Check for stuck jobs every 5 cycles (~1 min at default 12s interval)

View file

@ -37,9 +37,10 @@ def _format_eta(seconds: int | None) -> str:
def _temp_class(celsius: int | None) -> str:
if celsius is None:
return ""
from app.config import settings
if celsius < settings.temp_warn_c:
return "temp-cool"
if celsius < settings.temp_crit_c:
return "temp-warm"
return "temp-hot"
@ -125,7 +126,7 @@ def _format_elapsed(iso: str | None) -> str:
return ""
# Register filters
templates.env.filters["format_bytes"] = _format_bytes
templates.env.filters["format_eta"] = _format_eta
templates.env.filters["temp_class"] = _temp_class
@ -134,3 +135,7 @@ templates.env.filters["format_dt_full"] = _format_dt_full
templates.env.filters["format_duration"] = _format_duration
templates.env.filters["format_elapsed"] = _format_elapsed
templates.env.globals["drive_status"] = _drive_status
from app.config import settings as _settings
templates.env.globals["app_version"] = _settings.app_version

View file

@ -5,7 +5,7 @@ import json
from datetime import datetime, timezone
import aiosqlite
from fastapi import APIRouter, Depends, HTTPException, Query, Request, WebSocket
from fastapi.responses import HTMLResponse, StreamingResponse
from sse_starlette.sse import EventSourceResponse
@ -118,11 +118,17 @@ _DRIVES_QUERY = """
async def _fetch_burnin_by_drive(db: aiosqlite.Connection) -> dict[int, dict]:
"""Return latest burn-in job (any state) keyed by drive_id."""
"""Return latest burn-in job (any state) keyed by drive_id.
Jobs created before the drive's last_reset_at are excluded so the
dashboard burn-in column clears after a reset while history is preserved.
"""
cur = await db.execute("""
SELECT bj.*
FROM burnin_jobs bj
JOIN drives d ON d.id = bj.drive_id
WHERE bj.id IN (SELECT MAX(id) FROM burnin_jobs GROUP BY drive_id)
AND (d.last_reset_at IS NULL OR bj.created_at > d.last_reset_at)
""")
rows = await cur.fetchall()
return {r["drive_id"]: dict(r) for r in rows}
@ -212,6 +218,18 @@ async def sse_drives(request: Request):
yield {"event": "drives-update", "data": html}
# Push system sensor state so JS can update temp chips live
ps = poller.get_state()
yield {
"event": "system-sensors",
"data": json.dumps({
"system_temps": ps.get("system_temps", {}),
"thermal_pressure": ps.get("thermal_pressure", "ok"),
"temp_warn_c": settings.temp_warn_c,
"temp_crit_c": settings.temp_crit_c,
}),
}
# Push browser notification event if this was a job completion
if alert:
yield {"event": "job-alert", "data": json.dumps(alert)}
@ -249,6 +267,87 @@ async def list_drives(db: aiosqlite.Connection = Depends(get_db)):
return [_row_to_drive(r) for r in rows]
@router.get("/api/v1/drives/{drive_id}/drawer")
async def drive_drawer(drive_id: int, db: aiosqlite.Connection = Depends(get_db)):
"""Data for the log drawer — latest burn-in job + stages, SMART tests, audit events."""
cur = await db.execute(_DRIVES_QUERY.format(where="WHERE d.id = ?"), (drive_id,))
row = await cur.fetchone()
if not row:
raise HTTPException(status_code=404, detail="Drive not found")
drive = _row_to_drive(row)
# Latest burn-in job + its stages (include log_text and bad_blocks)
cur = await db.execute(
"SELECT * FROM burnin_jobs WHERE drive_id=? ORDER BY id DESC LIMIT 1",
(drive_id,),
)
job_row = await cur.fetchone()
burnin = None
if job_row:
job = dict(job_row)
cur = await db.execute(
"SELECT id, stage_name, state, percent, started_at, finished_at, "
"duration_seconds, error_text, log_text, bad_blocks "
"FROM burnin_stages WHERE burnin_job_id=? ORDER BY id",
(job_row["id"],),
)
job["stages"] = [dict(r) for r in await cur.fetchall()]
burnin = job
# SMART raw output from smart_tests table
cur = await db.execute(
"SELECT test_type, state, percent, started_at, finished_at, error_text, raw_output "
"FROM smart_tests WHERE drive_id=?",
(drive_id,),
)
smart_rows = {r["test_type"]: dict(r) for r in await cur.fetchall()}
# Cached SMART attributes (JSON blob on drives table)
import json as _json
smart_attrs = None
cur = await db.execute("SELECT smart_attrs FROM drives WHERE id=?", (drive_id,))
attrs_row = await cur.fetchone()
if attrs_row and attrs_row["smart_attrs"]:
try:
smart_attrs = _json.loads(attrs_row["smart_attrs"])
except Exception:
pass
# Last 50 audit events for this drive (newest first)
cur = await db.execute("""
SELECT id, event_type, operator, message, created_at
FROM audit_events
WHERE drive_id = ?
ORDER BY id DESC
LIMIT 50
""", (drive_id,))
events = [dict(r) for r in await cur.fetchall()]
def _smart_card(test_type: str) -> dict:
smart_obj = drive.smart_short if test_type == "short" else drive.smart_long
base = smart_obj.model_dump() if smart_obj else {}
row = smart_rows.get(test_type, {})
base["raw_output"] = row.get("raw_output")
return base
return {
"drive": {
"id": drive.id,
"devname": drive.devname,
"serial": drive.serial,
"model": drive.model,
"size_bytes": drive.size_bytes,
},
"burnin": burnin,
"smart": {
"short": _smart_card("short"),
"long": _smart_card("long"),
"attrs": smart_attrs,
},
"events": events,
}
@router.get("/api/v1/drives/{drive_id}", response_model=DriveResponse)
async def get_drive(drive_id: int, db: aiosqlite.Connection = Depends(get_db)):
cur = await db.execute(
@ -266,9 +365,13 @@ async def smart_start(
body: dict,
db: aiosqlite.Connection = Depends(get_db),
):
"""Start a standalone SHORT or LONG SMART test on a single drive."""
from app.truenas import TrueNASClient
from app import burnin as _burnin
"""Start a standalone SHORT or LONG SMART test on a single drive.
Uses SSH (smartctl) when configured required for TrueNAS SCALE 25.10+
where the REST smart/test endpoint no longer exists.
Falls back to TrueNAS REST API for older versions.
"""
from app import burnin as _burnin, ssh_client
test_type = (body.get("type") or "").upper()
if test_type not in ("SHORT", "LONG"):
@ -280,16 +383,41 @@ async def smart_start(
raise HTTPException(status_code=404, detail="Drive not found")
devname = row[0]
# Use the shared TrueNAS client held by the burnin module
now = datetime.now(timezone.utc).isoformat()
ttype_lower = test_type.lower()
if ssh_client.is_configured():
# SSH path — works on TrueNAS SCALE 25.10+ and CORE
try:
output = await ssh_client.start_smart_test(devname, test_type)
except Exception as exc:
raise HTTPException(status_code=502, detail=f"SSH error: {exc}")
# Mark as running in DB (truenas_job_id=NULL signals SSH-managed test)
# Store smartctl start output as proof the test was initiated
await db.execute(
"""INSERT INTO smart_tests (drive_id, test_type, state, percent, started_at, raw_output)
VALUES (?,?,?,?,?,?)
ON CONFLICT(drive_id, test_type) DO UPDATE SET
state='running', percent=0, truenas_job_id=NULL,
started_at=excluded.started_at, finished_at=NULL, error_text=NULL,
raw_output=excluded.raw_output""",
(drive_id, ttype_lower, "running", 0, now, output),
)
await db.commit()
from app import poller as _poller
_poller._notify_subscribers()
return {"devname": devname, "type": test_type, "message": output[:200]}
else:
# REST path — older TrueNAS CORE / SCALE versions
client = _burnin._client
if client is None:
raise HTTPException(status_code=503, detail="TrueNAS client not ready")
try:
tn_job_id = await client.start_smart_test([devname], test_type)
except Exception as exc:
raise HTTPException(status_code=502, detail=f"TrueNAS error: {exc}")
return {"job_id": tn_job_id, "devname": devname, "type": test_type}
@ -316,7 +444,16 @@ async def smart_cancel(
if client is None:
raise HTTPException(status_code=503, detail="TrueNAS client not ready")
# Find the running TrueNAS job for this drive/test-type
from app import ssh_client
if ssh_client.is_configured():
# SSH path — abort via smartctl -X
try:
await ssh_client.abort_smart_test(devname)
except Exception as exc:
raise HTTPException(status_code=502, detail=f"SSH abort error: {exc}")
else:
# REST path — find TrueNAS job and abort it
try:
jobs = await client.get_smart_jobs()
tn_job_id = None
@ -620,6 +757,57 @@ async def update_drive(
return {"updated": True}
@router.post("/api/v1/drives/{drive_id}/reset")
async def reset_drive(
drive_id: int,
body: dict,
db: aiosqlite.Connection = Depends(get_db),
):
"""
Clear SMART test results for a drive so it shows as fresh.
Only allowed when no burn-in job is active (queued or running).
Preserves all job history; only the display state is reset.
"""
cur = await db.execute("SELECT id FROM drives WHERE id=?", (drive_id,))
if not await cur.fetchone():
raise HTTPException(status_code=404, detail="Drive not found")
# Reject if any active burn-in
cur = await db.execute(
"SELECT COUNT(*) FROM burnin_jobs WHERE drive_id=? AND state IN ('queued','running')",
(drive_id,),
)
if (await cur.fetchone())[0] > 0:
raise HTTPException(status_code=409, detail="Cannot reset while a burn-in is active")
operator = body.get("operator", "operator")
# Reset SMART test state to idle
await db.execute(
"""UPDATE smart_tests SET state='idle', percent=0, started_at=NULL,
eta_at=NULL, finished_at=NULL, error_text=NULL, raw_output=NULL
WHERE drive_id=?""",
(drive_id,),
)
# Clear SMART attrs cache + stamp reset time (hides prior burn-in from dashboard)
now = datetime.now(timezone.utc).isoformat()
await db.execute(
"UPDATE drives SET smart_attrs=NULL, last_reset_at=? WHERE id=?",
(now, drive_id),
)
# Audit event
await db.execute(
"""INSERT INTO audit_events (event_type, drive_id, operator, message)
VALUES (?,?,?,?)""",
("drive_reset", drive_id, operator, "Drive reset — SMART state cleared"),
)
await db.commit()
poller._notify_subscribers()
return {"reset": True}
# ---------------------------------------------------------------------------
# Audit log page
# ---------------------------------------------------------------------------
@ -714,6 +902,36 @@ async def stats_page(
""")
by_day = [dict(r) for r in await cur.fetchall()]
# Average test duration by drive size (rounded to nearest TB)
cur = await db.execute("""
SELECT
CAST(ROUND(CAST(d.size_bytes AS REAL) / 1e12) AS INTEGER) AS size_tb,
COUNT(*) AS total,
ROUND(AVG(
(julianday(bj.finished_at) - julianday(bj.started_at)) * 86400 / 3600.0
), 1) AS avg_hours
FROM burnin_jobs bj
JOIN drives d ON d.id = bj.drive_id
WHERE bj.state IN ('passed', 'failed')
AND bj.started_at IS NOT NULL
AND bj.finished_at IS NOT NULL
GROUP BY size_tb
ORDER BY size_tb
""")
by_size = [dict(r) for r in await cur.fetchall()]
# Failure breakdown by stage (which stage caused the failure)
cur = await db.execute("""
SELECT
COALESCE(bj.stage_name, 'unknown') AS failed_stage,
COUNT(*) AS count
FROM burnin_jobs bj
WHERE bj.state = 'failed'
GROUP BY failed_stage
ORDER BY count DESC
""")
by_failure_stage = [dict(r) for r in await cur.fetchall()]
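The `julianday` arithmetic in the size query converts elapsed time to hours: `julianday` returns fractional days, so the difference times 86400 gives seconds, and dividing by 3600.0 gives hours. A standalone check against SQLite, with a hypothetical table and timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (started_at TEXT, finished_at TEXT)")
conn.execute(
    "INSERT INTO jobs VALUES ('2026-02-24T00:00:00', '2026-02-24T06:30:00')"
)
row = conn.execute(
    "SELECT ROUND((julianday(finished_at) - julianday(started_at)) "
    "* 86400 / 3600.0, 1) FROM jobs"
).fetchone()
hours = row[0]  # 6.5 hours elapsed
```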
# Drives tracked
cur = await db.execute("SELECT COUNT(*) FROM drives")
drives_total = (await cur.fetchone())[0]
@ -724,6 +942,8 @@ async def stats_page(
"overall": overall,
"by_model": by_model,
"by_day": by_day,
"by_size": by_size,
"by_failure_stage": by_failure_stage,
"drives_total": drives_total,
"poller": ps,
**_stale_context(ps),
@ -739,18 +959,9 @@ async def settings_page(
request: Request,
db: aiosqlite.Connection = Depends(get_db),
):
# Editable values — real values for form fields (secrets excluded)
editable = {
# SMTP
"smtp_host": settings.smtp_host,
"smtp_port": settings.smtp_port,
"smtp_ssl_mode": settings.smtp_ssl_mode or "starttls",
@ -762,17 +973,37 @@ async def settings_page(
"smtp_daily_report_enabled": settings.smtp_daily_report_enabled,
"smtp_alert_on_fail": settings.smtp_alert_on_fail,
"smtp_alert_on_pass": settings.smtp_alert_on_pass,
# Webhook
"webhook_url": settings.webhook_url,
# Burn-in behaviour
"stuck_job_hours": settings.stuck_job_hours,
"max_parallel_burnins": settings.max_parallel_burnins,
"temp_warn_c": settings.temp_warn_c,
"temp_crit_c": settings.temp_crit_c,
"bad_block_threshold": settings.bad_block_threshold,
# SSH credentials (take effect immediately — each SSH call reads live settings)
"ssh_host": settings.ssh_host,
"ssh_port": settings.ssh_port,
"ssh_user": settings.ssh_user,
# Note: ssh_password and ssh_key intentionally omitted from display (sensitive)
# System settings (restart required to fully apply)
"truenas_base_url": settings.truenas_base_url,
"truenas_verify_tls": settings.truenas_verify_tls,
"poll_interval_seconds": settings.poll_interval_seconds,
"stale_threshold_seconds": settings.stale_threshold_seconds,
"allowed_ips": settings.allowed_ips,
"log_level": settings.log_level,
# Note: truenas_api_key intentionally omitted from display (sensitive)
}
from app import ssh_client as _ssh
ps = poller.get_state()
return templates.TemplateResponse("settings.html", {
"request": request,
"readonly": readonly,
"editable": editable,
"smtp_enabled": bool(settings.smtp_host),
"ssh_configured": _ssh.is_configured(),
"app_version": settings.app_version,
"poller": ps,
**_stale_context(ps),
})
@ -780,10 +1011,11 @@ async def settings_page(
@router.post("/api/v1/settings")
async def save_settings(body: dict):
"""Save editable runtime settings. Password is only updated if non-empty."""
# Don't overwrite password if client sent empty string
if "smtp_password" in body and body["smtp_password"] == "":
del body["smtp_password"]
"""Save editable runtime settings. Secrets are only updated if non-empty."""
# Don't overwrite secrets if client sent empty string
for secret_field in ("smtp_password", "truenas_api_key", "ssh_password", "ssh_key"):
if secret_field in body and body[secret_field] == "":
del body[secret_field]
try:
saved = settings_store.save(body)
@ -802,6 +1034,55 @@ async def test_smtp():
return {"ok": True}
@router.post("/api/v1/settings/test-ssh")
async def test_ssh():
"""Test the current SSH configuration."""
from app import ssh_client
result = await ssh_client.test_connection()
if not result["ok"]:
raise HTTPException(status_code=502, detail=result.get("error", "Connection failed"))
return {"ok": True}
@router.websocket("/ws/terminal")
async def terminal_ws(websocket: WebSocket):
"""WebSocket endpoint bridging the browser xterm.js terminal to an SSH PTY."""
from app import terminal as _term
await _term.handle(websocket)
@router.get("/api/v1/updates/check")
async def check_updates():
"""Check for a newer release on Forgejo."""
import httpx
current = settings.app_version
try:
async with httpx.AsyncClient(timeout=8.0) as client:
r = await client.get(
"https://git.hellocomputer.xyz/api/v1/repos/brandon/truenas-burnin/releases/latest",
headers={"Accept": "application/json"},
)
if r.status_code == 200:
data = r.json()
latest = data.get("tag_name", "").lstrip("v")
up_to_date = not latest or latest == current
return {
"current": current,
"latest": latest or None,
"update_available": not up_to_date,
"message": None,
}
elif r.status_code == 404:
return {"current": current, "latest": None, "update_available": False,
"message": "No releases published yet"}
else:
return {"current": current, "latest": None, "update_available": False,
"message": f"Forgejo API returned {r.status_code}"}
except Exception as exc:
return {"current": current, "latest": None, "update_available": False,
"message": f"Could not reach update server: {exc}"}
# ---------------------------------------------------------------------------
# Print view (must be BEFORE /{job_id} int route)
# ---------------------------------------------------------------------------

View file

@ -4,8 +4,8 @@ Runtime settings store — persists editable settings to /data/settings_override
Changes take effect immediately (in-memory setattr on the global Settings object)
and survive restarts (JSON file is loaded in main.py lifespan).
System settings (TrueNAS URL, poll interval, etc.) are saved to JSON but require
a container restart to fully take effect (clients/middleware are initialized at boot).
"""
import json
@ -18,6 +18,7 @@ log = logging.getLogger(__name__)
# Field name → coerce function. Only fields listed here are accepted by save().
_EDITABLE: dict[str, type] = {
# Email / SMTP
"smtp_host": str,
"smtp_ssl_mode": str,
"smtp_timeout": int,
@ -29,12 +30,32 @@ _EDITABLE: dict[str, type] = {
"smtp_report_hour": int,
"smtp_alert_on_fail": bool,
"smtp_alert_on_pass": bool,
# Webhook
"webhook_url": str,
# Burn-in behaviour
"stuck_job_hours": int,
"max_parallel_burnins": int,
"temp_warn_c": int,
"temp_crit_c": int,
"bad_block_threshold": int,
# SSH credentials — take effect immediately (each connection reads live settings)
"ssh_host": str,
"ssh_port": int,
"ssh_user": str,
"ssh_password": str,
"ssh_key": str,
# System settings — saved to JSON; require container restart to fully apply
"truenas_base_url": str,
"truenas_api_key": str,
"truenas_verify_tls": bool,
"poll_interval_seconds": int,
"stale_threshold_seconds": int,
"allowed_ips": str,
"log_level": str,
}
_VALID_SSL_MODES = {"starttls", "ssl", "plain"}
_VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
def _overrides_path() -> Path:
@ -63,6 +84,21 @@ def _apply(data: dict) -> None:
if key == "smtp_report_hour" and not (0 <= int(val) <= 23):
log.warning("settings_store: smtp_report_hour out of range — ignoring")
continue
if key == "log_level" and val not in _VALID_LOG_LEVELS:
log.warning("settings_store: invalid log_level %r — ignoring", val)
continue
if key in ("poll_interval_seconds", "stale_threshold_seconds") and int(val) < 1:
log.warning("settings_store: %s must be >= 1 — ignoring", key)
continue
if key in ("temp_warn_c", "temp_crit_c") and not (20 <= int(val) <= 80):
log.warning("settings_store: %s out of range (2080) — ignoring", key)
continue
if key == "bad_block_threshold" and int(val) < 0:
log.warning("settings_store: bad_block_threshold must be >= 0 — ignoring")
continue
if key == "ssh_port" and not (1 <= int(val) <= 65535):
log.warning("settings_store: ssh_port out of range — ignoring")
continue
setattr(settings, key, val)
except (ValueError, TypeError) as exc:
log.warning("settings_store: invalid value for %s: %s", key, exc)

View file

@ -0,0 +1,386 @@
"""
SSH client for direct TrueNAS command execution (Stage 7).
When ssh_host is configured, burn-in stages use SSH to run smartctl and
badblocks directly on the TrueNAS host instead of going through the REST API.
Falls back to REST API / simulation when SSH is not configured (dev/mock mode).
TrueNAS CORE (FreeBSD) device paths: /dev/ada0, /dev/da0, etc.
TrueNAS SCALE (Linux) device paths: /dev/sda, /dev/sdb, etc.
The devname from the TrueNAS API is used as-is in /dev/{devname}.
"""
import asyncio
import logging
import re
from typing import Callable
log = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Monitored SMART attributes
# True → any non-zero raw value is a hard failure (drive rejected)
# False → non-zero is a warning (flagged but test continues)
# ---------------------------------------------------------------------------
SMART_ATTRS: dict[int, tuple[str, bool]] = {
5: ("Reallocated_Sector_Ct", True), # reallocation = FAIL
10: ("Spin_Retry_Count", False), # mechanical stress = WARN
188: ("Command_Timeout", False), # drive not responding = WARN
197: ("Current_Pending_Sector", True), # pending reallocation = FAIL
198: ("Offline_Uncorrectable", True), # unrecoverable read error = FAIL
199: ("UDMA_CRC_Error_Count", False), # cable/controller issue = WARN
}
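The table above is the whole pass/warn/fail policy: a non-zero raw value on a critical attribute rejects the drive, while a non-zero on a non-critical one only flags it. A minimal, self-contained sketch of that classification (the `classify` helper is illustrative, not part of this module; the table excerpt mirrors the entries above):

```python
# Illustrative only: classify one SMART attribute reading against the
# policy table above (non-zero raw on a critical attribute = failure,
# non-zero on a non-critical attribute = warning, zero = ok).
SMART_ATTRS = {
    5: ("Reallocated_Sector_Ct", True),    # critical
    197: ("Current_Pending_Sector", True), # critical
    199: ("UDMA_CRC_Error_Count", False),  # warning only
}

def classify(attr_id: int, raw: int) -> str:
    if attr_id not in SMART_ATTRS or raw == 0:
        return "ok"
    _, is_critical = SMART_ATTRS[attr_id]
    return "fail" if is_critical else "warn"
```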
# ---------------------------------------------------------------------------
# Configuration check
# ---------------------------------------------------------------------------
def is_configured() -> bool:
"""Returns True when SSH host + at least one auth method is available."""
import os
from app.config import settings
if not settings.ssh_host:
return False
has_creds = bool(
settings.ssh_key
or settings.ssh_password
or os.path.exists(os.environ.get("SSH_KEY_FILE", _MOUNTED_KEY_PATH))
)
return has_creds
# ---------------------------------------------------------------------------
# Low-level connection
# ---------------------------------------------------------------------------
_MOUNTED_KEY_PATH = "/run/secrets/ssh_key"
async def _connect():
"""Open a single-use SSH connection. Caller must use `async with`."""
import asyncssh
from app.config import settings
kwargs: dict = {
"host": settings.ssh_host,
"port": settings.ssh_port,
"username": settings.ssh_user,
"known_hosts": None, # trust all hosts (same spirit as TRUENAS_VERIFY_TLS=false)
}
if settings.ssh_key:
# Key material provided via env var (base case)
kwargs["client_keys"] = [asyncssh.import_private_key(settings.ssh_key)]
elif settings.ssh_password:
kwargs["password"] = settings.ssh_password
else:
# Fall back to mounted key file (preferred for production — no key in env vars)
import os
key_path = os.environ.get("SSH_KEY_FILE", _MOUNTED_KEY_PATH)
if os.path.exists(key_path):
kwargs["client_keys"] = [key_path]
# If nothing is configured, asyncssh will attempt agent/default key lookup
return asyncssh.connect(**kwargs)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
async def test_connection() -> dict:
"""Test SSH connectivity. Returns {"ok": True} or {"ok": False, "error": str}."""
if not is_configured():
return {"ok": False, "error": "SSH not configured (ssh_host is empty)"}
try:
async with await _connect() as conn:
result = await conn.run("echo ok", check=False)
if "ok" in result.stdout:
return {"ok": True}
return {"ok": False, "error": result.stderr.strip() or "unexpected output"}
except Exception as exc:
return {"ok": False, "error": str(exc)}
async def get_smart_attributes(devname: str) -> dict:
"""
Run `smartctl -a /dev/{devname}` and parse the output.
Returns:
health: str "PASSED" | "FAILED" | "UNKNOWN"
raw_output: str full smartctl output
attributes: dict[int, {"name": str, "raw": int}]
warnings: list[str] attribute names with non-zero raw (non-critical)
failures: list[str] attribute names with non-zero raw (critical)
"""
cmd = f"smartctl -a /dev/{devname}"
try:
async with await _connect() as conn:
result = await conn.run(cmd, check=False)
output = result.stdout + result.stderr
return _parse_smartctl(output)
except Exception as exc:
return {
"health": "UNKNOWN",
"raw_output": str(exc),
"attributes": {},
"warnings": [],
"failures": [f"SSH error: {exc}"],
}
async def start_smart_test(devname: str, test_type: str) -> str:
"""
Run `smartctl -t short|long /dev/{devname}`.
Returns raw output. Raises RuntimeError on unrecoverable failure.
test_type: "SHORT" or "LONG"
"""
arg = "short" if test_type.upper() == "SHORT" else "long"
cmd = f"smartctl -t {arg} /dev/{devname}"
async with await _connect() as conn:
result = await conn.run(cmd, check=False)
output = result.stdout + result.stderr
# smartctl exits 0 or 4 when the test is successfully started on most drives
started = ("Testing has begun" in output or
"test has begun" in output.lower() or
result.returncode in (0, 4))
if not started:
raise RuntimeError(f"smartctl returned exit {result.returncode}: {output[:400]}")
return output
async def poll_smart_progress(devname: str) -> dict:
"""
Run `smartctl -a /dev/{devname}` and extract self-test status.
Returns:
state: "running" | "passed" | "failed" | "unknown"
percent_remaining: int | None (None until a "% of test remaining" line is parsed)
output: str
"""
cmd = f"smartctl -a /dev/{devname}"
async with await _connect() as conn:
result = await conn.run(cmd, check=False)
output = result.stdout + result.stderr
return _parse_smart_progress(output)
async def abort_smart_test(devname: str) -> None:
"""Send `smartctl -X /dev/{devname}` to abort an in-progress test."""
cmd = f"smartctl -X /dev/{devname}"
async with await _connect() as conn:
await conn.run(cmd, check=False)
async def run_badblocks(
devname: str,
on_progress: Callable[[int, int, str], None],
cancelled_fn: Callable[[], bool] | None = None,
) -> dict:
"""
Run `badblocks -wsv -b 4096 -p 1 /dev/{devname}` and stream output.
on_progress(percent, bad_blocks, line) is called for each line of output.
cancelled_fn() is polled to support mid-test cancellation.
Returns: {"bad_blocks": int, "output": str, "aborted": bool}
"""
from app.config import settings
cmd = f"badblocks -wsv -b 4096 -p 1 /dev/{devname}"
lines: list[str] = []
bad_blocks = 0
aborted = False
last_pct = 0
try:
async with await _connect() as conn:
async with conn.create_process(cmd) as proc:
# badblocks writes progress to stderr, bad block numbers to stdout
async def _read_stream(stream, is_stderr: bool):
nonlocal bad_blocks, last_pct, aborted
async for raw_line in stream:
line = raw_line if isinstance(raw_line, str) else raw_line.decode("utf-8", errors="replace")
lines.append(line)
if is_stderr:
m = re.search(r"([\d.]+)%\s+done", line)
if m:
last_pct = min(99, int(float(m.group(1))))
else:
# Each non-empty stdout line during badblocks is a bad block number
stripped = line.strip()
if stripped and stripped.isdigit():
bad_blocks += 1
on_progress(last_pct, bad_blocks, line)
# Abort if threshold exceeded
if bad_blocks > settings.bad_block_threshold:
aborted = True
proc.kill()
lines.append(
f"\n[ABORTED] Bad block count ({bad_blocks}) exceeded "
f"threshold ({settings.bad_block_threshold})\n"
)
return
# Abort on cancellation
if cancelled_fn and cancelled_fn():
aborted = True
proc.kill()
return
stdout_task = asyncio.create_task(_read_stream(proc.stdout, False))
stderr_task = asyncio.create_task(_read_stream(proc.stderr, True))
await asyncio.gather(stdout_task, stderr_task, return_exceptions=True)
await proc.wait()
except Exception as exc:
lines.append(f"\n[SSH error] {exc}\n")
if not aborted:
last_pct = 100
return {
"bad_blocks": bad_blocks,
"output": "".join(lines),
"aborted": aborted,
}
async def get_system_sensors() -> dict:
"""
Run `sensors -j` on TrueNAS and extract system-level temperatures.
Returns {"cpu_c": int|None, "pch_c": int|None}.
cpu_c = CPU package temp (coretemp chip)
pch_c = PCH/chipset temp (pch_* chip) proxy for storage I/O lane thermals
Falls back gracefully if SSH is not configured or lm-sensors is unavailable.
"""
if not is_configured():
return {}
try:
async with await _connect() as conn:
result = await conn.run("sensors -j 2>/dev/null", check=False)
output = result.stdout.strip()
if not output:
return {}
return _parse_sensors_json(output)
except Exception as exc:
log.debug("get_system_sensors failed: %s", exc)
return {}
def _parse_sensors_json(output: str) -> dict:
import json as _json
try:
data = _json.loads(output)
except Exception:
return {}
cpu_c: int | None = None
pch_c: int | None = None
for chip_name, chip_data in data.items():
if not isinstance(chip_data, dict):
continue
# CPU package temp — coretemp chip, "Package id N" sensor
if chip_name.startswith("coretemp") and cpu_c is None:
for sensor_name, sensor_vals in chip_data.items():
if not isinstance(sensor_vals, dict):
continue
if "package" in sensor_name.lower():
for k, v in sensor_vals.items():
if k.endswith("_input") and isinstance(v, (int, float)):
cpu_c = int(round(v))
break
if cpu_c is not None:
break
# PCH / chipset temp — manages PCIe lanes including HBA / storage I/O
elif chip_name.startswith("pch_") and pch_c is None:
for sensor_name, sensor_vals in chip_data.items():
if not isinstance(sensor_vals, dict):
continue
for k, v in sensor_vals.items():
if k.endswith("_input") and isinstance(v, (int, float)):
pch_c = int(round(v))
break
if pch_c is not None:
break
return {"cpu_c": cpu_c, "pch_c": pch_c}
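The walk above takes the first `*_input` reading from a `coretemp` chip's "Package id" sensor and the first `*_input` from any `pch_*` chip. A condensed, self-contained version of the same walk against a minimal `sensors -j` payload (chip and sensor names below are typical lm-sensors shapes, chosen for illustration):

```python
import json

# Condensed version of _parse_sensors_json: first coretemp "Package id"
# *_input wins for cpu_c, first pch_* *_input wins for pch_c.
def parse_sensors(output: str) -> dict:
    data = json.loads(output)
    cpu_c = pch_c = None
    for chip, vals in data.items():
        if not isinstance(vals, dict):
            continue
        for name, sensor in vals.items():
            if not isinstance(sensor, dict):
                continue
            for k, v in sensor.items():
                if not k.endswith("_input") or not isinstance(v, (int, float)):
                    continue
                if chip.startswith("coretemp") and "package" in name.lower() and cpu_c is None:
                    cpu_c = int(round(v))
                elif chip.startswith("pch_") and pch_c is None:
                    pch_c = int(round(v))
    return {"cpu_c": cpu_c, "pch_c": pch_c}

sample = '''{
  "coretemp-isa-0000": {
    "Package id 0": {"temp1_input": 47.0},
    "Core 0": {"temp2_input": 44.0}
  },
  "pch_cannonlake-virtual-0": {
    "temp1": {"temp1_input": 52.4}
  }
}'''
```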
# ---------------------------------------------------------------------------
# Parsers
# ---------------------------------------------------------------------------
def _parse_smartctl(output: str) -> dict:
health = "UNKNOWN"
attributes: dict[int, dict] = {}
warnings: list[str] = []
failures: list[str] = []
m = re.search(r"self-assessment test result:\s+(\w+)", output, re.IGNORECASE)
if m:
health = m.group(1).upper()
# Attribute table: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
for line in output.splitlines():
am = re.match(
r"\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)",
line,
)
if not am:
continue
attr_id = int(am.group(1))
attr_name = am.group(2)
raw_val = int(am.group(3))
attributes[attr_id] = {"name": attr_name, "raw": raw_val}
if attr_id in SMART_ATTRS:
_, is_critical = SMART_ATTRS[attr_id]
if raw_val > 0:
msg = f"{attr_name} = {raw_val}"
if is_critical:
failures.append(msg)
else:
warnings.append(msg)
return {
"health": health,
"raw_output": output,
"attributes": attributes,
"warnings": warnings,
"failures": failures,
}
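The attribute regex above expects the standard ten-column `smartctl -A` table (ID, name, flag, value, worst, thresh, type, updated, when-failed, raw). A spot check against one representative row (the row text is a typical example, not captured from a real drive):

```python
import re

# Same pattern as the attribute-table loop above.
ATTR_RE = re.compile(
    r"\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)
line = "  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0"
m = ATTR_RE.match(line)
```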
def _parse_smart_progress(output: str) -> dict:
state = "unknown"
percent_remaining = None # None = "in progress but no % line parsed yet"
lower = output.lower()
if "self-test routine in progress" in lower:
state = "running"
m = re.search(r"(\d+)%\s+of\s+test\s+remaining", output, re.IGNORECASE)
if m:
percent_remaining = int(m.group(1))
elif "completed without error" in lower:
state = "passed"
elif (
"completed: read failure" in lower
or "completed: write failure" in lower
or "aborted by host" in lower
or ("completed" in lower and "failure" in lower)
):
state = "failed"
elif "in progress" in lower:
state = "running"
return {
"state": state,
"percent_remaining": percent_remaining,
"output": output,
}


@ -755,6 +755,11 @@ tr:hover td {
flex-direction: column;
gap: 8px;
pointer-events: none;
transition: bottom 0.25s ease;
}
body.drawer-open #toast-container {
bottom: calc(45vh + 16px);
}
.toast {
@ -1071,6 +1076,56 @@ a.stat-card:hover {
.stat-passed .stat-value { color: var(--green); }
.stat-idle .stat-value { color: var(--text-muted); }
/* Vertical separator between drive-count cards and sensor chips */
.stats-bar-sep {
width: 1px;
height: 36px;
background: var(--border);
align-self: center;
flex-shrink: 0;
}
/* Compact sensor chip — CPU / PCH / Thermal */
.stat-sensor {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 8px;
padding: 6px 12px;
text-align: center;
min-width: 52px;
display: flex;
flex-direction: column;
gap: 2px;
}
.stat-sensor-val {
font-size: 16px;
font-weight: 700;
font-variant-numeric: tabular-nums;
line-height: 1.1;
}
.stat-sensor-label {
font-size: 9px;
text-transform: uppercase;
letter-spacing: 0.08em;
color: var(--text-muted);
line-height: 1.2;
}
/* Thermal pressure states */
.stat-sensor-thermal-warn {
border-color: var(--yellow-bd);
background: var(--yellow-bg);
}
.stat-sensor-thermal-warn .stat-sensor-val { color: var(--yellow); }
.stat-sensor-thermal-crit {
border-color: var(--red-bd);
background: var(--red-bg);
}
.stat-sensor-thermal-crit .stat-sensor-val { color: var(--red); }
/* -----------------------------------------------------------------------
Batch action bar (inside filter-bar)
----------------------------------------------------------------------- */
@ -1937,3 +1992,508 @@ a.header-brand:hover .header-title {
outline: 2px solid var(--blue);
outline-offset: 2px;
}
/* -----------------------------------------------------------------------
Log Drawer
----------------------------------------------------------------------- */
.log-drawer {
position: fixed;
bottom: 0;
left: 0;
right: 0;
height: 45vh;
min-height: 260px;
background: var(--bg-card);
border-top: 2px solid var(--border);
z-index: 150;
display: flex;
flex-direction: column;
box-shadow: 0 -6px 32px rgba(0,0,0,0.5);
animation: drawer-slide-up 0.18s ease;
}
.log-drawer[hidden] { display: none; }
@keyframes drawer-slide-up {
from { transform: translateY(100%); opacity: 0; }
to { transform: translateY(0); opacity: 1; }
}
/* Shrink table when drawer is open */
body.drawer-open .table-wrap {
max-height: calc(100vh - 205px - 45vh);
}
/* Drawer header */
.drawer-header {
display: flex;
align-items: center;
gap: 14px;
padding: 7px 16px;
border-bottom: 1px solid var(--border);
flex-shrink: 0;
background: var(--bg);
}
.drawer-drive-info {
display: flex;
flex-direction: column;
gap: 1px;
min-width: 80px;
}
.drawer-devname {
font-size: 13px;
font-weight: 600;
color: var(--text-strong);
font-family: "SF Mono", "Cascadia Code", monospace;
}
.drawer-drive-meta {
font-size: 11px;
color: var(--text-muted);
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
max-width: 240px;
}
/* Tabs */
.drawer-tabs {
display: flex;
gap: 2px;
}
.drawer-tab {
background: none;
border: 1px solid transparent;
border-radius: 5px;
color: var(--text-muted);
cursor: pointer;
font-size: 12px;
font-family: inherit;
font-weight: 500;
padding: 4px 12px;
transition: color 0.12s, background 0.12s;
}
.drawer-tab:hover {
color: var(--text);
background: var(--bg-card);
}
.drawer-tab.active {
color: var(--text-strong);
background: var(--bg-card);
border-color: var(--border);
}
/* Controls */
.drawer-controls {
display: flex;
align-items: center;
gap: 12px;
margin-left: auto;
flex-shrink: 0;
}
.autoscroll-label {
display: flex;
align-items: center;
gap: 5px;
font-size: 11px;
color: var(--text-muted);
cursor: pointer;
user-select: none;
}
.autoscroll-label input { accent-color: var(--blue); cursor: pointer; }
.drawer-close {
background: none;
border: 1px solid var(--border);
border-radius: 4px;
color: var(--text-muted);
cursor: pointer;
font-size: 12px;
width: 24px;
height: 24px;
display: flex;
align-items: center;
justify-content: center;
padding: 0;
transition: color 0.12s, border-color 0.12s;
}
.drawer-close:hover { color: var(--text); border-color: var(--text-muted); }
/* Body + panels */
.drawer-body {
flex: 1;
overflow: hidden;
position: relative;
}
.drawer-panel {
display: none;
height: 100%;
overflow-y: auto;
padding: 12px 16px 20px;
}
.drawer-panel.active { display: block; }
.drawer-loading,
.drawer-empty {
color: var(--text-muted);
font-size: 13px;
padding: 28px 0;
text-align: center;
}
/* Clickable rows */
#drives-tbody tr[id^="drive-"] { cursor: pointer; }
/* Active row highlight */
tr.drawer-row-active {
background: rgba(88, 166, 255, 0.07) !important;
outline: 1px solid var(--blue-bd);
outline-offset: -1px;
}
/* ---- Burn-In tab ---- */
.drawer-job-header {
display: flex;
align-items: center;
gap: 10px;
margin-bottom: 12px;
}
.drawer-job-meta {
font-size: 12px;
color: var(--text-muted);
}
.drawer-stages {
display: flex;
flex-direction: column;
gap: 6px;
}
.drawer-stage {
border: 1px solid var(--border);
border-radius: 6px;
overflow: hidden;
}
.stage-row-header {
display: flex;
align-items: center;
gap: 8px;
padding: 8px 12px;
font-size: 13px;
}
.stage-running .stage-row-header { background: var(--blue-bg); }
.stage-passed .stage-row-header { background: var(--green-bg); }
.stage-failed .stage-row-header { background: var(--red-bg); }
.stage-icon {
font-size: 12px;
width: 16px;
text-align: center;
flex-shrink: 0;
}
.stage-running .stage-icon { color: var(--blue); }
.stage-passed .stage-icon { color: var(--green); }
.stage-failed .stage-icon { color: var(--red); }
.stage-cancelled .stage-icon,
.stage-pending .stage-icon { color: var(--gray); }
.stage-name-label {
font-size: 13px;
font-weight: 500;
color: var(--text);
flex: 1;
}
.stage-pct {
font-size: 12px;
color: var(--blue);
font-weight: 600;
font-variant-numeric: tabular-nums;
}
.stage-duration {
font-size: 11px;
color: var(--text-muted);
font-variant-numeric: tabular-nums;
}
.stage-cursor {
color: var(--blue);
font-size: 14px;
animation: blink 1s step-end infinite;
}
@keyframes blink {
0%, 100% { opacity: 1; }
50% { opacity: 0; }
}
.stage-error-line {
padding: 7px 12px;
font-size: 12px;
color: var(--red);
font-family: "SF Mono", "Cascadia Code", monospace;
background: var(--red-bg);
border-top: 1px solid var(--red-bd);
white-space: pre-wrap;
word-break: break-word;
}
/* ---- SMART tab ---- */
.drawer-smart-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 12px;
}
.smart-card {
background: var(--bg);
border: 1px solid var(--border);
border-radius: 8px;
padding: 12px 14px;
display: flex;
flex-direction: column;
gap: 8px;
}
.smart-card-label {
font-size: 11px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.06em;
color: var(--text-muted);
}
.smart-progress {
display: flex;
align-items: center;
gap: 8px;
}
.smart-progress .progress-bar { flex: 1; }
.smart-detail {
font-size: 12px;
color: var(--text-muted);
}
/* ---- Events tab ---- */
.drawer-events {
display: flex;
flex-direction: column;
}
.drawer-event {
display: flex;
align-items: baseline;
gap: 10px;
padding: 7px 0;
border-bottom: 1px solid var(--border);
font-size: 12px;
}
.drawer-event:last-child { border-bottom: none; }
.event-time {
color: var(--text-muted);
font-size: 11px;
white-space: nowrap;
flex-shrink: 0;
font-variant-numeric: tabular-nums;
}
.event-type {
color: var(--blue);
font-weight: 500;
white-space: nowrap;
flex-shrink: 0;
}
.event-message {
color: var(--text);
flex: 1;
}
.event-operator {
color: var(--text-muted);
font-size: 11px;
white-space: nowrap;
flex-shrink: 0;
}
.drawer-event.event-error .event-type { color: var(--red); }
.drawer-event.event-error .event-message { color: var(--red); }
@media (max-width: 600px) {
.drawer-smart-grid { grid-template-columns: 1fr; }
.drawer-drive-meta { display: none; }
}
/* -----------------------------------------------------------------------
Stage raw log output (SSH mode)
----------------------------------------------------------------------- */
.stage-log {
font-family: "SF Mono", "Consolas", "Monaco", monospace;
font-size: 11px;
line-height: 1.5;
color: var(--text-muted);
background: var(--bg);
border-left: 2px solid var(--border);
margin: 6px 0 2px 28px;
padding: 6px 10px;
white-space: pre-wrap;
word-break: break-all;
max-height: 200px;
overflow-y: auto;
}
.stage-log .log-bad-block {
color: var(--red);
font-weight: 600;
}
.stage-log .log-warn {
color: var(--yellow);
}
/* -----------------------------------------------------------------------
SMART attributes table in drawer
----------------------------------------------------------------------- */
.smart-attrs {
margin-top: 12px;
border-top: 1px solid var(--border);
padding-top: 10px;
}
.smart-attrs-title {
font-size: 11px;
font-weight: 600;
color: var(--text-muted);
text-transform: uppercase;
letter-spacing: .05em;
margin-bottom: 6px;
}
.smart-attr-row {
display: flex;
justify-content: space-between;
align-items: center;
padding: 3px 0;
font-size: 12px;
border-bottom: 1px solid color-mix(in srgb, var(--border) 50%, transparent);
}
.smart-attr-row:last-child { border-bottom: none; }
.smart-attr-name { color: var(--text-muted); }
.smart-attr-val { font-family: "SF Mono", monospace; font-size: 12px; }
.smart-attr-val.attr-ok { color: var(--green); }
.smart-attr-val.attr-warn { color: var(--yellow); font-weight: 600; }
.smart-attr-val.attr-fail { color: var(--red); font-weight: 600; }
.smart-attr-raw-output {
font-family: "SF Mono", "Consolas", monospace;
font-size: 10.5px;
line-height: 1.45;
color: var(--text-muted);
background: var(--bg);
border: 1px solid var(--border);
border-radius: 4px;
padding: 8px 10px;
margin-top: 10px;
white-space: pre;
overflow: auto;
max-height: 240px;
}
/* -----------------------------------------------------------------------
Reset button
----------------------------------------------------------------------- */
.btn-reset {
background: transparent;
border: 1px solid color-mix(in srgb, var(--text-muted) 40%, transparent);
color: var(--text-muted);
border-radius: 5px;
padding: 3px 8px;
font-size: 12px;
cursor: pointer;
transition: border-color .15s, color .15s;
}
.btn-reset:hover {
border-color: var(--yellow);
color: var(--yellow);
}
/* -----------------------------------------------------------------------
Parallel burn-in inline warning
----------------------------------------------------------------------- */
.sf-inline-warn {
background: color-mix(in srgb, var(--yellow) 12%, transparent);
border: 1px solid color-mix(in srgb, var(--yellow) 40%, transparent);
border-radius: 5px;
color: var(--yellow);
font-size: 12px;
padding: 7px 10px;
margin: 4px 0 8px 0;
}
/* -----------------------------------------------------------------------
SSH textarea
----------------------------------------------------------------------- */
.sf-textarea {
resize: vertical;
min-height: 90px;
font-family: "SF Mono", "Consolas", monospace;
font-size: 11px;
}
/* -----------------------------------------------------------------------
Version badge in header
----------------------------------------------------------------------- */
.header-version {
font-size: 10px;
color: var(--text-muted);
opacity: .55;
font-weight: 400;
letter-spacing: 0;
align-self: flex-end;
padding-bottom: 1px;
font-variant-numeric: tabular-nums;
}
/* -----------------------------------------------------------------------
Live Terminal drawer panel (xterm.js)
----------------------------------------------------------------------- */
.drawer-panel-terminal {
padding: 0 !important;
overflow: hidden !important;
position: relative;
background: #0d1117;
}
/* Let xterm fill the full panel height */
.drawer-panel-terminal .xterm {
height: 100%;
}
.drawer-panel-terminal .xterm-viewport {
overflow-y: auto !important;
}
/* Reconnect bar — floats over the terminal when disconnected */
.term-reconnect-bar {
position: absolute;
bottom: 12px;
right: 12px;
z-index: 20;
display: flex;
align-items: center;
gap: 8px;
background: rgba(13,17,23,0.85);
border: 1px solid var(--border);
border-radius: 6px;
padding: 6px 10px;
font-size: 12px;
color: var(--text-muted);
}
.term-reconnect-bar .btn-secondary {
padding: 3px 10px;
font-size: 11px;
}


@ -69,6 +69,10 @@
restoreCheckboxes();
initElapsedTimers();
initLocationEdits();
if (_drawerDriveId) {
_drawerHighlightRow(_drawerDriveId);
drawerFetch(_drawerDriveId);
}
});
updateCounts();
@ -131,14 +135,59 @@
if (nb) nb.style.display = 'none';
}
// Handle SSE events
document.addEventListener('htmx:sseMessage', function (e) {
if (!e.detail) return;
if (e.detail.type === 'job-alert') {
try { handleJobAlert(JSON.parse(e.detail.data)); } catch (_) {}
} else if (e.detail.type === 'system-sensors') {
try { handleSystemSensors(JSON.parse(e.detail.data)); } catch (_) {}
}
});
function handleSystemSensors(data) {
var st = data.system_temps || {};
var tp = data.thermal_pressure || 'ok';
var warn = data.temp_warn_c || 46;
var crit = data.temp_crit_c || 55;
function tempClass(c) {
if (c == null) return '';
return c >= crit ? 'temp-hot' : c >= warn ? 'temp-warm' : 'temp-cool';
}
// CPU chip
var cpuChip = document.getElementById('sensor-cpu');
var cpuVal = document.getElementById('sensor-cpu-val');
if (cpuVal && st.cpu_c != null) {
if (cpuChip) cpuChip.hidden = false;
cpuVal.textContent = st.cpu_c + '°';
cpuVal.className = 'stat-sensor-val ' + tempClass(st.cpu_c);
}
// PCH chip
var pchChip = document.getElementById('sensor-pch');
var pchVal = document.getElementById('sensor-pch-val');
if (pchVal && st.pch_c != null) {
if (pchChip) pchChip.hidden = false;
pchVal.textContent = st.pch_c + '°';
pchVal.className = 'stat-sensor-val ' + tempClass(st.pch_c);
}
// Thermal pressure chip
var tChip = document.getElementById('sensor-thermal');
var tVal = document.getElementById('sensor-thermal-val');
if (tChip && tVal) {
if (tp === 'warn' || tp === 'crit') {
tChip.hidden = false;
tChip.className = 'stat-sensor stat-sensor-thermal stat-sensor-thermal-' + tp;
tVal.textContent = tp === 'warn' ? 'WARM' : 'HOT';
} else {
tChip.hidden = true;
}
}
}
function handleJobAlert(data) {
var isPass = data.state === 'passed';
var icon = isPass ? '✓' : '✕';
@ -842,7 +891,458 @@
if (modal && !modal.hidden) { closeModal(); return; }
var bModal = document.getElementById('batch-modal');
if (bModal && !bModal.hidden) { closeBatchModal(); return; }
if (_drawerDriveId) { closeDrawer(); return; }
}
});
// -----------------------------------------------------------------------
// Log Drawer
// -----------------------------------------------------------------------
var _drawerDriveId = null;
var _drawerTab = 'burnin';
function openDrawer(driveId) {
if (_drawerDriveId === driveId) { closeDrawer(); return; }
_drawerDriveId = driveId;
var drawer = document.getElementById('log-drawer');
drawer.removeAttribute('hidden');
document.body.classList.add('drawer-open');
_drawerHighlightRow(driveId);
drawerFetch(driveId);
}
function closeDrawer() {
_drawerDriveId = null;
var drawer = document.getElementById('log-drawer');
drawer.setAttribute('hidden', '');
document.body.classList.remove('drawer-open');
document.querySelectorAll('tr.drawer-row-active').forEach(function (r) {
r.classList.remove('drawer-row-active');
});
}
function _drawerHighlightRow(driveId) {
document.querySelectorAll('tr.drawer-row-active').forEach(function (r) {
r.classList.remove('drawer-row-active');
});
var row = document.getElementById('drive-' + driveId);
if (row) row.classList.add('drawer-row-active');
}
async function drawerFetch(driveId) {
['burnin', 'smart', 'events'].forEach(function (tab) {
var p = document.getElementById('drawer-panel-' + tab);
if (p && !p.innerHTML.trim()) {
p.innerHTML = '<div class="drawer-loading">Loading\u2026</div>';
}
});
try {
var resp = await fetch('/api/v1/drives/' + driveId + '/drawer');
if (!resp.ok) throw new Error('HTTP ' + resp.status);
var data = await resp.json();
_drawerRender(data);
} catch (e) {
['burnin', 'smart', 'events'].forEach(function (tab) {
var p = document.getElementById('drawer-panel-' + tab);
if (p) p.innerHTML = '<div class="drawer-loading" style="color:var(--red)">Failed to load.</div>';
});
}
}
function _drawerRender(data) {
var drive = data.drive || {};
var devnameEl = document.getElementById('drawer-devname');
var metaEl = document.getElementById('drawer-drive-meta');
if (devnameEl) devnameEl.textContent = drive.devname || '\u2014';
if (metaEl) {
var meta = drive.model || '';
if (drive.serial) meta += ' \u00b7 ' + drive.serial;
metaEl.textContent = meta;
}
_drawerRenderBurnin(data.burnin);
_drawerRenderSmart(data.smart);
_drawerRenderEvents(data.events);
}
function _drawerRenderBurnin(burnin) {
var panel = document.getElementById('drawer-panel-burnin');
if (!panel) return;
if (!burnin) {
panel.innerHTML = '<div class="drawer-empty">No burn-in history for this drive.</div>';
return;
}
var html = '<div class="drawer-job-header">';
html += '<span class="chip chip-' + _esc(burnin.state) + '">' + _esc(burnin.state.toUpperCase()) + '</span>';
html += '<span class="drawer-job-meta">';
if (burnin.operator) html += 'by ' + _esc(burnin.operator);
if (burnin.started_at) html += ' \u00b7 ' + _drawerFmtDt(burnin.started_at);
html += '</span></div>';
html += '<div class="drawer-stages">';
var stages = burnin.stages || [];
if (stages.length) {
stages.forEach(function (s) {
html += '<div class="drawer-stage stage-' + _esc(s.state) + '">';
html += '<div class="stage-row-header">';
html += '<span class="stage-icon">' + _drawerStageIcon(s.state) + '</span>';
html += '<span class="stage-name-label">' + _esc(_drawerStageName(s.stage_name)) + '</span>';
if (s.state === 'running') {
html += '<span class="stage-pct">' + (s.percent || 0) + '%</span>';
if (s.started_at) {
html += '<span class="elapsed-timer" data-started="' + _esc(s.started_at) + '"></span>';
}
html += '<span class="stage-cursor">\u258a</span>';
} else if (s.finished_at && s.started_at) {
html += '<span class="stage-duration">' + _drawerFmtDuration(s.started_at, s.finished_at) + '</span>';
}
html += '</div>';
if (s.error_text) {
html += '<div class="stage-error-line">' + _esc(s.error_text) + '</div>';
}
// Raw SSH log output (if available)
if (s.log_text) {
var logHtml = _esc(s.log_text)
.replace(/^(\d+)\s*$/gm, '<span class="log-bad-block">$1 ← BAD BLOCK</span>')
.replace(/\[WARNING\][^\n]*/g, '<span class="log-warn">$&</span>');
html += '<pre class="stage-log">' + logHtml + '</pre>';
}
// Bad block count badge
if (s.bad_blocks && s.bad_blocks > 0) {
html += '<div class="stage-error-line">' + s.bad_blocks + ' bad block(s) found</div>';
}
html += '</div>';
});
} else {
html += '<div class="drawer-empty">No stage data yet.</div>';
}
html += '</div>';
var wasAtBottom = panel.scrollHeight - panel.scrollTop <= panel.clientHeight + 5;
panel.innerHTML = html;
tickElapsedTimers();
var autoScroll = document.getElementById('autoscroll-toggle');
if (autoScroll && autoScroll.checked && wasAtBottom) {
panel.scrollTop = panel.scrollHeight;
}
}
// Monitored SMART attributes for inline colouring
var _SMART_CRITICAL = {5: true, 197: true, 198: true};
var _SMART_WARN = {10: true, 188: true, 199: true};
function _drawerRenderSmart(smart) {
var panel = document.getElementById('drawer-panel-smart');
if (!panel) return;
var html = '<div class="drawer-smart-grid">';
['short', 'long'].forEach(function (type) {
var t = smart ? smart[type] : null;
var label = type === 'short' ? 'Short SMART' : 'Long SMART';
html += '<div class="smart-card">';
html += '<div class="smart-card-label">' + label + '</div>';
if (!t || !t.state || t.state === 'idle') {
html += '<span class="chip chip-unknown">Not run</span>';
} else {
html += '<span class="chip chip-' + _esc(t.state) + '">' + _esc(t.state.toUpperCase()) + '</span>';
if (t.state === 'running') {
html += '<div class="smart-progress"><div class="progress-bar"><div class="progress-fill" style="width:' + (t.percent || 0) + '%"></div></div>'
+ '<span style="font-size:12px;color:var(--blue)">' + (t.percent || 0) + '%</span></div>';
}
if (t.started_at) html += '<div class="smart-detail">Started: ' + _drawerFmtDt(t.started_at) + '</div>';
if (t.finished_at) html += '<div class="smart-detail">Finished: ' + _drawerFmtDt(t.finished_at) + '</div>';
if (t.error_text) html += '<div class="stage-error-line">' + _esc(t.error_text) + '</div>';
// Raw smartctl output (SSH mode)
if (t.raw_output) {
html += '<pre class="smart-attr-raw-output">' + _esc(t.raw_output) + '</pre>';
}
}
html += '</div>';
});
html += '</div>';
// SMART attribute table (from SSH attribute parse)
var attrs = smart && smart.attrs;
if (attrs) {
html += '<div class="smart-attrs">';
html += '<div class="smart-attrs-title">SMART Attributes</div>';
if (attrs.failures && attrs.failures.length) {
html += '<div class="stage-error-line" style="margin-bottom:6px">✕ Failures: ' + _esc(attrs.failures.join('; ')) + '</div>';
}
if (attrs.warnings && attrs.warnings.length) {
html += '<div class="stage-error-line" style="color:var(--yellow);margin-bottom:6px">⚠ Warnings: ' + _esc(attrs.warnings.join('; ')) + '</div>';
}
var attrMap = attrs.attrs || {};
var monitoredIds = [5, 10, 188, 197, 198, 199];
monitoredIds.forEach(function (id) {
var entry = attrMap[String(id)];
if (!entry) return;
var raw = entry.raw;
var cls = raw > 0 ? (_SMART_CRITICAL[id] ? 'attr-fail' : 'attr-warn') : 'attr-ok';
html += '<div class="smart-attr-row">';
html += '<span class="smart-attr-name">' + id + ' ' + _esc(entry.name) + '</span>';
html += '<span class="smart-attr-val ' + cls + '">' + raw + '</span>';
html += '</div>';
});
html += '</div>';
}
panel.innerHTML = html;
}
function _drawerRenderEvents(events) {
var panel = document.getElementById('drawer-panel-events');
if (!panel) return;
if (!events || events.length === 0) {
panel.innerHTML = '<div class="drawer-empty">No events recorded for this drive.</div>';
return;
}
var html = '<div class="drawer-events">';
events.forEach(function (ev) {
var isErr = (ev.event_type || '').indexOf('fail') !== -1 || (ev.event_type || '').indexOf('stuck') !== -1;
html += '<div class="drawer-event' + (isErr ? ' event-error' : '') + '">';
html += '<span class="event-time">' + _drawerFmtDt(ev.created_at) + '</span>';
html += '<span class="event-type">' + _esc(ev.event_type || '') + '</span>';
if (ev.message) html += '<span class="event-message">' + _esc(ev.message) + '</span>';
if (ev.operator) html += '<span class="event-operator">by ' + _esc(ev.operator) + '</span>';
html += '</div>';
});
html += '</div>';
panel.innerHTML = html;
}
function _esc(s) {
return String(s == null ? '' : s)
.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}
function _drawerFmtDt(iso) {
if (!iso) return '';
try { return new Date(iso).toLocaleString(); } catch (e) { return iso; }
}
function _drawerFmtDuration(startIso, endIso) {
try {
var secs = Math.max(0, Math.floor((new Date(endIso) - new Date(startIso)) / 1000));
var h = Math.floor(secs / 3600), m = Math.floor((secs % 3600) / 60), s = secs % 60;
if (h > 0) return h + 'h ' + m + 'm';
if (m > 0) return m + 'm ' + s + 's';
return s + 's';
} catch (e) { return ''; }
}
function _drawerStageName(name) {
return (name || '').replace(/_/g, ' ').replace(/\b\w/g, function (c) { return c.toUpperCase(); });
}
function _drawerStageIcon(state) {
return { passed: '\u2713', failed: '\u2715', running: '\u25b6', cancelled: '\u25fc', pending: '\u25cb', skipped: '\u2014' }[state] || '\u25cb';
}
// Row click → open drawer (ignore interactive elements)
document.addEventListener('click', function (e) {
if (e.target.closest('button, input, label, a, .drive-location')) return;
var row = e.target.closest('#drives-tbody tr[id^="drive-"]');
if (!row) return;
openDrawer(row.id.replace('drive-', ''));
});
// Tab switching
document.addEventListener('click', function (e) {
var btn = e.target.closest('.drawer-tab');
if (!btn) return;
_drawerTab = btn.dataset.tab;
document.querySelectorAll('.drawer-tab').forEach(function (b) {
b.classList.toggle('active', b.dataset.tab === _drawerTab);
});
document.querySelectorAll('.drawer-panel').forEach(function (p) {
p.classList.toggle('active', p.id === 'drawer-panel-' + _drawerTab);
});
// Terminal tab: init/fit on activation; hide autoscroll (N/A for terminal)
var asl = document.querySelector('.autoscroll-label');
if (_drawerTab === 'terminal') {
if (asl) asl.style.visibility = 'hidden';
openTerminalTab();
} else {
if (asl) asl.style.visibility = '';
}
});
// Close button
document.addEventListener('click', function (e) {
if (e.target.closest('#drawer-close-btn')) closeDrawer();
});
// Reset button — clears SMART state for a drive
document.addEventListener('click', function (e) {
var btn = e.target.closest('.btn-reset');
if (!btn) return;
var driveId = btn.dataset.driveId;
if (!driveId) return;
var operator = (window._operator || 'operator');
fetch('/api/v1/drives/' + driveId + '/reset', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ operator: operator }),
}).then(function (r) {
if (!r.ok) return r.json().then(function (d) { showToast(d.detail || 'Reset failed', 'error'); });
showToast('Drive reset — state cleared', 'success');
}).catch(function () { showToast('Network error', 'error'); });
});
// -----------------------------------------------------------------------
// Live Terminal (xterm.js + SSH WebSocket)
// -----------------------------------------------------------------------
var _xtermReady = false; // xterm.js + FitAddon libraries loaded
var _terminal = null; // xterm.js Terminal instance
var _termFit = null; // FitAddon instance
var _termWs = null; // active WebSocket (null = disconnected)
function _loadXtermLibs(cb) {
var link = document.createElement('link');
link.rel = 'stylesheet';
link.href = 'https://cdn.jsdelivr.net/npm/xterm@5.3.0/css/xterm.css';
document.head.appendChild(link);
var s1 = document.createElement('script');
s1.src = 'https://cdn.jsdelivr.net/npm/xterm@5.3.0/lib/xterm.js';
s1.onload = function () {
var s2 = document.createElement('script');
s2.src = 'https://cdn.jsdelivr.net/npm/xterm-addon-fit@0.8.0/lib/xterm-addon-fit.js';
s2.onload = cb;
document.head.appendChild(s2);
};
document.head.appendChild(s1);
}
function openTerminalTab() {
var panel = document.getElementById('drawer-panel-terminal');
if (!panel) return;
if (!_xtermReady) {
panel.innerHTML = '<div class="drawer-loading">Loading terminal\u2026</div>';
_loadXtermLibs(function () {
_xtermReady = true;
_termInit(panel);
});
return;
}
if (!_terminal) {
_termInit(panel);
return;
}
// Already initialised — refit to current panel dimensions
setTimeout(function () {
if (_termFit) try { _termFit.fit(); } catch (_) {}
}, 30);
}
function _termInit(panel) {
panel.innerHTML = '';
var term = new Terminal({
cursorBlink: true,
fontSize: 13,
fontFamily: '"SF Mono","Fira Code",Consolas,"DejaVu Sans Mono",monospace',
theme: {
background: '#0d1117',
foreground: '#e6edf3',
cursor: '#58a6ff',
cursorAccent: '#0d1117',
selectionBackground: 'rgba(88,166,255,0.25)',
black: '#484f58', red: '#ff7b72', green: '#3fb950', yellow: '#d29922',
blue: '#58a6ff', magenta: '#bc8cff', cyan: '#39c5cf', white: '#b1bac4',
brightBlack: '#6e7681', brightRed: '#ffa198', brightGreen: '#56d364',
brightYellow: '#e3b341', brightBlue: '#79c0ff', brightMagenta: '#d2a8ff',
brightCyan: '#56d4dd', brightWhite: '#f0f6fc',
},
scrollback: 2000,
allowProposedApi: true,
});
var fit = new FitAddon.FitAddon();
term.loadAddon(fit);
term.open(panel);
_terminal = term;
_termFit = fit;
// Initial fit after the panel is visible
setTimeout(function () {
if (_termFit) try { _termFit.fit(); } catch (_) {}
}, 30);
// Forward all keystrokes → SSH (onData registered once here)
term.onData(function (data) {
if (_termWs && _termWs.readyState === 1) {
_termWs.send(new TextEncoder().encode(data));
}
});
// Refit + notify server on resize
new ResizeObserver(function () {
if (!_termFit) return;
try { _termFit.fit(); } catch (_) {}
if (_termWs && _termWs.readyState === 1 && _terminal) {
_termWs.send(JSON.stringify({ type: 'resize', cols: _terminal.cols, rows: _terminal.rows }));
}
}).observe(panel);
_termConnect();
}
function _termConnect() {
if (_termWs && _termWs.readyState <= 1) return; // already open or connecting
var proto = location.protocol === 'https:' ? 'wss:' : 'ws:';
var ws = new WebSocket(proto + '//' + location.host + '/ws/terminal');
ws.binaryType = 'arraybuffer';
_termWs = ws;
ws.onopen = function () {
_termHideReconnect();
if (_terminal && ws.readyState === 1) {
ws.send(JSON.stringify({ type: 'resize', cols: _terminal.cols, rows: _terminal.rows }));
}
};
ws.onmessage = function (e) {
if (!_terminal) return;
_terminal.write(e.data instanceof ArrayBuffer ? new Uint8Array(e.data) : e.data);
};
ws.onclose = function () {
if (_terminal) _terminal.write('\r\n\x1b[33m\u2500\u2500 disconnected \u2500\u2500\x1b[0m\r\n');
_termShowReconnect();
};
ws.onerror = function () { /* onclose fires too */ };
}
function _termShowReconnect() {
var panel = document.getElementById('drawer-panel-terminal');
if (!panel || panel.querySelector('.term-reconnect-bar')) return;
var bar = document.createElement('div');
bar.className = 'term-reconnect-bar';
bar.innerHTML = '<span>Connection closed</span>'
+ '<button class="btn-secondary">\u21ba Reconnect</button>';
bar.querySelector('button').onclick = function () {
bar.remove();
_termConnect();
};
panel.appendChild(bar);
}
function _termHideReconnect() {
var bar = document.querySelector('.term-reconnect-bar');
if (bar) bar.remove();
}
}());


@ -81,6 +81,10 @@
{%- set short_busy = drive.smart_short and drive.smart_short.state == 'running' %}
{%- set long_busy = drive.smart_long and drive.smart_long.state == 'running' %}
{%- set selectable = not bi_active and not short_busy and not long_busy %}
{%- set bi_done = drive.burnin and drive.burnin.state in ('passed', 'failed', 'cancelled', 'unknown') %}
{%- set smart_done = (drive.smart_short and drive.smart_short.state in ('passed','failed','aborted'))
or (drive.smart_long and drive.smart_long.state in ('passed','failed','aborted')) %}
{%- set can_reset = (bi_done or smart_done) and not bi_active and not short_busy and not long_busy %}
<tr data-status="{{ drive.status }}" id="drive-{{ drive.id }}">
<td class="col-check">
{%- if selectable %}
@ -160,6 +164,12 @@
data-health="{{ drive.smart_health }}"
{% if short_busy or long_busy %}disabled{% endif %}
title="Start Burn-In">Burn-In</button>
<!-- Reset — clears SMART state so drive can be re-tested from scratch -->
{%- if can_reset %}
<button class="btn-action btn-reset"
data-drive-id="{{ drive.id }}"
title="Reset SMART state — clears test results so drive shows as fresh">Reset</button>
{%- endif %}
{%- endif %}
</div>
</td>


@ -6,7 +6,7 @@
{% include "components/modal_start.html" %}
{% include "components/modal_batch.html" %}
<!-- Stats bar — counts are updated live by app.js updateCounts() -->
<!-- Stats bar — drive counts updated live by app.js updateCounts(); sensor chips updated by SSE system-sensors event -->
<div class="stats-bar">
<div class="stat-card" data-stat-filter="all">
<span class="stat-value" id="stat-all">{{ drives | length }}</span>
@ -28,6 +28,33 @@
<span class="stat-value" id="stat-idle">0</span>
<span class="stat-label">Idle</span>
</div>
{%- set st = poller.system_temps if (poller and poller.system_temps) else {} %}
{%- if st.get('cpu_c') is not none or st.get('pch_c') is not none %}
<div class="stats-bar-sep"></div>
{%- if st.get('cpu_c') is not none %}
<div class="stat-sensor" id="sensor-cpu">
<span class="stat-sensor-val {{ st.get('cpu_c') | temp_class }}" id="sensor-cpu-val">{{ st.get('cpu_c') }}°</span>
<span class="stat-sensor-label">CPU</span>
</div>
{%- endif %}
{%- if st.get('pch_c') is not none %}
<div class="stat-sensor" id="sensor-pch">
<span class="stat-sensor-val {{ st.get('pch_c') | temp_class }}" id="sensor-pch-val">{{ st.get('pch_c') }}°</span>
<span class="stat-sensor-label">PCH</span>
</div>
{%- endif %}
{%- endif %}
{%- set tp = poller.thermal_pressure if poller else 'ok' %}
<div class="stat-sensor stat-sensor-thermal stat-sensor-thermal-{{ tp }}"
id="sensor-thermal"
{% if not tp or tp == 'ok' %}hidden{% endif %}>
<span class="stat-sensor-val" id="sensor-thermal-val">
{%- if tp == 'warn' %}WARM{%- elif tp == 'crit' %}HOT{%- else %}OK{%- endif %}
</span>
<span class="stat-sensor-label">Thermal</span>
</div>
</div>
<!-- Failed drive banner — shown/hidden by JS when failed count > 0 -->
@ -71,4 +98,33 @@
</div>
</div>
</div>
<!-- Log Drawer (fixed, lives outside SSE swap area) -->
<div id="log-drawer" class="log-drawer" hidden>
<div class="drawer-header">
<div class="drawer-drive-info">
<span class="drawer-devname" id="drawer-devname"></span>
<span class="drawer-drive-meta" id="drawer-drive-meta"></span>
</div>
<nav class="drawer-tabs">
<button class="drawer-tab active" data-tab="burnin">Burn-In</button>
<button class="drawer-tab" data-tab="smart">SMART</button>
<button class="drawer-tab" data-tab="events">Events</button>
<button class="drawer-tab" data-tab="terminal">Terminal</button>
</nav>
<div class="drawer-controls">
<label class="autoscroll-label">
<input type="checkbox" id="autoscroll-toggle" checked>
<span>Auto-scroll</span>
</label>
<button class="drawer-close" id="drawer-close-btn" title="Close (Esc)"></button>
</div>
</div>
<div class="drawer-body">
<div class="drawer-panel active" id="drawer-panel-burnin"></div>
<div class="drawer-panel" id="drawer-panel-smart"></div>
<div class="drawer-panel" id="drawer-panel-events"></div>
<div class="drawer-panel drawer-panel-terminal" id="drawer-panel-terminal"></div>
</div>
</div>
{% endblock %}


@ -32,6 +32,7 @@
<th>State</th>
<th>Operator</th>
<th>Started</th>
<th>Completed</th>
<th>Duration</th>
<th>Error</th>
<th class="col-actions"></th>
@ -54,6 +55,7 @@
</td>
<td class="text-muted">{{ j.operator or '—' }}</td>
<td class="mono text-muted">{{ j.started_at | format_dt_full }}</td>
<td class="mono text-muted">{{ j.finished_at | format_dt_full }}</td>
<td class="mono text-muted">{{ j.duration_seconds | format_duration }}</td>
<td class="error-cell">
{% if j.error_text %}
@ -67,7 +69,7 @@
{% endfor %}
{% else %}
<tr>
<td colspan="9" class="empty-state">No burn-in jobs found.</td>
<td colspan="10" class="empty-state">No burn-in jobs found.</td>
</tr>
{% endif %}
</tbody>


@ -17,6 +17,7 @@
<line x1="6" y1="18" x2="6.01" y2="18"></line>
</svg>
<span class="header-title">TrueNAS Burn-In</span>
<span class="header-version">v{{ app_version if app_version is defined else '—' }}</span>
</a>
<div class="header-meta">
<span class="live-indicator">


@ -6,12 +6,14 @@
<div class="page-toolbar">
<h1 class="page-title">Settings</h1>
<div class="toolbar-right">
<a class="btn-export" href="/docs" target="_blank" rel="noopener">API Docs</a>
<button type="button" id="check-updates-btn" class="btn-secondary">Check for Updates</button>
<span id="update-result" class="settings-test-result" style="display:none;margin-left:8px"></span>
<a class="btn-export" href="/docs" target="_blank" rel="noopener" style="margin-left:8px">API Docs</a>
</div>
</div>
<p class="page-subtitle">
Changes take effect immediately. Settings marked
<span class="badge-restart">restart required</span> must be changed in <code>.env</code>.
<span class="badge-restart">restart required</span> are saved but need a container restart to fully apply.
</p>
<form id="settings-form" autocomplete="off">
@ -89,6 +91,57 @@
</div>
</div>
<!-- SSH -->
<div class="settings-card">
<div class="settings-card-header">
<span class="settings-card-title">SSH (TrueNAS Direct)</span>
{% if ssh_configured %}
<span class="chip chip-passed" style="font-size:10px">Configured</span>
{% else %}
<span class="chip chip-unknown" style="font-size:10px">Not configured — using REST API / mock</span>
{% endif %}
</div>
<p class="sf-hint" style="margin-bottom:8px">
When configured, burn-in stages run smartctl and badblocks directly on TrueNAS over SSH,
enabling SMART attribute monitoring and real bad-block detection. Leave Host empty to use
the TrueNAS REST API (mock / dev mode).
</p>
<div class="sf-fields">
<div class="sf-full sf-row-test" style="margin-bottom:4px">
<button type="button" id="test-ssh-btn" class="btn-secondary">Test SSH Connection</button>
<span id="ssh-test-result" class="settings-test-result" style="display:none"></span>
</div>
<label for="ssh_host">Host / IP</label>
<input class="sf-input" id="ssh_host" name="ssh_host" type="text"
value="{{ editable.ssh_host }}" placeholder="10.0.0.x (same as TrueNAS IP)">
<label for="ssh_port">Port</label>
<input class="sf-input sf-input-xs" id="ssh_port" name="ssh_port"
type="number" min="1" max="65535" value="{{ editable.ssh_port }}" style="width:70px">
<label for="ssh_user">Username</label>
<input class="sf-input" id="ssh_user" name="ssh_user" type="text"
value="{{ editable.ssh_user }}" placeholder="root">
<label for="ssh_password">Password</label>
<input class="sf-input" id="ssh_password" name="ssh_password" type="password"
placeholder="leave blank to keep existing" autocomplete="new-password">
<label for="ssh_key">Private Key</label>
<div>
<textarea class="sf-input sf-textarea" id="ssh_key" name="ssh_key"
rows="6" placeholder="Paste PEM private key here (-----BEGIN ... KEY-----). Leave blank to keep existing." autocomplete="off"></textarea>
<span class="sf-hint" style="margin-top:3px">
Either password or key auth. Key takes precedence if both are set.
Key is stored securely in <code>/data/settings_overrides.json</code>.
</span>
</div>
</div>
</div>
</div><!-- /left col -->
<!-- RIGHT column: Notifications + Behavior -->
@ -157,9 +210,14 @@
<div class="sf-row">
<label class="sf-label" for="max_parallel_burnins">Max Parallel Burn-Ins</label>
<input class="sf-input sf-input-xs" id="max_parallel_burnins" name="max_parallel_burnins"
type="number" min="1" max="16" value="{{ editable.max_parallel_burnins }}">
type="number" min="1" max="60" value="{{ editable.max_parallel_burnins }}">
<span class="sf-hint">How many jobs can run at the same time</span>
</div>
<div id="parallel-warn" class="sf-inline-warn"
{% if editable.max_parallel_burnins <= 8 %}style="display:none"{% endif %}>
⚠ Running many simultaneous surface scans may saturate your storage controller
and produce unreliable results. Recommended: 2-4.
</div>
<div class="sf-row">
<label class="sf-label" for="stuck_job_hours">Stuck Job Threshold (hours)</label>
@ -167,52 +225,101 @@
type="number" min="1" max="168" value="{{ editable.stuck_job_hours }}">
<span class="sf-hint">Jobs running longer than this → auto-marked unknown</span>
</div>
<div class="sf-divider"></div>
<div class="sf-row">
<label class="sf-label" for="temp_warn_c">Temp Warning (°C)</label>
<input class="sf-input sf-input-xs" id="temp_warn_c" name="temp_warn_c"
type="number" min="20" max="80" value="{{ editable.temp_warn_c }}">
<span class="sf-hint">Show orange above this temperature</span>
</div>
<div class="sf-row">
<label class="sf-label" for="temp_crit_c">Temp Critical (°C)</label>
<input class="sf-input sf-input-xs" id="temp_crit_c" name="temp_crit_c"
type="number" min="20" max="80" value="{{ editable.temp_crit_c }}">
<span class="sf-hint">Show red + block burn-in start above this temperature</span>
</div>
<div class="sf-row">
<label class="sf-label" for="bad_block_threshold">Bad Block Threshold</label>
<input class="sf-input sf-input-xs" id="bad_block_threshold" name="bad_block_threshold"
type="number" min="0" max="9999" value="{{ editable.bad_block_threshold }}">
<span class="sf-hint">Max bad blocks before surface validate fails (Stage 7)</span>
</div>
</div>
</div><!-- /right col -->
</div><!-- /two-col -->
<!-- System settings (restart required) -->
<div class="settings-card" style="margin-top:16px">
<div class="settings-card-header">
<span class="settings-card-title">System</span>
<span class="badge-restart">restart required to apply</span>
</div>
<div class="settings-two-col" style="gap:16px">
<div class="sf-fields">
<label for="truenas_base_url">TrueNAS URL</label>
<input class="sf-input" id="truenas_base_url" name="truenas_base_url" type="text"
value="{{ editable.truenas_base_url }}" placeholder="http://10.0.0.x">
<label for="truenas_api_key">API Key</label>
<input class="sf-input" id="truenas_api_key" name="truenas_api_key" type="password"
placeholder="leave blank to keep existing" autocomplete="new-password">
<label for="truenas_verify_tls">Verify TLS</label>
<label class="toggle" style="margin-top:2px">
<input type="checkbox" id="truenas_verify_tls" name="truenas_verify_tls"
{% if editable.truenas_verify_tls %}checked{% endif %}>
<span class="toggle-slider"></span>
</label>
</div>
<div class="sf-fields">
<label for="poll_interval_seconds">Poll Interval (s)</label>
<input class="sf-input sf-input-xs" id="poll_interval_seconds" name="poll_interval_seconds"
type="number" min="1" max="300" value="{{ editable.poll_interval_seconds }}">
<label for="stale_threshold_seconds">Stale Threshold (s)</label>
<input class="sf-input sf-input-xs" id="stale_threshold_seconds" name="stale_threshold_seconds"
type="number" min="1" max="600" value="{{ editable.stale_threshold_seconds }}">
<label for="log_level">Log Level</label>
<select class="sf-select" id="log_level" name="log_level">
{% for lvl in ['DEBUG','INFO','WARNING','ERROR','CRITICAL'] %}
<option value="{{ lvl }}" {% if editable.log_level == lvl %}selected{% endif %}>{{ lvl }}</option>
{% endfor %}
</select>
<label for="allowed_ips">IP Allowlist</label>
<div>
<input class="sf-input" id="allowed_ips" name="allowed_ips" type="text"
value="{{ editable.allowed_ips }}" placeholder="10.0.0.0/24,127.0.0.1 (empty = allow all)">
<span class="sf-hint" style="margin-top:3px">Comma-separated IPs/CIDRs. Empty = allow all.</span>
</div>
</div>
</div>
</div>
<!-- Save row -->
<div class="settings-save-bar">
<button type="submit" class="btn-primary" id="save-btn">Save Settings</button>
<button type="button" class="btn-secondary" id="cancel-settings-btn">Cancel</button>
<span id="save-result" class="settings-test-result" style="display:none"></span>
</div>
</form>
<!-- System (read-only) -->
<div class="settings-card settings-card-readonly">
<div class="settings-card-header">
<span class="settings-card-title">System</span>
<span class="badge-restart">restart required to change</span>
</div>
<div class="sf-readonly-grid">
<div class="sf-ro-row">
<span class="sf-ro-label">TrueNAS URL</span>
<span class="sf-ro-value mono">{{ readonly.truenas_base_url }}</span>
</div>
<div class="sf-ro-row">
<span class="sf-ro-label">Verify TLS</span>
<span class="sf-ro-value">{{ 'Yes' if readonly.truenas_verify_tls else 'No' }}</span>
</div>
<div class="sf-ro-row">
<span class="sf-ro-label">Poll Interval</span>
<span class="sf-ro-value mono">{{ readonly.poll_interval_seconds }}s</span>
</div>
<div class="sf-ro-row">
<span class="sf-ro-label">Stale Threshold</span>
<span class="sf-ro-value mono">{{ readonly.stale_threshold_seconds }}s</span>
</div>
<div class="sf-ro-row">
<span class="sf-ro-label">IP Allowlist</span>
<span class="sf-ro-value mono">{{ readonly.allowed_ips }}</span>
</div>
<div class="sf-ro-row">
<span class="sf-ro-label">Log Level</span>
<span class="sf-ro-value mono">{{ readonly.log_level }}</span>
</div>
</div>
<!-- Restart required banner — shown after saving system settings -->
<div id="restart-banner" style="display:none;margin-top:12px;padding:12px 16px;background:rgba(255,170,0,0.12);border:1px solid var(--yellow);border-radius:8px;color:var(--text-strong)">
<strong>&#9888; Container restart required</strong> — system settings are saved but won't take effect until you restart the app container:
<pre style="margin:8px 0 0;padding:8px 10px;background:var(--bg-card);border-radius:5px;font-size:12px;color:var(--text-strong);user-select:all">docker compose restart app</pre>
<span style="font-size:11px;color:var(--text-muted)">Run this on <strong>maple.local</strong> from <code>~/docker/stacks/truenas-burnin/</code></span>
</div>
</form>
<script>
(function () {
@ -260,7 +367,14 @@
});
var data = await resp.json();
if (resp.ok) {
// Show restart notice if any system settings were saved
var systemFields = ['truenas_base_url','truenas_api_key','truenas_verify_tls',
'poll_interval_seconds','stale_threshold_seconds','allowed_ips','log_level'];
var savedKeys = data.keys || [];
var needsRestart = savedKeys.some(function(k) { return systemFields.indexOf(k) >= 0; });
showResult(saveResult, true, 'Saved');
var restartBanner = document.getElementById('restart-banner');
if (restartBanner) restartBanner.style.display = needsRestart ? '' : 'none';
} else {
showResult(saveResult, false, data.detail || 'Save failed');
}
@ -298,6 +412,62 @@
testBtn.textContent = 'Test Connection';
}
});
// Parallel burn-in warning
var parallelInput = document.getElementById('max_parallel_burnins');
var parallelWarn = document.getElementById('parallel-warn');
if (parallelInput && parallelWarn) {
parallelInput.addEventListener('input', function () {
parallelWarn.style.display = parseInt(parallelInput.value, 10) > 8 ? '' : 'none';
});
}
// Test SSH
var sshBtn = document.getElementById('test-ssh-btn');
var sshResult = document.getElementById('ssh-test-result');
if (sshBtn) {
sshBtn.addEventListener('click', async function () {
sshBtn.disabled = true;
sshBtn.textContent = 'Testing…';
sshResult.style.display = 'none';
try {
var resp = await fetch('/api/v1/settings/test-ssh', { method: 'POST' });
var data = await resp.json();
showResult(sshResult, resp.ok, resp.ok ? 'Connection OK' : (data.detail || 'Failed'));
} catch (e) {
showResult(sshResult, false, 'Network error');
} finally {
sshBtn.disabled = false;
sshBtn.textContent = 'Test SSH Connection';
}
});
}
// Check for Updates
var updBtn = document.getElementById('check-updates-btn');
var updResult = document.getElementById('update-result');
updBtn.addEventListener('click', async function () {
updBtn.disabled = true;
updBtn.textContent = 'Checking…';
updResult.style.display = 'none';
try {
var resp = await fetch('/api/v1/updates/check');
var data = await resp.json();
if (data.update_available) {
showResult(updResult, false, 'Update available: v' + data.latest + ' (current: v' + data.current + ')');
} else if (data.latest) {
showResult(updResult, true, 'Up to date (v' + data.current + ')');
} else {
var msg = data.message || ('v' + data.current + ' — no releases found');
showResult(updResult, true, msg);
}
} catch (e) {
showResult(updResult, false, 'Network error');
} finally {
updBtn.disabled = false;
updBtn.textContent = 'Check for Updates';
}
});
}());
</script>
{% endblock %}


@ -119,5 +119,65 @@
{% endif %}
</div>
</div>
<div class="stats-grid" style="margin-top:24px">
<!-- Average duration by drive size -->
<div class="stats-section">
<h2 class="section-title">Avg. Test Duration by Drive Size</h2>
{% if by_size %}
<div class="table-wrap" style="max-height:none">
<table>
<thead>
<tr>
<th>Size</th>
<th style="text-align:right">Jobs</th>
<th style="text-align:right">Avg Duration</th>
</tr>
</thead>
<tbody>
{% for s in by_size %}
<tr>
<td style="font-weight:500;color:var(--text-strong)">{{ s.size_tb }} TB</td>
<td class="mono text-muted" style="text-align:right">{{ s.total }}</td>
<td class="mono" style="text-align:right;color:var(--text-strong)">{{ s.avg_hours }}h</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<div class="empty-state" style="border:1px solid var(--border);border-radius:8px;padding:32px">No completed jobs yet.</div>
{% endif %}
</div>
<!-- Failure breakdown by stage -->
<div class="stats-section">
<h2 class="section-title">Failures by Stage</h2>
{% if by_failure_stage %}
<div class="table-wrap" style="max-height:none">
<table>
<thead>
<tr>
<th>Stage</th>
<th style="text-align:right">Count</th>
</tr>
</thead>
<tbody>
{% for f in by_failure_stage %}
<tr>
<td style="font-weight:500;color:var(--red)">{{ f.failed_stage | replace('_',' ') | title }}</td>
<td class="mono" style="text-align:right;color:var(--red)">{{ f.count }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<div class="empty-state" style="border:1px solid var(--border);border-radius:8px;padding:32px">No failures recorded.</div>
{% endif %}
</div>
</div>
{% endblock %}


@ -0,0 +1,150 @@
"""
WebSocket asyncssh PTY bridge for the live terminal drawer tab.
Protocol
--------
Client → server: binary = raw terminal input bytes
                 text = JSON control message, e.g. {"type":"resize","cols":80,"rows":24}
Server → client: binary = raw terminal output bytes
"""
import asyncio
import json
import logging
import asyncssh
from fastapi import WebSocket, WebSocketDisconnect
log = logging.getLogger(__name__)
async def handle(ws: WebSocket) -> None:
"""Accept a WebSocket connection and bridge it to an SSH PTY."""
await ws.accept()
from app.config import settings # late import — avoids circular at module level
# ── Guard: SSH must be configured ──────────────────────────────────────
if not settings.ssh_host:
await _send(ws,
b"\r\n\x1b[33mSSH not configured.\x1b[0m "
b"Set SSH Host in \x1b[1mSettings \u2192 SSH\x1b[0m first.\r\n"
)
await ws.close(1008)
return
connect_kw: dict = dict(
host=settings.ssh_host,
port=settings.ssh_port,
username=settings.ssh_user,
known_hosts=None,
)
if settings.ssh_key.strip():
try:
connect_kw["client_keys"] = [asyncssh.import_private_key(settings.ssh_key)]
except Exception as exc:
await _send(ws, f"\r\n\x1b[31mBad SSH key: {exc}\x1b[0m\r\n".encode())
await ws.close(1011)
return
elif settings.ssh_password:
connect_kw["password"] = settings.ssh_password
else:
# Fall back to mounted key file (same logic as ssh_client._connect)
import os
from app import ssh_client as _sc
key_path = os.environ.get("SSH_KEY_FILE", _sc._MOUNTED_KEY_PATH)
if os.path.exists(key_path):
connect_kw["client_keys"] = [key_path]
else:
await _send(ws,
b"\r\n\x1b[33mNo SSH credentials configured.\x1b[0m "
b"Set a password or private key in Settings.\r\n"
)
await ws.close(1008)
return
await _send(ws,
f"\r\n\x1b[36mConnecting to {settings.ssh_host}\u2026\x1b[0m\r\n".encode()
)
# ── Open SSH connection ─────────────────────────────────────────────────
try:
async with asyncssh.connect(**connect_kw) as conn:
process = await conn.create_process(
term_type="xterm-256color",
term_size=(80, 24),
encoding=None, # raw bytes — xterm.js handles encoding
)
await _send(ws, b"\r\n\x1b[32mConnected\x1b[0m\r\n\r\n")
stop = asyncio.Event()
async def ssh_to_ws() -> None:
try:
async for chunk in process.stdout:
await ws.send_bytes(chunk)
except Exception:
pass
finally:
stop.set()
async def ws_to_ssh() -> None:
try:
while not stop.is_set():
msg = await ws.receive()
if msg["type"] == "websocket.disconnect":
break
if msg.get("bytes"):
process.stdin.write(msg["bytes"])
elif msg.get("text"):
try:
ctrl = json.loads(msg["text"])
if ctrl.get("type") == "resize":
process.change_terminal_size(
int(ctrl["cols"]), int(ctrl["rows"])
)
except Exception:
pass
except WebSocketDisconnect:
pass
except Exception:
pass
finally:
stop.set()
t1 = asyncio.create_task(ssh_to_ws())
t2 = asyncio.create_task(ws_to_ssh())
_done, pending = await asyncio.wait(
[t1, t2], return_when=asyncio.FIRST_COMPLETED
)
for t in pending:
t.cancel()
try:
await t
except asyncio.CancelledError:
pass
except asyncssh.PermissionDenied:
await _send(ws, b"\r\n\x1b[31mSSH permission denied.\x1b[0m\r\n")
except asyncssh.DisconnectError as exc:
await _send(ws, f"\r\n\x1b[31mSSH disconnected: {exc}\x1b[0m\r\n".encode())
except OSError as exc:
await _send(ws, f"\r\n\x1b[31mCannot reach {settings.ssh_host}: {exc}\x1b[0m\r\n".encode())
except Exception as exc:
log.exception("Terminal WebSocket error")
await _send(ws, f"\r\n\x1b[31mError: {exc}\x1b[0m\r\n".encode())
finally:
try:
await ws.close()
except Exception:
pass
async def _send(ws: WebSocket, data: bytes) -> None:
"""Best-effort send — silently swallow errors if the socket is already gone."""
try:
await ws.send_bytes(data)
except Exception:
pass
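The text-frame control message described in the protocol docstring can be built with a tiny helper. This is a sketch — the function name is illustrative; only the JSON shape comes from the protocol above:

```python
import json

def resize_message(cols: int, rows: int) -> str:
    """Encode the resize control message that ws_to_ssh() parses on the server."""
    return json.dumps({"type": "resize", "cols": int(cols), "rows": int(rows)})
```

A client sends this as a text frame; raw keystrokes travel as binary frames.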


@ -65,7 +65,13 @@ class TrueNASClient:
"get_disks",
)
r.raise_for_status()
return r.json()
disks = r.json()
# Filter out expired records — TrueNAS keeps historical entries for removed
# disks with expiretime set. Only return currently-present drives.
active = [d for d in disks if not d.get("expiretime")]
if len(active) < len(disks):
log.debug("get_disks: filtered %d expired record(s)", len(disks) - len(active))
return active
async def get_smart_jobs(self, state: str | None = None) -> list[dict]:
params: dict = {"method": "smart.test"}
@ -110,3 +116,49 @@ class TrueNASClient:
)
r.raise_for_status()
return r.json()
async def get_disk_temperatures(self) -> dict[str, float | None]:
"""
Returns {devname: celsius | None}.
Uses POST /api/v2.0/disk/temperatures available on TrueNAS SCALE 25.10+.
CORE compatibility: raises on 404/405, caller should catch and skip.
"""
r = await _with_retry(
lambda: self._client.post("/api/v2.0/disk/temperatures", json={}),
"get_disk_temperatures",
)
r.raise_for_status()
return r.json()
async def wipe_disk(self, devname: str, mode: str = "FULL") -> int:
"""
Start a disk wipe job. Not retried: duplicate starts would launch a second wipe.
mode: "QUICK" (wipe MBR/partitions only), "FULL" (write zeros), "FULL_RANDOM" (write random)
devname: basename only, e.g. "ada0" (not "/dev/ada0")
Returns the TrueNAS job ID.
"""
r = await self._client.post(
"/api/v2.0/disk/wipe",
json={"dev": devname, "mode": mode},
)
r.raise_for_status()
return r.json()
async def get_job(self, job_id: int) -> dict | None:
"""
Fetch a single TrueNAS job by ID.
Returns the job dict, or None if not found.
"""
import json as _json
r = await _with_retry(
lambda: self._client.get(
"/api/v2.0/core/get_jobs",
params={"filters": _json.dumps([["id", "=", job_id]])},
),
f"get_job({job_id})",
)
r.raise_for_status()
jobs = r.json()
if isinstance(jobs, list) and jobs:
return jobs[0]
return None
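Since `wipe_disk` is deliberately not retried and only returns a job ID, a caller has to drive completion itself by polling `get_job`. A minimal sketch — the helper name, polling cadence, and terminal-state set are assumptions, not confirmed against the TrueNAS API:

```python
import asyncio

async def wipe_and_wait(client, devname: str, interval: float = 5.0) -> dict:
    """Start a QUICK wipe, then poll until the job reaches a final state.

    `client` is assumed to be a connected TrueNASClient instance; the
    terminal states checked here (SUCCESS/FAILED/ABORTED) are assumptions.
    """
    job_id = await client.wipe_disk(devname, mode="QUICK")
    while True:
        job = await client.get_job(job_id)
        if job and job.get("state") in ("SUCCESS", "FAILED", "ABORTED"):
            return job
        await asyncio.sleep(interval)
```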


@ -1,10 +1,13 @@
services:
mock-truenas:
build: ./mock-truenas
container_name: mock-truenas
ports:
- "8000:8000"
restart: unless-stopped
# mock-truenas is kept for local dev — not started in production
# To use mock mode: docker compose --profile mock up
# mock-truenas:
# build: ./mock-truenas
# container_name: mock-truenas
# ports:
# - "8000:8000"
# profiles: [mock]
# restart: unless-stopped
app:
build: .
@ -16,6 +19,5 @@ services:
- ./data:/data
- ./app/templates:/opt/app/app/templates
- ./app/static:/opt/app/app/static
depends_on:
- mock-truenas
- /home/brandon/.ssh/id_ed25519:/run/secrets/ssh_key:ro
restart: unless-stopped


@ -1,7 +1,8 @@
fastapi
uvicorn
uvicorn[standard]
aiosqlite
httpx
pydantic-settings
jinja2
sse-starlette
asyncssh