docs: update CLAUDE.md and SPEC.md for Stage 8 (live terminal)

Documents WebSocket terminal architecture, xterm.js lazy loading,
message protocol, tab lifecycle, and reconnect behavior.

SPEC.md: updated drawer tabs (4 tabs including Terminal), added WS
endpoint, corrected bad block threshold default (0, not 2), version
bumped to 1.0.0-8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brandon Walter 2026-02-24 11:16:29 -05:00
parent 5a802bff2e
commit 645d55cfcc
2 changed files with 67 additions and 12 deletions


@ -1,7 +1,7 @@
# TrueNAS Burn-In Dashboard — Project Context
> Drop this file in any new Claude session to resume work with full context.
> Last updated: 2026-02-24 (Stage 7) > Last updated: 2026-02-24 (Stage 8)
---
@ -29,6 +29,7 @@ against a TrueNAS CORE instance. Deployed on **maple.local** (10.0.0.138).
| 6c | Settings overhaul (editable form, runtime store, SMTP fix, stage selection) | ✅ |
| 6d | Cancel SMART tests, Cancel All burn-ins, drag-to-reorder stages in modals | ✅ |
| 7 | SSH burn-in execution, SMART attr monitoring, drive reset, version badge, stats polish | ✅ |
| 8 | Live SSH terminal in drawer (xterm.js + asyncssh WebSocket PTY bridge) | ✅ |
---
@ -53,6 +54,7 @@ truenas-burnin/
├── models.py # Pydantic v2 models; StartBurninRequest has run_surface/run_short/run_long + profile property
├── settings_store.py # runtime settings store — persists to /data/settings_overrides.json
├── ssh_client.py # asyncssh client: smartctl parsing, badblocks streaming, test_connection
├── terminal.py # WebSocket ↔ asyncssh PTY bridge for live terminal tab
├── truenas.py # httpx async client with retry (lambda factory pattern)
├── poller.py # poll loop, SSE pub/sub, stale detection, stuck-job check
├── burnin.py # orchestrator, semaphore, stages, check_stuck_jobs()
@ -69,7 +71,7 @@ truenas-burnin/
└── templates/
├── layout.html # header nav: History, Stats, Audit, Settings, bell button
├── dashboard.html # stats bar, failed banner, batch bar ├── dashboard.html # stats bar, failed banner, batch bar, log drawer (4 tabs: Burn-In/SMART/Events/Terminal)
├── history.html
├── job_detail.html # + Print/Export button
├── audit.html # audit event log
@ -326,6 +328,7 @@ async def burnin_get(job_id: int, ...): ...
| SSL mode missing EHLO | `smtplib.SMTP_SSL` was created without calling `ehlo()` | Added `server.ehlo()` after both SSL and STARTTLS connections |
| `profile` NameError in `_execute_stages` | `_execute_stages` called `_recalculate_progress(job_id, profile)` but `profile` not in scope | Changed to `_recalculate_progress(job_id)` — profile param was unused |
| `app_version` Jinja2 global rendered as function | Set `templates.env.globals["app_version"] = _get_app_version` (callable) | Set to the static string value directly: `= _settings.app_version` |
| All buttons broken (Short/Long/Burn-In/Cancel) | `stages.forEach(function(s){` in `_drawerRenderBurnin` missing closing `});` — JS syntax error prevented entire IIFE from loading | Added missing `});` before `} else {` |
---
@ -414,6 +417,57 @@ if temp >= settings.temp_warn_c: return "temp-warn"
Surface validate fails if `bad_blocks > settings.bad_block_threshold` (default 0 = any bad sector = fail).
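
A minimal sketch of the rule above, assuming a standalone helper (the name `surface_validate_failed` is hypothetical and not taken from `burnin.py`). Because the comparison is a strict `>`, the default threshold of 0 fails a drive on its first bad block, while a threshold of 2 tolerates up to two:

```python
def surface_validate_failed(bad_blocks: int, bad_block_threshold: int = 0) -> bool:
    """Return True when the surface validate stage should be marked FAILED.

    Hypothetical helper: mirrors the documented comparison
    `bad_blocks > settings.bad_block_threshold`.
    """
    return bad_blocks > bad_block_threshold


# Default threshold (0): a clean drive passes, one bad block fails.
print(surface_validate_failed(0))
print(surface_validate_failed(1))
# Threshold of 2: exactly two bad blocks still pass, three fail.
print(surface_validate_failed(2, bad_block_threshold=2))
print(surface_validate_failed(3, bad_block_threshold=2))
```
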
## Feature Reference (Stage 8)
### Live Terminal
A full PTY SSH terminal embedded in the log drawer as a fourth tab ("Terminal"). Requires SSH to be configured in Settings.
**Architecture:**
```
Browser (xterm.js) ──WS binary──▶ /ws/terminal (FastAPI WebSocket)
terminal.py handle()
asyncssh.connect() → create_process(term_type="xterm-256color")
asyncio tasks: ssh_to_ws() + ws_to_ssh()
```
**Message protocol** (client ↔ server):
- Client → server **binary**: raw keyboard input bytes forwarded to SSH stdin
- Client → server **text**: JSON control message — only `{"type":"resize","cols":N,"rows":N}` used currently
- Server → client **binary**: raw terminal output bytes from SSH stdout
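
The client-to-server half of this protocol can be sketched as a small classifier. This is an illustration only: the function name `classify_client_frame` and the choice to silently ignore unrecognised text frames are assumptions, not details taken from `terminal.py`.

```python
import json
from typing import Optional, Tuple, Union


def _positive_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def classify_client_frame(
    frame: Union[bytes, bytearray, str],
) -> Tuple[str, Optional[object]]:
    """Classify one WebSocket frame received from the browser.

    Returns ("stdin", data) for binary frames, ("resize", (cols, rows)) for a
    well-formed resize control message, and ("ignore", None) for anything else.
    """
    if isinstance(frame, (bytes, bytearray)):
        # Binary frame: raw keyboard input destined for SSH stdin.
        return ("stdin", bytes(frame))
    try:
        msg = json.loads(frame)
    except (TypeError, ValueError):
        return ("ignore", None)
    if isinstance(msg, dict) and msg.get("type") == "resize":
        cols, rows = msg.get("cols"), msg.get("rows")
        if _positive_int(cols) and _positive_int(rows):
            return ("resize", (cols, rows))
    return ("ignore", None)
```

Keeping keystrokes on binary frames and control messages on text frames means input bytes never need to be escaped or inspected to tell them apart from JSON.
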
**`app/terminal.py`** — `handle(ws)`:
1. Guard: `ssh_host` must be set; key or password must be present
2. `asyncssh.connect(known_hosts=None)` with key loaded via `import_private_key()` (never written to disk)
3. `conn.create_process(term_type="xterm-256color", term_size=(80,24), encoding=None)` — opens shell PTY
4. Two asyncio tasks bridging the streams; `asyncio.wait(FIRST_COMPLETED)` + cancel pending on disconnect
5. ANSI-formatted status messages for connect/error states
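
Step 4 can be sketched without a real WebSocket or SSH connection. In the sketch below, `asyncio.Queue` objects stand in for the two streams and a `None` sentinel stands in for a disconnect; both are assumptions made so the example runs with the standard library only, and `pump` and `bridge` are hypothetical names.

```python
import asyncio


async def pump(src: asyncio.Queue, dst: asyncio.Queue) -> None:
    """Forward chunks from src to dst until a None sentinel (disconnect) arrives."""
    while True:
        chunk = await src.get()
        if chunk is None:
            return
        await dst.put(chunk)


async def bridge(
    ws_rx: asyncio.Queue,
    ssh_stdin: asyncio.Queue,
    ssh_stdout: asyncio.Queue,
    ws_tx: asyncio.Queue,
) -> None:
    """Run both directions concurrently and stop as soon as either side ends."""
    tasks = {
        asyncio.create_task(pump(ws_rx, ssh_stdin)),   # stands in for ws_to_ssh()
        asyncio.create_task(pump(ssh_stdout, ws_tx)),  # stands in for ssh_to_ws()
    }
    _done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to finish so no task outlives the bridge.
    await asyncio.gather(*pending, return_exceptions=True)


async def demo() -> bytes:
    ws_rx, ssh_stdin, ssh_stdout, ws_tx = (asyncio.Queue() for _ in range(4))
    await ws_rx.put(b"ls\n")  # a keystroke frame from the browser
    await ws_rx.put(None)     # the browser disconnects
    await asyncio.wait_for(bridge(ws_rx, ssh_stdin, ssh_stdout, ws_tx), timeout=2)
    return ssh_stdin.get_nowait()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cancelling the pending task matters because the surviving direction would otherwise block forever on a stream whose peer is gone.
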
**Frontend (app.js):**
- xterm.js 5.3.0 + xterm-addon-fit 0.8.0 loaded **lazily** on first Terminal tab click (CDN, ~300KB — not loaded until needed)
- `_termInit()` creates Terminal + FitAddon, opens into the panel div, registers `onData` once
- `ResizeObserver` on the panel → `fit()` + sends `resize` JSON to server
- `_termConnect()` called on init and by Reconnect button — guards against double-connect with `readyState <= 1` check
- `onData` always writes to current `_termWs` by reference — multiple reconnects don't add duplicate handlers
- Reconnect bar floats over terminal on `ws.onclose`; removed on `ws.onopen`
**Tab lifecycle:**
- Terminal tab click → `openTerminalTab()`: loads libs → `_termInit()` → `_termConnect()` on first open; just refits on subsequent opens
- Autoscroll label hidden when terminal tab is active (not applicable)
- WebSocket stays alive when drawer closes — shell persists until page unload or explicit disconnect
**New route:**
| Method | Path | Description |
|--------|------|-------------|
| `WS` | `/ws/terminal` | asyncssh PTY bridge |
**Config used:** `ssh_host`, `ssh_port`, `ssh_user`, `ssh_key`, `ssh_password` — same SSH settings as burn-in stages.
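
The guard described for `handle(ws)` (host must be set, and a key or password must be present) could look roughly like the following. The `SSHSettings` dataclass, its default values, the function name, and the error strings are all hypothetical; only the five field names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SSHSettings:
    """Hypothetical stand-in for the app's settings object."""
    ssh_host: str = ""
    ssh_port: int = 22        # assumed default
    ssh_user: str = "root"    # assumed default
    ssh_key: str = ""
    ssh_password: str = ""


def terminal_precheck(settings: SSHSettings) -> Optional[str]:
    """Return an error message if the terminal cannot be opened, else None."""
    if not settings.ssh_host.strip():
        return "SSH host is not configured"
    if not (settings.ssh_key.strip() or settings.ssh_password):
        return "SSH key or password is required"
    return None
```

Returning a message rather than raising lets the caller render it as a status line in the terminal before closing the socket.
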
**xterm.js theme:** GitHub Dark color palette (matches app dark theme). `scrollback: 2000`. Font: SF Mono / Fira Code / Consolas.
### Cutting to Real TrueNAS (Next Steps)
When ready to test against a real TrueNAS CORE box:

SPEC.md

@ -1,6 +1,6 @@
# TrueNAS Burn-In — Project Specification
**Version:** 0.5.0 **Version:** 1.0.0-8
**Status:** Active Development
**Audience:** Public / Open Source
@ -49,7 +49,7 @@ badblocks -wsv -b 4096 -p 1 /dev/sdX
``` ```
This is a **destructive write test**. The UI must display a prominent warning before this stage begins, and again in the Settings page where the behavior is documented. The `-w` flag overwrites all data on the drive. This is intentional — these are new drives being validated before pool use.
**Failure threshold:** 2 or more bad blocks found triggers immediate abort and FAILED status. The threshold should be configurable in Settings (default: 2). **Failure threshold:** Any bad blocks found triggers immediate abort and FAILED status by default. The threshold is configurable in Settings (`Bad Block Threshold`, default: 0 — meaning any bad sector = fail).
---
@ -97,10 +97,11 @@ A **Reset** action clears the test state for a drive so it can be re-queued. It
Slides up from the bottom of the page when a drive row is clicked. Does not navigate away — the table remains visible and scrollable above.
Three tabs: Four tabs:
- **badblocks** — live tail of badblocks stdout, including error lines with sector numbers highlighted in red. - **Burn-In** — stage-by-stage progress for the latest burn-in job; shows live elapsed time, raw SSH log output (smartctl / badblocks), and bad block count.
- **SMART** — output of the last smartctl run for this drive, with monitored attribute values highlighted. - **SMART** — output of the last smartctl run for this drive, with monitored attribute values highlighted (green/yellow/red). Raw `smartctl -a` output also shown when SSH mode is active.
- **Events** — chronological timeline of everything that happened to this drive (test started, test passed, failure detected, alert sent, etc.). - **Events** — chronological timeline of everything that happened to this drive (test started, test passed, failure detected, alert sent, reset, etc.).
- **Terminal** — live SSH PTY session (xterm.js). Opens an interactive shell on the TrueNAS host. Requires SSH to be configured in Settings. Supports full colour, resize, paste, and reconnect. xterm.js is loaded lazily on first use.
Features:
- Auto-scroll toggle (on by default).
@ -233,6 +234,7 @@ Key endpoints:
- `POST /api/v1/burnin/start` — start a burn-in job.
- `POST /api/v1/burnin/{job_id}/cancel` — cancel a burn-in job.
- `GET /sse/drives` — Server-Sent Events stream powering the real-time dashboard UI.
- `WS /ws/terminal` — WebSocket endpoint bridging xterm.js to an asyncssh PTY on TrueNAS.
- `GET /health` — health check endpoint.
The API makes this app a strong candidate for MCP server integration, allowing an AI assistant to query drive status, start tests, or receive alerts conversationally.
@ -280,9 +282,8 @@ To validate against real hardware:
## Version
- App version starts at **0.5.0** - App version: **1.0.0-8** (displayed in header next to the title, and in Settings).
- Displayed on the dashboard landing page header and in Settings. - Update check in Settings queries Forgejo releases API (`git.hellocomputer.xyz`).
- Update check in Settings queries GitHub releases API.
- API version tracked separately, currently **0.1.0**.
---