Compare commits
No commits in common. "main" and "v1.0.0-41" have entirely different histories.
18 changed files with 109 additions and 1388 deletions
101
README.md
|
|
@@ -83,12 +83,11 @@ runtime roughly in half at ~2× RAM cost — matches the upstream
|
||||||
|
|
||||||
### Watch out
|
### Watch out
|
||||||
|
|
||||||
- **Stuck-job timeout** — `stuck_job_hours` (default 168 = 7 days)
|
- **Stuck-job timeout** — `stuck_job_hours` (default 24) marks any job
|
||||||
marks any job past that threshold as `unknown` and kills the remote
|
past that threshold as `unknown` and kills the remote process. If
|
||||||
process. The default covers `-w` surface_validate on 14 TB+ HDDs with
|
you're burning in 14 TB drives with default block size, raise this to
|
||||||
margin. If you're running short SSDs and want faster detection of
|
**48** in Settings before starting, or you'll get false positives near
|
||||||
genuinely stuck jobs, drop it. (Earlier versions defaulted to 24h
|
the end of surface_validate.
|
||||||
which false-positived on multi-TB drives.)
|
|
||||||
- **Thermal gate** — if drives currently under burn-in hit the
|
- **Thermal gate** — if drives currently under burn-in hit the
|
||||||
temperature warning threshold, new jobs wait up to 3 minutes before
|
temperature warning threshold, new jobs wait up to 3 minutes before
|
||||||
acquiring a slot. Increase `temp_warn_c` if your chassis runs hot but
|
acquiring a slot. Increase `temp_warn_c` if your chassis runs hot but
|
||||||
|
|
@@ -106,91 +105,6 @@ Click the red ✕ next to a running job. The orchestrator:
|
||||||
Cancellations are durable — restart the container and queued jobs resume,
|
Cancellations are durable — restart the container and queued jobs resume,
|
||||||
cancelled jobs stay cancelled.
|
cancelled jobs stay cancelled.
|
||||||
|
|
||||||
### Job states explained
|
|
||||||
|
|
||||||
| State | When it's set |
|
|
||||||
|-------------|-------------------------------------------------------------------------------|
|
|
||||||
| `queued` | Submitted, waiting for a `max_parallel_burnins` slot |
|
|
||||||
| `running` | Actively executing some stage |
|
|
||||||
| `passed` | All stages finished green |
|
|
||||||
| `failed` | A stage failed deterministically (bad blocks > threshold, SMART failure, etc.) |
|
|
||||||
| `cancelled` | Operator clicked ✕ |
|
|
||||||
| `unknown` | Job was alive but its outcome is indeterminate — see below |
|
|
||||||
|
|
||||||
`unknown` fires in two situations:
|
|
||||||
|
|
||||||
1. The stuck-job detector (`stuck_job_hours`, default 7 days) trips because
|
|
||||||
the job has been running too long without finishing.
|
|
||||||
2. The asyncio task got cancelled mid-stage by something *other* than an
|
|
||||||
operator click — usually a container restart (`docker compose up -d`,
|
|
||||||
`--build`, or the host rebooting). Burn-in source code goes through
|
|
||||||
the Dockerfile `COPY`, so any source-code deploy recreates the
|
|
||||||
container, drops the SSH connection to TrueNAS, and would orphan the
|
|
||||||
running burn-in. Avoid `--build` while burn-ins are active.
|
|
||||||
|
|
||||||
When `unknown` fires the drawer's per-stage Reason block shows
|
|
||||||
*"Task cancelled mid-run — likely container restart or shutdown"* so the
|
|
||||||
classification is explicit, not silent.
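
The stuck-job rule described above can be sketched as a single comparison. This is an illustration only: `is_stuck` is a hypothetical helper, not a function from this repository, and it assumes `started_at` is stored as an ISO-8601 UTC string like the one `_now()` produces. The threshold is a required argument because the two sides of this diff disagree on the default (168 hours on one side, 24 on the other).

```python
from datetime import datetime, timedelta, timezone


def is_stuck(started_at: str, now: datetime, stuck_job_hours: float) -> bool:
    """Return True when a running job has outlived the stuck-job threshold.

    Hypothetical helper: the name and signature are assumptions made for
    this example, not the project's actual detector.
    """
    started = datetime.fromisoformat(started_at)
    return now - started > timedelta(hours=stuck_job_hours)


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
eight_days_ago = (now - timedelta(days=8)).isoformat()
two_days_ago = (now - timedelta(days=2)).isoformat()

print(is_stuck(eight_days_ago, now, stuck_job_hours=168))  # True: 192 h > 168 h
print(is_stuck(two_days_ago, now, stuck_job_hours=168))    # False: 48 h < 168 h
print(is_stuck(two_days_ago, now, stuck_job_hours=24))     # True: 48 h > 24 h
```

The last two calls show why the default matters: the same 48-hour job is healthy under a 7-day threshold and flagged `unknown` under a 24-hour one.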
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Drive drawer
|
|
||||||
|
|
||||||
Click any drive row to slide a detail drawer down from the top. Three tabs:
|
|
||||||
|
|
||||||
- **Burn-In** — per-stage breakdown of the latest job
|
|
||||||
- **SMART** — short/long test states + cached SMART attributes
|
|
||||||
- **Events** — last 50 audit events for the drive
|
|
||||||
|
|
||||||
### Surface-validate visualization
|
|
||||||
|
|
||||||
For drives in a `surface_validate` stage (running or finished), the Burn-In
|
|
||||||
tab renders:
|
|
||||||
|
|
||||||
1. **Vital-signs strip** — `Start` (with date) · `Elapsed` · `ETA` (duration
|
|
||||||
remaining) · `Finish` (wall-clock estimate, browser-local timezone) ·
|
|
||||||
`Temp` (cool/warm/hot colour). Computed from data in the drawer payload;
|
|
||||||
ETA + Finish suppressed below 0.5% so you don't see a "Finish: Jun 22"
|
|
||||||
stutter at the very start.
|
|
||||||
2. **Four pattern meters** — `0xaa` / `0x55` / `0xff` / `0x00`. Each meter
|
|
||||||
is split into a left half (write phase, blue) and a right half (verify
|
|
||||||
phase, green). Current pattern's label glows blue; completed patterns'
|
|
||||||
labels go green. This translates badblocks's per-phase percent into
|
|
||||||
monotonic 0-99% overall progress, so the bar never appears to "rewind"
|
|
||||||
when a new phase starts.
|
|
||||||
3. **Phase caption** — explicit text: *"Pattern 2 of 4 · Verify 0x55 · 47%
|
|
||||||
within phase"*. Makes the visual grammar unambiguous.
|
|
||||||
4. **Completed-pattern history** — once pattern 1 finishes, a chip appears
|
|
||||||
showing `0xaa: 14h 22m`. Lets you predict the rest of the run from the
|
|
||||||
first pattern's elapsed time.
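
A minimal sketch of the percent translation and the phase caption, assuming the eight-phase numbering used in the `_BadblocksProgress` docstring in this diff (odd phases write, even phases verify). The formula is taken from that docstring; `phase_caption` is a hypothetical helper written only for this example.

```python
TOTAL_PHASES = 8  # 4 patterns x (write phase + verify phase)
PATTERNS = ["0xaa", "0x55", "0xff", "0x00"]


def overall_pct(phase: int, phase_pct: float) -> int:
    """Map a 1-based phase and its in-phase percent to overall progress.

    Clipped to 99 so that 100 is only ever written by the success path.
    """
    total = (phase - 1) * 100.0 + phase_pct
    return min(99, int(total / TOTAL_PHASES))


def phase_caption(phase: int, phase_pct: float) -> str:
    """Build the caption text (hypothetical helper, not from the repo)."""
    idx = (phase - 1) // 2                      # 0-based pattern index
    kind = "Write" if phase % 2 == 1 else "Verify"
    return (
        f"Pattern {idx + 1} of 4 · {kind} {PATTERNS[idx]} · "
        f"{phase_pct:.0f}% within phase"
    )


print(overall_pct(4, 47.0))    # 43
print(overall_pct(2, 100.0))   # 25, end of the first verify phase
print(overall_pct(3, 0.0))     # 25, start of the next write: no rewind
print(phase_caption(4, 47.0))  # Pattern 2 of 4 · Verify 0x55 · 47% within phase
```

Because each phase contributes a fixed 12.5 points, the value at the end of one phase equals the value at the start of the next, which is what keeps the bar monotonic.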
|
|
||||||
|
|
||||||
### Failure reason block
|
|
||||||
|
|
||||||
Stages that ended `failed` / `cancelled` / `unknown` show a coloured Reason
|
|
||||||
pill at the top of the stage section. Sources, in order of preference:
|
|
||||||
|
|
||||||
1. The stage's own `error_text`
|
|
||||||
2. The parent job's `error_text` (backfilled by the drawer when the stage's
|
|
||||||
own is empty — catches orphan rows from hard crashes)
|
|
||||||
3. A heuristic: if the log is tiny and no real progress was recorded,
|
|
||||||
*"Stopped without recording an error — likely cause: SSH connection drop
|
|
||||||
or container restart while this stage was running"*
|
|
||||||
|
|
||||||
Otherwise: *"No error message recorded."* — there's never a blank where you
|
|
||||||
expect to see why something broke.
|
|
||||||
|
|
||||||
### Column sorting
|
|
||||||
|
|
||||||
Click any column header (Drive, Serial, Size, Temp, Health, Short SMART,
|
|
||||||
Long SMART, Burn-In) to sort. Cycle: ascending → descending → cleared. Sort
|
|
||||||
state persists in `localStorage` so it survives page reload AND every
|
|
||||||
SSE-driven tbody refresh (~12 s poll cycle). Empty values always sink to
|
|
||||||
the bottom regardless of direction.
|
|
||||||
|
|
||||||
Sortable values are emitted as `data-sort-*` attributes on each `<tr>`,
|
|
||||||
with numeric priority maps for SMART states (e.g. `running` always sorts
|
|
||||||
ahead of `idle`).
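
The ordering rule (empty values sink regardless of direction, SMART states compared through a priority map) can be illustrated as follows. The real sort runs in the browser against the `data-sort-*` attributes; Python is used here only to show the rule, and `sort_rows` plus the contents of `SMART_PRIORITY` are assumptions made for the example.

```python
# Hypothetical priority map: lower numbers sort first when ascending.
SMART_PRIORITY = {"running": 0, "idle": 1}


def sort_rows(rows: list[dict], key: str, descending: bool = False) -> list[dict]:
    """Sort rows by `key`, always keeping rows with empty values last."""
    def is_empty(row: dict) -> bool:
        return row.get(key) in (None, "")

    present = [r for r in rows if not is_empty(r)]
    empty = [r for r in rows if is_empty(r)]
    present.sort(key=lambda r: r[key], reverse=descending)
    return present + empty


rows = [
    {"drive": "sda", "temp": 41, "smart": SMART_PRIORITY["idle"]},
    {"drive": "sdb", "temp": None, "smart": SMART_PRIORITY["running"]},
    {"drive": "sdc", "temp": 35, "smart": SMART_PRIORITY["idle"]},
]

print([r["drive"] for r in sort_rows(rows, "temp")])                   # ['sdc', 'sda', 'sdb']
print([r["drive"] for r in sort_rows(rows, "temp", descending=True)])  # ['sda', 'sdc', 'sdb']
print([r["drive"] for r in sort_rows(rows, "smart")])                  # ['sdb', 'sda', 'sdc']
```

Splitting the empty rows out before sorting, rather than giving them a sentinel value, is what keeps them at the bottom in both directions.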
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Drive locks
|
## Drive locks
|
||||||
|
|
@@ -230,8 +144,7 @@ All settings live under `/settings` (header link). Key knobs:
|
||||||
- **`surface_validate_block_size` / `_block_buffer` / `_passes`** —
|
- **`surface_validate_block_size` / `_block_buffer` / `_passes`** —
|
||||||
badblocks `-b` / `-c` / `-p`. Defaults preserve original behaviour;
|
badblocks `-b` / `-c` / `-p`. Defaults preserve original behaviour;
|
||||||
tune for speed vs paranoia.
|
tune for speed vs paranoia.
|
||||||
- **`stuck_job_hours`** (default 168 = 7 days) — covers 14 TB+ HDDs;
|
- **`stuck_job_hours`** (default 24) — raise for big drives.
|
||||||
drop for faster detection on small fast drives.
|
|
||||||
- **`temp_warn_c` / `temp_crit_c`** — thermal gating thresholds.
|
- **`temp_warn_c` / `temp_crit_c`** — thermal gating thresholds.
|
||||||
- **`bad_block_threshold`** (default 0) — number of bad blocks
|
- **`bad_block_threshold`** (default 0) — number of bad blocks
|
||||||
surface_validate tolerates before failing the stage.
|
surface_validate tolerates before failing the stage.
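
A sketch of how `bad_block_threshold` gates the stage, based on the drain loop elsewhere in this diff, which counts bare-number output lines as bad blocks and aborts once the total exceeds the setting. The two helper functions are hypothetical names introduced for this example.

```python
def count_bad_blocks(lines: list[str]) -> int:
    """Count lines that consist only of digits (a bad-block number)."""
    return sum(1 for line in lines if line.strip().isdigit())


def exceeds_threshold(bad_blocks_total: int, bad_block_threshold: int = 0) -> bool:
    """True when the stage should fail; the comparison is strictly greater."""
    return bad_blocks_total > bad_block_threshold


output = [
    "Testing with pattern 0xaa: \n",
    "12345\n",
    "67890\n",
    "Reading and comparing: \n",
]

total = count_bad_blocks(output)
print(total)                                            # 2
print(exceeds_threshold(total))                         # True with the default of 0
print(exceeds_threshold(total, bad_block_threshold=5))  # False
print(exceeds_threshold(0))                             # False: a clean drive passes
```

With the default of 0, a single bad block is enough to fail the stage; raising the setting tolerates that many before aborting.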
|
||||||
|
|
@@ -346,7 +259,7 @@ pinned version after the fact.
|
||||||
- `CLAUDE.md` — full architecture, file map, deploy workflow, and the
|
- `CLAUDE.md` — full architecture, file map, deploy workflow, and the
|
||||||
rationale behind every non-obvious design decision.
|
rationale behind every non-obvious design decision.
|
||||||
- `SPEC.md` — canonical feature reference per version.
|
- `SPEC.md` — canonical feature reference per version.
|
||||||
- `tests/` — `python -m unittest discover tests/` (65 tests, stdlib-only). Or run inside the deployed container with `scripts/run-tests.sh`.
|
- `tests/` — `python -m unittest discover tests/` (44 tests, stdlib-only).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -72,14 +72,6 @@ class User:
|
||||||
is_admin: bool
|
is_admin: bool
|
||||||
|
|
||||||
|
|
||||||
def LoopbackUser(username: str = "monitor", full_name: str = "Autonomous Monitor") -> User:
|
|
||||||
"""Synthetic admin used by the loopback bypass in _AuthGateMiddleware.
|
|
||||||
id=0 (no real DB row) and is_admin=True so admin-gated routes work.
|
|
||||||
Only reachable when request.client.host is 127.0.0.1 / ::1 —
|
|
||||||
a process inside the container's network namespace (docker exec)."""
|
|
||||||
return User(id=0, username=username, full_name=full_name, is_admin=True)
|
|
||||||
|
|
||||||
|
|
||||||
def _now() -> str:
|
def _now() -> str:
|
||||||
return datetime.now(timezone.utc).isoformat()
|
return datetime.now(timezone.utc).isoformat()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -93,7 +93,6 @@ async def init(client: TrueNASClient) -> None:
|
||||||
async with _db() as db:
|
async with _db() as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
await db.execute("PRAGMA foreign_keys=ON")
|
await db.execute("PRAGMA foreign_keys=ON")
|
||||||
|
|
||||||
# Mark interrupted running jobs as unknown
|
# Mark interrupted running jobs as unknown
|
||||||
|
|
@@ -162,7 +161,6 @@ async def start_job(drive_id: int, profile: str, operator: str,
|
||||||
async with _db() as db:
|
async with _db() as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
await db.execute("PRAGMA foreign_keys=ON")
|
await db.execute("PRAGMA foreign_keys=ON")
|
||||||
|
|
||||||
# Reject duplicate active burn-in for same drive
|
# Reject duplicate active burn-in for same drive
|
||||||
|
|
@@ -263,7 +261,6 @@ async def cancel_job(job_id: int, operator: str) -> bool:
|
||||||
async with _db() as db:
|
async with _db() as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
|
|
||||||
cur = await db.execute(
|
cur = await db.execute(
|
||||||
"SELECT state, drive_id FROM burnin_jobs WHERE id=?", (job_id,)
|
"SELECT state, drive_id FROM burnin_jobs WHERE id=?", (job_id,)
|
||||||
|
|
@@ -348,7 +345,6 @@ async def _run_job(job_id: int) -> None:
|
||||||
# Transition queued → running
|
# Transition queued → running
|
||||||
async with _db() as db:
|
async with _db() as db:
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
row = await (await db.execute(
|
row = await (await db.execute(
|
||||||
"SELECT drive_id, profile FROM burnin_jobs WHERE id=?", (job_id,)
|
"SELECT drive_id, profile FROM burnin_jobs WHERE id=?", (job_id,)
|
||||||
)).fetchone()
|
)).fetchone()
|
||||||
|
|
@@ -415,33 +411,11 @@ async def _run_job(job_id: int) -> None:
|
||||||
final_state = "unknown"
|
final_state = "unknown"
|
||||||
else:
|
else:
|
||||||
final_state = "passed" if success else "failed"
|
final_state = "passed" if success else "failed"
|
||||||
# If the asyncio task was cancelled mid-stage (container shutdown,
|
|
||||||
# uvicorn reload, etc.), CancelledError propagates past
|
|
||||||
# _execute_stages, so any running stage row is still marked
|
|
||||||
# 'running' in the DB. Reconcile here: mark every still-running
|
|
||||||
# stage on this job as 'unknown' with the parent's finished_at,
|
|
||||||
# and stamp a default error_text so the drawer's Reason block has
|
|
||||||
# something concrete to show. Use a write that's idempotent under
|
|
||||||
# repeat (only touches rows still 'running').
|
|
||||||
cancel_err = (
|
|
||||||
"Task cancelled mid-run — likely container restart or shutdown"
|
|
||||||
if was_cancelled else None
|
|
||||||
)
|
|
||||||
async with _db() as db:
|
async with _db() as db:
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
await db.execute(
|
await db.execute(
|
||||||
"UPDATE burnin_jobs SET state=?, percent=?, finished_at=?, error_text=? WHERE id=?",
|
"UPDATE burnin_jobs SET state=?, percent=?, finished_at=?, error_text=? WHERE id=?",
|
||||||
(final_state, 100 if success else None, _now(),
|
(final_state, 100 if success else None, _now(), error_text, job_id),
|
||||||
error_text or cancel_err, job_id),
|
|
||||||
)
|
|
||||||
if was_cancelled:
|
|
||||||
await db.execute(
|
|
||||||
"""UPDATE burnin_stages
|
|
||||||
SET state='unknown', finished_at=?,
|
|
||||||
error_text=COALESCE(error_text, ?)
|
|
||||||
WHERE burnin_job_id=? AND state='running'""",
|
|
||||||
(_now(), cancel_err, job_id),
|
|
||||||
)
|
)
|
||||||
await db.execute(
|
await db.execute(
|
||||||
"""INSERT INTO audit_events (event_type, drive_id, burnin_job_id, operator, message)
|
"""INSERT INTO audit_events (event_type, drive_id, burnin_job_id, operator, message)
|
||||||
|
|
@@ -568,7 +542,6 @@ async def check_stuck_jobs() -> None:
|
||||||
async with _db() as db:
|
async with _db() as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
|
|
||||||
cur = await db.execute("""
|
cur = await db.execute("""
|
||||||
SELECT bj.id, bj.drive_id, d.devname, bj.started_at
|
SELECT bj.id, bj.drive_id, d.devname, bj.started_at
|
||||||
|
|
|
||||||
|
|
@@ -77,13 +77,9 @@ def _now() -> str:
|
||||||
@asynccontextmanager
|
@asynccontextmanager
|
||||||
async def _db():
|
async def _db():
|
||||||
"""Open a WAL-mode connection with busy_timeout so writers wait for the lock
|
"""Open a WAL-mode connection with busy_timeout so writers wait for the lock
|
||||||
instead of immediately raising 'database is locked' under contention.
|
instead of immediately raising 'database is locked' under contention."""
|
||||||
|
|
||||||
60s timeout is intentionally generous: with 4 concurrent burn-in drains
|
|
||||||
+ the poller + retention + auth all writing, brief contention spikes
|
|
||||||
are normal and waiting is the right behavior. 10s was too tight."""
|
|
||||||
async with aiosqlite.connect(settings.db_path) as db:
|
async with aiosqlite.connect(settings.db_path) as db:
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
await db.execute("PRAGMA busy_timeout=10000")
|
||||||
yield db
|
yield db
|
||||||
|
|
||||||
|
|
||||||
|
|
@@ -194,72 +190,6 @@ async def _update_stage_bad_blocks(job_id: int, stage_name: str, count: int) ->
|
||||||
await db.commit()
|
await db.commit()
|
||||||
|
|
||||||
|
|
||||||
async def _update_stage_bb_phase(
|
|
||||||
job_id: int, stage_name: str, phase: int, phase_pct: float,
|
|
||||||
) -> None:
|
|
||||||
"""Persist per-pattern badblocks progress so the drive-drawer UI
|
|
||||||
can render 4 meters with separate write/verify halves."""
|
|
||||||
async with _db() as db:
|
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
|
||||||
await db.execute(
|
|
||||||
"UPDATE burnin_stages SET bb_phase=?, bb_phase_pct=? "
|
|
||||||
"WHERE burnin_job_id=? AND stage_name=?",
|
|
||||||
(phase, phase_pct, job_id, stage_name),
|
|
||||||
)
|
|
||||||
await db.commit()
|
|
||||||
|
|
||||||
|
|
||||||
async def _update_stage_bb_mbps(
|
|
||||||
job_id: int, stage_name: str, mbps: float,
|
|
||||||
) -> None:
|
|
||||||
"""Persist live throughput for the surface_validate meter strip.
|
|
||||||
Computed from delta_overall_pct between successive badblocks
|
|
||||||
progress lines, scaled by drive size_bytes / 800 (8 phases × 100)."""
|
|
||||||
async with _db() as db:
|
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
|
||||||
await db.execute(
|
|
||||||
"UPDATE burnin_stages SET bb_mbps=? "
|
|
||||||
"WHERE burnin_job_id=? AND stage_name=?",
|
|
||||||
(mbps, job_id, stage_name),
|
|
||||||
)
|
|
||||||
await db.commit()
|
|
||||||
|
|
||||||
|
|
||||||
async def _record_bb_phase_start(
|
|
||||||
job_id: int, stage_name: str, phase: int, ts: str,
|
|
||||||
) -> None:
|
|
||||||
"""Record the moment a phase first becomes current. Idempotent:
|
|
||||||
re-entry of the same phase keeps the original timestamp so a
|
|
||||||
transient parser reset doesn't blow away history.
|
|
||||||
|
|
||||||
Stored as a JSON object keyed by phase number (string). The
|
|
||||||
drawer reads it to compute per-pattern elapsed times.
|
|
||||||
"""
|
|
||||||
async with _db() as db:
|
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
|
||||||
cur = await db.execute(
|
|
||||||
"SELECT bb_phase_history FROM burnin_stages "
|
|
||||||
"WHERE burnin_job_id=? AND stage_name=?",
|
|
||||||
(job_id, stage_name),
|
|
||||||
)
|
|
||||||
row = await cur.fetchone()
|
|
||||||
existing = {}
|
|
||||||
if row and row[0]:
|
|
||||||
try:
|
|
||||||
existing = json.loads(row[0])
|
|
||||||
except (json.JSONDecodeError, TypeError):
|
|
||||||
existing = {}
|
|
||||||
key = str(phase)
|
|
||||||
if key not in existing:
|
|
||||||
existing[key] = ts
|
|
||||||
await db.execute(
|
|
||||||
"UPDATE burnin_stages SET bb_phase_history=? "
|
|
||||||
"WHERE burnin_job_id=? AND stage_name=?",
|
|
||||||
(json.dumps(existing), job_id, stage_name),
|
|
||||||
)
|
|
||||||
await db.commit()
|
|
||||||
|
|
||||||
|
|
||||||
async def _store_smart_attrs(drive_id: int, attrs: dict) -> None:
|
async def _store_smart_attrs(drive_id: int, attrs: dict) -> None:
|
||||||
"""Persist latest SMART attribute dict to drives.smart_attrs (JSON)."""
|
"""Persist latest SMART attribute dict to drives.smart_attrs (JSON)."""
|
||||||
# Convert int keys to str for JSON serialisation
|
# Convert int keys to str for JSON serialisation
|
||||||
|
|
|
||||||
|
|
@@ -25,110 +25,23 @@ class _BadblocksResult(TypedDict):
|
||||||
aborted: bool
|
aborted: bool
|
||||||
|
|
||||||
|
|
||||||
# `badblocks -w` cycles through 4 patterns (0xaa, 0x55, 0xff, 0x00),
|
|
||||||
# each with a write phase followed by a read-back/verify phase = 8 phases.
|
|
||||||
# Per-phase percent comes back via `XX% done`; without translation, the
|
|
||||||
# dashboard appears to "rewind" every ~2 hours when a new phase starts.
|
|
||||||
_BB_PATTERN_PHASE = {"0xaa": 1, "0x55": 3, "0xff": 5, "0x00": 7}
|
|
||||||
_BB_TOTAL_PHASES = 8
|
|
||||||
# Throttle DB writes from the badblocks parser. Each progress line used
|
|
||||||
# to trigger 4-6 transactions; with 4 concurrent burn-ins emitting sub-
|
|
||||||
# second progress lines, the asyncssh drain couldn't keep up — the
|
|
||||||
# stdout pipe on TrueNAS filled, badblocks blocked on pipe_write,
|
|
||||||
# disk I/O effectively stopped. 5 seconds is fine for the UI (drawer
|
|
||||||
# polls every ~12s anyway) and cuts DB load 60-80x.
|
|
||||||
BB_DB_MIN_SECONDS = 5.0
|
|
||||||
|
|
||||||
import re as _re_pre # noqa: E402
|
|
||||||
|
|
||||||
_BB_PATTERN_RE = _re_pre.compile(r"Testing with pattern\s+(0x[0-9a-fA-F]+)")
|
|
||||||
_BB_VERIFY_RE = _re_pre.compile(r"Reading and comparing")
|
|
||||||
_BB_PERCENT_RE = _re_pre.compile(r"([\d.]+)%\s+done")
|
|
||||||
|
|
||||||
|
|
||||||
class _BadblocksProgress:
|
|
||||||
"""Track which phase of `badblocks -w -p N` we're in so the
|
|
||||||
displayed percent maps to overall progress, not per-phase progress.
|
|
||||||
|
|
||||||
Pure state machine — no I/O. Feed it lines from the badblocks output
|
|
||||||
via :meth:`update`; read :attr:`overall_pct` after each call.
|
|
||||||
|
|
||||||
Behavior:
|
|
||||||
- Defaults to phase 1 (write 0xaa) before any header is seen.
|
|
||||||
- "Testing with pattern 0xXX" sets the phase to the write-phase index
|
|
||||||
for that pattern (1, 3, 5, or 7).
|
|
||||||
- "Reading and comparing" advances to the matching verify phase
|
|
||||||
(last_write_phase + 1).
|
|
||||||
- "XX% done" updates the in-phase percent.
|
|
||||||
- overall_pct = ((phase - 1) * 100 + phase_pct) / 8, clipped to 99
|
|
||||||
so we don't claim "100%" until the stage's success path explicitly
|
|
||||||
writes 100.
|
|
||||||
"""
|
|
||||||
|
|
||||||
__slots__ = ("phase", "phase_pct", "_last_write_phase")
|
|
||||||
|
|
||||||
def __init__(self) -> None:
|
|
||||||
self.phase: int = 1
|
|
||||||
self.phase_pct: float = 0.0
|
|
||||||
self._last_write_phase: int = 1
|
|
||||||
|
|
||||||
def update(self, line: str) -> None:
|
|
||||||
m = _BB_PATTERN_RE.search(line)
|
|
||||||
if m:
|
|
||||||
p = m.group(1).lower()
|
|
||||||
if p in _BB_PATTERN_PHASE:
|
|
||||||
self.phase = _BB_PATTERN_PHASE[p]
|
|
||||||
self._last_write_phase = self.phase
|
|
||||||
self.phase_pct = 0.0
|
|
||||||
return
|
|
||||||
if _BB_VERIFY_RE.search(line):
|
|
||||||
self.phase = self._last_write_phase + 1
|
|
||||||
self.phase_pct = 0.0
|
|
||||||
return
|
|
||||||
m = _BB_PERCENT_RE.search(line)
|
|
||||||
if m:
|
|
||||||
try:
|
|
||||||
self.phase_pct = float(m.group(1))
|
|
||||||
except ValueError:
|
|
||||||
pass
|
|
||||||
|
|
||||||
@property
|
|
||||||
def overall_pct(self) -> int:
|
|
||||||
total = (self.phase - 1) * 100.0 + self.phase_pct
|
|
||||||
return min(99, int(total / _BB_TOTAL_PHASES))
|
|
||||||
|
|
||||||
|
|
||||||
def _build_badblocks_cmd(devname: str) -> str:
|
def _build_badblocks_cmd(devname: str) -> str:
|
||||||
"""Construct the wrapped badblocks command for a given device.
|
"""Construct the wrapped badblocks command for a given device.
|
||||||
|
|
||||||
badblocks's progress output uses '\\b' backspace characters to
|
Wraps badblocks under `sh -c 'echo PID:$$; exec ...'` so we can
|
||||||
overwrite the previous "XX% done" line — there's no '\\n' between
|
capture the remote PID for out-of-band kill -9 (asyncssh's signal
|
||||||
updates until a phase transition. asyncssh's line-buffered reader
|
channel is ignored by sshd). Geometry (-b -c -p) is operator-tunable
|
||||||
needs a real '\\n' to yield a line, so we pipe the output through
|
via Settings → Burn-in; defaults match the Spearfoot disk-burnin.sh
|
||||||
`tr '\\b' '\\n'` at the shell level. After this, every progress
|
recommendation for large HDDs.
|
||||||
update is a normal newline-terminated line.
|
|
||||||
|
|
||||||
Inner shell does `echo PID:$$; exec badblocks ...` so $$ is the
|
|
||||||
badblocks PID after exec (needed for out-of-band kill -9; asyncssh's
|
|
||||||
signal channel is ignored by sshd). 2>&1 merges stderr into stdout
|
|
||||||
so tr sees the progress lines (badblocks emits them on stderr).
|
|
||||||
|
|
||||||
Geometry (-b -c -p) is operator-tunable via Settings → Burn-in;
|
|
||||||
defaults match the Spearfoot disk-burnin.sh recommendation.
|
|
||||||
"""
|
"""
|
||||||
inner = (
|
return (
|
||||||
f"echo PID:$$; exec badblocks "
|
f"sh -c 'echo PID:$$; exec badblocks "
|
||||||
f"-wsv "
|
f"-wsv "
|
||||||
f"-b {settings.surface_validate_block_size} "
|
f"-b {settings.surface_validate_block_size} "
|
||||||
f"-c {settings.surface_validate_block_buffer} "
|
f"-c {settings.surface_validate_block_buffer} "
|
||||||
f"-p {settings.surface_validate_passes} "
|
f"-p {settings.surface_validate_passes} "
|
||||||
f"/dev/{devname} 2>&1"
|
f"/dev/{devname}'"
|
||||||
)
|
)
|
||||||
# The outer pipeline lets tr translate \\b → \\n. stdbuf -oL forces
|
|
||||||
# tr's stdout to line-buffered mode; without it tr's stdout is
|
|
||||||
# block-buffered (4 KB chunks) when its destination is a pipe,
|
|
||||||
# which delays each progress line by ~6 minutes at our throughput.
|
|
||||||
return f"sh -c '{inner}' | stdbuf -oL tr '\\b' '\\n'"
|
|
||||||
|
|
||||||
from . import kill
|
from . import kill
|
||||||
from ._common import (
|
from ._common import (
|
||||||
|
|
@@ -136,16 +49,12 @@ from ._common import (
|
||||||
_append_stage_log,
|
_append_stage_log,
|
||||||
_db,
|
_db,
|
||||||
_is_cancelled,
|
_is_cancelled,
|
||||||
_now,
|
|
||||||
_push_update,
|
_push_update,
|
||||||
_recalculate_progress,
|
_recalculate_progress,
|
||||||
_record_bb_phase_start,
|
|
||||||
_set_stage_error,
|
_set_stage_error,
|
||||||
_store_smart_attrs,
|
_store_smart_attrs,
|
||||||
_store_smart_raw_output,
|
_store_smart_raw_output,
|
||||||
_update_stage_bad_blocks,
|
_update_stage_bad_blocks,
|
||||||
_update_stage_bb_mbps,
|
|
||||||
_update_stage_bb_phase,
|
|
||||||
_update_stage_percent,
|
_update_stage_percent,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@@ -490,17 +399,6 @@ async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int)
|
||||||
"""Run badblocks over SSH, streaming output to stage log."""
|
"""Run badblocks over SSH, streaming output to stage log."""
|
||||||
from app import ssh_client
|
from app import ssh_client
|
||||||
|
|
||||||
# Pull drive size for the throughput calculation. Each badblocks
|
|
||||||
# phase covers the full disk once, so 1% overall progress = size/800
|
|
||||||
# bytes (8 phases × 100). NULL-safe: if size lookup fails we just
|
|
||||||
# skip the MB/s update.
|
|
||||||
drive_size_bytes: int | None = None
|
|
||||||
async with _db() as db:
|
|
||||||
cur = await db.execute("SELECT size_bytes FROM drives WHERE id=?", (drive_id,))
|
|
||||||
row = await cur.fetchone()
|
|
||||||
if row and row[0]:
|
|
||||||
drive_size_bytes = int(row[0])
|
|
||||||
|
|
||||||
await _append_stage_log(
|
await _append_stage_log(
|
||||||
job_id, "surface_validate",
|
job_id, "surface_validate",
|
||||||
f"[START] badblocks -wsv -b {settings.surface_validate_block_size} "
|
f"[START] badblocks -wsv -b {settings.surface_validate_block_size} "
|
||||||
|
|
@@ -527,47 +425,17 @@ async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int)
|
||||||
#
|
#
|
||||||
cmd = _build_badblocks_cmd(devname)
|
cmd = _build_badblocks_cmd(devname)
|
||||||
async with conn.create_process(cmd) as proc:
|
async with conn.create_process(cmd) as proc:
|
||||||
|
import re as _re
|
||||||
|
|
||||||
pid_seen = False
|
pid_seen = False
|
||||||
progress = _BadblocksProgress()
|
|
||||||
|
|
||||||
# Throughput tracker — store (overall_pct, monotonic_ts)
|
|
||||||
# of the previous progress sample so we can compute MB/s
|
|
||||||
# from the delta on each new sample.
|
|
||||||
last_pct_sample: float = progress.overall_pct
|
|
||||||
last_db_write_ts: float = time.monotonic()
|
|
||||||
# Lines accumulated since last log flush. Flushed in the
|
|
||||||
# throttled DB-write window (see BB_DB_MIN_SECONDS).
|
|
||||||
pending_log_chunks: list[str] = []
|
|
||||||
|
|
||||||
# Seed bb_phase=1, bb_phase_pct=0 immediately so the
|
|
||||||
# drawer's per-pattern meters have something to render
|
|
||||||
# before badblocks emits its first "X% done" line. On a
|
|
||||||
# 14 TB drive that first line can be several minutes in,
|
|
||||||
# and a blank meter strip looks broken to the operator.
|
|
||||||
await _update_stage_bb_phase(
|
|
||||||
job_id, "surface_validate",
|
|
||||||
progress.phase, progress.phase_pct,
|
|
||||||
)
|
|
||||||
# Stamp phase 1 (write 0xaa) start so the drawer's
|
|
||||||
# duration history starts populating immediately.
|
|
||||||
await _record_bb_phase_start(
|
|
||||||
job_id, "surface_validate", progress.phase, _now(),
|
|
||||||
)
|
|
||||||
_push_update()
|
|
||||||
|
|
||||||
async def _drain(stream, is_stderr: bool):
|
async def _drain(stream, is_stderr: bool):
|
||||||
nonlocal bad_blocks_total, pid_seen, last_db_write_ts, last_pct_sample
|
nonlocal bad_blocks_total, pid_seen
|
||||||
# Line-based drain. The wrapped badblocks command
|
|
||||||
# pipes through `tr '\b' '\n'` at the shell level
|
|
||||||
# so every progress update is a real newline-
|
|
||||||
# terminated line by the time it reaches us.
|
|
||||||
async for raw in stream:
|
async for raw in stream:
|
||||||
line = raw if isinstance(raw, str) else raw.decode("utf-8", errors="replace")
|
line = raw if isinstance(raw, str) else raw.decode("utf-8", errors="replace")
|
||||||
if not line.strip():
|
|
||||||
continue
|
|
||||||
|
|
||||||
# First stdout line is "PID:<n>" from the
|
# First stdout line is "PID:<n>" from the wrapping shell.
|
||||||
# wrapping shell. Capture and skip.
|
# Capture it and don't append it to the user-visible log.
|
||||||
if not is_stderr and not pid_seen and line.startswith("PID:"):
|
if not is_stderr and not pid_seen and line.startswith("PID:"):
|
||||||
pid_seen = True
|
pid_seen = True
|
||||||
try:
|
try:
|
||||||
|
|
@@ -580,86 +448,27 @@ async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int)
|
||||||
pass
|
pass
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# Note: with the `tr` pipe, badblocks's stderr is
|
|
||||||
# merged into stdout (`2>&1`). is_stderr is now
|
|
||||||
# always False — we treat every non-PID line as
|
|
||||||
# potentially containing progress or bad-block
|
|
||||||
# output. The phase parser is idempotent on
|
|
||||||
# unrelated lines.
|
|
||||||
prev_phase = progress.phase
|
|
||||||
progress.update(line)
|
|
||||||
phase_changed = progress.phase != prev_phase
|
|
||||||
is_progress_line = bool(_BB_PERCENT_RE.search(line))
|
|
||||||
# Bare-number lines from badblocks are bad-block
|
|
||||||
# block numbers (one per line on stdout).
|
|
||||||
stripped = line.strip()
|
|
||||||
if stripped and stripped.isdigit() and not is_progress_line:
|
|
||||||
bad_blocks_total += 1
|
|
||||||
|
|
||||||
# Keep "XX% done" lines OUT of output_lines. Big
|
|
||||||
# volume + quadratic log_text concat.
|
|
||||||
if not is_progress_line:
|
|
||||||
output_lines.append(line)
|
output_lines.append(line)
|
||||||
|
|
||||||
# Single throttle gate covering EVERY DB touch.
|
if is_stderr:
|
||||||
# Cumulative DB load otherwise overwhelms the
|
m = _re.search(r"([\d.]+)%\s+done", line)
|
||||||
# asyncio loop → asyncssh drain falls behind →
|
if m:
|
||||||
# SSH window stops advancing → pipe fills →
|
pct = min(99, int(float(m.group(1))))
|
||||||
# badblocks blocks on pipe_write → disk I/O stops.
|
await _update_stage_percent(job_id, "surface_validate", pct)
|
||||||
now_ts = time.monotonic()
|
await _update_stage_bad_blocks(job_id, "surface_validate", bad_blocks_total)
|
||||||
time_since_last_db = now_ts - last_db_write_ts
|
|
||||||
should_write = phase_changed or time_since_last_db >= BB_DB_MIN_SECONDS
|
|
||||||
|
|
||||||
if should_write:
|
|
||||||
if await _is_cancelled(job_id):
|
|
||||||
await kill.kill_remote_process(job_id)
|
|
||||||
return
|
|
||||||
|
|
||||||
if phase_changed:
|
|
||||||
await _record_bb_phase_start(
|
|
||||||
job_id, "surface_validate",
|
|
||||||
progress.phase, _now(),
|
|
||||||
)
|
|
||||||
await _update_stage_percent(
|
|
||||||
job_id, "surface_validate", progress.overall_pct,
|
|
||||||
)
|
|
||||||
await _update_stage_bb_phase(
|
|
||||||
job_id, "surface_validate",
|
|
||||||
progress.phase, progress.phase_pct,
|
|
||||||
)
|
|
||||||
await _update_stage_bad_blocks(
|
|
||||||
job_id, "surface_validate", bad_blocks_total,
|
|
||||||
)
|
|
||||||
|
|
||||||
if (
|
|
||||||
drive_size_bytes
|
|
||||||
and not phase_changed
|
|
||||||
and progress.overall_pct > last_pct_sample
|
|
||||||
and time_since_last_db >= 1.0
|
|
||||||
):
|
|
||||||
d_pct = progress.overall_pct - last_pct_sample
|
|
||||||
bytes_done = (d_pct / 800.0) * drive_size_bytes
|
|
||||||
mbps = bytes_done / time_since_last_db / 1_000_000
|
|
||||||
await _update_stage_bb_mbps(
|
|
||||||
job_id, "surface_validate", mbps,
|
|
||||||
)
|
|
||||||
|
|
||||||
if pending_log_chunks:
|
|
||||||
chunk = "".join(pending_log_chunks)
|
|
||||||
pending_log_chunks.clear()
|
|
||||||
await _append_stage_log(
|
|
||||||
job_id, "surface_validate", chunk,
|
|
||||||
)
|
|
||||||
|
|
||||||
last_pct_sample = progress.overall_pct
|
|
||||||
last_db_write_ts = now_ts
|
|
||||||
await _recalculate_progress(job_id)
|
await _recalculate_progress(job_id)
|
||||||
_push_update()
|
_push_update()
|
||||||
|
else:
|
||||||
|
stripped = line.strip()
|
||||||
|
if stripped and stripped.isdigit():
|
||||||
|
bad_blocks_total += 1
|
||||||
|
|
||||||
if not is_progress_line:
|
# Append to DB log in chunks
|
||||||
pending_log_chunks.append(line)
|
if len(output_lines) % 20 == 0:
|
||||||
|
chunk = "".join(output_lines[-20:])
|
||||||
|
await _append_stage_log(job_id, "surface_validate", chunk)
|
||||||
|
|
||||||
# Abort on bad block threshold — immediate.
|
# Abort on bad block threshold
|
||||||
if bad_blocks_total > settings.bad_block_threshold:
|
if bad_blocks_total > settings.bad_block_threshold:
|
||||||
await kill.kill_remote_process(job_id)
|
await kill.kill_remote_process(job_id)
|
||||||
output_lines.append(
|
output_lines.append(
|
||||||
|
|
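The "main" side of the hunk above funnels every database write through one time-based gate: a write happens only when the badblocks phase changes or at least `BB_DB_MIN_SECONDS` have elapsed since the last write, and log lines are buffered in `pending_log_chunks` in between. A minimal standalone sketch of that gating pattern follows. The `WriteGate` class, its method names, and the injectable clock are hypothetical, and the diff does not show the real value of `BB_DB_MIN_SECONDS`.

```python
import time
from typing import Callable, List, Optional


class WriteGate:
    """Illustrative time-based gate that batches log lines between flushes."""

    def __init__(self, min_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_seconds = min_seconds
        self.clock = clock                  # injectable so the gate can be tested
        self.last_write_ts = clock()
        self.pending: List[str] = []

    def offer(self, line: str, force: bool = False) -> Optional[str]:
        """Buffer one line; return the joined batch when a write is due, else None."""
        self.pending.append(line)
        now = self.clock()
        if not force and now - self.last_write_ts < self.min_seconds:
            return None                     # too soon: keep buffering, touch nothing
        chunk = "".join(self.pending)
        self.pending.clear()
        self.last_write_ts = now
        return chunk
```

Passing `force=True` plays the role of `phase_changed` in the diff: it flushes immediately regardless of how recently the last write happened.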
@ -668,9 +477,15 @@ async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int)
|
||||||
)
|
)
|
||||||
return
|
return
|
||||||
|
|
||||||
# Single stream now — the `2>&1` in _build_badblocks_cmd
|
if await _is_cancelled(job_id):
|
||||||
# merges stderr into stdout before the `tr` pipe.
|
await kill.kill_remote_process(job_id)
|
||||||
await _drain(proc.stdout, False)
|
return
|
||||||
|
|
||||||
|
await asyncio.gather(
|
||||||
|
_drain(proc.stdout, False),
|
||||||
|
_drain(proc.stderr, True),
|
||||||
|
return_exceptions=True,
|
||||||
|
)
|
||||||
# Bound proc.wait so a remote process that ignored our kill
|
# Bound proc.wait so a remote process that ignored our kill
|
||||||
# signal (or that we never managed to kill) can't pin this
|
# signal (or that we never managed to kill) can't pin this
|
||||||
# task in the semaphore forever. Closing the connection on
|
# task in the semaphore forever. Closing the connection on
|
||||||
|
|
@ -695,21 +510,7 @@ async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int)
|
||||||
result["aborted"] = bad_blocks_total > settings.bad_block_threshold
|
result["aborted"] = bad_blocks_total > settings.bad_block_threshold
|
||||||
|
|
||||||
except asyncio.CancelledError:
|
except asyncio.CancelledError:
|
||||||
# Best-effort kill of the remote badblocks process before
|
return False
|
||||||
# propagating the cancel. asyncio.shield() so the kill attempt
|
|
||||||
# itself isn't interrupted by ongoing loop shutdown. Then
|
|
||||||
# re-raise so _run_job marks the job 'unknown' (honest about
|
|
||||||
# the indeterminate outcome) instead of 'failed' (which
|
|
||||||
# implies the burn-in itself failed, which we don't know).
|
|
||||||
try:
|
|
||||||
await asyncio.shield(kill.kill_remote_process(job_id))
|
|
||||||
except Exception:
|
|
||||||
pass
|
|
||||||
await _append_stage_log(
|
|
||||||
job_id, "surface_validate",
|
|
||||||
"\n[ABORTED] task cancelled (likely container restart or shutdown)\n",
|
|
||||||
)
|
|
||||||
raise
|
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
await _append_stage_log(job_id, "surface_validate", f"\n[SSH error] {exc}\n")
|
await _append_stage_log(job_id, "surface_validate", f"\n[SSH error] {exc}\n")
|
||||||
await _set_stage_error(job_id, "surface_validate", f"SSH badblocks error: {exc}")
|
await _set_stage_error(job_id, "surface_validate", f"SSH badblocks error: {exc}")
|
||||||
|
|
|
||||||
|
|
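The `except asyncio.CancelledError` branch on the "main" side of the hunk above does three things in order: it shields a best-effort remote kill, appends an `[ABORTED]` log line, and re-raises so the caller can mark the job `unknown`. The sketch below shows that shape in isolation. `kill_remote`, `run_stage`, and the `events` list are hypothetical stand-ins, not the project's functions.

```python
import asyncio

events = []  # records what happened, for illustration only


async def kill_remote() -> None:
    """Hypothetical stand-in for kill.kill_remote_process(job_id)."""
    await asyncio.sleep(0)
    events.append("killed")


async def run_stage() -> None:
    try:
        await asyncio.sleep(3600)  # stand-in for the long-running drain
    except asyncio.CancelledError:
        try:
            # shield() lets the kill coroutine keep running even if this
            # task is cancelled a second time while waiting on it.
            await asyncio.shield(kill_remote())
        except Exception:
            pass  # best-effort: a failed kill must not mask the cancel
        events.append("logged")
        raise  # propagate so the caller sees the cancellation


async def main() -> bool:
    task = asyncio.create_task(run_stage())
    await asyncio.sleep(0)  # give the task a chance to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False


cancelled = asyncio.run(main())
```

Because `CancelledError` derives from `BaseException`, the inner `except Exception` does not swallow a second cancellation; only ordinary kill failures are ignored.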
@ -49,10 +49,7 @@ class Settings(BaseSettings):
|
||||||
webhook_url: str = ""
|
webhook_url: str = ""
|
||||||
|
|
||||||
# Stuck-job detection: jobs running longer than this are marked 'unknown'
|
# Stuck-job detection: jobs running longer than this are marked 'unknown'
|
||||||
# and the remote badblocks/smartctl is killed. 168h (7 days) covers a
|
stuck_job_hours: int = 24
|
||||||
# full -w surface_validate on a 14 TB+ HDD with margin. Older default
|
|
||||||
# was 24h which false-positived on multi-TB drives almost every time.
|
|
||||||
stuck_job_hours: int = 168
|
|
||||||
|
|
||||||
# Temperature thresholds (°C) — drives table colouring + precheck gate
|
# Temperature thresholds (°C) — drives table colouring + precheck gate
|
||||||
temp_warn_c: int = 46 # orange warning
|
temp_warn_c: int = 46 # orange warning
|
||||||
|
|
@ -86,7 +83,7 @@ class Settings(BaseSettings):
|
||||||
ssh_key: str = "" # PEM private key content (paste full key including headers)
|
ssh_key: str = "" # PEM private key content (paste full key including headers)
|
||||||
|
|
||||||
# Application version — used by the /api/v1/updates/check endpoint
|
# Application version — used by the /api/v1/updates/check endpoint
|
||||||
app_version: str = "1.0.0-60"
|
app_version: str = "1.0.0-41"
|
||||||
|
|
||||||
# ---- Authentication (1.0.0-22) ----
|
# ---- Authentication (1.0.0-22) ----
|
||||||
# session_secret: HMAC key for signing session cookies. Empty = generate
|
# session_secret: HMAC key for signing session cookies. Empty = generate
|
||||||
|
|
|
||||||
|
|
@ -93,24 +93,6 @@ _MIGRATIONS = [
|
||||||
"ALTER TABLE drives ADD COLUMN pool_name TEXT",
|
"ALTER TABLE drives ADD COLUMN pool_name TEXT",
|
||||||
"ALTER TABLE drives ADD COLUMN pool_role TEXT",
|
"ALTER TABLE drives ADD COLUMN pool_role TEXT",
|
||||||
"ALTER TABLE drives ADD COLUMN pool_seen_at TEXT",
|
"ALTER TABLE drives ADD COLUMN pool_seen_at TEXT",
|
||||||
# 1.0.0-44: per-pattern badblocks progress for the drive drawer's
|
|
||||||
# 4-meter UI. bb_phase is 1-8 (1=write 0xaa, 2=verify 0xaa, 3=write
|
|
||||||
# 0x55, 4=verify 0x55, 5=write 0xff, 6=verify 0xff, 7=write 0x00,
|
|
||||||
# 8=verify 0x00). bb_phase_pct is 0-100 within the current phase.
|
|
||||||
"ALTER TABLE burnin_stages ADD COLUMN bb_phase INTEGER",
|
|
||||||
"ALTER TABLE burnin_stages ADD COLUMN bb_phase_pct REAL",
|
|
||||||
# 1.0.0-46: live write/read throughput for the per-pattern meters.
|
|
||||||
# Computed from successive `XX% done` lines in badblocks output:
|
|
||||||
# delta_bytes = (overall_pct_delta / 800) * drive_size_bytes.
|
|
||||||
# Updated on every progress line; NULL until the second progress
|
|
||||||
# line arrives (need two samples to compute a rate).
|
|
||||||
"ALTER TABLE burnin_stages ADD COLUMN bb_mbps REAL",
|
|
||||||
# 1.0.0-47: per-pattern duration history. JSON map of
|
|
||||||
# {"1": "2026-05-09T05:39:44+00:00", "2": ..., ...} where each key
|
|
||||||
# is the phase number (1-8) and the value is when the parser first
|
|
||||||
# observed that phase. Drawer derives "0xaa: 14h 22m" by diffing
|
|
||||||
# consecutive phase-1 keys.
|
|
||||||
"ALTER TABLE burnin_stages ADD COLUMN bb_phase_history TEXT",
|
|
||||||
# 1.0.0-19: enforce one active burn-in per drive at the storage layer.
|
# 1.0.0-19: enforce one active burn-in per drive at the storage layer.
|
||||||
# Closes the read-then-insert race in burnin.start_job — without this,
|
# Closes the read-then-insert race in burnin.start_job — without this,
|
||||||
# two concurrent /api/v1/burnin/start requests for the same drive could
|
# two concurrent /api/v1/burnin/start requests for the same drive could
|
||||||
|
|
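The migration comments above describe three derived values: the meaning of `bb_phase` (1-8), a throughput estimate from successive progress lines, and per-phase durations from the `bb_phase_history` JSON map. A rough sketch of each follows, assuming the formats stated in those comments. The function names are hypothetical. The 800 divisor is copied from the comment; whether it matches the scale of `overall_pct` is not visible in this diff, so it is exposed as a parameter rather than asserted.

```python
import json
from datetime import datetime
from typing import Dict

PATTERNS = ["0xaa", "0x55", "0xff", "0x00"]


def phase_label(phase: int) -> str:
    """Map bb_phase 1-8 to a label: odd phases write, even phases verify."""
    if not 1 <= phase <= 8:
        raise ValueError(f"bb_phase out of range: {phase}")
    action = "write" if phase % 2 == 1 else "verify"
    return f"{action} {PATTERNS[(phase - 1) // 2]}"


def throughput_mbps(pct_delta: float, drive_size_bytes: int,
                    elapsed_seconds: float, pct_scale: float = 800.0) -> float:
    """delta_bytes = (pct_delta / pct_scale) * drive_size_bytes, as MB/s."""
    delta_bytes = (pct_delta / pct_scale) * drive_size_bytes
    return delta_bytes / elapsed_seconds / 1_000_000


def phase_durations(history_json: str) -> Dict[int, float]:
    """Seconds spent in each phase whose successor has a recorded start."""
    starts = {int(k): datetime.fromisoformat(v)
              for k, v in json.loads(history_json).items()}
    return {p: (starts[p + 1] - starts[p]).total_seconds()
            for p in sorted(starts) if p + 1 in starts}
```

The phase in progress has no successor timestamp yet, so `phase_durations` omits it; a caller would diff against the current time to show a running total.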
@ -176,7 +158,6 @@ async def init_db() -> None:
|
||||||
Path(settings.db_path).parent.mkdir(parents=True, exist_ok=True)
|
Path(settings.db_path).parent.mkdir(parents=True, exist_ok=True)
|
||||||
async with aiosqlite.connect(settings.db_path) as db:
|
async with aiosqlite.connect(settings.db_path) as db:
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
await db.execute("PRAGMA foreign_keys=ON")
|
await db.execute("PRAGMA foreign_keys=ON")
|
||||||
await db.executescript(SCHEMA)
|
await db.executescript(SCHEMA)
|
||||||
await _run_migrations(db)
|
await _run_migrations(db)
|
||||||
|
|
@ -188,7 +169,6 @@ async def get_db():
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
try:
|
try:
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
await db.execute("PRAGMA foreign_keys=ON")
|
await db.execute("PRAGMA foreign_keys=ON")
|
||||||
yield db
|
yield db
|
||||||
finally:
|
finally:
|
||||||
|
|
|
||||||
|
|
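The two hunks above differ only in a `PRAGMA busy_timeout=60000` line, and the same pragma sequence is repeated wherever a connection is opened. One way to keep those call sites in step is a single helper that applies all of them. The sketch below uses the standard-library `sqlite3` module instead of `aiosqlite` so it stays self-contained; `open_db` and `PRAGMAS` are hypothetical names, not part of the project.

```python
import sqlite3

# The three pragmas that appear together on the "main" side of the diff.
PRAGMAS = (
    "PRAGMA journal_mode=WAL",
    "PRAGMA busy_timeout=60000",   # wait up to 60 s on a locked database
    "PRAGMA foreign_keys=ON",
)


def open_db(path: str) -> sqlite3.Connection:
    """Open a connection and apply every pragma in one place."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    for pragma in PRAGMAS:
        conn.execute(pragma)
    return conn
```

`busy_timeout` and `foreign_keys` are per-connection settings, which is why each newly opened connection has to set them again.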
@ -334,7 +334,6 @@ async def _fetch_report_data() -> list[dict]:
|
||||||
async with aiosqlite.connect(settings.db_path) as db:
|
async with aiosqlite.connect(settings.db_path) as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
return await _fetch_drives_for_template(db)
|
return await _fetch_drives_for_template(db)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -348,7 +347,6 @@ async def _fetch_unlock_events_24h() -> list[dict]:
|
||||||
async with aiosqlite.connect(settings.db_path) as db:
|
async with aiosqlite.connect(settings.db_path) as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
# julianday() handles the 'YYYY-MM-DDTHH:MM:SS.fff+00:00' format
|
# julianday() handles the 'YYYY-MM-DDTHH:MM:SS.fff+00:00' format
|
||||||
# we write from Python; comparing the raw string against
|
# we write from Python; comparing the raw string against
|
||||||
# datetime('now','-1 day') (which formats as 'YYYY-MM-DD HH:MM:SS')
|
# datetime('now','-1 day') (which formats as 'YYYY-MM-DD HH:MM:SS')
|
||||||
|
|
|
||||||
15
app/main.py
|
|
@ -189,21 +189,6 @@ class _AuthGateMiddleware(BaseHTTPMiddleware):
|
||||||
await auth.get_user_by_id(int(user_id)) if user_id else None
|
await auth.get_user_by_id(int(user_id)) if user_id else None
|
||||||
)
|
)
|
||||||
|
|
||||||
# Loopback bypass (1.0.0-56): requests from 127.0.0.1 / ::1
|
|
||||||
# inside the container skip the auth gate. The only way to hit
|
|
||||||
# that source IP is a process in the container's network
|
|
||||||
# namespace — `docker exec` from the host. External traffic
|
|
||||||
# comes through the docker bridge with a non-loopback source,
|
|
||||||
# so it still goes through full auth. We read request.client.host
|
|
||||||
# directly (raw TCP socket), NOT X-Forwarded-For, so external
|
|
||||||
# attackers can't spoof loopback via headers. This unlocks the
|
|
||||||
# autonomous monitor's ability to POST /api/v1/burnin/start
|
|
||||||
# without provisioning a session cookie.
|
|
||||||
if request.client and request.client.host in ("127.0.0.1", "::1"):
|
|
||||||
if request.state.current_user is None:
|
|
||||||
request.state.current_user = auth.LoopbackUser()
|
|
||||||
return await call_next(request)
|
|
||||||
|
|
||||||
if path in _PUBLIC_PATHS or path.startswith(_PUBLIC_PREFIXES):
|
if path in _PUBLIC_PATHS or path.startswith(_PUBLIC_PREFIXES):
|
||||||
return await call_next(request)
|
return await call_next(request)
|
||||||
if request.state.current_user is not None:
|
if request.state.current_user is not None:
|
||||||
|
|
|
||||||
|
|
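The loopback bypass in the hunk above compares `request.client.host` against the two literals `"127.0.0.1"` and `"::1"`. A slightly broader variant, sketched here as an assumption rather than as the project's code, parses the address with the standard `ipaddress` module so that any address in `127.0.0.0/8` also counts. `is_loopback` is a hypothetical helper name.

```python
import ipaddress
from typing import Optional


def is_loopback(host: Optional[str]) -> bool:
    """Return True only when host parses as a loopback IP address."""
    if not host:
        return False
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames such as "localhost" are not IP literals; treat as external.
        return False
```

As in the original comment, the input should be the raw socket peer address, never a value taken from a header such as `X-Forwarded-For`.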
@ -437,7 +437,6 @@ async def poll_cycle(client: TrueNASClient) -> int:
|
||||||
async with aiosqlite.connect(settings.db_path) as db:
|
async with aiosqlite.connect(settings.db_path) as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
await db.execute("PRAGMA foreign_keys=ON")
|
await db.execute("PRAGMA foreign_keys=ON")
|
||||||
|
|
||||||
for disk in disks:
|
for disk in disks:
|
||||||
|
|
@ -493,7 +492,6 @@ async def run(client: TrueNASClient) -> None:
|
||||||
async with aiosqlite.connect(settings.db_path) as _tdb:
|
async with aiosqlite.connect(settings.db_path) as _tdb:
|
||||||
_tdb.row_factory = aiosqlite.Row
|
_tdb.row_factory = aiosqlite.Row
|
||||||
await _tdb.execute("PRAGMA journal_mode=WAL")
|
await _tdb.execute("PRAGMA journal_mode=WAL")
|
||||||
await _tdb.execute("PRAGMA busy_timeout=60000")
|
|
||||||
_cur = await _tdb.execute("""
|
_cur = await _tdb.execute("""
|
||||||
SELECT MAX(d.temperature_c)
|
SELECT MAX(d.temperature_c)
|
||||||
FROM drives d
|
FROM drives d
|
||||||
|
|
|
||||||
|
|
@ -128,7 +128,6 @@ async def sse_drives(request: Request):
|
||||||
async with aiosqlite.connect(settings.db_path) as db:
|
async with aiosqlite.connect(settings.db_path) as db:
|
||||||
db.row_factory = aiosqlite.Row
|
db.row_factory = aiosqlite.Row
|
||||||
await db.execute("PRAGMA journal_mode=WAL")
|
await db.execute("PRAGMA journal_mode=WAL")
|
||||||
await db.execute("PRAGMA busy_timeout=60000")
|
|
||||||
drives = await _fetch_drives_for_template(db)
|
drives = await _fetch_drives_for_template(db)
|
||||||
|
|
||||||
html = templates.env.get_template(
|
html = templates.env.get_template(
|
||||||
|
|
|
||||||
|
|
@ -147,12 +147,11 @@ async def _fetch_drives_for_template(db: aiosqlite.Connection) -> list[dict]:
|
||||||
|
|
||||||
# For burn-ins that include SMART stages, fetch those stages so we can
|
# For burn-ins that include SMART stages, fetch those stages so we can
|
||||||
# mirror their progress/result in the Short/Long SMART columns.
|
# mirror their progress/result in the Short/Long SMART columns.
|
||||||
# We include burn-ins in ANY state — including failed/passed/cancelled —
|
|
||||||
# so the SMART columns don't go blank when the burn-in finishes. Without
|
|
||||||
# this, "FAILED (LONG SMART)" appears in the Burn-In column while the
|
|
||||||
# Long SMART column shows "—", which contradicts itself.
|
|
||||||
bi_smart_stages: dict[int, dict[str, dict]] = {} # job_id -> {stage_name: row}
|
bi_smart_stages: dict[int, dict[str, dict]] = {} # job_id -> {stage_name: row}
|
||||||
bi_ids_with_smart = [bi["id"] for bi in burnin_by_drive.values()]
|
bi_ids_with_smart = [
|
||||||
|
bi["id"] for bi in burnin_by_drive.values()
|
||||||
|
if bi["state"] in ("running", "queued")
|
||||||
|
]
|
||||||
if bi_ids_with_smart:
|
if bi_ids_with_smart:
|
||||||
placeholders = ",".join("?" * len(bi_ids_with_smart))
|
placeholders = ",".join("?" * len(bi_ids_with_smart))
|
||||||
# placeholders is purely structural ("?,?,?"); IDs themselves are
|
# placeholders is purely structural ("?,?,?"); IDs themselves are
|
||||||
|
|
@ -164,7 +163,7 @@ async def _fetch_drives_for_template(db: aiosqlite.Connection) -> list[dict]:
|
||||||
"FROM burnin_stages bs "
|
"FROM burnin_stages bs "
|
||||||
"WHERE bs.burnin_job_id IN (" + placeholders + ") "
|
"WHERE bs.burnin_job_id IN (" + placeholders + ") "
|
||||||
" AND bs.stage_name IN ('short_smart', 'long_smart') "
|
" AND bs.stage_name IN ('short_smart', 'long_smart') "
|
||||||
" AND bs.state IN ('running', 'passed', 'failed', 'aborted')"
|
" AND bs.state IN ('running', 'passed', 'failed')"
|
||||||
)
|
)
|
||||||
cur = await db.execute(sql, bi_ids_with_smart)
|
cur = await db.execute(sql, bi_ids_with_smart)
|
||||||
for r in await cur.fetchall():
|
for r in await cur.fetchall():
|
||||||
|
|
@ -186,26 +185,14 @@ async def _fetch_drives_for_template(db: aiosqlite.Connection) -> list[dict]:
|
||||||
if existing.get("state") not in (None, "idle"):
|
if existing.get("state") not in (None, "idle"):
|
||||||
continue
|
continue
|
||||||
pct = stage["percent"] or 0
|
pct = stage["percent"] or 0
|
||||||
stage_state = stage["state"]
|
|
||||||
# If the parent burn-in ended in failure but this SMART
|
|
||||||
# stage is still recorded as "running", that's an
|
|
||||||
# orphaned stage row from a hard crash (e.g. the old
|
|
||||||
# `database is locked` failure mode). Surface as failed
|
|
||||||
# so the column matches the Burn-In column.
|
|
||||||
if stage_state == "running" and bi.get("state") in (
|
|
||||||
"failed", "cancelled", "unknown"
|
|
||||||
):
|
|
||||||
stage_state = bi["state"] if bi["state"] != "unknown" else "failed"
|
|
||||||
d[target] = {
|
d[target] = {
|
||||||
"state": stage_state,
|
"state": stage["state"],
|
||||||
"percent": pct if stage_state == "running" else (100 if stage_state == "passed" else 0),
|
"percent": pct if stage["state"] == "running" else (100 if stage["state"] == "passed" else 0),
|
||||||
"eta_seconds": _compute_eta_seconds(stage["started_at"], pct) if stage_state == "running" else None,
|
"eta_seconds": _compute_eta_seconds(stage["started_at"], pct) if stage["state"] == "running" else None,
|
||||||
"eta_timestamp": None,
|
"eta_timestamp": None,
|
||||||
"started_at": stage["started_at"],
|
"started_at": stage["started_at"],
|
||||||
"finished_at": stage["finished_at"],
|
"finished_at": stage["finished_at"],
|
||||||
"error_text": stage["error_text"] or (
|
"error_text": stage["error_text"],
|
||||||
bi.get("error_text") if stage_state == "failed" else None
|
|
||||||
),
|
|
||||||
}
|
}
|
||||||
|
|
||||||
drives.append(d)
|
drives.append(d)
|
||||||
|
|
|
||||||
|
|
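The "main" side of the hunk above reconciles an orphaned SMART stage with its parent burn-in: a stage still recorded as `running` under a job that ended `failed`, `cancelled`, or `unknown` is displayed with the job's state, with `unknown` shown as `failed`. The same rule as two small pure functions, with hypothetical names, might look like this:

```python
from typing import Optional

TERMINAL_JOB_STATES = ("failed", "cancelled", "unknown")


def effective_stage_state(stage_state: str, job_state: Optional[str]) -> str:
    """Resolve the state to display for a SMART stage row."""
    if stage_state == "running" and job_state in TERMINAL_JOB_STATES:
        # Orphaned row: the job ended but the stage was never updated.
        return job_state if job_state != "unknown" else "failed"
    return stage_state


def display_percent(stage_state: str, pct: int) -> int:
    """Live percent while running, 100 when passed, otherwise 0."""
    if stage_state == "running":
        return pct
    return 100 if stage_state == "passed" else 0
```

Keeping the rule in pure functions makes the mapping easy to unit-test without a database.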
@ -57,26 +57,11 @@ async def drive_drawer(drive_id: int, db: aiosqlite.Connection = Depends(get_db)
|
||||||
job = dict(job_row)
|
job = dict(job_row)
|
||||||
cur = await db.execute(
|
cur = await db.execute(
|
||||||
"SELECT id, stage_name, state, percent, started_at, finished_at, "
|
"SELECT id, stage_name, state, percent, started_at, finished_at, "
|
||||||
"duration_seconds, error_text, log_text, bad_blocks, "
|
"duration_seconds, error_text, log_text, bad_blocks "
|
||||||
"bb_phase, bb_phase_pct, bb_mbps, bb_phase_history "
|
|
||||||
"FROM burnin_stages WHERE burnin_job_id=? ORDER BY id",
|
"FROM burnin_stages WHERE burnin_job_id=? ORDER BY id",
|
||||||
(job_row["id"],),
|
(job_row["id"],),
|
||||||
)
|
)
|
||||||
stages = [dict(r) for r in await cur.fetchall()]
|
job["stages"] = [dict(r) for r in await cur.fetchall()]
|
||||||
# Backfill empty stage.error_text from the parent job's error_text
|
|
||||||
# for any stage that ended in a terminal state without recording
|
|
||||||
# an error of its own. This catches the orphan pattern from hard
|
|
||||||
# crashes (DB-locked, SSH disconnect, container restart) where
|
|
||||||
# the failure didn't get to write a per-stage explanation.
|
|
||||||
job_err = job.get("error_text")
|
|
||||||
for s in stages:
|
|
||||||
if (
|
|
||||||
s.get("state") in ("failed", "cancelled", "unknown")
|
|
||||||
and not s.get("error_text")
|
|
||||||
and job_err
|
|
||||||
):
|
|
||||||
s["error_text"] = job_err
|
|
||||||
job["stages"] = stages
|
|
||||||
burnin_job = job
|
burnin_job = job
|
||||||
|
|
||||||
# SMART raw output from smart_tests table
|
# SMART raw output from smart_tests table
|
||||||
|
|
@ -121,7 +106,6 @@ async def drive_drawer(drive_id: int, db: aiosqlite.Connection = Depends(get_db)
|
||||||
"serial": drive.serial,
|
"serial": drive.serial,
|
||||||
"model": drive.model,
|
"model": drive.model,
|
||||||
"size_bytes": drive.size_bytes,
|
"size_bytes": drive.size_bytes,
|
||||||
"temperature_c": drive.temperature_c,
|
|
||||||
},
|
},
|
||||||
"burnin": burnin_job,
|
"burnin": burnin_job,
|
||||||
"smart": {
|
"smart": {
|
||||||
|
|
|
||||||
|
|
@ -244,7 +244,7 @@ thead {
|
||||||
}
|
}
|
||||||
|
|
||||||
th {
|
th {
|
||||||
padding: 6px 8px;
|
padding: 9px 14px;
|
||||||
font-size: 11px;
|
font-size: 11px;
|
||||||
font-weight: 600;
|
font-weight: 600;
|
||||||
text-transform: uppercase;
|
text-transform: uppercase;
|
||||||
|
|
@ -256,10 +256,9 @@ th {
|
||||||
}
|
}
|
||||||
|
|
||||||
td {
|
td {
|
||||||
padding: 7px 8px;
|
padding: 10px 14px;
|
||||||
border-bottom: 1px solid var(--border);
|
border-bottom: 1px solid var(--border);
|
||||||
vertical-align: middle;
|
vertical-align: middle;
|
||||||
line-height: 1.3;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
tr:last-child td {
|
tr:last-child td {
|
||||||
|
|
@ -277,15 +276,17 @@ tr:hover td {
|
||||||
/* -----------------------------------------------------------------------
|
/* -----------------------------------------------------------------------
|
||||||
Column widths
|
Column widths
|
||||||
----------------------------------------------------------------------- */
|
----------------------------------------------------------------------- */
|
||||||
.col-drive { min-width: 160px; }
|
.col-drive { min-width: 180px; }
|
||||||
.col-serial { min-width: 95px; }
|
.col-serial { min-width: 110px; }
|
||||||
.col-size { min-width: 60px; text-align: right; }
|
.col-size { min-width: 70px; text-align: right; }
|
||||||
.col-temp { min-width: 60px; text-align: right; }
|
.col-temp { min-width: 75px; text-align: right; }
|
||||||
.col-health { min-width: 70px; }
|
.col-health { min-width: 85px; }
|
||||||
.col-smart { min-width: 80px; }
|
.col-smart { min-width: 95px; }
|
||||||
/* Tighter SMART columns — they hold short pills or a progress bar. */
|
/* Tighter horizontal padding on the SMART columns — they hold short
|
||||||
th.col-smart, td.col-smart { padding-left: 5px; padding-right: 5px; }
|
pills ("Passed"/"—") or a progress bar, so the default 14px gutter
|
||||||
.col-actions { min-width: 150px; }
|
wastes space on 13" laptops. */
|
||||||
|
th.col-smart, td.col-smart { padding-left: 6px; padding-right: 6px; }
|
||||||
|
.col-actions { min-width: 170px; }
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
/* -----------------------------------------------------------------------
|
||||||
Drive cell
|
Drive cell
|
||||||
|
|
@ -294,23 +295,14 @@ th.col-smart, td.col-smart { padding-left: 5px; padding-right: 5px; }
|
||||||
display: block;
|
display: block;
|
||||||
font-weight: 500;
|
font-weight: 500;
|
||||||
color: var(--text-strong);
|
color: var(--text-strong);
|
||||||
font-size: 13px;
|
font-size: 14px;
|
||||||
line-height: 1.25;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
.drive-model {
|
.drive-model {
|
||||||
display: inline;
|
display: block;
|
||||||
font-size: 10px;
|
font-size: 11px;
|
||||||
color: var(--text-muted);
|
color: var(--text-muted);
|
||||||
margin-top: 0;
|
margin-top: 1px;
|
||||||
line-height: 1.25;
|
|
||||||
}
|
|
||||||
/* Separator between model and location when both are present on the
|
|
||||||
same line. ::before on .drive-location puts a thin dot between them. */
|
|
||||||
.drive-model + .drive-location::before {
|
|
||||||
content: " · ";
|
|
||||||
color: var(--border);
|
|
||||||
margin: 0 2px;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
/* -----------------------------------------------------------------------
|
||||||
|
|
@ -433,7 +425,7 @@ th.col-smart, td.col-smart { padding-left: 5px; padding-right: 5px; }
|
||||||
/* -----------------------------------------------------------------------
|
/* -----------------------------------------------------------------------
|
||||||
Burn-in column
|
Burn-in column
|
||||||
----------------------------------------------------------------------- */
|
----------------------------------------------------------------------- */
|
||||||
.col-burnin { min-width: 130px; }
|
.col-burnin { min-width: 160px; }
|
||||||
|
|
||||||
.burnin-cell { min-width: 140px; }
|
.burnin-cell { min-width: 140px; }
|
||||||
|
|
||||||
|
|
@ -1188,9 +1180,9 @@ a.stat-card:hover {
|
||||||
Checkbox column
|
Checkbox column
|
||||||
----------------------------------------------------------------------- */
|
----------------------------------------------------------------------- */
|
||||||
.col-check {
|
.col-check {
|
||||||
width: 32px;
|
width: 36px;
|
||||||
min-width: 32px;
|
min-width: 36px;
|
||||||
padding: 7px 4px 7px 8px;
|
padding: 10px 8px 10px 14px;
|
||||||
}
|
}
|
||||||
|
|
||||||
.drive-checkbox, #select-all-cb {
|
.drive-checkbox, #select-all-cb {
|
||||||
|
|
@ -1204,15 +1196,18 @@ a.stat-card:hover {
|
||||||
Drive location inline edit
|
Drive location inline edit
|
||||||
----------------------------------------------------------------------- */
|
----------------------------------------------------------------------- */
|
||||||
.drive-location {
|
.drive-location {
|
||||||
display: inline;
|
display: block;
|
||||||
font-size: 10px;
|
font-size: 10px;
|
||||||
color: var(--text-muted);
|
color: var(--text-muted);
|
||||||
margin-top: 0;
|
margin-top: 2px;
|
||||||
cursor: pointer;
|
cursor: pointer;
|
||||||
border-radius: 3px;
|
border-radius: 3px;
|
||||||
padding: 0 3px;
|
padding: 1px 3px;
|
||||||
line-height: 1.1;
|
|
||||||
transition: background 0.1s;
|
transition: background 0.1s;
|
||||||
|
max-width: 160px;
|
||||||
|
overflow: hidden;
|
||||||
|
text-overflow: ellipsis;
|
||||||
|
white-space: nowrap;
|
||||||
}
|
}
|
||||||
.drive-location:hover { background: var(--border); color: var(--text); }
|
.drive-location:hover { background: var(--border); color: var(--text); }
|
||||||
|
|
||||||
|
|
@ -2699,276 +2694,3 @@ tr.drawer-row-active {
|
||||||
font-variant-numeric: tabular-nums;
|
font-variant-numeric: tabular-nums;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
|
||||||
Per-pattern badblocks meters in the drive drawer (1.0.0-44).
|
|
||||||
Four meters, one per pattern (0xaa / 0x55 / 0xff / 0x00). Each meter
|
|
||||||
has two halves: write (left) and verify (right), so a glance shows
|
|
||||||
both which pattern is running and which sub-phase within it.
|
|
||||||
----------------------------------------------------------------------- */
|
|
||||||
.bb-meters {
|
|
||||||
display: grid;
|
|
||||||
grid-template-columns: repeat(4, 1fr);
|
|
||||||
gap: 8px;
|
|
||||||
padding: 10px 12px;
|
|
||||||
background: var(--bg-soft, #161b22);
|
|
||||||
border-radius: 6px;
|
|
||||||
margin: 6px 0 8px 0;
|
|
||||||
}
|
|
||||||
.bb-meter {
|
|
||||||
display: flex;
|
|
||||||
flex-direction: column;
|
|
||||||
gap: 4px;
|
|
||||||
}
|
|
||||||
.bb-meter-label {
|
|
||||||
font-family: "SF Mono", "Consolas", monospace;
|
|
||||||
font-size: 10px;
|
|
||||||
color: var(--text-muted);
|
|
||||||
text-transform: uppercase;
|
|
||||||
letter-spacing: .04em;
|
|
||||||
}
|
|
||||||
.bb-meter-current .bb-meter-label {
|
|
||||||
color: var(--blue, #58a6ff);
|
|
||||||
font-weight: 600;
|
|
||||||
}
|
|
||||||
.bb-meter-done .bb-meter-label {
|
|
||||||
color: var(--green, #3fb950);
|
|
||||||
}
|
|
||||||
.bb-meter-bar {
|
|
||||||
display: flex;
|
|
||||||
height: 10px;
|
|
||||||
background: var(--bg, #0d1117);
|
|
||||||
border: 1px solid var(--border, #30363d);
|
|
||||||
border-radius: 3px;
|
|
||||||
overflow: hidden;
|
|
||||||
position: relative;
|
|
||||||
}
|
|
||||||
.bb-meter-half {
|
|
||||||
height: 100%;
|
|
||||||
transition: width .3s ease;
|
|
||||||
}
|
|
||||||
.bb-write {
|
|
||||||
background: var(--blue, #58a6ff);
|
|
||||||
flex: 0 0 auto;
|
|
||||||
max-width: 50%;
|
|
||||||
}
|
|
||||||
.bb-verify {
|
|
||||||
background: var(--green, #3fb950);
|
|
||||||
flex: 0 0 auto;
|
|
||||||
max-width: 50%;
|
|
||||||
}
|
|
||||||
.bb-meter-half-spacer {
|
|
||||||
flex: 0 0 auto;
|
|
||||||
width: 1px;
|
|
||||||
background: var(--border, #30363d);
|
|
||||||
height: 100%;
|
|
||||||
}
|
|
||||||
.bb-meter-done .bb-write,
|
|
||||||
.bb-meter-done .bb-verify {
|
|
||||||
opacity: .55;
|
|
||||||
}
|
|
||||||
.bb-meter-sub {
|
|
||||||
display: flex;
|
|
||||||
justify-content: space-between;
|
|
||||||
font-family: "SF Mono", "Consolas", monospace;
|
|
||||||
font-size: 9px;
|
|
||||||
color: var(--text-muted);
|
|
||||||
}
|
|
||||||
.bb-sub-write { color: color-mix(in srgb, var(--blue) 80%, var(--text-muted)); }
|
|
||||||
.bb-sub-verify { color: color-mix(in srgb, var(--green) 80%, var(--text-muted)); }
|
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
|
||||||
Surface-scan vital-signs row in the drawer (1.0.0-46).
|
|
||||||
Sits directly above the per-pattern meters. Temperature with
|
|
||||||
green/yellow/red colour, live MB/s, elapsed, ETA — all derived
|
|
||||||
from data already in the drawer payload.
|
|
||||||
----------------------------------------------------------------------- */
|
|
||||||
.bb-vitals {
|
|
||||||
display: flex;
|
|
||||||
gap: 14px;
|
|
||||||
flex-wrap: wrap;
|
|
||||||
padding: 8px 12px 4px 12px;
|
|
||||||
background: var(--bg-soft, #161b22);
|
|
||||||
border-radius: 6px 6px 0 0;
|
|
||||||
margin: 6px 0 0 0;
|
|
||||||
border-bottom: 1px solid var(--border, #30363d);
|
|
||||||
}
|
|
||||||
/* When vitals lead, suppress the meter strip's top radius + margin so
|
|
||||||
they read as one stacked unit. */
|
|
||||||
.bb-vitals + .bb-meters {
|
|
||||||
border-radius: 0 0 6px 6px;
|
|
||||||
margin-top: 0;
|
|
||||||
}
|
|
||||||
.bb-vital {
|
|
||||||
display: flex;
|
|
||||||
flex-direction: column;
|
|
||||||
gap: 1px;
|
|
||||||
font-family: "SF Mono", "Consolas", monospace;
|
|
||||||
}
|
|
||||||
.bb-vital-label {
|
|
||||||
font-size: 9px;
|
|
||||||
color: var(--text-muted);
|
|
||||||
text-transform: uppercase;
|
|
||||||
letter-spacing: .04em;
|
|
||||||
}
|
|
||||||
.bb-vital-value {
|
|
||||||
font-size: 13px;
|
|
||||||
color: var(--text-strong, #f0f6fc);
|
|
||||||
font-weight: 500;
|
|
||||||
font-variant-numeric: tabular-nums;
|
|
||||||
}
|
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
|
||||||
Phase caption + per-pattern history (1.0.0-47).
|
|
||||||
----------------------------------------------------------------------- */
|
|
||||||
.bb-caption {
|
|
||||||
font-family: "SF Mono", "Consolas", monospace;
|
|
||||||
font-size: 11px;
|
|
||||||
color: var(--text-muted);
|
|
||||||
padding: 6px 12px 0 12px;
|
|
||||||
letter-spacing: .02em;
|
|
||||||
}
|
|
||||||
.bb-history {
|
|
||||||
display: flex;
|
|
||||||
flex-wrap: wrap;
|
|
||||||
align-items: center;
|
|
||||||
gap: 10px;
|
|
||||||
padding: 6px 12px 8px 12px;
|
|
||||||
font-family: "SF Mono", "Consolas", monospace;
|
|
||||||
font-size: 10px;
|
|
||||||
color: var(--text-muted);
|
|
||||||
}
|
|
||||||
.bb-hist-title {
|
|
||||||
text-transform: uppercase;
|
|
||||||
letter-spacing: .04em;
|
|
||||||
font-size: 9px;
|
|
||||||
margin-right: 4px;
|
|
||||||
}
|
|
||||||
.bb-hist-row {
|
|
||||||
display: inline-flex;
|
|
||||||
align-items: baseline;
|
|
||||||
gap: 4px;
|
|
||||||
background: var(--bg, #0d1117);
|
|
||||||
border: 1px solid var(--border, #30363d);
|
|
||||||
border-radius: 3px;
|
|
||||||
padding: 1px 6px;
|
|
||||||
}
|
|
||||||
.bb-hist-label {
|
|
||||||
color: var(--green, #3fb950);
|
|
||||||
font-weight: 600;
|
|
||||||
}
|
|
||||||
.bb-hist-dur {
|
|
||||||
color: var(--text-strong, #f0f6fc);
|
|
||||||
font-variant-numeric: tabular-nums;
|
|
||||||
}
|
|
||||||
|
|
||||||
/* Bad-block counter colour states inside the vitals row */
|
|
||||||
.bb-vital-good { color: var(--green, #3fb950); }
|
|
||||||
.bb-vital-bad { color: var(--red, #f85149); }
|
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
|
||||||
Column sort (1.0.0-48). Click a sortable TH to cycle asc → desc →
|
|
||||||
cleared. Indicator arrow appears next to the column label.
|
|
||||||
----------------------------------------------------------------------- */
|
|
||||||
th.sortable {
|
|
||||||
cursor: pointer;
|
|
||||||
user-select: none;
|
|
||||||
position: relative;
|
|
||||||
}
|
|
||||||
th.sortable:hover { color: var(--text); }
|
|
||||||
th.sortable::after {
|
|
||||||
content: "";
|
|
||||||
display: inline-block;
|
|
||||||
width: 0;
|
|
||||||
height: 0;
|
|
||||||
margin-left: 4px;
|
|
||||||
border-left: 4px solid transparent;
|
|
||||||
border-right: 4px solid transparent;
|
|
||||||
vertical-align: middle;
|
|
||||||
opacity: 0;
|
|
||||||
}
|
|
||||||
th.sortable:hover::after { opacity: 0.4; border-bottom: 5px solid currentColor; }
|
|
||||||
th.sort-asc::after {
|
|
||||||
opacity: 1;
|
|
||||||
border-bottom: 5px solid var(--blue, #58a6ff);
|
|
||||||
}
|
|
||||||
th.sort-desc::after {
|
|
||||||
opacity: 1;
|
|
||||||
border-top: 5px solid var(--blue, #58a6ff);
|
|
||||||
}
|
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
|
||||||
Stage "Reason" block — explains why a stage ended in a terminal
|
|
||||||
state. Replaces the old single-line stage-error-line for
|
|
||||||
failed/cancelled/unknown stages so the operator gets a clear,
|
|
||||||
prominent explanation at the top.
|
|
||||||
----------------------------------------------------------------------- */
|
|
||||||
.stage-reason {
|
|
||||||
display: flex;
|
|
||||||
gap: 10px;
|
|
||||||
align-items: baseline;
|
|
||||||
padding: 8px 12px;
|
|
||||||
margin: 6px 0;
|
|
||||||
border-radius: 5px;
|
|
||||||
font-size: 12px;
|
|
||||||
border: 1px solid;
|
|
||||||
}
|
|
||||||
.stage-reason-failed {
|
|
||||||
background: var(--red-bg, color-mix(in srgb, var(--red) 12%, transparent));
|
|
||||||
border-color: var(--red-bd, color-mix(in srgb, var(--red) 40%, transparent));
|
|
||||||
}
|
|
||||||
.stage-reason-cancelled,
|
|
||||||
.stage-reason-unknown {
|
|
||||||
background: var(--yellow-bg, color-mix(in srgb, var(--yellow) 12%, transparent));
|
|
||||||
border-color: var(--yellow-bd, color-mix(in srgb, var(--yellow) 40%, transparent));
|
|
||||||
}
|
|
||||||
.stage-reason-label {
|
|
||||||
font-size: 10px;
|
|
||||||
text-transform: uppercase;
|
|
||||||
letter-spacing: .06em;
|
|
||||||
font-weight: 600;
|
|
||||||
color: var(--text-muted);
|
|
||||||
flex-shrink: 0;
|
|
||||||
}
|
|
||||||
.stage-reason-text {
|
|
||||||
flex: 1;
|
|
||||||
color: var(--text-strong, #f0f6fc);
|
|
||||||
line-height: 1.4;
|
|
||||||
word-wrap: break-word;
|
|
||||||
}
|
|
||||||
.stage-reason-failed .stage-reason-text { color: var(--red, #f85149); }
|
|
||||||
.stage-reason-cancelled .stage-reason-text,
|
|
||||||
.stage-reason-unknown .stage-reason-text { color: var(--yellow, #d29922); }
|
|
||||||
|
|
||||||
/* -----------------------------------------------------------------------
|
|
||||||
Drawer job-level estimated completion (right-aligned in the header,
|
|
||||||
so it doesn't compete with the state chip + operator info).
|
|
||||||
----------------------------------------------------------------------- */
|
|
||||||
.drawer-job-header {
|
|
||||||
display: flex;
|
|
||||||
align-items: center;
|
|
||||||
gap: 10px;
|
|
||||||
flex-wrap: wrap;
|
|
||||||
}
|
|
||||||
.drawer-job-finish {
|
|
||||||
display: inline-flex;
|
|
||||||
align-items: baseline;
|
|
||||||
gap: 8px;
|
|
||||||
padding: 4px 10px;
|
|
||||||
background: var(--bg-soft, #161b22);
|
|
||||||
border: 1px solid var(--border, #30363d);
|
|
||||||
border-radius: 5px;
|
|
||||||
font-family: "SF Mono", "Consolas", monospace;
|
|
||||||
}
|
|
||||||
.drawer-job-finish-label {
|
|
||||||
font-size: 9px;
|
|
||||||
color: var(--text-muted);
|
|
||||||
text-transform: uppercase;
|
|
||||||
letter-spacing: .04em;
|
|
||||||
}
|
|
||||||
.drawer-job-finish-value {
|
|
||||||
font-size: 12px;
|
|
||||||
color: var(--text-strong, #f0f6fc);
|
|
||||||
font-weight: 500;
|
|
||||||
font-variant-numeric: tabular-nums;
|
|
||||||
}
|
|
||||||
|
|
|
||||||
|
|
@ -79,86 +79,12 @@
|
||||||
initElapsedTimers();
|
initElapsedTimers();
|
||||||
initUnlockCountdowns();
|
initUnlockCountdowns();
|
||||||
initLocationEdits();
|
initLocationEdits();
|
||||||
applySort(); // SSE swap replaces #drives-tbody — re-apply persisted sort
|
|
||||||
paintSortIndicators();
|
|
||||||
if (_drawerDriveId) {
|
if (_drawerDriveId) {
|
||||||
_drawerHighlightRow(_drawerDriveId);
|
_drawerHighlightRow(_drawerDriveId);
|
||||||
drawerFetch(_drawerDriveId);
|
drawerFetch(_drawerDriveId);
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
// ---------------------------------------------------------------
|
|
||||||
// Column sorting (client-side, persisted in localStorage so it
|
|
||||||
// survives reload AND survives every SSE-driven tbody refresh).
|
|
||||||
// ---------------------------------------------------------------
|
|
||||||
var SORT_KEY = 'nasburnin.sort';
|
|
||||||
function getSort() {
|
|
||||||
try {
|
|
||||||
var raw = localStorage.getItem(SORT_KEY);
|
|
||||||
if (!raw) return null;
|
|
||||||
var p = JSON.parse(raw);
|
|
||||||
if (p && p.col && (p.dir === 'asc' || p.dir === 'desc')) return p;
|
|
||||||
} catch (e) {}
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
function setSort(col, dir) {
|
|
||||||
if (!col) localStorage.removeItem(SORT_KEY);
|
|
||||||
else localStorage.setItem(SORT_KEY, JSON.stringify({col: col, dir: dir}));
|
|
||||||
}
|
|
||||||
function applySort() {
|
|
||||||
var s = getSort();
|
|
||||||
var tbody = document.getElementById('drives-tbody');
|
|
||||||
if (!tbody || !s) return;
|
|
||||||
var rows = Array.from(tbody.querySelectorAll('tr[id^="drive-"]'));
|
|
||||||
if (!rows.length) return;
|
|
||||||
var attr = 'data-sort-' + s.col;
|
|
||||||
var dirMul = s.dir === 'asc' ? 1 : -1;
|
|
||||||
rows.sort(function (a, b) {
|
|
||||||
var av = a.getAttribute(attr);
|
|
||||||
var bv = b.getAttribute(attr);
|
|
||||||
// Empty values always sink to the bottom regardless of direction.
|
|
||||||
var aEmpty = av === null || av === '';
|
|
||||||
var bEmpty = bv === null || bv === '';
|
|
||||||
if (aEmpty && !bEmpty) return 1;
|
|
||||||
if (!aEmpty && bEmpty) return -1;
|
|
||||||
if (aEmpty && bEmpty) return 0;
|
|
||||||
// Numeric comparison if both parse cleanly, else string.
|
|
||||||
var an = parseFloat(av), bn = parseFloat(bv);
|
|
||||||
if (!isNaN(an) && !isNaN(bn) && String(an) === av && String(bn) === bv) {
|
|
||||||
return (an - bn) * dirMul;
|
|
||||||
}
|
|
||||||
return av.localeCompare(bv) * dirMul;
|
|
||||||
});
|
|
||||||
rows.forEach(function (r) { tbody.appendChild(r); });
|
|
||||||
}
|
|
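The ordering rules inside `applySort()` (empty values sink, numeric compare only when both values round-trip cleanly, string compare otherwise) can be sketched as a standalone comparator. This is a minimal sketch: `compareSortValues` and the sample `temps` array are hypothetical names for illustration, not part of the diff.

```javascript
// Hypothetical standalone version of the comparator used inside applySort().
// dirMul is 1 for ascending and -1 for descending.
function compareSortValues(av, bv, dirMul) {
  var aEmpty = av === null || av === '';
  var bEmpty = bv === null || bv === '';
  // Empty values sink to the bottom in both directions,
  // so dirMul is deliberately not applied here.
  if (aEmpty && !bEmpty) return 1;
  if (!aEmpty && bEmpty) return -1;
  if (aEmpty && bEmpty) return 0;
  // Numeric comparison only when both strings round-trip through parseFloat.
  var an = parseFloat(av), bn = parseFloat(bv);
  if (!isNaN(an) && !isNaN(bn) && String(an) === av && String(bn) === bv) {
    return (an - bn) * dirMul;
  }
  return av.localeCompare(bv) * dirMul;
}

// Sample data-sort-temp values; '' stands for a drive with no reading.
var temps = ['41', '', '9', '38'];
var asc = temps.slice().sort(function (a, b) { return compareSortValues(a, b, 1); });
var desc = temps.slice().sort(function (a, b) { return compareSortValues(a, b, -1); });
console.log(asc);   // [ '9', '38', '41', '' ]
console.log(desc);  // [ '41', '38', '9', '' ]
```

Note that '9' sorts before '38' because both values pass the round-trip check and are compared as numbers; a plain string compare would put '38' first.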
||||||
function paintSortIndicators() {
|
|
||||||
var s = getSort();
|
|
||||||
document.querySelectorAll('th.sortable').forEach(function (th) {
|
|
||||||
th.classList.remove('sort-asc', 'sort-desc');
|
|
||||||
if (s && th.dataset.sortKey === s.col) {
|
|
||||||
th.classList.add(s.dir === 'asc' ? 'sort-asc' : 'sort-desc');
|
|
||||||
}
|
|
||||||
});
|
|
||||||
}
|
|
||||||
document.addEventListener('click', function (e) {
|
|
||||||
var th = e.target.closest('th.sortable');
|
|
||||||
if (!th) return;
|
|
||||||
var col = th.dataset.sortKey;
|
|
||||||
var s = getSort();
|
|
||||||
var dir = 'asc';
|
|
||||||
if (s && s.col === col) {
|
|
||||||
// Click cycle: asc → desc → cleared (rows keep their current order until the next tbody refresh)
|
|
||||||
if (s.dir === 'asc') dir = 'desc';
|
|
||||||
else { setSort(null); applySort(); paintSortIndicators(); return; }
|
|
||||||
}
|
|
||||||
setSort(col, dir);
|
|
||||||
applySort();
|
|
||||||
paintSortIndicators();
|
|
||||||
});
|
|
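The click handler above is a small three-state cycle per column. A minimal sketch of that state transition as a pure function follows; `nextSortState` is a hypothetical helper introduced here for illustration, and the `{col, dir}` shape is the one `getSort()` returns.

```javascript
// Hypothetical pure-function version of the click cycle:
// first click on a column -> asc, second -> desc, third -> cleared (null).
// Clicking a different column always restarts at asc.
function nextSortState(current, col) {
  if (current && current.col === col) {
    if (current.dir === 'asc') return { col: col, dir: 'desc' };
    return null; // was desc, so the sort is cleared
  }
  return { col: col, dir: 'asc' };
}

var s1 = nextSortState(null, 'temp');   // { col: 'temp', dir: 'asc' }
var s2 = nextSortState(s1, 'temp');     // { col: 'temp', dir: 'desc' }
var s3 = nextSortState(s2, 'temp');     // null
var s4 = nextSortState(s2, 'serial');   // { col: 'serial', dir: 'asc' }
console.log(s1, s2, s3, s4);
```

Keeping the transition separate from `localStorage` and the DOM makes the cycle easy to check in isolation.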
||||||
// Initial paint on page load (HTML is already rendered server-side).
|
|
||||||
applySort();
|
|
||||||
paintSortIndicators();
|
|
||||||
|
|
||||||
updateCounts();
|
updateCounts();
|
||||||
|
|
||||||
// -----------------------------------------------------------------------
|
// -----------------------------------------------------------------------
|
||||||
|
|
@ -1345,14 +1271,8 @@
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Stash the last drive object so the burn-in panel renderer can
|
|
||||||
// pull temperature_c into the vital-signs row without having to
|
|
||||||
// pass it through the Burn-In renderer's signature.
|
|
||||||
var _DRAWER_LAST_DRIVE = null;
|
|
||||||
|
|
||||||
function _drawerRender(data) {
|
function _drawerRender(data) {
|
||||||
var drive = data.drive || {};
|
var drive = data.drive || {};
|
||||||
_DRAWER_LAST_DRIVE = drive;
|
|
||||||
var devnameEl = document.getElementById('drawer-devname');
|
var devnameEl = document.getElementById('drawer-devname');
|
||||||
var metaEl = document.getElementById('drawer-drive-meta');
|
var metaEl = document.getElementById('drawer-drive-meta');
|
||||||
if (devnameEl) devnameEl.textContent = drive.devname || '\u2014';
|
if (devnameEl) devnameEl.textContent = drive.devname || '\u2014';
|
||||||
|
|
@ -1366,170 +1286,6 @@
|
||||||
_drawerRenderEvents(data.events);
|
_drawerRenderEvents(data.events);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Vital-signs row above the meters: start time, elapsed time, ETA,
|
|
||||||
// projected finish and drive temp. Computed from data already in the drawer payload.
|
|
||||||
function _drawerRenderBadblocksVitals(stage, drive) {
|
|
||||||
var phase = parseInt(stage.bb_phase, 10) || 1;
|
|
||||||
var phasePct = parseFloat(stage.bb_phase_pct || 0);
|
|
||||||
var overallPct = ((phase - 1) * 100 + phasePct) / 8; // 0..100
|
|
||||||
var html = '<div class="bb-vitals">';
|
|
||||||
var dateOpts = {
|
|
||||||
weekday: 'short', month: 'short', day: 'numeric',
|
|
||||||
hour: 'numeric', minute: '2-digit',
|
|
||||||
};
|
|
||||||
|
|
||||||
// Start (wall-clock, with date)
|
|
||||||
if (stage.started_at) {
|
|
||||||
var startMs = Date.parse(stage.started_at);
|
|
||||||
var startStr = new Date(startMs).toLocaleString(undefined, dateOpts);
|
|
||||||
html += '<div class="bb-vital">';
|
|
||||||
html += '<span class="bb-vital-label">Start</span>';
|
|
||||||
html += '<span class="bb-vital-value">' + startStr + '</span>';
|
|
||||||
html += '</div>';
|
|
||||||
|
|
||||||
// Elapsed
|
|
||||||
var elapsedSec = Math.max(0, (Date.now() - startMs) / 1000);
|
|
||||||
html += '<div class="bb-vital">';
|
|
||||||
html += '<span class="bb-vital-label">Elapsed</span>';
|
|
||||||
html += '<span class="bb-vital-value">' + _bbFmtDuration(elapsedSec) + '</span>';
|
|
||||||
html += '</div>';
|
|
||||||
|
|
||||||
// ETA + Finish — only once we have measurable progress, so the
|
|
||||||
// first samples don't paint a "47 days" estimate.
|
|
||||||
if (overallPct >= 0.5) {
|
|
||||||
var totalSec = elapsedSec * (100 / overallPct);
|
|
||||||
var remainingSec = Math.max(0, totalSec - elapsedSec);
|
|
||||||
html += '<div class="bb-vital">';
|
|
||||||
html += '<span class="bb-vital-label">ETA</span>';
|
|
||||||
html += '<span class="bb-vital-value">' + _bbFmtDuration(remainingSec) + '</span>';
|
|
||||||
html += '</div>';
|
|
||||||
|
|
||||||
var finishStr = new Date(Date.now() + remainingSec * 1000)
|
|
||||||
.toLocaleString(undefined, dateOpts);
|
|
||||||
html += '<div class="bb-vital">';
|
|
||||||
html += '<span class="bb-vital-label">Finish</span>';
|
|
||||||
html += '<span class="bb-vital-value">' + finishStr + '</span>';
|
|
||||||
html += '</div>';
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Temp with hot/warm/cool colour
|
|
||||||
if (drive && typeof drive.temperature_c === 'number') {
|
|
||||||
var tc = drive.temperature_c;
|
|
||||||
var tClass = 'temp-cool';
|
|
||||||
if (tc >= 48) tClass = 'temp-hot';
|
|
||||||
else if (tc >= 42) tClass = 'temp-warm';
|
|
||||||
html += '<div class="bb-vital">';
|
|
||||||
html += '<span class="bb-vital-label">Temp</span>';
|
|
||||||
html += '<span class="bb-vital-value temp ' + tClass + '">' + tc + '°C</span>';
|
|
||||||
html += '</div>';
|
|
||||||
}
|
|
||||||
|
|
||||||
html += '</div>';
|
|
||||||
return html;
|
|
||||||
}
|
|
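The ETA in the vitals row is a linear extrapolation from elapsed time and overall progress across the 8 badblocks phases. A minimal sketch of that arithmetic, assuming the same 0.5% suppression threshold as the code above; `overallFromPhase` and `estimateRemainingSec` are hypothetical names.

```javascript
// Overall progress (0..100) across 8 equally weighted phases.
function overallFromPhase(phase, phasePct) {
  return ((phase - 1) * 100 + phasePct) / 8;
}

// Linear extrapolation: if overallPct percent took elapsedSec seconds,
// 100 percent is assumed to take elapsedSec * (100 / overallPct).
// Returns null below 0.5% so early samples do not produce wild estimates.
function estimateRemainingSec(elapsedSec, overallPct) {
  if (!(overallPct >= 0.5)) return null;
  var totalSec = elapsedSec * (100 / overallPct);
  return Math.max(0, totalSec - elapsedSec);
}

// Halfway through the pattern-2 write (phase 3, 50%) after 10 hours.
var pctNow = overallFromPhase(3, 50);              // 31.25
var remaining = estimateRemainingSec(36000, pctNow);
console.log(pctNow, Math.round(remaining / 3600)); // 31.25 22
```

The extrapolation treats every phase as equally long, so the estimate can drift if write and verify passes run at different speeds.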
||||||
|
|
||||||
function _bbFmtDuration(sec) {
|
|
||||||
sec = Math.floor(sec);
|
|
||||||
var d = Math.floor(sec / 86400);
|
|
||||||
var h = Math.floor((sec % 86400) / 3600);
|
|
||||||
var m = Math.floor((sec % 3600) / 60);
|
|
||||||
if (d > 0) return d + 'd ' + h + 'h';
|
|
||||||
if (h > 0) return h + 'h ' + m + 'm';
|
|
||||||
return m + 'm';
|
|
||||||
}
|
|
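A few sample values show how the duration formatter above rounds. The function body below is a copy of `_bbFmtDuration` under the hypothetical name `fmtDuration`, so the snippet runs on its own.

```javascript
// Copy of the formatter above, renamed so it can run standalone.
function fmtDuration(sec) {
  sec = Math.floor(sec);
  var d = Math.floor(sec / 86400);
  var h = Math.floor((sec % 86400) / 3600);
  var m = Math.floor((sec % 3600) / 60);
  if (d > 0) return d + 'd ' + h + 'h';   // minutes are dropped once days appear
  if (h > 0) return h + 'h ' + m + 'm';
  return m + 'm';                          // anything under a minute shows as 0m
}

var samples = [90061, 3725, 59].map(fmtDuration);
console.log(samples); // [ '1d 1h', '1h 2m', '0m' ]
```

Only the two largest units are shown, which suits multi-day runs but means sub-minute durations all read as `0m`.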
||||||
|
|
||||||
// Phase caption — explicit text below the meters: e.g.
|
|
||||||
// "Pattern 2 of 4 · Verify 0x55 · 47.0% within phase".
|
|
||||||
function _drawerRenderBadblocksCaption(phase, phasePct) {
|
|
||||||
if (!phase) return '';
|
|
||||||
var p = parseInt(phase, 10);
|
|
||||||
var pct = parseFloat(phasePct || 0);
|
|
||||||
var labels = ['0xaa', '0x55', '0xff', '0x00'];
|
|
||||||
var pattern = Math.ceil(p / 2);
|
|
||||||
var subPhase = (p % 2 === 1) ? 'Write' : 'Verify';
|
|
||||||
var label = labels[pattern - 1];
|
|
||||||
var html = '<div class="bb-caption">';
|
|
||||||
html += 'Pattern ' + pattern + ' of 4 · ';
|
|
||||||
html += subPhase + ' ' + label + ' · ';
|
|
||||||
html += pct.toFixed(1) + '% within phase';
|
|
||||||
html += '</div>';
|
|
||||||
return html;
|
|
||||||
}
|
|
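The caption relies on a fixed mapping from the 1..8 phase number to a pattern index and a write/verify sub-phase. A minimal sketch of that mapping follows; `describePhase` and `BB_PATTERNS` are hypothetical names, and the range check returning `null` is an addition that the caption code above does not perform.

```javascript
var BB_PATTERNS = ['0xaa', '0x55', '0xff', '0x00'];

// Odd phases are writes, even phases are verifies; two phases per pattern.
function describePhase(phase) {
  var p = parseInt(phase, 10);
  if (!(p >= 1 && p <= 8)) return null;   // added guard, not in the original
  var pattern = Math.ceil(p / 2);
  return {
    pattern: pattern,
    subPhase: (p % 2 === 1) ? 'Write' : 'Verify',
    label: BB_PATTERNS[pattern - 1]
  };
}

var phase4 = describePhase(4);   // { pattern: 2, subPhase: 'Verify', label: '0x55' }
var phase7 = describePhase('7'); // { pattern: 4, subPhase: 'Write', label: '0x00' }
var phase9 = describePhase(9);   // null
console.log(phase4, phase7, phase9);
```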
||||||
|
|
||||||
// Per-pattern duration history. Reads bb_phase_history (JSON) and
|
|
||||||
// emits "0xaa: 14h 22m" rows for completed patterns. Pattern N is
|
|
||||||
// "complete" when its verify-phase end timestamp is known (= the
|
|
||||||
// next pattern's write-phase start, or stage.finished_at for the
|
|
||||||
// final one).
|
|
||||||
function _drawerRenderBadblocksHistory(stage) {
|
|
||||||
if (!stage.bb_phase_history) return '';
|
|
||||||
var hist;
|
|
||||||
try { hist = JSON.parse(stage.bb_phase_history); }
|
|
||||||
catch (e) { return ''; }
|
|
||||||
if (!hist || typeof hist !== 'object') return '';
|
|
||||||
var labels = ['0xaa', '0x55', '0xff', '0x00'];
|
|
||||||
var rows = [];
|
|
||||||
for (var n = 1; n <= 4; n++) {
|
|
||||||
var writeStart = hist[String(2 * n - 1)];
|
|
||||||
if (!writeStart) continue;
|
|
||||||
var endTs = (n < 4) ? hist[String(2 * n + 1)] : stage.finished_at;
|
|
||||||
if (!endTs) continue;
|
|
||||||
var elapsedSec = (Date.parse(endTs) - Date.parse(writeStart)) / 1000;
|
|
||||||
if (elapsedSec <= 0) continue;
|
|
||||||
rows.push('<span class="bb-hist-row">' +
|
|
||||||
'<span class="bb-hist-label">' + labels[n - 1] + '</span>' +
|
|
||||||
'<span class="bb-hist-dur">' + _bbFmtDuration(elapsedSec) + '</span>' +
|
|
||||||
'</span>');
|
|
||||||
}
|
|
||||||
if (!rows.length) return '';
|
|
||||||
return '<div class="bb-history"><span class="bb-hist-title">Completed patterns</span>' +
|
|
||||||
rows.join('') + '</div>';
|
|
||||||
}
|
|
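The history row derives each pattern's duration from the start timestamp of its write phase and the start of the next pattern's write phase. A minimal sketch of that calculation without the HTML, assuming `bb_phase_history` decodes to an object keyed by phase number with ISO timestamps as values (inferred from how the code above indexes it); `patternDurations` and the sample data are hypothetical.

```javascript
// Returns [{ pattern: n, seconds: s }] for every pattern whose end is known.
function patternDurations(hist, finishedAt) {
  var out = [];
  for (var n = 1; n <= 4; n++) {
    var writeStart = hist[String(2 * n - 1)];
    if (!writeStart) continue;
    // Pattern n ends when pattern n+1 starts writing, or when the stage finishes.
    var endTs = (n < 4) ? hist[String(2 * n + 1)] : finishedAt;
    if (!endTs) continue;
    var sec = (Date.parse(endTs) - Date.parse(writeStart)) / 1000;
    if (!(sec > 0)) continue; // also skips NaN from unparseable timestamps
    out.push({ pattern: n, seconds: sec });
  }
  return out;
}

// Pattern 1 complete (14 h), pattern 2 still in its write phase.
var sampleHist = {
  '1': '2026-05-09T00:00:00Z',
  '2': '2026-05-09T10:00:00Z',
  '3': '2026-05-09T14:00:00Z'
};
var durations = patternDurations(sampleHist, null);
console.log(durations); // [ { pattern: 1, seconds: 50400 } ]
```

The `!(sec > 0)` test is slightly stricter than the `elapsedSec <= 0` check above, because it also rejects `NaN`.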
||||||
|
|
||||||
// Render 4 pattern meters for badblocks -w surface_validate. Each
|
|
||||||
// meter splits write/verify halves so you can see at a glance which
|
|
||||||
// pattern is current AND whether you're writing or verifying within
|
|
||||||
// it. phase: 1-8 (1=write 0xaa, 2=verify 0xaa, 3=write 0x55, ...).
|
|
||||||
function _drawerRenderBadblocksMeters(phase, phasePct) {
|
|
||||||
if (!phase) return '';
|
|
||||||
var p = parseInt(phase, 10);
|
|
||||||
var pct = parseFloat(phasePct || 0);
|
|
||||||
var labels = ['0xaa', '0x55', '0xff', '0x00'];
|
|
||||||
var html = '<div class="bb-meters">';
|
|
||||||
for (var i = 0; i < 4; i++) {
|
|
||||||
var writePhase = i * 2 + 1;
|
|
||||||
var verifyPhase = writePhase + 1;
|
|
||||||
var writeFill, verifyFill;
|
|
||||||
if (p > verifyPhase) {
|
|
||||||
writeFill = 100; verifyFill = 100;
|
|
||||||
} else if (p === verifyPhase) {
|
|
||||||
writeFill = 100; verifyFill = pct;
|
|
||||||
} else if (p === writePhase) {
|
|
||||||
writeFill = pct; verifyFill = 0;
|
|
||||||
} else {
|
|
||||||
writeFill = 0; verifyFill = 0;
|
|
||||||
}
|
|
||||||
var classes = 'bb-meter';
|
|
||||||
if (p === writePhase || p === verifyPhase) classes += ' bb-meter-current';
|
|
||||||
if (p > verifyPhase) classes += ' bb-meter-done';
|
|
||||||
html += '<div class="' + classes + '">';
|
|
||||||
html += '<div class="bb-meter-label">' + labels[i] + '</div>';
|
|
||||||
html += '<div class="bb-meter-bar">';
|
|
||||||
html += '<div class="bb-meter-half bb-write" style="width:' + writeFill.toFixed(1) + '%"></div>';
|
|
||||||
html += '<div class="bb-meter-half-spacer"></div>';
|
|
||||||
html += '<div class="bb-meter-half bb-verify" style="width:' + verifyFill.toFixed(1) + '%"></div>';
|
|
||||||
html += '</div>';
|
|
||||||
html += '<div class="bb-meter-sub">';
|
|
||||||
html += '<span class="bb-sub-write">W ' + Math.round(writeFill) + '%</span>';
|
|
||||||
html += '<span class="bb-sub-verify">V ' + Math.round(verifyFill) + '%</span>';
|
|
||||||
html += '</div>';
|
|
||||||
html += '</div>';
|
|
||||||
}
|
|
||||||
html += '</div>';
|
|
||||||
return html;
|
|
||||||
}
|
|
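Stripped of markup, the meter renderer computes a write fill and a verify fill for each of the four patterns from the current phase. A minimal sketch of that calculation follows; `meterFills` is a hypothetical helper name.

```javascript
// For phase 1..8 and a within-phase percent, return the fill of each
// pattern's write half and verify half (both 0..100).
function meterFills(phase, phasePct) {
  var fills = [];
  for (var i = 0; i < 4; i++) {
    var writePhase = i * 2 + 1;
    var verifyPhase = writePhase + 1;
    var w = 0, v = 0;
    if (phase > verifyPhase) { w = 100; v = 100; }              // pattern finished
    else if (phase === verifyPhase) { w = 100; v = phasePct; }  // verifying now
    else if (phase === writePhase) { w = phasePct; }            // writing now
    fills.push({ write: w, verify: v });
  }
  return fills;
}

var fills = meterFills(4, 47);
console.log(fills);
// [ { write: 100, verify: 100 }, { write: 100, verify: 47 },
//   { write: 0, verify: 0 }, { write: 0, verify: 0 } ]
```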
||||||
|
|
||||||
function _drawerRenderBurnin(burnin) {
|
function _drawerRenderBurnin(burnin) {
|
||||||
var panel = document.getElementById('drawer-panel-burnin');
|
var panel = document.getElementById('drawer-panel-burnin');
|
||||||
if (!panel) return;
|
if (!panel) return;
|
||||||
|
|
@ -1544,30 +1300,7 @@
|
||||||
html += '<span class="drawer-job-meta">';
|
html += '<span class="drawer-job-meta">';
|
||||||
if (burnin.operator) html += 'by ' + _esc(burnin.operator);
|
if (burnin.operator) html += 'by ' + _esc(burnin.operator);
|
||||||
if (burnin.started_at) html += ' \u00b7 ' + _drawerFmtDt(burnin.started_at);
|
if (burnin.started_at) html += ' \u00b7 ' + _drawerFmtDt(burnin.started_at);
|
||||||
html += '</span>';
|
html += '</span></div>';
|
||||||
// Job-level estimated completion. Uses the weighted overall job %
|
|
||||||
// (recalculated server-side from stage progress) so it reflects
|
|
||||||
// every stage, not just the current one. Suppressed under 0.5%
|
|
||||||
// so the early sample doesn't paint a "Finish: Sep 22" stutter.
|
|
||||||
if (burnin.state === 'running' && burnin.started_at) {
|
|
||||||
var jobPct = parseFloat(burnin.percent || 0);
|
|
||||||
if (jobPct >= 0.5) {
|
|
||||||
var jobStartMs = Date.parse(burnin.started_at);
|
|
||||||
var jobElapsedSec = Math.max(0, (Date.now() - jobStartMs) / 1000);
|
|
||||||
var jobTotalSec = jobElapsedSec * (100 / jobPct);
|
|
||||||
var jobRemainSec = Math.max(0, jobTotalSec - jobElapsedSec);
|
|
||||||
var jobFinish = new Date(Date.now() + jobRemainSec * 1000);
|
|
||||||
var jobFinishStr = jobFinish.toLocaleString(undefined, {
|
|
||||||
weekday: 'short', month: 'short', day: 'numeric',
|
|
||||||
hour: 'numeric', minute: '2-digit',
|
|
||||||
});
|
|
||||||
html += '<span class="drawer-job-finish" title="Estimated completion of the entire burn-in (all stages)">';
|
|
||||||
html += '<span class="drawer-job-finish-label">Est. completion</span>';
|
|
||||||
html += '<span class="drawer-job-finish-value">' + jobFinishStr + '</span>';
|
|
||||||
html += '</span>';
|
|
||||||
}
|
|
||||||
}
|
|
||||||
html += '</div>';
|
|
||||||
|
|
||||||
html += '<div class="drawer-stages">';
|
html += '<div class="drawer-stages">';
|
||||||
var stages = burnin.stages || [];
|
var stages = burnin.stages || [];
|
||||||
|
|
@ -1587,37 +1320,9 @@
|
||||||
html += '<span class="stage-duration">' + _drawerFmtDuration(s.started_at, s.finished_at) + '</span>';
|
html += '<span class="stage-duration">' + _drawerFmtDuration(s.started_at, s.finished_at) + '</span>';
|
||||||
}
|
}
|
||||||
html += '</div>';
|
html += '</div>';
|
||||||
// Prominent "Why it failed" block at the top of failed/cancelled/
|
if (s.error_text) {
|
||||||
// unknown stages. Falls back to a heuristic when no error was
|
|
||||||
// recorded — e.g. a tiny log + no badblocks progress + terminal
|
|
||||||
// state means the stage was killed externally (SSH disconnect or
|
|
||||||
// container restart) before it could record an error.
|
|
||||||
if (s.state === 'failed' || s.state === 'cancelled' || s.state === 'unknown') {
|
|
||||||
var reason = s.error_text;
|
|
||||||
if (!reason) {
|
|
||||||
var logLen = (s.log_text || '').length;
|
|
||||||
var noBbProgress = !s.bb_phase || (s.bb_phase === 1 && (parseFloat(s.bb_phase_pct || 0) < 0.1));
|
|
||||||
if (logLen < 500 && noBbProgress) {
|
|
||||||
reason = 'Stopped without recording an error — likely cause: SSH connection drop or container restart while this stage was running.';
|
|
||||||
} else {
|
|
||||||
reason = 'No error message recorded.';
|
|
||||||
}
|
|
||||||
}
|
|
||||||
html += '<div class="stage-reason stage-reason-' + _esc(s.state) + '">';
|
|
||||||
html += '<span class="stage-reason-label">Reason</span>';
|
|
||||||
html += '<span class="stage-reason-text">' + _esc(reason) + '</span>';
|
|
||||||
html += '</div>';
|
|
||||||
} else if (s.error_text) {
|
|
||||||
html += '<div class="stage-error-line">' + _esc(s.error_text) + '</div>';
|
html += '<div class="stage-error-line">' + _esc(s.error_text) + '</div>';
|
||||||
}
|
}
|
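The fallback used when a terminal stage has no `error_text` is a heuristic on log size and badblocks progress. A minimal sketch of that decision as a pure function follows, returning short codes instead of the display strings; `classifyMissingReason` and the code names are hypothetical. Unlike the strict `s.bb_phase === 1` comparison above, this sketch coerces `bb_phase` with `parseInt`, on the assumption that the payload may deliver it as a string.

```javascript
// 'recorded'             : the stage stored its own error text
// 'likely-external-kill' : tiny log and no measurable badblocks progress
// 'no-message'           : stage ran for a while but recorded nothing
function classifyMissingReason(stage) {
  if (stage.error_text) return 'recorded';
  var logLen = (stage.log_text || '').length;
  var phase = parseInt(stage.bb_phase, 10);
  var pct = parseFloat(stage.bb_phase_pct || 0);
  var noBbProgress = !stage.bb_phase || (phase === 1 && pct < 0.1);
  return (logLen < 500 && noBbProgress) ? 'likely-external-kill' : 'no-message';
}

var killedEarly = classifyMissingReason({ log_text: '', bb_phase: '1', bb_phase_pct: '0.05' });
var ranAWhile = classifyMissingReason({ log_text: '', bb_phase: 3, bb_phase_pct: 12 });
console.log(killedEarly, ranAWhile); // likely-external-kill no-message
```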
||||||
// Per-pattern meters for badblocks surface_validate, plus the
|
|
||||||
// vital-signs row above (start / elapsed / ETA / finish / temp).
|
|
||||||
if (s.stage_name === 'surface_validate' && s.bb_phase) {
|
|
||||||
html += _drawerRenderBadblocksVitals(s, _DRAWER_LAST_DRIVE);
|
|
||||||
html += _drawerRenderBadblocksMeters(s.bb_phase, s.bb_phase_pct);
|
|
||||||
html += _drawerRenderBadblocksCaption(s.bb_phase, s.bb_phase_pct);
|
|
||||||
html += _drawerRenderBadblocksHistory(s);
|
|
||||||
}
|
|
||||||
// Raw SSH log output (if available)
|
// Raw SSH log output (if available)
|
||||||
if (s.log_text) {
|
if (s.log_text) {
|
||||||
var logHtml = _esc(s.log_text)
|
var logHtml = _esc(s.log_text)
|
||||||
|
|
|
||||||
|
|
@ -46,13 +46,7 @@
|
||||||
{%- elif bi.state == 'passed' -%}
|
{%- elif bi.state == 'passed' -%}
|
||||||
<span class="chip chip-passed">Passed</span>
|
<span class="chip chip-passed">Passed</span>
|
||||||
{%- elif bi.state == 'failed' -%}
|
{%- elif bi.state == 'failed' -%}
|
||||||
{# Suppress the stage suffix for SMART + surface_validate stages.
|
<span class="chip chip-failed">Failed{% if bi.stage_name %} ({{ bi.stage_name | replace('_',' ') }}){% endif %}</span>
|
||||||
SMART has its own columns, and surface_validate is the dominant
|
|
||||||
case so a redundant suffix just adds visual noise. The drawer
|
|
||||||
shows the per-stage Reason for any digging. Keep the suffix for
|
|
||||||
precheck / final_check since those are rare enough that the hint
|
|
||||||
is helpful. #}
|
|
||||||
<span class="chip chip-failed">Failed{% if bi.stage_name and bi.stage_name not in ('short_smart', 'long_smart', 'surface_validate') %} ({{ bi.stage_name | replace('_',' ') }}){% endif %}</span>
|
|
||||||
{%- elif bi.state == 'cancelled' -%}
|
{%- elif bi.state == 'cancelled' -%}
|
||||||
<span class="chip chip-aborted">Cancelled</span>
|
<span class="chip chip-aborted">Cancelled</span>
|
||||||
{%- elif bi.state == 'unknown' -%}
|
{%- elif bi.state == 'unknown' -%}
|
||||||
|
|
@ -69,14 +63,14 @@
|
||||||
<th class="col-check">
|
<th class="col-check">
|
||||||
<input type="checkbox" id="select-all-cb" class="drive-cb" title="Select all idle drives">
|
<input type="checkbox" id="select-all-cb" class="drive-cb" title="Select all idle drives">
|
||||||
</th>
|
</th>
|
||||||
<th class="col-drive sortable" data-sort-key="drive">Drive</th>
|
<th class="col-drive">Drive</th>
|
||||||
<th class="col-serial sortable" data-sort-key="serial">Serial</th>
|
<th class="col-serial">Serial</th>
|
||||||
<th class="col-size sortable" data-sort-key="size">Size</th>
|
<th class="col-size">Size</th>
|
||||||
<th class="col-temp sortable" data-sort-key="temp">Temp</th>
|
<th class="col-temp">Temp</th>
|
||||||
<th class="col-health sortable" data-sort-key="health">Health</th>
|
<th class="col-health">Health</th>
|
||||||
<th class="col-smart sortable" data-sort-key="short">Short SMART</th>
|
<th class="col-smart">Short SMART</th>
|
||||||
<th class="col-smart sortable" data-sort-key="long">Long SMART</th>
|
<th class="col-smart">Long SMART</th>
|
||||||
<th class="col-burnin sortable" data-sort-key="burnin">Burn-In</th>
|
<th class="col-burnin">Burn-In</th>
|
||||||
<th class="col-actions">Actions</th>
|
<th class="col-actions">Actions</th>
|
||||||
</tr>
|
</tr>
|
||||||
</thead>
|
</thead>
|
||||||
|
|
@ -95,19 +89,7 @@
|
||||||
{%- set smart_done = (drive.smart_short and drive.smart_short.state in ('passed','failed','aborted'))
|
{%- set smart_done = (drive.smart_short and drive.smart_short.state in ('passed','failed','aborted'))
|
||||||
or (drive.smart_long and drive.smart_long.state in ('passed','failed','aborted')) %}
|
or (drive.smart_long and drive.smart_long.state in ('passed','failed','aborted')) %}
|
||||||
{%- set can_reset = (bi_done or smart_done) and not bi_active and not short_busy and not long_busy and not pool_locked %}
|
{%- set can_reset = (bi_done or smart_done) and not bi_active and not short_busy and not long_busy and not pool_locked %}
|
||||||
{%- set short_state = drive.smart_short.state if drive.smart_short else 'idle' %}
|
<tr data-status="{{ drive.status }}" id="drive-{{ drive.id }}">
|
||||||
{%- set long_state = drive.smart_long.state if drive.smart_long else 'idle' %}
|
|
||||||
{%- set burnin_state = drive.burnin.state if drive.burnin else '' %}
|
|
||||||
<tr data-status="{{ drive.status }}" id="drive-{{ drive.id }}"
|
|
||||||
data-sort-drive="{{ drive.devname }}"
|
|
||||||
data-sort-serial="{{ (drive.serial or '') | lower }}"
|
|
||||||
data-sort-size="{{ drive.size_bytes or 0 }}"
|
|
||||||
data-sort-temp="{{ drive.temperature_c if drive.temperature_c is not none else '' }}"
|
|
||||||
data-sort-health="{{ {'PASSED': 1, 'WARNING': 2, 'FAILED': 3, 'UNKNOWN': 4}.get(drive.smart_health, 9) }}"
|
|
||||||
data-sort-short="{{ {'running': 1, 'failed': 2, 'aborted': 3, 'passed': 4, 'idle': 5}.get(short_state, 9) }}"
|
|
||||||
data-sort-long="{{ {'running': 1, 'failed': 2, 'aborted': 3, 'passed': 4, 'idle': 5}.get(long_state, 9) }}"
|
|
||||||
data-sort-burnin="{{ {'running': 1, 'queued': 2, 'failed': 3, 'unknown': 4, 'cancelled': 5, 'passed': 6}.get(burnin_state, 9) }}"
|
|
||||||
>
|
|
||||||
<td class="col-check">
|
<td class="col-check">
|
||||||
{%- if selectable %}
|
{%- if selectable %}
|
||||||
<input type="checkbox" class="drive-checkbox" data-drive-id="{{ drive.id }}">
|
<input type="checkbox" class="drive-checkbox" data-drive-id="{{ drive.id }}">
|
||||||
|
|
|
||||||
|
|
@ -1,125 +0,0 @@
|
||||||
"""Verifies _BadblocksProgress translates per-phase badblocks output
|
|
||||||
into a monotonic 0-99% overall progress.
|
|
||||||
|
|
||||||
`badblocks -w` cycles through 4 patterns × {write, verify} = 8 phases.
|
|
||||||
Each phase prints "XX% done" relative to its own 0-100 range. Without
|
|
||||||
this translation the dashboard appeared to "rewind" every ~2 hours
|
|
||||||
when a new phase started — and two drives racing each other could
|
|
||||||
look 4× apart in displayed progress despite identical hardware.
|
|
||||||
|
|
||||||
Run inside the container image so app deps are present.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import unittest
|
|
||||||
|
|
||||||
from app.burnin.stages import _BadblocksProgress
|
|
||||||
|
|
||||||
|
|
||||||
class TestBadblocksProgress(unittest.TestCase):
|
|
||||||
|
|
||||||
def test_default_phase_one(self):
|
|
||||||
"""Before any header, treat as start of pattern-1 write."""
|
|
||||||
p = _BadblocksProgress()
|
|
||||||
self.assertEqual(p.phase, 1)
|
|
||||||
self.assertEqual(p.overall_pct, 0)
|
|
||||||
|
|
||||||
def test_pattern_headers_set_phase(self):
|
|
||||||
"""0xaa→1, 0x55→3, 0xff→5, 0x00→7 (write phases)."""
|
|
||||||
p = _BadblocksProgress()
|
|
||||||
for header, want in [
|
|
||||||
("Testing with pattern 0xaa: ", 1),
|
|
||||||
("Testing with pattern 0x55: ", 3),
|
|
||||||
("Testing with pattern 0xff: ", 5),
|
|
||||||
("Testing with pattern 0x00: ", 7),
|
|
||||||
]:
|
|
||||||
p.update(header)
|
|
||||||
self.assertEqual(p.phase, want, f"after {header!r}")
|
|
||||||
|
|
||||||
def test_verify_advances_to_next_phase(self):
|
|
||||||
"""`Reading and comparing` after `Testing with pattern 0x55`
|
|
||||||
(phase 3) advances to phase 4."""
|
|
||||||
p = _BadblocksProgress()
|
|
||||||
p.update("Testing with pattern 0x55: 100.00% done")
|
|
||||||
self.assertEqual(p.phase, 3)
|
|
||||||
p.update("Reading and comparing: 0.00% done")
|
|
||||||
self.assertEqual(p.phase, 4)
|
|
||||||
|
|
||||||
def test_overall_pct_at_phase_boundaries(self):
|
|
||||||
"""Verify the math at each phase boundary: phase N at 100% =
|
|
||||||
N * 12.5% overall, truncated to a whole percent (capped at 99 at the end)."""
|
|
||||||
cases = [
|
|
||||||
(1, 0.0, 0), # start of run
|
|
||||||
(1, 100.0, 12), # 100/800 = 12.5
|
|
||||||
(2, 100.0, 25), # 200/800
|
|
||||||
(4, 100.0, 50), # 400/800
|
|
||||||
(7, 100.0, 87), # 700/800
|
|
||||||
(8, 100.0, 99), # 800/800 → clipped to 99
|
|
||||||
]
|
|
||||||
for phase, phase_pct, want in cases:
|
|
||||||
p = _BadblocksProgress()
|
|
||||||
p.phase = phase
|
|
||||||
p.phase_pct = phase_pct
|
|
||||||
self.assertEqual(
|
|
||||||
p.overall_pct, want,
|
|
||||||
f"phase={phase} phase_pct={phase_pct}",
|
|
||||||
)
|
|
||||||
|
|
||||||
def test_realistic_sequence(self):
|
|
||||||
"""End-to-end: feed a synthetic badblocks output stream and
|
|
||||||
check the overall percent stays monotonically non-decreasing."""
|
|
||||||
lines = [
|
|
||||||
"Testing with pattern 0xaa: ",
|
|
||||||
"10.00% done, 1:00:00 elapsed. (0/0/0 errors)",
|
|
||||||
"50.00% done, 5:00:00 elapsed. (0/0/0 errors)",
|
|
||||||
"99.99% done, 10:00:00 elapsed. (0/0/0 errors)",
|
|
||||||
"Reading and comparing: ",
|
|
||||||
"0.00% done, 10:00:01 elapsed. (0/0/0 errors)",
|
|
||||||
"50.00% done, 12:30:00 elapsed. (0/0/0 errors)",
|
|
||||||
"Testing with pattern 0x55: ",
|
|
||||||
"0.00% done, 15:00:00 elapsed. (0/0/0 errors)",
|
|
||||||
"50.00% done, 17:30:00 elapsed. (0/0/0 errors)",
|
|
||||||
]
|
|
||||||
p = _BadblocksProgress()
|
|
||||||
seen = []
|
|
||||||
for line in lines:
|
|
||||||
p.update(line)
|
|
||||||
seen.append(p.overall_pct)
|
|
||||||
self.assertEqual(
|
|
||||||
seen, sorted(seen),
|
|
||||||
f"progress went backwards: {seen}",
|
|
||||||
)
|
|
||||||
# Sanity: by the time we're halfway through pattern-2 write
|
|
||||||
# (phase 3, 50%), we should report ((3-1)*100 + 50) / 8 = 31.25, truncated to 31%.
|
|
||||||
self.assertEqual(seen[-1], 31)
|
|
||||||
|
|
||||||
def test_drives_at_different_phases_show_different_overall(self):
|
|
||||||
"""The original bug: two drives at the same per-phase 60%
|
|
||||||
but different phases used to look identical (both '60%').
|
|
||||||
Now they correctly diverge."""
|
|
||||||
slow = _BadblocksProgress()
|
|
||||||
slow.update("Testing with pattern 0xaa: ")
|
|
||||||
slow.update("60.00% done")
|
|
||||||
|
|
||||||
fast = _BadblocksProgress()
|
|
||||||
fast.update("Testing with pattern 0xaa: ")
|
|
||||||
fast.update("99.99% done")
|
|
||||||
fast.update("Reading and comparing: ")
|
|
||||||
fast.update("60.00% done")
|
|
||||||
|
|
||||||
# slow: 60/8 = 7.5, truncated to 7%; fast: (1*100 + 60)/8 = 20%
|
|
||||||
self.assertEqual(slow.overall_pct, 7)
|
|
||||||
self.assertEqual(fast.overall_pct, 20)
|
|
||||||
|
|
||||||
def test_unknown_pattern_does_not_crash(self):
|
|
||||||
"""An unrecognized pattern (e.g. badblocks future versions or
|
|
||||||
custom patterns) just leaves phase unchanged."""
|
|
||||||
p = _BadblocksProgress()
|
|
||||||
p.update("Testing with pattern 0xab: ")
|
|
||||||
# phase stays at the default 1
|
|
||||||
self.assertEqual(p.phase, 1)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
unittest.main()
|
|
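The expected values in the test file above are consistent with a simple formula: eight equally weighted phases, truncated to a whole percent and capped at 99. The sketch below restates that arithmetic in JavaScript, the language of the frontend code in this diff. It is not the Python `_BadblocksProgress` implementation, which is not shown here; `overallPct` is a hypothetical name and the truncate-then-cap behaviour is inferred from the test cases only.

```javascript
// Overall percent across 8 phases, truncated and capped at 99 so the
// stage never shows 100% before it has actually finished.
function overallPct(phase, phasePct) {
  var raw = ((phase - 1) * 100 + phasePct) / 8;
  return Math.min(99, Math.floor(raw));
}

// The boundary cases asserted by the tests above.
var boundary = [
  overallPct(1, 0),    // 0
  overallPct(1, 100),  // 12  (12.5 truncated)
  overallPct(2, 100),  // 25
  overallPct(4, 100),  // 50
  overallPct(7, 100),  // 87  (87.5 truncated)
  overallPct(8, 100)   // 99  (100 capped)
];
console.log(boundary); // [ 0, 12, 25, 50, 87, 99 ]
```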
||||||
|
|
@ -1,100 +0,0 @@
"""Verifies _update_stage_bb_phase actually writes to burnin_stages
and the migration adds the columns idempotently.

The drive-drawer's 4-meter UI depends on these columns being populated
on every parser tick. If a future refactor drops the call or breaks
the migration, this test catches it before users see the meters
go blank.

Run inside the container image so app deps are present.
"""

from __future__ import annotations

import os
import tempfile
import unittest

import aiosqlite


async def _setup_db_with_stage() -> str:
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    from app.config import settings
    settings.db_path = path

    from app.database import init_db
    await init_db()

    async with aiosqlite.connect(path) as db:
        await db.execute(
            "INSERT INTO drives "
            "(truenas_disk_id, devname, serial, model, size_bytes, "
            " temperature_c, smart_health, last_seen_at, last_polled_at) "
            "VALUES ('id-1', 'sda', 'SER1', 'TestModel', 14000000000000, "
            " 30, 'PASSED', '2026-05-09T00:00:00+00:00', "
            " '2026-05-09T00:00:00+00:00')"
        )
        await db.execute(
            "INSERT INTO burnin_jobs "
            "(drive_id, profile, state, operator, created_at) "
            "VALUES (1, 'surface', 'running', 'op', "
            " '2026-05-09T00:00:00+00:00')"
        )
        await db.execute(
            "INSERT INTO burnin_stages "
            "(burnin_job_id, stage_name, state) "
            "VALUES (1, 'surface_validate', 'running')"
        )
        await db.commit()
    return path


class TestBBPhasePersistence(unittest.IsolatedAsyncioTestCase):

    async def asyncSetUp(self):
        self.path = await _setup_db_with_stage()

    async def asyncTearDown(self):
        try:
            os.unlink(self.path)
        except OSError:
            pass

    async def test_columns_exist_after_init(self):
        async with aiosqlite.connect(self.path) as db:
            cur = await db.execute("PRAGMA table_info(burnin_stages)")
            cols = {r[1] for r in await cur.fetchall()}
        self.assertIn("bb_phase", cols)
        self.assertIn("bb_phase_pct", cols)

    async def test_update_writes_phase_and_pct(self):
        from app.burnin._common import _update_stage_bb_phase
        await _update_stage_bb_phase(1, "surface_validate", 3, 47.5)
        async with aiosqlite.connect(self.path) as db:
            cur = await db.execute(
                "SELECT bb_phase, bb_phase_pct FROM burnin_stages "
                "WHERE burnin_job_id=1 AND stage_name='surface_validate'"
            )
            row = await cur.fetchone()
        self.assertEqual(row[0], 3)
        self.assertAlmostEqual(row[1], 47.5)

    async def test_update_overwrites(self):
        """Each tick should replace the previous value, not accumulate."""
        from app.burnin._common import _update_stage_bb_phase
        await _update_stage_bb_phase(1, "surface_validate", 1, 10.0)
        await _update_stage_bb_phase(1, "surface_validate", 2, 80.0)
        async with aiosqlite.connect(self.path) as db:
            cur = await db.execute(
                "SELECT bb_phase, bb_phase_pct FROM burnin_stages "
                "WHERE burnin_job_id=1 AND stage_name='surface_validate'"
            )
            row = await cur.fetchone()
        self.assertEqual(row[0], 2)
        self.assertAlmostEqual(row[1], 80.0)


if __name__ == "__main__":
    unittest.main()
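The two behaviours this test file checks (a migration that can run repeatedly without error, and an update that overwrites rather than accumulates) could look something like the sketch below. It is only an illustration under stated assumptions: it uses the synchronous stdlib `sqlite3` module instead of `aiosqlite`, the function names `ensure_bb_columns` and `update_stage_bb_phase` are hypothetical, and the column types (`INTEGER`, `REAL`) are guesses. The project's real `init_db` and `_update_stage_bb_phase` are not shown in this diff.

```python
import sqlite3


def ensure_bb_columns(db: sqlite3.Connection) -> None:
    """Add bb_phase / bb_phase_pct to burnin_stages if they are missing.

    Safe to call repeatedly: existing columns are detected via
    PRAGMA table_info and skipped. Column types are assumed.
    """
    existing = {row[1] for row in db.execute("PRAGMA table_info(burnin_stages)")}
    for name, decl in (("bb_phase", "INTEGER"), ("bb_phase_pct", "REAL")):
        if name not in existing:
            # Identifiers come from the fixed tuple above, not user input.
            db.execute(f"ALTER TABLE burnin_stages ADD COLUMN {name} {decl}")
    db.commit()


def update_stage_bb_phase(
    db: sqlite3.Connection,
    job_id: int,
    stage_name: str,
    phase: int,
    pct: float,
) -> None:
    """Overwrite the stored phase and per-phase percent for one stage row."""
    db.execute(
        "UPDATE burnin_stages SET bb_phase = ?, bb_phase_pct = ? "
        "WHERE burnin_job_id = ? AND stage_name = ?",
        (phase, pct, job_id, stage_name),
    )
    db.commit()
```

Checking `PRAGMA table_info` before `ALTER TABLE` is one common way to make a SQLite column migration idempotent, since SQLite has no `ADD COLUMN IF NOT EXISTS`.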