nas-burnin/app
Brandon Walter c5a41d0260
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
fix: throttle badblocks parser DB writes (1.0.0-52)
User reported sdb showing 134-day ETA. Investigation: badblocks
processes all stuck in pipe_write wchan, iostat showing 0 throughput
across all drives despite badblocks supposedly running.

Root cause: each progress line was triggering 4-6 DB transactions
(_update_stage_percent, _update_stage_bb_phase, _update_stage_bad_blocks,
_update_stage_bb_mbps, _record_bb_phase_start, _recalculate_progress).
With 4 concurrent burn-ins and sub-second progress lines, the
asyncssh _drain couldn't keep up. Drain fell behind → asyncssh
channel buffer filled → SSH window stopped advancing → sshd stopped
reading from badblocks's stdout pipe → pipe filled → badblocks
blocked on pipe_write() → no more disk I/O.

That regression came in across 1.0.0-44 → 1.0.0-47 as I added each
new persisted field. The previous per-line write path worked when
there was only one DB call; it doesn't with five-plus.

Fix: BB_DB_MIN_SECONDS=5 throttle on the DB-write path. The drain
loop still consumes every progress line (so the pipe drains
continuously), but commits to DB at most once every 5 seconds.
Phase transitions always commit immediately (rare and important —
they stamp bb_phase_history and advance the per-pattern meter).

UI impact: minimal — drawer polls drives every ~12s anyway, so the
displayed % was already at 12s resolution. The meter strip just
won't sub-tick within a 5s window.

DB load impact: 60-80x reduction during surface_validate.
2026-05-10 22:12:02 -07:00
..
burnin fix: throttle badblocks parser DB writes (1.0.0-52) 2026-05-10 22:12:02 -07:00
routes feat: prominent failure-reason block + heuristic in drawer (1.0.0-50) 2026-05-09 12:06:11 -07:00
static feat: prominent failure-reason block + heuristic in drawer (1.0.0-50) 2026-05-09 12:06:11 -07:00
templates fix: drop redundant stage suffix from Burn-In failed chip 2026-05-09 12:33:26 -07:00
__init__.py Initial commit — TrueNAS Burn-In Dashboard v0.5.0 2026-02-24 00:08:29 -05:00
auth.py feat: rate limiter + mypy + lifecycle tests + routes/ split (1.0.0-33/-34) 2026-05-03 09:29:53 -04:00
auth_cli.py infra: rename truenas-burnin → nas-burnin (1.0.0-41) 2026-05-04 07:16:02 -07:00
config.py fix: throttle badblocks parser DB writes (1.0.0-52) 2026-05-10 22:12:02 -07:00
database.py feat: phase caption + bad-block badge + per-pattern history (1.0.0-47) 2026-05-08 23:23:02 -07:00
logging_config.py Initial commit — TrueNAS Burn-In Dashboard v0.5.0 2026-02-24 00:08:29 -05:00
mailer.py fix: annotate to mypy-clean + promote to gating (1.0.0-40) 2026-05-03 21:21:55 -07:00
main.py rename: TrueNAS Burn-In → NAS Burn-In (1.0.0-38) 2026-05-03 14:01:40 -04:00
models.py feat: pool-membership lock + cancellation hardening + smart_health refresh + tunables (1.0.0-13 -> 1.0.0-21) 2026-05-02 09:25:56 -04:00
notifier.py Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish 2026-02-24 08:09:30 -05:00
poller.py fix: address Codex audit findings (1.0.0-28) 2026-05-02 18:48:16 -04:00
renderer.py Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish 2026-02-24 08:09:30 -05:00
retention.py fix: annotate to mypy-clean + promote to gating (1.0.0-40) 2026-05-03 21:21:55 -07:00
settings_store.py fix: annotate to mypy-clean + promote to gating (1.0.0-40) 2026-05-03 21:21:55 -07:00
ssh_client.py fix: live pool re-check before start_job + drop dead run_badblocks (1.0.0-29) 2026-05-02 21:29:11 -04:00
terminal.py chore: re-sync deployed work that pre-dates this session 2026-05-02 09:24:42 -04:00
truenas.py chore: dev-experience + mypy noise cleanup 2026-05-03 21:11:23 -07:00