nas-burnin/app/burnin
Brandon Walter c5a41d0260
Some checks are pending
Security scan / pip-audit (push) Waiting to run
Security scan / bandit (push) Waiting to run
Security scan / gitleaks (push) Waiting to run
Security scan / mypy (push) Waiting to run
fix: throttle badblocks parser DB writes (1.0.0-52)
User reported sdb showing 134-day ETA. Investigation: badblocks
processes all stuck in pipe_write wchan, iostat showing 0 throughput
across all drives despite badblocks supposedly running.

Root cause: each progress line was triggering 4-6 DB transactions
(_update_stage_percent, _update_stage_bb_phase, _update_stage_bad_blocks,
_update_stage_bb_mbps, _record_bb_phase_start, _recalculate_progress).
With 4 concurrent burn-ins and sub-second progress lines, the
asyncssh _drain couldn't keep up. Drain fell behind → asyncssh
channel buffer filled → SSH window stopped advancing → sshd stopped
reading from badblocks's stdout pipe → pipe filled → badblocks
blocked on pipe_write() → no more disk I/O.

That regression came in across 1.0.0-44 → 1.0.0-47 as I added each
new persisted field. The previous per-line write path worked when
there was only one DB call; it doesn't with five-plus.

Fix: BB_DB_MIN_SECONDS=5 throttle on the DB-write path. The drain
loop still consumes every progress line (so the pipe drains
continuously), but commits to DB at most once every 5 seconds.
Phase transitions always commit immediately (rare and important —
they stamp bb_phase_history and advance the per-pattern meter).

UI impact: minimal — drawer polls drives every ~12s anyway, so the
displayed % was already at 12s resolution. The meter strip just
won't sub-tick within a 5s window.

DB load impact: 60-80x reduction during surface_validate.
2026-05-10 22:12:02 -07:00
..
__init__.py fix: cancel-mid-stage marks job 'unknown' not 'failed' (1.0.0-51) 2026-05-09 12:32:46 -07:00
_common.py feat: phase caption + bad-block badge + per-pattern history (1.0.0-47) 2026-05-08 23:23:02 -07:00
kill.py refactor: split burnin.py into a package — extract unlock + kill (1.0.0-30) 2026-05-03 00:44:28 -04:00
stages.py fix: throttle badblocks parser DB writes (1.0.0-52) 2026-05-10 22:12:02 -07:00
unlock.py refactor: split burnin.py into a package — extract unlock + kill (1.0.0-30) 2026-05-03 00:44:28 -04:00