nas-burnin/app/burnin
Brandon Walter 149f2901b7
fix: throttle ALL drain-loop DB calls + drop progress noise from log (1.0.0-54)
1.0.0-52 throttled the percent/bb_phase writes but missed:

- `_is_cancelled` ran a DB query on EVERY stderr line (sub-second
  cadence × 4 concurrent burn-ins = ~10+ DB connection opens/s)
- `_append_stage_log` ran every 20 output_lines (~once per second),
  doing a `log_text || ?` concat that rewrites the whole value on
  each call: quadratic overall, multi-MB per write as the log grows
- `_recalculate_progress` + `_push_update` also fired per gated tick
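
A rough cost model for the `log_text || ?` concat described above. This is an illustration under stated assumptions, not a measurement from this repo: it assumes every append rewrites the whole accumulated value and that chunks are equal in size. The function name and parameters are hypothetical.

```python
def total_bytes_rewritten(flushes: int, chunk: int) -> int:
    """Total bytes written if every append rewrites the full accumulated value.

    Assumption: after flush k the stored value is k * chunk bytes long and is
    written out in full, so the total is chunk * (1 + 2 + ... + flushes).
    """
    return chunk * flushes * (flushes + 1) // 2


def total_bytes_appended(flushes: int, chunk: int) -> int:
    """Baseline for comparison: bytes written if each append cost only its own chunk."""
    return chunk * flushes
```

Doubling the number of flushes roughly quadruples the first figure but only doubles the second, which is why cutting log volume helps even though the concat itself is unchanged.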

The cumulative load kept the asyncssh drain coroutine too busy to
consume the SSH channel buffer, so the SSH window stalled, sshd
stopped reading the pipe, and badblocks blocked in pipe_write
(state=S, wchan=pipe_write). The /sys/block sectors_written delta
confirmed 0 disk I/O across all running drives despite 23h elapsed.
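
A minimal sketch of the sectors-written check, assuming the counter is read from `/sys/block/<dev>/stat` (the commit does not say exactly which file was used). In that file the seventh field is sectors written, counted in 512-byte units. The helper names are hypothetical.

```python
from pathlib import Path


def sectors_written(stat_line: str) -> int:
    """Return the 'write sectors' counter from one /sys/block/<dev>/stat line."""
    fields = stat_line.split()
    # Field 7 (index 6) is the number of sectors written.
    return int(fields[6])


def write_delta_bytes(before: str, after: str) -> int:
    """Bytes written between two samples of the stat line.

    The stat file counts 512-byte sectors regardless of the drive's
    physical sector size.
    """
    return (sectors_written(after) - sectors_written(before)) * 512


def read_stat(dev: str) -> str:
    """Read the raw stat line for a block device such as 'sda' (Linux only)."""
    return Path(f"/sys/block/{dev}/stat").read_text()
```

Sampling `read_stat(dev)` twice a few seconds apart and passing both strings to `write_delta_bytes` gives the bytes written in that interval; a result of 0 on a drive that should be under test matches the stall described above.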

Fix:
1. Single throttle gate (BB_DB_MIN_SECONDS=5s) covers EVERY DB
   touch in the drain — cancel check, percent/phase/bb_count
   updates, throughput sample, log flush, recalc, SSE push.
   Phase transitions still bypass the throttle (rare + important).
2. Exclude "XX% done" lines from the log entirely. They were the
   dominant volume; meaningful content (pattern headers, errors,
   bad-block numbers) still gets logged via the throttled flush.
3. The log_text concat is still quadratic, but the volume reduction
   makes it tractable: buffer lines in pending_log_chunks and flush
   on the gate.
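
The three steps above can be sketched as follows. This is not the code in stages.py: only `BB_DB_MIN_SECONDS` and `pending_log_chunks` are names taken from this message, while `ThrottleGate`, `is_progress_noise`, `drain`, the regex, and the final flush of leftovers are assumptions made for illustration.

```python
import re
import time
from typing import Callable, Iterable, List, Optional

BB_DB_MIN_SECONDS = 5.0  # name and value from the commit message

# Assumed shape of the "XX% done" progress lines.
_PROGRESS_RE = re.compile(r"\d+(?:\.\d+)?% done")


class ThrottleGate:
    """Opens at most once per min_seconds, unless forced."""

    def __init__(self, min_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min = min_seconds
        self._clock = clock
        self._last: Optional[float] = None

    def ready(self, force: bool = False) -> bool:
        now = self._clock()
        # force=True models the phase-transition bypass.
        if force or self._last is None or now - self._last >= self._min:
            self._last = now
            return True
        return False


def is_progress_noise(line: str) -> bool:
    """True for lines that should never reach the stored log."""
    return _PROGRESS_RE.search(line) is not None


def drain(lines: Iterable[str], gate: ThrottleGate,
          flush: Callable[[str], None]) -> None:
    """Buffer meaningful lines and write them only when the gate opens."""
    pending_log_chunks: List[str] = []
    for line in lines:
        if not is_progress_noise(line):
            pending_log_chunks.append(line)
        if gate.ready():
            # Every other throttled DB touch (cancel check, progress
            # update, SSE push) would sit behind this same gate.
            if pending_log_chunks:
                flush("\n".join(pending_log_chunks))
                pending_log_chunks.clear()
    # Assumption: leftovers are flushed once the stream ends.
    if pending_log_chunks:
        flush("\n".join(pending_log_chunks))
```

Passing the clock in as a parameter keeps the gate testable without sleeping; a real drain loop would use the default `time.monotonic`.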

Net effect: ~99% reduction in drain-loop DB load. asyncssh drain
keeps up; pipe drains; badblocks writes; disk goes brr.
2026-05-11 22:07:39 -07:00
__init__.py fix: cancel-mid-stage marks job 'unknown' not 'failed' (1.0.0-51) 2026-05-09 12:32:46 -07:00
_common.py feat: phase caption + bad-block badge + per-pattern history (1.0.0-47) 2026-05-08 23:23:02 -07:00
kill.py refactor: split burnin.py into a package — extract unlock + kill (1.0.0-30) 2026-05-03 00:44:28 -04:00
stages.py fix: throttle ALL drain-loop DB calls + drop progress noise from log (1.0.0-54) 2026-05-11 22:07:39 -07:00
unlock.py refactor: split burnin.py into a package — extract unlock + kill (1.0.0-30) 2026-05-03 00:44:28 -04:00