nas-burnin/app
Brandon Walter 149f2901b7
fix: throttle ALL drain-loop DB calls + drop progress noise from log (1.0.0-54)
1.0.0-52 throttled the percent/bb_phase writes but missed:

- `_is_cancelled` ran a DB query on EVERY stderr line (sub-second
  cadence × 4 concurrent burn-ins = ~10+ DB connection opens/s)
- `_append_stage_log` ran every 20 output_lines (~once per second),
  doing a `log_text || ?` concat whose total cost is quadratic: each
  flush rewrites the whole log, multi-MB per write as the log grows
- `_recalculate_progress` + `_push_update` also fired per gated tick
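
A back-of-the-envelope sketch of why the per-flush concat adds up, assuming each `log_text || ?` update rewrites the full accumulated value. The function name, flush rate, and chunk size below are illustrative assumptions, not measurements from this project:

```python
def total_bytes_rewritten(flushes: int, chunk_bytes: int) -> int:
    """Bytes written if every append rewrites the whole accumulated log.

    After the k-th flush the stored value is k * chunk_bytes long and is
    written out in full, so the total is chunk_bytes * (1 + 2 + ... + n).
    """
    return chunk_bytes * flushes * (flushes + 1) // 2


# Hypothetical workload: one flush per second for 23 hours, ~1 KiB per flush.
flushes = 23 * 3600
chunk = 1024
final_size = flushes * chunk
total = total_bytes_rewritten(flushes, chunk)
print(f"final log size:  {final_size / 2**20:.1f} MiB")
print(f"total rewritten: {total / 2**30:.1f} GiB")
```

The final log stays modest while the cumulative rewrite volume grows with the square of the flush count, which is why cutting the number of flushed lines matters more than shaving the cost of any single write.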

The cumulative load kept the asyncssh drain coroutine too busy to
consume the SSH channel buffer. The SSH window stalled, sshd stopped
reading the pipe, and badblocks blocked in pipe_write (state=S,
wchan=pipe_write). The /sys/block sectors_written delta confirmed
zero disk I/O across all running drives despite 23h elapsed.
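
One way to reproduce this kind of check is to sample the write-sectors counter in `/sys/block/<dev>/stat` twice and compare. This is a minimal sketch, not necessarily how the measurement above was taken; the function names are hypothetical, and it must run on the host where the drives live:

```python
import time
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block stat counters use 512-byte sectors


def sectors_written(stat_line: str) -> int:
    """Return the write-sectors counter (7th field) of a /sys/block/<dev>/stat line."""
    return int(stat_line.split()[6])


def written_bytes_over(device: str, interval_s: float = 10.0) -> int:
    """Sample the counter twice and return the bytes written in between."""
    stat_path = Path("/sys/block") / device / "stat"
    before = sectors_written(stat_path.read_text())
    time.sleep(interval_s)
    after = sectors_written(stat_path.read_text())
    return (after - before) * SECTOR_BYTES
```

A result of 0 from `written_bytes_over("sda")` during what should be a badblocks write pass would point to a stalled writer rather than a slow one.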

Fix:
1. Single throttle gate (BB_DB_MIN_SECONDS=5s) covers EVERY DB
   touch in the drain — cancel check, percent/phase/bb_count
   updates, throughput sample, log flush, recalc, SSE push.
   Phase transitions still bypass the throttle (rare + important).
2. Exclude "XX% done" lines from the log entirely. They were the
   dominant volume; meaningful content (pattern headers, errors,
   bad-block numbers) still gets logged via the throttled flush.
3. The log_text concat is still quadratic, but the volume reduction
   makes it tractable: buffer to pending_log_chunks and flush on the gate.
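
The three points above can be sketched as a single helper. This is an illustration under stated assumptions, not the code in `burnin`: the class `DrainGate`, its method names, and the progress-line regex are hypothetical; only `BB_DB_MIN_SECONDS` and `pending_log_chunks` are names taken from the message.

```python
import re
import time
from typing import Callable, List

BB_DB_MIN_SECONDS = 5.0  # name and 5s value from the commit message

# Assumed shape of a badblocks progress line, e.g. "12.50% done, 0:10 elapsed."
PROGRESS_RE = re.compile(r"\d+(?:\.\d+)?% done")


class DrainGate:
    """Hypothetical helper: one gate for every DB touch in the drain loop."""

    def __init__(self, min_seconds: float = BB_DB_MIN_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_seconds = min_seconds
        self.clock = clock
        self.last_open = float("-inf")  # so the first call always opens
        self.pending_log_chunks: List[str] = []

    def feed(self, line: str) -> None:
        """Buffer a line for the log unless it is progress noise."""
        if not PROGRESS_RE.search(line):
            self.pending_log_chunks.append(line)

    def should_write(self, phase_changed: bool = False) -> bool:
        """True once the interval has elapsed, or immediately on a phase change."""
        now = self.clock()
        if phase_changed or now - self.last_open >= self.min_seconds:
            self.last_open = now
            return True
        return False

    def take_log(self) -> str:
        """Return the buffered lines as one chunk and clear the buffer."""
        chunk = "\n".join(self.pending_log_chunks)
        self.pending_log_chunks.clear()
        return chunk
```

In a drain loop, every line would go through `feed()`, and the cancel check, progress update, log flush, recalc, and SSE push would all sit behind one `if gate.should_write(phase_changed):` branch. Resetting the timer on a phase-change bypass is a design choice of this sketch; the commit message does not say which way the real code goes.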

Net effect: ~99% reduction in drain-loop DB load. asyncssh drain
keeps up; pipe drains; badblocks writes; disk goes brr.
2026-05-11 22:07:39 -07:00
| Name | Last commit | Date |
|---|---|---|
| burnin | fix: throttle ALL drain-loop DB calls + drop progress noise from log (1.0.0-54) | 2026-05-11 22:07:39 -07:00 |
| routes | feat: prominent failure-reason block + heuristic in drawer (1.0.0-50) | 2026-05-09 12:06:11 -07:00 |
| static | feat: job-level Est. completion in drawer header (1.0.0-53) | 2026-05-10 22:45:04 -07:00 |
| templates | fix: drop redundant stage suffix from Burn-In failed chip | 2026-05-09 12:33:26 -07:00 |
| __init__.py | Initial commit — TrueNAS Burn-In Dashboard v0.5.0 | 2026-02-24 00:08:29 -05:00 |
| auth.py | feat: rate limiter + mypy + lifecycle tests + routes/ split (1.0.0-33/-34) | 2026-05-03 09:29:53 -04:00 |
| auth_cli.py | infra: rename truenas-burnin → nas-burnin (1.0.0-41) | 2026-05-04 07:16:02 -07:00 |
| config.py | fix: throttle ALL drain-loop DB calls + drop progress noise from log (1.0.0-54) | 2026-05-11 22:07:39 -07:00 |
| database.py | feat: phase caption + bad-block badge + per-pattern history (1.0.0-47) | 2026-05-08 23:23:02 -07:00 |
| logging_config.py | Initial commit — TrueNAS Burn-In Dashboard v0.5.0 | 2026-02-24 00:08:29 -05:00 |
| mailer.py | fix: annotate to mypy-clean + promote to gating (1.0.0-40) | 2026-05-03 21:21:55 -07:00 |
| main.py | rename: TrueNAS Burn-In → NAS Burn-In (1.0.0-38) | 2026-05-03 14:01:40 -04:00 |
| models.py | feat: pool-membership lock + cancellation hardening + smart_health refresh + tunables (1.0.0-13 -> 1.0.0-21) | 2026-05-02 09:25:56 -04:00 |
| notifier.py | Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish | 2026-02-24 08:09:30 -05:00 |
| poller.py | fix: address Codex audit findings (1.0.0-28) | 2026-05-02 18:48:16 -04:00 |
| renderer.py | Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish | 2026-02-24 08:09:30 -05:00 |
| retention.py | fix: annotate to mypy-clean + promote to gating (1.0.0-40) | 2026-05-03 21:21:55 -07:00 |
| settings_store.py | fix: annotate to mypy-clean + promote to gating (1.0.0-40) | 2026-05-03 21:21:55 -07:00 |
| ssh_client.py | fix: live pool re-check before start_job + drop dead run_badblocks (1.0.0-29) | 2026-05-02 21:29:11 -04:00 |
| terminal.py | chore: re-sync deployed work that pre-dates this session | 2026-05-02 09:24:42 -04:00 |
| truenas.py | chore: dev-experience + mypy noise cleanup | 2026-05-03 21:11:23 -07:00 |