nas-burnin/app
Brandon Walter ec636f8f3a
fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60)

Jobs 60-63 ran healthy for 16h, then all 4 died simultaneously with
'database is locked'. The burn-in drain used _db(), which set
busy_timeout=10000, but:

1. 10s was sometimes too short under heavy contention (4 burn-in
   drains writing every 5s + poller every 12s + retention scan +
   auth + lifespan = many concurrent writers).
2. The OTHER aiosqlite.connect() sites (poller, retention, auth, mailer,
   routes/__init__'s SSE, burnin/__init__.py's various helpers,
   database.get_db) didn't set busy_timeout at all. Without it,
   SQLite raises 'database is locked' INSTANTLY on any contention,
   which pushed the contention back onto the drain's connection.
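
A minimal sketch of the failure mode in item 2, using the stdlib sqlite3
module instead of aiosqlite so it runs without extra dependencies. The
helper names (connect, hold_write_lock, try_write, demo) and the samples
table are hypothetical and not taken from this repo. Python's
sqlite3.connect() also has its own timeout argument (5 seconds by
default) that sets the same busy handler; the sketch passes timeout=0 so
that only the PRAGMA decides how long a writer waits.

```python
import os
import sqlite3
import tempfile
import threading
import time

BUSY_TIMEOUT_MS = 60000  # the value chosen in the fix


def connect(path, busy_timeout_ms=BUSY_TIMEOUT_MS):
    """Open a connection and set busy_timeout right away (hypothetical helper)."""
    # timeout=0 disables Python's own default wait so the PRAGMA alone decides.
    # isolation_level=None gives autocommit, so BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def hold_write_lock(path, locked, hold_seconds):
    """Take the write lock, signal that it is held, then release it later."""
    conn = connect(path)
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("INSERT INTO samples (value) VALUES ('holder')")
    locked.set()
    time.sleep(hold_seconds)
    conn.execute("COMMIT")
    conn.close()


def try_write(path, busy_timeout_ms):
    """Attempt one write; return 'ok' or the SQLite error message."""
    conn = connect(path, busy_timeout_ms)
    try:
        conn.execute("INSERT INTO samples (value) VALUES ('contender')")
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


def demo():
    """Contend for the write lock once without waiting and once with a timeout."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = connect(path)
    setup.execute("CREATE TABLE samples (value TEXT)")
    setup.close()

    results = {}
    for label, timeout_ms in (("no_wait", 0), ("wait", 5000)):
        locked = threading.Event()
        holder = threading.Thread(
            target=hold_write_lock, args=(path, locked, 0.3)
        )
        holder.start()
        locked.wait()  # the other writer now holds the lock
        results[label] = try_write(path, timeout_ms)
        holder.join()
    return results
```

With busy_timeout=0 the contending write fails at once with 'database is
locked'; with a non-zero timeout it waits for the holder to commit and
then succeeds.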

Fix:
- _db() busy_timeout 10000 → 60000 (60s; aggressive but right for
  this workload — brief contention spikes are normal and waiting
  beats failing).
- PRAGMA busy_timeout=60000 added on every aiosqlite.connect() site
  next to the existing PRAGMA calls. Applied via a small Python
  pass that preserves the original variable name (db / _tdb / src
  / dst etc.) and indentation.
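
A rough sketch of what such a pass could look like; this is not the
script that was actually run. It assumes each connect site already has at
least one line of the form `await <var>.execute("PRAGMA ...")` and appends
the busy_timeout line after the last PRAGMA in each run, reusing that
line's variable name and indentation. The function name add_busy_timeout
and the regex are hypothetical.

```python
import re

BUSY_TIMEOUT_MS = 60000

# Assumed shape of an existing PRAGMA call; real sites may differ.
ANY_PRAGMA = re.compile(
    r'^(?P<indent>[ \t]*)await (?P<var>\w+)\.execute\(\s*["\']PRAGMA\s+(?P<name>\w+)'
)


def add_busy_timeout(source: str) -> str:
    """Append a busy_timeout PRAGMA after each run of PRAGMA calls that lacks one."""
    lines = source.splitlines(keepends=True)
    out = []
    run_has_timeout = False
    for i, line in enumerate(lines):
        out.append(line)
        match = ANY_PRAGMA.match(line)
        if match is None:
            run_has_timeout = False
            continue
        if match["name"] == "busy_timeout":
            run_has_timeout = True
        following = ANY_PRAGMA.match(lines[i + 1]) if i + 1 < len(lines) else None
        if following is None:  # this line ends the run of PRAGMA calls
            if not run_has_timeout:
                if not line.endswith("\n"):
                    out[-1] = line + "\n"
                out.append(
                    f'{match["indent"]}await {match["var"]}'
                    f'.execute("PRAGMA busy_timeout={BUSY_TIMEOUT_MS}")\n'
                )
            run_has_timeout = False
    return "".join(out)
```

Running the function a second time on its own output leaves the text
unchanged, so the pass is safe to repeat.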

Same restart sequence applied: rebuild container, reset 4 drives,
relaunch via loopback bypass. Jobs 64-67 are now running.

This is auto-restart #2 in 24h. Safety brake at 3.
2026-05-14 06:39:33 -04:00
burnin fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
routes fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
static fix: backport stages.py \b-parser fix + drawer-finish inline (uncommitted from 1.0.0-55) 2026-05-12 07:53:33 -07:00
templates fix: drop redundant stage suffix from Burn-In failed chip 2026-05-09 12:33:26 -07:00
__init__.py Initial commit — TrueNAS Burn-In Dashboard v0.5.0 2026-02-24 00:08:29 -05:00
auth.py feat: loopback auth bypass for autonomous monitor (1.0.0-56) 2026-05-12 07:52:20 -07:00
auth_cli.py infra: rename truenas-burnin → nas-burnin (1.0.0-41) 2026-05-04 07:16:02 -07:00
config.py fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
database.py fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
logging_config.py Initial commit — TrueNAS Burn-In Dashboard v0.5.0 2026-02-24 00:08:29 -05:00
mailer.py fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
main.py feat: loopback auth bypass for autonomous monitor (1.0.0-56) 2026-05-12 07:52:20 -07:00
models.py feat: pool-membership lock + cancellation hardening + smart_health refresh + tunables (1.0.0-13 -> 1.0.0-21) 2026-05-02 09:25:56 -04:00
notifier.py Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish 2026-02-24 08:09:30 -05:00
poller.py fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
renderer.py Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish 2026-02-24 08:09:30 -05:00
retention.py fix: annotate to mypy-clean + promote to gating (1.0.0-40) 2026-05-03 21:21:55 -07:00
settings_store.py fix: annotate to mypy-clean + promote to gating (1.0.0-40) 2026-05-03 21:21:55 -07:00
ssh_client.py fix: live pool re-check before start_job + drop dead run_badblocks (1.0.0-29) 2026-05-02 21:29:11 -04:00
terminal.py chore: re-sync deployed work that pre-dates this session 2026-05-02 09:24:42 -04:00
truenas.py chore: dev-experience + mypy noise cleanup 2026-05-03 21:11:23 -07:00