nas-burnin/app/routes
Brandon Walter ec636f8f3a
fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60)
Jobs 60-63 ran healthy for 16h then all 4 died simultaneously with
'database is locked'. The burnin drain used _db() which set
busy_timeout=10000, but:

1. 10s was sometimes too short under heavy contention (4 burn-in
   drains writing every 5s + poller every 12s + retention scan +
   auth + lifespan = many concurrent writers).
2. OTHER aiosqlite.connect() sites (poller, retention, auth, mailer,
   routes/__init__'s SSE, burnin/__init__.py's various helpers,
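   database.get_db) didn't set busy_timeout at all. With no busy
   timeout in effect, SQLite raises 'database is locked' INSTANTLY
   on any contention (Python's sqlite3 driver, which aiosqlite
   wraps, falls back to its own 5s default unless timeout= is
   overridden, still well short of the drain's setting), which
   forced contention back onto the drain's connection.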
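
The contention behaviour described above can be reproduced in a minimal sketch. It uses the standard-library sqlite3 module rather than aiosqlite so it runs without extra dependencies. The table name `t`, the helper names `try_write` and `demo`, and the 0.2s hold time are illustrative assumptions, not taken from the nas-burnin code. `timeout=0` is passed to `sqlite3.connect` so that only the PRAGMA decides how long a writer waits.

```python
import os
import sqlite3
import tempfile
import threading


def try_write(path: str, busy_timeout_ms: int) -> str:
    """Attempt one INSERT with the given busy_timeout; return 'ok' or the error text."""
    # timeout=0 switches off the sqlite3 module's own default wait,
    # so the PRAGMA below is the only thing controlling the wait.
    conn = sqlite3.connect(path, timeout=0)
    try:
        conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
        conn.execute("INSERT INTO t (v) VALUES (1)")
        conn.commit()
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


def demo() -> tuple[str, str]:
    """Return (result with busy_timeout=0, result with busy_timeout=5000)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # Autocommit connection that plays the role of a competing writer.
        holder = sqlite3.connect(
            path, timeout=0, isolation_level=None, check_same_thread=False
        )
        holder.execute("CREATE TABLE t (v INTEGER)")
        holder.execute("BEGIN IMMEDIATE")  # take the write lock and hold it

        # No busy timeout: the second writer fails straight away.
        no_wait = try_write(path, 0)

        # Release the lock shortly afterwards from another thread.
        timer = threading.Timer(0.2, lambda: holder.execute("COMMIT"))
        timer.start()

        # With a busy timeout the second writer waits for the lock instead.
        with_wait = try_write(path, 5000)

        timer.join()
        holder.close()
        return no_wait, with_wait
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The first result carries the 'database is locked' error text, while the second writer succeeds once the lock is released, which is the "waiting beats failing" trade-off in miniature.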

Fix:
- _db() busy_timeout 10000 → 60000 (60s; aggressive but right for
  this workload — brief contention spikes are normal and waiting
  beats failing).
- PRAGMA busy_timeout=60000 added on every aiosqlite.connect() site
  next to the existing PRAGMA calls. Applied via a small Python
  pass that preserves the original variable name (db / _tdb / src
  / dst etc.) and indentation.
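
A pass of the kind described in the second bullet might look roughly like the sketch below. This is not the script that was actually run: the regex, the function name `add_busy_timeout`, and the assumption that existing PRAGMA calls take the form `await <var>.execute("PRAGMA ...")` are all hypothetical. It appends the busy_timeout line after each run of PRAGMA calls, reusing the matched variable name and indentation, and skips runs that already set it.

```python
import re

# Hypothetical shape of an existing PRAGMA call: await <var>.execute("PRAGMA ...")
PRAGMA_CALL = re.compile(
    r'^(?P<indent>\s*)await (?P<var>\w+)\.execute\(\s*["\']PRAGMA\b'
)
BUSY_MS = 60000


def add_busy_timeout(source: str) -> str:
    """Insert a busy_timeout PRAGMA after each run of PRAGMA calls that lacks one."""
    lines = source.splitlines(keepends=True)
    out: list[str] = []
    run_has_busy = False
    for i, line in enumerate(lines):
        out.append(line)
        m = PRAGMA_CALL.match(line)
        if not m:
            continue
        run_has_busy = run_has_busy or "busy_timeout" in line
        nxt = PRAGMA_CALL.match(lines[i + 1]) if i + 1 < len(lines) else None
        if nxt and nxt["var"] == m["var"]:
            continue  # still inside the same run of PRAGMA calls
        if not run_has_busy:
            if not line.endswith("\n"):
                out[-1] = line + "\n"  # file ended without a trailing newline
            # Reuse the original variable name and indentation.
            out.append(
                f'{m["indent"]}await {m["var"]}.execute'
                f'("PRAGMA busy_timeout={BUSY_MS}")\n'
            )
        run_has_busy = False
    return "".join(out)
```

Running the function on its own output leaves the text unchanged, so a pass like this can be re-applied safely across the tree.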

Same restart sequence applied: rebuild container, reset 4 drives,
relaunch via loopback bypass. Jobs 64-67 are now running.

This is auto-restart #2 in 24h. Safety brake at 3.
2026-05-14 06:39:33 -04:00
__init__.py fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60) 2026-05-14 06:39:33 -04:00
_drives_helpers.py fix: SMART overlay shows terminal states + reconciles orphans (1.0.0-49) 2026-05-09 11:46:45 -07:00
_helpers.py feat: rate limiter + mypy + lifecycle tests + routes/ split (1.0.0-33/-34) 2026-05-03 09:29:53 -04:00
audit.py refactor: extract history + audit + stats + report routes (1.0.0-35) 2026-05-03 09:44:22 -04:00
auth.py feat: rate limiter + mypy + lifecycle tests + routes/ split (1.0.0-33/-34) 2026-05-03 09:29:53 -04:00
burnin.py refactor: extract drives + burnin routes (1.0.0-37) 2026-05-03 09:59:15 -04:00
drives.py feat: prominent failure-reason block + heuristic in drawer (1.0.0-50) 2026-05-09 12:06:11 -07:00
history.py refactor: extract history + audit + stats + report routes (1.0.0-35) 2026-05-03 09:44:22 -04:00
report.py refactor: extract history + audit + stats + report routes (1.0.0-35) 2026-05-03 09:44:22 -04:00
settings.py refactor: extract settings routes (1.0.0-36) 2026-05-03 09:48:24 -04:00
stats.py refactor: extract history + audit + stats + report routes (1.0.0-35) 2026-05-03 09:44:22 -04:00
system.py infra: rename truenas-burnin → nas-burnin (1.0.0-41) 2026-05-04 07:16:02 -07:00