nas-burnin

Brandon Walter ec636f8f3a		2026-05-14 06:39:33 -04:00
Some checks failed: Security scan / pip-audit, bandit, gitleaks, mypy (push) were all cancelled.

fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60)

Jobs 60-63 ran healthy for 16h, then all 4 died simultaneously with 'database is locked'. The burnin drain used _db(), which set busy_timeout=10000, but:

1. 10s was sometimes too short under heavy contention (4 burn-in drains writing every 5s + poller every 12s + retention scan + auth + lifespan = many concurrent writers).
2. OTHER aiosqlite.connect() sites (poller, retention, auth, mailer, routes/__init__'s SSE, burnin/__init__.py's various helpers, database.get_db) didn't set busy_timeout at all. Without it, SQLite raises 'database is locked' INSTANTLY on any contention, which forced concurrency back onto the drain's connection.

Fix:
- _db() busy_timeout 10000 → 60000 (60s; aggressive but right for this workload: brief contention spikes are normal and waiting beats failing).
- PRAGMA busy_timeout=60000 added on every aiosqlite.connect() site next to the existing PRAGMA calls. Applied via a small Python pass that preserves the original variable name (db / _tdb / src / dst etc.) and indentation.

Same restart sequence applied: rebuild container, reset 4 drives, relaunch via loopback bypass. Jobs 64-67 are now running.

This is auto-restart #2 in 24h. Safety brake at 3.
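
The pattern the commit message describes, setting busy_timeout immediately after every connection is opened, might look roughly like the sketch below. It uses the standard-library sqlite3 module instead of aiosqlite so that it runs without extra dependencies; the PRAGMA statement itself is the same. The helper names connect_with_busy_timeout and current_busy_timeout are hypothetical and are not taken from the repository; only the 60000 ms value comes from the commit message.

```python
import sqlite3

# Value taken from the commit message (60 s). Everything else here is illustrative.
BUSY_TIMEOUT_MS = 60_000


def connect_with_busy_timeout(
    path: str, busy_timeout_ms: int = BUSY_TIMEOUT_MS
) -> sqlite3.Connection:
    """Hypothetical helper: open a connection and set busy_timeout right away."""
    db = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as SQL parameters, so the value is
    # coerced to int and formatted into the statement.
    db.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return db


def current_busy_timeout(db: sqlite3.Connection) -> int:
    """Read back the busy timeout (in milliseconds) for this connection."""
    return db.execute("PRAGMA busy_timeout").fetchone()[0]


if __name__ == "__main__":
    db = connect_with_busy_timeout(":memory:")
    print(current_busy_timeout(db))
    db.close()
```

The busy timeout is a per-connection setting in SQLite, which is consistent with the commit adding the PRAGMA at every connect site rather than in one place. Routing all call sites through a single helper like the one above would be an alternative design, assuming the call sites can share one.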
..
__init__.py	fix: PRAGMA busy_timeout on every SQLite connection (1.0.0-60)	2026-05-14 06:39:33 -04:00
_drives_helpers.py	fix: SMART overlay shows terminal states + reconciles orphans (1.0.0-49)	2026-05-09 11:46:45 -07:00
_helpers.py	feat: rate limiter + mypy + lifecycle tests + routes/ split (1.0.0-33/-34)	2026-05-03 09:29:53 -04:00
audit.py	refactor: extract history + audit + stats + report routes (1.0.0-35)	2026-05-03 09:44:22 -04:00
auth.py	feat: rate limiter + mypy + lifecycle tests + routes/ split (1.0.0-33/-34)	2026-05-03 09:29:53 -04:00
burnin.py	refactor: extract drives + burnin routes (1.0.0-37)	2026-05-03 09:59:15 -04:00
drives.py	feat: prominent failure-reason block + heuristic in drawer (1.0.0-50)	2026-05-09 12:06:11 -07:00
history.py	refactor: extract history + audit + stats + report routes (1.0.0-35)	2026-05-03 09:44:22 -04:00
report.py	refactor: extract history + audit + stats + report routes (1.0.0-35)	2026-05-03 09:44:22 -04:00
settings.py	refactor: extract settings routes (1.0.0-36)	2026-05-03 09:48:24 -04:00
stats.py	refactor: extract history + audit + stats + report routes (1.0.0-35)	2026-05-03 09:44:22 -04:00
system.py	infra: rename truenas-burnin → nas-burnin (1.0.0-41)	2026-05-04 07:16:02 -07:00