nas-burnin

Brandon Walter 1bc1b378ab	fix: cancel-mid-stage marks job 'unknown' not 'failed' (1.0.0-51)	2026-05-09 12:32:46 -07:00

Container restarts (uvicorn shutdown / 'docker compose up -d') were silently classifying running burn-ins as 'failed' with empty error_text. Two reasons converged:

1. _stage_surface_validate_ssh caught asyncio.CancelledError at the stage level and returned False, swallowing the cancel signal.
2. _run_job's outer CancelledError handler then never fired, so was_cancelled stayed False and the job got marked 'failed' (the "burn-in itself failed" classification) instead of 'unknown' (the honest "we don't know whether it would have passed").

Fix:

- Stage now does best-effort kill of remote badblocks (shielded so loop shutdown doesn't interrupt the kill), appends an [ABORTED] marker to the log, and re-raises CancelledError. _execute_stages doesn't catch it (CancelledError is BaseException, not Exception in 3.8+) so it propagates up to _run_job.
- _run_job's existing CancelledError handler now also reconciles any stage rows still recorded as 'running' by setting them to 'unknown' with a clear error_text: "Task cancelled mid-run — likely container restart or shutdown". The job's error_text gets the same message so the drawer's Reason block has something specific to display, instead of falling back to the heuristic.

Future container restarts on running burn-ins will now show as yellow "UNKNOWN" with the explicit cancel reason, matching the existing behaviour of check_stuck_jobs() for stuck timeouts.
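The cancel-propagation pattern described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `kill_remote`, `stage`, `execute_stages`, `run_job`, `demo`, the dict-based job record, and the shortened `CANCEL_REASON` string are all hypothetical stand-ins. It only shows the shape of the fix: the stage does a shielded best-effort cleanup and re-raises, the middle layer's `except Exception` lets the cancel through, and the outer handler reconciles state to 'unknown'.

```python
import asyncio

# Shortened stand-in for the real error_text used by the project.
CANCEL_REASON = "Task cancelled mid-run"


async def kill_remote(log):
    # Hypothetical placeholder for the best-effort kill of remote badblocks.
    await asyncio.sleep(0)
    log.append("kill sent")


async def stage(log):
    try:
        await asyncio.sleep(3600)  # stands in for the long-running SSH stage
        return True
    except asyncio.CancelledError:
        try:
            # Shielded so a further cancel during shutdown cannot interrupt the kill.
            await asyncio.shield(kill_remote(log))
        except Exception:
            pass  # best effort only
        log.append("[ABORTED]")
        raise  # re-raise instead of returning False, so the cancel is not swallowed


async def execute_stages(job, log):
    try:
        return await stage(log)
    except Exception as exc:
        # CancelledError derives from BaseException on Python 3.8+,
        # so it is not caught here and propagates to run_job.
        job["error_text"] = str(exc)
        return False


async def run_job(job, log):
    job["stages"] = {"surface_validate": "running"}
    try:
        ok = await execute_stages(job, log)
        job["state"] = "passed" if ok else "failed"
    except asyncio.CancelledError:
        # Reconcile any stage still recorded as running.
        for name in list(job["stages"]):
            if job["stages"][name] == "running":
                job["stages"][name] = "unknown"
        job["state"] = "unknown"
        job["error_text"] = CANCEL_REASON
        raise


async def demo():
    job, log = {}, []
    task = asyncio.create_task(run_job(job, log))
    await asyncio.sleep(0.01)  # give the stage time to start
    task.cancel()              # simulates a container restart or shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return job, log
```

Running `asyncio.run(demo())` in this sketch leaves the job and its stage in the 'unknown' state with the cancel reason set, instead of 'failed' with an empty error_text.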
__init__.py	fix: cancel-mid-stage marks job 'unknown' not 'failed' (1.0.0-51)	2026-05-09 12:32:46 -07:00
_common.py	feat: phase caption + bad-block badge + per-pattern history (1.0.0-47)	2026-05-08 23:23:02 -07:00
kill.py	refactor: split burnin.py into a package — extract unlock + kill (1.0.0-30)	2026-05-03 00:44:28 -04:00
stages.py	fix: cancel-mid-stage marks job 'unknown' not 'failed' (1.0.0-51)	2026-05-09 12:32:46 -07:00
unlock.py	refactor: split burnin.py into a package — extract unlock + kill (1.0.0-30)	2026-05-03 00:44:28 -04:00