nas-burnin/claude-sandbox/truenas-burnin/app/notifier.py
echoparkbaby 3e0000528f TrueNAS Burn-In Dashboard v0.9.0 — Live mode, thermal monitoring, adaptive concurrency
Go live against real TrueNAS SCALE 25.10:
- Remove mock-truenas dependency; mount SSH key as Docker secret
- Filter expired disk records from /api/v2.0/disk (expiretime field)
- Route all SMART operations through SSH (SCALE 25.10 removed REST smart/test endpoint)
- Poll drive temperatures via POST /api/v2.0/disk/temperatures (SCALE-specific)
- Store raw smartctl output in smart_tests.raw_output for proof of test execution
- Fix percent-remaining=0 false jump to 100% on test start
- Fix terminal WebSocket: add mounted key file fallback (/run/secrets/ssh_key)
- Fix WebSocket support: uvicorn → uvicorn[standard] (installs websockets)

HBA/system sensor temps on dashboard:
- SSH to TrueNAS and run sensors -j each poll cycle
- Parse coretemp (CPU package) and pch_* (PCH/chipset — storage I/O proxy)
- Render as compact chips in stats bar, color-coded green/yellow/red
- Live updates via new SSE system-sensors event every 12s
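
The sensor parsing described above could look roughly like the sketch below. It assumes the usual `sensors -j` JSON layout (chip name, then feature label, then `*_input` readings). The function names, the sample payload, and the 70/85 °C color thresholds are hypothetical and are not taken from the dashboard code.

```python
import json


def parse_sensors(raw: str) -> dict[str, float]:
    """Return {"chip:label": temp_c} for the CPU package and PCH chips only."""
    temps: dict[str, float] = {}
    for chip, features in json.loads(raw).items():
        is_cpu = chip.startswith("coretemp")
        is_pch = chip.startswith("pch_")
        if not (is_cpu or is_pch) or not isinstance(features, dict):
            continue
        for label, readings in features.items():
            if not isinstance(readings, dict):
                continue  # skips the "Adapter" string entry
            if is_cpu and not label.startswith("Package id"):
                continue  # keep the package reading, drop per-core readings
            for key, value in readings.items():
                if key.endswith("_input"):
                    temps[f"{chip}:{label}"] = float(value)
    return temps


def chip_color(temp_c: float, warn_c: float = 70.0, crit_c: float = 85.0) -> str:
    """Map a temperature to a chip color (thresholds are assumed values)."""
    if temp_c >= crit_c:
        return "red"
    if temp_c >= warn_c:
        return "yellow"
    return "green"


# Hypothetical sample shaped like `sensors -j` output.
SAMPLE = json.dumps({
    "coretemp-isa-0000": {
        "Adapter": "ISA adapter",
        "Package id 0": {"temp1_input": 52.0, "temp1_max": 80.0},
        "Core 0": {"temp2_input": 49.0},
    },
    "pch_cannonlake-virtual-0": {
        "Adapter": "Virtual device",
        "temp1": {"temp1_input": 61.0},
    },
})

temps = parse_sensors(SAMPLE)
```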

Adaptive concurrency signal:
- Thermal pressure indicator in stats bar: hidden when OK, WARM/HOT when running
  burn-in drives hit temp_warn_c / temp_crit_c thresholds
- Thermal gate in burn-in queue: jobs wait up to 3 min before acquiring semaphore
  slot if running drives are already at warning temp; times out and proceeds
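
A minimal sketch of the thermal gate described above, assuming the queue uses an `asyncio.Semaphore` and can ask for the hottest running drive's temperature through a callable. The names `wait_for_thermal_headroom`, `run_gated`, and `max_running_temp`, and the 5 second poll interval, are hypothetical. Only the 3 minute limit and the "time out and proceed" behavior come from the notes above.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_for_thermal_headroom(
    max_running_temp: Callable[[], Optional[float]],
    warn_c: float,
    timeout_s: float = 180.0,
    poll_s: float = 5.0,
) -> bool:
    """Wait until running drives drop below warn_c or timeout_s elapses.

    Returns True if the drives cooled down, False if the gate timed out.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while True:
        temp = max_running_temp()
        if temp is None or temp < warn_c:
            return True  # nothing running, or below the warning threshold
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False  # still warm, but the gate only delays the job
        await asyncio.sleep(min(poll_s, remaining))


async def run_gated(
    semaphore: asyncio.Semaphore,
    job: Callable[[], Awaitable[str]],
    max_running_temp: Callable[[], Optional[float]],
    warn_c: float,
    timeout_s: float = 180.0,
    poll_s: float = 5.0,
) -> str:
    """Apply the thermal gate, then take a concurrency slot and run the job."""
    await wait_for_thermal_headroom(max_running_temp, warn_c, timeout_s, poll_s)
    async with semaphore:  # proceeds whether the gate passed or timed out
        return await job()
```

Because the gate runs before the semaphore is acquired, a waiting job does not hold a slot while the drives cool.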

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:33:36 -05:00

84 lines
2.5 KiB
Python

"""
Notification dispatcher — webhooks and immediate email alerts.
Called from burnin.py when a job reaches a terminal state (passed/failed).
Webhook fires unconditionally when WEBHOOK_URL is set.
Email alerts fire based on smtp_alert_on_fail / smtp_alert_on_pass settings.
"""

import asyncio
import logging

from app.config import settings

log = logging.getLogger(__name__)


async def notify_job_complete(
    job_id: int,
    devname: str,
    serial: str | None,
    model: str | None,
    state: str,
    profile: str,
    operator: str,
    error_text: str | None,
    bad_blocks: int = 0,
) -> None:
    """Fire all configured notifications for a completed burn-in job."""
    from datetime import datetime, timezone

    tasks = []

    if settings.webhook_url:
        tasks.append(_send_webhook({
            "event": f"burnin_{state}",
            "job_id": job_id,
            "devname": devname,
            "serial": serial,
            "model": model,
            "state": state,
            "profile": profile,
            "operator": operator,
            "error_text": error_text,
            "bad_blocks": bad_blocks,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }))

    if settings.smtp_host:
        should_alert = (
            (state == "failed" and settings.smtp_alert_on_fail) or
            (state == "passed" and settings.smtp_alert_on_pass)
        )
        if should_alert:
            tasks.append(_send_alert_email(job_id, devname, serial, model, state, error_text))

    if not tasks:
        return

    results = await asyncio.gather(*tasks, return_exceptions=True)
    for r in results:
        if isinstance(r, Exception):
            log.error("Notification failed: %s", r, extra={"job_id": job_id, "devname": devname})


async def _send_webhook(payload: dict) -> None:
    import httpx

    async with httpx.AsyncClient(timeout=10.0) as client:
        r = await client.post(settings.webhook_url, json=payload)
        r.raise_for_status()
    log.info(
        "Webhook sent",
        extra={"event": payload.get("event"), "job_id": payload.get("job_id"), "url": settings.webhook_url},
    )


async def _send_alert_email(
    job_id: int,
    devname: str,
    serial: str | None,
    model: str | None,
    state: str,
    error_text: str | None,
) -> None:
    from app import mailer

    await mailer.send_job_alert(job_id, devname, serial, model, state, error_text)
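
The dispatcher above relies on `asyncio.gather(..., return_exceptions=True)` so that a failure in one channel is returned as a value instead of interrupting the other. The standalone sketch below illustrates that behavior with two stand-in coroutines. `fake_webhook` and `fake_email` are hypothetical and do not touch `httpx` or `app.mailer`.

```python
import asyncio


async def fake_webhook() -> None:
    # Stand-in for a webhook POST that fails.
    raise ConnectionError("webhook endpoint unreachable")


async def fake_email() -> str:
    # Stand-in for an email alert that succeeds.
    return "email sent"


async def dispatch() -> list:
    # With return_exceptions=True, exceptions are collected in the result
    # list in task order rather than raised out of gather().
    return await asyncio.gather(fake_webhook(), fake_email(), return_exceptions=True)


results = asyncio.run(dispatch())
failures = [r for r in results if isinstance(r, Exception)]
for failure in failures:
    print(f"Notification failed: {failure}")
```

In this sketch the email coroutine still completes even though the webhook coroutine raised, which is the property `notify_job_complete` depends on when it logs each failure separately.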