truenas-burnin/app/settings_store.py
Brandon Walter 2dff58bd52 Stage 7: SSH architecture, SMART attribute monitoring, drive reset, and polish
SSH (app/ssh_client.py — new):
- asyncssh-based client: start_smart_test, poll_smart_progress, abort_smart_test,
  get_smart_attributes, run_badblocks with streaming progress callbacks
- SMART attribute table: monitors attrs 5/10/188/197/198/199 for warn/fail thresholds
- Falls back to REST API / mock simulation when ssh_host is not configured

Burn-in stages updated (burnin.py):
- _stage_smart_test: SSH path polls smartctl -a, stores raw output + parsed attributes
- _stage_surface_validate: SSH path streams badblocks, counts bad blocks vs configurable threshold
- _stage_final_check: SSH path checks smartctl attributes; DB fallback for mock mode
- New DB helpers: _append_stage_log, _update_stage_bad_blocks, _store_smart_attrs,
  _store_smart_raw_output

Database (database.py):
- Migrations: burnin_stages.log_text, burnin_stages.bad_blocks,
  drives.smart_attrs (JSON), smart_tests.raw_output

Settings (config.py + settings_store.py):
- ssh_host, ssh_port, ssh_user, ssh_password, ssh_key — all runtime-editable
- SSH section in Settings UI with Test SSH Connection button

Webhook (notifier.py):
- Added bad_blocks and timestamp fields to payload per SPEC

Drive reset (routes.py + drives_table.html):
- POST /api/v1/drives/{id}/reset — clears SMART state, smart_attrs; audit logged
- Reset button visible on drives with completed test state (no active burn-in)

Log drawer (app.js):
- Burn-In tab: shows raw stage log_text (SSH output) with bad block highlighting
- SMART tab: shows SMART attribute table with warn/fail colouring + raw smartctl output

Polish:
- Version badge (v1.0.0-6d) in header via Jinja2 global
- Parallel burn-in warning when max_parallel_burnins > 8 in Settings
- Stats page: avg duration by drive size + failure breakdown by stage
- settings.html: SSH section with key textarea, parallel warn div

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 08:09:30 -05:00

140 lines
5.3 KiB
Python

"""
Runtime settings store — persists editable settings to /data/settings_overrides.json.
Changes take effect immediately (in-memory setattr on the global Settings object)
and survive restarts (JSON file is loaded in main.py lifespan).
System settings (TrueNAS URL, poll interval, etc.) are saved to JSON but require
a container restart to fully take effect (clients/middleware are initialized at boot).
"""
import json
import logging
from pathlib import Path

from app.config import settings

log = logging.getLogger(__name__)

# Field name → coerce function. Only fields listed here are accepted by save().
_EDITABLE: dict[str, type] = {
    # Email / SMTP
    "smtp_host": str,
    "smtp_ssl_mode": str,
    "smtp_timeout": int,
    "smtp_user": str,
    "smtp_password": str,
    "smtp_from": str,
    "smtp_to": str,
    "smtp_daily_report_enabled": bool,
    "smtp_report_hour": int,
    "smtp_alert_on_fail": bool,
    "smtp_alert_on_pass": bool,
    # Webhook
    "webhook_url": str,
    # Burn-in behaviour
    "stuck_job_hours": int,
    "max_parallel_burnins": int,
    "temp_warn_c": int,
    "temp_crit_c": int,
    "bad_block_threshold": int,
    # SSH credentials — take effect immediately (each connection reads live settings)
    "ssh_host": str,
    "ssh_port": int,
    "ssh_user": str,
    "ssh_password": str,
    "ssh_key": str,
    # System settings — saved to JSON; require container restart to fully apply
    "truenas_base_url": str,
    "truenas_api_key": str,
    "truenas_verify_tls": bool,
    "poll_interval_seconds": int,
    "stale_threshold_seconds": int,
    "allowed_ips": str,
    "log_level": str,
}

_VALID_SSL_MODES = {"starttls", "ssl", "plain"}
_VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def _overrides_path() -> Path:
    return Path(settings.db_path).parent / "settings_overrides.json"


def _coerce(key: str, raw) -> object:
    coerce = _EDITABLE[key]
    if coerce is bool:
        # Booleans may arrive as real bools or as form strings such as "on" / "true".
        if isinstance(raw, bool):
            return raw
        return str(raw).lower() in ("1", "true", "yes", "on")
    return coerce(raw)


def _apply(data: dict) -> None:
    """Apply a dict of updates to the live settings object."""
    for key, raw in data.items():
        if key not in _EDITABLE:
            continue
        try:
            val = _coerce(key, raw)
            if key == "smtp_ssl_mode" and val not in _VALID_SSL_MODES:
                log.warning("settings_store: invalid smtp_ssl_mode %r — ignoring", val)
                continue
            if key == "smtp_report_hour" and not (0 <= int(val) <= 23):
                log.warning("settings_store: smtp_report_hour out of range — ignoring")
                continue
            if key == "log_level" and val not in _VALID_LOG_LEVELS:
                log.warning("settings_store: invalid log_level %r — ignoring", val)
                continue
            if key in ("poll_interval_seconds", "stale_threshold_seconds") and int(val) < 1:
                log.warning("settings_store: %s must be >= 1 — ignoring", key)
                continue
            if key in ("temp_warn_c", "temp_crit_c") and not (20 <= int(val) <= 80):
                log.warning("settings_store: %s out of range (20-80) — ignoring", key)
                continue
            if key == "bad_block_threshold" and int(val) < 0:
                log.warning("settings_store: bad_block_threshold must be >= 0 — ignoring")
                continue
            if key == "ssh_port" and not (1 <= int(val) <= 65535):
                log.warning("settings_store: ssh_port out of range — ignoring")
                continue
            setattr(settings, key, val)
        except (ValueError, TypeError) as exc:
            log.warning("settings_store: invalid value for %s: %s", key, exc)


def init() -> None:
    """Load persisted overrides at startup. Call once from lifespan."""
    path = _overrides_path()
    if not path.exists():
        return
    try:
        data = json.loads(path.read_text())
        _apply(data)
        log.info("settings_store: loaded %d override(s) from %s", len(data), path)
    except Exception as exc:
        log.warning("settings_store: could not load overrides from %s: %s", path, exc)


def save(updates: dict) -> list[str]:
    """
    Validate, apply, and persist a dict of settings updates.
    Returns the list of keys that were accepted for processing.
    Raises ValueError for unknown or non-editable fields.
    Values that fail validation in _apply() are logged and skipped rather than
    raised, so their keys still appear in the returned list.
    """
    accepted: dict = {}
    for key, raw in updates.items():
        if key not in _EDITABLE:
            raise ValueError(f"Unknown or non-editable setting: {key!r}")
        accepted[key] = raw
    _apply(accepted)
    # Persist ALL currently-applied editable values (not just the delta)
    snapshot = {k: getattr(settings, k) for k in _EDITABLE}
    path = _overrides_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(snapshot, indent=2))
    log.info("settings_store: saved %d key(s) — snapshot written to %s", len(accepted), path)
    return list(accepted.keys())
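
The coerce, validate, apply, and snapshot flow used by save() can be tried in isolation. The following is a minimal sketch, not the module itself: `DemoSettings`, `EDITABLE`, `demo_save`, and `run_demo` are hypothetical stand-ins introduced for illustration, only three of the real fields are modelled, and the overrides file is written to a temporary directory rather than next to the database.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DemoSettings:
    # Hypothetical stand-in for app.config.settings (assumption: three fields only).
    smtp_report_hour: int = 8
    smtp_alert_on_fail: bool = True
    ssh_port: int = 22


# Field name -> coerce function, mirroring the shape of _EDITABLE.
EDITABLE = {"smtp_report_hour": int, "smtp_alert_on_fail": bool, "ssh_port": int}


def coerce(key, raw):
    """Coerce a raw (often string) value to the declared type for key."""
    kind = EDITABLE[key]
    if kind is bool:
        if isinstance(raw, bool):
            return raw
        return str(raw).lower() in ("1", "true", "yes", "on")
    return kind(raw)


def demo_save(live, updates, path):
    """Reject unknown keys, skip invalid values, then persist a full snapshot."""
    for key in updates:
        if key not in EDITABLE:
            raise ValueError(f"Unknown or non-editable setting: {key!r}")
    for key, raw in updates.items():
        try:
            val = coerce(key, raw)
        except (ValueError, TypeError):
            continue  # uncoercible value: leave the live setting untouched
        if key == "smtp_report_hour" and not (0 <= val <= 23):
            continue  # out of range: leave the live setting untouched
        if key == "ssh_port" and not (1 <= val <= 65535):
            continue
        setattr(live, key, val)
    # Write every editable value, not just the keys that changed.
    snapshot = {k: getattr(live, k) for k in EDITABLE}
    path.write_text(json.dumps(snapshot, indent=2))
    return snapshot


def run_demo():
    live = DemoSettings()
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "settings_overrides.json"
        updates = {"smtp_alert_on_fail": "off", "ssh_port": "2222", "smtp_report_hour": "99"}
        demo_save(live, updates, path)
        return json.loads(path.read_text())


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the out-of-range hour "99" is skipped, so the snapshot keeps the previous value 8, while "off" and "2222" are coerced to False and 2222. Writing the full snapshot rather than the delta means the file always reflects the complete live state, at the cost of also persisting values that were never overridden.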