fix: address Codex audit findings (1.0.0-28)

Addresses 12 of 13 findings from the Codex tech-debt + security review
of versions 1.0.0-22 through 1.0.0-27. Item #5 (live pool re-check
before start_job) deferred — would add an SSH round-trip per start.

#1  Pool detection now treats zpool / lsblk / findmnt failures
    INDEPENDENTLY. Previously a None from any one probe discarded the
    whole map, so a host where lsblk lacks zfs_member info but zpool
    works would never lock pool members. Extended findmnt parser to
    recognise /dev/mapper/*, /dev/dm-*, /dev/md*, /dev/da*, /dev/ada*
    (LVM, device-mapper, MD RAID, TrueNAS CORE / FreeBSD devnames).
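
A minimal sketch of the per-probe merge described in #1, assuming each probe returns None on failure (as in the diff). The name `merge_probe_results` and the returned status dict are hypothetical, not the project's API.

```python
from typing import Optional


def merge_probe_results(pool: Optional[dict],
                        zfs_members: Optional[set],
                        mounted: Optional[set]) -> tuple:
    """Merge whichever probes succeeded; None marks a failed probe."""
    status = {
        "pool": pool is not None,
        "zfs": zfs_members is not None,
        "mounted": mounted is not None,
    }
    pool_map: dict = {}
    if pool is not None:
        pool_map.update(pool)
    # Lower-precedence sources only fill in devices not already mapped.
    if zfs_members is not None:
        for dev in sorted(zfs_members):
            pool_map.setdefault(dev, {"pool": "(exported)", "role": "exported"})
    if mounted is not None:
        for dev in sorted(mounted):
            pool_map.setdefault(dev, {"pool": "(mounted)", "role": "mounted"})
    return pool_map, status
```

A failed probe contributes nothing to the map and shows up as False in the status dict, so the caller can decide which stale locks to preserve.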

#2  Admin role enforced on the settings page and every settings
    mutation. New auth.require_admin() helper applied to GET /settings,
    POST /api/v1/settings, POST /api/v1/settings/test-smtp and
    POST /api/v1/settings/test-ssh. Previously any authenticated user
    (the CLI explicitly supports non-admin accounts) could rewrite
    SMTP/SSH/API secrets.

#3  First-user setup race closed. auth.create_user() now accepts
    bootstrap_only=True which wraps the existence check + insert in
    BEGIN IMMEDIATE so two concurrent /api/v1/auth/setup requests
    can't both create admin accounts during the bootstrap window.
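
A sketch of the check-then-insert under BEGIN IMMEDIATE from #3. It uses the stdlib sqlite3 module instead of aiosqlite so it runs on its own; `open_db`, `create_first_admin` and the two-column users table are assumptions made for illustration.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to the explicit
    # BEGIN / COMMIT / ROLLBACK statements below.
    return sqlite3.connect(path, isolation_level=None)


def create_first_admin(conn: sqlite3.Connection, username: str) -> int:
    """Insert the first admin, or raise ValueError if any user exists."""
    # IMMEDIATE takes the write lock before the COUNT, so a second
    # connection cannot pass the emptiness check in between.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
        if count != 0:
            raise ValueError("Users already exist; first-user setup is closed.")
        cur = conn.execute(
            "INSERT INTO users (username, is_admin) VALUES (?, 1)",
            (username,),
        )
        conn.execute("COMMIT")
        return cur.lastrowid
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

With a plain deferred transaction the lock is only taken at the first write, which is what leaves room for two requests to both see an empty table.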

#4  Case-insensitive uniqueness enforced via new
    `uniq_users_username_nocase` index. Login does NOCASE lookup so
    without this `Admin` and `admin` could coexist as distinct rows.
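
The effect of the index in #4 can be reproduced with stdlib sqlite3. The index statement is the one from the migration; the two-column table is a simplified stand-in for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real users table.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE UNIQUE INDEX IF NOT EXISTS uniq_users_username_nocase
       ON users (username COLLATE NOCASE)"""
)
conn.execute("INSERT INTO users (username) VALUES ('admin')")

try:
    # Differs only by case: allowed by the base UNIQUE constraint,
    # rejected by the NOCASE index.
    conn.execute("INSERT INTO users (username) VALUES ('Admin')")
    collided = False
except sqlite3.IntegrityError:
    collided = True
```

Creating a unique index fails if existing rows already violate it, so a database that already holds case-variant duplicates would need them resolved before this migration can apply.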

#6  New `session_cookie_secure` setting (default False for LAN/dev
    deploys, set True in production behind HTTPS) flips the session
    cookie's Secure flag. Defends against on-the-wire exposure when
    the dashboard is reachable over plain HTTP.
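
To show what the Secure flag from #6 changes on the wire, here is a sketch that builds the Set-Cookie value with stdlib http.cookies. The application itself passes the setting to its session middleware's `https_only` argument; `build_session_cookie` is a hypothetical helper for illustration only.

```python
from http.cookies import SimpleCookie


def build_session_cookie(value: str, secure: bool) -> str:
    """Return the Set-Cookie value for a session cookie."""
    cookie = SimpleCookie()
    cookie["burnin_session"] = value
    cookie["burnin_session"]["httponly"] = True
    cookie["burnin_session"]["samesite"] = "Strict"
    if secure:
        # Browsers only send Secure cookies over HTTPS connections.
        cookie["burnin_session"]["secure"] = True
    return cookie["burnin_session"].OutputString()
```

With `secure=True` the browser withholds the cookie on plain-HTTP requests, which is why the default stays False for LAN and dev deployments without TLS.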

#7  Audit trail bound to authenticated identity. Burn-in start /
    cancel / unlock / drive reset all now use `_operator_for(request)`
    which reads `request.state.current_user.full_name|username`
    instead of the body's operator field. Logged-in users can no
    longer spoof attribution. Drive reset's literal-"operator"
    fallback (window._operator was never set) is also fixed by this.

#8  Login rate-limit race fixed. New `register_login_attempt()` is an
    atomic check-AND-increment in synchronous code (no awaits inside),
    so a parallel burst can't slip past the threshold.
    `record_login_failure()` removed; `clear_login_failures()` now
    also drops any active lockout after a successful auth. Pre-existing
    bug where `tripped` was always False (so user_login_locked_out
    audit events never fired) also fixed.
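
A single-key sketch of the check-and-increment from #8, with an injectable clock so it can be exercised deterministically. The threshold, window and lockout values, and the names `register_attempt` / `clear_attempts`, are assumptions for the example rather than the project's constants.

```python
import time
from typing import Optional

THRESHOLD = 3            # attempts before lockout (illustrative)
WINDOW_SECONDS = 300.0   # how long an attempt counts (illustrative)
LOCKOUT_SECONDS = 60.0   # lockout duration (illustrative)

_failures: dict = {}
_lockouts: dict = {}


def register_attempt(key: str, now: Optional[float] = None) -> str:
    """Check and increment in one synchronous step (no awaits)."""
    now = time.time() if now is None else now
    expires = _lockouts.get(key)
    if expires is not None:
        if now < expires:
            return "locked_out"
        del _lockouts[key]  # lockout has expired
    recent = [t for t in _failures.get(key, []) if now - t < WINDOW_SECONDS]
    recent.append(now)
    if len(recent) >= THRESHOLD:
        _lockouts[key] = now + LOCKOUT_SECONDS
        _failures[key] = []  # counter resets once the lockout is armed
        return "now_locked_out"
    _failures[key] = recent
    return "ok"


def clear_attempts(key: str) -> None:
    """Called after a successful login: drop counters and any lockout."""
    _failures.pop(key, None)
    _lockouts.pop(key, None)
```

Because the function never yields to the event loop, two coroutines cannot interleave between the check and the increment.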

#9  NVMe surface_validate post-format check now mirrors the SSH path:
    fails on FAILED health AND on real SMART attribute failures,
    soft-passes SSH-only failures (logged), surfaces warnings to the
    stage log without failing.
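
The decision table in #9 can be written as a pure function. This is a sketch assuming the attribute dict has the `health`, `failures` and `warnings` keys seen in the diff; `classify_smart_result` is a hypothetical name.

```python
def classify_smart_result(attrs: dict) -> tuple:
    """Return (passed, messages) for a post-format SMART check."""
    failures = attrs.get("failures") or []
    ssh_only = [f for f in failures if f.startswith("SSH error:")]
    real = [f for f in failures if not f.startswith("SSH error:")]

    if attrs.get("health") == "FAILED":
        return False, ["NVMe SMART health FAILED after format"]
    if real:
        return False, ["SMART attribute failures: " + "; ".join(real)]

    # Everything below is logged but does not fail the stage.
    messages = []
    if ssh_only:
        messages.append("[WARN] SSH errors (soft-passing): " + "; ".join(ssh_only))
    if attrs.get("warnings"):
        messages.append("[WARN] " + "; ".join(attrs["warnings"]))
    return True, messages
```

Keeping the classification free of I/O makes the hard-fail versus soft-pass rules easy to unit test.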

#10 retention.backup_db() now writes to `.tmp` then atomic-renames
    into the canonical daily slot — an interrupted backup leaves the
    tmp behind but doesn't corrupt the real snapshot. Scheduler marks
    last_run_date only on (prune AND backup) success so a transient
    failure gets retried within the 03:00 hour.
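
A sketch of the write-to-tmp-then-rename pattern from #10, using stdlib sqlite3 in place of aiosqlite. `backup_atomic` and `demo` are hypothetical names, and the demo database is created only for the example.

```python
import os
import sqlite3
import tempfile


def backup_atomic(src_path: str, out_path: str) -> str:
    """Snapshot src_path into out_path via a sibling .tmp file."""
    tmp_path = out_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # leftover from an interrupted run
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(tmp_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    # Same directory, so the rename stays on one filesystem.
    os.replace(tmp_path, out_path)
    return out_path


def demo() -> int:
    """Back up a throwaway DB and return the row count in the copy."""
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "app.db")
        conn = sqlite3.connect(live)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        conn.close()
        out = backup_atomic(live, os.path.join(d, "app-backup.db"))
        copy = sqlite3.connect(out)
        try:
            return copy.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            copy.close()
```

If the process dies during `src.backup(dst)`, only the `.tmp` file is affected and any existing file at `out_path` is left as it was.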

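#11 /health DB probe now performs a write via a temp-table
    INSERT/SELECT/COMMIT round-trip. Previously only read PRAGMA
    journal_mode + a row count, which silently passes on read-only
    mounts and broken-WAL conditions. Caveat: SQLite stores TEMP
    tables in its separate temp database, so this round-trip may not
    touch the main database file or its WAL.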
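
A minimal sketch of the temp-table round-trip described in #11, using stdlib sqlite3 rather than aiosqlite. The function name `db_write_probe` is an assumption; the SQL matches the statements in the diff.

```python
import sqlite3
from datetime import datetime, timezone


def db_write_probe(conn: sqlite3.Connection) -> dict:
    """INSERT, SELECT and COMMIT against a connection-local temp table."""
    try:
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS _hc (k INTEGER PRIMARY KEY, v TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO _hc (k, v) VALUES (1, ?)",
            (datetime.now(timezone.utc).isoformat(),),
        )
        row = conn.execute("SELECT v FROM _hc WHERE k = 1").fetchone()
        conn.commit()
        return {"ok": row is not None}
    except sqlite3.Error as exc:
        return {"ok": False, "error": str(exc)}
```

The `_hc` table is visible only to the connection that created it and disappears when that connection closes, so repeated probes leave nothing behind in the application tables.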

#12 security-scan.sh now fails loudly if `git fetch` or
    `git reset --hard origin/main` errors (was `|| true`, scanning
    stale code silently). pip-audit now runs in a throwaway
    python:3.12-slim container against requirements.txt instead of
    `docker exec`-ing into the live truenas-burnin container —
    cleaner separation, no transient package install on prod.

#13 Badblocks SSH stage no longer doubles its log_text. Previously
    appended every 20-line chunk during streaming AND the full
    accumulated output at end. Now only flushes the un-flushed tail
    (typically <20 lines). `result["output"]` stays in-memory only.
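
The tail computation in #13 is plain integer arithmetic. A sketch, assuming the streaming loop flushes exactly one chunk every 20 lines starting from the first line; `unflushed_tail` is a hypothetical helper name.

```python
CHUNK = 20  # lines per streamed flush, as in the diff


def unflushed_tail(output_lines: list, chunk: int = CHUNK) -> str:
    """Return only the lines not yet written by the chunked flushes."""
    flushed_count = (len(output_lines) // chunk) * chunk
    return "".join(output_lines[flushed_count:])
```

For 45 accumulated lines, 40 have already been flushed and the tail holds the last 5; for exactly 40 lines the tail is empty and nothing more is appended.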

Verification: all 44 unit tests pass in container; /health 200;
security scan returns 0 findings; deployed maple build is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brandon Walter 2026-05-02 18:48:16 -04:00
parent 3a9bdc9e15
commit 066fbbc403
10 changed files with 298 additions and 94 deletions

@ -136,8 +136,15 @@ async def get_user_by_id(user_id: int) -> User | None:
async def create_user(username: str, password: str,
full_name: str | None = None,
is_admin: bool = False) -> User:
"""Insert a new user. Raises ValueError if the username collides."""
is_admin: bool = False,
bootstrap_only: bool = False) -> User:
"""Insert a new user. Raises ValueError if the username collides.
bootstrap_only=True: serializes the insert with a check that the
users table is empty inside an IMMEDIATE transaction. Used for the
/api/v1/auth/setup first-user flow so two concurrent requests can't
both create admin accounts during the bootstrap window.
"""
username = (username or "").strip()
if not username:
raise ValueError("Username is required.")
@ -146,6 +153,16 @@ async def create_user(username: str, password: str,
h = hash_password(password)
try:
async with aiosqlite.connect(settings.db_path) as db:
if bootstrap_only:
# IMMEDIATE acquires the write lock up-front so a parallel
# setup request waits or fails — no two-step race.
await db.execute("BEGIN IMMEDIATE")
cur = await db.execute("SELECT COUNT(*) FROM users")
if (await cur.fetchone())[0] != 0:
await db.execute("ROLLBACK")
raise ValueError(
"Users already exist — first-user setup is closed."
)
cur = await db.execute(
"""INSERT INTO users
(username, password_hash, full_name, is_admin, created_at)
@ -237,23 +254,48 @@ def login_locked_until(username: str, ip: str) -> float | None:
return soonest
def record_login_failure(username: str, ip: str) -> bool:
"""Returns True if this failure tripped a lockout."""
tripped = False
def register_login_attempt(username: str, ip: str) -> str:
"""Atomic check-then-increment for a login attempt.
Returns:
"ok" allowed, counter incremented
"locked_out" already locked from a prior attempt
"now_locked_out" THIS attempt is what tripped the lockout
The increment runs synchronously (no awaits) so concurrent requests
can't slip past the threshold in CPython's single-threaded asyncio
loop. Caller must invoke clear_login_failures() on successful auth
to roll back this attempt's contribution.
"""
now = _time.time()
# Check existing lockouts first; if already locked, don't even
# increment — the lockout window absorbs everything.
for key in (("user", username.lower()), ("ip", ip)):
exp = _login_lockouts.get(key)
if exp is None:
continue
if now >= exp:
del _login_lockouts[key]
continue
return "locked_out"
# Increment + arm lockout if this push crosses the threshold.
tripped = False
for key in (("user", username.lower()), ("ip", ip)):
_gc_failures(key)
_login_failures.setdefault(key, []).append(now)
if len(_login_failures[key]) >= LOGIN_FAILURE_THRESHOLD:
_login_lockouts[key] = now + LOGIN_LOCKOUT_SECONDS
_login_failures[key] = [] # reset counter once lockout armed
_login_failures[key] = []
tripped = True
return tripped
return "now_locked_out" if tripped else "ok"
def clear_login_failures(username: str, ip: str) -> None:
"""Erase counters AND any lockout for a successful auth — caller
proved they have credentials, so the brute-force ladder resets."""
for key in (("user", username.lower()), ("ip", ip)):
_login_failures.pop(key, None)
_login_lockouts.pop(key, None)
# ---------------------------------------------------------------------------
@ -309,6 +351,24 @@ async def get_current_user_optional(request: Request) -> User | None:
return await get_user_by_id(int(sess_user_id))
def require_admin(request: Request) -> User:
"""Strict admin gate — for any settings-mutating endpoint. The
AuthGate middleware has already populated request.state.current_user;
this just enforces is_admin on top."""
user = getattr(request.state, "current_user", None)
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Authentication required",
)
if not user.is_admin:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Admin only",
)
return user
async def get_current_user(request: Request) -> User:
"""Strict version — for routes. 401 (or redirect for HTML) if missing."""
user = await get_current_user_optional(request)

@ -952,17 +952,49 @@ async def _stage_surface_validate_nvme(job_id: int, devname: str,
)
return False
# Sanity-check post-format SMART health.
# Sanity-check post-format SMART health. Mirrors the surface_validate
# SSH path's check parity — fail on FAILED health, fail on real
# SMART attribute failures, log warnings but don't fail. A transport
# error here is treated as a soft pass (log + continue) so a single
# SSH blip after a successful format doesn't undo the work.
try:
attrs = await ssh_client.get_smart_attributes(devname)
ssh_only_failures = [
f for f in (attrs.get("failures") or []) if f.startswith("SSH error:")
]
real_failures = [
f for f in (attrs.get("failures") or []) if not f.startswith("SSH error:")
]
if attrs.get("health") == "FAILED":
await _set_stage_error(
job_id, "surface_validate",
"NVMe SMART health FAILED after format"
"NVMe SMART health FAILED after format",
)
return False
if real_failures:
await _set_stage_error(
job_id, "surface_validate",
"NVMe SMART attribute failures after format: "
+ "; ".join(real_failures),
)
return False
if ssh_only_failures:
await _append_stage_log(
job_id, "surface_validate",
"[WARN] post-format SMART check had SSH errors "
"(soft-passing): " + "; ".join(ssh_only_failures) + "\n",
)
if attrs.get("warnings"):
await _append_stage_log(
job_id, "surface_validate",
"[WARN] " + "; ".join(attrs["warnings"]) + "\n",
)
except Exception as exc:
log.warning("Post-format SMART check error on %s: %s", devname, exc)
await _append_stage_log(
job_id, "surface_validate",
f"[WARN] post-format SMART check raised: {exc}\n",
)
await _update_stage_percent(job_id, "surface_validate", 100)
await _recalculate_progress(job_id)
@ -1116,11 +1148,16 @@ async def _stage_surface_validate_ssh(job_id: int, devname: str, drive_id: int)
job_id,
)
# Flush remaining output
remainder = "".join(output_lines)
await _append_stage_log(job_id, "surface_validate", remainder)
# Flush only lines we haven't already written in 20-line chunks.
# Previously we appended the FULL accumulated output here too,
# doubling the stored log_text size for every surface_validate
# stage and pushing app.db into hundreds of MB.
flushed_count = (len(output_lines) // 20) * 20
tail = "".join(output_lines[flushed_count:])
if tail:
await _append_stage_log(job_id, "surface_validate", tail)
result["bad_blocks"] = bad_blocks_total
result["output"] = remainder
result["output"] = "".join(output_lines) # in-memory only, not re-stored
result["aborted"] = bad_blocks_total > settings.bad_block_threshold
except asyncio.CancelledError:

@ -83,7 +83,7 @@ class Settings(BaseSettings):
ssh_key: str = "" # PEM private key content (paste full key including headers)
# Application version — used by the /api/v1/updates/check endpoint
app_version: str = "1.0.0-27"
app_version: str = "1.0.0-28"
# ---- Authentication (1.0.0-22) ----
# session_secret: HMAC key for signing session cookies. Empty = generate
@ -92,6 +92,11 @@ class Settings(BaseSettings):
# SESSION_SECRET env var if you want to share secrets across replicas.
session_secret: str = ""
session_max_age_seconds: int = 60 * 60 * 24 * 7 # 7 days
# Set to True when the dashboard is exclusively reachable over HTTPS
# (typical when fronted by nginx-proxy-manager with TLS). Refuses to
# send the session cookie on plain HTTP, eliminating the on-the-wire
# exposure surface. Leaving False allows initial deploy + LAN testing.
session_cookie_secure: bool = False
# Initial admin bootstrap. If both env vars are set AND the users table
# is empty at startup, create that account immediately. After that the
# env vars are ignored — change passwords via the UI / database, not

@ -109,6 +109,11 @@ _MIGRATIONS = [
created_at TEXT NOT NULL,
last_login_at TEXT
)""",
# 1.0.0-28: case-insensitive uniqueness. The base UNIQUE on username
# is case-sensitive but login does NOCASE — without this index two
# users `Admin` and `admin` could coexist and shadow each other.
"""CREATE UNIQUE INDEX IF NOT EXISTS uniq_users_username_nocase
ON users (username COLLATE NOCASE)""",
]

@ -213,7 +213,10 @@ app.add_middleware(
secret_key=auth.get_session_secret(),
session_cookie="burnin_session",
max_age=settings.session_max_age_seconds,
https_only=False, # we sit behind nginx-proxy-manager; trust upstream
# session_cookie_secure flips the cookie's Secure flag. Set to True
# in production behind HTTPS (nginx-proxy-manager) so the auth cookie
# is never sent on plain HTTP.
https_only=settings.session_cookie_secure,
# SameSite=strict is the primary CSRF mitigation: the browser never
# sends the session cookie on cross-site requests, so an attacker
# page can't trigger any state-changing endpoint even if it knows

@ -375,52 +375,54 @@ async def poll_cycle(client: TrueNASClient) -> int:
# locked, and previously-unlocked drives stay unlocked, until detection
# recovers. Treating a transient SSH blip as "no pool members" would
# silently unlock every drive on the next poll.
detection_ok = True
# Each detection probe (pool / exported / mounted) succeeds or fails
# INDEPENDENTLY. Previously a single None blew away the whole map,
# so a fresh DB on a host where lsblk lacks zfs_member info but
# zpool works would never lock pool members. Now we apply each
# successful probe and only fail-closed for the ones that actually
# errored.
pool_map: dict = {}
zfs_member_set: set = set()
mounted_set: set = set()
pool_probe_ok = True # zpool list -vHP succeeded
zfs_probe_ok = True # lsblk zfs_member succeeded
mounted_probe_ok = True # findmnt succeeded
try:
from app import ssh_client as _ssh
if _ssh.is_configured():
pm = await _ssh.get_pool_membership()
zs = await _ssh.get_zfs_member_drives()
ms = await _ssh.get_mounted_drives()
if pm is None or zs is None or ms is None:
detection_ok = False
else:
pool_map = pm
zfs_member_set = zs
mounted_set = ms
# SSH unconfigured (mock/dev mode) — detection_ok stays True with
pool_probe_ok = pm is not None
zfs_probe_ok = zs is not None
mounted_probe_ok = ms is not None
if pool_probe_ok:
pool_map.update(pm)
if zfs_probe_ok:
for devname in zs:
if devname not in pool_map:
pool_map[devname] = {"pool": "(exported)", "role": "exported"}
if mounted_probe_ok:
for devname in ms:
if devname not in pool_map:
pool_map[devname] = {"pool": "(mounted)", "role": "mounted"}
# SSH unconfigured (mock/dev mode) — all probes "succeed" with
# empty maps, so dev mode never artificially locks drives.
except Exception:
detection_ok = False
pool_probe_ok = zfs_probe_ok = mounted_probe_ok = False
pool_map = {}
if not detection_ok:
# If ALL probes failed we have no fresh data at all — preserve the
# existing pool columns to keep locks honest. If at least one probe
# succeeded the new pool_map is a partial truth: we apply it and
# only refuse to clear locks coming from a probe that failed.
detection_ok = pool_probe_ok or zfs_probe_ok or mounted_probe_ok
if not (pool_probe_ok and zfs_probe_ok and mounted_probe_ok):
log.warning(
"Pool detection failed this cycle — preserving existing "
"pool_name/pool_role columns. Locked drives stay locked, "
"unlocked drives stay unlocked, until SSH recovers."
"Pool detection partial: pool=%s zfs=%s mounted=%s — preserving "
"stale lock state from any probe that failed.",
pool_probe_ok, zfs_probe_ok, mounted_probe_ok,
)
if detection_ok:
# Drives carrying ZFS labels but not in any active pool are
# "exported" — same hazard as an active pool member, so lock them
# too. We don't know the original pool name without
# `zpool import`-style scanning (slow + blocks); display
# "(exported)" and use a special token.
for devname in zfs_member_set:
if devname not in pool_map:
pool_map[devname] = {"pool": "(exported)", "role": "exported"}
# Drives with a non-ZFS mount somewhere (XFS/ext4/scratch/etc.)
# also lock — wiping a mounted FS is just as catastrophic. Lower
# precedence than active pool membership, since a drive in `tank`
# would also show under findmnt for the pool's mountpoint via
# /dev/zd* or zvol — but those are filtered in the parser.
for devname in mounted_set:
if devname not in pool_map:
pool_map[devname] = {"pool": "(mounted)", "role": "mounted"}
# Index running jobs by (devname, test_type)
active: dict[tuple[str, str], dict] = {}
for job in running_jobs:

@ -74,19 +74,36 @@ def _backup_dir() -> Path:
async def backup_db(keep_count: int) -> Path | None:
"""Online-backup the live DB to backups/app-YYYY-MM-DD.db. Returns
the new file's path. Old backups beyond keep_count are deleted."""
the new file's path. Old backups beyond keep_count are deleted.
Atomicity: writes to a sibling tmp file first and renames into the
canonical daily slot only after backup succeeds. An interrupted
backup leaves the tmp file (cleaned up on next run); the previous
day's snapshot stays intact. os.replace is atomic within the same
filesystem on POSIX.
"""
import os as _os
bdir = _backup_dir()
bdir.mkdir(parents=True, exist_ok=True)
today = datetime.now().strftime("%Y-%m-%d")
out = bdir / f"app-{today}.db"
tmp = bdir / f"app-{today}.db.tmp"
# Drop any leftover tmp from a previous interrupted run.
if tmp.exists():
try:
tmp.unlink()
except OSError:
pass
# aiosqlite.Connection.backup() is an async wrapper around
# sqlite3.Connection.backup — atomic online snapshot that doesn't
# block writers (it copies pages in batches and yields between).
async with aiosqlite.connect(settings.db_path) as src:
async with aiosqlite.connect(str(out)) as dst:
async with aiosqlite.connect(str(tmp)) as dst:
await src.backup(dst)
_os.replace(tmp, out)
log.info("Retention: DB backed up to %s (%d bytes)", out, out.stat().st_size)
# Keep the N most recent backups; delete older.
@ -124,17 +141,26 @@ async def run() -> None:
now = datetime.now()
today = now.strftime("%Y-%m-%d")
if now.hour == _RUN_HOUR and _state["last_run_date"] != today:
_state["last_run_date"] = today
# Track prune + backup success independently. Mark the
# day "done" only when BOTH succeed so a transient
# failure gets retried on the next 5-min tick (still
# within the 03:00 hour).
prune_ok = False
backup_ok = False
try:
pruned = await prune_stage_logs(settings.retention_log_days)
if pruned:
await vacuum_db()
prune_ok = True
except Exception as exc:
log.exception("Retention: pruning failed: %s", exc)
try:
await backup_db(settings.retention_backup_keep)
backup_ok = True
except Exception as exc:
log.exception("Retention: backup failed: %s", exc)
if prune_ok and backup_ok:
_state["last_run_date"] = today
except asyncio.CancelledError:
raise
except Exception as exc:

@ -2,6 +2,7 @@ import asyncio
import csv
import io
import json
import time as _time
from datetime import datetime, timezone
import aiosqlite
@ -263,14 +264,22 @@ async def login_submit(request: Request):
next_url = "/"
ip = _client_ip(request)
# Rate-limit gate — checked BEFORE bcrypt so an attacker can't burn CPU.
locked_until = auth.login_locked_until(username, ip)
if locked_until is not None:
remaining = int(locked_until - __import__("time").time())
# Atomic register-and-check: increments the counter NOW (before any
# await), so a parallel burst of guesses can't all slip past the
# threshold. Cleared on successful auth via clear_login_failures.
attempt = auth.register_login_attempt(username, ip)
if attempt != "ok":
if attempt == "now_locked_out":
await auth.audit_auth_event(
"user_login_locked_out", username,
f"Failed login from {ip} — IP/user locked out for {auth.LOGIN_LOCKOUT_SECONDS // 60} min",
)
locked_until = auth.login_locked_until(username, ip)
remaining = int((locked_until or _time.time()) - _time.time())
return templates.TemplateResponse(request, "login.html", {
"request": request,
"needs_setup": False,
"error": f"Too many failed attempts. Try again in {remaining // 60} min.",
"error": f"Too many failed attempts. Try again in {remaining // 60 + 1} min.",
"next": next_url,
}, status_code=429)
@ -280,14 +289,8 @@ async def login_submit(request: Request):
# so the timing of "user not found" matches "wrong password."
if not found:
auth.verify_password(password, "$2b$12$" + "x" * 53)
tripped = auth.record_login_failure(username, ip)
await auth.audit_auth_event(
"user_login_locked_out" if tripped else "user_login_failed",
username,
f"Failed login from {ip}" + (
f" — IP/user locked out for {auth.LOGIN_LOCKOUT_SECONDS // 60} min"
if tripped else ""
),
"user_login_failed", username, f"Failed login from {ip}",
)
return templates.TemplateResponse(request, "login.html", {
"request": request,
@ -323,7 +326,12 @@ async def auth_first_user_setup(request: Request):
password = form.get("password") or ""
full_name = (form.get("full_name") or "").strip() or None
try:
user = await auth.create_user(username, password, full_name, is_admin=True)
# bootstrap_only=True wraps the existence check + insert in an
# IMMEDIATE transaction so two concurrent setup requests can't
# both create admin accounts during the bootstrap window.
user = await auth.create_user(
username, password, full_name, is_admin=True, bootstrap_only=True
)
except ValueError as exc:
raise HTTPException(status_code=400, detail=str(exc))
# Same fixation defense as the login flow — discard any pre-existing
@ -466,12 +474,20 @@ async def health(db: aiosqlite.Connection = Depends(get_db)):
checks: dict[str, dict] = {}
# DB probe — confirm the journal is healthy (PRAGMA reads journal_mode
# and would fail loudly if WAL is wedged or the file is unreadable).
# DB probe — actually exercise the write path (read-only mounts,
# full disks, broken WAL all silently pass a journal_mode read).
# Uses a temp table that lives only inside the connection so the
# round-trip touches the writer without polluting real data.
try:
cur = await db.execute("PRAGMA journal_mode")
await cur.fetchone()
checks["db"] = {"ok": True}
await db.execute(
"CREATE TEMP TABLE IF NOT EXISTS _hc (k INTEGER PRIMARY KEY, v TEXT)"
)
await db.execute("INSERT OR REPLACE INTO _hc (k, v) VALUES (1, ?)",
(datetime.now(timezone.utc).isoformat(),))
cur = await db.execute("SELECT v FROM _hc WHERE k=1")
row = await cur.fetchone()
await db.commit()
checks["db"] = {"ok": bool(row)}
except Exception as exc:
checks["db"] = {"ok": False, "error": str(exc)}
@ -781,14 +797,25 @@ def _row_to_burnin(row: aiosqlite.Row, stages: list[aiosqlite.Row]) -> BurninJob
)
def _operator_for(request: Request, _ignored_body_value: str | None = None) -> str:
"""Always return the logged-in user's name for audit attribution.
The request body's `operator` field (if any) is ignored — clients
can't spoof the operator identity any more."""
user = getattr(request.state, "current_user", None)
if not user:
raise HTTPException(status_code=401, detail="Authentication required")
return user.full_name or user.username
@router.post("/api/v1/burnin/start")
async def burnin_start(req: StartBurninRequest):
async def burnin_start(request: Request, req: StartBurninRequest):
operator = _operator_for(request, req.operator)
results = []
errors = []
for drive_id in req.drive_ids:
try:
job_id = await burnin.start_job(
drive_id, req.profile, req.operator, stage_order=req.stage_order
drive_id, req.profile, operator, stage_order=req.stage_order
)
results.append({"drive_id": drive_id, "job_id": job_id})
except burnin.PoolMemberError as exc:
@ -809,10 +836,11 @@ async def burnin_start(req: StartBurninRequest):
@router.post("/api/v1/drives/{drive_id}/unlock")
async def unlock_pool_drive(drive_id: int, req: UnlockPoolDriveRequest):
async def unlock_pool_drive(drive_id: int, request: Request, req: UnlockPoolDriveRequest):
operator = _operator_for(request, req.operator)
try:
expiry = await burnin.grant_pool_unlock(
drive_id, req.confirm_token, req.operator, req.reason,
drive_id, req.confirm_token, operator, req.reason,
)
except ValueError as exc:
raise HTTPException(status_code=400, detail=str(exc))
@ -821,8 +849,9 @@ async def unlock_pool_drive(drive_id: int, req: UnlockPoolDriveRequest):
@router.post("/api/v1/burnin/{job_id}/cancel")
async def burnin_cancel(job_id: int, req: CancelBurninRequest):
ok = await burnin.cancel_job(job_id, req.operator)
async def burnin_cancel(job_id: int, request: Request, req: CancelBurninRequest):
operator = _operator_for(request, req.operator)
ok = await burnin.cancel_job(job_id, operator)
if not ok:
raise HTTPException(status_code=409, detail="Job not found or not cancellable")
return {"cancelled": True}
@ -1044,6 +1073,7 @@ async def update_drive(
@router.post("/api/v1/drives/{drive_id}/reset")
async def reset_drive(
drive_id: int,
request: Request,
body: dict,
db: aiosqlite.Connection = Depends(get_db),
):
@ -1064,7 +1094,9 @@ async def reset_drive(
if (await cur.fetchone())[0] > 0:
raise HTTPException(status_code=409, detail="Cannot reset while a burn-in is active")
operator = body.get("operator", "operator")
# Trust the logged-in user, not the body (the JS used to send a
# literal "operator" because window._operator was never set).
operator = _operator_for(request, body.get("operator"))
# Reset SMART test state to idle
await db.execute(
@ -1243,6 +1275,7 @@ async def settings_page(
request: Request,
db: aiosqlite.Connection = Depends(get_db),
):
auth.require_admin(request)
# Editable values — real values for form fields (secrets excluded)
editable = {
# SMTP
@ -1359,7 +1392,7 @@ async def get_settings_redacted(request: Request):
@router.post("/api/v1/settings")
async def save_settings(request: Request, body: dict):
"""Save editable runtime settings. Secrets are only updated if non-empty."""
user = request.state.current_user
user = auth.require_admin(request)
# Don't overwrite secrets if client sent empty string. Track which
# ones DID get a real change so we can audit the rotation.
rotated: list[str] = []
@ -1389,8 +1422,9 @@ async def save_settings(request: Request, body: dict):
@router.post("/api/v1/settings/test-smtp")
async def test_smtp():
async def test_smtp(request: Request):
"""Test the current SMTP configuration without sending an email."""
auth.require_admin(request)
result = await mailer.test_smtp_connection()
if not result["ok"]:
raise HTTPException(status_code=502, detail=result["error"])
@ -1398,8 +1432,9 @@ async def test_smtp():
@router.post("/api/v1/settings/test-ssh")
async def test_ssh():
async def test_ssh(request: Request):
"""Test the current SSH configuration."""
auth.require_admin(request)
from app import ssh_client
result = await ssh_client.test_connection()
if not result["ok"]:

@ -388,7 +388,16 @@ async def get_mounted_drives() -> set | None:
def _parse_findmnt_sources(stdout: str) -> set:
"""Pure parser for findmnt output. Strips partitions; ignores tmpfs,
overlay, zfs (zfs is handled by pool detection)."""
overlay, zfs (zfs is handled by pool detection).
Recognised devnames (covers TrueNAS SCALE + CORE + LVM/MD stacks):
sd[a-z]+ Linux SCSI/SATA (sda, sdb, ..., sdaa)
nvmeXnY[pZ] Linux NVMe namespaces
mapper/<name> LVM logical volumes (/dev/mapper/vg-lv)
dm-N devicemapper short names
mdN Linux MD RAID arrays
ada[0-9]+, da[0-9]+ TrueNAS CORE (FreeBSD) SATA/SAS
"""
import re as _re
out: set = set()
for raw in stdout.splitlines():
@ -400,14 +409,22 @@ def _parse_findmnt_sources(stdout: str) -> set:
if "/dev/zd" in s or "/dev/zvol" in s:
continue
name = s[len("/dev/"):].split("[")[0] # bind mounts can have [subdir]
if name.startswith("nvme"):
m = _re.match(r"^(nvme\d+n\d+)", name)
if m:
out.add(m.group(1))
else:
m = _re.match(r"^(sd[a-z]+)", name)
# Try each recognised devname pattern in order. Mapper/dm-/md
# entries are kept whole because they represent a stack the
# operator should resolve manually before burn-in.
for pat in (
r"^(nvme\d+n\d+)", # NVMe (strip pN)
r"^(sd[a-z]+)", # Linux SCSI/SATA (strip number)
r"^(mapper/[^/]+)", # LVM logical volume
r"^(dm-\d+)", # devicemapper short name
r"^(md\d+)", # MD RAID
r"^(ada\d+)", # FreeBSD SATA
r"^(da\d+)", # FreeBSD SAS/SCSI
):
m = _re.match(pat, name)
if m:
out.add(m.group(1))
break
return out
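
The pattern loop above can be exercised in isolation. A standalone sketch, where `base_device` is a hypothetical helper that handles a single findmnt SOURCE value and the regexes are copied from the diff:

```python
import re
from typing import Optional

_PATTERNS = (
    r"^(nvme\d+n\d+)",    # NVMe namespace (partition suffix pN dropped)
    r"^(sd[a-z]+)",       # Linux SCSI/SATA (partition number dropped)
    r"^(mapper/[^/]+)",   # LVM logical volume, kept whole
    r"^(dm-\d+)",         # device-mapper short name
    r"^(md\d+)",          # MD RAID array
    r"^(ada\d+)",         # FreeBSD SATA
    r"^(da\d+)",          # FreeBSD SAS/SCSI
)


def base_device(source: str) -> Optional[str]:
    """Map a findmnt SOURCE such as /dev/sda1 to its base device name."""
    if not source.startswith("/dev/"):
        return None  # tmpfs, overlay, zfs datasets, etc.
    # Bind mounts can carry a [subdir] suffix.
    name = source[len("/dev/"):].split("[")[0]
    for pat in _PATTERNS:
        m = re.match(pat, name)
        if m:
            return m.group(1)
    return None
```

The first matching pattern wins, so more specific prefixes are listed before shorter ones that could otherwise be confused with them.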

@ -41,19 +41,33 @@ if [ ! -d "$REPO/.git" ]; then
fi
cd "$REPO"
git fetch --quiet --prune origin 2>&1 || true
git checkout --quiet main 2>&1 || true
git reset --hard --quiet origin/main 2>&1 || true
# Refresh the scan checkout. Failures here mean we'd be scanning stale
# code without knowing — fail loudly instead of soldiering on silently.
if ! git fetch --quiet --prune origin; then
echo "fatal: git fetch failed in $REPO" >&2
exit 65
fi
git checkout --quiet main || true # ok if already on main
if ! git reset --hard --quiet origin/main; then
echo "fatal: git reset --hard failed in $REPO" >&2
exit 65
fi
echo "=== Security scan $DATE ===" > "$OUT_DIR/summary.txt"
date -Iseconds >> "$OUT_DIR/summary.txt"
echo >> "$OUT_DIR/summary.txt"
# --- pip-audit against the LIVE container's installed packages ----------
# Catches CVEs that hit a transitive dep we don't pin in requirements.txt.
echo "--- pip-audit (live container) ---" | tee -a "$OUT_DIR/summary.txt"
docker exec truenas-burnin sh -c \
"pip install --quiet --no-cache-dir --disable-pip-version-check pip-audit 2>/dev/null && pip-audit --strict --format=columns" \
# --- pip-audit against the lockfile in a throwaway container ------------
# Previously we did `docker exec truenas-burnin pip install pip-audit`
# which mutated the live production container with a transient package.
# Now scan the lockfile in an ephemeral container — same coverage of
# pinned versions + their transitives, no side effects on prod.
echo "--- pip-audit (requirements.txt in throwaway container) ---" | tee -a "$OUT_DIR/summary.txt"
docker run --rm \
-v "$REPO/requirements.txt:/work/requirements.txt:ro" \
-w /work \
python:3.12-slim sh -c \
"pip install --quiet --no-cache-dir --disable-pip-version-check pip-audit 2>/dev/null && pip-audit --requirement requirements.txt --strict --format=columns" \
> "$OUT_DIR/pip-audit.txt" 2>&1
PIPS=$?
echo " exit=$PIPS ($OUT_DIR/pip-audit.txt)" | tee -a "$OUT_DIR/summary.txt"