nas-burnin/app/main.py
Brandon Walter d4c0770b9e feat: app-level login + hardening sweep (1.0.0-22 -> 1.0.0-23)
Two layered changes shipped in this branch:

== 1.0.0-22: app-level authentication ==

The dashboard previously had only an IP allowlist. Adds username +
bcrypt password auth, signed-cookie sessions, and a "first user setup"
flow.

* New app/auth.py: User dataclass, bcrypt hash/verify, get_user_by_id/
  username, create_user, touch_last_login, FastAPI `get_current_user`
  dependency. Session secret loaded from SESSION_SECRET env or persisted
  to /data/session_secret.
* New app/auth_cli.py: `python -m app.auth_cli list|reset|add` for
  out-of-band user management. Passwords always read from a TTY prompt.
* Schema: idempotent ALTER for `users` table (id, username unique,
  password_hash, full_name, is_admin, created_at, last_login_at).
* main.py: SessionMiddleware (HMAC-signed cookie, max-age 7 days,
  SameSite=strict — see hardening section) + _AuthGateMiddleware that
  populates request.state.current_user and bounces unauth'd HTML GETs
  to /login while returning 401 JSON for everything else.
* Routes: GET /login renders the first-user-setup form when the users
  table is empty and the sign-in form otherwise; POST /login; POST
  /api/v1/auth/setup (only works while the table is empty); GET|POST /logout.
* Bootstrap: env vars INITIAL_ADMIN_USERNAME + INITIAL_ADMIN_PASSWORD
  create the first admin on startup if both set AND users table empty.
  Ignored thereafter — change passwords via UI or CLI.
* Layout: header shows current_user.full_name|username + Logout link.
  Modal operator field auto-fills from the logged-in user via
  <meta name="default-operator"> rendered in layout (replaces the
  localStorage-only previous behaviour).
* requirements.txt: pinned bcrypt>=4.0,<5.0, itsdangerous>=2.1,
  python-multipart>=0.0.7. First step toward addressing the
  unpinned-deps gotcha.
* New app/templates/login.html with first-user-setup variant.
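
The session-secret rule in the auth.py bullet (use SESSION_SECRET if it is set, otherwise persist a secret to /data/session_secret) can be sketched as below. This is a minimal illustration, not the body of auth.get_session_secret(), which is not shown here. `load_session_secret` is a hypothetical name, and the 0o600 file mode and 32-byte token size are assumptions.

```python
import os
import secrets
from pathlib import Path


def load_session_secret(secret_path: Path = Path("/data/session_secret")) -> str:
    """Hypothetical helper: env var first, then a secret persisted on disk."""
    env_secret = os.environ.get("SESSION_SECRET", "").strip()
    if env_secret:
        return env_secret
    if secret_path.exists():
        stored = secret_path.read_text().strip()
        if stored:
            return stored
    # First boot: 32 random bytes, URL-safe base64 encoded (43 characters).
    secret = secrets.token_urlsafe(32)
    secret_path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions (assumed mode) before writing.
    fd = os.open(secret_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(secret)
    return secret
```

Persisting the generated value matters because SessionMiddleware signs cookies with it: a fresh secret on every restart would invalidate every existing session cookie.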
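
The routing rule of _AuthGateMiddleware (public paths pass, authenticated requests pass, unauthenticated HTML GETs are redirected, everything else gets a 401) reduces to a pure function. The two path sets are copied from main.py; `gate_decision` and `login_redirect_url` are hypothetical helpers, and the `/login?next=...` encoding is an assumption about what auth.login_redirect() produces.

```python
from urllib.parse import urlencode

# Copied from main.py.
PUBLIC_PATHS = {"/login", "/logout", "/health", "/auth/setup"}
PUBLIC_PREFIXES = ("/static/", "/api/v1/auth/")


def gate_decision(path: str, method: str, accept: str, authenticated: bool) -> str:
    """Return "pass", "redirect" or "401" the way _AuthGateMiddleware decides."""
    if path in PUBLIC_PATHS or path.startswith(PUBLIC_PREFIXES):
        return "pass"
    if authenticated:
        return "pass"
    # Only browser page loads are redirected; API calls, SSE and POSTs get a JSON 401.
    if method == "GET" and "text/html" in accept:
        return "redirect"
    return "401"


def login_redirect_url(path: str) -> str:
    """Hypothetical: carry the original path in a URL-encoded `next` parameter."""
    return "/login?" + urlencode({"next": path})
```

Keying the redirect on `Accept: text/html` lets a browser navigation land on the login page while fetch() and EventSource clients receive a machine-readable 401 instead of an HTML page.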

== 1.0.0-23: hardening sweep ==

Closes the eight-item gap audit:

* DB retention + automated backup. New app/retention.py runs daily at
  03:00 local. Nulls burnin_stages.log_text on stages older than
  retention_log_days (default 35), VACUUMs to reclaim pages, then runs
  `sqlite3 .backup` to /data/backups/app-YYYY-MM-DD.db keeping the
  retention_backup_keep most recent (default 14). Wired into the
  lifespan supervisor next to mailer/poller.

* CSRF mitigation. SessionMiddleware bumped to SameSite=strict so the
  browser refuses to send the session cookie on cross-site POSTs —
  removes the actual CSRF vector. Trade-off: external links into the
  app require re-auth.

* Login rate limiting. In-memory per-username AND per-source-IP failure
  counters in auth.py. 10 failures within 10 min trips a 15-min lockout
  for both keys. Returns HTTP 429 with a clear "try again in N min"
  message. Cleared on successful login.

* Login audit events. New event types in audit_events: user_login,
  user_login_failed, user_login_locked_out, user_logout,
  user_password_changed. All include source IP. Recorded via
  auth.audit_auth_event().

* Password change UI. Header link "Change password" opens
  templates/components/modal_password.html (current/new/confirm).
  Posts to POST /api/v1/auth/change-password — bcrypt-verifies current,
  requires >=8 char new pw, writes audit event.

* NVMe burn-in path. _stage_surface_validate now detects nvme*
  devnames and routes them to _stage_surface_validate_nvme(), which runs
  `nvme format -s 1 --force` (Secure Erase Settings 1 = User Data
  Erase). That takes seconds instead of the hours badblocks needs and
  exercises the controller's secure-erase path. Falls back to badblocks
  if nvme-cli isn't installed. Runs a SMART check after the format.

* Mounted-FS detection. ssh_client.get_mounted_drives() runs
  `findmnt -no SOURCE`, parses non-ZFS sources back to base devnames.
  Poller treats them as pool_name='(mounted)', pool_role='mounted'.
  Confirm token DESTROY MOUNTED FILESYSTEM, distinct purple styling,
  audit event mounted_drive_unlocked, daily-report banner picks it up.

* Deeper /health. Real readiness check — DB write probe (PRAGMA
  journal_mode), poller freshness (age <= 3x stale_threshold), SSH
  test_connection() when configured. Returns 503 when any check fails
  so a proxy/orchestrator can take the container out of rotation.
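
The lockout policy in the rate-limiting bullet (10 failures inside a sliding 10-minute window locks both the username key and the source-IP key for 15 minutes) can be sketched as a small in-memory class. `LoginThrottle` and its method names are hypothetical, and the injectable clock exists only to make the behaviour testable.

```python
import time
from collections import defaultdict, deque

WINDOW_S = 10 * 60    # failures are counted over a sliding 10-minute window
MAX_FAILURES = 10     # this many failures inside the window trips a lockout
LOCKOUT_S = 15 * 60   # lockout length, applied to both keys


class LoginThrottle:
    """Hypothetical in-memory limiter keyed by username AND source IP."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures: dict[str, deque[float]] = defaultdict(deque)
        self._locked_until: dict[str, float] = {}

    @staticmethod
    def _keys(username: str, ip: str) -> tuple[str, str]:
        return (f"user:{username.lower()}", f"ip:{ip}")

    def retry_after(self, username: str, ip: str) -> float:
        """Seconds until both keys are unlocked; 0.0 when not locked out."""
        now = self._clock()
        remaining = [self._locked_until.get(k, now) - now for k in self._keys(username, ip)]
        return max(0.0, *remaining)

    def record_failure(self, username: str, ip: str) -> bool:
        """Count one failed login; return True if it trips the lockout."""
        now = self._clock()
        keys = self._keys(username, ip)
        tripped = False
        for key in keys:
            window = self._failures[key]
            window.append(now)
            # Forget failures that fell out of the sliding window.
            while now - window[0] > WINDOW_S:
                window.popleft()
            tripped = tripped or len(window) >= MAX_FAILURES
        if tripped:
            for key in keys:
                self._locked_until[key] = now + LOCKOUT_S
                self._failures[key].clear()
        return tripped

    def record_success(self, username: str, ip: str) -> None:
        """A successful login clears counters and lockouts for both keys."""
        for key in self._keys(username, ip):
            self._failures.pop(key, None)
            self._locked_until.pop(key, None)
```

Because the state lives in process memory, it resets on restart and is not shared between worker processes. A 429 handler could report `math.ceil(retry_after(...) / 60)` as the "try again in N min" value.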
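
Mapping `findmnt -no SOURCE` output back to base device names, as the mounted-FS bullet describes, comes down to skipping sources that are not under `/dev/` (ZFS datasets, tmpfs, overlay) and stripping partition suffixes. `parse_mounted_devnames` is hypothetical, and the partition patterns cover only sdX/vdX/xvdX/hdX disks, NVMe namespaces and mmcblk devices.

```python
import re

# Partition suffixes: nvme0n1p2 / mmcblk0p1 use "pN"; sda1 / vdb3 / xvdc2 use bare digits.
_P_SUFFIX = re.compile(r"^((?:nvme\d+n\d+)|(?:mmcblk\d+))p\d+$")
_DIGIT_SUFFIX = re.compile(r"^((?:sd|vd|xvd|hd)[a-z]+)\d+$")


def parse_mounted_devnames(findmnt_output: str) -> set[str]:
    """Hypothetical: base devnames of block devices that carry a mounted filesystem."""
    devnames: set[str] = set()
    for line in findmnt_output.splitlines():
        # Bind mounts are reported as "/dev/sdc3[/subdir]"; keep the device part.
        source = line.strip().split("[", 1)[0]
        if not source.startswith("/dev/"):
            continue  # ZFS datasets (pool/dataset), tmpfs, overlay, ...
        name = source[len("/dev/"):]
        if "/" in name:
            continue  # /dev/mapper/* and similar: not a raw disk name
        match = _P_SUFFIX.match(name) or _DIGIT_SUFFIX.match(name)
        devnames.add(match.group(1) if match else name)
    return devnames
```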
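
The readiness logic in the /health bullet runs every check, records each failure, and returns 503 if any check failed. The sketch stays framework-free: `readiness`, `db_probe` and `poller_fresh` are hypothetical names, and each check signals failure by raising. In FastAPI the result would be returned as `JSONResponse(body, status_code=status)`, and the SSH check would be added to the dict only when SSH is configured.

```python
import sqlite3
from typing import Callable


def db_probe(db_path: str) -> None:
    """Open the database and run PRAGMA journal_mode; any error fails the check."""
    con = sqlite3.connect(db_path, timeout=2)
    try:
        con.execute("PRAGMA journal_mode").fetchone()
    finally:
        con.close()


def poller_fresh(last_poll: float, stale_threshold_s: float, now: float) -> None:
    """Fail when the last successful poll is older than 3x the stale threshold."""
    age = now - last_poll
    if age > 3 * stale_threshold_s:
        raise RuntimeError(f"last poll {age:.0f}s ago")


def readiness(checks: dict[str, Callable[[], None]]) -> tuple[int, dict]:
    """Run every check; 200 if all pass, 503 if any raised."""
    results: dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # a failing check must not crash /health itself
            results[name] = f"fail: {exc}"
    healthy = all(v == "ok" for v in results.values())
    body = {"status": "ok" if healthy else "unhealthy", "checks": results}
    return (200 if healthy else 503), body
```

Running all checks rather than stopping at the first failure gives an operator the full picture from a single request.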

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:08:29 -04:00

import asyncio
import ipaddress
import logging
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.middleware.sessions import SessionMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse, PlainTextResponse
from app import auth, burnin, mailer, poller, retention, settings_store
from app.config import settings
from app.database import init_db
from app.logging_config import configure as configure_logging
from app.renderer import templates # noqa: F401 — registers filters as side-effect
from app.routes import router
from app.truenas import TrueNASClient
# Configure structured JSON logging before anything else logs
configure_logging()
log = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# IP allowlist middleware
# ---------------------------------------------------------------------------
class _IPAllowlistMiddleware(BaseHTTPMiddleware):
    """
    Block requests from IPs not in ALLOWED_IPS.

    When ALLOWED_IPS is empty the middleware is a no-op.

    Checks X-Forwarded-For first, then the direct client IP. Only the
    rightmost X-Forwarded-For entry is trusted: it is the one appended by
    our reverse proxy (nginx-proxy-manager). Earlier entries are supplied
    by the client and could be forged to bypass the allowlist.
    """

    def __init__(self, app, allowed_ips: str) -> None:
        super().__init__(app)
        self._networks: list[ipaddress.IPv4Network | ipaddress.IPv6Network] = []
        for entry in (s.strip() for s in allowed_ips.split(",") if s.strip()):
            try:
                self._networks.append(ipaddress.ip_network(entry, strict=False))
            except ValueError:
                log.warning("Invalid ALLOWED_IPS entry ignored: %r", entry)

    def _is_allowed(self, ip_str: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip_str)
        except ValueError:
            return False
        return any(addr in net for net in self._networks)

    async def dispatch(self, request: Request, call_next):
        if not self._networks:
            return await call_next(request)
        # Rightmost X-Forwarded-For entry = address seen by the trusted proxy
        # (assumes exactly one proxy hop in front of the app).
        forwarded = request.headers.get("X-Forwarded-For", "").split(",")[-1].strip()
        client_ip = forwarded or (request.client.host if request.client else "")
        if self._is_allowed(client_ip):
            return await call_next(request)
        log.warning("Request blocked by IP allowlist", extra={"client_ip": client_ip})
        return PlainTextResponse("Forbidden", status_code=403)
# ---------------------------------------------------------------------------
# Poller supervisor: restarts run() if it ever exits unexpectedly
# ---------------------------------------------------------------------------
async def _supervised_poller(client: TrueNASClient) -> None:
    while True:
        try:
            await poller.run(client)
        except asyncio.CancelledError:
            raise  # Propagate shutdown signal cleanly
        except Exception as exc:
            log.critical("Poller crashed unexpectedly — restarting in 5s: %s", exc)
            await asyncio.sleep(5)
# ---------------------------------------------------------------------------
# Lifespan
# ---------------------------------------------------------------------------
_client: TrueNASClient | None = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global _client
    log.info("Starting up")
    await init_db()
    settings_store.init()
    await auth.bootstrap_admin_if_empty()
    _client = TrueNASClient()
    await burnin.init(_client)
    poll_task = asyncio.create_task(_supervised_poller(_client))
    mailer_task = asyncio.create_task(mailer.run())
    retention_task = asyncio.create_task(retention.run())

    yield

    log.info("Shutting down")
    poll_task.cancel()
    mailer_task.cancel()
    retention_task.cancel()
    try:
        await asyncio.gather(poll_task, mailer_task, retention_task,
                             return_exceptions=True)
    except asyncio.CancelledError:
        pass
    await _client.close()
# ---------------------------------------------------------------------------
# App
# ---------------------------------------------------------------------------
app = FastAPI(title="TrueNAS Burn-In Dashboard", lifespan=lifespan)
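# ---------------------------------------------------------------------------
# Auth gate. Middleware wraps every route no matter when include_router()
# is called. Paths in the allowlist below are reachable without a session
# cookie; endpoints under /api/v1/auth/ that need a logged-in user (e.g.
# change-password) must enforce it through the get_current_user dependency.
# Unauthenticated SSE requests are ordinary HTTP GETs without text/html in
# Accept, so they get a 401 here. WebSockets bypass this middleware
# entirely (BaseHTTPMiddleware only handles HTTP requests), so their
# handlers must use the dependency as well.
# ---------------------------------------------------------------------------
_PUBLIC_PATHS = {"/login", "/logout", "/health", "/auth/setup"}
_PUBLIC_PREFIXES = ("/static/", "/api/v1/auth/")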
class _AuthGateMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        path = request.url.path

        # Always populate request.state.current_user from the session so
        # templates and route handlers can both rely on it. None when
        # unauthenticated.
        user_id = request.session.get("user_id")
        request.state.current_user = (
            await auth.get_user_by_id(int(user_id)) if user_id else None
        )

        if path in _PUBLIC_PATHS or path.startswith(_PUBLIC_PREFIXES):
            return await call_next(request)
        if request.state.current_user is not None:
            return await call_next(request)

        # Unauthenticated. HTML GETs bounce to /login with a `next` query
        # arg so the user lands back where they tried to go after logging
        # in. Anything else (API calls, SSE, POSTs) gets a 401.
        accept = request.headers.get("accept", "")
        if request.method == "GET" and "text/html" in accept:
            return auth.login_redirect(path)
        return JSONResponse(
            {"detail": "Authentication required"}, status_code=401
        )
app.add_middleware(_AuthGateMiddleware)
# Starlette's add_middleware() prepends to the middleware list, so the
# middleware added LAST becomes the outermost layer and runs FIRST on each
# request. SessionMiddleware is therefore added after _AuthGateMiddleware,
# so request.session is populated before AuthGate reads it.
app.add_middleware(
    SessionMiddleware,
    secret_key=auth.get_session_secret(),
    session_cookie="burnin_session",
    max_age=settings.session_max_age_seconds,
    https_only=False,  # we sit behind nginx-proxy-manager; trust upstream
    # SameSite=strict is the primary CSRF mitigation: the browser never
    # sends the session cookie on cross-site requests, so an attacker
    # page can't trigger any state-changing endpoint even if it knows
    # the URL. Trade-off: an external link (email, chat) into the app
    # won't carry the session, so the user has to re-auth via /login.
    # For an internal-only tool that's the right default.
    same_site="strict",
)
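if settings.allowed_ips:
    # Added last, so it is the outermost layer: blocked clients are rejected
    # before any session decoding or user lookup happens.
    app.add_middleware(_IPAllowlistMiddleware, allowed_ips=settings.allowed_ips)
    log.info("IP allowlist active: %s", settings.allowed_ips)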
app.mount("/static", StaticFiles(directory="app/static"), name="static")
app.include_router(router)