Addresses 12 of 13 findings from the Codex tech-debt + security review
of versions 1.0.0-22 through 1.0.0-27. Item #5 (live pool re-check
before start_job) is deferred because it would add an SSH round-trip
per start.

#1  Pool detection now treats zpool / lsblk / findmnt failures
    INDEPENDENTLY. Previously a single None blew away the whole map, so
    a host where lsblk lacks zfs_member info but zpool works would
    never lock pool members. Extended findmnt parser to recognise
    /dev/mapper/*, /dev/dm-*, /dev/md*, /dev/da*, /dev/ada* (LVM,
    devicemapper, MD RAID, FreeBSD CORE devnames).

#2  Admin role enforced on every settings mutation. New
    auth.require_admin() helper applied to GET /settings,
    POST /api/v1/settings, /test-smtp, /test-ssh. Previously any
    authenticated user (the CLI explicitly supports non-admin accounts)
    could rewrite SMTP/SSH/API secrets.

#3  First-user setup race closed. auth.create_user() now accepts
    bootstrap_only=True, which wraps the existence check + insert in
    BEGIN IMMEDIATE so two concurrent /api/v1/auth/setup requests can't
    both create admin accounts during the bootstrap window.

#4  Case-insensitive uniqueness enforced via new
    `uniq_users_username_nocase` index. Login does a NOCASE lookup, so
    without this `Admin` and `admin` could coexist as distinct rows.

#6  New `session_cookie_secure` setting (default False for LAN/dev
    deploys, set True in production behind HTTPS) flips the session
    cookie's Secure flag. Defends against on-the-wire exposure when the
    dashboard is reachable over plain HTTP.

#7  Audit trail bound to authenticated identity. Burn-in start /
    cancel / unlock / drive reset all now use `_operator_for(request)`,
    which reads `request.state.current_user.full_name|username` instead
    of the body's operator field. Logged-in users can no longer spoof
    attribution. Drive reset's literal-"operator" fallback
    (window._operator was never set) is also fixed by this.

#8  Login rate-limit race fixed. New `register_login_attempt()` is an
    atomic check-AND-increment in synchronous code (no awaits inside),
    so a parallel burst can't slip past the threshold.
    `record_login_failure()` removed; `clear_login_failures()` now also
    drops any active lockout for a successful auth. Pre-existing bug
    where `tripped` was always False (so user_login_locked_out audit
    events never fired) also fixed.

#9  NVMe surface_validate post-format check now mirrors the SSH path:
    fails on FAILED health AND on real SMART attribute failures,
    soft-passes SSH-only failures (logged), surfaces warnings to the
    stage log without failing.

#10 retention.backup_db() now writes to `.tmp` then atomic-renames into
    the canonical daily slot: an interrupted backup leaves the tmp
    behind but doesn't corrupt the real snapshot. Scheduler marks
    last_run_date only on (prune AND backup) success so a transient
    failure gets retried within the 03:00 hour.

#11 /health DB probe now exercises the WRITE path via a temp-table
    INSERT/SELECT/COMMIT round-trip. Previously it only read
    PRAGMA journal_mode + a row count, which silently passes on
    read-only mounts and broken-WAL conditions.

#12 security-scan.sh now fails loudly if `git fetch` or
    `git reset --hard origin/main` errors (was `|| true`, scanning
    stale code silently). pip-audit now runs in a throwaway
    python:3.12-slim container against requirements.txt instead of
    `docker exec`-ing into the live truenas-burnin container: cleaner
    separation, no transient package install on prod.

#13 Badblocks SSH stage no longer doubles its log_text. Previously it
    appended every 20-line chunk during streaming AND the full
    accumulated output at end. Now it only flushes the un-flushed tail
    (typically <20 lines). `result["output"]` stays in-memory only.

Verification: all 44 unit tests pass in container; /health 200;
security scan returns 0 findings; deployed maple build is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
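
The write-to-`.tmp`-then-rename pattern described in #10 can be sketched in shell as follows. This is only an illustration under stated assumptions: the real retention.backup_db() lives in the Python app and is not shown in this commit view, and the function name `backup_atomically` and the file names below are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the write-to-.tmp-then-rename pattern from #10.
# backup_atomically and the paths used here are hypothetical names, not
# the project's actual code.
set -euo pipefail

backup_atomically() {
    local src="$1" dest="$2"
    local tmp="$dest.tmp"
    # Write the full copy to a sibling temp file first. If this step is
    # interrupted, only "$tmp" is left incomplete; "$dest" is untouched.
    cp -- "$src" "$tmp"
    # A rename within one filesystem is atomic: readers see either the
    # old snapshot or the new one, never a partially written file.
    mv -f -- "$tmp" "$dest"
}

# Demo in a scratch directory: an existing snapshot gets replaced.
workdir="$(mktemp -d)"
printf 'old snapshot\n' > "$workdir/daily.db"
printf 'new snapshot\n' > "$workdir/live.db"
backup_atomically "$workdir/live.db" "$workdir/daily.db"
cat "$workdir/daily.db"
```

Keeping the temp file next to the destination matters: the rename is only atomic when both paths are on the same filesystem.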
127 lines
5.2 KiB
Bash
#!/usr/bin/env bash
# Daily security scan of the deployed truenas-burnin source on maple.
# Mirrors the .forgejo/workflows/security-scan.yml CI pipeline so a finding
# the runner-less forge would have flagged still surfaces here.
#
# Tools all run in containers; nothing is installed on the host.
#   pip-audit: known CVEs in pinned packages (scans requirements.txt in a
#              throwaway container, not the live container)
#   bandit:    Python static security analysis on host source tree
#   gitleaks:  secrets across the full git history
#
# Output:
#   ~/security-scans/scan-YYYY-MM-DD/{pip-audit,bandit,gitleaks}.txt
#   ~/security-scans/findings.log (appended one line per scan with findings)
#
# Wiring:
#   Daily systemd user timer at 03:30 local (after the in-app retention job
#   so backups are fresh). See scripts/security-scan.{service,timer}.

set -uo pipefail

REPO_URL="${REPO_URL:-https://git.hellocomputer.xyz/brandon/truenas-burnin.git}"
REPO="${REPO:-$HOME/scan-checkouts/truenas-burnin}"
OUT_BASE="${OUT_BASE:-$HOME/security-scans}"
DATE="$(date +%Y-%m-%d)"
OUT_DIR="$OUT_BASE/scan-$DATE"
SUMMARY="$OUT_BASE/findings.log"
GITLEAKS_VERSION="${GITLEAKS_VERSION:-8.21.2}"

mkdir -p "$OUT_DIR" "$(dirname "$REPO")"

# Maintain a dedicated checkout for scanning. The deploy at
# ~/docker/stacks/truenas-burnin/ is just the bind-mounted source (no
# .git, no history), so gitleaks can't scan there. We keep a separate
# clone, fast-forward it to origin/main each run.
if [ ! -d "$REPO/.git" ]; then
    echo "Cloning $REPO_URL to $REPO ..."
    git clone --quiet "$REPO_URL" "$REPO" || {
        echo "fatal: git clone failed" >&2
        exit 65
    }
fi

cd "$REPO"
# Refresh the scan checkout. Failures here mean we'd be scanning stale
# code without knowing, so fail loudly instead of soldiering on silently.
if ! git fetch --quiet --prune origin; then
    echo "fatal: git fetch failed in $REPO" >&2
    exit 65
fi
git checkout --quiet main || true  # ok if already on main
if ! git reset --hard --quiet origin/main; then
    echo "fatal: git reset --hard failed in $REPO" >&2
    exit 65
fi

echo "=== Security scan $DATE ===" > "$OUT_DIR/summary.txt"
date -Iseconds >> "$OUT_DIR/summary.txt"
echo >> "$OUT_DIR/summary.txt"

# --- pip-audit against the lockfile in a throwaway container ------------
# Previously we did `docker exec truenas-burnin pip install pip-audit`
# which mutated the live production container with a transient package.
# Now scan the lockfile in an ephemeral container: same coverage of
# pinned versions + their transitives, no side effects on prod.
echo "--- pip-audit (requirements.txt in throwaway container) ---" | tee -a "$OUT_DIR/summary.txt"
docker run --rm \
    -v "$REPO/requirements.txt:/work/requirements.txt:ro" \
    -w /work \
    python:3.12-slim sh -c \
    "pip install --quiet --no-cache-dir --disable-pip-version-check pip-audit 2>/dev/null && pip-audit --requirement requirements.txt --strict --format=columns" \
    > "$OUT_DIR/pip-audit.txt" 2>&1
PIPS=$?
echo " exit=$PIPS ($OUT_DIR/pip-audit.txt)" | tee -a "$OUT_DIR/summary.txt"

# --- bandit against the LIVE deploy dir ---------------------------------
# Scan what's actually running, not what's in git; this catches drift
# between forge HEAD and maple. B608 (SQL injection via dynamic strings) is
# skipped globally: every dynamic SQL build in this codebase uses
# bound parameters for data and structural placeholders only.
DEPLOY_DIR="${DEPLOY_DIR:-$HOME/docker/stacks/truenas-burnin}"
echo "--- bandit (deploy: $DEPLOY_DIR) ---" | tee -a "$OUT_DIR/summary.txt"
docker run --rm \
    -v "$DEPLOY_DIR/app:/src:ro" \
    python:3.12-slim sh -c \
    "pip install --quiet --no-cache-dir --disable-pip-version-check bandit 2>/dev/null && bandit -r /src -ll -ii --skip B608" \
    > "$OUT_DIR/bandit.txt" 2>&1
BANDITS=$?
echo " exit=$BANDITS ($OUT_DIR/bandit.txt)" | tee -a "$OUT_DIR/summary.txt"

# --- gitleaks against the full git history ------------------------------
echo "--- gitleaks ---" | tee -a "$OUT_DIR/summary.txt"
docker run --rm \
    -v "$REPO:/repo:ro" \
    "zricethezav/gitleaks:v$GITLEAKS_VERSION" \
    detect --source /repo --no-banner --redact --verbose \
    > "$OUT_DIR/gitleaks.txt" 2>&1
LEAKS=$?
echo " exit=$LEAKS ($OUT_DIR/gitleaks.txt)" | tee -a "$OUT_DIR/summary.txt"

# --- summary + notification --------------------------------------------
TOTAL_EXIT=$(( PIPS + BANDITS + LEAKS ))
{
    echo
    echo "Total findings exit-code sum: $TOTAL_EXIT"
    echo " pip-audit: $PIPS"
    echo " bandit: $BANDITS"
    echo " gitleaks: $LEAKS"
} >> "$OUT_DIR/summary.txt"

if [ "$TOTAL_EXIT" -ne 0 ]; then
    printf '%s — findings (pip-audit=%d bandit=%d gitleaks=%d) — see %s\n' \
        "$DATE" "$PIPS" "$BANDITS" "$LEAKS" "$OUT_DIR" >> "$SUMMARY"
    # Hook for downstream notification: wire to your existing Mattermost
    # / Fastmail / webhook chain. Stays a no-op until SECURITY_SCAN_WEBHOOK
    # is set in the systemd unit's Environment=.
    if [ -n "${SECURITY_SCAN_WEBHOOK:-}" ]; then
        curl -fsS -X POST -H 'Content-Type: text/plain' \
            --data-binary "@$OUT_DIR/summary.txt" \
            "$SECURITY_SCAN_WEBHOOK" || true
    fi
fi

# Retention: keep last 30 daily directories, prune older.
find "$OUT_BASE" -maxdepth 1 -type d -name "scan-*" -mtime +30 \
    -exec rm -rf {} \;

exit "$TOTAL_EXIT"