Harden Raspberry Pi 24/7 operation with systemd resilience, health probes, archival, and recovery tooling #12

Draft
Copilot wants to merge 2 commits into main from copilot/make-python-run-247

Contributor

Copilot AI commented May 14, 2026

This PR implements an always-on operating model for the Python monitor on Raspberry Pi, focused on service resilience and continuous queryability. It adds production-oriented operational controls around /api/status health, data retention, and fast recovery.

  • Service hardening (24/7 uptime)

    • Strengthened python/connectivity-monitor@.service with restart semantics for long-running operation (Restart=always, shorter backoff), startup ordering on network-online.target, runtime limits/timeouts, and safer service isolation defaults.
    • Kept headless execution as the primary service mode for unattended deployment.
  • Scheduled health supervision

    • Added python/ops/health_probe.py to probe /api/status, evaluate degradations (health score / packet loss), persist consecutive-failure state, and optionally trigger reboot after threshold.
    • Added python/systemd/connectivity-monitor-healthcheck@.service + .timer for periodic health checks.
    • Healthcheck port is configurable via environment files/drop-ins (WEB_PORT).
  • Log/report persistence + retention automation

    • Added python/ops/archive_artifacts.py to archive ~/ConnectivityMonitor/logs and ~/ConnectivityMonitor/reports, prune old archives, and handle deletion failures explicitly.
    • Added python/systemd/connectivity-monitor-archive@.service + .timer for daily archival rotation.
  • Recovery ergonomics

    • Added python/ops/recover_service.sh as a one-command recovery/verification entrypoint:
      • daemon reload
      • enable/restart instance
      • status + recent logs
      • API validation (with explicit failure reporting)
    • Script supports configurable web port (arg or config-derived).
  • Operational documentation for production Pi setup

    • Expanded python/README.md and top-level README.md with:
      • stable IP/hostname guidance
      • reverse proxy pattern for TLS/auth-controlled access
      • healthcheck/archival timer setup
      • optional reboot policy notes
      • soak-test checklist (48–72h) and recovery command usage
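
The archival step listed above (pack `logs` and `reports` into a dated tarball) could look roughly like the following. This is a minimal sketch and not the code in `python/ops/archive_artifacts.py`; the function name `archive_dirs`, the `artifacts-<stamp>.tar.gz` naming, and the separate archive directory are assumptions made for illustration.

```python
import os
import tarfile
import time


def archive_dirs(base_dir, archive_dir, names=("logs", "reports"), stamp=None):
    """Pack the named subdirectories of base_dir into one timestamped tar.gz.

    Returns the archive path, or None when none of the source directories exist.
    """
    sources = [n for n in names if os.path.isdir(os.path.join(base_dir, n))]
    if not sources:
        return None
    os.makedirs(archive_dir, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme; the PR does not show the real one.
    target = os.path.join(archive_dir, f"artifacts-{stamp}.tar.gz")
    with tarfile.open(target, "w:gz") as tar:
        for name in sources:
            # arcname keeps member paths relative (logs/..., reports/...).
            tar.add(os.path.join(base_dir, name), arcname=name)
    return target
```

Returning `None` for an empty source set lets a timer-driven oneshot unit exit cleanly on a fresh install where no logs exist yet.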
# python/systemd/connectivity-monitor-healthcheck@.service
[Service]
Environment=WEB_PORT=8080
EnvironmentFile=-/etc/default/connectivity-monitor
EnvironmentFile=-/etc/default/connectivity-monitor-%i
ExecStart=/usr/bin/python3 %h/ConnectivityMonitor/python/ops/health_probe.py \
  --url http://127.0.0.1:${WEB_PORT}/api/status \
  --min-health 60 --max-loss 10 \
  --state-file %h/ConnectivityMonitor/health_probe_state.json
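
For readers who want a feel for what the probe does with `--min-health`, `--max-loss`, and `--state-file`, here is a minimal sketch of the evaluation and consecutive-failure bookkeeping. It is not the contents of `health_probe.py`: the JSON field names `health_score` and `packet_loss_pct` and the state key `consecutive_failures` are hypothetical, since the `/api/status` schema is not shown in this PR.

```python
import json
import os


def evaluate(status, min_health=60.0, max_loss=10.0):
    """Return a list of degradation reasons; an empty list means healthy."""
    reasons = []
    health = status.get("health_score")    # hypothetical field name
    loss = status.get("packet_loss_pct")   # hypothetical field name
    # A missing field is treated as a degradation rather than silently passing.
    if health is None or health < min_health:
        reasons.append(f"health {health} below {min_health}")
    if loss is None or loss > max_loss:
        reasons.append(f"loss {loss} above {max_loss}")
    return reasons


def record_result(state_file, healthy):
    """Persist the consecutive-failure count and return the new value."""
    try:
        with open(state_file) as fh:
            failures = int(json.load(fh).get("consecutive_failures", 0))
    except (OSError, ValueError, TypeError, AttributeError):
        failures = 0  # missing or corrupt state starts a fresh count
    failures = 0 if healthy else failures + 1
    tmp = state_file + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"consecutive_failures": failures}, fh)
    os.replace(tmp, state_file)  # atomic swap so a crash cannot leave half a file
    return failures
```

A caller would compare the returned count against its reboot threshold; resetting to zero on the first healthy probe means only an unbroken run of failures can trigger a reboot.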

Copilot AI left a comment

Pull request overview

This PR adds an “always-on” operational model for the Python Connectivity Monitor on Raspberry Pi by introducing systemd hardening, periodic health probing, daily artifact archiving, and a recovery helper script, along with updated operational documentation.

Changes:

  • Hardened the main connectivity-monitor@.service unit for long-running, resilient operation.
  • Added systemd timer-driven health probes (/api/status) and daily log/report archival + pruning.
  • Added recovery/validation tooling and expanded deployment/operations documentation for 24/7 Pi setups.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

| File | Description |
| --- | --- |
| `README.md` | Mentions the new 24/7 ops units/tools at the top level. |
| `python/connectivity-monitor@.service` | Updates the main monitor systemd unit with restart behavior and sandboxing/hardening options. |
| `python/systemd/connectivity-monitor-healthcheck@.service` | Adds a oneshot systemd unit to run the API health probe. |
| `python/systemd/connectivity-monitor-healthcheck@.timer` | Schedules the health probe to run periodically. |
| `python/systemd/connectivity-monitor-archive@.service` | Adds a oneshot systemd unit to run log/report archival + pruning. |
| `python/systemd/connectivity-monitor-archive@.timer` | Schedules daily archival rotation. |
| `python/ops/health_probe.py` | Implements /api/status probing with stateful consecutive-failure tracking and optional reboot triggering. |
| `python/ops/archive_artifacts.py` | Implements tar.gz archival of logs/reports plus retention pruning. |
| `python/ops/recover_service.sh` | Adds a one-command systemd recovery + API validation helper. |
| `python/README.md` | Expands Raspberry Pi production setup and operational guidance (healthcheck, archival, recovery, soak tests). |

UMask=0027
LimitNOFILE=65536
TasksMax=512
NoNewPrivileges=true

[Unit]
Description=Connectivity Monitor API health probe for %i
After=connectivity-monitor@%i.service
Requires=connectivity-monitor@%i.service
Comment on lines +13 to +16
if [[ -z "${PORT}" ]]; then
  CONFIG_PATH="/home/${USER_NAME}/ConnectivityMonitor/monitor_config.json"
  if [[ -f "${CONFIG_PATH}" ]]; then
    if ! PORT="$(python3 - "${CONFIG_PATH}" <<'PY'
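
The quoted hunk stops where the heredoc begins, so the actual lookup is not visible here. As an illustration only, a config-derived port lookup might be sketched as below. The key name `web_port` is hypothetical, and the fallback of 8080 is taken from the `WEB_PORT=8080` default in the healthcheck unit.

```python
import json


def read_port(config_path, default=8080, key="web_port"):
    """Return a valid TCP port from the JSON config, or default on any problem."""
    try:
        with open(config_path) as fh:
            value = json.load(fh).get(key, default)  # key name is an assumption
        port = int(value)
    except (OSError, ValueError, TypeError, AttributeError):
        # Unreadable file, invalid JSON, non-object JSON, or non-numeric value.
        return default
    return port if 0 < port < 65536 else default
```

This sketch falls back silently, whereas the shell script wraps the call in `if !` so that a failed lookup can be reported; a real helper might exit non-zero instead of returning the default.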
Comment on lines +43 to +49
for name in os.listdir(archive_dir):
    if not name.endswith(".tar.gz"):
        continue
    full = os.path.join(archive_dir, name)
    if os.path.isfile(full) and (now - os.path.getmtime(full)) > max_age:
        os.remove(full)
        removed += 1
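
The PR description says deletion failures are handled explicitly, while the hunk quoted above shows no error handling around `os.remove`. One possible shape for that handling is sketched below. It is not the PR's code: the function name `prune_archives` and the `(removed, failed)` return value are assumptions.

```python
import os
import time


def prune_archives(archive_dir, max_age, now=None):
    """Delete *.tar.gz files older than max_age seconds.

    Returns (removed, failed), where failed is a list of (path, error) pairs.
    """
    now = time.time() if now is None else now
    removed, failed = 0, []
    for name in os.listdir(archive_dir):
        if not name.endswith(".tar.gz"):
            continue
        full = os.path.join(archive_dir, name)
        try:
            if os.path.isfile(full) and (now - os.path.getmtime(full)) > max_age:
                os.remove(full)
                removed += 1
        except OSError as exc:
            # Record the failure and keep going so one bad file
            # does not block pruning of the remaining archives.
            failed.append((full, str(exc)))
    return removed, failed
```

The caller can log the `failed` list and exit non-zero so the failure shows up in `systemctl status` for the archive unit.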
Comment thread python/README.md
Comment on lines +184 to +190
Keep the monitor on localhost and publish through a reverse proxy (Nginx/Caddy/Traefik) to add:

- HTTPS/TLS certificates
- Basic auth or SSO
- IP allow-listing/rate limits

Proxy upstream target: `http://127.0.0.1:8080`
Comment thread README.md
Comment on lines +119 to +120
- `python/systemd/connectivity-monitor-healthcheck@.service|.timer` (scheduled API health probe)
- `python/systemd/connectivity-monitor-archive@.service|.timer` (daily archive/prune of logs/reports)
3 participants