Comprehensive Backup & Recovery Plan

This document defines the backup strategy for all Research Relay systems, data stores, and configuration. It covers what to back up, how, where, how often, and how to restore from backups in a disaster recovery scenario.


1. Inventory of What Needs Backup

1.1 Data Stores

| Data Store | Location | Contents | Criticality |
| --- | --- | --- | --- |
| PostgreSQL (Medusa) | App server (/var/backup/pg/) | Products, orders, customers, inventory, compliance data, attestations, COAs, lots, purity records | Critical — total data loss without backup |
| Redis | App server (port 6379) | Session cache, event bus, workflow state | Low — ephemeral; rebuilt on restart |
| Cloudflare R2 | rr-bizops-files bucket | COA PDFs, SDS documents, product images | High — source documents; hard to recreate |

1.2 Application Code & Configuration

| Component | Location | Contents | Backup Method |
| --- | --- | --- | --- |
| Application source code | GitHub (rr-bizops repo) | Medusa backend, storefront, docs, NixOS config | Git (GitHub is the primary backup) |
| Docker images | GHCR (ghcr.io/research-relay/rr-bizops/) | Built application images (medusa, storefront) | Container registry; rebuilt from source |
| NixOS system config | nixos/ in repo + /etc/nixos/ on server | Full server configuration (reproducible) | Git + NixOS flake lock |
| Server environment vars | /etc/rr-bizops/.env on app server | DB credentials, API keys, secrets | 1Password vault (manual sync) |

1.3 External Services (SaaS — not self-hosted)

| Service | Data | Backup Responsibility | Export Method |
| --- | --- | --- | --- |
| Cloudflare | DNS records, R2 data, Pages config | Cloudflare manages; export DNS as zone file | Cloudflare API or dashboard export |
| BTCPay Server | Payment invoices, BTC transaction history | Self-hosted on shops-btc-01; separate backup plan | BTCPay built-in backup + nix-bitcoin |
| Zoho Mail | Business email archive | Zoho manages; export via IMAP backup | offlineimap or Zoho admin export |
| Zoho Books | Accounting records, invoices | Zoho manages; periodic CSV/PDF export | Zoho Books export function |
| Mercury | Bank transactions | Mercury manages; download statements monthly | Dashboard CSV export |
| Koinly | Crypto tax records | Koinly manages; annual export | Koinly export function |

2. Backup Strategy by Component

2.1 PostgreSQL Database

The PostgreSQL database is the single most critical piece of data. It contains all orders, customers, products, inventory, and compliance records.

Current State

A basic daily backup already exists in nixos/app-server.nix:

systemd.services.pg-backup = {
  description = "PostgreSQL backup";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = pkgs.writeShellScript "pg-backup" ''
      ${pkgs.postgresql_16}/bin/pg_dump -U medusa medusa \
        | ${pkgs.gzip}/bin/gzip > /var/backup/pg/medusa-$(date +%Y%m%d_%H%M%S).sql.gz
      find /var/backup/pg -name "*.sql.gz" -mtime +7 -delete
    '';
  };
};

systemd.timers.pg-backup = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "daily";
    Persistent = true;
  };
};

Current gaps:

  • Backups are stored only on the same server (no off-site copy)
  • No pipeline failure detection — if pg_dump fails partway through, gzip still exits 0 and a truncated (but valid-looking) .sql.gz is kept (no set -o pipefail)
  • No backup verification (checksums, test restores)
  • 7-day retention is short for compliance/legal needs
  • No point-in-time recovery (PITR) — only daily snapshots
  • No alerting on backup failures

Enhanced Backup Plan

Frequency & Retention:

| Tier | Frequency | Retention | Storage |
| --- | --- | --- | --- |
| Local daily | Every 24h (02:00 UTC) | 7 days | /var/backup/pg/ on app server |
| Off-site daily | Every 24h (03:00 UTC, after local) | 30 days | Cloudflare R2 bucket rr-backups |
| Off-site weekly | Every Sunday (copied during the 03:00 UTC off-site run) | 90 days | Cloudflare R2 bucket rr-backups |
| Off-site monthly | 1st of month (copied during the 03:00 UTC off-site run) | 1 year | Cloudflare R2 bucket rr-backups |
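The tier routing above reduces to two date checks. A quick local sketch (plain bash with GNU `date`, no server access assumed) of which prefixes a given day's dump lands in:

```shell
#!/usr/bin/env bash
# Which R2 prefixes does a backup taken on date $1 (YYYY-MM-DD) get copied to?
# Mirrors the tier rules: daily always, weekly on Sundays, monthly on the 1st.
backup_tiers() {
  local d="$1"
  echo "pg/daily/"
  # GNU date: %u = ISO day of week (Sunday = 7), %d = day of month
  [ "$(date -d "$d" +%u)" = "7" ] && echo "pg/weekly/"
  [ "$(date -d "$d" +%d)" = "01" ] && echo "pg/monthly/"
  return 0
}

backup_tiers 2026-03-01   # a Sunday that is also the 1st: hits all three tiers
```

Running it for 2026-03-01 prints all three prefixes; an ordinary weekday prints only `pg/daily/`.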

Implementation — Off-site upload script:

Add a new systemd service that runs after the local backup and uploads to R2:

# Add to nixos/app-server.nix

systemd.services.pg-backup-offsite = {
  description = "Upload PostgreSQL backup to Cloudflare R2";
  after = [ "pg-backup.service" ];
  requires = [ "pg-backup.service" ];
  serviceConfig = {
    Type = "oneshot";
    EnvironmentFile = "/etc/rr-bizops/backup.env";
    ExecStart = pkgs.writeShellScript "pg-backup-offsite" ''
      set -euo pipefail
      LATEST=$(ls -t /var/backup/pg/*.sql.gz | head -1)
      BASENAME=$(basename "$LATEST")
      DATE=$(date +%Y-%m-%d)
      DAY_OF_WEEK=$(date +%u)
      DAY_OF_MONTH=$(date +%d)

      # Daily backup
      ${pkgs.awscli2}/bin/aws s3 cp "$LATEST" \
        "s3://rr-backups/pg/daily/$BASENAME" \
        --endpoint-url "$R2_ENDPOINT"

      # Weekly backup (Sunday = 7)
      if [ "$DAY_OF_WEEK" = "7" ]; then
        ${pkgs.awscli2}/bin/aws s3 cp "$LATEST" \
          "s3://rr-backups/pg/weekly/$BASENAME" \
          --endpoint-url "$R2_ENDPOINT"
      fi

      # Monthly backup (1st of month)
      if [ "$DAY_OF_MONTH" = "01" ]; then
        ${pkgs.awscli2}/bin/aws s3 cp "$LATEST" \
          "s3://rr-backups/pg/monthly/$BASENAME" \
          --endpoint-url "$R2_ENDPOINT"
      fi
    '';
  };
};

systemd.timers.pg-backup-offsite = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "*-*-* 03:00:00";
    Persistent = true;
  };
};

R2 lifecycle rules (set in Cloudflare dashboard):

| Prefix | Rule | Effect |
| --- | --- | --- |
| pg/daily/ | Delete after 30 days | Auto-cleanup of old daily backups |
| pg/weekly/ | Delete after 90 days | Auto-cleanup of old weekly backups |
| pg/monthly/ | Delete after 365 days | Auto-cleanup of old monthly backups |
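The dashboard is the stated method, but Cloudflare also documents S3-compatible lifecycle configuration for R2, so the rules above could be kept in the repo as a JSON fragment (a sketch — verify R2's current lifecycle support and field names against Cloudflare's docs before relying on it):

```json
{
  "Rules": [
    { "ID": "pg-daily-30d",
      "Filter": { "Prefix": "pg/daily/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 } },
    { "ID": "pg-weekly-90d",
      "Filter": { "Prefix": "pg/weekly/" },
      "Status": "Enabled",
      "Expiration": { "Days": 90 } },
    { "ID": "pg-monthly-365d",
      "Filter": { "Prefix": "pg/monthly/" },
      "Status": "Enabled",
      "Expiration": { "Days": 365 } }
  ]
}
```

If R2 accepts it, this could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket rr-backups --lifecycle-configuration file://lifecycle.json --endpoint-url "$R2_ENDPOINT"`; otherwise set the same three rules in the dashboard.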

Backup verification:

Add a weekly test-restore job that:

  1. Downloads the latest daily backup from R2
  2. Restores it into a temporary database
  3. Runs a basic integrity check (table count, row counts for critical tables)
  4. Drops the temporary database
  5. Logs success/failure

systemd.services.pg-backup-verify = {
  description = "Verify PostgreSQL backup integrity";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = pkgs.writeShellScript "pg-backup-verify" ''
      set -euo pipefail
      LATEST=$(ls -t /var/backup/pg/*.sql.gz | head -1)
      TESTDB="medusa_backup_test"

      # Always drop the test database on exit, even if the restore fails
      trap '${pkgs.postgresql_16}/bin/dropdb --if-exists -U medusa "$TESTDB"' EXIT

      # Recreate the test database from scratch (a leftover from a previous
      # failed run would otherwise make the restore error out)
      ${pkgs.postgresql_16}/bin/dropdb --if-exists -U medusa "$TESTDB"
      ${pkgs.postgresql_16}/bin/createdb -U medusa "$TESTDB"

      # Restore
      ${pkgs.gzip}/bin/gunzip -c "$LATEST" | \
        ${pkgs.postgresql_16}/bin/psql -U medusa -d "$TESTDB" -q

      # Verify critical tables exist and have rows
      # ("order" is a reserved word in SQL, so the identifier must be quoted)
      TABLES=("product" "order" "customer")
      for TABLE in "''${TABLES[@]}"; do
        COUNT=$(${pkgs.postgresql_16}/bin/psql -U medusa -d "$TESTDB" -t -A -c \
          "SELECT COUNT(*) FROM \"$TABLE\";")
        echo "Table $TABLE: $COUNT rows"
        if [ "$COUNT" -eq 0 ]; then
          echo "ALERT: table $TABLE is empty in restored backup"
          exit 1
        fi
      done

      echo "Backup verification completed successfully"
    '';
  };
};

systemd.timers.pg-backup-verify = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "Sun *-*-* 05:00:00";
    Persistent = true;
  };
};

2.2 Cloudflare R2 File Storage (COAs, Documents)

COA PDFs and other uploaded files in R2 need backup to protect against accidental deletion or bucket misconfiguration.

Strategy

| Approach | Description |
| --- | --- |
| Primary | R2 bucket rr-bizops-files (production data) |
| Backup | R2 bucket rr-backups with files/ prefix (weekly sync) |
| Method | aws s3 sync using R2's S3-compatible API |

Implementation

systemd.services.r2-files-backup = {
  description = "Sync R2 files bucket to backup bucket";
  serviceConfig = {
    Type = "oneshot";
    EnvironmentFile = "/etc/rr-bizops/backup.env";
    ExecStart = pkgs.writeShellScript "r2-files-backup" ''
      ${pkgs.awscli2}/bin/aws s3 sync \
        "s3://rr-bizops-files/" \
        "s3://rr-backups/files/$(date +%Y-%m-%d)/" \
        --endpoint-url "$R2_ENDPOINT"

      # Keep only 4 weekly snapshots
      ${pkgs.awscli2}/bin/aws s3 ls \
        "s3://rr-backups/files/" \
        --endpoint-url "$R2_ENDPOINT" | \
        sort -r | tail -n +5 | while read -r line; do
          PREFIX=$(echo "$line" | awk '{print $NF}')
          ${pkgs.awscli2}/bin/aws s3 rm \
            "s3://rr-backups/files/$PREFIX" \
            --recursive --endpoint-url "$R2_ENDPOINT"
        done
    '';
  };
};

systemd.timers.r2-files-backup = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "Sun *-*-* 04:00:00";
    Persistent = true;
  };
};
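The keep-4 pruning pipeline above can be sanity-checked offline against fake `aws s3 ls` output before pointing it at the real bucket (a sketch; the `PRE` lines mimic the prefix listing format the S3 CLI prints):

```shell
#!/usr/bin/env bash
# Simulate the snapshot-pruning selection: given `aws s3 ls`-style prefix
# listings on stdin, print the snapshots that would be deleted
# (everything except the newest 4).
prune_candidates() {
  sort -r | tail -n +5 | awk '{print $NF}'
}

printf '%s\n' \
  "   PRE 2026-01-04/" "   PRE 2026-01-11/" "   PRE 2026-01-18/" \
  "   PRE 2026-01-25/" "   PRE 2026-02-01/" "   PRE 2026-02-08/" \
  | prune_candidates
```

With six snapshots present, the two oldest (2026-01-11/ and 2026-01-04/) are selected for deletion and the four newest survive.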

2.3 Application Source Code

Already backed up via Git/GitHub. The rr-bizops repository is the single source of truth for:

  • Medusa backend (app/)
  • Next.js storefront (storefront/)
  • NixOS server configuration (nixos/)
  • Operational documentation (docs/)
  • CI/CD pipelines (.github/)

Additional protections

| Protection | Implementation |
| --- | --- |
| Branch protection | Require PR reviews for main branch |
| Local clone | Keep a local clone on a separate machine (developer workstation) |
| GitHub backup | GitHub's own redundancy (geo-replicated) |

No additional backup infrastructure needed for source code.
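For the local-clone protection, a mirror clone (a bare copy of every branch and tag, not just one checkout) is worth the extra flag. A sketch, with a local throwaway repo standing in for GitHub so the mechanics can be exercised without network access (the GitHub URL in the comments is the obvious one but should be confirmed against the actual repo):

```shell
#!/usr/bin/env bash
set -eu
# One-time, on the workstation:
#   git clone --mirror git@github.com:research-relay/rr-bizops.git rr-bizops.git
# Periodic refresh (cron or a systemd user timer):
#   git --git-dir=rr-bizops.git remote update --prune

# Local demonstration of the same commands with a throwaway repo:
tmp=$(mktemp -d)
git -C "$tmp" init -q origin-repo
git -C "$tmp/origin-repo" -c user.email=a@b -c user.name=a \
  commit -q --allow-empty -m "initial"
git clone -q --mirror "$tmp/origin-repo" "$tmp/backup.git"
git --git-dir="$tmp/backup.git" remote update --prune >/dev/null 2>&1
msg=$(git --git-dir="$tmp/backup.git" log -1 --format=%s)
echo "mirror has commit: $msg"
rm -rf "$tmp"
```

Unlike a working clone, the mirror also picks up deleted and force-pushed branches on each `remote update --prune`, so it stays a faithful copy of the remote.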


2.4 Docker Images

Docker images are stored in GHCR (ghcr.io/research-relay/rr-bizops/). Images are tagged with both latest and the git SHA.

Backup strategy: Images can be rebuilt from source at any time. The git SHA tag ensures any deployed version can be precisely reproduced.

No additional backup needed beyond GHCR retention.


2.5 Server Configuration & Secrets

NixOS Configuration

The NixOS server configuration is fully declarative and stored in the repo (nixos/). To rebuild the server from scratch:

# On a fresh NixOS installation:
nixos-rebuild switch --flake .#app-server

This is the key advantage of NixOS — the entire server is reproducible from the flake.

Secrets & Environment Variables

Secrets are stored in two places:

| Secret | Primary | Backup |
| --- | --- | --- |
| Database URL | /etc/rr-bizops/.env | 1Password vault |
| Redis password | /etc/rr-bizops/.env | 1Password vault |
| JWT / Cookie secrets | /etc/rr-bizops/.env | 1Password vault |
| Resend API key | /etc/rr-bizops/.env | 1Password vault |
| R2 credentials | /etc/rr-bizops/.env | 1Password vault |
| Deploy SSH key | GitHub Secrets | 1Password vault |
| GHCR token | GitHub Secrets | (GitHub-managed) |
| Medusa publishable key | GitHub Secrets | 1Password vault |

Procedure: When any secret is created or rotated, immediately update both the server .env file and the 1Password vault entry.


2.6 BTCPay Server (shops-btc-01)

The BTCPay/Bitcoin infrastructure runs on a separate NixOS server managed by nix-bitcoin. It has its own backup requirements:

| Component | What | Backup Method |
| --- | --- | --- |
| BTCPay PostgreSQL | Invoices, payment data | Daily pg_dump (nix-bitcoin default) |
| CLN database | Lightning channels, state | nix-bitcoin automated backup |
| Bitcoin wallet | xpub-derived keys | Hardware wallet seed (metal backup, physical safe) |
| Lightning SCB | Static Channel Backup | Automated to encrypted file; copy off-server |
| BTCPay config | Server settings | Nix flake (declarative, in git) |

BTCPay backup is managed by the nix-bitcoin configuration and is outside the scope of this document. Refer to the nix-bitcoin documentation and the BTCPay runbook (docs/payments/btcpay-runbook.md).


2.7 Cloudflare DNS

Export DNS zone file quarterly for disaster recovery in case of Cloudflare account issues:

# Export DNS records via Cloudflare API
curl -X GET "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/export" \
  -H "Authorization: Bearer {api_token}" \
  > dns-backup-$(date +%Y%m%d).txt

Store the export in 1Password as an attachment.


2.8 Email Archive (Zoho Mail)

Quarterly IMAP backup of business email:

# Using offlineimap or similar IMAP sync tool
offlineimap -c offlineimap.conf

Store the archive on a local encrypted drive. This protects against Zoho account issues and provides a local searchable archive.
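A minimal offlineimap configuration for this could look like the fragment below. The account name, local path, and user address are placeholders; imap.zoho.com:993 is Zoho's documented IMAP endpoint, but confirm it (and the CA bundle path on the machine running the sync) against current docs:

```ini
[general]
accounts = rr-mail

[Account rr-mail]
localrepository = rr-local
remoterepository = rr-zoho

[Repository rr-local]
type = Maildir
localfolders = ~/mail-archive/research-relay

[Repository rr-zoho]
type = IMAP
remotehost = imap.zoho.com
remoteport = 993
ssl = yes
# Path varies by distro; this is the usual Linux location
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
remoteuser = ops@research-relay.com
# Keep the password out of the config file
remotepassfile = ~/.offlineimap-zoho-pass
```

Run `offlineimap -c offlineimap.conf` as shown above; the Maildir it produces is greppable directly and can be re-synced incrementally each quarter.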


3. Backup Infrastructure Summary

R2 Backup Bucket Layout

rr-backups/                         # Dedicated backup bucket
├── pg/
│   ├── daily/                      # 30-day retention
│   │   ├── medusa-20260301_020000.sql.gz
│   │   ├── medusa-20260302_020000.sql.gz
│   │   └── ...
│   ├── weekly/                     # 90-day retention
│   │   ├── medusa-20260222_020000.sql.gz
│   │   └── ...
│   └── monthly/                    # 1-year retention
│       ├── medusa-20260201_020000.sql.gz
│       └── ...
├── files/                          # R2 files bucket mirror
│   ├── 2026-03-02/                 # Weekly snapshot
│   │   ├── coa/
│   │   ├── sds/
│   │   └── products/
│   └── ...
└── misc/
    └── dns/                        # DNS zone file exports

Environment File for Backup Jobs

Create /etc/rr-bizops/backup.env:

AWS_ACCESS_KEY_ID=<r2-backup-access-key>
AWS_SECRET_ACCESS_KEY=<r2-backup-secret-key>
R2_ENDPOINT=https://<account-id>.r2.cloudflarestorage.com

Use a separate R2 API token scoped to the rr-backups bucket only.


4. Backup Schedule Summary

| Time (UTC) | Day | Job | Target |
| --- | --- | --- | --- |
| 02:00 | Daily | PostgreSQL dump (local) | /var/backup/pg/ |
| 03:00 | Daily | PostgreSQL upload to R2 | rr-backups/pg/daily/ |
| 03:00 | Sunday | Weekly PG backup copy (same off-site run) | rr-backups/pg/weekly/ |
| 03:00 | 1st of month | Monthly PG backup copy (same off-site run) | rr-backups/pg/monthly/ |
| 04:00 | Sunday | R2 files sync | rr-backups/files/ |
| 05:00 | Sunday | Backup verification | Temp DB on app server |
| 06:00 | Daily | Backup health check | Local log / alert |

5. Disaster Recovery Procedures

5.1 Scenario: App Server Total Loss

RTO target: 2 hours | RPO target: 24 hours (last daily backup)

Steps to recover:

1. Provision new VPS (same provider or different)
2. Install NixOS minimal ISO
3. Clone the rr-bizops repo
4. Apply NixOS configuration:
   $ nixos-rebuild switch --flake .#app-server
   → This installs PostgreSQL, Redis, Docker, Caddy, and all systemd services
5. Copy /etc/rr-bizops/.env from 1Password vault
6. Copy /etc/rr-bizops/backup.env from 1Password vault
7. Restore PostgreSQL from latest R2 backup:
   $ aws s3 cp s3://rr-backups/pg/daily/$(aws s3 ls s3://rr-backups/pg/daily/ \
     --endpoint-url $R2_ENDPOINT | sort | tail -1 | awk '{print $4}') \
     /tmp/restore.sql.gz --endpoint-url $R2_ENDPOINT
   $ gunzip -c /tmp/restore.sql.gz | psql -U medusa medusa
8. Pull latest Docker images:
   $ docker pull ghcr.io/research-relay/rr-bizops/medusa:latest
   $ docker pull ghcr.io/research-relay/rr-bizops/storefront:latest
9. Start services:
   $ sudo systemctl start docker-medusa-server docker-medusa-worker docker-storefront
10. Update DNS A record for api.research-relay.com to new server IP
11. Verify:
    - Admin dashboard accessible at api.research-relay.com/app
    - Storefront loads at research-relay.com
    - Test a product page with COA links
    - Verify order history is intact
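Step 7's newest-key selection works because the filenames embed a YYYYMMDD_HHMMSS timestamp, so a plain lexical sort is also a chronological sort. The selection logic can be checked offline against sample `aws s3 ls` lines (no bucket access needed):

```shell
#!/usr/bin/env bash
# Pick the newest dump from `aws s3 ls`-style output: fields are
# date, time, size, key — so $4 is the filename.
latest_dump() {
  sort | tail -1 | awk '{print $4}'
}

printf '%s\n' \
  "2026-02-28 03:00:12   10485760 medusa-20260228_020000.sql.gz" \
  "2026-03-01 03:00:09   10485760 medusa-20260301_020000.sql.gz" \
  "2026-03-02 03:00:11   10485760 medusa-20260302_020000.sql.gz" \
  | latest_dump
```

The same property means the naming scheme must not change (e.g. to DD-MM-YYYY), or the lexical-equals-chronological assumption in the restore one-liner silently breaks.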

5.2 Scenario: Database Corruption

RTO target: 30 minutes | RPO target: 24 hours

1. Stop the Medusa server and worker:
   $ sudo systemctl stop docker-medusa-server docker-medusa-worker
2. Drop and recreate the database:
   $ psql -U postgres -c "DROP DATABASE medusa;"
   $ psql -U postgres -c "CREATE DATABASE medusa OWNER medusa;"
3. Restore from latest local backup:
   $ LATEST=$(ls -t /var/backup/pg/*.sql.gz | head -1)
   $ gunzip -c "$LATEST" | psql -U medusa medusa
4. If local backups are also corrupted, restore from R2:
   $ aws s3 cp s3://rr-backups/pg/daily/<latest>.sql.gz /tmp/restore.sql.gz \
     --endpoint-url $R2_ENDPOINT
   $ gunzip -c /tmp/restore.sql.gz | psql -U medusa medusa
5. Restart services:
   $ sudo systemctl start docker-medusa-server docker-medusa-worker
6. Run Medusa migrations to catch up (if backup is older than latest deploy):
   $ docker exec <medusa-container> npx medusa db:migrate
7. Verify order counts, product counts, latest orders are present

5.3 Scenario: Accidental File Deletion (R2)

1. Identify the deleted file(s) by checking COA records in the database
2. Restore from the weekly R2 backup:
   $ aws s3 cp s3://rr-backups/files/<latest-date>/coa/<file-path> \
     s3://rr-bizops-files/coa/<file-path> \
     --endpoint-url $R2_ENDPOINT
3. Verify the file is accessible via cdn.research-relay.com

5.4 Scenario: Secret Compromise

1. Identify which secrets were compromised
2. Rotate affected credentials:
   - Database password: ALTER USER medusa WITH PASSWORD '...'
   - Redis password: Update app-server.nix, rebuild NixOS
   - JWT/Cookie secrets: Update .env, restart Medusa (invalidates all sessions)
   - R2 credentials: Revoke in Cloudflare dashboard, create new token
   - API keys (Resend, etc.): Revoke in provider dashboard, create new key
3. Update /etc/rr-bizops/.env with new values
4. Update 1Password vault entries
5. Restart affected services
6. Audit access logs for unauthorized activity during exposure window

5.5 Scenario: GitHub Account/Repo Loss

1. Source code is in local developer clones
2. Push to a new repository (GitHub, GitLab, or self-hosted)
3. Update CI/CD workflow and deploy secrets
4. Docker images can be rebuilt from source
5. Resume normal operations

6. Monitoring & Alerting

Backup Job Monitoring

Add systemd service status checks to catch silent failures:

# Add to app-server.nix — simple health check that alerts on backup failure

systemd.services.backup-health-check = {
  description = "Check backup job health";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = pkgs.writeShellScript "backup-health-check" ''
      # Check that a backup was created in the last 26 hours
      LATEST=$(find /var/backup/pg -name "*.sql.gz" -mmin -1560 | head -1)
      if [ -z "$LATEST" ]; then
        echo "ALERT: No PostgreSQL backup found in the last 26 hours!"
        # Send alert via Resend API or write to a monitored log
        exit 1
      fi

      # Check backup file is not empty (minimum 1KB)
      SIZE=$(stat -c%s "$LATEST")  # GNU stat (NixOS); -f%z is the BSD form
      if [ "$SIZE" -lt 1024 ]; then
        echo "ALERT: Latest backup file is suspiciously small ($SIZE bytes)"
        exit 1
      fi

      echo "Backup health check passed: $LATEST ($SIZE bytes)"
    '';
  };
};

systemd.timers.backup-health-check = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "*-*-* 06:00:00";
    Persistent = true;
  };
};
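The "send alert" placeholder in the script above could be filled with a small call to Resend's send endpoint (POST https://api.resend.com/emails). A sketch, assuming a RESEND_API_KEY environment variable and illustrative from/to addresses — verify the payload fields against Resend's current API reference:

```shell
#!/usr/bin/env bash
# Compose the JSON payload separately so it can be tested without sending.
build_alert_payload() {
  # $1 = alert text; both addresses below are illustrative placeholders
  printf '{"from":"%s","to":["%s"],"subject":"%s","text":"%s"}' \
    "alerts@research-relay.com" "ops@research-relay.com" \
    "[rr-bizops] backup health check failed" "$1"
}

send_backup_alert() {
  # Requires RESEND_API_KEY in the environment (e.g. via EnvironmentFile)
  curl -sS -X POST "https://api.resend.com/emails" \
    -H "Authorization: Bearer $RESEND_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_alert_payload "$1")"
}

build_alert_payload "No PostgreSQL backup found in the last 26 hours"
```

Note the payload builder does no JSON escaping, so alert text must stay free of quotes and backslashes; for anything richer, build the body with jq instead.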

7. Implementation Checklist

  • R2 Backup Bucket

    • Create R2 bucket rr-backups in Cloudflare dashboard
    • Create dedicated R2 API token for backup jobs (scoped to rr-backups)
    • Set lifecycle rules: daily/30d, weekly/90d, monthly/365d
    • Create /etc/rr-bizops/backup.env on server
  • PostgreSQL Off-site Backups

    • Add pg-backup-offsite systemd service to app-server.nix
    • Add pg-backup-offsite timer (03:00 UTC daily)
    • Test: run manually, verify file appears in R2
    • Verify lifecycle rules clean up old backups
  • Backup Verification

    • Add pg-backup-verify systemd service to app-server.nix
    • Add weekly timer (Sunday 05:00 UTC)
    • Test: run manually, verify test restore succeeds
  • R2 Files Backup

    • Add r2-files-backup systemd service to app-server.nix
    • Add weekly timer (Sunday 04:00 UTC)
    • Test: run manually, verify sync to backup bucket
  • Monitoring

    • Add backup-health-check service and timer
    • Verify alerting works (test with missing backup)
  • Secrets Management

    • Audit: all secrets in /etc/rr-bizops/.env are also in 1Password
    • Document secret rotation procedure in 1Password vault notes
  • External Service Exports

    • Set quarterly calendar reminder for DNS zone export
    • Set quarterly calendar reminder for Zoho Mail IMAP backup
    • Set monthly calendar reminder for Mercury statement download
  • Documentation

    • Test full disaster recovery procedure (Scenario 5.1) on a throwaway VPS
    • Test database restore procedure (Scenario 5.2)
    • Record actual RTO/RPO achieved during test

8. Cost Estimate

| Item | Monthly Cost | Notes |
| --- | --- | --- |
| R2 storage (backups) | ~$0.15 | ~10 GB estimated (compressed PG dumps + file copies) |
| R2 operations | ~$0.05 | Class A/B operations for sync jobs |
| Total | ~$0.20/mo | Negligible addition to infrastructure costs |

R2's zero egress fees mean restoring from backup costs nothing regardless of data size.