# Comprehensive Backup & Recovery Plan
This document defines the backup strategy for all Research Relay systems, data stores, and configuration. It covers what to back up, how, where, how often, and how to restore from backups in a disaster recovery scenario.
## 1. Inventory of What Needs Backup

### 1.1 Data Stores

| Data Store | Location | Contents | Criticality |
|---|---|---|---|
| PostgreSQL (Medusa) | App server (dumps in `/var/backup/pg/`) | Products, orders, customers, inventory, compliance data, attestations, COAs, lots, purity records | Critical — total data loss without backup |
| Redis | App server (port 6379) | Session cache, event bus, workflow state | Low — ephemeral; rebuilt on restart |
| Cloudflare R2 | `rr-bizops-files` bucket | COA PDFs, SDS documents, product images | High — source documents; hard to recreate |
### 1.2 Application Code & Configuration

| Component | Location | Contents | Backup Method |
|---|---|---|---|
| Application source code | GitHub (`rr-bizops` repo) | Medusa backend, storefront, docs, NixOS config | Git (GitHub is the primary backup) |
| Docker images | GHCR (`ghcr.io/research-relay/rr-bizops/`) | Built application images (medusa, storefront) | Container registry; rebuilt from source |
| NixOS system config | `nixos/` in repo + `/etc/nixos/` on server | Full server configuration (reproducible) | Git + NixOS flake lock |
| Server environment vars | `/etc/rr-bizops/.env` on app server | DB credentials, API keys, secrets | 1Password vault (manual sync) |
### 1.3 External Services (SaaS — not self-hosted)

| Service | Data | Backup Responsibility | Export Method |
|---|---|---|---|
| Cloudflare | DNS records, R2 data, Pages config | Cloudflare manages; export DNS as zone file | Cloudflare API or dashboard export |
| BTCPay Server | Payment invoices, BTC transaction history | Self-hosted on shops-btc-01; separate backup plan | BTCPay built-in backup + nix-bitcoin |
| Zoho Mail | Business email archive | Zoho manages; export via IMAP backup | offlineimap or Zoho admin export |
| Zoho Books | Accounting records, invoices | Zoho manages; periodic CSV/PDF export | Zoho Books export function |
| Mercury | Bank transactions | Mercury manages; download statements monthly | Dashboard CSV export |
| Koinly | Crypto tax records | Koinly manages; annual export | Koinly export function |
## 2. Backup Strategy by Component

### 2.1 PostgreSQL Database

The PostgreSQL database is the single most critical data store. It contains all orders, customers, products, inventory, and compliance records.

#### Current State

A basic daily backup already exists in `nixos/app-server.nix`:
```nix
systemd.services.pg-backup = {
  description = "PostgreSQL backup";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = pkgs.writeShellScript "pg-backup" ''
      ${pkgs.postgresql_16}/bin/pg_dump -U medusa medusa \
        | ${pkgs.gzip}/bin/gzip > /var/backup/pg/medusa-$(date +%Y%m%d_%H%M%S).sql.gz
      find /var/backup/pg -name "*.sql.gz" -mtime +7 -delete
    '';
  };
};

systemd.timers.pg-backup = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "daily";
    Persistent = true;
  };
};
```
**Current gaps:**
- Backups are stored only on the same server (no off-site copy)
- No backup verification (checksums, test restores)
- 7-day retention is short for compliance/legal needs
- No point-in-time recovery (PITR) — only daily snapshots
- No alerting on backup failures
#### Enhanced Backup Plan

**Frequency & Retention:**

| Tier | Frequency | Retention | Storage |
|---|---|---|---|
| Local daily | Every 24h (02:00 UTC) | 7 days | `/var/backup/pg/` on app server |
| Off-site daily | Every 24h (03:00 UTC, after local) | 30 days | Cloudflare R2 bucket `rr-backups` |
| Off-site weekly | Every Sunday (03:00 UTC, within the daily off-site job) | 90 days | Cloudflare R2 bucket `rr-backups` |
| Off-site monthly | 1st of month (03:00 UTC, within the daily off-site job) | 1 year | Cloudflare R2 bucket `rr-backups` |

**Implementation — Off-site upload script:**
Add a new systemd service that runs after the local backup and uploads to R2:
```nix
# Add to nixos/app-server.nix
systemd.services.pg-backup-offsite = {
  description = "Upload PostgreSQL backup to Cloudflare R2";
  after = [ "pg-backup.service" ];
  requires = [ "pg-backup.service" ];
  serviceConfig = {
    Type = "oneshot";
    EnvironmentFile = "/etc/rr-bizops/backup.env";
    ExecStart = pkgs.writeShellScript "pg-backup-offsite" ''
      LATEST=$(ls -t /var/backup/pg/*.sql.gz | head -1)
      BASENAME=$(basename "$LATEST")
      DAY_OF_WEEK=$(date +%u)
      DAY_OF_MONTH=$(date +%d)

      # Daily backup
      ${pkgs.awscli2}/bin/aws s3 cp "$LATEST" \
        "s3://rr-backups/pg/daily/$BASENAME" \
        --endpoint-url "$R2_ENDPOINT"

      # Weekly backup (Sunday = 7)
      if [ "$DAY_OF_WEEK" = "7" ]; then
        ${pkgs.awscli2}/bin/aws s3 cp "$LATEST" \
          "s3://rr-backups/pg/weekly/$BASENAME" \
          --endpoint-url "$R2_ENDPOINT"
      fi

      # Monthly backup (1st of month)
      if [ "$DAY_OF_MONTH" = "01" ]; then
        ${pkgs.awscli2}/bin/aws s3 cp "$LATEST" \
          "s3://rr-backups/pg/monthly/$BASENAME" \
          --endpoint-url "$R2_ENDPOINT"
      fi
    '';
  };
};

systemd.timers.pg-backup-offsite = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "*-*-* 03:00:00";
    Persistent = true;
  };
};
```
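Since the daily/weekly/monthly tiering depends only on the calendar date, the branching can be checked in isolation before wiring it into systemd. A minimal Python sketch of the same logic (the function name `backup_tiers` is illustrative, not part of the codebase):

```python
from datetime import date

def backup_tiers(d: date) -> list[str]:
    """Return the R2 prefixes a dump taken on date d is copied to.

    Mirrors the shell script above: every run uploads to pg/daily/,
    Sundays (ISO weekday 7, as in `date +%u`) also copy to pg/weekly/,
    and the 1st of the month also copies to pg/monthly/.
    """
    tiers = ["pg/daily/"]
    if d.isoweekday() == 7:
        tiers.append("pg/weekly/")
    if d.day == 1:
        tiers.append("pg/monthly/")
    return tiers

# 2026-03-01 is a Sunday and the 1st, so it lands in all three tiers
print(backup_tiers(date(2026, 3, 1)))  # → ['pg/daily/', 'pg/weekly/', 'pg/monthly/']
```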
**R2 lifecycle rules (set in the Cloudflare dashboard):**

| Prefix | Rule | Effect |
|---|---|---|
| `pg/daily/` | Delete after 30 days | Auto-cleanup of old daily backups |
| `pg/weekly/` | Delete after 90 days | Auto-cleanup of old weekly backups |
| `pg/monthly/` | Delete after 365 days | Auto-cleanup of old monthly backups |
**Backup verification:**
Add a weekly test-restore job that:
- Downloads the latest daily backup from R2
- Restores it into a temporary database
- Runs a basic integrity check (table count, row counts for critical tables)
- Drops the temporary database
- Logs success/failure
```nix
systemd.services.pg-backup-verify = {
  description = "Verify PostgreSQL backup integrity";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = pkgs.writeShellScript "pg-backup-verify" ''
      set -e
      LATEST=$(ls -t /var/backup/pg/*.sql.gz | head -1)
      TESTDB="medusa_backup_test"

      # Recreate the test database; drop any leftover from a failed run
      # so we never restore into a dirty database
      ${pkgs.postgresql_16}/bin/dropdb -U medusa --if-exists "$TESTDB"
      ${pkgs.postgresql_16}/bin/createdb -U medusa "$TESTDB"

      # Restore
      ${pkgs.gzip}/bin/gunzip -c "$LATEST" | \
        ${pkgs.postgresql_16}/bin/psql -U medusa -d "$TESTDB" -q

      # Verify critical tables exist and have rows; "order" is a reserved
      # word in SQL, so table names must be double-quoted
      TABLES=("product" "order" "customer")
      for TABLE in "''${TABLES[@]}"; do
        COUNT=$(${pkgs.postgresql_16}/bin/psql -U medusa -d "$TESTDB" -t -c \
          "SELECT COUNT(*) FROM \"$TABLE\";" 2>/dev/null || echo "0")
        echo "Table $TABLE: $COUNT rows"
      done

      # Cleanup
      ${pkgs.postgresql_16}/bin/dropdb -U medusa "$TESTDB"
      echo "Backup verification completed successfully"
    '';
  };
};

systemd.timers.pg-backup-verify = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "Sun *-*-* 05:00:00";
    Persistent = true;
  };
};
```
### 2.2 Cloudflare R2 File Storage (COAs, Documents)
COA PDFs and other uploaded files in R2 need backup to protect against accidental deletion or bucket misconfiguration.
#### Strategy
| Approach | Description |
|---|---|
| Primary | R2 bucket rr-bizops-files (production data) |
| Backup | R2 bucket rr-backups with files/ prefix (weekly sync) |
| Method | aws s3 sync using R2's S3-compatible API |
#### Implementation
```nix
systemd.services.r2-files-backup = {
  description = "Sync R2 files bucket to backup bucket";
  serviceConfig = {
    Type = "oneshot";
    EnvironmentFile = "/etc/rr-bizops/backup.env";
    ExecStart = pkgs.writeShellScript "r2-files-backup" ''
      ${pkgs.awscli2}/bin/aws s3 sync \
        "s3://rr-bizops-files/" \
        "s3://rr-backups/files/$(date +%Y-%m-%d)/" \
        --endpoint-url "$R2_ENDPOINT"

      # Keep only 4 weekly snapshots: dated prefixes sort lexicographically
      # in chronological order, so `sort -r` puts the newest first and
      # `tail -n +5` selects everything older for deletion
      ${pkgs.awscli2}/bin/aws s3 ls \
        "s3://rr-backups/files/" \
        --endpoint-url "$R2_ENDPOINT" | \
        sort -r | tail -n +5 | while read -r line; do
          PREFIX=$(echo "$line" | awk '{print $NF}')
          ${pkgs.awscli2}/bin/aws s3 rm \
            "s3://rr-backups/files/$PREFIX" \
            --recursive --endpoint-url "$R2_ENDPOINT"
        done
    '';
  };
};

systemd.timers.r2-files-backup = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "Sun *-*-* 04:00:00";
    Persistent = true;
  };
};
```
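The pruning step relies on the date-named prefixes (YYYY-MM-DD) sorting lexicographically in chronological order. The selection logic, sketched in Python for clarity (prefix names are illustrative):

```python
def prefixes_to_delete(prefixes: list[str], keep: int = 4) -> list[str]:
    """Return the dated snapshot prefixes to remove, keeping the newest.

    Equivalent to the `sort -r | tail -n +5` pipeline above: ISO dates
    sort lexicographically in chronological order, so reverse-sorting
    puts the newest first and everything past `keep` is pruned.
    """
    return sorted(prefixes, reverse=True)[keep:]

snapshots = ["2026-02-01/", "2026-02-08/", "2026-02-15/",
             "2026-02-22/", "2026-03-01/", "2026-03-08/"]
print(prefixes_to_delete(snapshots))  # → ['2026-02-08/', '2026-02-01/']
```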
### 2.3 Application Source Code

Already backed up via Git/GitHub. The rr-bizops repository is the single source of truth for:

- Medusa backend (`app/`)
- Next.js storefront (`storefront/`)
- NixOS server configuration (`nixos/`)
- Operational documentation (`docs/`)
- CI/CD pipelines (`.github/`)
#### Additional protections
| Protection | Implementation |
|---|---|
| Branch protection | Require PR reviews for main branch |
| Local clone | Keep a local clone on a separate machine (developer workstation) |
| GitHub backup | GitHub's own redundancy (geo-replicated) |
No additional backup infrastructure needed for source code.
### 2.4 Docker Images

Docker images are stored in GHCR (`ghcr.io/research-relay/rr-bizops/`). Images are tagged with both `latest` and the git SHA.
Backup strategy: Images can be rebuilt from source at any time. The git SHA tag ensures any deployed version can be precisely reproduced.
No additional backup needed beyond GHCR retention.
### 2.5 Server Configuration & Secrets

#### NixOS Configuration

The NixOS server configuration is fully declarative and stored in the repo (`nixos/`), so the server can be rebuilt from scratch with a single flake command. This is the key advantage of NixOS — the entire server is reproducible from the flake.
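Concretely, the rebuild is a single command run from a clone of the repo on the target machine (the same command used in the disaster recovery procedure in section 5.1):

```shell
nixos-rebuild switch --flake .#app-server
```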
#### Secrets & Environment Variables

Secrets are stored in two places:

| Secret | Primary | Backup |
|---|---|---|
| Database URL | `/etc/rr-bizops/.env` | 1Password vault |
| Redis password | `/etc/rr-bizops/.env` | 1Password vault |
| JWT / Cookie secrets | `/etc/rr-bizops/.env` | 1Password vault |
| Resend API key | `/etc/rr-bizops/.env` | 1Password vault |
| R2 credentials | `/etc/rr-bizops/.env` | 1Password vault |
| Deploy SSH key | GitHub Secrets | 1Password vault |
| GHCR token | GitHub Secrets | (GitHub-managed) |
| Medusa publishable key | GitHub Secrets | 1Password vault |
**Procedure:** When any secret is created or rotated, immediately update both the server `.env` file and the 1Password vault entry.
### 2.6 BTCPay Server (shops-btc-01)
The BTCPay/Bitcoin infrastructure runs on a separate NixOS server managed by nix-bitcoin. It has its own backup requirements:
| Component | What | Backup Method |
|---|---|---|
| BTCPay PostgreSQL | Invoices, payment data | Daily pg_dump (nix-bitcoin default) |
| CLN database | Lightning channels, state | nix-bitcoin automated backup |
| Bitcoin wallet | xpub-derived keys | Hardware wallet seed (metal backup, physical safe) |
| Lightning SCB | Static Channel Backup | Automated to encrypted file; copy off-server |
| BTCPay config | Server settings | Nix flake (declarative, in git) |
BTCPay backup is managed by the nix-bitcoin configuration and is outside the scope of this document. Refer to the nix-bitcoin documentation and the BTCPay runbook (docs/payments/btcpay-runbook.md).
### 2.7 Cloudflare DNS

Export the DNS zone file quarterly for disaster recovery in case of Cloudflare account issues:

```shell
# Export DNS records via the Cloudflare API
curl -X GET "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/export" \
  -H "Authorization: Bearer {api_token}" \
  > dns-backup-$(date +%Y%m%d).txt
```
Store the export in 1Password as an attachment.
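For the restore direction, the Cloudflare API also exposes a DNS records import endpoint that accepts a zone file. A sketch, assuming the v4 import endpoint and its `file` form field (verify both against the current Cloudflare API reference before relying on this):

```shell
# Re-import a previously exported zone file (sketch; confirm the endpoint
# and form field name against current Cloudflare API docs)
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/import" \
  -H "Authorization: Bearer {api_token}" \
  --form "file=@dns-backup-20260301.txt"
```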
### 2.8 Email Archive (Zoho Mail)

Take a quarterly IMAP backup of business email.
Store the archive on a local encrypted drive. This protects against Zoho account issues and provides a local searchable archive.
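A minimal `~/.offlineimaprc` sketch for the quarterly pull, assuming offlineimap (already named in section 1.3) and Zoho's standard IMAP host; the account name, mailbox user, and local path are illustrative:

```ini
[general]
accounts = zoho

[Account zoho]
localrepository = zoho-local
remoterepository = zoho-remote

[Repository zoho-local]
type = Maildir
localfolders = ~/mail/zoho-archive

[Repository zoho-remote]
type = IMAP
remotehost = imap.zoho.com
# remoteuser is illustrative; use the real mailbox address
remoteuser = ops@research-relay.com
ssl = yes
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
```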
## 3. Backup Infrastructure Summary

### R2 Backup Bucket Layout

```
rr-backups/                        # Dedicated backup bucket
├── pg/
│   ├── daily/                     # 30-day retention
│   │   ├── medusa-20260301_020000.sql.gz
│   │   ├── medusa-20260302_020000.sql.gz
│   │   └── ...
│   ├── weekly/                    # 90-day retention
│   │   ├── medusa-20260223_020000.sql.gz
│   │   └── ...
│   └── monthly/                   # 1-year retention
│       ├── medusa-20260201_020000.sql.gz
│       └── ...
├── files/                         # R2 files bucket mirror
│   ├── 2026-03-02/                # Weekly snapshot
│   │   ├── coa/
│   │   ├── sds/
│   │   └── products/
│   └── ...
└── misc/
    └── dns/                       # DNS zone file exports
```
### Environment File for Backup Jobs

Create `/etc/rr-bizops/backup.env`:

```shell
AWS_ACCESS_KEY_ID=<r2-backup-access-key>
AWS_SECRET_ACCESS_KEY=<r2-backup-secret-key>
R2_ENDPOINT=https://<account-id>.r2.cloudflarestorage.com
```

Use a separate R2 API token scoped to the `rr-backups` bucket only.
## 4. Backup Schedule Summary
| Time (UTC) | Day | Job | Target |
|---|---|---|---|
| 02:00 | Daily | PostgreSQL dump (local) | /var/backup/pg/ |
| 03:00 | Daily | PostgreSQL upload to R2 | rr-backups/pg/daily/ |
| 04:00 | Sunday | R2 files sync | rr-backups/files/ |
| 03:00 | Sunday | Weekly PG backup copy (within daily off-site job) | rr-backups/pg/weekly/ |
| 03:00 | 1st | Monthly PG backup copy (within daily off-site job) | rr-backups/pg/monthly/ |
| 05:00 | Sunday | Backup verification | Temp DB on app server |
## 5. Disaster Recovery Procedures

### 5.1 Scenario: App Server Total Loss
RTO target: 2 hours | RPO target: 24 hours (last daily backup)
**Steps to recover:**

1. Provision a new VPS (same provider or different)
2. Install NixOS from the minimal ISO
3. Clone the rr-bizops repo
4. Apply the NixOS configuration (installs PostgreSQL, Redis, Docker, Caddy, and all systemd services):

    ```shell
    nixos-rebuild switch --flake .#app-server
    ```

5. Copy `/etc/rr-bizops/.env` from the 1Password vault
6. Copy `/etc/rr-bizops/backup.env` from the 1Password vault
7. Restore PostgreSQL from the latest R2 backup:

    ```shell
    aws s3 cp s3://rr-backups/pg/daily/$(aws s3 ls s3://rr-backups/pg/daily/ \
      --endpoint-url $R2_ENDPOINT | sort | tail -1 | awk '{print $4}') \
      /tmp/restore.sql.gz --endpoint-url $R2_ENDPOINT
    gunzip -c /tmp/restore.sql.gz | psql -U medusa medusa
    ```

8. Pull the latest Docker images:

    ```shell
    docker pull ghcr.io/research-relay/rr-bizops/medusa:latest
    docker pull ghcr.io/research-relay/rr-bizops/storefront:latest
    ```

9. Start services:

    ```shell
    sudo systemctl start docker-medusa-server docker-medusa-worker docker-storefront
    ```

10. Update the DNS A record for api.research-relay.com to the new server IP
11. Verify:
    - Admin dashboard accessible at api.research-relay.com/app
    - Storefront loads at research-relay.com
    - Test a product page with COA links
    - Verify order history is intact
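Step 7 relies on the dump filenames embedding a sortable timestamp (`medusa-YYYYMMDD_HHMMSS.sql.gz`), so `sort | tail -1` simply picks the lexicographic maximum. The same selection in Python (filenames are illustrative):

```python
def latest_backup(keys: list[str]) -> str:
    """Newest dump: the embedded YYYYMMDD_HHMMSS timestamp makes
    lexicographic order equal chronological order."""
    return max(keys)

keys = ["medusa-20260228_020000.sql.gz",
        "medusa-20260301_020000.sql.gz",
        "medusa-20260302_020000.sql.gz"]
print(latest_backup(keys))  # → medusa-20260302_020000.sql.gz
```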
### 5.2 Scenario: Database Corruption
RTO target: 30 minutes | RPO target: 24 hours
1. Stop the Medusa server and worker:

    ```shell
    sudo systemctl stop docker-medusa-server docker-medusa-worker
    ```

2. Drop and recreate the database:

    ```shell
    psql -U postgres -c "DROP DATABASE medusa;"
    psql -U postgres -c "CREATE DATABASE medusa OWNER medusa;"
    ```

3. Restore from the latest local backup:

    ```shell
    LATEST=$(ls -t /var/backup/pg/*.sql.gz | head -1)
    gunzip -c "$LATEST" | psql -U medusa medusa
    ```

4. If local backups are also corrupted, restore from R2:

    ```shell
    aws s3 cp s3://rr-backups/pg/daily/<latest>.sql.gz /tmp/restore.sql.gz \
      --endpoint-url $R2_ENDPOINT
    gunzip -c /tmp/restore.sql.gz | psql -U medusa medusa
    ```

5. Restart services:

    ```shell
    sudo systemctl start docker-medusa-server docker-medusa-worker
    ```

6. Run Medusa migrations to catch up (if the backup predates the latest deploy):

    ```shell
    docker exec <medusa-container> npx medusa db:migrate
    ```

7. Verify that order counts, product counts, and the latest orders are present
### 5.3 Scenario: Accidental File Deletion (R2)
1. Identify the deleted file(s) by checking COA records in the database
2. Restore from the weekly R2 backup:

    ```shell
    aws s3 cp s3://rr-backups/files/<latest-date>/coa/<file-path> \
      s3://rr-bizops-files/coa/<file-path> \
      --endpoint-url $R2_ENDPOINT
    ```

3. Verify the file is accessible via cdn.research-relay.com
### 5.4 Scenario: Secret Compromise

1. Identify which secrets were compromised
2. Rotate affected credentials:
    - Database password: `ALTER USER medusa WITH PASSWORD '...'`
    - Redis password: update `app-server.nix`, rebuild NixOS
    - JWT/Cookie secrets: update `.env`, restart Medusa (invalidates all sessions)
    - R2 credentials: revoke in Cloudflare dashboard, create new token
    - API keys (Resend, etc.): revoke in provider dashboard, create new key
3. Update /etc/rr-bizops/.env with new values
4. Update 1Password vault entries
5. Restart affected services
6. Audit access logs for unauthorized activity during exposure window
### 5.5 Scenario: GitHub Account/Repo Loss
1. Source code is in local developer clones
2. Push to a new repository (GitHub, GitLab, or self-hosted)
3. Update CI/CD workflow and deploy secrets
4. Docker images can be rebuilt from source
5. Resume normal operations
## 6. Monitoring & Alerting

### Backup Job Monitoring
Add systemd service status checks to catch silent failures:
```nix
# Add to app-server.nix — simple health check that alerts on backup failure
systemd.services.backup-health-check = {
  description = "Check backup job health";
  serviceConfig = {
    Type = "oneshot";
    ExecStart = pkgs.writeShellScript "backup-health-check" ''
      # Check that a backup was created in the last 26 hours (1560 minutes)
      LATEST=$(find /var/backup/pg -name "*.sql.gz" -mmin -1560 | head -1)
      if [ -z "$LATEST" ]; then
        echo "ALERT: No PostgreSQL backup found in the last 26 hours!"
        # Send alert via Resend API or write to a monitored log
        exit 1
      fi

      # Check the backup file is not empty (minimum 1 KB); try GNU stat
      # first (NixOS is Linux), with BSD stat as a fallback
      SIZE=$(stat -c%s "$LATEST" 2>/dev/null || stat -f%z "$LATEST")
      if [ "$SIZE" -lt 1024 ]; then
        echo "ALERT: Latest backup file is suspiciously small ($SIZE bytes)"
        exit 1
      fi
      echo "Backup health check passed: $LATEST ($SIZE bytes)"
    '';
  };
};

systemd.timers.backup-health-check = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "*-*-* 06:00:00";
    Persistent = true;
  };
};
```
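The same two checks, freshness within 26 hours (one daily cycle plus two hours of slack) and a minimum size, can be exercised without systemd. A self-contained Python sketch (the function name and paths are illustrative):

```python
import os
import tempfile
import time

def backup_is_healthy(path: str, max_age_hours: float = 26.0,
                      min_bytes: int = 1024) -> bool:
    """Mirror the shell checks above: the file must exist, be newer
    than max_age_hours, and be at least min_bytes in size."""
    if not os.path.exists(path):
        return False
    age_hours = (time.time() - os.path.getmtime(path)) / 3600
    return age_hours <= max_age_hours and os.path.getsize(path) >= min_bytes

# A freshly written 2 KB file passes; a missing path fails
with tempfile.NamedTemporaryFile(suffix=".sql.gz", delete=False) as f:
    f.write(b"x" * 2048)
print(backup_is_healthy(f.name))                    # → True
print(backup_is_healthy("/var/backup/pg/missing"))  # → False
```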
## 7. Implementation Checklist

- [ ] **R2 Backup Bucket**
    - [ ] Create R2 bucket `rr-backups` in Cloudflare dashboard
    - [ ] Create dedicated R2 API token for backup jobs (scoped to `rr-backups`)
    - [ ] Set lifecycle rules: daily/30d, weekly/90d, monthly/365d
    - [ ] Create `/etc/rr-bizops/backup.env` on server
- [ ] **PostgreSQL Off-site Backups**
    - [ ] Add `pg-backup-offsite` systemd service to `app-server.nix`
    - [ ] Add `pg-backup-offsite` timer (03:00 UTC daily)
    - [ ] Test: run manually, verify file appears in R2
    - [ ] Verify lifecycle rules clean up old backups
- [ ] **Backup Verification**
    - [ ] Add `pg-backup-verify` systemd service to `app-server.nix`
    - [ ] Add weekly timer (Sunday 05:00 UTC)
    - [ ] Test: run manually, verify test restore succeeds
- [ ] **R2 Files Backup**
    - [ ] Add `r2-files-backup` systemd service to `app-server.nix`
    - [ ] Add weekly timer (Sunday 04:00 UTC)
    - [ ] Test: run manually, verify sync to backup bucket
- [ ] **Monitoring**
    - [ ] Add `backup-health-check` service and timer
    - [ ] Verify alerting works (test with missing backup)
- [ ] **Secrets Management**
    - [ ] Audit: all secrets in `/etc/rr-bizops/.env` are also in 1Password
    - [ ] Document secret rotation procedure in 1Password vault notes
- [ ] **External Service Exports**
    - [ ] Set quarterly calendar reminder for DNS zone export
    - [ ] Set quarterly calendar reminder for Zoho Mail IMAP backup
    - [ ] Set monthly calendar reminder for Mercury statement download
- [ ] **Documentation**
    - [ ] Test full disaster recovery procedure (Scenario 5.1) on a throwaway VPS
    - [ ] Test database restore procedure (Scenario 5.2)
    - [ ] Record actual RTO/RPO achieved during test
## 8. Cost Estimate
| Item | Monthly Cost | Notes |
|---|---|---|
| R2 storage (backups) | ~$0.15 | ~10 GB estimated (compressed PG dumps + file copies) |
| R2 operations | ~$0.05 | Class A/B operations for sync jobs |
| Total | ~$0.20/mo | Negligible addition to infrastructure costs |
R2's zero egress fees mean restoring from backup costs nothing regardless of data size.
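As a sanity check on the storage line, assuming R2's published rate of $0.015 per GB-month (verify against current Cloudflare pricing):

```python
storage_gb = 10               # compressed PG dumps plus weekly file snapshots
price_per_gb_month = 0.015    # assumed R2 storage rate in USD; verify current pricing
monthly_cost = storage_gb * price_per_gb_month
print(f"${monthly_cost:.2f}/mo")  # → $0.15/mo
```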