
BTCPay Server Operational Runbook

Overview

This runbook covers day-to-day operations, maintenance schedules, monitoring, and incident response for the Research Relay BTCPay Server + Lightning node deployment on NixOS.


1. Daily Checks (~5 minutes)

Quick Health Check

Run from the server or via SSH:

# Check all services are running
systemctl status bitcoind clightning btcpayserver nginx

# Bitcoin node sync status
bitcoin-cli getblockchaininfo | jq '{blocks, headers, verificationprogress, pruned}'

# Lightning node status
lightning-cli getinfo | jq '{id, alias, num_peers, num_active_channels, blockheight}'

# Channel balances (local = your funds, remote = inbound capacity)
lightning-cli listfunds | jq '{
  onchain_sats: ([.outputs[].amount_msat] | (add // 0) / 1000),
  channels: [.channels[] | {
    peer_id: .peer_id[:16],
    local_msat: .our_amount_msat,
    capacity_msat: .amount_msat,
    state: .state
  }]
}'

# Pending invoices (if any are stuck)
lightning-cli listinvoices | jq '[.invoices[] | select(.status == "unpaid")] | length'

What to Look For

Check            | Healthy               | Action Needed
-----------------|-----------------------|----------------------------------------
bitcoind status  | active (running)      | Restart: systemctl restart bitcoind
Block height     | Matches mempool.space | If >2 blocks behind, check disk/network
CLN peers        | >0 active peers       | Check network connectivity
CLN channels     | All CHANNELD_NORMAL   | Investigate any AWAITING_* or ONCHAIN
Inbound capacity | >500K sats total      | Add inbound (see Lightning setup guide)
Disk usage       | <80% of partition     | Prune more aggressively or expand disk
BTCPay Server    | Accessible via HTTPS  | Check nginx, TLS cert, port 443
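
The "Block height" row can be checked from the shell instead of by eye. The following is a minimal sketch, assuming the public mempool.space endpoint `/api/blocks/tip/height` is reachable from the server and that `bitcoin-cli` and `curl` are on the PATH. The function names are illustrative and not part of the deployment; the 2-block threshold is taken from the table above.

```shell
# Hypothetical helpers; names are illustrative, not part of the deployment.

# Print how many blocks the local node is behind a reference tip (never negative).
blocks_behind() {
  local_height=$1
  tip_height=$2
  lag=$((tip_height - local_height))
  [ "$lag" -lt 0 ] && lag=0
  echo "$lag"
}

# Fetch both heights and report the lag.
# Returns 0 if within 2 blocks, 1 if further behind, 2 if a lookup failed.
check_block_lag() {
  local_height=$(bitcoin-cli getblockcount) || return 2
  tip_height=$(curl -sf https://mempool.space/api/blocks/tip/height) || return 2
  lag=$(blocks_behind "$local_height" "$tip_height")
  echo "local=$local_height tip=$tip_height behind=$lag"
  [ "$lag" -le 2 ]
}
```

Running `check_block_lag || echo "node is behind or lookup failed"` on the server gives a one-line answer for this row.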

Automated Daily Health Script

Add to NixOS configuration as a systemd timer:

systemd.services.btcpay-health-check = {
  description = "BTCPay stack health check";
  serviceConfig.Type = "oneshot";
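  # mailutils provides `mail`. lightning-cli must also be on this path; with
  # nix-bitcoin, config.services.clightning.cli is the usual wrapper (verify
  # for your setup). The service user also needs access to CLN's RPC socket.
  path = with pkgs; [ coreutils curl jq bash mailutils ];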
  script = ''
    #!/usr/bin/env bash
    set -euo pipefail

    ALERT_EMAIL="admin@research-relay.com"
    HEALTHY=true
    REPORT=""

    # Check bitcoind
    if ! systemctl is-active --quiet bitcoind; then
      REPORT+="CRITICAL: bitcoind is not running\n"
      HEALTHY=false
    fi

    # Check CLN
    if ! systemctl is-active --quiet clightning; then
      REPORT+="CRITICAL: clightning is not running\n"
      HEALTHY=false
    fi

    # Check BTCPay
    if ! systemctl is-active --quiet btcpayserver; then
      REPORT+="CRITICAL: btcpayserver is not running\n"
      HEALTHY=false
    fi

    # Check HTTPS endpoint
    # curl already prints 000 when the connection fails, so only suppress the exit code
    HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://btcpay.research-relay.com || true)
    if [ "$HTTP_STATUS" != "200" ] && [ "$HTTP_STATUS" != "302" ]; then
      REPORT+="WARNING: BTCPay HTTPS returned status $HTTP_STATUS\n"
      HEALTHY=false
    fi

    # Check disk usage
    DISK_PCT=$(df /var/lib/bitcoind --output=pcent | tail -1 | tr -d ' %')
    if [ "$DISK_PCT" -gt 80 ]; then
      REPORT+="WARNING: Disk usage at $DISK_PCT%\n"
      HEALTHY=false
    fi

    # Check Lightning channel count
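    # Fall back to 0 if CLN is unreachable so the alert is still sent (set -e would otherwise abort here)
    ACTIVE_CHANNELS=$(lightning-cli getinfo 2>/dev/null | jq '.num_active_channels // 0' || echo 0)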
    if [ "$ACTIVE_CHANNELS" -eq 0 ]; then
      REPORT+="WARNING: No active Lightning channels\n"
      HEALTHY=false
    fi

    # Check inbound liquidity
    # Fall back to 0 if CLN is unreachable so the script reaches the alert step
    INBOUND=$(lightning-cli listfunds 2>/dev/null | jq '
      [.channels[] | (.amount_msat - .our_amount_msat)] | add // 0
    ' || echo 0)
    INBOUND_SATS=$((INBOUND / 1000))
    if [ "$INBOUND_SATS" -lt 200000 ]; then
      REPORT+="WARNING: Low inbound liquidity: $INBOUND_SATS sats\n"
      HEALTHY=false
    fi

    if [ "$HEALTHY" = false ]; then
      echo -e "BTCPay Health Check FAILED:\n$REPORT" | \
        mail -s "BTCPay Health Alert" "$ALERT_EMAIL"
      echo -e "UNHEALTHY:\n$REPORT"
      exit 1
    fi

    echo "All checks passed. Inbound: $INBOUND_SATS sats, Disk: $DISK_PCT%, Channels: $ACTIVE_CHANNELS"
  '';
};

systemd.timers.btcpay-health-check = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "*-*-* 08:00:00";  # Run daily at 8 AM
    Persistent = true;
  };
};

2. Weekly Maintenance (~15 minutes)

Channel Health Review

# List all channels with balance info
lightning-cli listpeerchannels | jq '[.channels[] | {
  peer: .peer_id[:16],
  state: .state,
  local_msat: .to_us_msat,
  capacity_msat: .total_msat,
  local_pct: ((.to_us_msat // 0) * 100 / (.total_msat // 1)),
  htlcs_pending: (.htlcs | length)
}]'

# Check for channels needing attention
# Channels >90% on your side = no inbound capacity remaining
# Channels <10% on your side = nearly drained outbound
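
The two thresholds above can be applied mechanically. This is a minimal sketch; the function name and the output labels are illustrative assumptions, and the inputs are the `to_us_msat` and `total_msat` values from `listpeerchannels`.

```shell
# Hypothetical helper: classify a channel by the share of capacity on the local side.
# Usage: channel_balance_status <local_msat> <total_msat>
channel_balance_status() {
  local_msat=$1
  total_msat=$2
  if [ "$total_msat" -le 0 ]; then
    echo "unknown"
    return 0
  fi
  pct=$((local_msat * 100 / total_msat))   # integer percent, rounded down
  if [ "$pct" -gt 90 ]; then
    echo "no-inbound"    # >90% local: little or no inbound capacity left
  elif [ "$pct" -lt 10 ]; then
    echo "drained"       # <10% local: outbound nearly exhausted
  else
    echo "ok"
  fi
}
```

For example, `channel_balance_status 950000000 1000000000` prints `no-inbound`. The values can be fed from `lightning-cli listpeerchannels | jq -r '.channels[] | "\(.to_us_msat) \(.total_msat)"'` in a `while read` loop.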

Rebalancing (if CLBOSS is not handling it)

# Check if any channels are imbalanced
# If channel A is 95% local and channel B is 5% local,
# do a circular rebalance. Both commands below come from the community
# rebalance plugin, not core CLN; the IDs are short channel IDs.
lightning-cli rebalance <outgoing_scid> <incoming_scid> <amount_msat>

# Or let the plugin pick channels and amounts automatically
lightning-cli rebalanceall
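
To pick an amount for a manual circular rebalance, one option is to move only as much as brings both channels toward an even split. The sketch below assumes a 50% local target, which is an illustrative choice and not something this runbook prescribes, and it ignores routing fees. The function name is hypothetical.

```shell
# Hypothetical helper: amount (msat) to move out of channel A and into channel B
# so that neither channel overshoots a 50% local balance.
# Usage: rebalance_amount_msat <a_local> <a_total> <b_local> <b_total>
rebalance_amount_msat() {
  a_local=$1; a_total=$2; b_local=$3; b_total=$4
  excess=$((a_local - a_total / 2))   # how far A sits above 50% local
  room=$((b_total / 2 - b_local))     # how far B sits below 50% local
  [ "$excess" -lt 0 ] && excess=0
  [ "$room" -lt 0 ] && room=0
  # Move the smaller of the two so neither channel overshoots the target
  if [ "$excess" -lt "$room" ]; then
    echo "$excess"
  else
    echo "$room"
  fi
}
```

With the 95% / 5% example above on two 1,000,000 msat channels, `rebalance_amount_msat 950000 1000000 50000 1000000` prints `450000`.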

Software Updates Check

# Check for nix-bitcoin updates
cd /path/to/nixos-config
git fetch upstream
git log HEAD..upstream/master --oneline

# Check BTCPay release notes
# https://github.com/btcpayserver/btcpayserver/releases

# Apply updates via NixOS rebuild (test first)
# nixos-rebuild test --flake .
# nixos-rebuild switch --flake .

Backup Verification

# Verify latest backup exists and is recent
ls -la /var/backups/btcpay/

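# Test backup decryption (lists the archive contents without restoring)
# GnuPG 2.1+ needs --pinentry-mode loopback to honor --passphrase-file
gpg --decrypt --batch --pinentry-mode loopback \
  --passphrase-file /run/secrets/backup-passphrase \
  /var/backups/btcpay/backup-$(date +%Y%m%d).tar.gz.gpg | tar tzf -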

# Verify CLN hsm_secret backup matches live
sha256sum /var/lib/clightning/bitcoin/hsm_secret
# Compare with stored backup hash

Review BTCPay Notifications

Log into the BTCPay Server dashboard and check:

  • Notification bell (top right) for any pending notifications
  • Invoice list for expired or invalid invoices
  • Wallet balance vs expected


3. Monthly Tasks (~30 minutes)

Accounting Reconciliation

# Export BTCPay payment data
# Dashboard > Reporting > Payments > Export CSV

# Cross-reference with Medusa orders
# For each BTCPay settled invoice:
#   1. Match to Medusa order via orderId in metadata
#   2. Verify USD amount matches order total
#   3. Record BTC amount and FMV at settlement time
#   4. Flag any discrepancies

# Import to Koinly
# Upload the BTCPay CSV as custom import
# Verify Koinly correctly calculates FMV for each payment
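
Step 2 of the cross-reference (USD amount matches order total) is a decimal comparison, which plain shell arithmetic cannot do. The sketch below delegates it to `awk`. The function name and the default one-cent tolerance are assumptions for illustration, not values taken from BTCPay or Medusa.

```shell
# Hypothetical helper: succeed when two decimal amounts differ by no more than
# a tolerance (default 0.01, i.e. one cent).
# Usage: amounts_match <invoice_amount> <order_total> [tolerance]
amounts_match() {
  awk -v a="$1" -v b="$2" -v tol="${3:-0.01}" 'BEGIN {
    d = a - b
    if (d < 0) d = -d
    # Small epsilon absorbs floating-point rounding at the boundary
    if (d <= tol + 1e-9) exit 0
    exit 1
  }'
}
```

For example, `amounts_match 49.99 49.99 || echo "discrepancy"` stays silent, while `amounts_match 49.99 51.00 || echo "discrepancy"` flags the pair for review.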

Cold Storage Sweep

# Check CLN's on-chain wallet balance in sats
# (the BTCPay store wallet balance is shown under Wallets in the dashboard)
lightning-cli listfunds | jq '[.outputs[].amount_msat] | (add // 0) / 1000'

# If balance > 0.01 BTC (1,000,000 sats), sweep to cold storage:
# 1. In BTCPay dashboard: Wallets > Send
# 2. Enter cold storage address from hardware wallet
# 3. Set fee rate (check mempool.space for current rates)
# 4. Create PSBT
# 5. Sign with hardware wallet
# 6. Broadcast

# For Lightning funds:
# CLBOSS handles submarine swaps to on-chain automatically
# Alternatively, manual swap via Boltz:
# https://boltz.exchange/swap (Lightning → On-chain)
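
The sweep threshold above is stated in both BTC and sats. A small sketch of that conversion and of the threshold check, using integer arithmetic only, follows. The function and variable names are illustrative.

```shell
# Threshold from the sweep rule above: 0.01 BTC = 1,000,000 sats
SWEEP_THRESHOLD_SATS=1000000

# Hypothetical helper: format a satoshi amount as BTC with 8 decimal places.
sats_to_btc() {
  printf '%d.%08d\n' $(($1 / 100000000)) $(($1 % 100000000))
}

# Hypothetical helper: succeed when the balance exceeds the sweep threshold.
needs_sweep() {
  [ "$1" -gt "$SWEEP_THRESHOLD_SATS" ]
}
```

`sats_to_btc 1000000` prints `0.01000000`, and `needs_sweep 1500000 && echo "sweep to cold storage"` prints the reminder.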

Security Audit

  • Review BTCPay Server access logs for unusual activity
  • Verify no new API keys were created without your knowledge
  • Check server SSH authorized_keys
  • Verify TLS certificate is valid and auto-renewing
  • Review NixOS configuration for any drift from expected state
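
The TLS certificate item can be checked from any machine with `openssl`. This is a minimal sketch that relies on `openssl x509 -checkend`, which exits non-zero when the certificate expires within the given number of seconds. The function names are illustrative, and the 7-day default mirrors the alert threshold used elsewhere in this runbook.

```shell
# Hypothetical helper: convert days to seconds for openssl's -checkend option.
days_to_seconds() {
  echo $(($1 * 86400))
}

# Hypothetical helper: succeed if the certificate served on host:443 is still
# valid N days from now (default 7).
# Usage: cert_valid_for_days <host> [days]
cert_valid_for_days() {
  host=$1
  days=${2:-7}
  echo | openssl s_client -connect "$host:443" -servername "$host" 2>/dev/null \
    | openssl x509 -noout -checkend "$(days_to_seconds "$days")" >/dev/null
}
```

For example: `cert_valid_for_days btcpay.research-relay.com 7 || echo "certificate expires within 7 days or could not be read"`.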

Capacity Planning

# Review payment volume trends
# BTCPay Dashboard > Reporting > Payments

# If Lightning volume is growing:
# - Open additional channels
# - Increase channel sizes
# - Add more inbound liquidity

# If disk usage is growing:
df -h /var/lib/bitcoind
# Adjust prune value if needed

4. Incident Response

Incident: Node Goes Down

Symptoms: BTCPay Server unreachable, Lightning payments failing, no new blocks

Diagnosis:

# SSH into server
ssh shops-btc-01

# Check service status
systemctl status bitcoind clightning btcpayserver nginx

# Check system resources
top -bn1 | head -20
df -h
free -m

# Check recent logs
journalctl -u bitcoind --since "1 hour ago" --no-pager | tail -50
journalctl -u clightning --since "1 hour ago" --no-pager | tail -50
journalctl -u btcpayserver --since "1 hour ago" --no-pager | tail -50

Recovery:

# Restart the failed service. If restarting several, go in dependency order:
systemctl restart bitcoind
systemctl restart clightning
systemctl restart btcpayserver

# If system-wide issue, full reboot
systemctl reboot

# After reboot, verify services started
systemctl status bitcoind clightning btcpayserver

# Check Bitcoin sync status (may take minutes to catch up)
bitcoin-cli getblockchaininfo | jq '.verificationprogress'

# Check Lightning channels reconnected
lightning-cli getinfo | jq '.num_active_channels'

Impact: During downtime, Lightning payments fail immediately. On-chain payments may still be detected once the node syncs. BTCPay invoices created during downtime will show errors.

Mitigation: On-chain payments via xpub still work even if BTCPay is temporarily down (the addresses are deterministic). Customers can be shown a fallback on-chain address.


Incident: Lightning Channel Force-Closed

Symptoms: Channel leaves CHANNELD_NORMAL (shows AWAITING_UNILATERAL or ONCHAIN in listpeerchannels) and later disappears from the list; funds locked in timelock

Diagnosis:

# Check for closed channels
lightning-cli listclosedchannels | jq '.closedchannels[] | {
  peer_id: .peer_id[:16],
  capacity_msat: .total_msat,
  close_cause: .close_cause,
  funding_txid: .funding_txid
}'

# Check on-chain transactions for timelocked outputs
lightning-cli listfunds | jq '.outputs[] | select(.status == "unconfirmed")'

Recovery:

# Wait for timelock to expire (check locktime)
# Funds return to your on-chain wallet automatically after timelock

# Open a replacement channel if needed
lightning-cli connect <peer_id>@<host>:<port>
lightning-cli fundchannel <peer_id> <amount_sat>

Prevention:

  • Maintain high uptime to avoid peer-initiated force-closes
  • Avoid force-closing channels yourself unless absolutely necessary (a cooperative close is much cheaper and faster)
  • CLBOSS avoids problematic peers automatically


Incident: Payment Stuck (HTLC pending)

Symptoms: Customer says they paid but invoice still shows "processing" or payment is in limbo

Diagnosis:

# Check pending HTLCs
lightning-cli listpeerchannels | jq '[.channels[] | select(.htlcs | length > 0) | {
  peer: .peer_id[:16],
  htlcs: .htlcs
}]'

# Check specific invoice
lightning-cli listinvoices | jq '.invoices[] | select(.label == "<invoice_label>")'

# For on-chain payments, check mempool
bitcoin-cli getrawtransaction <txid> true | jq '.confirmations'

Recovery:

  • Lightning HTLC stuck: Usually resolves within the HTLC timeout (typically 40-144 blocks). If the HTLC is stuck for >24 hours, the channel may need to be force-closed as a last resort.
  • On-chain unconfirmed: Wait for confirmation. If the fee was too low, the sender needs to use RBF (Replace-By-Fee) or CPFP (Child-Pays-For-Parent).
  • BTCPay shows "Processing" indefinitely: Check if the configured number of confirmations has been reached. Default is 1 for on-chain. Lightning should settle instantly.
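
To relate the HTLC timeout range to wall-clock time, the block counts can be converted using the 10-minute average block interval. This is a rough sketch, since actual block times vary; the function names are illustrative.

```shell
# Hypothetical helper: approximate minutes for N blocks at a 10-minute average.
blocks_to_minutes() {
  echo $(($1 * 10))
}

# Hypothetical helper: format minutes as "<hours>h <minutes>m".
format_duration() {
  echo "$(($1 / 60))h $(($1 % 60))m"
}
```

`format_duration "$(blocks_to_minutes 40)"` prints `6h 40m` and `format_duration "$(blocks_to_minutes 144)"` prints `24h 0m`, so the 40-144 block range is roughly 7 to 24 hours.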

Customer communication:

  • On-chain: "Your payment has been received and is awaiting blockchain confirmation. This typically takes 10-60 minutes."
  • Lightning stuck: "There was an issue routing your Lightning payment. Please try again or use the on-chain option."


Incident: Wallet Compromise Suspected

Symptoms: Unexpected outgoing transactions, unauthorized channel opens/closes, unknown API keys

Immediate Actions:

# 1. IMMEDIATELY: Move funds to cold storage
# If hot wallet is enabled, create a PSBT to sweep all funds NOW

# 2. Revoke all API keys
# BTCPay Dashboard > Account > API Keys > Delete all

# 3. Close all Lightning channels cooperatively (if possible)
lightning-cli close <channel_id>
# Repeat for each channel

# 4. Disable BTCPay Server public access
systemctl stop nginx
# Or: iptables -A INPUT -p tcp --dport 443 -j DROP

# 5. Rotate all secrets
# - Change BTCPay admin password
# - Generate new API keys
# - Rotate webhook secrets
# - Change SSH keys
# - Check for unauthorized SSH authorized_keys

# 6. Investigate
journalctl --since "7 days ago" -u btcpayserver | grep -i "auth\|login\|api"
# Check access logs
journalctl --since "7 days ago" -u nginx | grep -v "200\|301\|302"

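# 7. Rebuild server from NixOS config if in doubt
# NixOS is declarative, so the system configuration can be reproduced from git.
# Prefer a fresh install on a wiped disk: rebuilding in place does not remove
# attacker changes to mutable state (/var, /home, SSH keys).
# nixos-rebuild switch --flake .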

After recovery:

  • Generate new hsm_secret (new Lightning identity)
  • Generate new on-chain wallet (new xpub from hardware wallet)
  • Open fresh channels
  • Update all integration credentials (Medusa .env)


Incident: Disk Full

Symptoms: Services crash, database errors, Bitcoin node stops syncing

Diagnosis:

df -h
du -sh /var/lib/bitcoind /var/lib/clightning /var/lib/btcpayserver

# Check Bitcoin data size
bitcoin-cli getblockchaininfo | jq '.size_on_disk'

Recovery:

# 1. Free immediate space
journalctl --vacuum-size=100M

# 2. If Bitcoin data is too large, increase pruning
# In NixOS config:
# services.bitcoind.prune = 50000;  # Reduce to 50GB
# nixos-rebuild switch --flake .

# 3. Bitcoin will prune on next restart
systemctl restart bitcoind

# 4. Verify services recover
systemctl status bitcoind clightning btcpayserver

Prevention: Monitor disk usage in daily health check. Alert at 80% capacity.


5. Monitoring and Alerting

BTCPay Server lacks built-in comprehensive monitoring. Build external monitoring:

Layer 1: Uptime Monitoring (External)

Use an external service to monitor HTTPS availability:

  • Uptime Kuma (self-hosted) or Better Uptime (SaaS)
  • Monitor: https://btcpay.research-relay.com (expect 200 or 302)
  • Alert via email/SMS if down for >5 minutes

Layer 2: System Metrics (On-Server)

# Optional: Prometheus + node_exporter for system metrics
services.prometheus = {
  enable = true;
  exporters.node = {
    enable = true;
    enabledCollectors = [ "systemd" "diskstats" "filesystem" "meminfo" "netdev" ];
  };
};

# CLN Prometheus plugin (if available in nix-bitcoin)
# services.clightning.plugins.prometheus.enable = true;

Layer 3: Application Alerts (Systemd + Email)

The daily health check script (Section 1) covers application-level monitoring. For real-time alerts, add systemd failure notifications:

# Alert on service failure
systemd.services.bitcoind.unitConfig.OnFailure = "notify-failure@%n.service";
systemd.services.clightning.unitConfig.OnFailure = "notify-failure@%n.service";
systemd.services.btcpayserver.unitConfig.OnFailure = "notify-failure@%n.service";

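systemd.services."notify-failure@" = {
  description = "Send failure notification for %i";
  serviceConfig.Type = "oneshot";
  path = [ pkgs.mailutils ];  # provides `mail`
  # systemd expands specifiers only in the unit file, not inside the generated
  # script, so pass the unit name (%i) and hostname (%H) as arguments.
  scriptArgs = "%i %H";
  script = ''
    echo "Service $1 failed on $2 at $(date)" | \
      mail -s "CRITICAL: $1 failed" admin@research-relay.com
  '';
};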

What to Alert On

Alert                        | Severity         | Threshold
-----------------------------|------------------|------------------
BTCPay HTTPS down            | Critical         | >5 min
bitcoind stopped             | Critical         | Immediate
clightning stopped           | Critical         | Immediate
btcpayserver stopped         | Critical         | Immediate
Disk usage                   | Warning/Critical | >80% / >90%
No active Lightning channels | Warning          | 0 channels
Low inbound liquidity        | Warning          | <200K sats
Bitcoin node behind          | Warning          | >10 blocks behind
TLS cert expiring            | Warning          | <7 days
Force-close detected         | Warning          | Any occurrence
High memory usage            | Warning          | >90%
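
The two-level disk threshold in the table can be expressed as a small classifier. This is a sketch with an illustrative function name; the input is an integer percentage such as the one `df --output=pcent` reports.

```shell
# Hypothetical helper: map a disk usage percentage to the severity levels in
# the alert table (>80% warning, >90% critical).
disk_severity() {
  pct=$1
  if [ "$pct" -gt 90 ]; then
    echo "critical"
  elif [ "$pct" -gt 80 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

One way to use it: `disk_severity "$(df /var/lib/bitcoind --output=pcent | tail -1 | tr -d ' %')"`.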

Log Locations

Service       | Log Command
--------------|----------------------------------------------------
bitcoind      | journalctl -u bitcoind -f
CLN           | journalctl -u clightning -f
BTCPay Server | journalctl -u btcpayserver -f
nginx         | journalctl -u nginx -f
System        | journalctl -b
CLBOSS        | journalctl -u clightning -f (CLBOSS logs through CLN)

6. Common Operations Reference

Open a New Lightning Channel

# 1. Find a peer (check 1ML.com or Amboss.space)
# 2. Connect
lightning-cli connect <pubkey>@<host>:<port>

# 3. Fund channel
lightning-cli fundchannel <pubkey> <amount_sat>
# Example: lightning-cli fundchannel 03abcdef... 1000000

# 4. Wait for funding transaction to confirm (typically 3 confirmations; the accepting peer sets the depth)
# 5. Channel enters CHANNELD_NORMAL state

Close a Lightning Channel

# Cooperative close (preferred — cheaper, faster)
lightning-cli close <channel_id>

# Unilateral close (if peer is offline — last resort)
lightning-cli close <channel_id> 1
# Funds locked for timelock period

Send On-Chain Payment from BTCPay

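# Via BTCPay Dashboard: Wallets > Send
# Or via the Greenfield API. The path below is the BTCPay 1.x form; 2.x uses
# .../payment-methods/BTC-CHAIN/wallet/transactions. Confirm the path and body
# fields against the API reference for your version. This endpoint generally
# needs a hot wallet to sign; for a hardware-wallet store, use the dashboard
# PSBT flow instead.
curl -X POST "https://btcpay.research-relay.com/api/v1/stores/{storeId}/payment-methods/onchain/BTC/wallet/transactions" \
  -H "Authorization: token $BTCPAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "destinations": [{"destination": "bc1q...", "amount": "0.001"}],
    "feeRate": 10,
    "proceedWithBroadcast": false
  }'
# With proceedWithBroadcast=false the transaction is returned instead of broadcast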

Create a Manual Invoice

curl -X POST "https://btcpay.research-relay.com/api/v1/stores/{storeId}/invoices" \
  -H "Authorization: token $BTCPAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "amount": "49.99",
    "currency": "USD",
    "metadata": {"orderId": "manual-001", "buyer": "test@example.com"}
  }'

Check Invoice Status

curl "https://btcpay.research-relay.com/api/v1/stores/{storeId}/invoices/{invoiceId}" \
  -H "Authorization: token $BTCPAY_API_KEY" | jq '{status, amount, currency, metadata}'

Rotate API Keys

  1. BTCPay Dashboard > Account > Manage Account > API Keys
  2. Create new key with minimum required permissions:
       • btcpay.store.cancreateinvoice
       • btcpay.store.canviewinvoices
       • btcpay.store.canmodifystoresettings (for webhook management)
  3. Update Medusa .env with new key
  4. Restart Medusa
  5. Delete old API key

NixOS Configuration Update

# Edit NixOS config
cd /path/to/nixos-config
vim flake.nix  # or relevant module

# Build and activate without making it the boot default (a reboot reverts it)
nixos-rebuild test --flake .

# If test passes, switch
nixos-rebuild switch --flake .

# Verify services
systemctl status bitcoind clightning btcpayserver

7. Disaster Recovery

Scenario: Complete Server Loss

Recovery from NixOS config + backups:

  1. Provision new server (or reinstall shops-btc-01)
  2. Deploy NixOS configuration from git:
    nixos-rebuild switch --flake .
    
  3. Restore CLN hsm_secret:
    cp /backup/hsm_secret /var/lib/clightning/bitcoin/hsm_secret
    chown clightning:clightning /var/lib/clightning/bitcoin/hsm_secret
    chmod 600 /var/lib/clightning/bitcoin/hsm_secret
    
  4. Restore BTCPay PostgreSQL database:
    gpg --decrypt backup-latest.tar.gz.gpg | tar xzf -
    psql btcpayserver < btcpay-db.sql
    
  5. Wait for Bitcoin node to sync (pruned: ~24-48 hours, full: ~3-7 days)
  6. CLN will detect the hsm_secret and recover on-chain funds
  7. Old Lightning channels are lost — open new channels
  8. Restore Medusa integration settings (API keys, webhook secrets)

Recovery time estimate: 1-3 days (dominated by Bitcoin sync time)

Scenario: Corrupted CLN Database

If CLN database is corrupted but hsm_secret is intact:

  1. Stop CLN: systemctl stop clightning
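  2. Move the corrupted database aside (keep it for later repair or forensics): mv /var/lib/clightning/bitcoin/lightningd.sqlite3 /var/lib/clightning/bitcoin/lightningd.sqlite3.corrupt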
  3. Start CLN: systemctl start clightning
  4. CLN recreates database and recovers on-chain funds from hsm_secret
  5. Lightning channel funds may be lost or recovered via DLP (Data Loss Protection) if peers cooperate
  6. Open new channels

Scenario: Bitcoin Node Corruption

  1. Stop bitcoind: systemctl stop bitcoind
  2. Remove block data and chainstate: rm -rf /var/lib/bitcoind/blocks /var/lib/bitcoind/chainstate
  3. Start bitcoind: systemctl start bitcoind
  4. Node re-syncs from scratch (pruned: ~24-48 hours)
  5. CLN and BTCPay reconnect automatically after sync completes