BTCPay Server Operational Runbook
Overview
This runbook covers day-to-day operations, maintenance schedules, monitoring, and incident response for the Research Relay BTCPay Server + Lightning node deployment on NixOS.
1. Daily Checks (~5 minutes)
Quick Health Check
Run from the server or via SSH:
# Check all services are running
systemctl status bitcoind clightning btcpayserver nginx
# Bitcoin node sync status
bitcoin-cli getblockchaininfo | jq '{blocks, headers, verificationprogress, pruned}'
# Lightning node status
lightning-cli getinfo | jq '{id, alias, num_peers, num_active_channels, blockheight}'
# Channel balances (local = your funds, remote = inbound capacity)
lightning-cli listfunds | jq '{
onchain_sats: (([.outputs[].amount_msat] | add // 0) / 1000),
channels: [.channels[] | {
peer_id: .peer_id[:16],
local_msat: .our_amount_msat,
capacity_msat: .amount_msat,
state: .state
}]
}'
# Pending invoices (if any are stuck)
lightning-cli listinvoices | jq '[.invoices[] | select(.status == "unpaid")] | length'
What to Look For
| Check | Healthy | Action Needed |
|---|---|---|
| bitcoind status | active (running) | Restart: systemctl restart bitcoind |
| Block height | Matches mempool.space | If >2 blocks behind, check disk/network |
| CLN peers | >0 active peers | Check network connectivity |
| CLN channels | All CHANNELD_NORMAL | Investigate any AWAITING_* or ONCHAIN |
| Inbound capacity | >500K sats total | Add inbound (see Lightning setup guide) |
| Disk usage | <80% of partition | Prune more aggressively or expand disk |
| BTCPay Server | Accessible via HTTPS | Check nginx, TLS cert, port 443 |
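The block-height row lends itself to a small scripted comparison. The sketch below is one way to do it, assuming the reference height is obtained out of band (for example, read from mempool.space by the operator). `blocks_behind` and `is_lagging` are hypothetical helper names, and the default threshold of 2 is taken from the table above.

```shell
# Hypothetical helpers, not part of bitcoin-cli or any other tool.
# $1 = local height (e.g. from `bitcoin-cli getblockcount`)
# $2 = reference height (assumption: supplied by the operator)
blocks_behind() {
  lag=$(( $2 - $1 ))
  # A node that is ahead of the reference counts as zero blocks behind.
  if [ "$lag" -lt 0 ]; then lag=0; fi
  echo "$lag"
}

# Succeeds when the lag exceeds the threshold ($3, default 2).
is_lagging() {
  [ "$(blocks_behind "$1" "$2")" -gt "${3:-2}" ]
}

# Example with made-up heights.
if is_lagging 840000 840005; then
  echo "node is $(blocks_behind 840000 840005) blocks behind"
fi
```

Keeping the comparison in a function makes it easy to reuse the same threshold in both manual checks and automated alerts.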
Automated Daily Health Script
Add to NixOS configuration as a systemd timer:
systemd.services.btcpay-health-check = {
description = "BTCPay stack health check";
serviceConfig.Type = "oneshot";
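# The script below also calls lightning-cli and mail; add the packages that provide them in your setup
path = with pkgs; [ coreutils curl jq bash ];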
script = ''
#!/usr/bin/env bash
set -euo pipefail
ALERT_EMAIL="admin@research-relay.com"
HEALTHY=true
REPORT=""
# Check bitcoind
if ! systemctl is-active --quiet bitcoind; then
REPORT+="CRITICAL: bitcoind is not running\n"
HEALTHY=false
fi
# Check CLN
if ! systemctl is-active --quiet clightning; then
REPORT+="CRITICAL: clightning is not running\n"
HEALTHY=false
fi
# Check BTCPay
if ! systemctl is-active --quiet btcpayserver; then
REPORT+="CRITICAL: btcpayserver is not running\n"
HEALTHY=false
fi
# Check HTTPS endpoint
# curl already prints 000 via -w when the request fails, so only swallow the exit code
HTTP_STATUS=$(curl -s --max-time 10 -o /dev/null -w "%{http_code}" https://btcpay.research-relay.com || true)
if [ "$HTTP_STATUS" != "200" ] && [ "$HTTP_STATUS" != "302" ]; then
REPORT+="WARNING: BTCPay HTTPS returned status $HTTP_STATUS\n"
HEALTHY=false
fi
# Check disk usage
DISK_PCT=$(df /var/lib/bitcoind --output=pcent | tail -1 | tr -d ' %')
if [ "$DISK_PCT" -gt 80 ]; then
REPORT+="WARNING: Disk usage at $DISK_PCT%\n"
HEALTHY=false
fi
# Check Lightning channel count
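# Fall back to 0 so a CLN outage is reported instead of aborting the script under set -e and pipefail
ACTIVE_CHANNELS=$(lightning-cli getinfo 2>/dev/null | jq '.num_active_channels // 0' || echo 0)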
if [ "$ACTIVE_CHANNELS" -eq 0 ]; then
REPORT+="WARNING: No active Lightning channels\n"
HEALTHY=false
fi
# Check inbound liquidity
INBOUND=$(lightning-cli listfunds 2>/dev/null | jq '
[.channels[] | (.amount_msat - .our_amount_msat)] | add // 0
' || echo 0)
INBOUND_SATS=$((INBOUND / 1000))
if [ "$INBOUND_SATS" -lt 200000 ]; then
REPORT+="WARNING: Low inbound liquidity: $INBOUND_SATS sats\n"
HEALTHY=false
fi
if [ "$HEALTHY" = false ]; then
echo -e "BTCPay Health Check FAILED:\n$REPORT" | \
mail -s "BTCPay Health Alert" "$ALERT_EMAIL"
echo -e "UNHEALTHY:\n$REPORT"
exit 1
fi
echo "All checks passed. Inbound: $INBOUND_SATS sats, Disk: $DISK_PCT%, Channels: $ACTIVE_CHANNELS"
'';
};
systemd.timers.btcpay-health-check = {
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*-*-* 08:00:00"; # Run daily at 8 AM
Persistent = true;
};
};
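The inbound-liquidity check in the script sums `amount_msat - our_amount_msat` over all channels and converts the result to sats. The arithmetic can be sketched without a running node. The two-column stdin format and the name `inbound_sats` are assumptions made for this illustration; the health script itself reads the values with jq.

```shell
# Sum inbound capacity from lines of "amount_msat our_amount_msat" on stdin
# and print the total in sats (hypothetical input format for this sketch).
inbound_sats() {
  total=0
  while read -r amount ours; do
    total=$(( total + amount - ours ))
  done
  echo $(( total / 1000 ))
}

# Two made-up channels: 1M sats with 400K local, 500K sats with 450K local.
printf '%s\n' "1000000000 400000000" "500000000 450000000" | inbound_sats
# prints: 650000
```

With these sample values the result is well above the 200,000 sat warning threshold used by the script.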
2. Weekly Maintenance (~15 minutes)
Channel Health Review
# List all channels with balance info
lightning-cli listpeerchannels | jq '[.channels[] | {
peer: .peer_id[:16],
state: .state,
local_msat: .to_us_msat,
capacity_msat: .total_msat,
local_pct: ((.to_us_msat // 0) * 100 / (.total_msat // 1)),
htlcs_pending: (.htlcs | length)
}]'
# Check for channels needing attention
# Channels >90% on your side = no inbound capacity remaining
# Channels <10% on your side = nearly drained outbound
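The 90% and 10% rules of thumb can be applied mechanically to each channel's `to_us_msat` and `total_msat`. A minimal sketch follows; the function name `classify_channel` and its labels are made up for illustration.

```shell
# Hypothetical helper: classify a channel by the share of funds on the local side.
# $1 = local balance in msat (to_us_msat), $2 = channel capacity in msat (total_msat)
classify_channel() {
  if [ "$2" -le 0 ]; then
    echo "unknown"
    return
  fi
  pct=$(( $1 * 100 / $2 ))   # integer percentage, rounded down
  if [ "$pct" -gt 90 ]; then
    echo "no-inbound"        # almost everything is on our side
  elif [ "$pct" -lt 10 ]; then
    echo "drained"           # almost nothing left to send
  else
    echo "balanced"
  fi
}

classify_channel 950000000 1000000000   # prints: no-inbound
classify_channel 50000000 1000000000    # prints: drained
classify_channel 400000000 1000000000   # prints: balanced
```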
Rebalancing (if CLBOSS is not handling it)
# Check if any channels are imbalanced
# If channel A is 95% local and channel B is 5% local,
# do a circular rebalance.
# Note: rebalance and rebalanceall are provided by the community rebalance plugin,
# not by CLN itself, so both require that plugin to be installed.
lightning-cli rebalance <outgoing_channel_id> <incoming_channel_id> <amount_msat>
# Or let the plugin rebalance all channels
lightning-cli rebalanceall
Software Updates Check
# Check for nix-bitcoin updates
cd /path/to/nixos-config
git fetch upstream
git log HEAD..upstream/master --oneline
# Check BTCPay release notes
# https://github.com/btcpayserver/btcpayserver/releases
# Apply updates via NixOS rebuild (test first)
# nixos-rebuild test --flake .
# nixos-rebuild switch --flake .
Backup Verification
# Verify latest backup exists and is recent
ls -la /var/backups/btcpay/
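# Test backup decryption (don't actually restore)
# GnuPG 2.1+ needs --pinentry-mode loopback to read the passphrase file in batch mode
gpg --decrypt --batch --pinentry-mode loopback --passphrase-file /run/secrets/backup-passphrase \
/var/backups/btcpay/backup-$(date +%Y%m%d).tar.gz.gpg | tar tzf -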
# Verify CLN hsm_secret backup matches live
sha256sum /var/lib/clightning/bitcoin/hsm_secret
# Compare with stored backup hash
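The hash comparison can be automated rather than done by eye. The sketch below assumes the expected hash was recorded as a bare hex string when the backup was made; `hash_matches` is a hypothetical helper, and the stored-hash path in the usage comment is a placeholder, not a path from this runbook.

```shell
# Hypothetical helper: succeed only if the file's SHA-256 equals the expected hash.
# $1 = file to check, $2 = expected hash as a hex string
hash_matches() {
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}

# Usage sketch (placeholder path for the stored hash):
# if hash_matches /var/lib/clightning/bitcoin/hsm_secret "$(cat /path/to/stored-hash)"; then
#   echo "hsm_secret backup hash matches"
# else
#   echo "MISMATCH: investigate before relying on the backup"
# fi
```

A missing or unreadable file yields an empty hash, so the comparison fails rather than passing silently.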
Review BTCPay Notifications
Log into BTCPay Server dashboard and check:
- Notification bell (top right) for any pending notifications
- Invoice list for expired or invalid invoices
- Wallet balance vs expected
3. Monthly Tasks (~30 minutes)
Accounting Reconciliation
# Export BTCPay payment data
# Dashboard > Reporting > Payments > Export CSV
# Cross-reference with Medusa orders
# For each BTCPay settled invoice:
# 1. Match to Medusa order via orderId in metadata
# 2. Verify USD amount matches order total
# 3. Record BTC amount and FMV at settlement time
# 4. Flag any discrepancies
# Import to Koinly
# Upload the BTCPay CSV as custom import
# Verify Koinly correctly calculates FMV for each payment
Cold Storage Sweep
# Check on-chain wallet balance (CLN's internal wallet, in sats)
lightning-cli listfunds | jq '([.outputs[].amount_msat] | add // 0) / 1000'
# If balance > 0.01 BTC (1,000,000 sats), sweep to cold storage:
# 1. In BTCPay dashboard: Wallets > Send
# 2. Enter cold storage address from hardware wallet
# 3. Set fee rate (check mempool.space for current rates)
# 4. Create PSBT
# 5. Sign with hardware wallet
# 6. Broadcast
# For Lightning funds:
# CLBOSS handles submarine swaps to on-chain automatically
# Alternatively, manual swap via Boltz:
# https://boltz.exchange/swap (Lightning → On-chain)
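The sweep decision is a unit conversion followed by a threshold comparison: listfunds reports millisatoshis, while the threshold above is stated in sats. A minimal sketch, with `needs_sweep` as a hypothetical helper name and the 1,000,000 sat default taken from the steps above:

```shell
# Hypothetical helper: decide whether the on-chain balance warrants a sweep.
# $1 = balance in millisatoshis (the unit listfunds reports)
# $2 = threshold in sats (default 1000000, i.e. 0.01 BTC)
needs_sweep() {
  sats=$(( $1 / 1000 ))
  [ "$sats" -gt "${2:-1000000}" ]
}

# Made-up balance of 1,500,000 sats, which exceeds the default threshold.
if needs_sweep 1500000000; then
  echo "sweep to cold storage"
fi
```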
Security Audit
- Review BTCPay Server access logs for unusual activity
- Verify no new API keys were created without your knowledge
- Check server SSH authorized_keys
- Verify TLS certificate is valid and auto-renewing
- Review NixOS configuration for any drift from expected state
Capacity Planning
# Review payment volume trends
# BTCPay Dashboard > Reporting > Payments
# If Lightning volume is growing:
# - Open additional channels
# - Increase channel sizes
# - Add more inbound liquidity
# If disk usage is growing:
df -h /var/lib/bitcoind
# Adjust prune value if needed
4. Incident Response
Incident: Node Goes Down
Symptoms: BTCPay Server unreachable, Lightning payments failing, no new blocks
Diagnosis:
# SSH into server
ssh shops-btc-01
# Check service status
systemctl status bitcoind clightning btcpayserver nginx
# Check system resources
top -bn1 | head -20
df -h
free -m
# Check recent logs
journalctl -u bitcoind --since "1 hour ago" --no-pager | tail -50
journalctl -u clightning --since "1 hour ago" --no-pager | tail -50
journalctl -u btcpayserver --since "1 hour ago" --no-pager | tail -50
Recovery:
# Restart individual services (in dependency order if restarting more than one)
systemctl restart bitcoind
systemctl restart clightning
systemctl restart btcpayserver
# If system-wide issue, full reboot
systemctl reboot
# After reboot, verify services started
systemctl status bitcoind clightning btcpayserver
# Check Bitcoin sync status (may take minutes to catch up)
bitcoin-cli getblockchaininfo | jq '.verificationprogress'
# Check Lightning channels reconnected
lightning-cli getinfo | jq '.num_active_channels'
Impact: During downtime, Lightning payments fail immediately. On-chain payments may still be detected once the node syncs. Attempts to create BTCPay invoices during downtime will fail with errors.
Mitigation: On-chain payments via xpub still work even if BTCPay is temporarily down (the addresses are deterministic). Customers can be shown a fallback on-chain address.
Incident: Lightning Channel Force-Closed
Symptoms: Channel disappears from listpeerchannels, funds locked in timelock
Diagnosis:
# Check for closed channels
lightning-cli listclosedchannels | jq '.closedchannels[] | {
peer_id: .peer_id[:16],
total_msat: .total_msat,
close_cause: .close_cause,
funding_txid: .funding_txid
}'
# Check for on-chain outputs that have not confirmed yet (e.g. sweeps after the timelock)
lightning-cli listfunds | jq '.outputs[] | select(.status == "unconfirmed")'
Recovery:
# Wait for timelock to expire (check locktime)
# Funds return to your on-chain wallet automatically after timelock
# Open a replacement channel if needed
lightning-cli connect <peer_id>@<host>:<port>
lightning-cli fundchannel <peer_id> <amount_sat>
Prevention:
- Maintain high uptime to avoid peer-initiated force-closes
- Avoid unilateral closes unless absolutely necessary; a cooperative close is much cheaper and faster
- CLBOSS avoids problematic peers automatically
Incident: Payment Stuck (HTLC pending)
Symptoms: Customer says they paid but invoice still shows "processing" or payment is in limbo
Diagnosis:
# Check pending HTLCs
lightning-cli listpeerchannels | jq '[.channels[] | select(.htlcs | length > 0) | {
peer: .peer_id[:16],
htlcs: .htlcs
}]'
# Check specific invoice
lightning-cli listinvoices | jq '.invoices[] | select(.label == "<invoice_label>")'
# For on-chain payments, check mempool
bitcoin-cli getrawtransaction <txid> true | jq '.confirmations'
Recovery:
- Lightning HTLC stuck: Usually resolves within the HTLC timeout (typically 40-144 blocks). If the HTLC is stuck for >24 hours, the channel may need to be force-closed as a last resort.
- On-chain unconfirmed: Wait for confirmation. If fee was too low, the sender needs to use RBF (Replace-By-Fee) or CPFP (Child-Pays-For-Parent).
- BTCPay shows "Processing" indefinitely: Check if the configured number of confirmations has been reached. Default is 1 for on-chain. Lightning should settle instantly.
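To relate the HTLC timeout range to the 24-hour rule of thumb, a block count can be converted to approximate wall-clock time. The sketch below assumes the usual ~10 minute average block interval, which varies in practice; `blocks_to_hours` is a hypothetical helper name.

```shell
# Rough conversion from a block count to wall-clock time,
# assuming an average of 10 minutes per block.
blocks_to_hours() {
  minutes=$(( $1 * 10 ))
  echo "$(( minutes / 60 ))h $(( minutes % 60 ))m"
}

blocks_to_hours 40    # prints: 6h 40m
blocks_to_hours 144   # prints: 24h 0m
```

Under this assumption, the upper end of the typical range (144 blocks) corresponds to roughly one day.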
Customer communication:
- On-chain: "Your payment has been received and is awaiting blockchain confirmation. This typically takes 10-60 minutes."
- Lightning stuck: "There was an issue routing your Lightning payment. Please try again or use the on-chain option."
Incident: Wallet Compromise Suspected
Symptoms: Unexpected outgoing transactions, unauthorized channel opens/closes, unknown API keys
Immediate Actions:
# 1. IMMEDIATELY: Move funds to cold storage
# If hot wallet is enabled, create a PSBT to sweep all funds NOW
# 2. Revoke all API keys
# BTCPay Dashboard > Account > API Keys > Delete all
# 3. Close all Lightning channels cooperatively (if possible)
lightning-cli close <channel_id>
# Repeat for each channel
# 4. Disable BTCPay Server public access
systemctl stop nginx
# Or: iptables -A INPUT -p tcp --dport 443 -j DROP
# 5. Rotate all secrets
# - Change BTCPay admin password
# - Generate new API keys
# - Rotate webhook secrets
# - Change SSH keys
# - Check for unauthorized SSH authorized_keys
# 6. Investigate
journalctl --since "7 days ago" -u btcpayserver | grep -i "auth\|login\|api"
# Check access logs
journalctl --since "7 days ago" -u nginx | grep -v "200\|301\|302"
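# 7. Rebuild server from NixOS config if in doubt
# NixOS is declarative, so a rebuild restores the declared system configuration,
# but it does not reset mutable state (e.g. /var/lib or home directories).
# If compromise is confirmed, reinstall on a fresh machine from the config.
# nixos-rebuild switch --flake .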
After recovery:
- Generate new hsm_secret (new Lightning identity)
- Generate new on-chain wallet (new xpub from hardware wallet)
- Open fresh channels
- Update all integration credentials (Medusa .env)
Incident: Disk Full
Symptoms: Services crash, database errors, Bitcoin node stops syncing
Diagnosis:
df -h
du -sh /var/lib/bitcoind /var/lib/clightning /var/lib/btcpayserver
# Check Bitcoin data size
bitcoin-cli getblockchaininfo | jq '.size_on_disk'
Recovery:
# 1. Free immediate space
journalctl --vacuum-size=100M
# 2. If Bitcoin data is too large, increase pruning
# In NixOS config:
# services.bitcoind.prune = 50000; # Reduce to 50GB
# nixos-rebuild switch --flake .
# 3. Bitcoin will prune on next restart
systemctl restart bitcoind
# 4. Verify services recover
systemctl status bitcoind clightning btcpayserver
Prevention: Monitor disk usage in daily health check. Alert at 80% capacity.
5. Monitoring and Alerting
Recommended Monitoring Stack
BTCPay Server lacks built-in comprehensive monitoring. Build external monitoring:
Layer 1: Uptime Monitoring (External)
Use an external service to monitor HTTPS availability:
- Uptime Kuma (self-hosted) or Better Uptime (SaaS)
- Monitor: https://btcpay.research-relay.com — expect 200 or 302
- Alert via email/SMS if down for >5 minutes
Layer 2: System Metrics (On-Server)
# Optional: Prometheus + node_exporter for system metrics
services.prometheus = {
enable = true;
exporters.node = {
enable = true;
enabledCollectors = [ "systemd" "diskstats" "filesystem" "meminfo" "netdev" ];
};
};
# CLN Prometheus plugin (if available in nix-bitcoin)
# services.clightning.plugins.prometheus.enable = true;
Layer 3: Application Alerts (Systemd + Email)
The daily health check script (Section 1) covers application-level monitoring. For real-time alerts, add systemd failure notifications:
# Alert on service failure
systemd.services.bitcoind.unitConfig.OnFailure = "notify-failure@%n.service";
systemd.services.clightning.unitConfig.OnFailure = "notify-failure@%n.service";
systemd.services.btcpayserver.unitConfig.OnFailure = "notify-failure@%n.service";
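systemd.services."notify-failure@" = {
description = "Send failure notification for %i";
serviceConfig.Type = "oneshot";
# systemd expands %i in unit settings such as scriptArgs, but not inside the script body,
# so the instance name is passed to the script as $1
scriptArgs = "%i";
script = ''
echo "Service $1 failed on $(hostname) at $(date)" | \
mail -s "CRITICAL: $1 failed" admin@research-relay.com
'';
};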
What to Alert On
| Alert | Severity | Threshold |
|---|---|---|
| BTCPay HTTPS down | Critical | >5 min |
| bitcoind stopped | Critical | Immediate |
| clightning stopped | Critical | Immediate |
| btcpayserver stopped | Critical | Immediate |
| Disk usage | Warning/Critical | >80% / >90% |
| No active Lightning channels | Warning | 0 channels |
| Low inbound liquidity | Warning | <200K sats |
| Bitcoin node behind | Warning | >10 blocks behind |
| TLS cert expiring | Warning | <7 days |
| Force-close detected | Warning | Any occurrence |
| High memory usage | Warning | >90% |
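Rows with two thresholds, such as disk usage, map naturally onto a small severity function. This is a sketch only; `disk_severity` is a hypothetical name, and the 80% and 90% cut-offs are the ones from the table.

```shell
# Hypothetical helper mirroring the disk-usage row: prints ok, warning, or critical.
# $1 = disk usage as an integer percentage
disk_severity() {
  if [ "$1" -gt 90 ]; then
    echo "critical"
  elif [ "$1" -gt 80 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

disk_severity 75   # prints: ok
disk_severity 85   # prints: warning
disk_severity 95   # prints: critical
```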
Log Locations
| Service | Log Command |
|---|---|
| bitcoind | journalctl -u bitcoind -f |
| CLN | journalctl -u clightning -f |
| BTCPay Server | journalctl -u btcpayserver -f |
| nginx | journalctl -u nginx -f |
| System | journalctl -b |
| CLBOSS | journalctl -u clightning -f (CLBOSS logs through CLN) |
6. Common Operations Reference
Open a New Lightning Channel
# 1. Find a peer (check 1ML.com or Amboss.space)
# 2. Connect
lightning-cli connect <pubkey>@<host>:<port>
# 3. Fund channel
lightning-cli fundchannel <pubkey> <amount_sat>
# Example: lightning-cli fundchannel 03abcdef... 1000000
# 4. Wait for funding transaction to confirm (often 3 or more confirmations; the peer decides the required depth)
# 5. Channel enters CHANNELD_NORMAL state
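Close a Lightning Channel
# Cooperative close (preferred: cheaper, faster)
lightning-cli close <channel_id>
# Unilateral close (last resort, e.g. if the peer is offline)
# The second argument is unilateraltimeout in seconds: CLN force-closes
# if the peer has not cooperated within that time
lightning-cli close <channel_id> 1
# Funds locked for timelock period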
Send On-Chain Payment from BTCPay
# Via BTCPay Dashboard: Wallets > Send
# Or via API:
curl -X POST "https://btcpay.research-relay.com/api/v1/stores/{storeId}/payment-methods/onchain/BTC/wallet/transactions" \
-H "Authorization: token $BTCPAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"destinations": [{"destination": "bc1q...", "amount": "0.001"}],
"feerate": 10,
"sign": false
}'
# Returns PSBT for hardware wallet signing
Create a Manual Invoice
curl -X POST "https://btcpay.research-relay.com/api/v1/stores/{storeId}/invoices" \
-H "Authorization: token $BTCPAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"amount": "49.99",
"currency": "USD",
"metadata": {"orderId": "manual-001", "buyer": "test@example.com"}
}'
Check Invoice Status
curl "https://btcpay.research-relay.com/api/v1/stores/{storeId}/invoices/{invoiceId}" \
-H "Authorization: token $BTCPAY_API_KEY" | jq '{status, amount, currency, metadata}'
Rotate API Keys
- BTCPay Dashboard > Account > Manage Account > API Keys
- Create new key with minimum required permissions:
  - btcpay.store.cancreateinvoice
  - btcpay.store.canviewinvoices
  - btcpay.store.canmodifystoresettings (for webhook management)
- Update Medusa .env with new key
- Restart Medusa
- Delete old API key
NixOS Configuration Update
# Edit NixOS config
cd /path/to/nixos-config
vim flake.nix # or relevant module
# Build and activate without changing the boot default (reverts on reboot)
nixos-rebuild test --flake .
# If test passes, switch
nixos-rebuild switch --flake .
# Verify services
systemctl status bitcoind clightning btcpayserver
7. Disaster Recovery
Scenario: Complete Server Loss
Recovery from NixOS config + backups:
- Provision new server (or reinstall shops-btc-01)
- Deploy NixOS configuration from git
- Restore CLN hsm_secret
- Restore BTCPay PostgreSQL database
- Wait for Bitcoin node to sync (pruned: ~24-48 hours, full: ~3-7 days)
- CLN will detect the hsm_secret and recover on-chain funds
- Old Lightning channels are lost; open new channels
- Restore Medusa integration settings (API keys, webhook secrets)
Recovery time estimate: 1-3 days (dominated by Bitcoin sync time)
Scenario: Corrupted CLN Database
If CLN database is corrupted but hsm_secret is intact:
- Stop CLN: systemctl stop clightning
- Remove corrupted database: rm /var/lib/clightning/bitcoin/lightningd.sqlite3
- Start CLN: systemctl start clightning
- CLN recreates database and recovers on-chain funds from hsm_secret
- Lightning channel funds may be lost or recovered via DLP (Data Loss Protection) if peers cooperate
- Open new channels
Scenario: Bitcoin Node Corruption
- Stop bitcoind: systemctl stop bitcoind
- Remove block data and chainstate: rm -rf /var/lib/bitcoind/blocks /var/lib/bitcoind/chainstate
- Start bitcoind: systemctl start bitcoind
- Node re-syncs from scratch (pruned: ~24-48 hours)
- CLN and BTCPay reconnect automatically after sync completes