If you don't have backups, you don't have data. Period.
Your production database just got corrupted. The last backup was 3 days ago. You're about to explain to customers why their data is gone forever while you scramble through corrupted log files hoping to recover something.
Reality check: your database will fail. Hardware dies, humans make mistakes, software has bugs, and ransomware exists. The question isn't if you'll need backups - it's whether you'll have them when disaster strikes at 3 AM on a Sunday.
The Reality of Database Disasters
Common failure scenarios:
- Disk corruption (hardware failure)
- Accidental `DROP TABLE` or `DELETE` without a `WHERE` clause
- Application bug corrupting data
- Ransomware targeting databases
- Failed migration destroying production data
Recovery time objectives:
- Hobby project: 1-2 days acceptable
- Side business: 4-6 hours maximum
- Production app: 15-30 minutes target
Backup Strategy Breakdown
Level 1: Basic Automated Dumps
Good for: Small databases (< 10GB) where losing up to a day of data is acceptable
```bash
#!/bin/bash
DB_NAME="myapp"
DB_USER="myapp"
BACKUP_DIR="/var/backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

mkdir -p "$BACKUP_DIR"

# PostgreSQL backup
docker compose exec -T postgres pg_dump -U "$DB_USER" -Fc "$DB_NAME" > "$BACKUP_DIR/${DB_NAME}_$DATE.dump"

# Cleanup old backups
find "$BACKUP_DIR" -name "${DB_NAME}_*.dump" -mtime +"$RETENTION_DAYS" -delete

echo "Backup completed: ${DB_NAME}_$DATE.dump"
```
Schedule it:

```bash
# Run daily at 2 AM (note: piping to `crontab -` replaces root's existing
# crontab; use `sudo crontab -e` instead if you already have entries)
echo "0 2 * * * /usr/local/bin/backup_postgres.sh >> /var/log/backup.log 2>&1" | sudo crontab -
```

Level 2: WAL Archiving (Point-in-Time Recovery)
Good for: Critical databases that need recovery to a specific timestamp
PostgreSQL WAL setup in postgresql.conf:
```ini
# WAL settings for PITR
wal_level = replica
archive_mode = on
# With Docker, this path must be a volume shared between container and host
archive_command = 'cp %p /var/backups/postgres/wal/%f'
max_wal_size = 2GB
checkpoint_completion_target = 0.9
wal_keep_size = 1GB
```
Create WAL archive directory:

```bash
sudo mkdir -p /var/backups/postgres/wal
sudo chown postgres:postgres /var/backups/postgres/wal
sudo chmod 700 /var/backups/postgres/wal
```
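After restarting PostgreSQL with these settings, it's worth confirming that WAL segments are actually landing in the archive before relying on it; a quick check against the pg_stat_archiver view, assuming the compose service is named postgres:

```bash
# Force a WAL segment switch, then inspect the archiver's counters
docker compose exec -T postgres psql -U postgres -c "SELECT pg_switch_wal();"
docker compose exec -T postgres psql -U postgres \
  -c "SELECT archived_count, last_archived_wal, failed_count FROM pg_stat_archiver;"
```

A non-zero failed_count usually means the archive path isn't writable from inside the container.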
Base backup script:

```bash
#!/bin/bash
BACKUP_DIR="/var/backups/postgres/base"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Create base backup (-Ft -z already produces gzipped tar files,
# so no second compression pass is needed)
docker compose exec -T postgres pg_basebackup -U postgres -D /tmp/backup_$DATE -Ft -z -P

# Copy the backup out of the container, then clean up
docker cp "$(docker compose ps -q postgres):/tmp/backup_$DATE" "$BACKUP_DIR/base_backup_$DATE"
docker compose exec -T postgres rm -rf /tmp/backup_$DATE

echo "Base backup completed: base_backup_$DATE"
```
PITR Recovery process:

```bash
# Stop PostgreSQL
docker compose stop postgres

# Restore the base backup into an empty data directory
# (assumes the data volume is bind-mounted at ./pgdata - adjust to your setup)
sudo rm -rf ./pgdata/*
sudo tar -xzf /var/backups/postgres/base/base_backup_20241030_020000/base.tar.gz -C ./pgdata
sudo tar -xzf /var/backups/postgres/base/base_backup_20241030_020000/pg_wal.tar.gz -C ./pgdata/pg_wal

# Configure recovery (PostgreSQL 12+: recovery.conf no longer exists - use
# postgresql.conf settings plus a recovery.signal file instead)
sudo tee -a ./pgdata/postgresql.conf << EOF
restore_command = 'cp /var/backups/postgres/wal/%f %p'
recovery_target_time = '2024-10-30 14:30:00'
recovery_target_action = 'promote'
EOF
sudo touch ./pgdata/recovery.signal

# Start PostgreSQL - it will replay WAL up to the specified time
docker compose start postgres
```
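Recovery can take a while on large databases; you can watch progress and confirm completion like this (a small sketch, same assumptions as above):

```bash
# Follow the recovery log output
docker compose logs -f postgres

# Returns "f" once the server has finished recovery and promoted
docker compose exec -T postgres psql -U postgres -tAc "SELECT pg_is_in_recovery();"
```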
Level 3: Continuous Backup with pgBackRest

Good for: Production databases, automated PITR, multiple retention policies
Install pgBackRest:
```bash
sudo apt install pgbackrest -y
```

Configure /etc/pgbackrest.conf:
```ini
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=3
repo1-retention-diff=3
process-max=4

[myapp]
pg1-path=/var/lib/postgresql/data
pg1-port=5432
pg1-user=postgres
```

PostgreSQL configuration:
```ini
# Add to postgresql.conf
archive_command = 'pgbackrest --stanza=myapp archive-push %p'
```

Setup and run:
```bash
# Create stanza
sudo -u postgres pgbackrest --stanza=myapp stanza-create

# Full backup (run weekly)
sudo -u postgres pgbackrest --stanza=myapp backup --type=full

# Differential backup (run daily)
sudo -u postgres pgbackrest --stanza=myapp backup --type=diff
```
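To keep those running unattended, a sketch of cron entries for the postgres user (times are placeholders; add them via `sudo -u postgres crontab -e`):

```bash
# Full backup Sundays at 1 AM, differential backups the other days
0 1 * * 0 pgbackrest --stanza=myapp backup --type=full
0 1 * * 1-6 pgbackrest --stanza=myapp backup --type=diff
```

`pgbackrest --stanza=myapp info` shows what the repository currently holds and when each backup ran.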
```bash
# Point-in-time recovery
sudo -u postgres pgbackrest --stanza=myapp restore --type=time --target="2024-10-30 14:30:00"
```

Cloud Storage Integration
S3-Compatible Storage
Setup rclone for cloud backups:
```bash
# Install rclone
sudo apt install rclone -y

# Configure (interactive setup)
rclone config

# Test connection
rclone ls mycloud:mybucket
```

Enhanced backup script with cloud sync:
```bash
#!/bin/bash
LOCAL_BACKUP_DIR="/var/backups/postgres"
CLOUD_REMOTE="mycloud:postgres-backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create local backup
docker compose exec -T postgres pg_dump -U myapp -Fc myapp > "$LOCAL_BACKUP_DIR/myapp_$DATE.dump"

# Sync to cloud storage
rclone copy "$LOCAL_BACKUP_DIR" "$CLOUD_REMOTE" --include "*.dump" --transfers=4

# Cleanup local files older than 7 days
find "$LOCAL_BACKUP_DIR" -name "*.dump" -mtime +7 -delete

# Cleanup cloud files older than 90 days
rclone delete "$CLOUD_REMOTE" --min-age 90d

echo "Backup synced to cloud: myapp_$DATE.dump"
```
Hetzner Storage Box Integration

Mount Storage Box:
```bash
# Install dependencies
sudo apt install cifs-utils -y

# Create credentials file
sudo mkdir -p /etc/cifs
echo "username=your-username" | sudo tee /etc/cifs/hetzner-credentials
echo "password=your-password" | sudo tee -a /etc/cifs/hetzner-credentials
sudo chmod 600 /etc/cifs/hetzner-credentials

# Mount storage box
sudo mkdir -p /mnt/backup
echo "//your-username.your-server.de/backup /mnt/backup cifs credentials=/etc/cifs/hetzner-credentials,uid=1000,gid=1000,iocharset=utf8 0 0" | sudo tee -a /etc/fstab
sudo mount -a
```
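Before pointing backups at the mount, a quick sanity check that the share is actually mounted and writable saves debugging silent failures later:

```bash
# Confirm the CIFS share is mounted and writable
df -h /mnt/backup
touch /mnt/backup/.write_test && rm /mnt/backup/.write_test && echo "mount is writable"
```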
Direct backup to Storage Box:

```bash
#!/bin/bash
# Direct backup to Hetzner Storage Box
BACKUP_DIR="/mnt/backup/postgres"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Backup directly to storage box
docker compose exec -T postgres pg_dump -U myapp -Fc myapp > "$BACKUP_DIR/myapp_$DATE.dump"

# Keep 90 days on storage box
find "$BACKUP_DIR" -name "*.dump" -mtime +90 -delete

echo "Backup completed to Storage Box: myapp_$DATE.dump"
```
Monitoring and Alerting
Backup Health Check
```bash
#!/bin/bash
BACKUP_DIR="/var/backups/postgres"
MAX_AGE_HOURS=26  # Alert if backup older than 26 hours
ALERT_EMAIL="admin@example.com"  # set your alert address

# Find latest backup
LATEST_BACKUP=$(find "$BACKUP_DIR" -name "*.dump" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-)
BACKUP_AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))

if [ "$BACKUP_AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then
    echo "ALERT: Latest backup is $BACKUP_AGE_HOURS hours old" | mail -s "Backup Alert" "$ALERT_EMAIL"
    exit 1
fi

# Test backup integrity - pipe the dump in, since the host path
# isn't visible inside the container
if docker compose exec -T postgres pg_restore --list > /dev/null 2>&1 < "$LATEST_BACKUP"; then
    echo "Backup integrity check passed"
else
    echo "ALERT: Backup integrity check failed" | mail -s "Backup Corruption Alert" "$ALERT_EMAIL"
    exit 1
fi
```
Schedule health checks:

```bash
# Run every 4 hours (again, `crontab -` replaces the current user's crontab)
echo "0 */4 * * * /usr/local/bin/check_backup_health.sh" | crontab -
```

Recovery Testing
Automated Recovery Testing
```bash
#!/bin/bash
TEST_DB="myapp_recovery_test"
BACKUP_FILE="/var/backups/postgres/myapp_20241030_020000.dump"

# Create test database
docker compose exec -T postgres createdb -U postgres $TEST_DB

# Restore backup to test database (piped in, since the host path
# isn't visible inside the container)
docker compose exec -T postgres pg_restore -U postgres -d $TEST_DB < "$BACKUP_FILE"

# Run basic integrity checks (-tA gives a bare, unpadded count)
RECORD_COUNT=$(docker compose exec -T postgres psql -U postgres -d $TEST_DB -tA -c "SELECT COUNT(*) FROM users;")
if [ "$RECORD_COUNT" -gt 0 ]; then
    echo "Recovery test passed: $RECORD_COUNT records restored"
else
    echo "Recovery test failed: No records found"
fi

# Cleanup test database
docker compose exec -T postgres dropdb -U postgres $TEST_DB
```

What Could Go Wrong
Common Backup Failures
Backup script permissions:
```bash
# Fix permissions
sudo chown postgres:postgres /var/backups/postgres
sudo chmod 755 /var/backups/postgres
```

Insufficient disk space:
```bash
# Monitor backup directory size
df -h /var/backups
du -sh /var/backups/postgres/*

# Compress old backups
find /var/backups -name "*.dump" -mtime +7 -exec gzip {} \;
```

Network timeouts to cloud storage:
```bash
# Add retry logic to rclone
rclone copy "$LOCAL_BACKUP_DIR" "$CLOUD_REMOTE" --retries 3 --low-level-retries 10
```

Recovery Troubleshooting
Corrupted backup file:
```bash
# Test backup integrity before recovery
pg_restore --list backup.dump | head -10
```

Missing WAL files:
```bash
# Check WAL archive directory
ls -la /var/backups/postgres/wal/

# Verify archive_command is working
docker compose exec postgres psql -U postgres -c "SELECT pg_switch_wal();"
```

Reality Check
Setup time: 2-4 hours for a comprehensive backup strategy
Storage costs (monthly):
- Local: included in existing VPS storage (CAX11 has 40GB, CAX21 has 80GB)
- Hetzner Storage Box: 100GB = $3.20/month
- AWS S3: 100GB = $2.30/month + transfer costs
Critical decision: Point-in-time recovery adds complexity but saves hours during actual disasters. Worth it for any database you can't afford to lose.
Most production issues come from:
- No backup testing - backups exist but are corrupted
- Missing WAL files - gaps in point-in-time recovery chain
- Slow recovery - no documented recovery procedures
Test your recovery process monthly. When disaster strikes, you want muscle memory, not learning.
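A minimal way to enforce that habit, assuming the recovery-test script above is saved at /usr/local/bin/test_recovery.sh (a hypothetical path):

```bash
# Add to root's crontab (sudo crontab -e):
# run the recovery test at 3 AM on the first of each month
0 3 1 * * /usr/local/bin/test_recovery.sh >> /var/log/recovery_test.log 2>&1
```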