PostgreSQL Backup Strategies: A Guide to Sleeping Well at Night
There are two kinds of database administrators: those who have lost data, and those who have not lost data yet.
Backups are not optional — they are the foundation
Good evening. I regret that this is not a topic one can afford to find boring.
A production database without a tested backup strategy is a liability. Hardware fails. Disks corrupt. Engineers run DELETE without a WHERE clause. Ransomware encrypts data directories. Cloud providers experience region-level outages. Every one of these scenarios is recoverable with a proper backup — and every one of them is catastrophic without one.
Before choosing a backup strategy, define two numbers:
- Recovery Point Objective (RPO): How much data loss is acceptable? If the answer is "none," you need continuous WAL archiving. If "up to 24 hours" is tolerable, a nightly pg_dump may suffice.
- Recovery Time Objective (RTO): How quickly must the database be back online? Restoring a 500GB pg_dump takes hours. Restoring a physical backup with WAL replay can take minutes.
These two numbers drive every decision that follows.
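The RPO half of that comparison can be checked mechanically. A minimal sketch, assuming the age of the newest backup is measured elsewhere; the function and its names are illustrative, not part of any PostgreSQL tooling:

```shell
# Hypothetical helper: compare the age of the newest backup against the RPO.
# Prints "ok" when the backup is recent enough, "rpo-exceeded" otherwise.
check_rpo() {
  age_seconds="$1"
  rpo_seconds="$2"
  if [ "$age_seconds" -le "$rpo_seconds" ]; then
    echo "ok"
  else
    echo "rpo-exceeded"
  fi
}

# A nightly dump that last completed 30 hours ago violates a 24-hour RPO
check_rpo $((30 * 3600)) $((24 * 3600))   # prints "rpo-exceeded"
```

A check like this, run from monitoring rather than from the backup job itself, turns the RPO from a document into an alert.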
Logical backups — pg_dump and pg_dumpall
What pg_dump does
pg_dump produces a logical backup: a file containing the SQL commands needed to recreate a database. The available output formats:
| Format | Flag | Compressed | Parallel Dump | Selective Restore | Notes |
|---|---|---|---|---|---|
| Plain SQL | -Fp (default) | No | No | No | Human-readable, largest file size |
| Custom | -Fc | Yes (zlib) | No | Yes | Recommended default |
| Directory | -Fd | Yes (zlib) | Yes | Yes | One file per table, enables parallel dump |
| Tar | -Ft | No | No | Yes | Archive format, less common |
Custom format (-Fc) is the recommended default — compressed, supports parallel restore, and allows restoring individual tables.
pg_dump in practice
# Custom-format dump (recommended default)
pg_dump -Fc -f backup.dump mydb
# Directory-format dump with four parallel workers
pg_dump -Fd -j 4 -f /backup/mydb_dir/ mydb
# Parallel restore from a custom-format dump
pg_restore -d mydb -j 4 backup.dump
# Restore a single table from the archive
pg_restore -t orders -d mydb backup.dump
# List everything in the archive
pg_restore -l backup.dump > backup_contents.list
# Edit the list to keep only what you need, then restore
pg_restore -L backup_contents.list -d mydb backup.dump
pg_dumpall — when you need everything
pg_dumpall only supports plain SQL format. For most use cases, combine per-database pg_dump with a globals-only dump:
# Dump global objects (roles, tablespaces)
pg_dumpall --globals-only -f globals.sql
# Dump each database separately in custom format
pg_dump -Fc -f db1.dump db1
pg_dump -Fc -f db2.dump db2
Strengths and limitations
Strengths: Cross-version portability (dump from PG 14, restore on PG 16), cross-architecture portability, selective restore, human-readable option.
Limitations: Slow for large databases, no point-in-time recovery, backup window scales with size.
When pg_dump is right: Databases under 100GB, cross-version migrations, selective table recovery, development environment seeding, supplementary backup alongside WAL-based strategies.
Physical backups — pg_basebackup
What pg_basebackup does
pg_basebackup produces a physical backup: an exact binary copy of the PostgreSQL data directory. It uses the streaming replication protocol to copy files while PostgreSQL continues running.
pg_basebackup in practice
# Physical backup: tar format, gzip-compressed, with progress reporting
pg_basebackup -D /backup/base -Ft -z -P
# Throttle the transfer rate to limit I/O impact on the primary
pg_basebackup -D /backup/base -Ft -z -P --max-rate=100M
# Request an immediate checkpoint instead of waiting for a spread one
pg_basebackup -D /backup/base -Ft -z -P --checkpoint=fast
Strengths and limitations
Strengths: Fast backup and restore for large databases, foundation for PITR, complete (captures everything).
Limitations: Not portable across major versions or architectures, no selective restore, backup size equals data size before compression.
Physical backups are rarely used alone. Their primary value is as the foundation for WAL archiving and point-in-time recovery.
WAL archiving and point-in-time recovery (PITR)
How WAL archiving works
PostgreSQL writes every data modification to the Write-Ahead Log (WAL) before applying it to data files. WAL archiving copies each completed WAL segment to an archive location. A physical backup combined with all WAL files since that backup contains enough information to reconstruct the database at any point in time within the archive window.
Configuring WAL archiving
# Enable WAL at a level sufficient for archiving
wal_level = replica
# Turn on archiving
archive_mode = on
# Command to archive each completed WAL segment
archive_command = 'cp %p /archive/wal/%f'
Production configurations typically use S3 or rsync:
# Archive to S3
archive_command = 'aws s3 cp %p s3://my-wal-archive/%f'
# Archive with rsync to a remote server
archive_command = 'rsync -a %p backup-server:/archive/wal/%f'
# Archive with test-before-write to avoid overwriting
archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'
# Force a segment switch every 5 minutes, even if the segment is not full
archive_timeout = 300
Performing a point-in-time recovery
PITR is a multi-step process (PostgreSQL 12+):
- Stop PostgreSQL and clear the data directory.
- Restore the physical base backup.
- Configure recovery settings in postgresql.conf:
# Command to retrieve archived WAL files
restore_command = 'cp /archive/wal/%f %p'
# Recovery target — restore to this specific point in time
recovery_target_time = '2026-03-28 14:30:00+00'
# Action after reaching the target
recovery_target_action = 'promote'
Alternative recovery targets:
# Recover to a specific transaction ID
recovery_target_xid = '12345678'
# Recover to a specific WAL position
recovery_target_lsn = '0/1A2B3C4D'
# Recover to a named restore point (created with pg_create_restore_point())
recovery_target_name = 'before_deploy_v2'
- Create the recovery signal file:
touch recovery.signal
- Start PostgreSQL. It replays WAL up to the target and promotes.
- Verify the recovery:
-- Check the current timeline and LSN
SELECT pg_current_wal_lsn(), pg_is_in_recovery();
-- Verify data integrity on critical tables
SELECT count(*) FROM orders;
SELECT max(created_at) FROM orders;
Practice this procedure. The worst time to learn PITR is during an actual data loss event. Run a recovery drill at least quarterly.
WAL archive management
# pg_archivecleanup removes WAL files older than the specified file
pg_archivecleanup /archive/wal/ 000000010000000A00000042
-- Check if archiving is working
SELECT * FROM pg_stat_archiver;
-- Check for WAL files waiting to be archived
SELECT count(*) FROM pg_ls_archive_statusdir() WHERE name LIKE '%.ready';
pgBackRest — if I may recommend the professional's choice
What pgBackRest provides
pgBackRest replaces the manual combination of pg_basebackup + archive_command + custom cleanup scripts. It handles full, differential, and incremental backups, WAL archiving, retention management, backup verification, and restore operations — including PITR — as a single integrated tool.
pgBackRest in practice
[mydb]
pg1-path=/var/lib/postgresql/16/main
[global]
repo1-path=/backup/pgbackrest
repo1-retention-full=2
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-encryption-passphrase
# Compression
compress-type=zstd
compress-level=3
# Parallelism
process-max=4
# Force an immediate checkpoint so backups start without delay
start-fast=y
For S3 storage:
[global]
repo1-type=s3
repo1-s3-bucket=my-pgbackrest-bucket
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com
repo1-path=/pgbackrest
repo1-retention-full=2
Configure PostgreSQL for pgBackRest WAL archiving:
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=mydb archive-push %p'
# Full backup
pgbackrest --stanza=mydb --type=full backup
# Incremental backup (only files changed since last backup)
pgbackrest --stanza=mydb --type=incr backup
# Differential backup (all files changed since last full)
pgbackrest --stanza=mydb --type=diff backup
# List backups
pgbackrest --stanza=mydb info
Point-in-time restore:
# Stop PostgreSQL first
systemctl stop postgresql
# Restore to a specific point in time
pgbackrest --stanza=mydb --type=time \
--target="2026-03-28 14:30:00+00" \
--target-action=promote \
restore
# Start PostgreSQL
systemctl start postgresql
Verify backup integrity:
pgbackrest --stanza=mydb verify
Typical production schedule:
# Weekly full backup (Sunday 2 AM)
0 2 * * 0 pgbackrest --stanza=mydb --type=full backup
# Daily incremental backup (2 AM, Mon-Sat)
0 2 * * 1-6 pgbackrest --stanza=mydb --type=incr backup
For databases above 100GB, pgBackRest is the approach I would recommend without reservation.
Barman — the alternative for existing setups
Barman (Backup and Recovery Manager), developed by EDB, provides similar capabilities to pgBackRest: base backups, WAL archiving, retention policies, PITR, and remote operation.
When to choose Barman: Your team already uses it, you run EDB Postgres Advanced Server, or your architecture requires a dedicated backup server.
When to choose pgBackRest instead: Starting fresh, need native cloud storage support, need file-block-level incremental backups, or prefer broader community documentation.
Both tools solve the same problem competently. The choice between them is less important than having either one configured, running, and tested. A tested Barman setup is infinitely better than a planned pgBackRest setup that has not been implemented. Intentions do not protect data.
Cloud-native backups — RDS, Cloud SQL, Supabase
AWS RDS automated backups
RDS takes automated backups continuously: daily snapshots, continuous WAL archiving to S3, configurable retention from 1 to 35 days, and PITR to any second within the retention window. Manual snapshots persist until explicitly deleted.
# Create a manual snapshot via AWS CLI
aws rds create-db-snapshot \
--db-instance-identifier mydb \
--db-snapshot-identifier pre-migration-2026-03-28
# Restore to a point in time (creates a new instance)
aws rds restore-db-instance-to-point-in-time \
--source-db-instance-identifier mydb \
--target-db-instance-identifier mydb-restored \
--restore-time "2026-03-28T14:30:00Z"
Supabase
Supabase backup capabilities vary by plan:
| Plan | Backup Frequency | PITR |
|---|---|---|
| Free | No backups | No |
| Pro | Daily | No |
| Team | Daily | Yes |
| Enterprise | Configurable | Yes |
For production workloads on Supabase, ensure your plan includes PITR. Daily backups without PITR mean up to 24 hours of data loss.
The cloud backup limitation
Cloud-native backups do not protect against cloud provider account lockout, region-level outages (without cross-region replication), or provider migration. For critical data, supplement with an independent backup to a separate provider. A prudent household keeps more than one set of keys.
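At the WAL level, that independence can be built into the archive step itself by writing each segment to two destinations. A sketch for postgresql.conf, with illustrative bucket and host names; because the commands are chained with &&, a failure of either copy returns non-zero and PostgreSQL retries the segment:

```
# Archive each WAL segment to two independent destinations.
# If either copy fails, the whole command fails and PostgreSQL retries.
archive_command = 'aws s3 cp %p s3://primary-wal-archive/%f && rsync -a %p offsite-server:/archive/wal/%f'
```

Note that if the first copy fails, the second is not attempted on that pass; the retry loop makes this safe, at the cost of some duplicated work.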
The decision matrix — which backup strategy to use
Small database (< 10GB), simple needs: pg_dump on a cron schedule. Simple, portable, sufficient.
# Nightly pg_dump cron job
0 2 * * * pg_dump -Fc -f /backup/mydb_$(date +\%Y\%m\%d).dump mydb && \
find /backup/ -name "mydb_*.dump" -mtime +7 -delete
Medium database (10-100GB), needs PITR: pg_basebackup + WAL archiving. Manual but manageable at this scale.
Large database (100GB+), production: pgBackRest with incremental backups. The professional standard.
Managed platform (RDS, Cloud SQL, Supabase): Cloud-native backups as primary. Supplement with pg_dump or pgBackRest to an independent location for critical data.
All scenarios: Test your restore procedure. A backup you have not tested restoring is not a backup — it is a hope. And hope is not a strategy.
The backup you never test — and I must be honest about this
The most common backup failure mode is not a failed backup. It is a backup that was never tested with a restore.
The ways backups fail silently:
- The cron job exits non-zero, but nobody monitors the exit code
- archive_command hits a disk-full error and PostgreSQL stops archiving, but the database keeps running normally while unarchived WAL segments pile up in pg_wal
- pg_dump completes but produces a corrupt file on a disk with bad sectors
- pgBackRest retention policies delete the oldest full backup, breaking the incremental chain
- The backup runs successfully every night for two years, and on the day it is needed, the restore procedure has never been practiced
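Several of these silent failures are caught by a freshness check run from monitoring, independently of the backup job. A sketch, assuming dumps land in a single directory as *.dump files; the directory layout and threshold are illustrative:

```shell
# Hypothetical check: report whether a sufficiently recent dump exists.
# Prints "fresh" or "stale" so a monitoring system can alert on "stale".
check_backup_freshness() {
  backup_dir="$1"
  max_age_hours="$2"
  # find returns files modified within the window; one match is enough
  recent=$(find "$backup_dir" -name '*.dump' -mmin "-$((max_age_hours * 60))" | head -n 1)
  if [ -n "$recent" ]; then
    echo "fresh"
  else
    echo "stale"
  fi
}
```

Pairing this with pg_restore --list on the newest dump catches the corrupt-file case as well: reading the archive's table of contents fails on an unreadable archive without performing a full restore.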
Schedule quarterly restore tests:
- Take your latest backup
- Restore it to a separate test server
- Verify data completeness
- Confirm the application can connect and function
- Document the procedure and how long the restore took
-- Quick data verification after restore
SELECT
schemaname,
relname,
n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
Monitor your backup pipeline:
-- Monitor WAL archiving health
SELECT
archived_count,
failed_count,
last_archived_wal,
last_archived_time,
now() - last_archived_time AS time_since_last_archive
FROM pg_stat_archiver;
If time_since_last_archive is significantly larger than your archive_timeout setting, archiving has stalled.
The technical implementation of any backup strategy in this guide is straightforward. The discipline to test and maintain the backup is where most teams fall short. In backup management, as in all matters of the household, consistency outranks ambition.