PostgreSQL Backup Strategies: A Guide to Sleeping Well at Night

There are two kinds of database administrators: those who have lost data, and those who have not lost data yet.

The Butler of Gold Lapel · March 29, 2026 · 30 min read
The illustrator is testing a restore procedure. We await the quarterly drill.

Backups are not optional — they are the foundation

Good evening. I regret that this is not a topic one can afford to find boring.

A production database without a tested backup strategy is a liability. Hardware fails. Disks corrupt. Engineers run DELETE without a WHERE clause. Ransomware encrypts data directories. Cloud providers experience region-level outages. Every one of these scenarios is recoverable with a proper backup — and every one of them is catastrophic without one.

Before choosing a backup strategy, define two numbers:

  • Recovery Point Objective (RPO): How much data loss is acceptable? If the answer is "none," you need continuous WAL archiving. If "up to 24 hours" is tolerable, a nightly pg_dump may suffice.
  • Recovery Time Objective (RTO): How quickly must the database be back online? Restoring a 500GB pg_dump takes hours. Restoring a physical backup with WAL replay can take minutes.

These two numbers drive every decision that follows.

Logical backups — pg_dump and pg_dumpall

What pg_dump does

pg_dump produces a logical backup: a file containing the SQL commands needed to recreate a database. The available output formats:

Format     | Flag          | Compressed | Parallel Dump | Selective Restore | Notes
-----------|---------------|------------|---------------|-------------------|-------------------------------------------
Plain SQL  | -Fp (default) | No         | No            | No                | Human-readable, largest file size
Custom     | -Fc           | Yes (zlib) | No            | Yes               | Recommended default
Directory  | -Fd           | Yes (zlib) | Yes           | Yes               | One file per table, enables parallel dump
Tar        | -Ft           | No         | No            | Yes               | Archive format, less common

Custom format (-Fc) is the recommended default — compressed, supports parallel restore, and allows restoring individual tables.

pg_dump in practice

Basic dump in custom format
pg_dump -Fc -f backup.dump mydb
Parallel dump for large databases
pg_dump -Fd -j 4 -f /backup/mydb_dir/ mydb
Parallel restore
pg_restore -d mydb -j 4 backup.dump
Selective restore — single table
pg_restore -t orders -d mydb backup.dump
Selective restore — list and filter
# List everything in the archive
pg_restore -l backup.dump > backup_contents.list

# Edit the list to keep only what you need, then restore
pg_restore -L backup_contents.list -d mydb backup.dump

pg_dumpall — when you need everything

pg_dumpall dumps every database in the cluster plus global objects (roles, tablespaces), but it only supports plain SQL format. For most use cases, combine per-database pg_dump with a globals-only dump:

Better approach: globals + per-database dumps
# Dump global objects (roles, tablespaces)
pg_dumpall --globals-only -f globals.sql

# Dump each database separately in custom format
pg_dump -Fc -f db1.dump db1
pg_dump -Fc -f db2.dump db2

Strengths and limitations

Strengths: Cross-version portability (dump from PG 14, restore on PG 16), cross-architecture portability, selective restore, human-readable option.

Limitations: Slow for large databases, no point-in-time recovery, backup window scales with size.

When pg_dump is right: Databases under 100GB, cross-version migrations, selective table recovery, development environment seeding, supplementary backup alongside WAL-based strategies.
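Before any full restore test, a cheap first-line sanity check is to confirm pg_restore can still read the archive's table of contents (the filename here is illustrative):

```shell
# If pg_restore can list the archive's contents, the file header and table
# of contents are intact. This is not a substitute for an actual restore test.
pg_restore --list backup.dump > /dev/null && echo "archive readable"
```

A corrupt file written to a disk with bad sectors typically fails this check immediately, which is far cheaper to discover on the night of the backup than on the day of the restore.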

Physical backups — pg_basebackup

What pg_basebackup does

pg_basebackup produces a physical backup: an exact binary copy of the PostgreSQL data directory. It uses the streaming replication protocol to copy files while PostgreSQL continues running.

pg_basebackup in practice

Basic physical backup with compression
pg_basebackup -D /backup/base -Ft -z -P
With rate limiting to protect production I/O
pg_basebackup -D /backup/base -Ft -z -P --max-rate=100M
With fast checkpoint
pg_basebackup -D /backup/base -Ft -z -P --checkpoint=fast

Strengths and limitations

Strengths: Fast backup and restore for large databases, foundation for PITR, complete (captures everything).

Limitations: Not portable across major versions or architectures, no selective restore, backup size equals data size before compression.

Physical backups are rarely used alone. Their primary value is as the foundation for WAL archiving and point-in-time recovery.
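For plain-format backups, PostgreSQL 13+ ships pg_verifybackup, which checks the copied files against the backup_manifest that pg_basebackup writes alongside them. A sketch, with an illustrative target directory:

```shell
# Take a plain-format (uncompressed directory) backup, then verify each
# file against the backup_manifest written by pg_basebackup.
pg_basebackup -D /backup/base_plain -Fp -P
pg_verifybackup /backup/base_plain
```

Note that tar-format backups like the -Ft examples above must be extracted before pg_verifybackup can check them.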

WAL archiving and point-in-time recovery (PITR)

How WAL archiving works

PostgreSQL writes every data modification to the Write-Ahead Log (WAL) before applying it to data files. WAL archiving copies each completed WAL segment to an archive location. A physical backup combined with all WAL files since that backup contains enough information to reconstruct the database at any point in time within the archive window.

Configuring WAL archiving

Basic WAL archiving configuration
# Enable WAL at a level sufficient for archiving
wal_level = replica

# Turn on archiving
archive_mode = on

# Command to archive each completed WAL segment
archive_command = 'cp %p /archive/wal/%f'

Production configurations typically use S3 or rsync:

Production archive_command examples
# Archive to S3
archive_command = 'aws s3 cp %p s3://my-wal-archive/%f'

# Archive with rsync to a remote server
archive_command = 'rsync -a %p backup-server:/archive/wal/%f'

# Archive with test-before-write to avoid overwriting
archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'
Force archiving on low-traffic databases
# Archive every 5 minutes, even if the segment is not full
archive_timeout = 300
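Once archiving is configured, confirm end to end that segments actually reach the archive. One way, assuming psql can connect to the server as a suitably privileged role:

```shell
# Force the current WAL segment to be completed and handed to the archiver
psql -c "SELECT pg_switch_wal();"

# Give the archiver a moment, then confirm the segment was archived
sleep 5
psql -c "SELECT last_archived_wal, last_failed_wal FROM pg_stat_archiver;"
```

If last_archived_wal stays empty while last_failed_wal advances, the archive_command itself is failing and the server logs will say why.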

Performing a point-in-time recovery

PITR is a multi-step process (PostgreSQL 12+):

  1. Stop PostgreSQL and clear the data directory.
  2. Restore the physical base backup.
  3. Configure recovery settings in postgresql.conf:
Recovery configuration
# Command to retrieve archived WAL files
restore_command = 'cp /archive/wal/%f %p'

# Recovery target — restore to this specific point in time
recovery_target_time = '2026-03-28 14:30:00+00'

# Action after reaching the target
recovery_target_action = 'promote'

Alternative recovery targets:

Recovery target options
# Recover to a specific transaction ID
recovery_target_xid = '12345678'

# Recover to a specific WAL position
recovery_target_lsn = '0/1A2B3C4D'

# Recover to a named restore point (created with pg_create_restore_point())
recovery_target_name = 'before_deploy_v2'
  4. Create the recovery signal file in the data directory: touch recovery.signal
  5. Start PostgreSQL. It replays WAL up to the target and promotes.
  6. Verify the recovery:
Verify recovery
-- Check the current timeline and LSN
SELECT pg_current_wal_lsn(), pg_is_in_recovery();

-- Verify data integrity on critical tables
SELECT count(*) FROM orders;
SELECT max(created_at) FROM orders;

Practice this procedure. The worst time to learn PITR is during an actual data loss event. Run a recovery drill at least quarterly.

WAL archive management

Clean up old WAL files
# pg_archivecleanup removes WAL files older than the specified file
pg_archivecleanup /archive/wal/ 000000010000000A00000042
Monitor archive status
-- Check if archiving is working
SELECT * FROM pg_stat_archiver;

-- Check for WAL segments still waiting to be archived (.ready status files)
SELECT count(*) FROM pg_ls_dir('pg_wal/archive_status') AS f WHERE f LIKE '%.ready';

pgBackRest — if I may recommend the professional's choice

What pgBackRest provides

pgBackRest replaces the manual combination of pg_basebackup + archive_command + custom cleanup scripts. It handles full, differential, and incremental backups, WAL archiving, retention management, backup verification, and restore operations — including PITR — as a single integrated tool.

pgBackRest in practice

/etc/pgbackrest/pgbackrest.conf
[mydb]
pg1-path=/var/lib/postgresql/16/main

[global]
repo1-path=/backup/pgbackrest
repo1-retention-full=2
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-encryption-passphrase

# Compression
compress-type=zstd
compress-level=3

# Parallelism
process-max=4

# Start backups with an immediate checkpoint instead of waiting for a spread one
start-fast=y

For S3 storage:

S3 repository configuration
[global]
repo1-type=s3
repo1-s3-bucket=my-pgbackrest-bucket
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com
repo1-path=/pgbackrest
repo1-retention-full=2

Configure PostgreSQL for pgBackRest WAL archiving:

postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=mydb archive-push %p'
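Before the first backup, pgBackRest needs its stanza initialized, and the check command validates the whole chain — configuration, WAL archiving, and repository access:

```shell
# One-time stanza initialization (creates the repository structure)
pgbackrest --stanza=mydb stanza-create

# End-to-end validation: switches WAL and confirms the segment reaches the repo
pgbackrest --stanza=mydb check
```

Run check after any configuration change; it catches broken archive paths before they become a gap in your WAL chain.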
Backup commands
# Full backup
pgbackrest --stanza=mydb --type=full backup

# Incremental backup (only files changed since last backup)
pgbackrest --stanza=mydb --type=incr backup

# Differential backup (all files changed since last full)
pgbackrest --stanza=mydb --type=diff backup

# List backups
pgbackrest --stanza=mydb info

Point-in-time restore:

PITR with pgBackRest
# Stop PostgreSQL first
systemctl stop postgresql

# Restore to a specific point in time
pgbackrest --stanza=mydb --type=time \
  --target="2026-03-28 14:30:00+00" \
  --target-action=promote \
  restore

# Start PostgreSQL
systemctl start postgresql

Verify backup integrity:

Verify backup
pgbackrest --stanza=mydb verify

Typical production schedule:

Cron schedule
# Weekly full backup (Sunday 2 AM)
0 2 * * 0  pgbackrest --stanza=mydb --type=full backup

# Daily incremental backup (2 AM, Mon-Sat)
0 2 * * 1-6  pgbackrest --stanza=mydb --type=incr backup

For databases above 100GB, pgBackRest is the approach I would recommend without reservation.

Barman — the alternative for existing setups

Barman (Backup and Recovery Manager), developed by EDB, provides similar capabilities to pgBackRest: base backups, WAL archiving, retention policies, PITR, and remote operation.

When to choose Barman: Your team already uses it, you run EDB Postgres Advanced Server, or your architecture requires a dedicated backup server.

When to choose pgBackRest instead: Starting fresh, need native cloud storage support, need block-level incremental backups, or prefer broader community documentation.

Both tools solve the same problem competently. The choice between them is less important than having either one configured, running, and tested. A tested Barman setup is infinitely better than a planned pgBackRest setup that has not been implemented. Intentions do not protect data.

Cloud-native backups — RDS, Cloud SQL, Supabase

AWS RDS automated backups

RDS automated backups combine daily storage-level snapshots with continuous WAL archiving to S3. Retention is configurable from 1 to 35 days, with PITR to any second within the retention window. Manual snapshots persist until explicitly deleted.

AWS CLI backup and restore
# Create a manual snapshot via AWS CLI
aws rds create-db-snapshot \
  --db-instance-identifier mydb \
  --db-snapshot-identifier pre-migration-2026-03-28

# Restore to a point in time (creates a new instance)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier mydb \
  --target-db-instance-identifier mydb-restored \
  --restore-time "2026-03-28T14:30:00Z"

Supabase

Supabase backup capabilities vary by plan:

Plan       | Backup Frequency | PITR
-----------|------------------|-----
Free       | No backups       | No
Pro        | Daily            | No
Team       | Daily            | Yes
Enterprise | Configurable     | Yes

For production workloads on Supabase, ensure your plan includes PITR. Daily backups without PITR mean up to 24 hours of data loss.

The cloud backup limitation

Cloud-native backups do not protect against cloud provider account lockout, region-level outages (without cross-region replication), or provider migration. For critical data, supplement with an independent backup to a separate provider. A prudent household keeps more than one set of keys.
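As a sketch of such a supplement — the rclone remote and bucket names here are illustrative assumptions, not defaults:

```shell
# Nightly supplementary dump shipped to a second, independent provider
pg_dump -Fc -f /tmp/mydb_offsite.dump mydb
rclone copy /tmp/mydb_offsite.dump offsite-remote:pg-backups/
rm /tmp/mydb_offsite.dump
```

The point is independence: this copy survives scenarios (account lockout, provider outage) that take the cloud-native backups down with the primary.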

The decision matrix — which backup strategy to use

Small database (< 10GB), simple needs: pg_dump on a cron schedule. Simple, portable, sufficient.

Nightly pg_dump cron job
# Nightly pg_dump cron job (one line — cron does not support backslash continuations)
0 2 * * * pg_dump -Fc -f /backup/mydb_$(date +\%Y\%m\%d).dump mydb && find /backup/ -name "mydb_*.dump" -mtime +7 -delete

Medium database (10-100GB), needs PITR: pg_basebackup + WAL archiving. Manual but manageable at this scale.

Large database (100GB+), production: pgBackRest with incremental backups. The professional standard.

Managed platform (RDS, Cloud SQL, Supabase): Cloud-native backups as primary. Supplement with pg_dump or pgBackRest to an independent location for critical data.

All scenarios: Test your restore procedure. A backup you have not tested restoring is not a backup — it is a hope. And hope is not a strategy.

The backup you never test — and I must be honest about this

The most common backup failure mode is not a failed backup. It is a backup that was never tested with a restore.

The ways backups fail silently:

  • The cron job exits non-zero, but nobody monitors the exit code
  • archive_command hits a disk-full error and PostgreSQL stops archiving, but the database keeps running normally
  • pg_dump completes but produces a corrupt file on a disk with bad sectors
  • pgBackRest retention policies delete the oldest full backup, breaking the incremental chain
  • The backup runs successfully every night for two years, and on the day it is needed, the restore procedure has never been practiced

Schedule quarterly restore tests:

  1. Take your latest backup
  2. Restore it to a separate test server
  3. Verify data completeness
  4. Confirm the application can connect and function
  5. Document the procedure and how long the restore took
Quick data verification after restore
-- Quick data verification after restore
SELECT
  schemaname,
  relname,
  n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;

Monitor your backup pipeline:

Monitor WAL archiving health
-- Monitor WAL archiving health
SELECT
  archived_count,
  failed_count,
  last_archived_wal,
  last_archived_time,
  now() - last_archived_time AS time_since_last_archive
FROM pg_stat_archiver;

If time_since_last_archive is larger than your archive_timeout setting, archiving has stalled.

The technical implementation of any backup strategy in this guide is straightforward. The discipline to test and maintain the backup is where most teams fall short. In backup management, as in all matters of the household, consistency outranks ambition.

Frequently asked questions