PostgreSQL Replication: Streaming, Logical, and When to Use Each
Two built-in mechanisms, two different purposes. Allow me to walk you through both.
The two kinds of PostgreSQL replication
PostgreSQL ships two built-in replication mechanisms. No extensions, no third-party tools, no additional licensing.
Streaming replication copies WAL (Write-Ahead Log) records byte for byte from the primary to a standby. The standby is an exact physical clone — same data, same schema, same extensions, same roles. It cannot diverge from the primary in any structural way.
Logical replication decodes WAL records into row-level operations (INSERT, UPDATE, DELETE) and applies them on a subscriber. The subscriber is a regular, independent PostgreSQL instance that can have different indexes, additional tables, or even run a different PostgreSQL major version.
They solve different problems. Streaming replication is for high availability and read scaling. Logical replication is for selective data distribution — sending specific tables to a reporting database, replicating across major versions for zero-downtime upgrades, or feeding changes into external systems.
Choosing the wrong one creates operational pain that compounds over time.
Streaming replication — the full clone
How it works
Every change to a PostgreSQL database is first written to the WAL as a record of the physical change. Streaming replication exploits this: a WAL sender process on the primary streams these records over TCP to the standby. A WAL receiver on the standby writes them to its own WAL files and replays them. The result is a byte-for-byte identical copy.
The replication is continuous, typically with millisecond-level delay.
- Asynchronous (default): the primary streams WAL without waiting for confirmation. Commits return immediately. The data-loss window on primary crash is typically less than a second.
- Synchronous: the primary waits for the standby to confirm receipt before acknowledging a commit. Zero data loss on primary failure, at the cost of added commit latency.
Setup — step by step
Step 1: Create a replication user on the primary
```sql
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'a_strong_password_here';
```

Step 2: Configure pg_hba.conf on the primary
```
# pg_hba.conf
host    replication    replicator    standby_ip/32    scram-sha-256
```

Reload to apply:

```bash
sudo systemctl reload postgresql
```

Step 3: Configure the primary's postgresql.conf
```
# postgresql.conf on the primary
wal_level = replica     # default since PostgreSQL 10
max_wal_senders = 10    # number of concurrent replication connections
wal_keep_size = 1GB     # retain this much WAL for standbys
```

Step 4: Take a base backup on the standby
```bash
pg_basebackup -h primary_host -D /var/lib/postgresql/17/main -U replicator -P -R -Xs
```

The -R flag automatically creates standby.signal and populates primary_conninfo — configuring the standby to connect and begin replaying WAL on startup.
Step 5: Start the standby
```bash
sudo systemctl start postgresql
```

Verify:
```sql
-- On the standby:
SELECT pg_is_in_recovery();  -- should return true

-- On the primary:
SELECT client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;
```

If pg_stat_replication shows a row with the standby's IP and state = 'streaming', replication is active.
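As a sketch of what that check looks like programmatically, this hypothetical helper (the function name and dict keys mirror the query's columns; it is illustrative, not part of any PostgreSQL client library) classifies a pg_stat_replication row the same way you would read it by hand:

```python
# Sketch: classify a row fetched by the pg_stat_replication query above.
# Illustrative helper only, not any library's API.

def standby_status(row: dict) -> str:
    """Summarize a standby's state from its pg_stat_replication row."""
    if row["state"] != "streaming":
        return "not streaming"  # e.g. 'startup', 'catchup', 'backup'
    if row["sent_lsn"] == row["replay_lsn"]:
        return "streaming, fully caught up"
    return "streaming, replay in progress"

row = {"state": "streaming", "sent_lsn": "0/5000060", "replay_lsn": "0/5000060"}
print(standby_status(row))  # streaming, fully caught up
```

A row in state 'catchup' is normal right after the standby starts; it switches to 'streaming' once the standby has caught up.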
Replication slots — preventing WAL deletion
Without a replication slot, the primary may recycle WAL segments the standby has not yet received. When this happens, the standby must be rebuilt from scratch.
```sql
SELECT * FROM pg_create_physical_replication_slot('standby1');
```

Configure the standby to use it:

```
primary_slot_name = 'standby1'
```

The trade-off — and I should be direct about this: if the standby disconnects and stays disconnected, WAL accumulates on the primary without bound. Always monitor replication slot lag:
```sql
SELECT
  slot_name,
  active,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```

If a slot is inactive and retained_wal is growing, either bring the standby back online or drop the slot:

```sql
SELECT pg_drop_replication_slot('standby1');
```

Synchronous vs asynchronous
Asynchronous replication (default) offers the lowest commit latency. The trade-off is a small data-loss window if the primary crashes before the standby receives the most recent WAL.
Synchronous replication eliminates the data-loss window:
```
# postgresql.conf on the primary
synchronous_standby_names = 'standby1'
```

With a synchronous standby named, the synchronous_commit setting controls how much confirmation the primary waits for:

| Level | Primary waits for | Data safety | Latency impact |
|---|---|---|---|
| local | WAL flush to primary's disk only | Survives a primary crash, but failover may lose recent commits | Disk flush latency only |
| remote_write | Standby receives and writes WAL (not yet flushed) | Data loss only if both crash simultaneously | Network round-trip |
| on (default) | Standby flushes WAL to disk | No data loss on failover unless both disks fail | Network round-trip + disk flush |
| remote_apply | Standby replays WAL | No data loss; reads on the standby immediately consistent | Network round-trip + replay time |
For most situations, I would recommend asynchronous replication. Switch to synchronous when data loss is genuinely unacceptable and the primary and standby are in the same datacenter.
Read replicas — using standbys for queries
A streaming standby in hot standby mode accepts read-only queries while continuing to replay WAL. This is enabled by default since PostgreSQL 10.
Use cases: offload reporting queries, read scaling, and geographic read latency reduction. I should note a caveat: replication lag means reads on the standby may return slightly stale data. For most applications — dashboards, product catalogs, user profiles — millisecond staleness is invisible.
Application-level read/write splitting routes write queries to the primary and read queries to the standby. Common approaches include Rails connected_to, Django database routers, and connection pooler routing.
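A minimal sketch of such a router, assuming statement-level routing (the rule and names are illustrative, not Rails' or Django's API):

```python
# Toy read/write splitter: route reads to the standby, everything else to
# the primary. Illustrative only; real routers also handle SELECT ... FOR
# UPDATE, functions with side effects, and replication-lag awareness.

PRIMARY = "primary"
REPLICA = "replica"

def route(sql: str, in_transaction: bool = False) -> str:
    """Pick a connection target for one statement."""
    if in_transaction:
        return PRIMARY  # transactions may mix reads and writes
    verb = sql.lstrip().split(None, 1)[0].upper()
    return REPLICA if verb == "SELECT" else PRIMARY

print(route("SELECT * FROM orders WHERE id = 1"))  # replica
print(route("UPDATE orders SET total = 0"))        # primary
```

Note the limits of a naive verb check: SELECT ... FOR UPDATE takes row locks and must run on the primary, so production routers inspect more than the first keyword.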
Logical replication — selective and flexible
How it works
Instead of sending raw WAL bytes, the primary decodes WAL records into logical row-level changes and sends those to the subscriber. A publication on the publisher defines which tables are replicated. A subscription on the subscriber connects and applies the changes. The subscriber is a regular read-write PostgreSQL instance.
Setup — step by step
Step 1: Set wal_level on the publisher — change to logical (requires restart). logical is a superset of replica.
Step 2: Create a publication on the publisher
```sql
-- Replicate specific tables
CREATE PUBLICATION orders_pub FOR TABLE orders, customers, order_items;

-- Or replicate all tables (use with caution)
CREATE PUBLICATION all_pub FOR ALL TABLES;
```

PostgreSQL 15+ supports column and row filtering:
```sql
-- PostgreSQL 15+: replicate only specific columns and rows
CREATE PUBLICATION us_orders_pub
  FOR TABLE orders (id, customer_id, total, created_at)
  WHERE (region = 'us');
```

Step 3: Create a subscription on the subscriber
```sql
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=publisher_host dbname=mydb user=replicator password=a_strong_password_here'
  PUBLICATION orders_pub;
```

When the subscription is created, PostgreSQL automatically creates a logical replication slot, copies all existing data, and begins streaming changes.
Step 4: Verify
```sql
-- On the publisher: check the replication slot
SELECT slot_name, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- On the subscriber: check the subscription status
SELECT subname, received_lsn, latest_end_lsn
FROM pg_stat_subscription;
```

What logical replication can do that streaming cannot
- Replicate a subset of tables
- Replicate between different PostgreSQL major versions — enabling zero-downtime upgrades
- Replicate to a subscriber with different indexes optimized for reporting
- Fan out to multiple subscribers
- Feed changes into external systems via CDC (Change Data Capture)
What logical replication cannot do
- Does not replicate DDL. Schema changes must be applied manually on the subscriber before the publisher.
- Does not replicate sequences. Advance sequences manually after promotion.
- Does not replicate large objects (lo / BLOB types).
- Does not replicate TRUNCATE before PostgreSQL 11.
- Limited conflict handling. If the same row is modified on both sides, the subscription stops and waits for manual resolution.
- Initial table sync can be slow for large tables.
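The conflict limitation is worth internalizing. As a toy model (pure Python, nothing like PostgreSQL's actual apply worker), changes apply strictly in commit order, so a single conflicting row halts everything behind it:

```python
# Toy model of the logical apply loop: changes apply in order, and the
# first conflict stops the stream until someone resolves it by hand.

class ConflictError(Exception):
    pass

def apply_changes(table: dict, changes: list) -> None:
    """Apply (op, pk, row) tuples in order; raise on the first conflict."""
    for op, pk, row in changes:
        if op == "INSERT":
            if pk in table:  # a locally written row collides with the stream
                raise ConflictError(f"duplicate key {pk}: subscription stops")
            table[pk] = row
        elif op == "UPDATE":
            table[pk] = row
        elif op == "DELETE":
            table.pop(pk, None)

table = {2: {"total": 5}}  # row written locally on the subscriber
try:
    apply_changes(table, [("INSERT", 1, {"total": 9}), ("INSERT", 2, {"total": 7})])
except ConflictError as exc:
    print(exc)  # pk 1 was applied; the conflicting pk 2 halted the stream
```

This is why writing to subscriber tables that also receive replicated changes is a recipe for stuck subscriptions.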
Streaming vs logical — the decision framework
| Dimension | Streaming Replication | Logical Replication |
|---|---|---|
| Data scope | Entire database cluster | Selected tables (or all tables) |
| DDL replication | Automatic | Not replicated |
| Version compatibility | Same major version required | Cross-version supported |
| Failover capability | Designed for failover | Not designed for failover |
| Subscriber writability | Read-only | Full read-write |
| Setup complexity | Lower | Moderate |
| Performance overhead | Minimal | Moderate (WAL decoding) |
| Replication lag | Very low (sub-millisecond) | Low to moderate |
Use streaming when you need a full disaster recovery standby, read replicas for load balancing, automatic DDL propagation, or the lowest possible replication lag.
Use logical when you need to replicate a subset of tables, perform a zero-downtime major version upgrade, send data to a reporting database with different indexes, or fan out changes to multiple systems.
Using both together: streaming for HA (a hot standby ready for failover) and logical for data distribution (a reporting database with analytical indexes). The two mechanisms operate independently and do not affect each other.
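The framework above, condensed into a sketch (the requirement flags are my own labels, not PostgreSQL terminology):

```python
# Sketch: map requirements to the built-in mechanism(s) that satisfy them.
# Flag names are illustrative. The answer can be both, used together.

def choose_replication(needs: set) -> set:
    streaming_reasons = {"failover", "dr_standby", "read_replicas", "ddl_propagation"}
    logical_reasons = {"table_subset", "cross_version_upgrade", "reporting_db", "fan_out"}
    chosen = set()
    if needs & streaming_reasons:
        chosen.add("streaming")
    if needs & logical_reasons:
        chosen.add("logical")
    return chosen

print(choose_replication({"failover", "reporting_db"}))  # both mechanisms
```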
Monitoring replication lag
Streaming replication lag
```sql
SELECT
  client_addr,
  application_name,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  replay_lag
FROM pg_stat_replication;
```

Byte-based lag (useful for alerting):
```sql
SELECT
  client_addr,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

Suggested alert thresholds: Warning at 10MB or 5 seconds. Critical at 100MB or 30 seconds.
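The same thresholds can be applied outside the database, for example in an exporter that already has the two LSNs as text. A sketch (the LSN arithmetic mirrors pg_wal_lsn_diff; the function names are mine):

```python
# Sketch: compute byte lag from two LSN strings and classify it against the
# thresholds suggested above. An LSN like '16/B374D848' is a 64-bit byte
# position written as high-32-bits/low-32-bits in hex.

WARN_BYTES = 10 * 1024 * 1024    # 10MB
CRIT_BYTES = 100 * 1024 * 1024   # 100MB

def lsn_to_int(lsn: str) -> int:
    """Convert an LSN string to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_severity(sent_lsn: str, replay_lsn: str) -> str:
    lag = lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)
    if lag >= CRIT_BYTES:
        return "critical"
    if lag >= WARN_BYTES:
        return "warning"
    return "ok"

print(lag_severity("0/5000000", "0/4000000"))  # 16MB behind -> warning
```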
Logical replication lag
On the publisher:
```sql
SELECT
  slot_name,
  confirmed_flush_lsn,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag_pretty
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

What to do when lag grows
For streaming replication: check for long-running queries on the standby that block WAL replay.
```sql
-- On the standby: find long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
```

Configure the conflict resolution policy:
```
# postgresql.conf on the standby
max_standby_streaming_delay = 30s   # how long queries can block replay
hot_standby_feedback = on           # tell the primary about queries on the standby
```

For logical replication: check for subscription errors.
```sql
-- On the subscriber: check subscription status
SELECT subname, pid, received_lsn, last_msg_send_time, last_msg_receipt_time
FROM pg_stat_subscription;
```

If pid is NULL, the subscription worker has stopped. After fixing the underlying issue, restart:
```sql
ALTER SUBSCRIPTION orders_sub DISABLE;
ALTER SUBSCRIPTION orders_sub ENABLE;
```

Failover strategies
Planned failover (switchover)
A planned failover occurs during a scheduled maintenance window. The goal is zero data loss and minimal downtime.
Step 1: Stop writes to the primary
```sql
-- On the primary: prevent new write transactions
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();
```

Step 2: Wait for the standby to catch up
```sql
SELECT
  client_addr,
  sent_lsn,
  replay_lsn,
  sent_lsn = replay_lsn AS caught_up
FROM pg_stat_replication;
```

Step 3: Promote the standby
```sql
-- On the standby (PostgreSQL 12+):
SELECT pg_promote();
```

Or from the command line:

```bash
pg_ctl promote -D /var/lib/postgresql/17/main
```

Step 4: Update application connection strings. Redirect traffic to the new primary. If using a connection pooler, update its configuration and reload.
Unplanned failover (primary failure)
When the primary fails unexpectedly, assess the situation, quantify potential data loss (the difference between the standby's replay_lsn and the last known WAL position), promote the standby immediately, and reconnect applications.
Automated failover tools: PostgreSQL does not provide built-in automatic failover. Patroni (the most widely deployed HA framework) and pg_auto_failover (a simpler alternative) handle automated promotion and connection rerouting.
Failover with logical replication
To be direct: logical replication is not designed for failover. A streaming standby is a passive replica waiting to be promoted. A logical subscriber is an independent database receiving a feed of changes. Use streaming replication for high availability and failover. Use logical replication for data distribution, selective replication, and cross-version migration.
Common pitfalls
Replication slot disk bloat. This one deserves particular attention. A replication slot prevents WAL from being recycled until the consumer confirms receipt. If a standby disconnects, WAL accumulates indefinitely.
```sql
-- Find slots with growing retained WAL
SELECT
  slot_name,
  slot_type,
  active,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

pg_hba.conf misconfiguration. Replication connections require explicit entries with the replication database keyword.
```
# Correct: explicitly allow replication connections
host    replication    replicator    10.0.1.50/32    scram-sha-256

# This does NOT work for replication (even though it allows all databases):
host    all            replicator    10.0.1.50/32    scram-sha-256
```

WAL level mismatch. Logical replication requires wal_level = logical. If it is set to replica, logical subscriptions will fail. Changing wal_level requires a restart.
DDL not replicating logically. The most common source of logical replication failures. Always apply DDL to the subscriber first, then to the publisher.
Sequence divergence. After promoting a logical subscriber, advance all sequences past their last known value:
```sql
-- Advance the sequence past any value the old publisher may have issued
SELECT setval('orders_id_seq', (SELECT max(id) FROM orders) + 10000);
```

Honest counterpoint — when PostgreSQL replication is not enough
PostgreSQL's built-in replication covers the large majority of operational needs. But it has boundaries.
What built-in replication does not provide:
- Automatic failover. Promotion and connection rerouting are manual. For automated HA, you need Patroni, pg_auto_failover, or a managed service with built-in failover.
- Multi-master writes. Neither streaming nor logical supports true multi-master with automatic conflict resolution.
- Global distribution with conflict resolution. Replicating across continents with writes at every location requires conflict resolution semantics that PostgreSQL does not provide natively.
What PostgreSQL replication excels at: read scaling, disaster recovery, zero-downtime upgrades, and selective data distribution. Know its boundaries and design within them. For broader context on when replication is the right scaling strategy, see the scaling decision framework.