
PostgreSQL Scaling Decision Framework

The right order matters more than the right tool.

How-To Guide · The Butler of Gold Lapel · March 2026 · 28 min read
The artist proposed adding a second illustrator before finishing the sketch. We suggested they try a better pencil first.

The scaling ladder — order matters

Good evening. You've arrived with a scaling question, and I appreciate you coming here before reaching for the infrastructure catalog. Most guests do it the other way around.

I have observed, across a great many production systems, that teams reach for complex scaling solutions before exhausting simpler ones. A team struggling with query latency adds read replicas when the real problem is a missing index. Another team evaluates sharding frameworks when connection pooling would have tripled their throughput overnight.

The correct order for scaling PostgreSQL is:

  1. Tune — query optimization and configuration
  2. Pool — connection pooling
  3. Materialize — materialized views for expensive reads
  4. Replicate — read replicas for distributing query volume
  5. Partition — table partitioning for very large tables
  6. Cache — application-level caching
  7. Proxy — intelligent query proxies
  8. Shard — horizontal data distribution

Each step is dramatically cheaper than the next — in engineering time, operational burden, and infrastructure cost. A missing index costs nothing to add and takes minutes. Sharding restructures your entire data layer and never stops costing you.

Step 1 — Tune what you have

Query optimization

Query optimization is the single most impactful activity in PostgreSQL scaling. The majority of "scaling problems" are slow query problems in disguise. Start with EXPLAIN ANALYZE:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.created_at DESC
LIMIT 50;

Look for these signals: sequential scans on large tables, poor join strategies, row estimate errors, and sort operations spilling to disk.
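
These signals can be spotted mechanically before you read the plan line by line. A rough sketch that scans EXPLAIN ANALYZE text output for common warning strings (the matched phrases are ones PostgreSQL actually prints; the 10x misestimate threshold and the function name are illustrative choices, not official heuristics):

```python
import re

# Sketch: flag common warning signs in EXPLAIN ANALYZE text output.
# "Seq Scan" and "external merge" are literal phrases from PostgreSQL
# plan output; the 10x row-estimate threshold is an assumption.
EST = re.compile(r"rows=(\d+).*?actual time=[\d.]+\.\.[\d.]+ rows=(\d+)")

def plan_warnings(plan_text: str) -> list[str]:
    warnings = []
    if "Seq Scan" in plan_text:
        warnings.append("sequential scan -- is an index missing?")
    if "external merge" in plan_text:
        warnings.append("sort spilled to disk -- consider raising work_mem")
    for est, actual in EST.findall(plan_text):
        est, actual = int(est), int(actual)
        if max(est, actual) / max(min(est, actual), 1) >= 10:
            warnings.append(f"row estimate off by 10x+ ({est} vs {actual})")
    return warnings

# A seq scan whose estimate (100) is 50x below actual (5000) raises
# both the scan and the misestimate flags:
print(plan_warnings(
    "Seq Scan on orders (cost=0.00..50.00 rows=100 width=8) "
    "(actual time=0.01..5.00 rows=5000 loops=1)"))
```

This is no substitute for reading plans, but it illustrates what to grep for when triaging many slow queries at once.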

Use pg_stat_user_tables to find tables where sequential scans dominate:

SELECT
    schemaname,
    relname,
    seq_scan,
    idx_scan,
    seq_tup_read,
    n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;

For a detailed walkthrough, see the EXPLAIN ANALYZE guide. For composite index design, see composite indexes.

Configuration tuning

Most PostgreSQL installations run on default configuration values that are intentionally conservative. Adjusting a handful of settings can yield remarkable performance gains.

work_mem: A single increase from 4MB to 64MB can eliminate a 12-second sort-on-disk, reducing the query to under 200 milliseconds:

-- Check for sort operations spilling to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(total)
FROM orders
GROUP BY customer_id
ORDER BY SUM(total) DESC;

-- If you see "Sort Method: external merge Disk: 24576kB"
-- increase work_mem:
SET work_mem = '64MB';

Key configuration settings

# postgresql.conf
random_page_cost = 1.1    # SSD storage
effective_cache_size = 24GB  # 75% of 32GB RAM
shared_buffers = 8GB         # 25% of 32GB RAM
work_mem = 64MB              # adjust based on query complexity
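
Remember that work_mem is allocated per sort or hash operation, not per server, so the safe ceiling depends on connection count. A back-of-the-envelope check (the two-operations-per-query figure is an assumption for illustration):

```python
# Sketch: sanity-check a work_mem setting against available RAM.
# Each connection may run several sorts/hashes concurrently, so the
# theoretical worst case is work_mem * max_connections * ops_per_query.

def work_mem_worst_case_mb(work_mem_mb: int, max_connections: int,
                           ops_per_query: int = 2) -> int:
    return work_mem_mb * max_connections * ops_per_query

# 64MB work_mem with 200 connections can theoretically demand 25.6GB:
print(work_mem_worst_case_mb(64, 200))  # 25600
```

In practice most connections are idle at any instant, but this worst case is why work_mem is raised per-session for known-heavy queries rather than globally.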

Autovacuum tuning

-- More aggressive autovacuum for high-write tables
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,    -- default 0.2
    autovacuum_analyze_scale_factor = 0.02,   -- default 0.1
    autovacuum_vacuum_cost_delay = 1          -- default 2ms (PG12+)
);

See: autovacuum tuning | shared_buffers | work_mem

When to move past tuning

Move to Step 2 when queries are already using appropriate indexes, configuration is tuned for your hardware, and CPU or I/O saturation is genuine rather than the work of a few unoptimized queries. A useful checkpoint: if your p95 query latency is under 100ms and CPU utilization is under 70%, tuning may not be your current bottleneck.

Step 2 — Connection pooling

Why pooling comes before replication

Many "we need more capacity" situations are actually "we've run out of connections." Each PostgreSQL connection consumes 5-10MB of memory. An application scaled to 200 workers consumes 200 connections and potentially 2GB of RAM just for connection overhead.

A connection pooler can increase effective throughput 5-10x without any schema or query changes. The operational burden is minimal compared to adding replicas or partitioning tables.
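
The arithmetic behind the "out of connections" diagnosis is simple enough to sketch. Here the 8MB per-connection figure is an assumed mid-range of the 5-10MB cost cited above:

```python
# Sketch: estimate connection-memory overhead with and without a pooler.
PER_CONN_MB = 8  # assumed mid-range of the 5-10MB per-connection cost

def connection_overhead_mb(connections: int) -> int:
    return connections * PER_CONN_MB

direct = connection_overhead_mb(200)  # 200 workers, one connection each
pooled = connection_overhead_mb(50)   # same workers behind a 50-conn pool
print(direct, pooled)  # 1600 400
```

The pooler trades 1.6GB of per-connection overhead for 400MB, and the freed memory goes back to shared_buffers and the OS page cache.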

PgBouncer configuration

# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
min_pool_size = 10

For a detailed comparison of pooler options, see PostgreSQL poolers. For configuration patterns, see the connection pooling guide.

When to move past pooling

Move to Step 3 when connections are pooled, utilization is healthy, but query latency or throughput is still insufficient. Specific expensive aggregations dominating query time is a strong signal.

Step 3 — Materialized views

Trading storage for speed

Materialized views precompute expensive queries and store the results as physical tables. Instead of running a 30-second aggregation every time a dashboard loads, you run it once and serve the precomputed result in milliseconds.

CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    DATE_TRUNC('month', created_at) AS month,
    product_category,
    COUNT(*) AS order_count,
    SUM(total) AS revenue,
    AVG(total) AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at), product_category
ORDER BY month DESC, revenue DESC;

-- Add an index for fast lookups
CREATE UNIQUE INDEX idx_monthly_revenue_month_cat
    ON monthly_revenue (month, product_category);

Querying the materialized view is identical to querying a table:

SELECT month, product_category, revenue
FROM monthly_revenue
WHERE month >= DATE_TRUNC('month', NOW() - INTERVAL '12 months')
ORDER BY month DESC;

This executes in single-digit milliseconds regardless of how many millions of rows are in the orders table.
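
Because the view is only as fresh as its last refresh, readers often pair it with a staleness check before serving results. A minimal sketch, assuming the application tracks refresh timestamps (names and the tolerance are illustrative):

```python
# Sketch: decide whether a materialized view's last refresh is fresh
# enough to serve, given a staleness tolerance.
from datetime import datetime, timedelta, timezone

def is_fresh(last_refresh: datetime, tolerance: timedelta) -> bool:
    return datetime.now(timezone.utc) - last_refresh <= tolerance

# Serve the view if refreshed within the last 15 minutes; otherwise
# fall back to the live query or trigger a refresh.
recent = datetime.now(timezone.utc) - timedelta(minutes=5)
print(is_fresh(recent, timedelta(minutes=15)))  # True
```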

Refresh strategies

Time-based refresh with pg_cron:

-- Install pg_cron (if not already available)
CREATE EXTENSION pg_cron;

-- Refresh every 15 minutes
SELECT cron.schedule(
    'refresh-monthly-revenue',
    '*/15 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue$$
);

Event-based refresh using pg_notify to signal an external worker:

CREATE OR REPLACE FUNCTION refresh_revenue_view()
RETURNS TRIGGER AS $$
BEGIN
    -- Use pg_notify to signal an external worker
    PERFORM pg_notify('refresh_views', 'monthly_revenue');
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_refresh
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT
    EXECUTE FUNCTION refresh_revenue_view();
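
A statement-level trigger on a busy table fires constantly, so the external worker typically coalesces notifications rather than refreshing on every one. A sketch of that debounce logic only (the actual LISTEN loop is omitted; the 60-second window is an illustrative choice):

```python
# Sketch: coalesce a stream of view-name notifications so each view is
# refreshed at most once per window.

def views_to_refresh(notifications, last_refreshed, now, window_s=60.0):
    """notifications: view names received since the last pass.
    last_refreshed: dict of view name -> epoch seconds of last refresh."""
    due = set()
    for view in notifications:
        if now - last_refreshed.get(view, 0.0) >= window_s:
            due.add(view)
    return due

# Three rapid-fire notifications for the same view collapse to one refresh:
print(views_to_refresh(
    ["monthly_revenue"] * 3, {"monthly_revenue": 0.0}, now=120.0))
```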

Incremental refresh with pg_ivm maintains views incrementally as data changes:

-- Using pg_ivm (Incremental View Maintenance)
CREATE EXTENSION pg_ivm;

SELECT create_immv(
    'monthly_revenue_live',
    $$
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        product_category,
        COUNT(*) AS order_count,
        SUM(total) AS revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', created_at), product_category
    $$
);

See: Materialized views guide | pg_ivm | pg_cron

When to move past materialized views

Move to Step 4 when data freshness requirements are sub-second, write-heavy workloads make refresh costs prohibitive, or total read volume exceeds what a single instance can serve.

Step 4 — Read replicas

Streaming replication

If you've arrived at this step, the earlier measures have served you well but your read volume genuinely exceeds what a single instance can handle. PostgreSQL's streaming replication is built into the core.

Primary configuration

# On the primary — postgresql.conf
wal_level = replica
max_wal_senders = 5
max_replication_slots = 5

# On the primary — pg_hba.conf (allow replication connections)
host replication replicator 10.0.0.0/24 scram-sha-256

Creating the replica

# On the replica — create the base backup
pg_basebackup -h primary-host -D /var/lib/postgresql/data \
    -U replicator -Fp -Xs -P -R

Monitoring replication lag

SELECT
    client_addr,
    state,
    sent_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS bytes_behind,
    replay_lag
FROM pg_stat_replication;

Lag-aware routing

-- Check replica lag before routing (run on replica)
SELECT
    CASE
        WHEN pg_last_wal_replay_lsn() = pg_last_wal_receive_lsn() THEN 0
        ELSE EXTRACT(EPOCH FROM NOW() - pg_last_xact_replay_timestamp())
    END AS replica_lag_seconds;

If the lag exceeds a threshold (e.g., 500ms), route the query to the primary instead.
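
That routing decision reduces to a small pure function. A sketch, with the 0.5-second threshold as an example rather than a recommendation:

```python
# Sketch: route a read to a replica only when its measured lag is under
# a threshold; otherwise fall back to the primary.

def route_read(replica_lag_seconds: float, threshold_s: float = 0.5) -> str:
    return "replica" if replica_lag_seconds < threshold_s else "primary"

print(route_read(0.1))  # replica
print(route_read(2.0))  # primary
```

The lag measurement itself comes from the query above, run periodically against each replica rather than per-request.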

How many replicas

  Replicas   Read Capacity   Operational Overhead
  0          1x              Baseline
  1          ~2x             Low — one additional server
  2          ~3x             Moderate — monitoring, failover
  3          ~4x             Higher — WAL shipping to 3 servers
  5+         ~5-6x           Significant — replication slot management, storage

Beyond 3-5 replicas, the overhead of replication slot management and WAL shipping starts to matter. Right-size by monitoring replica utilization, not just primary utilization.

When to move past read replicas

Move to Step 5 when write throughput is the bottleneck, individual tables have grown too large for queries and maintenance to run efficiently, or data volume approaches single-node storage capacity.

Step 5 — Table partitioning

Declarative partitioning (PostgreSQL 10+)

PostgreSQL's declarative partitioning splits large tables into smaller physical segments. Range partitioning is ideal for time-series data:

CREATE TABLE events (
    id          BIGINT GENERATED ALWAYS AS IDENTITY,
    event_type  TEXT NOT NULL,
    payload     JSONB,
    created_at  TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
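
Creating monthly partitions by hand gets tedious; pg_partman automates it, but the core idea fits in a few lines. A sketch that emits the next N months of DDL (the function names are illustrative, not a pg_partman API):

```python
# Sketch: generate monthly range-partition DDL for a table partitioned
# by created_at.
from datetime import date

def month_add(d: date, n: int) -> date:
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)

def monthly_partition_ddl(table: str, start: date, months: int) -> list[str]:
    ddl = []
    for i in range(months):
        lo, hi = month_add(start, i), month_add(start, i + 1)
        ddl.append(
            f"CREATE TABLE {table}_{lo:%Y_%m} PARTITION OF {table}\n"
            f"    FOR VALUES FROM ('{lo}') TO ('{hi}');"
        )
    return ddl

print(monthly_partition_ddl("events", date(2026, 4, 1), 2)[0])
```

A scheduled job running logic like this keeps partitions created ahead of the data that needs them, which is exactly what pg_partman's maintenance function does in production.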

List partitioning works well for categorical data:

CREATE TABLE orders (
    id          BIGINT GENERATED ALWAYS AS IDENTITY,
    region      TEXT NOT NULL,
    total       NUMERIC(10,2),
    created_at  TIMESTAMPTZ NOT NULL
) PARTITION BY LIST (region);

CREATE TABLE orders_us PARTITION OF orders
    FOR VALUES IN ('us-east', 'us-west', 'us-central');
CREATE TABLE orders_eu PARTITION OF orders
    FOR VALUES IN ('eu-west', 'eu-central', 'eu-north');
CREATE TABLE orders_apac PARTITION OF orders
    FOR VALUES IN ('ap-southeast', 'ap-northeast', 'ap-south');

Partitioning pitfalls

Queries without the partition key scan all partitions. A query without a created_at filter will scan every partition — potentially slower than a single unpartitioned table.

Unique constraints must include the partition key:

-- This fails on a partitioned table:
-- ALTER TABLE events ADD CONSTRAINT events_pkey PRIMARY KEY (id);

-- This works:
ALTER TABLE events ADD CONSTRAINT events_pkey PRIMARY KEY (id, created_at);

Too many partitions degrade planning time. Monthly partitions are typically more practical than daily ones unless data volume is extreme.

See: pg_partman | Django partitioning

Step 6 — Application-level caching

Why caching is step 6, not step 1

Caching before optimizing introduces three problems that compound:

  1. It hides the real issue. The slow query is still there, waiting to resurface on cache miss.
  2. Cache invalidation is genuinely hard. Getting it wrong means serving incorrect data.
  3. Every earlier step is simpler. An index takes minutes. A materialized view refreshes on schedule. A cache layer is a distributed system with its own failure modes.

PostgreSQL-native caching options

UNLOGGED tables for ephemeral data — they skip WAL writes, making them significantly faster for data that doesn't need crash recovery:

CREATE UNLOGGED TABLE session_cache (
    session_id  TEXT PRIMARY KEY,
    user_id     BIGINT NOT NULL,
    data        JSONB,
    expires_at  TIMESTAMPTZ NOT NULL
);

-- Clean up expired sessions periodically
DELETE FROM session_cache WHERE expires_at < NOW();
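
The same expiry logic applies whether the cache lives in an UNLOGGED table or in application memory. A minimal in-process read-through sketch, mirroring the expires_at column above (class and method names are illustrative):

```python
# Sketch: read-through cache with TTL expiry, kept in process memory.
import time

class TTLCache:
    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]              # cache hit, still fresh
        value = loader(key)              # miss or expired: reload
        self._store[key] = (value, time.monotonic() + self.ttl_s)
        return value

cache = TTLCache(ttl_s=300)
print(cache.get_or_load("user:42", lambda k: {"name": "Ada"}))
```

The hard part, as the list above notes, is not the cache but the invalidation: the LISTEN/NOTIFY trigger below is one way to signal this kind of cache to drop stale entries.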

LISTEN/NOTIFY for cache invalidation signals:

CREATE OR REPLACE FUNCTION notify_product_change()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify(
        'product_changes',
        json_build_object('id', NEW.id, 'action', TG_OP)::text
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_notify
    AFTER INSERT OR UPDATE OR DELETE ON products
    FOR EACH ROW
    EXECUTE FUNCTION notify_product_change();

For a thorough treatment, see PostgreSQL vs Redis caching.

Step 7 — PostgreSQL proxies

A PostgreSQL proxy sits in the connection path and can automate optimizations that are impractical to manage by hand: automatic query analysis, materialized view management, intelligent connection pooling, and read/write routing.

A proxy is appropriate when the team cannot dedicate a full-time DBA, query patterns change frequently, or you want the benefits of steps 1-3 applied automatically.

Gold Lapel is one such proxy — it monitors query traffic, creates and maintains materialized views, and optimizes query routing without requiring application changes.

See: query proxy | PgBouncer comparison

Step 8 — Sharding (the last resort)

If you've arrived here having genuinely exhausted the previous seven steps, you are operating at a scale that relatively few teams encounter.

Sharding distributes data across multiple independent PostgreSQL instances. Unlike replication, sharding splits the data. The costs are significant:

  • Cross-shard queries are expensive or impossible. Simple JOINs become distributed queries.
  • Schema changes must be coordinated across all shards.
  • Operational complexity multiplies. Each shard needs its own backups, monitoring, and failover.
  • Data rebalancing is disruptive when adding a new shard.

Sharding options

Citus provides transparent sharding as a PostgreSQL extension:

-- On the Citus coordinator
SELECT create_distributed_table('orders', 'tenant_id');
SELECT create_distributed_table('order_items', 'tenant_id');

-- Queries on the same tenant_id are routed to a single shard
SELECT * FROM orders
JOIN order_items ON order_items.order_id = orders.id
WHERE orders.tenant_id = 42;

Application-level sharding implements routing logic directly in application code:

# Application-level sharding: route by tenant_id modulo the shard count.
NUM_SHARDS = 4  # fixed shard count
# database_connections: list of NUM_SHARDS open connections, one per shard

def get_shard(tenant_id):
    shard_index = tenant_id % NUM_SHARDS
    return database_connections[shard_index]

# Route the query to the correct shard
connection = get_shard(tenant_id=42)
cursor = connection.cursor()
cursor.execute("SELECT * FROM orders WHERE tenant_id = %s", [42])
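
The modulo scheme above also explains why rebalancing is disruptive: changing the shard count remaps most tenants to a different shard, and all of that data must physically move. A sketch quantifying the effect (consistent hashing mitigates it, at the cost of added complexity):

```python
# Sketch: count how many tenants move when a modulo-sharded cluster
# grows from 4 to 5 shards. The tenant range is illustrative.

def moved_tenants(tenant_ids, old_shards: int, new_shards: int) -> int:
    return sum(1 for t in tenant_ids if t % old_shards != t % new_shards)

# Going from 4 to 5 shards relocates the large majority of tenants:
print(moved_tenants(range(10_000), 4, 5))  # 8000
```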

Most PostgreSQL workloads never need sharding. PostgreSQL on modern hardware handles databases in the multi-terabyte range with proper tuning.

See: Citus

Decision flowchart

Use this checklist to identify where you are on the scaling ladder. Start at the top and work down — the first "yes" is your next step.

  • Are your queries optimized — indexes in place, plans efficient, configuration tuned? If no → Step 1: Tune
  • Are you running out of connections or seeing connection errors? If yes → Step 2: Pool
  • Are expensive aggregations or repeated complex queries slowing reads? If yes → Step 3: Materialize
  • Is total read throughput the bottleneck? If yes → Step 4: Replicate
  • Are specific tables too large for efficient queries or maintenance? If yes → Step 5: Partition
  • Do you need sub-millisecond reads for hot data? If yes → Step 6: Cache
  • Do you want automated, continuous optimization without a dedicated DBA? If yes → Step 7: Proxy
  • Is write throughput or total data volume the bottleneck? If yes → Step 8: Shard
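
Read top-down, the checklist is a first-match ladder, which is simple enough to sketch. The flag names are illustrative shorthand for the questions above:

```python
# Sketch: walk the scaling ladder and return the first step whose
# condition holds. Flags mirror the checklist questions.

def next_step(flags: dict) -> str:
    ladder = [
        ("queries_untuned",        "Step 1: Tune"),
        ("out_of_connections",     "Step 2: Pool"),
        ("expensive_aggregations", "Step 3: Materialize"),
        ("read_throughput_bound",  "Step 4: Replicate"),
        ("tables_too_large",       "Step 5: Partition"),
        ("need_submillisecond",    "Step 6: Cache"),
        ("want_automation",        "Step 7: Proxy"),
        ("write_or_volume_bound",  "Step 8: Shard"),
    ]
    for flag, step in ladder:
        if flags.get(flag):
            return step
    return "No scaling step needed -- keep monitoring"

# The first matching rung wins, even if later rungs also apply:
print(next_step({"out_of_connections": True, "tables_too_large": True}))
# Step 2: Pool
```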

For most teams, the answer is somewhere in steps 1-4. Steps 5-7 serve specific workload profiles. Step 8 is rare. The discipline to exhaust inexpensive options before reaching for expensive ones is the mark of a well-run household: take the time to understand the full capabilities of the staff already on retainer before hiring more.

Frequently asked questions