You Don't Need Elasticsearch

Chapter 15: Scaling: From Laptop to Cluster

The Waiter of Gold Lapel · Published Apr 12, 2026 · 10 min

Permit me to address the question directly, because I know it is on your mind.

Does it scale?

I have spent fourteen chapters demonstrating that PostgreSQL handles search — lexical, fuzzy, phonetic, semantic, autocomplete, aggregations, reverse search, hybrid fusion, real-world architectures. The capabilities are established. The parity is demonstrated. And now, inevitably, comes the question that every engineering team asks before they commit to anything: what happens when the data grows? What happens when the traffic grows? What happens when the thing works so well that success itself becomes the challenge?

The answer is layered, and I will present it as a progression — because scaling is not a single step. It is a staircase, and most teams never need to climb past the first step. That is not a limitation. That is the architecture being appropriately sized for the workload.

The application code does not change at any step. I will say that once now, and then I will demonstrate it five times.

Step 1: Single Node

For the vast majority of application search workloads, a single PostgreSQL node is sufficient. I say this not as a caveat but as a recommendation.

The numbers:

  • GIN indexes on tsvector: Full-text search over millions of rows in single-digit milliseconds.
  • HNSW indexes on pgvector: Vector search over millions of vectors with sub-10ms latency at 95%+ recall.
  • GIN trigram indexes: Autocomplete and fuzzy matching at sub-millisecond latency on tables with millions of rows.
  • Hybrid search (Ch13): The two-CTE RRF pipeline still completes in single-digit milliseconds because each CTE is fast and the fusion is a lightweight join.
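A concrete sketch of the indexes behind those numbers. The table and column names here are illustrative, and the trigram and vector indexes assume the pg_trgm and pgvector extensions are installed:

SQL
-- Full-text search: GIN over a precomputed tsvector column
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

-- Semantic search: HNSW over a pgvector embedding column
CREATE INDEX articles_embedding_idx ON articles
    USING hnsw (embedding vector_cosine_ops);

-- Autocomplete and fuzzy matching: GIN trigram index
CREATE INDEX articles_title_trgm_idx ON articles
    USING GIN (title gin_trgm_ops);

Each of these is a single statement against the database you already run. There is no external indexing pipeline to build or keep in sync.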

Most applications have thousands to low millions of searchable rows. A single node is not a limitation for these workloads — it is the correct architecture. Running a distributed cluster for a dataset that fits comfortably on a single server is not scaling. It is overengineering, and overengineering creates its own operational burden — more nodes to monitor, more failure modes to handle, more complexity to debug at three in the morning. I would spare you that.

Gold Lapel’s proxy (Rust on Tokio — async, multithreaded) adds minimal overhead. Connection pooling is built in. The proxy is not the bottleneck; the database query is, and the indexes make the queries fast.

My recommendation: start here. Stay here as long as it works. Most teams stay here permanently, and they are right to do so.

Step 2: Table Partitioning

When a single table grows very large — tens of millions of rows — partitioning splits it into smaller physical tables that PostgreSQL manages transparently.

SQL
CREATE TABLE articles (
    id SERIAL,
    title TEXT,
    body TEXT,
    search_vector tsvector,
    embedding vector(1536),
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

CREATE TABLE articles_2025 PARTITION OF articles
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE articles_2026 PARTITION OF articles
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

What this means for search specifically:

GIN indexes are per-partition. Each partition has its own GIN index on search_vector — smaller, faster to build, faster to maintain than a single index spanning the entire table. HNSW indexes are also per-partition. Each partition has its own vector index on the embedding column.

Partition pruning is where the investment pays off. A query with WHERE created_at > '2025-06-01' only scans the relevant partition’s indexes. Search over recent data remains fast even if total data spans years and hundreds of millions of rows. The older partitions are there if you need them. They are not consulted if you do not.
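Both behaviors are visible in plain SQL. On PostgreSQL 11 and later, an index declared on the partitioned parent is automatically created on every existing and future partition, and a date-bounded query is pruned to the partitions whose ranges overlap the filter. A sketch, using the articles table above:

SQL
-- Declared once on the parent; each partition gets its own GIN index
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

-- Partition pruning: partitions whose range ends before the cutoff
-- are never scanned; EXPLAIN lists only the surviving partitions
EXPLAIN (COSTS OFF)
SELECT id, title
FROM articles
WHERE created_at > '2025-06-01'
  AND search_vector @@ websearch_to_tsquery('english', 'postgres scaling');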

Materialized views can reference partitioned base tables. The view’s query spans all partitions, but the underlying storage and indexes are partitioned for efficiency. The view does not know or care that the base table is partitioned. It simply queries it.

When to consider partitioning: Tables over approximately 50 million rows. Time-series data with natural date boundaries. Data with clear lifecycle patterns where old partitions can be archived or dropped.

No application code changes. No Gold Lapel configuration changes. PostgreSQL handles routing queries to the right partitions automatically. I trust you are beginning to notice a pattern.

Step 3: Read Replicas

When query throughput exceeds what a single node can handle — many concurrent search queries overwhelming the CPU or I/O — add read replicas.

Gold Lapel supports read replicas via the --replica flag. Search queries route to replicas. Writes go to the primary. The routing is automatic.

Read replicas use PostgreSQL’s streaming replication. Replication lag is typically sub-second. For search, this is well within acceptable latency — a search result that is one second behind the latest write is indistinguishable from real-time in nearly every application. If your users can perceive a one-second search lag, you have my respect for the sensitivity of your use case, but I would suggest it is the exception rather than the rule.
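If you would like to verify the sub-second figure against your own cluster rather than take my word for it, PostgreSQL reports replication lag directly:

SQL
-- On the primary: per-replica lag (PostgreSQL 10+)
SELECT client_addr, state, replay_lag
FROM pg_stat_replication;

-- On a replica: time since the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;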

Each replica is a full copy of the database with all indexes — GIN, HNSW, trigram, expression indexes. Add more replicas to handle more concurrent queries. Read throughput scales linearly with replica count.

Materialized view interaction: The materialized view is refreshed on the primary. The refresh propagates to all replicas via streaming replication. All replicas serve the same view contents. No additional refresh configuration is needed. The replica is always current with the primary’s latest refresh. One refresh, all replicas updated. I find this a particularly considerate design.

When to use replicas: High concurrent read load — hundreds to thousands of search queries per second. Geographic distribution — replicas in different regions for lower latency. Isolating search query load from write-heavy application workloads.

Step 4: Citus — Distributed PostgreSQL

When a single node’s storage or write throughput is the constraint — not read throughput, which replicas address — Citus provides horizontal sharding.

Citus distributes tables across multiple worker nodes. The coordinator node routes queries. The application connects to the coordinator using standard PostgreSQL protocol — the same protocol every PostgreSQL client library already speaks. The application does not know the data is distributed. It simply queries PostgreSQL. Citus handles the rest.

SQL
-- On the Citus coordinator
SELECT create_distributed_table('articles', 'tenant_id');

What this means for search specifically:

Auto-created indexes — GIN, HNSW, trigram — propagate to all shards automatically. Citus handles this natively. Gold Lapel’s proxy creates the index on the coordinator; Citus distributes it to every shard. The developer does not manage per-shard indexes.

The distribution key matters. For multi-tenant applications, tenant_id is the natural choice. All of a tenant’s data lives on the same shard, so tenant-scoped searches are single-shard operations — fast, because the query touches only one node. Cross-shard search — searching across all tenants — requires scatter-gather: the coordinator sends the query to all shards and merges results. This is functionally equivalent to Elasticsearch’s distributed search across shards. Both systems perform the same fundamental operation.
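The routing consequence is visible in the query itself. Assuming the tenant_id distribution column from above, the only difference between a single-shard search and a scatter-gather search is the WHERE clause:

SQL
-- Single-shard: the distribution-key filter pins this to one tenant's shard
SELECT id, title
FROM articles
WHERE tenant_id = 42
  AND search_vector @@ websearch_to_tsquery('english', 'invoice')
LIMIT 20;

-- Scatter-gather: no tenant_id filter, so the coordinator queries
-- every shard and merges the results
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'invoice')
LIMIT 20;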

The hybrid search pipeline from Chapter 13 works on Citus. The CTEs execute on each shard. Results are merged at the coordinator. The SQL is unchanged.
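For reference, the shape of that pipeline. This is a sketch, not the full Chapter 13 version: the search terms, the candidate limits, the RRF constant of 60, and the $1 query-embedding parameter are all illustrative:

SQL
WITH lexical AS (
    SELECT id,
           RANK() OVER (ORDER BY ts_rank(search_vector, q) DESC) AS r
    FROM articles,
         websearch_to_tsquery('english', 'postgres scaling') AS q
    WHERE search_vector @@ q
    ORDER BY r
    LIMIT 50
),
semantic AS (
    SELECT id,
           RANK() OVER (ORDER BY embedding <=> $1) AS r
    FROM articles
    ORDER BY r
    LIMIT 50
)
SELECT id,
       COALESCE(1.0 / (60 + l.r), 0)
     + COALESCE(1.0 / (60 + s.r), 0) AS rrf_score
FROM lexical l
FULL JOIN semantic s USING (id)
ORDER BY rrf_score DESC
LIMIT 10;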

Gold Lapel + Citus: Gold Lapel auto-detects Citus at startup. The dashboard shows “PostgreSQL + Citus” with worker node count. No code changes. No configuration changes. GL talks to the Citus coordinator the same way it talks to a single-node PostgreSQL.

When Citus matters: Tens of millions to billions of rows where partitioning isn’t sufficient. Write-heavy workloads where a single primary is the bottleneck. Multi-tenant applications where per-tenant sharding is the natural distribution key. For most applications: you will not need Citus. It is there when you do, and the path to it does not require rebuilding anything you have already built.

Gold Lapel’s Own Scaling

Gold Lapel’s proxy is built in Rust on Tokio — async, multithreaded. Multiple proxy instances can run in front of the same database or database cluster. Mesh networking syncs cache state between instances — cache invalidation propagates across all proxy instances via P2P gossip.

The GL scaling path:

  1. Single GL instance — handles most workloads
  2. Multiple GL instances with mesh — higher proxy throughput, shared cache state
  3. GL instances behind a load balancer — standard horizontal scaling
  4. GL instances talking to Citus coordinator — distributed database, distributed proxy

At each step, the application code stays the same. goldlapel.start() and the 13 search methods. The infrastructure grows beneath a stable API. I believe this is how scaling should work — the complexity stays in the infrastructure, not in the application.

The Complete Scaling Story

Step              | Addresses                            | When Needed                    | Application Change | GL Change
Single node       | Most workloads                       | Always (start here)            | None               | None
Partitioning      | Very large tables (50M+ rows)        | Tens of millions of rows       | None               | None
Read replicas     | High read throughput                 | Hundreds+ QPS concurrent       | None               | --replica flag
Citus             | Horizontal scale, write throughput   | Billions of rows, multi-tenant | None               | Auto-detected
Multi-instance GL | Proxy throughput, cache distribution | High proxy load                | None               | Mesh config

I would ask you to look at the fourth column. “Application Change: None.” At every step. The search code you write for a development laptop with 10,000 rows is the same search code that runs on a Citus cluster with a billion rows. The infrastructure changes. The queries do not. The application does not. I find this the most important property of the entire scaling story, and I did not want it buried in a paragraph where you might miss it.

Honest Boundary

Elasticsearch was purpose-built for distributed search from the ground up. Its sharding, replication, and cluster management are deeply integrated and battle-tested at extreme scale — multi-terabyte indexes, hundreds of nodes, petabyte-scale logging clusters. Citus brings PostgreSQL comparable distributed capabilities, but Elasticsearch has more production mileage at the furthest extreme of the scale spectrum. I acknowledge this because it is true, and because my recommendations are more useful when they include their boundaries.

The practical reality: a single PostgreSQL node with GIN indexes handles search over hundreds of millions of rows in single-digit milliseconds. Most applications will never need step 2, let alone step 4. The scaling story is there for the exceptions, and it is complete.

HNSW indexes on Citus are relatively new. For very large-scale distributed vector search — billions of vectors across many shards — test thoroughly before committing. The technology is maturing rapidly, but it has less production history than Elasticsearch’s distributed kNN search at extreme scale. This is a boundary worth knowing before you reach it, not after.

The scaling path is clear. Single node for most teams — and that is the right choice, not the temporary one. Partitioning for very large tables. Read replicas for high throughput. Citus for horizontal distribution. At every step, the application code stays the same. The infrastructure beneath it grows. The queries above it do not change.

I have now spent fifteen chapters making an argument. The search capabilities are demonstrated. The aggregations, the percolator, the custom analyzers — all addressed. The architecture patterns are blueprinted. The scaling path is complete. At this point, you may be convinced of the technical merits.

But you may also have an Elasticsearch cluster running in production right now. The indices are built. The sync pipeline is humming. The team knows the query DSL. And the question in your mind is not “can PostgreSQL do this?” — you have seen that it can. The question is: how do I get from where I am to where this book says I could be? Safely. Incrementally. Without breaking production on a Tuesday afternoon.

Chapter 16 is the migration playbook. It addresses that question with the same care I have given every other question in this book — which is to say, with the care it deserves. If you will follow me, I should like to show you the path. Others have walked it. I have guided them. The footing is sound.