Citus
Distributed PostgreSQL — shard tables across multiple nodes for horizontal scaling. A powerful tool, if you have genuinely exhausted everything a single node can offer.
Before we proceed, a word of counsel: distributing a database is rather like hiring additional staff for the household. It can be the correct decision — but only after you are certain the current staff are performing at their best. I have seen teams reach for Citus when what they needed was a proper index.
That said, when the work genuinely exceeds what one node can manage, Citus is an excellent choice. It is an open-source PostgreSQL extension (maintained by Microsoft) that transforms a standard PostgreSQL database into a distributed system. It shards tables across multiple nodes, parallelizes queries, and scales both reads and writes horizontally — all while remaining a PostgreSQL extension, not a fork.
What Citus does
Citus introduces a coordinator-worker architecture on top of PostgreSQL. The coordinator node holds metadata about how tables are sharded and routes queries to the correct worker nodes. Worker nodes store the actual data shards and execute queries locally. Applications connect to the coordinator using standard PostgreSQL clients — the distribution layer is transparent.
Tables in a Citus cluster fall into three categories:
- Distributed tables are sharded by a distribution column. Each row is assigned to a shard based on the hash of its distribution column value. Queries that include the distribution column in their WHERE clause are routed to a single shard; broader queries are parallelized across all shards.
- Reference tables are small lookup tables that are replicated in full to every worker node. This allows them to be joined efficiently with any distributed table without cross-node traffic.
- Local tables are ordinary PostgreSQL tables that live only on the coordinator. They are useful for metadata, configuration, or administrative data that does not need to be distributed.
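These classifications are recorded in the coordinator's metadata. Assuming a cluster where such tables already exist (the events and plans tables created later in this article, say), the citus_tables view reports how each table is treated:

```sql
-- Inspect Citus-managed tables and how each is distributed
SELECT table_name, citus_table_type, distribution_column, shard_count
FROM citus_tables;
```

Ordinary local tables appear in this view only once they have been added to the Citus metadata (via citus_add_local_table_to_metadata).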
The coordinator rewrites incoming SQL into per-shard queries, pushes computation down to workers wherever possible, and merges results. For queries scoped to a single tenant or distribution key, the entire query runs on one worker with no coordination overhead.
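The routing decision can be observed with EXPLAIN. A sketch, assuming the events table distributed by tenant_id that appears later in this article:

```sql
-- Routed to a single shard: the distribution column is fixed
EXPLAIN SELECT count(*) FROM events WHERE tenant_id = 42;

-- Fanned out: every shard is scanned and partial results are merged
EXPLAIN SELECT event_type, count(*) FROM events GROUP BY event_type;
```

The plan output contains a Custom Scan (Citus Adaptive) node whose task count reveals how many shards participate in the query.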
When to use Citus
Citus is not needed for every PostgreSQL deployment — and I should be direct about this, because the allure of horizontal scaling has a way of arriving before the need for it does. Most databases perform well on a single node with proper indexing and query optimization. Citus becomes relevant when:
- Multi-tenant SaaS applications — shard by tenant_id so each tenant's data lives on a single shard, queries are fast, and tenant isolation is natural
- Real-time analytics dashboards — parallelize aggregation queries across workers to maintain sub-second response times as data volume grows
- Time-series data at scale — distribute high-ingest event or IoT data across nodes when a single PostgreSQL instance cannot keep up with write throughput
- Data that outgrows a single node — when your dataset exceeds what fits in memory on one server and disk I/O becomes the bottleneck
- Read and write scaling — unlike read replicas which only scale reads, Citus scales both reads and writes across the cluster
Installation and setup
Citus must be loaded via shared_preload_libraries at server startup, and it must be the first entry in the list. This is a hard requirement — PostgreSQL will not start if Citus is not listed first. A restart is required when enabling Citus for the first time. Allow me to save you the debugging session: if your server refuses to start after enabling Citus, check the ordering.
```sql
-- 1. Add to postgresql.conf (requires restart)
-- Citus must be first in the list
shared_preload_libraries = 'citus'

-- 2. Restart PostgreSQL, then create the extension
CREATE EXTENSION citus;

-- 3. Verify
SELECT citus_version();
```

For a multi-node cluster, you also need to register worker nodes from the coordinator.
```sql
-- On the coordinator node, register worker nodes
SELECT citus_set_coordinator_host('coordinator-host', 5432);
SELECT citus_add_node('worker-1', 5432);
SELECT citus_add_node('worker-2', 5432);

-- Verify the cluster
SELECT * FROM citus_get_active_worker_nodes();
```

Citus is available as a package from the Citus repository for Debian, Ubuntu, and RHEL-based distributions. It is also available as a Docker image. For single-node development, you can run the coordinator and workers on the same machine, or use Citus in single-node mode (Citus 11+) without any workers at all.
Key concepts and examples
Distributing a table
Create the table with standard DDL, then call create_distributed_table() with the distribution column. Citus splits the table into 32 shards by default (configurable via citus.shard_count) and distributes them across workers.
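If the default does not suit your cluster, the shard count can be set per session before the table is distributed; a minimal sketch:

```sql
-- Choose the shard count before calling create_distributed_table()
SET citus.shard_count = 64;
```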
```sql
-- Create a table normally
CREATE TABLE events (
    tenant_id  bigint NOT NULL,
    event_id   bigint NOT NULL,
    event_type text NOT NULL,
    payload    jsonb,
    created_at timestamptz DEFAULT now(),
    PRIMARY KEY (tenant_id, event_id)
);

-- Distribute it by tenant_id (32 shards by default)
SELECT create_distributed_table('events', 'tenant_id');
```

Reference tables
Small lookup tables that every worker needs access to — pricing plans, country codes, feature flags. Replicate them with create_reference_table() and they can be joined with any distributed table without network round-trips.
```sql
-- Small lookup tables should be reference tables
CREATE TABLE plans (
    plan_id   serial PRIMARY KEY,
    name      text NOT NULL,
    max_seats int NOT NULL
);

-- Replicate to every worker node
SELECT create_reference_table('plans');
```

Co-located joins
When two distributed tables share the same distribution column, JOINs on that column execute entirely within each worker node. No data leaves the node. This is the key to Citus performance in multi-tenant workloads — and the reason your choice of distribution column matters so much.
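The user_actions table joined below is assumed rather than shown; one plausible setup, co-locating it with events explicitly:

```sql
-- Shard user_actions by tenant_id and pin it to the same shard
-- placement as events, so joins on tenant_id never leave a node
CREATE TABLE user_actions (
    tenant_id bigint NOT NULL,
    event_id  bigint NOT NULL,
    action    text NOT NULL,
    PRIMARY KEY (tenant_id, event_id)
);
SELECT create_distributed_table('user_actions', 'tenant_id',
                                colocate_with => 'events');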
```sql
-- Co-located join: both tables sharded by tenant_id
-- Executes entirely on each worker node, no cross-shard traffic
SELECT e.event_type, count(*)
FROM events e
JOIN user_actions ua ON ua.tenant_id = e.tenant_id
                    AND ua.event_id = e.event_id
WHERE e.tenant_id = 42
GROUP BY e.event_type;
```

Cross-shard aggregations
Queries that span all shards are parallelized across workers. Each worker computes a partial result, and the coordinator merges them. This is where the investment in multiple nodes earns its keep.
```sql
-- Aggregation across all tenants
-- Citus parallelizes across workers, merges on coordinator
SELECT date_trunc('hour', created_at) AS hour,
       count(*) AS event_count
FROM events
GROUP BY 1
ORDER BY 1 DESC
LIMIT 24;
```

Online shard rebalancing
When you add a worker node, rebalance shards without downtime. Citus moves shards using logical replication in the background.
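The new worker must be registered with the coordinator before rebalancing; worker-3 below is a placeholder hostname:

```sql
-- Register the new worker node
SELECT citus_add_node('worker-3', 5432);
```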
```sql
-- After adding a new worker node, rebalance shards
SELECT citus_rebalance_start();

-- Monitor rebalance progress
SELECT * FROM citus_rebalance_status();
```

Cloud availability
| Provider | Status |
|---|---|
| Amazon RDS / Aurora | Not available — Citus is not supported on RDS |
| Google Cloud SQL | Not available — Citus is not in the supported extensions list |
| Azure Database for PostgreSQL | Available — native Citus support via Azure Cosmos DB for PostgreSQL (formerly Hyperscale) |
| Supabase | Not available — not included in supported extensions |
| Neon | Not available — not a pre-built extension (custom extension loading may be possible on paid plans) |
Azure is currently the only major managed provider with native Citus support, which is not surprising given that Microsoft acquired Citus Data in 2019. For other cloud providers, Citus requires self-hosting on virtual machines or containers — a meaningful operational commitment that should factor into your decision.
How Gold Lapel relates
I should note: Gold Lapel sits between your application and PostgreSQL, seeing every query before it reaches the database. In a Citus deployment, that means we see queries before they arrive at the coordinator — which is precisely where intervention is most useful.
The relationship is complementary. Citus handles data distribution and parallel execution across nodes. Gold Lapel handles the quality of the queries themselves — materialized view recommendations, index suggestions, query rewriting. A poorly written query distributed across 10 nodes is still a poorly written query. It simply runs poorly in parallel. We would prefer to catch those patterns before Citus fans them out across your cluster.
For multi-tenant workloads, this is particularly valuable. Gold Lapel can identify patterns where queries are missing the distribution column in their WHERE clause — causing expensive cross-shard scatter that your monitoring may not immediately surface. It flags N+1 patterns generating unnecessary coordinator round-trips, and tracks per-tenant query latency to spot tenants whose workloads are quietly consuming disproportionate resources. The kind of thing one notices when paying close attention.
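The missing-distribution-column pattern is simple to illustrate. Assuming the events table from earlier, the first query below scatters to every shard while the second is routed to exactly one:

```sql
-- Cross-shard scatter: no tenant_id filter, so all shards participate
SELECT count(*) FROM events WHERE event_type = 'login';

-- Routed: the distribution column pins the query to a single shard
SELECT count(*) FROM events
WHERE tenant_id = 42 AND event_type = 'login';
```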