
pg_repack

The diagnosis has been made. Now allow me to show you the remedy — one that tidies the room without asking anyone to leave it.

Extension · March 21, 2026 · 7 min read

If pgstattuple is the inspection — measuring exactly how much dead weight your tables are carrying — then pg_repack is the renovation that follows. It rewrites bloated tables and indexes online, without locking out concurrent reads or writes. Where VACUUM FULL would close the room entirely, pg_repack builds the replacement alongside the original, then swaps them in the space of a heartbeat. The household continues to operate throughout. I find the mechanism rather elegant.

What pg_repack does

PostgreSQL's MVCC architecture means that UPDATE and DELETE operations leave behind dead tuples. Regular VACUUM marks this space as reusable, but it does not return it to the operating system or compact the table. Over time, tables and indexes grow larger than their live data warrants — this is bloat.
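You can watch this happen in a scratch database. The following sketch (table name and row counts are illustrative) deletes half the rows of a table, runs plain VACUUM, and observes that the table's on-disk size does not shrink — the dead space is reusable, but not returned:

```sql
-- Sketch: plain VACUUM marks space reusable but does not shrink the file.
CREATE TABLE bloat_demo AS
  SELECT g AS id, repeat('x', 100) AS payload
  FROM generate_series(1, 100000) g;

SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- baseline size

DELETE FROM bloat_demo WHERE id % 2 = 0;  -- dead tuples on every page
VACUUM bloat_demo;                        -- space marked reusable...

SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- ...size unchanged
```

Because live rows remain on every page, VACUUM cannot truncate anything from the end of the file — exactly the situation pg_repack is built for.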

pg_repack solves this by creating a clean copy of the table in the background. It installs a trigger on the original table to capture any changes (inserts, updates, deletes) that occur during the copy. Once the copy is complete and all indexes are rebuilt, it replays the captured changes, then performs a brief exclusive lock to swap the old table with the new one in PostgreSQL's system catalogs. The old bloated table is dropped immediately after.

The result is a fully compacted table with fresh indexes, achieved while your application continues to read and write normally. A shadow table, a trigger, and a swap — the old bloated copy is gone before anyone notices it was being replaced.

When to use pg_repack

pg_repack is most valuable when you need to reclaim space or restore performance without taking your database offline. Specific scenarios:

  • Table bloat after bulk deletes — a large DELETE or data migration leaves significant dead space that regular VACUUM cannot reclaim
  • Index bloat — B-tree indexes can become bloated after many updates to indexed columns, degrading lookup performance
  • Physical reordering — use --order-by to cluster table rows by a column (e.g., created_at) for better range scan performance, without the exclusive lock that CLUSTER requires
  • Production maintenance windows — when you cannot afford the downtime that VACUUM FULL or CLUSTER would impose
  • Post-migration cleanup — after schema changes or large data loads that leave tables in a suboptimal physical state
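To shortlist candidates for these scenarios, the statistics collector gives a cheap first pass. A sketch (the 10,000-row threshold is arbitrary; n_dead_tup is an estimate, so confirm with pgstattuple before acting):

```sql
-- Sketch: find tables with the most estimated dead tuples.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1)
         AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;
```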

Installation and setup

pg_repack is a third-party extension — it does not ship with PostgreSQL's contrib modules. You need two components: the server-side extension (a shared library installed in PostgreSQL's extension directory) and the command-line client (pg_repack binary) that drives the operation from outside the database.

On Debian/Ubuntu, install both with your package manager:

Shell
sudo apt-get install postgresql-16-repack

Then create the extension inside your database:

SQL
-- Install the extension in your database
CREATE EXTENSION pg_repack;

-- Verify it's installed
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_repack';

pg_repack does not require shared_preload_libraries — no server restart is needed. Once the extension is created, you can run the pg_repack client immediately.

The client and server extension versions must match. If you upgrade PostgreSQL or the extension, make sure both sides are updated together.
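A pre-flight check is worth scripting. In this sketch, the version strings are hypothetical placeholders — in practice you would fill CLIENT_VERSION from `pg_repack --version` and SERVER_VERSION from the pg_extension query shown above:

```shell
# Sketch: refuse to proceed when client and server extension versions differ.
# Hypothetical values for illustration; populate from pg_repack --version
# and from: SELECT extversion FROM pg_extension WHERE extname = 'pg_repack';
CLIENT_VERSION="1.5.0"
SERVER_VERSION="1.5.0"

if [ "$CLIENT_VERSION" = "$SERVER_VERSION" ]; then
  echo "versions match: $CLIENT_VERSION"
else
  echo "mismatch: client $CLIENT_VERSION vs server $SERVER_VERSION" >&2
  exit 1
fi
```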

Usage examples

Repack tables

The most common operation — rewrite one or more tables to remove dead tuples and reclaim disk space. Start here.

Shell
# Repack a single table (removes bloat, no reordering)
pg_repack -d mydb --table public.orders

# Repack multiple tables
pg_repack -d mydb --table public.orders --table public.line_items

# Repack all eligible tables in the database
pg_repack -d mydb

Repack with physical reordering

Rewrite the table with rows sorted by a specific column — the online equivalent of CLUSTER, without the regrettable habit CLUSTER has of locking out every connection for the duration.

Shell
# Repack and physically reorder rows by a column (online CLUSTER)
pg_repack -d mydb --table public.orders --order-by "created_at DESC"

# Equivalent to CLUSTER, but without the long exclusive lock
# Useful for range-scan-heavy queries on time-series data

Repack indexes

Rebuild indexes without touching the table data. Useful when index bloat is the primary issue.

Shell
# Repack a single index (rebuilds it without locking writes)
pg_repack -d mydb --index public.orders_pkey

# Repack only the indexes of a table (leaves table data alone)
pg_repack -d mydb --table public.orders --only-indexes

Dry run and diagnostics

Preview what pg_repack would do, or get verbose output for debugging.

Shell
# Dry run — list what would be repacked without doing anything
pg_repack -d mydb --dry-run

# Repack without reordering (--no-order) and echo each SQL command as it runs
pg_repack -d mydb --table public.orders --no-order --echo

Measuring bloat before and after

Use pgstattuple to quantify bloat before running pg_repack, and verify it was eliminated afterward. Measure twice, repack once.

SQL
-- Measure table bloat with pgstattuple before and after
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
  pg_size_pretty(pg_relation_size('orders')) AS table_size,
  dead_tuple_percent
FROM pgstattuple('orders');

-- After repacking, dead_tuple_percent should drop to ~0%

How it works internally

Understanding the internal mechanism helps you anticipate resource requirements and plan maintenance windows.

  1. Create log table and trigger — pg_repack briefly takes an exclusive lock to install a trigger on the original table. This trigger captures all INSERTs, UPDATEs, and DELETEs into a log table for the duration of the operation.
  2. Create shadow table and copy data — a new table is created with the same schema. All live rows are copied from the original table using an INSERT ... SELECT. During this phase, concurrent writes go to the original table and are recorded by the trigger.
  3. Build indexes — all indexes from the original table are recreated on the shadow table. This is often the most time-consuming step for heavily indexed tables.
  4. Replay changes — the log table is drained: changes captured by the trigger are applied to the shadow table. This repeats until the backlog is small enough to complete during the final lock.
  5. Swap and drop — a brief exclusive lock is acquired. The original and shadow tables are swapped in pg_class (including TOAST tables and indexes). The old table is dropped.

Throughout steps 2-4, the original table remains fully available for reads and writes. The exclusive lock in step 5 typically lasts less than a second. The entire operation amounts to renovating a room while the household carries on — the guests never need to be inconvenienced.
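The trigger-and-log idea behind steps 1, 2, and 4 can be sketched in plain SQL. This is not pg_repack's actual implementation — the real extension uses C functions and direct catalog surgery, and the table names here (orders_log, orders_shadow) are invented for illustration — but it conveys the shape of the mechanism:

```sql
-- Step 1 (simplified): capture every change to the original table.
CREATE TABLE orders_log (
  op        text,
  row_data  jsonb,
  logged_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION orders_capture() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO orders_log (op, row_data) VALUES (TG_OP, to_jsonb(OLD));
    RETURN OLD;
  ELSE
    INSERT INTO orders_log (op, row_data) VALUES (TG_OP, to_jsonb(NEW));
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_capture_trg
  AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW EXECUTE FUNCTION orders_capture();

-- Step 2 (simplified): copy live rows into a shadow table.
CREATE TABLE orders_shadow (LIKE orders INCLUDING ALL);
INSERT INTO orders_shadow SELECT * FROM orders;

-- Step 4 (simplified): drain orders_log into orders_shadow, repeating
-- until the backlog is small; then swap the tables under a brief lock.
```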

Cloud availability

  • Amazon RDS / Aurora: available — install via CREATE EXTENSION; run the pg_repack client from an EC2 instance
  • Google Cloud SQL: available — enable via CREATE EXTENSION; may require cloudsqlsuperuser privileges
  • Azure Database for PostgreSQL: available — supported on Flexible Server (v1.5.1+)
  • Supabase: available — enable via dashboard; use the -k flag to skip the superuser check
  • Neon: available — supported on all plans

On managed platforms, you typically cannot install the pg_repack client binary on the database server itself. Instead, run it from an external machine (a bastion host, CI/CD runner, or local workstation) that can connect to the database. The client version must match the server extension version.
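A remote invocation looks like any other libpq client — pg_repack accepts the familiar -h, -p, -U, and -d connection flags. In this sketch the host, user, and database names are hypothetical placeholders; the command is composed into a variable so you can review it before running:

```shell
# Hypothetical connection details -- substitute your own.
DB_HOST="db.example.internal"
DB_PORT=5432
DB_USER="repack_admin"
DB_NAME="mydb"

# Compose the invocation; review it, then run it from the bastion or runner.
CMD="pg_repack -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_NAME --table public.orders"
echo "$CMD"
```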

How Gold Lapel relates

I should note that bloat rarely announces itself. It accumulates quietly — a few percent here, a few percent there — until one morning your queries take twice as long and no one can explain why. This is precisely the sort of thing I prefer to catch early.

Gold Lapel observes query execution patterns at the proxy level: sequential scan frequency, I/O amplification, index hit ratios. When a table with 40% dead tuples forces the planner into less efficient scan strategies, or when shared buffer reads climb for unchanged query volumes, these signals surface before the slowdown becomes a crisis. I would rather inform you that a table needs attention next Tuesday than explain why your dashboard was unresponsive last Thursday.

pg_repack handles the remedy. Gold Lapel handles the early detection. One diagnoses, the other treats — and between them, bloat becomes a scheduled maintenance task rather than a midnight emergency.

Frequently asked questions