
pg_repack

The diagnosis has been made. Now allow me to show you the remedy — one that tidies the room without asking anyone to leave it.

Extension · March 21, 2026 · 7 min read

If pgstattuple is the inspection — measuring exactly how much dead weight your tables are carrying — then pg_repack is the renovation that follows. It rewrites bloated tables and indexes online, without locking out concurrent reads or writes. Where VACUUM FULL would close the room entirely, pg_repack builds the replacement alongside the original, then swaps them in the space of a heartbeat. The household continues to operate throughout. I find the mechanism rather elegant.

What pg_repack does

PostgreSQL's MVCC architecture means that UPDATE and DELETE operations leave behind dead tuples. Regular VACUUM marks this space as reusable, but it does not return it to the operating system or compact the table. Over time, tables and indexes grow larger than their live data warrants — this is bloat.
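You can watch this happen in a scratch database. The following sketch (table name and row counts are illustrative) deletes half the rows of a table, runs plain VACUUM, and observes that the table's on-disk size does not shrink — the dead space is reusable, but not returned:

```sql
-- Sketch: plain VACUUM marks space reusable but does not shrink the file.
CREATE TABLE bloat_demo AS
  SELECT g AS id, repeat('x', 100) AS payload
  FROM generate_series(1, 100000) g;

SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- baseline size

DELETE FROM bloat_demo WHERE id % 2 = 0;  -- dead tuples on every page
VACUUM bloat_demo;                        -- space marked reusable...

SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- ...size unchanged
```

Because live rows remain on every page, VACUUM cannot truncate anything from the end of the file — exactly the situation pg_repack is built for.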

pg_repack solves this by creating a clean copy of the table in the background. It installs a trigger on the original table to capture any changes (inserts, updates, deletes) that occur during the copy. Once the copy is complete and all indexes are rebuilt, it replays the captured changes, then performs a brief exclusive lock to swap the old table with the new one in PostgreSQL's system catalogs. The old bloated table is dropped immediately after.

The result is a fully compacted table with fresh indexes, achieved while your application continues to read and write normally. A shadow table, a trigger, and a swap — the old bloated copy is gone before anyone notices it was being replaced.

When to use pg_repack

pg_repack is most valuable when you need to reclaim space or restore performance without taking your database offline. Specific scenarios:

  • Table bloat after bulk deletes — a large DELETE or data migration leaves significant dead space that regular VACUUM cannot reclaim
  • Index bloat — B-tree indexes can become bloated after many updates to indexed columns, degrading lookup performance
  • Physical reordering — use --order-by to cluster table rows by a column (e.g., created_at) for better range scan performance, without the exclusive lock that CLUSTER requires
  • Production maintenance windows — when you cannot afford the downtime that VACUUM FULL or CLUSTER would impose
  • Post-migration cleanup — after schema changes or large data loads that leave tables in a suboptimal physical state
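To shortlist candidates for these scenarios, the statistics collector gives a cheap first pass. A sketch (the 10,000-row threshold is arbitrary; n_dead_tup is an estimate, so confirm with pgstattuple before acting):

```sql
-- Sketch: find tables with the most estimated dead tuples.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1)
         AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;
```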

Installation and setup

pg_repack is a third-party extension — it does not ship with PostgreSQL's contrib modules. You need two components: the server-side extension (a shared library installed in PostgreSQL's extension directory) and the command-line client (pg_repack binary) that drives the operation from outside the database.

On Debian/Ubuntu, install both with your package manager:

Shell
sudo apt-get install postgresql-16-repack

Then create the extension inside your database:

SQL
-- Install the extension in your database
CREATE EXTENSION pg_repack;

-- Verify it's installed
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_repack';

pg_repack does not require shared_preload_libraries — no server restart is needed. Once the extension is created, you can run the pg_repack client immediately.

The client and server extension versions must match. If you upgrade PostgreSQL or the extension, make sure both sides are updated together.
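A pre-flight check is worth scripting. In this sketch, the version strings are hypothetical placeholders — in practice you would fill CLIENT_VERSION from `pg_repack --version` and SERVER_VERSION from the pg_extension query shown above:

```shell
# Sketch: refuse to proceed when client and server extension versions differ.
# Hypothetical values for illustration; populate from pg_repack --version
# and from: SELECT extversion FROM pg_extension WHERE extname = 'pg_repack';
CLIENT_VERSION="1.5.0"
SERVER_VERSION="1.5.0"

if [ "$CLIENT_VERSION" = "$SERVER_VERSION" ]; then
  echo "versions match: $CLIENT_VERSION"
else
  echo "mismatch: client $CLIENT_VERSION vs server $SERVER_VERSION" >&2
  exit 1
fi
```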

Usage examples

Repack tables

The most common operation — rewrite one or more tables to remove dead tuples and reclaim disk space. Start here.

Shell
# Repack a single table (removes bloat, no reordering)
pg_repack -d mydb --table public.orders

# Repack multiple tables
pg_repack -d mydb --table public.orders --table public.line_items

# Repack all eligible tables in the database
pg_repack -d mydb

Repack with physical reordering

Rewrite the table with rows sorted by a specific column — the online equivalent of CLUSTER, without the regrettable habit CLUSTER has of locking out every connection for the duration.

Shell
# Repack and physically reorder rows by a column (online CLUSTER)
pg_repack -d mydb --table public.orders --order-by "created_at DESC"

# Equivalent to CLUSTER, but without the long exclusive lock
# Useful for range-scan-heavy queries on time-series data

Repack indexes

Rebuild indexes without touching the table data. Useful when index bloat is the primary issue.

Shell
# Repack a single index (rebuilds it without locking writes)
pg_repack -d mydb --index public.orders_pkey

# Repack only the indexes of a table (leaves table data alone)
pg_repack -d mydb --table public.orders --only-indexes

Dry run and diagnostics

Preview what pg_repack would do, or get verbose output for debugging.

Shell
# Dry run — list what would be repacked without doing anything
pg_repack -d mydb --dry-run

# Repack without reordering (--no-order) and echo each SQL command as it runs
pg_repack -d mydb --table public.orders --no-order --echo

Measuring bloat before and after

Use pgstattuple to quantify bloat before running pg_repack, and verify it was eliminated afterward. Measure twice, repack once.

SQL
-- Measure table bloat with pgstattuple before and after
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
  pg_size_pretty(pg_relation_size('orders')) AS table_size,
  dead_tuple_percent
FROM pgstattuple('orders');

-- After repacking, dead_tuple_percent should drop to ~0%

How it works internally

Understanding the internal mechanism helps you anticipate resource requirements and plan maintenance windows.

  1. Create log table and trigger — pg_repack briefly takes an exclusive lock to install a trigger on the original table. This trigger captures all INSERTs, UPDATEs, and DELETEs into a log table for the duration of the operation.
  2. Create shadow table and copy data — a new table is created with the same schema. All live rows are copied from the original table using an INSERT ... SELECT. During this phase, concurrent writes go to the original table and are recorded by the trigger.
  3. Build indexes — all indexes from the original table are recreated on the shadow table. This is often the most time-consuming step for heavily indexed tables.
  4. Replay changes — the log table is drained: changes captured by the trigger are applied to the shadow table. This repeats until the backlog is small enough to complete during the final lock.
  5. Swap and drop — a brief exclusive lock is acquired. The original and shadow tables are swapped in pg_class (including TOAST tables and indexes). The old table is dropped.

Throughout steps 2-4, the original table remains fully available for reads and writes. The exclusive lock in step 5 typically lasts less than a second. The entire operation amounts to renovating a room while the household carries on — the guests never need to be inconvenienced.
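The trigger-and-log idea behind steps 1, 2, and 4 can be sketched in plain SQL. This is not pg_repack's actual implementation — the real extension uses C functions and direct catalog surgery, and the table names here (orders_log, orders_shadow) are invented for illustration — but it conveys the shape of the mechanism:

```sql
-- Step 1 (simplified): capture every change to the original table.
CREATE TABLE orders_log (
  op        text,
  row_data  jsonb,
  logged_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION orders_capture() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO orders_log (op, row_data) VALUES (TG_OP, to_jsonb(OLD));
    RETURN OLD;
  ELSE
    INSERT INTO orders_log (op, row_data) VALUES (TG_OP, to_jsonb(NEW));
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_capture_trg
  AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW EXECUTE FUNCTION orders_capture();

-- Step 2 (simplified): copy live rows into a shadow table.
CREATE TABLE orders_shadow (LIKE orders INCLUDING ALL);
INSERT INTO orders_shadow SELECT * FROM orders;

-- Step 4 (simplified): drain orders_log into orders_shadow, repeating
-- until the backlog is small; then swap the tables under a brief lock.
```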

Cloud availability

  • Amazon RDS / Aurora: available — install via CREATE EXTENSION; run the pg_repack client from an EC2 instance
  • Google Cloud SQL: available — enable via CREATE EXTENSION; may require cloudsqlsuperuser privileges
  • Azure Database for PostgreSQL: available — supported on Flexible Server (v1.5.1+)
  • Supabase: available — enable via dashboard; use the -k flag to skip the superuser check
  • Neon: available — supported on all plans

On managed platforms, you typically cannot install the pg_repack client binary on the database server itself. Instead, run it from an external machine (a bastion host, CI/CD runner, or local workstation) that can connect to the database. The client version must match the server extension version.
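A remote invocation looks like any other libpq client — pg_repack accepts the familiar -h, -p, -U, and -d connection flags. In this sketch the host, user, and database names are hypothetical placeholders; the command is composed into a variable so you can review it before running:

```shell
# Hypothetical connection details -- substitute your own.
DB_HOST="db.example.internal"
DB_PORT=5432
DB_USER="repack_admin"
DB_NAME="mydb"

# Compose the invocation; review it, then run it from the bastion or runner.
CMD="pg_repack -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_NAME --table public.orders"
echo "$CMD"
```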

How Gold Lapel relates

I should note that bloat rarely announces itself. It accumulates quietly — a few percent here, a few percent there — until one morning your queries take twice as long and no one can explain why. This is precisely the sort of thing I prefer to catch early.

Gold Lapel observes query execution patterns at the proxy level: sequential scan frequency, I/O amplification, index hit ratios. When a table with 40% dead tuples forces the planner into less efficient scan strategies, or when shared buffer reads climb for unchanged query volumes, these signals surface before the slowdown becomes a crisis. I would rather inform you that a table needs attention next Tuesday than explain why your dashboard was unresponsive last Thursday.

pg_repack handles the remedy. Gold Lapel handles the early detection. One diagnoses, the other treats — and between them, bloat becomes a scheduled maintenance task rather than a midnight emergency.

Frequently asked questions