Bloat
Wasted space in tables and indexes caused by dead rows that VACUUM has not yet reclaimed — or cannot reclaim fast enough.
If you have read the entries on MVCC and VACUUM, you know the cause and the cure. Bloat is what happens when the cure does not arrive quickly enough. PostgreSQL's MVCC model never overwrites a row in place — every UPDATE creates a new row version and leaves the old one behind, and every DELETE marks a row as dead without removing it. These dead tuples occupy physical space until VACUUM reclaims it. When dead space accumulates faster than VACUUM cleans it, tables and indexes grow beyond their useful size. Sequential scans slow down because they must read through pages of dead rows. The buffer cache wastes memory on data no transaction will ever see. Disk usage climbs without a corresponding increase in live data. A well-run household does not tolerate this kind of accumulation.
What bloat is
Every table in PostgreSQL is stored as a sequence of 8 KB pages. Each page holds some number of row versions (tuples). When a row is updated, the old version remains on its page, marked with an xmax transaction ID to indicate it has been superseded. When a row is deleted, the tuple stays in place, similarly marked. These dead tuples are invisible to new transactions but they still occupy physical bytes on disk.
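The mechanics above are easy to observe directly. As a small sketch (the table `t` here is a throwaway demo, not anything from a real schema), the hidden system columns `ctid`, `xmin`, and `xmax` show a new row version appearing after an UPDATE while the old one lingers on the page:

```sql
-- Hypothetical demo table: watch an UPDATE leave a dead tuple behind
CREATE TABLE t (id int PRIMARY KEY, val text);
INSERT INTO t VALUES (1, 'original');

-- ctid is the tuple's physical address (page, offset);
-- xmin is the creating transaction; xmax = 0 means the version is live
SELECT ctid, xmin, xmax, val FROM t;

UPDATE t SET val = 'updated' WHERE id = 1;

-- The live version now sits at a new ctid. The superseded version is
-- still on the page, marked with an xmax, invisible, awaiting VACUUM.
SELECT ctid, xmin, xmax, val FROM t;
```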
Bloat is what happens when those dead tuples accumulate. A table that holds 1 million live rows but contains 500,000 dead tuples is roughly 50% larger than it needs to be. PostgreSQL must read through all of those pages during a sequential scan, regardless of whether the tuples on them are live or dead. Index entries pointing to dead tuples remain in the index until VACUUM removes them, inflating the index beyond its useful size as well. It is, if I may be direct, clutter — and clutter compounds.
Some amount of dead space is normal. PostgreSQL reuses it for future inserts and updates. Bloat becomes a problem when the dead-to-live ratio grows large enough to measurably affect performance — typically when 20-30% or more of a table's physical space is dead.
Why bloat matters
I should enumerate the consequences, because bloat is one of those problems that sounds merely untidy until you see the bill.
- Slower sequential scans — a sequential scan reads every page of a table. Dead tuples inflate page count, so the scan reads more data than necessary. A table with 50% bloat takes roughly 50% longer to scan.
- Wasted buffer cache — PostgreSQL caches pages in shared_buffers. Bloated pages full of dead tuples occupy cache slots that could hold live data, effectively reducing your usable cache size.
- Inflated disk usage — dead space consumes real disk. On managed databases where you pay per GB, bloat directly increases cost.
- Index inefficiency — dead index entries make B-tree indexes sparse and larger than necessary. Index scans traverse more pages, and index-only scans become less effective when the visibility map marks bloated pages as not all-visible.
- Slower backups and replication — physical backups (pg_basebackup, WAL archiving) copy the full table files, dead space included. Bloated tables mean larger backups and longer base backup times.
Measuring bloat
Before prescribing a remedy, one must establish the severity. There are several ways to assess bloat, from quick estimates to precise measurements.
pg_stat_user_tables
The fastest check — a glance around the room, not a formal inspection. The n_dead_tup column reports an estimate of dead tuples per table. Compare it to n_live_tup to get a dead tuple ratio. This is an estimate updated by autovacuum and ANALYZE, not a real-time count, but it is good enough for identifying which tables deserve closer attention.
-- Check dead tuple counts and bloat indicators per table
SELECT
schemaname,
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
n_live_tup,
n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum,
last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 15;
pgstattuple
For precise measurement, the pgstattuple extension scans the table physically and reports exact dead tuple counts, dead space, and free space. It reads every page, so it is slower than querying statistics views — but the numbers are real, not estimated. I appreciate a proper inventory.
-- Detailed physical bloat measurement with pgstattuple
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
table_len,
tuple_count,
dead_tuple_count,
dead_tuple_len,
round(dead_tuple_len::numeric / nullif(table_len, 0) * 100, 2) AS dead_pct,
free_space,
round(free_space::numeric / nullif(table_len, 0) * 100, 2) AS free_pct
FROM pgstattuple('my_table');
Bloat estimation queries
A middle ground: estimate bloat by comparing the table's actual disk size against what you would expect from its live row count and average row width. These queries use planner statistics and do not scan the table.
-- Estimate table bloat by comparing actual size to expected size
-- Based on average row width and live tuple count
SELECT
t.schemaname,
t.relname,
pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
t.n_live_tup,
pg_size_pretty(
pg_relation_size(t.relid) -
(t.n_live_tup * (SELECT sum(s.avg_width) FROM pg_stats s
WHERE s.schemaname = t.schemaname AND s.tablename = t.relname)::bigint)
) AS estimated_bloat
FROM pg_stat_user_tables t
WHERE t.n_live_tup > 0
ORDER BY pg_relation_size(t.relid) DESC
LIMIT 10;
Index bloat
Index bloat is measured separately. The pgstatindex function from pgstattuple reports leaf page density and fragmentation for B-tree indexes. An avg_leaf_density below 50% signals that the index has significant wasted space.
-- Check index bloat using pgstattuple
-- (pgstatindex works only on B-tree indexes, so filter by access method)
SELECT
s.indexrelname,
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
s.idx_scan,
p.avg_leaf_density,
p.leaf_fragmentation
FROM pg_stat_user_indexes s
JOIN pg_class c ON c.oid = s.indexrelid
JOIN pg_am am ON am.oid = c.relam AND am.amname = 'btree'
CROSS JOIN LATERAL pgstatindex(s.indexrelid) AS p
WHERE s.schemaname = 'public'
ORDER BY pg_relation_size(s.indexrelid) DESC
LIMIT 10;
-- avg_leaf_density below 50% suggests significant index bloat
-- A healthy B-tree index typically has 90%+ leaf density
Fixing bloat
Once bloat has accumulated, you have several options to reclaim the space. Each involves a trade-off between thoroughness and disruption — the eternal tension between a deep clean and keeping the household running.
VACUUM
Standard VACUUM marks dead space as reusable within the table. It does not shrink the table file on disk — the OS-level file stays the same size — but PostgreSQL will fill the reclaimed gaps with future writes. This is the right tool for prevention: run it consistently and the clutter never accumulates beyond what the table can absorb.
VACUUM FULL
Rewrites the entire table to a new file, eliminating all dead space. The table physically shrinks on disk. The cost is an ACCESS EXCLUSIVE lock for the duration of the rewrite — no reads or writes until it completes. On a large table, this can mean minutes or hours of downtime.
-- Standard VACUUM: marks dead space as reusable
-- Does NOT return space to the OS — table stays the same size on disk
VACUUM VERBOSE my_table;
-- VACUUM FULL: rewrites the entire table, reclaims disk space
-- WARNING: acquires ACCESS EXCLUSIVE lock — blocks all reads and writes
VACUUM FULL my_table;
pg_repack
Rebuilds a table or index online, without the heavy lock of VACUUM FULL. It creates a copy of the table, replays changes that occurred during the copy, and swaps the old and new tables in a brief lock at the end. The result is the same as VACUUM FULL — a compact, bloat-free table — without closing the doors to guests. This is the preferred tool for fixing bloat on production tables, and I am rather fond of it.
pg_squeeze
An extension that automates compaction based on bloat thresholds. You configure a table to be monitored, and pg_squeeze rebuilds it automatically when bloat exceeds the threshold. Uses the same logical decoding approach as pg_repack but runs as a background worker without manual intervention.
-- pg_repack: rebuild table online without heavy locks
-- Install the extension first
CREATE EXTENSION pg_repack;
-- Repack a single table (runs online, minimal locking)
-- Run from the command line, not SQL:
-- pg_repack -d mydb -t my_table
-- Repack a specific index
-- pg_repack -d mydb -i my_table_pkey
-- pg_squeeze: automatic compaction triggered by bloat thresholds
CREATE EXTENSION pg_squeeze;
-- Register a table for automatic compaction
INSERT INTO squeeze.tables (tabschema, tabname, free_space_extra)
VALUES ('public', 'my_table', 20);
-- Start the background worker that processes registered tables
SELECT squeeze.start_worker();
REINDEX
For index-only bloat, REINDEX rebuilds an index from scratch. REINDEX CONCURRENTLY (PostgreSQL 12+) does this without locking writes, though it uses more disk space temporarily. pg_repack can also rebuild individual indexes online.
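A brief sketch of the REINDEX options described above (the index name `my_table_pkey` is the same placeholder used in the pg_repack example earlier, not a real object):

```sql
-- REINDEX CONCURRENTLY (PostgreSQL 12+): rebuild without blocking writes.
-- Needs roughly double the index's disk space while both copies exist.
REINDEX INDEX CONCURRENTLY my_table_pkey;

-- Rebuild every index on a table in one pass
REINDEX TABLE CONCURRENTLY my_table;

-- Plain REINDEX blocks writes for the duration; acceptable in a maintenance window
REINDEX INDEX my_table_pkey;
```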
Preventing bloat
Fixing bloat after the fact works, but I confess a strong preference for preventing it altogether. Three strategies cover most cases.
Tune autovacuum for write-heavy tables
The default autovacuum_vacuum_scale_factor of 0.2 means vacuum does not trigger until 20% of the table is dead tuples. For a table with 10 million rows, that is 2 million dead tuples before vacuum runs — far too many. Lower the scale factor to 0.01 or even 0.005 for high-write tables, so vacuum runs earlier and more often.
-- Lower autovacuum thresholds for write-heavy tables
ALTER TABLE hot_table SET (
autovacuum_vacuum_scale_factor = 0.01, -- trigger at 1% dead tuples
autovacuum_vacuum_threshold = 1000 -- minimum dead tuples
);
-- Increase autovacuum aggressiveness globally
-- In postgresql.conf:
-- autovacuum_vacuum_cost_delay = 1 -- ms (default 2 ms since PostgreSQL 12; lower = faster vacuum)
-- autovacuum_vacuum_cost_limit = 400 -- higher = more work per cycle
-- Check for long-running transactions that block VACUUM
SELECT
pid,
now() - xact_start AS duration,
state,
query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND state != 'idle'
ORDER BY xact_start
LIMIT 10;
Avoid long-running transactions
VACUUM can only reclaim tuples that are invisible to all active transactions. A long-running transaction holds the visibility horizon open, preventing cleanup of any tuples created after it started. A single forgotten BEGIN in a monitoring session can block vacuuming across the entire database. Monitor pg_stat_activity for transactions that have been open for hours and terminate them. I cannot stress this enough — one idle transaction, left unattended, can undo every other precaution on this list.
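Termination can be done manually or, better, automated. A sketch of both, assuming the pid comes from a pg_stat_activity check like the one above (the role name `monitoring_user` is a hypothetical example):

```sql
-- Terminate sessions stuck in an open transaction for over an hour
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '1 hour';

-- Better: let PostgreSQL enforce a ceiling automatically.
-- Globally in postgresql.conf, or per role for risky clients:
-- idle_in_transaction_session_timeout = '10min'
ALTER ROLE monitoring_user SET idle_in_transaction_session_timeout = '10min';
```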
Maximize HOT updates
HOT (Heap-Only Tuple) updates occur when an update changes no indexed columns and the new row version fits on the same page. HOT updates skip index updates entirely, which means no dead index entries and no index bloat. Two things help: keep fillfactor below 100 (e.g., 70) to leave room for in-page updates, and avoid indexing columns that are frequently updated.
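The fillfactor change mentioned above is a one-line table option. A minimal sketch, reusing the hypothetical `hot_table` from the autovacuum example:

```sql
-- Leave 30% free space on each heap page so updated versions
-- can land on the same page (enabling HOT updates)
ALTER TABLE hot_table SET (fillfactor = 70);

-- Note: this only affects newly written pages; rewrite the table
-- (VACUUM FULL or pg_repack) to restructure existing pages

-- Verify the current setting
SELECT reloptions FROM pg_class WHERE relname = 'hot_table';
```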
-- Check HOT update effectiveness per table
SELECT
schemaname,
relname,
n_tup_upd,
n_tup_hot_upd,
round(n_tup_hot_upd::numeric / nullif(n_tup_upd, 0) * 100, 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY n_tup_upd DESC
LIMIT 10;
-- High hot_pct (80%+) means most updates avoid index writes
-- Low hot_pct means updates are creating index bloat too
How Gold Lapel relates
Gold Lapel monitors table and index bloat as part of its continuous database analysis. When we detect that query performance has degraded on a table carrying significant dead space, we correlate the two — distinguishing between a query that is slow because it lacks an index and a query that is slow because the table it scans is three times larger than its live data warrants.
This distinction matters because the fix is different. A missing index calls for an index recommendation. A bloated table calls for vacuum tuning or a rebuild. Gold Lapel surfaces both the symptom (slow query) and the underlying cause (bloat, missing index, stale statistics, or something else), so you address the right problem rather than adding indexes to a table that simply needs a thorough cleaning.