Index bloat
Your index has grown rather larger than its contents warrant. If I may, I'd like to show you where the dead entries are hiding — and how to restore order.
Index bloat is wasted space within a PostgreSQL index — a filing system that has accumulated outdated entries nobody will ever look up again. When rows are updated or deleted, the old index entries are not immediately removed; they linger as dead pointers until VACUUM attends to them. Even after dead entries are cleared, B-tree indexes do not automatically compact their page structure. Leaf pages that were once full can end up half-empty or entirely empty, and those pages stay allocated. The result is an index that is physically larger than it needs to be, forcing index scans to read more pages, wasting buffer cache on sparse data, and inflating disk usage. REINDEX or pg_repack will rebuild the index from scratch and restore it to proper working order.
What index bloat is
A B-tree index in PostgreSQL is a balanced tree of 8 KB pages. Leaf pages hold the actual index entries — each one a (key, pointer) pair that maps a column value to a heap tuple. When a row is updated, PostgreSQL creates a new heap tuple and inserts a new index entry pointing to it. The old index entry, still pointing to the now-dead heap tuple, remains in the leaf page until VACUUM removes it. Think of it as a directory where someone has crossed out old entries but never torn out the pages.
This creates two forms of wasted space. First, dead index entries occupy bytes on leaf pages — entries that will never be followed but must be skipped during scans. Second, when VACUUM does remove dead entries, or when page splits leave behind unevenly filled pages, the leaf pages become sparse. A page that could hold 200 index entries might contain only 40. PostgreSQL does not merge underfilled pages or return empty pages to the free space map in normal operation. The pages stay allocated, and the index stays large.
Over time, on tables with heavy update or delete activity, B-tree indexes can grow to two, three, or even ten times the size that the live data warrants. The index still works — queries still find the right rows — but every scan must thumb through pages of stale references to reach the ones that matter.
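One can reproduce the effect on a scratch database in a minute or two. The sketch below is purely illustrative — the table bloat_demo and its contents are hypothetical, and exact sizes will vary with version and configuration:

```sql
-- Hypothetical scratch table; run only on a throwaway database
CREATE TABLE bloat_demo (id bigint PRIMARY KEY, val text);
INSERT INTO bloat_demo SELECT g, 'x' FROM generate_series(1, 100000) g;

SELECT pg_size_pretty(pg_relation_size('bloat_demo_pkey'));  -- baseline size

-- Full-table updates: the heap pages are full, so the new tuples land on
-- new pages and each update adds a fresh index entry for the same key
UPDATE bloat_demo SET val = 'y';
UPDATE bloat_demo SET val = 'z';

VACUUM bloat_demo;  -- removes dead entries but does not compact pages

SELECT pg_size_pretty(pg_relation_size('bloat_demo_pkey'));  -- typically much larger
```

The second size check typically shows the primary-key index well above its baseline even after VACUUM, which is the point: the dead entries are gone, the pages remain.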
Why index bloat matters
Allow me to be direct: index bloat is not merely a storage concern. It affects query performance in several compounding ways, each one quietly making the next worse:
- More I/O per index scan — an index scan traverses the tree from root to leaf and then follows pointers to the heap. Sparse leaf pages mean the scan reads more pages to find the same number of live entries. For range scans that touch many leaf pages, the overhead multiplies.
- Wasted buffer cache — PostgreSQL caches index pages in shared_buffers. Sparse pages full of dead or empty space occupy cache slots that could hold denser, more useful data. Effective cache capacity drops.
- Reduced index-only scan effectiveness — index-only scans avoid heap access by reading values directly from the index. When index pages are sparse, the scan reads more pages for the same data, reducing the I/O advantage that makes index-only scans fast.
- Inflated backups — physical backups (pg_basebackup, WAL archiving) copy index files at their full size, bloat included. A 10 GB index that should be 2 GB adds 8 GB to every base backup. One pays to store the emptiness, which I find rather dispiriting.
- Slower writes — each insert into a bloated index costs only slightly more on its own, but sparse pages can add an extra level to the tree, lengthening every traversal. More importantly, the buffer cache wasted on bloated indexes puts pressure on all other operations.
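The buffer-cache symptom can be observed directly from pg_statio_user_indexes, which tracks block reads and cache hits per index since the last statistics reset. A quick check one might run:

```sql
-- Per-index cache behaviour since the last statistics reset
SELECT
    indexrelname,
    idx_blks_read,   -- index blocks read from disk
    idx_blks_hit,    -- index blocks found in shared_buffers
    round(idx_blks_hit::numeric
          / nullif(idx_blks_hit + idx_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_blks_read DESC
LIMIT 10;
```

A large index with a persistently low hit ratio is a candidate for the bloat measurements below — a sparse index simply does not fit in cache the way a dense one would.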
Measuring index bloat
Before one can fix what ails the index, one must know how far the situation has deteriorated. There are two approaches: precise measurement with pgstatindex() and quick estimation from catalog statistics.
pgstatindex (precise)
The pgstattuple extension provides pgstatindex(), which physically reads every page of a B-tree index and reports structural details. The key metric is avg_leaf_density — the percentage of leaf page space occupied by live index entries.
-- Measure index bloat precisely with pgstatindex
-- Note: pgstatindex() works only on B-tree indexes, so filter by access method
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
    s.indexrelname,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    i.version,
    i.tree_level,
    i.index_size AS raw_size,
    i.leaf_pages,
    i.empty_pages,
    i.deleted_pages,
    i.avg_leaf_density,
    i.leaf_fragmentation
FROM pg_stat_user_indexes s
JOIN pg_class c ON c.oid = s.indexrelid
JOIN pg_am am ON am.oid = c.relam AND am.amname = 'btree'
CROSS JOIN LATERAL pgstatindex(s.indexrelid) AS i
WHERE s.schemaname = 'public'
ORDER BY pg_relation_size(s.indexrelid) DESC
LIMIT 10;
-- avg_leaf_density: percentage of leaf page space used by live tuples
-- 90%+ = healthy
-- 50-70% = moderate bloat
-- below 50% = significant bloat, rebuild recommended
A word of caution: because pgstatindex() reads the entire index, it is slow on large indexes. I would not run this against a 50 GB index during peak hours. Off-hours or a replica, please.
Catalog estimation (quick)
For a faster but less precise check, compare index sizes from pg_stat_user_indexes against what you would expect for the table's row count. An index that is substantially larger than its table is often bloated. Indexes with zero scans since the last statistics reset may be both unused and bloated — candidates for DROP rather than rebuild.
-- Quick bloat estimation from pg_stat_user_indexes
-- Compares index size to the expected size for the number of rows
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 15;
-- An index that is larger than its table is a red flag
-- An index with zero idx_scan may be unused — and bloated
Ongoing monitoring
Track index sizes over time. A healthy index grows roughly in proportion to the table's live row count. An index whose size climbs steadily while the table's row count stays flat is accumulating bloat — the directory is expanding without new entries to justify it. Monitoring pg_stat_user_indexes at regular intervals catches this drift before it becomes a performance problem.
-- Monitor index usage alongside size to find bloated or unused indexes
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans_since_reset,
    pg_stat_get_last_analyze_time(relid) AS last_analyze
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND pg_relation_size(indexrelid) > 1024 * 1024  -- > 1 MB
ORDER BY pg_relation_size(indexrelid) DESC;
-- Unused indexes that are also bloated are candidates for DROP, not REINDEX
Fixing index bloat
Once an index is bloated, there is no gentle remedy — the only fix is to rebuild it. VACUUM removes dead entries but does not compact pages. Rebuilding creates a fresh, dense index structure from the current live data. The good news: PostgreSQL provides civilized ways to do this.
REINDEX
The built-in approach. REINDEX INDEX drops the old index and builds a new one from scratch. The result is a compact, bloat-free index. The cost: it acquires an ACCESS EXCLUSIVE lock on the index, blocking all reads and writes to the table for the duration. Effective, but not what I would call graceful under pressure.
-- REINDEX: rebuild a single index (locks writes)
REINDEX INDEX my_table_pkey;
-- REINDEX: rebuild all indexes on a table
REINDEX TABLE my_table;
-- REINDEX CONCURRENTLY (PostgreSQL 12+): rebuild without locking writes
-- Takes longer and uses more disk space temporarily, but does not block DML
REINDEX INDEX CONCURRENTLY my_table_pkey;
REINDEX TABLE CONCURRENTLY my_table;
REINDEX CONCURRENTLY
Available since PostgreSQL 12, and — if I may say — the approach I would recommend for any system that values its uptime. It builds a new index alongside the old one without blocking writes. Once the new index is complete and consistent, PostgreSQL swaps it in and drops the old one. It uses roughly double the disk space temporarily and takes longer than a regular REINDEX, but it does not cause downtime. A proper renovation that never closes the house to guests.
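While a concurrent rebuild runs, its progress is visible from another session via the pg_stat_progress_create_index view (also PostgreSQL 12+). A simple way to watch it, roughly:

```sql
-- Observe a running CREATE INDEX / REINDEX CONCURRENTLY from another session
SELECT
    command,        -- e.g. 'REINDEX CONCURRENTLY'
    phase,          -- e.g. 'building index: scanning table'
    blocks_done,
    blocks_total,
    tuples_done,
    tuples_total
FROM pg_stat_progress_create_index;
```

This is particularly reassuring on large indexes, where the build can take hours; one can see which phase it is in rather than simply waiting.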
pg_repack
pg_repack can rebuild individual indexes online, similar to REINDEX CONCURRENTLY but available on older PostgreSQL versions. It can also rebuild an entire table and all its indexes in one operation. For indexes specifically, it offers the same benefit: a compact rebuild without blocking DML.
pg_squeeze
An extension that automates compaction based on configurable bloat thresholds. Register a table, and pg_squeeze monitors it and rebuilds when bloat exceeds the threshold — including its indexes. Useful for hands-off maintenance on tables with predictable bloat patterns.
-- pg_repack: rebuild specific indexes online
-- Run from the command line, not SQL
-- pg_repack -d mydb --index my_table_pkey
-- pg_repack -d mydb --table my_table -- rebuilds table and all its indexes
-- pg_squeeze: automatic compaction when bloat exceeds a threshold
CREATE EXTENSION pg_squeeze;
SELECT squeeze.start_worker();
INSERT INTO squeeze.tables (tabschema, tabname, free_space_extra)
VALUES ('public', 'my_table', 20);
How Gold Lapel relates
Gold Lapel notices when an index scan takes longer than it should for the number of rows returned — more page reads, lower cache hit ratios than the query plan warrants. I correlate that with index size relative to table size to identify bloat as the likely cause. The distinction matters: a slow scan caused by a poorly chosen index needs a different index; a slow scan caused by a well-chosen but bloated index needs a rebuild. Different diagnosis, different remedy, and getting it wrong wastes your time on the wrong fix.
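The size correlation described above can be approximated by hand. The query below — a rough heuristic, not a precise bloat measure — computes each index's size relative to its table; the 0.5 cutoff is an illustrative threshold, not a rule:

```sql
-- Rough heuristic: indexes that are large relative to their table
SELECT
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    round(pg_relation_size(indexrelid)::numeric
          / nullif(pg_relation_size(relid), 0), 2) AS index_to_table_ratio
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 0.5 * pg_relation_size(relid)
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;
```

A high ratio alone does not prove bloat — a multi-column index on a narrow table can legitimately rival its table in size — which is why confirming with pgstatindex() before rebuilding is worth the trouble.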