
pgstattuple Deep Dive: Measuring and Fixing PostgreSQL Table Bloat

The room is tidy; it is simply larger than necessary. Let us measure exactly how much.

March 27, 2026 · 18 min read
The canvas measures 40 by 30 inches. The actual artwork occupies roughly 11. The rest, I am informed, is dead space from earlier drafts the artist never reclaimed. We have scheduled a VACUUM.

What table bloat actually is (and why it matters)

I regret to inform you that your table may be larger than it needs to be. Allow me to explain why.

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to handle concurrent access. When a row is updated, PostgreSQL does not modify the existing row in place. Instead, it creates a new version of the row and marks the old version as a dead tuple. When a row is deleted, the row is not immediately removed — it is marked as dead. These dead tuples accumulate until VACUUM reclaims them.

This is where bloat enters the picture. VACUUM marks dead tuple space as reusable by future inserts, but it does not return that space to the operating system. The table file on disk remains the same size. Over time, a table can grow substantially larger than its live data — the gap between total size and live data size is bloat.
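The growth mechanics can be sketched with a toy model. Everything below is a deliberate simplification (it ignores HOT updates, fillfactor, page headers, and page-level packing), but it captures the essential behavior: updates grow the file until VACUUM runs, and once VACUUM keeps pace, new row versions reuse the freed space and the file stops growing — it just never shrinks.

```python
# Toy model of MVCC bloat: every UPDATE leaves a dead tuple behind, and
# plain VACUUM turns dead space into reusable free space WITHOUT
# shrinking the file. Hypothetical numbers; real tables also carry page
# headers, alignment padding, and HOT-update optimizations.

ROW_BYTES = 100  # assumed flat row size for the sketch

def simulate(live_rows, updates_between_vacuums, cycles):
    """Return (file_bytes, live_bytes) after repeated update/vacuum cycles."""
    file_bytes = live_rows * ROW_BYTES
    free_bytes = 0
    for _ in range(cycles):
        for _ in range(updates_between_vacuums):
            # A new row version reuses free space if any, else grows the file.
            if free_bytes >= ROW_BYTES:
                free_bytes -= ROW_BYTES
            else:
                file_bytes += ROW_BYTES
        # VACUUM: dead versions become reusable free space; file size unchanged.
        free_bytes += updates_between_vacuums * ROW_BYTES
    return file_bytes, live_rows * ROW_BYTES

file_bytes, live_bytes = simulate(live_rows=10_000,
                                  updates_between_vacuums=2_000,
                                  cycles=5)
print(f"file: {file_bytes} bytes, live data: {live_bytes} bytes")
```

Note that the file grows only in the first cycle; thereafter VACUUM's reclaimed space absorbs the churn, and bloat stabilizes at the size of one vacuum cycle's worth of updates.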

Bloat matters for several concrete reasons:

  • Increased I/O. Sequential scans read every page in the table, including pages that contain only dead tuples or free space. A table with 50% bloat requires twice the I/O for a full table scan.
  • Larger backups. Backup tools copy the entire table file. Bloated tables produce larger backups that take longer to create and restore.
  • Cache pollution. PostgreSQL's shared buffer cache stores table pages. Bloated pages waste buffer cache space that could hold live data, reducing the effective cache hit ratio.
  • Slower sequential scans. Any query that triggers a sequential scan reads more pages than necessary.

One important distinction, and I should be clear about this: some bloat is normal and healthy. A table with zero dead tuples and zero free space means VACUUM is running extremely aggressively or the table has no write activity. Healthy tables maintain a small buffer of free space for new inserts. The concern is excessive bloat — when dead tuple percentages climb into double digits or free space exceeds 30-40% of the table size.

For a broader treatment of bloat causes and prevention strategies, see the table bloat guide.

Installing and configuring pgstattuple

pgstattuple is one of the more accommodating PostgreSQL extensions to deploy. It does not require shared_preload_libraries, which means no server restart is needed:

SQL
CREATE EXTENSION pgstattuple;

Availability

pgstattuple is part of the PostgreSQL contrib package and is available on virtually every managed provider:

| Provider | pgstattuple Available |
|---|---|
| Self-managed | Yes |
| Amazon RDS | Yes |
| Google Cloud SQL | Yes |
| Supabase | Yes |
| Neon | Yes |
| Azure Flexible Server | Yes |

Permissions

On PostgreSQL 10 and later, running pgstattuple functions requires membership in the pg_stat_scan_tables role:

SQL
GRANT pg_stat_scan_tables TO your_monitoring_role;

On earlier versions, superuser privileges are required.

The I/O warning

I must be forthcoming about this before we proceed: pgstattuple performs a full sequential read of the table. It reads every page to count live tuples, dead tuples, and free space. For a 100 GB table, that means 100 GB of I/O.

On small to medium tables (under 10 GB), the impact is minimal. On very large tables during peak hours, the I/O load can affect other queries. Plan accordingly — run pgstattuple on large tables during maintenance windows or off-peak hours.
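To decide whether a run fits your window, a back-of-envelope estimate suffices. The sketch below assumes a 500 MB/s sequential read rate — substitute your storage's actual figure, and note it is a lower bound (it ignores buffer-cache hits, which help, and concurrent load, which hurts):

```python
def scan_seconds(table_bytes, read_mb_per_sec=500):
    """Rough lower bound on pgstattuple wall time: one sequential pass."""
    return table_bytes / (read_mb_per_sec * 1024 * 1024)

hundred_gb = 100 * 1024**3
print(f"~{scan_seconds(hundred_gb):.0f} s at 500 MB/s")
```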

Reading the pgstattuple output

The primary function returns a comprehensive row-level summary of the table's physical state:

SQL
SELECT * FROM pgstattuple('orders');

-- -[ RECORD 1 ]------+----------
-- table_len          | 104857600
-- tuple_count        | 150000
-- tuple_len          | 72000000
-- tuple_percent      | 68.66
-- dead_tuple_count   | 18500
-- dead_tuple_len     | 8880000
-- dead_tuple_percent | 8.47
-- free_space         | 23977600
-- free_percent       | 22.87

Column-by-column interpretation

| Column | Description | This Example |
|---|---|---|
| table_len | Total table size in bytes | 100 MB |
| tuple_count | Number of live tuples | 150,000 rows |
| tuple_len | Total size of live tuples in bytes | ~68.7 MB |
| tuple_percent | Percentage of table occupied by live data | 68.66% |
| dead_tuple_count | Number of dead (deleted/updated) tuples | 18,500 rows |
| dead_tuple_len | Total size of dead tuples in bytes | ~8.5 MB |
| dead_tuple_percent | Percentage of table occupied by dead tuples | 8.47% |
| free_space | Space marked as reusable by VACUUM | ~22.9 MB |
| free_percent | Percentage of table that is free space | 22.87% |

In this example, the table is 100 MB on disk, but only 68.7 MB is live data. 8.5 MB is dead tuples waiting for VACUUM, and 22.9 MB is free space (previously reclaimed by VACUUM but not returned to the OS). The effective bloat is roughly 31%. I'm afraid that is rather more than one would prefer.
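The 31% figure is plain arithmetic over the record above, worth verifying once by hand. In this (synthetic) record the three components sum exactly to table_len; on real tables a few percent of the remainder goes to page headers and alignment padding:

```python
# Values from the pgstattuple('orders') record above.
table_len  = 104_857_600
tuple_len  = 72_000_000
dead_len   = 8_880_000
free_space = 23_977_600

live_pct  = tuple_len / table_len * 100   # 68.66: matches tuple_percent
bloat_pct = 100 - live_pct                # everything that is not live data
print(f"live: {live_pct:.2f}%  effective bloat: {bloat_pct:.2f}%")
```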

What the numbers mean

dead_tuple_percent < 5% — Healthy. Autovacuum is keeping up with the rate of updates and deletes. No action needed.

dead_tuple_percent 5-20% — Elevated. Autovacuum may be falling behind. Investigate autovacuum settings — the table may need per-table tuning to trigger vacuuming more frequently.

dead_tuple_percent > 20% — Bloated. Dead tuples are accumulating faster than VACUUM can process them, or VACUUM is being blocked (long-running transactions holding back the visibility horizon). Action is needed.

free_percent > 30% — The table has substantial free space. VACUUM has been reclaiming dead tuples, but the physical table file is still oversized. New inserts will reuse this space, but if the table is mostly read-heavy, the free space just wastes I/O on every sequential scan.

dead_tuple_percent near 0% with free_percent > 50% — This is a specific and common pattern worth understanding. It means VACUUM has done its job admirably (dead tuples are reclaimed), but the table file never shrank. The fix is compaction — VACUUM FULL or pg_repack.
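If I may, the guidance above folds neatly into a small triage helper. The cutoffs are the rules of thumb from this section, not hard limits, and the returned strings are merely labels for the remediation paths discussed here:

```python
def triage(dead_pct, free_pct):
    """Map pgstattuple percentages to the remediation discussed above."""
    if dead_pct < 1 and free_pct > 50:
        return "compact: VACUUM FULL or pg_repack (space reclaimed, file oversized)"
    if dead_pct > 20:
        return "urgent: check for blocked VACUUM or long-running transactions"
    if dead_pct >= 5:
        return "tune autovacuum: it may be falling behind"
    if free_pct > 30:
        return "consider compaction if the table is read-heavy"
    return "healthy"

# The orders table from the example above: 8.47% dead, 22.87% free.
print(triage(8.47, 22.87))
```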

pgstattuple for indexes — pgstatindex()

pgstattuple also provides pgstatindex() for analyzing B-tree index health:

SQL
SELECT * FROM pgstatindex('idx_orders_customer_id');

-- -[ RECORD 1 ]------+---------
-- version            | 4
-- tree_level         | 2
-- index_size         | 22487040
-- root_block_no      | 3
-- internal_pages     | 8
-- leaf_pages         | 2736
-- empty_pages        | 412
-- deleted_pages      | 89
-- avg_leaf_density   | 62.40
-- leaf_fragmentation | 18.30

Key columns for diagnosing index bloat:

  • leaf_fragmentation — Percentage of leaf pages that are out of logical order. Values above 30% indicate significant fragmentation from page splits and deletions.
  • avg_leaf_density — Average fill percentage of leaf pages. Values below 50% mean index pages are less than half full — significant space waste. A healthy B-tree typically shows 70-90% leaf density.
  • empty_pages — Pages that once held data but are now completely empty. These are wasted space within the index file.

Index bloat is particularly harmful for index-only scans, which read leaf pages directly without visiting the heap. When those leaf pages are mostly empty, the scan reads substantially more pages — and performs substantially more I/O — than the actual data warrants.
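To put a rough number on the waste in the index above, one can multiply page counts by the slack implied by leaf density. This is an estimate only — it assumes the default 8 KB block size and ignores internal pages and the meta page:

```python
PAGE = 8192  # default PostgreSQL block size, in bytes

def leaf_waste_bytes(leaf_pages, avg_leaf_density, empty_pages=0):
    """Approximate unused space in B-tree leaf pages plus empty pages."""
    slack = leaf_pages * PAGE * (1 - avg_leaf_density / 100)
    return slack + empty_pages * PAGE

# Figures from the pgstatindex('idx_orders_customer_id') record above.
waste = leaf_waste_bytes(leaf_pages=2736, avg_leaf_density=62.40,
                         empty_pages=412)
print(f"~{waste / 1024**2:.1f} MB of slack in a ~21.4 MB index")
```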

Diagnosing bloat across your entire database

Checking tables one at a time is impractical for databases with hundreds of tables. If you'll permit me, the following query runs pgstattuple against all user tables and produces a ranked bloat report:

SQL — database-wide bloat report
SELECT
  c.relname AS table_name,
  pg_size_pretty(pg_relation_size(c.oid)) AS total_size,
  s.tuple_count AS live_tuples,
  s.dead_tuple_count AS dead_tuples,
  s.dead_tuple_percent,
  s.free_percent,
  COALESCE(
    to_char(psu.last_autovacuum, 'YYYY-MM-DD HH24:MI'),
    'never'
  ) AS last_autovacuum,
  psu.autovacuum_count
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
CROSS JOIN LATERAL pgstattuple(c.oid) s
LEFT JOIN pg_stat_user_tables psu
  ON psu.relid = c.oid
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND pg_relation_size(c.oid) > 10 * 1024 * 1024  -- skip tables under 10 MB
ORDER BY s.dead_tuple_percent DESC;

-- table_name  | total_size | live_tuples | dead_tuples | dead_tuple_percent | free_percent | last_autovacuum  | autovacuum_count
-- ------------+------------+-------------+-------------+--------------------+--------------+------------------+------------------
-- audit_log   | 340 MB     |      300000 |       98000 |              24.12 |         8.40 | 2026-03-20 03:15 |               12
-- sessions    | 85 MB      |       40000 |       15200 |              15.80 |        12.30 | 2026-03-25 14:22 |               45
-- orders      | 100 MB     |      150000 |       18500 |               8.47 |        22.87 | 2026-03-25 22:10 |               89
-- line_items  | 210 MB     |      620000 |        8100 |               2.10 |         5.20 | 2026-03-26 01:45 |              134
-- users       | 45 MB      |      200000 |        1200 |               0.85 |         3.10 | 2026-03-26 02:30 |               78

Allow me to read this report with you:

  • audit_log — 24% dead tuples on a 340 MB table. Autovacuum has only run 12 times and last ran a week ago. This table needs attention — autovacuum is not keeping up with its duties.
  • sessions — 16% dead tuples. Autovacuum runs frequently (45 times) but the table has high churn. Per-table autovacuum tuning may help.
  • orders — 8.5% dead tuples, 23% free space. Moderate bloat. Autovacuum is running regularly. Worth monitoring but no urgent action needed.
  • line_items and users — In good order. Low dead tuple percentages, autovacuum running frequently.

The pgstattuple performance warning

A word of caution. The database-wide query above runs pgstattuple on every qualifying table sequentially. For databases with many large tables, this can generate substantial I/O. Two faster alternatives exist:

pgstattuple_approx() (PostgreSQL 9.5+) — Samples pages instead of reading every one. Faster and less I/O-intensive, but returns approximate numbers:

SQL — approximate, 85% less I/O
SELECT * FROM pgstattuple_approx('orders');

-- -[ RECORD 1 ]--------+----------
-- table_len            | 104857600
-- scanned_percent      | 15.00
-- approx_tuple_count   | 148500
-- approx_tuple_len     | 71280000
-- approx_tuple_percent | 67.98
-- dead_tuple_count     | 17800
-- dead_tuple_len       | 8544000
-- dead_tuple_percent   | 8.15
-- approx_free_space    | 24320000
-- approx_free_percent  | 23.19

The scanned_percent column shows that only 15% of pages were read. The results are approximate but directionally accurate — sufficient for monitoring and alerting.

pg_stat_user_tables.n_dead_tup — Zero overhead, no I/O. This column is updated by autovacuum and provides an estimate of dead tuples without scanning the table:

SQL — zero overhead
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE n_live_tup > 1000
ORDER BY dead_pct DESC NULLS LAST;

This relies on autovacuum statistics and can be stale, but it costs nothing to run and is suitable for alerting dashboards.

When to use each:

  • pgstattuple — Accurate diagnosis. Use when investigating a specific table's bloat or performing periodic audits.
  • pgstattuple_approx — Regular monitoring. Use for weekly or daily scans of all tables.
  • pg_stat_user_tables — Continuous alerting. Use in dashboards and monitoring systems that poll frequently.

Fixing bloat — your options

Option 1 — Let autovacuum do its job

Before reaching for more involved remedies, consider whether the existing staff simply need better instructions. If dead_tuple_percent is high but autovacuum has not run recently (or has run infrequently), the root cause may be autovacuum configuration rather than a fundamental bloat problem. Tuning autovacuum to run more aggressively on high-churn tables is often the correct first step.

Key autovacuum settings in postgresql.conf:

postgresql.conf
# Default: VACUUM triggers when dead tuples exceed
# autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (row count)
autovacuum_vacuum_threshold = 50       # minimum dead tuples before triggering
autovacuum_vacuum_scale_factor = 0.2   # fraction of the table's rows that must be dead
autovacuum_naptime = 60                # minimum seconds between autovacuum runs per database

The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum triggers when 20% of the table consists of dead tuples. For a table with 1 million rows, that is 200,000 dead tuples before VACUUM runs — far too many for a high-churn table.
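The trigger point is simple arithmetic, shown here for the defaults and for a 1% scale factor on the same million-row table (reltuples is PostgreSQL's estimate of live rows):

```python
def autovacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum will vacuum the table."""
    return threshold + scale_factor * reltuples

# On a 1,000,000-row table:
print(f"defaults: {autovacuum_trigger(1_000_000):.0f} dead tuples")        # ~200,050
print(f"tuned:    {autovacuum_trigger(1_000_000, scale_factor=0.01):.0f}") # ~10,050
```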

Per-table overrides for hot tables:

SQL — per-table autovacuum tuning
-- Trigger autovacuum when just 1% of the table is dead
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);

-- Or use an absolute threshold: vacuum after 1,000 dead tuples
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0,
  autovacuum_vacuum_threshold = 1000
);

For the full treatment of autovacuum configuration, see the autovacuum tuning guide.

Option 2 — VACUUM FULL (the thorough approach)

VACUUM FULL rewrites the entire table from scratch, compacting all live tuples into the minimum number of pages and returning the freed space to the operating system. After a VACUUM FULL, the table file is as small as it can be.

SQL
VACUUM FULL orders;

I should be forthcoming about the cost: VACUUM FULL acquires an ACCESS EXCLUSIVE lock on the table. No reads or writes can proceed while it runs. For a 100 GB table, this can take hours. The table is completely unavailable during the operation.

When it is justified:

  • free_percent exceeds 50% on a table that is primarily read-heavy
  • The table can tolerate a maintenance window
  • The bloat is causing measurable performance degradation

Option 3 — pg_repack (online rebuilding)

pg_repack achieves the same compaction as VACUUM FULL without holding an exclusive lock for the duration. For production tables that cannot tolerate downtime, this is the approach I would recommend.

How it works:

  1. Creates a log table and installs a trigger on the original table to capture concurrent changes
  2. Creates a new, empty copy of the table
  3. Copies all live tuples into the new table (while the trigger captures concurrent writes)
  4. Replays captured changes into the new table
  5. Swaps the original and new tables atomically (brief exclusive lock)
  6. Drops the old table and log table

SQL — installation
CREATE EXTENSION pg_repack;

Shell — usage
# Repack a single table
pg_repack -d mydb -t orders

# Repack all tables in the database
pg_repack -d mydb

# Repack a specific index
pg_repack -d mydb -i idx_orders_customer_id

Caveats:

  • Disk space: pg_repack needs enough free disk space to hold a full copy of the table plus its indexes.
  • Trigger overhead: The triggers installed during the copy add some write overhead to the original table.
  • Primary key required: pg_repack requires the table to have a primary key or a unique index with no nullable columns.

Option 4 — pg_squeeze (background compaction)

pg_squeeze is an alternative to pg_repack that uses logical decoding instead of triggers to capture changes during compaction. This results in lower overhead on write-heavy tables during the rebuild process.

Key differences from pg_repack:

  • Uses logical replication slots instead of triggers — lower write overhead
  • Runs as a background worker process — can be scheduled and managed within PostgreSQL
  • Requires wal_level = logical in postgresql.conf

Choosing the right approach

If I may suggest a decision flow:

  1. dead_tuple_percent is high — Investigate autovacuum. Is it running frequently enough? Tune per-table settings first.
  2. Autovacuum is tuned but free_percent is high — Space has been reclaimed but the table file is oversized. Compaction is needed.
  3. Can the table tolerate downtime? VACUUM FULL is the simplest option.
  4. No downtime acceptable? pg_repack or pg_squeeze.

| Approach | Lock Level | Disk Overhead | Downtime | Best For |
|---|---|---|---|---|
| Autovacuum tuning | None | None | None | Dead tuples accumulating due to configuration |
| VACUUM FULL | ACCESS EXCLUSIVE | Temporary (rewrites in place) | Full table lock duration | Tables that can tolerate maintenance windows |
| pg_repack | Brief exclusive (during swap) | Full table copy | Minimal (seconds) | Production tables needing online compaction |
| pg_squeeze | Brief exclusive (during swap) | Full table copy | Minimal (seconds) | High-write tables with logical replication |

Monitoring bloat over time

A one-time diagnosis identifies current bloat, but bloat is a recurring condition. Tables that bloat once will bloat again unless the underlying cause is addressed. Ongoing vigilance is the proper approach.

Periodic snapshots

Create a history table and populate it on a schedule using pg_cron or an external scheduler:

SQL — bloat history with pg_cron
CREATE TABLE bloat_history (
  captured_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
  table_name    TEXT NOT NULL,
  table_size    BIGINT,
  live_tuples   BIGINT,
  dead_tuples   BIGINT,
  dead_pct      NUMERIC(5,2),
  free_pct      NUMERIC(5,2)
);

-- Run daily via pg_cron
SELECT cron.schedule(
  'bloat-snapshot',
  '0 3 * * *',  -- 3 AM daily
  $$
  INSERT INTO bloat_history (table_name, table_size, live_tuples, dead_tuples, dead_pct, free_pct)
  SELECT
    c.relname,
    pg_relation_size(c.oid),
    s.approx_tuple_count,
    s.dead_tuple_count,
    s.dead_tuple_percent,
    s.approx_free_percent
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  CROSS JOIN LATERAL pgstattuple_approx(c.oid) s
  WHERE c.relkind = 'r'
    AND n.nspname = 'public'
    AND pg_relation_size(c.oid) > 10 * 1024 * 1024
  $$
);

Note the use of pgstattuple_approx rather than pgstattuple for the scheduled job — the approximate version is fast enough for daily monitoring without generating excessive I/O.

Alerting thresholds

| Metric | Warning | Critical |
|---|---|---|
| dead_tuple_percent | > 15% on tables > 1 GB | > 25% on tables > 1 GB |
| free_percent | > 30% on tables > 1 GB | > 50% on tables > 1 GB |
| Time since last autovacuum | > 24 hours on high-churn tables | > 72 hours on any table > 1 GB |
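
As a sketch, the percentage rows of that table translate into an alert function. The 1 GB size gate and the cutoffs mirror the thresholds above; the last-autovacuum staleness row is omitted here for brevity:

```python
GB = 1024**3

def bloat_alert(dead_pct, free_pct, table_bytes):
    """Return 'critical', 'warning', or None per the thresholds above."""
    if table_bytes <= GB:
        return None  # small tables: bloat cost is usually negligible
    if dead_pct > 25 or free_pct > 50:
        return "critical"
    if dead_pct > 15 or free_pct > 30:
        return "warning"
    return None

# audit_log from the earlier report is only 340 MB, so it stays quiet here;
# the same percentages on a 2 GB table would page someone.
print(bloat_alert(dead_pct=24.12, free_pct=8.4, table_bytes=340 * 1024**2))
print(bloat_alert(dead_pct=24.12, free_pct=8.4, table_bytes=2 * GB))
```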

Trending with Grafana

Query the history table to build Grafana panels:

SQL — Grafana trend query
-- Dead tuple percentage trend for a specific table
SELECT
  captured_at,
  dead_pct
FROM bloat_history
WHERE table_name = 'orders'
  AND captured_at > now() - INTERVAL '30 days'
ORDER BY captured_at;

Visualizing bloat over time reveals patterns: does bloat spike after batch jobs? Does it accumulate slowly during the week and get reclaimed on weekends when autovacuum catches up? These patterns inform tuning decisions. The goal is to catch bloat trends before they become emergencies — proactive monitoring rather than reactive firefighting.
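One way to turn the history table into an early warning, short of a full dashboard, is a least-squares slope over recent snapshots. This is a sketch — in practice you would feed it (day, dead_pct) pairs queried from bloat_history, and alert when the slope stays positive for several days:

```python
def dead_pct_slope(samples):
    """Least-squares slope of dead_pct per day.

    samples: list of (day_number, dead_pct) pairs, e.g. from bloat_history.
    """
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(p for _, p in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * p for d, p in samples)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# A hypothetical week where dead_pct creeps up roughly a point per day:
week = [(0, 4.0), (1, 5.1), (2, 5.8), (3, 7.0), (4, 7.6), (5, 8.9), (6, 9.5)]
slope = dead_pct_slope(week)
print(f"{slope:+.2f} pct-points/day")
```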

What Gold Lapel sees that pgstattuple does not

pgstattuple measures table bloat accurately, but it cannot tell you whether that bloat is actually affecting your queries. A table with 20% dead tuples that is only accessed via index scans may have no measurable performance impact from its bloat. A table with 10% dead tuples that is frequently sequentially scanned may be severely affected. The numbers alone do not tell the full story.

Gold Lapel monitors bloat continuously as part of its proxy operation, correlating table bloat with actual query performance. The proxy knows when bloat is degrading query execution times versus when it is cosmetically high but functionally irrelevant. This distinction — impact-aware bloat monitoring — determines whether action is needed and when.

Understanding pgstattuple and the bloat measurement workflow is valuable regardless of tooling. Teams that know how to read pgstattuple output, interpret the numbers, and choose the right remediation approach make better decisions about when and how to compact their tables. That knowledge makes every tool in the stack — pgstattuple included — more useful.

Frequently asked questions