pg_squeeze

Automatic table compaction triggered by bloat thresholds — the staff member who tidies the house before you notice it needs tidying.

Extension · March 21, 2026 · 7 min read

A well-run household does not wait for a guest to remark upon the dust. pg_squeeze is a PostgreSQL extension by CYBERTEC that automatically monitors registered tables for bloat and rebuilds them when reclaimable space exceeds a configurable threshold. It runs entirely server-side as a background worker, using logical decoding to capture concurrent changes — no external tools, no manual intervention, no long-held locks.

What pg_squeeze does

PostgreSQL's MVCC architecture means that UPDATE and DELETE operations leave dead tuples behind. VACUUM marks that space as reusable within the table, but apart from truncating empty pages at the very end of the file, it does not return space to the operating system. Over time, tables accumulate bloat: the physical file grows larger than the live data it contains, which degrades sequential scan performance and wastes disk space.

pg_squeeze solves this by periodically checking registered tables against the free space map. When a table's reclaimable space exceeds the configured threshold (default: 50%), pg_squeeze creates a compact copy of the table, replays any concurrent changes via a logical decoding replication slot, and atomically swaps the new file in place of the old one. The result is a fully compacted table with no long-held locks.
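
Should you wish to form your own rough impression of how much of a table is reclaimable, the contrib extension pgstattuple can report free and dead space. This is only a sketch: pg_squeeze performs its own calculation from the free space map, so these figures may not match what the worker sees. The table name public.orders is a placeholder, and the function requires superuser rights or membership in pg_stat_scan_tables.

```sql
-- Assumes the pgstattuple contrib extension is available on the server.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Approximate free and dead space for one table (public.orders is a placeholder).
-- pgstattuple_approx skips pages the visibility map marks as all-visible,
-- so it is cheaper than a full scan but only an estimate.
SELECT pg_size_pretty(table_len) AS table_size,
       approx_free_percent,
       dead_tuple_percent
FROM pgstattuple_approx('public.orders'::regclass);
```

A high approx_free_percent on a large table suggests a candidate worth registering, though the worker's own verdict is the one that counts.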

pg_repack takes the same fundamental copy-and-swap approach (though it captures concurrent changes with triggers rather than logical decoding), but there is a critical difference: pg_squeeze automates the entire lifecycle. You register a table once, define a schedule and a bloat threshold, and the background worker handles the rest. Where pg_repack is summoned when trouble arrives, pg_squeeze is the standing instruction to prevent trouble from arriving at all.

When to use pg_squeeze

pg_squeeze is most valuable when bloat management needs to be hands-off.

  • High-churn tables: tables with heavy UPDATE or DELETE traffic that accumulate dead space faster than routine VACUUM can make it reusable
  • Automated maintenance: when you want bloat remediation on a schedule without cron jobs or external scripts
  • Large tables that need CLUSTER behavior: pg_squeeze can physically reorder rows by an index during compaction, similar to CLUSTER but without holding an exclusive lock for the duration of the rebuild
  • Environments where pg_repack is too manual: if you are already running pg_repack ad hoc and want the same result with less operational overhead
  • Disk space pressure: when bloated tables are consuming significantly more storage than their live data requires

Installation and setup

pg_squeeze is a third-party extension maintained by CYBERTEC. It requires wal_level = logical because it uses logical decoding internally, and it must be loaded via shared_preload_libraries because it registers a background worker. Both settings require a PostgreSQL restart.

postgresql.conf + SQL
# 1. Add to postgresql.conf (requires restart); this file uses # for comments
wal_level = logical
max_replication_slots = 1   # or add 1 to your current value
shared_preload_libraries = 'pg_squeeze'   # append to any existing list

-- 2. Restart PostgreSQL, then create the extension
CREATE EXTENSION pg_squeeze;

After the restart and extension creation, pg_squeeze is ready to use — but it is not monitoring any tables yet. You need to register tables explicitly by inserting rows into squeeze.tables. The extension is diligent, not presumptuous. It will not touch a table it has not been asked to mind.
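
Before registering anything, it may be worth confirming that the restart actually applied the settings. The following is a minimal check using standard PostgreSQL commands and catalogs; nothing in it is specific to pg_squeeze beyond the extension name.

```sql
-- Confirm the prerequisites took effect after the restart.
SHOW wal_level;                  -- expected to report logical
SHOW shared_preload_libraries;   -- expected to include pg_squeeze
SHOW max_replication_slots;      -- needs at least one slot free for squeezing

-- Confirm the extension exists in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_squeeze';
```

If the last query returns no rows, CREATE EXTENSION was run in a different database or not at all.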

Registering tables for monitoring

Tables are registered by inserting rows into the squeeze.tables configuration table. Each row defines which table to monitor, how often to check it, and what bloat threshold should trigger compaction.

SQL
-- Register a table for automatic bloat monitoring
INSERT INTO squeeze.tables (tabschema, tabname, schedule, free_space_extra, min_size)
VALUES (
  'public',
  'orders',
  ('{0}', '{3}', NULL, NULL, '{0,1,2,3,4,5,6}'),  -- daily at 03:00
  50,   -- squeeze when 50%+ of space is reclaimable
  8     -- only process tables larger than 8 MB
);

Once registered, the background worker checks the table at the scheduled times. If the free space map indicates that reclaimable space exceeds free_space_extra percent and the table is at least min_size MB, a squeeze task is queued.
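
The size half of that condition can be approximated by hand. The query below is a sketch that compares each registered table's current heap size with its min_size setting. It assumes min_size is expressed in megabytes as described above, and it says nothing about the free-space half of the test, which the worker evaluates itself.

```sql
-- Compare each registered table's current size with its min_size (in MB).
-- This mirrors only the size condition, not the free-space check.
SELECT t.tabschema,
       t.tabname,
       t.min_size                               AS min_size_mb,
       pg_size_pretty(pg_relation_size(c.oid))  AS current_size,
       pg_relation_size(c.oid) >= t.min_size * 1024 * 1024 AS meets_min_size
FROM squeeze.tables AS t
JOIN pg_namespace   AS n ON n.nspname = t.tabschema
JOIN pg_class       AS c ON c.relname = t.tabname
                        AND c.relnamespace = n.oid;
```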

Schedule format

The schedule uses a cron-like composite type with five array fields: minutes, hours, days of month, months, and days of week. NULL means "any value" (wildcard).

Schedule examples
-- Schedule format: (minutes[], hours[], days_of_month[], months[], days_of_week[])
-- Days of week: 0 = Sunday, 1 = Monday, ..., 6 = Saturday

-- Every night at 02:00
('{0}', '{2}', NULL, NULL, NULL)

-- Weekdays at 22:30
('{30}', '{22}', NULL, NULL, '{1,2,3,4,5}')

-- First day of every month at 04:00
('{0}', '{4}', '{1}', NULL, NULL)
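
Because each field is an array, a single schedule can list several values per field. As an illustration, the statement below changes the schedule of a table that is already registered; public.orders is a placeholder, and the row-constructor syntax is assumed to be accepted in UPDATE exactly as it is in the INSERT shown earlier.

```sql
-- Reschedule an already registered table (placeholder: public.orders).
-- Fields: (minutes[], hours[], days_of_month[], months[], days_of_week[])
-- Here: at 01:00 and 01:30 on Saturdays and Sundays.
UPDATE squeeze.tables
SET schedule = ('{0,30}', '{1}', NULL, NULL, '{0,6}')
WHERE tabschema = 'public'
  AND tabname   = 'orders';
```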

Advanced configuration

The full set of options for squeeze.tables gives fine-grained control over when and how tables are processed.

SQL
-- Register with all options
INSERT INTO squeeze.tables (
  tabschema,
  tabname,
  schedule,
  free_space_extra,   -- bloat threshold percentage (default: 50)
  min_size,           -- minimum table size in MB (default: 8)
  vacuum_max_age,     -- max time since last VACUUM for FSM freshness (default: 1 hour)
  max_retry,          -- retry attempts on failure (default: 0)
  clustering_index,   -- optional: reorder rows by this index
  skip_analyze        -- skip ANALYZE after squeeze (default: false)
) VALUES (
  'public',
  'events',
  ('{0}', '{4}', NULL, NULL, '{0,6}'),  -- weekends at 04:00
  30,                  -- more aggressive: squeeze at 30% reclaimable
  64,                  -- only bother with tables over 64 MB
  '30 minutes',        -- require fresher FSM data
  2,                   -- retry twice on failure
  'events_created_at_idx',
  false
);
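
Since squeeze.tables is an ordinary table, a registration can be adjusted or withdrawn with plain DML. A brief sketch, using the placeholder table from the example above:

```sql
-- Relax the threshold for an already registered table.
UPDATE squeeze.tables
SET free_space_extra = 40
WHERE tabschema = 'public'
  AND tabname   = 'events';

-- Stop monitoring the table altogether by removing its row.
DELETE FROM squeeze.tables
WHERE tabschema = 'public'
  AND tabname   = 'events';
```

Removing the row only ends the monitoring; it does not touch the table itself.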

Manual squeeze and monitoring

You do not have to rely on the scheduler. If a matter requires immediate attention, the squeeze.squeeze_table() function lets you compact a table on the spot. And the squeeze.log table records the history of all operations — because good staff keep records.

SQL
-- Squeeze a table immediately (no schedule needed)
SELECT squeeze.squeeze_table('public', 'orders');

-- Squeeze and physically reorder rows by an index
SELECT squeeze.squeeze_table('public', 'orders', 'orders_created_at_idx');

SQL
-- Check registered tables and their configuration
SELECT tabschema, tabname, free_space_extra, min_size, schedule
FROM squeeze.tables;

-- View the most recent squeeze operations recorded in the log
SELECT * FROM squeeze.log ORDER BY started DESC LIMIT 20;

-- Start or stop the background worker manually
SELECT squeeze.start_worker();   -- start the background worker
SELECT squeeze.stop_worker();    -- stop the background worker
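
The start and stop functions do not report status, so a few read-only checks are useful alongside them. The queries below are a sketch under stated assumptions: the squeeze.errors table and the backend_type pattern are assumptions about your pg_squeeze version and should be verified against its documentation, whereas pg_replication_slots and pg_stat_activity are standard PostgreSQL views.

```sql
-- Failed attempts (assumes your pg_squeeze version provides squeeze.errors).
SELECT * FROM squeeze.errors LIMIT 20;

-- Replication slots currently present on the server. A slot left behind
-- by an interrupted squeeze would keep WAL from being recycled.
SELECT slot_name, plugin, active
FROM pg_replication_slots;

-- Backends whose reported type mentions "squeeze"
-- (the name pattern is an assumption, not a documented contract).
SELECT pid, backend_type, state, backend_start
FROM pg_stat_activity
WHERE backend_type ILIKE '%squeeze%';
```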

Cloud availability

Provider                        Status
Amazon RDS / Aurora             Not available
Google Cloud SQL                Available; enable via the cloudsql.enable_pg_squeeze database flag
Azure Database for PostgreSQL   Available; add to shared_preload_libraries and allowlist via azure.extensions
Supabase                        Not available
Neon                            Not available

pg_squeeze requires shared_preload_libraries access and wal_level = logical, which limits its availability on managed platforms. Self-hosted and VM-based deployments have no restrictions.
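
On any platform, managed or otherwise, one can ask the server directly whether the extension is packaged for it. This uses the standard pg_available_extensions view. A row appearing here does not guarantee that the provider permits the required preload and WAL settings.

```sql
-- Is pg_squeeze packaged for this server, and is it installed
-- in the current database?
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_squeeze';
```

No rows means the extension files are not present on the server. A NULL installed_version means it is available but CREATE EXTENSION has not yet been run in this database.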

How Gold Lapel relates

Allow me to draw a distinction. Gold Lapel and pg_squeeze attend to different floors of the same household. I operate at the query level — tracking execution patterns, identifying sequential scans on bloated tables, surfacing the queries that are quietly paying the cost of physical disarray. pg_squeeze operates at the storage level, physically removing the bloat by rebuilding the table.

The two are complementary, and I am glad of the help. I can tell you that a table is hurting performance; pg_squeeze can address the underlying physical cause. My optimizations — materialized views, index recommendations — help queries perform well even when some bloat is present, while pg_squeeze ensures the storage itself stays compact over time.

If you are running both, I benefit from the reduced I/O that comes with compacted tables. And pg_squeeze benefits from my awareness of your workload — knowing which tables matter most helps you prioritize which ones to register for monitoring. A household runs best when the staff communicate.

Frequently asked questions