HypoPG: Test PostgreSQL Indexes Without Creating Them
A proper dress rehearsal for your indexes — no stage, no audience, no commitment.
The problem: real indexes are expensive to test
You have a slow query. You suspect an index would help. The question is: which index?
A single-column B-tree on the filtered column? A composite index covering the WHERE clause and the ORDER BY? A partial index that targets only the active rows? Each choice has different implications for query speed, disk usage, and write overhead. And the only way to know which one the PostgreSQL query planner would actually use is to create it and run EXPLAIN ANALYZE.
That is the problem. Creating a real index on a large table is not free. On a table with 50 million rows, CREATE INDEX scans the entire table, sorts the data, writes the resulting B-tree pages to disk, and generates WAL for each page. Depending on table size and hardware, this can take minutes to hours. It holds locks that can interfere with concurrent operations. And if the index turns out to be useless — if the planner prefers a different access path — you drop it and try again, having spent the I/O for nothing.
Testing three candidate indexes means building and dropping three real indexes. On a busy production database, that is not a quick experiment. It is a planned maintenance window.
I find this state of affairs unsatisfactory. One should not have to rearrange the furniture to see whether it fits. HypoPG eliminates this cost entirely.
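For contrast, here is the loop HypoPG replaces — a sketch of the traditional trial-and-error approach, with an illustrative table and index name:

```sql
-- The traditional approach: build the real thing just to ask a question.
-- On a large table this scans, sorts, and writes the full index to disk.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Check whether the planner actually chose it
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If it did not help, drop it — the I/O and WAL are already spent
DROP INDEX idx_orders_customer_id;
```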
What HypoPG does
HypoPG is a PostgreSQL extension that creates hypothetical indexes — virtual indexes that exist only in the planner's cost model. They consume no disk space, generate no WAL, require no table scans, and are invisible to every session except the one that created them. Think of them as a dress rehearsal: the planner walks through the performance as if the index were on stage, but nothing has been built, nothing has been committed, and the audience — your production workload — is none the wiser.
When you create a hypothetical index with hypopg_create_index(), HypoPG registers it in the session's private memory. The PostgreSQL query planner sees it during EXPLAIN and factors it into cost estimates, exactly as it would for a real index. If the planner determines the hypothetical index produces a cheaper plan, it shows up in the EXPLAIN output. If not, the planner ignores it — the same decision it would make with the real thing.
The core workflow takes seconds:
- Run EXPLAIN on the slow query to see the current plan.
- Create a hypothetical index with hypopg_create_index().
- Run EXPLAIN again. If the planner uses the hypothetical index and the cost drops, the real index is worth building.
- Repeat with different candidates. Compare costs. Build only the winner.
No disk I/O. No locking. No WAL generation. No impact on other sessions. You can test dozens of index candidates in the time it would take to build one real index. Measure twice, cut once — or in this case, measure twelve times and cut only the one that fits.
Installation
HypoPG is a third-party extension, not part of PostgreSQL's contrib modules. Installation varies by platform:
- Debian / Ubuntu: apt install postgresql-17-hypopg (replace 17 with your PostgreSQL major version)
- RHEL / CentOS / Fedora: dnf install hypopg_17
- PGXN: pgxn install hypopg
- From source: clone the GitHub repository and run make && make install
An important distinction from many extensions, and one I appreciate: HypoPG does not require a shared_preload_libraries entry. It loads on demand when you run CREATE EXTENSION. No server restart. No coordination with the on-call engineer. Simply install and proceed.
-- Install the extension
-- No shared_preload_libraries entry required. No server restart needed.
CREATE EXTENSION hypopg;
-- Verify it loaded
SELECT * FROM hypopg_list_indexes();
-- indexrelid | indexname | nspname | relname | amname
-- ------------+-----------+---------+---------+--------
-- (0 rows)

On managed platforms: HypoPG is available on Amazon RDS and Aurora (PostgreSQL 13.11+, 14.8+, 15.3+, and later), Azure Database for PostgreSQL, Supabase, and Neon. Google Cloud SQL does not currently support it (AlloyDB does).
Creating hypothetical indexes
The function hypopg_create_index() takes a standard CREATE INDEX statement as its argument — the same SQL you would use to build the real index. It returns the OID and name assigned to the hypothetical index. The interface is pleasingly direct: you hand it the index you are considering, and it tells you whether the planner would use it.
Start with the current plan to establish a baseline. Here is a query doing a sequential scan on a million-row orders table:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- QUERY PLAN
-- -----------------------------------------------------------
-- Seq Scan on orders (cost=0.00..25412.00 rows=128 width=64)
-- Filter: (customer_id = 42)

Cost 25,412 for 128 rows. The planner is reading every row in the table. Now create a hypothetical index:
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id)'
);
-- indexrelid | indexname
-- ------------+----------------------------------
-- 13543      | <13543>btree_orders_customer_id

The hypothetical index exists. Run EXPLAIN again:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- QUERY PLAN
-- -----------------------------------------------------------------------
-- Index Scan using <13543>btree_orders_customer_id on orders
-- (cost=0.30..8.45 rows=128 width=64)
-- Index Cond: (customer_id = 42)

The planner switched from a sequential scan (cost 25,412) to an index scan (cost 8.45). A three-thousand-fold cost reduction, determined in milliseconds, without touching a single data page. The rehearsal went well.
The angle brackets in the index name (<13543>) indicate a hypothetical index. This naming convention makes it easy to distinguish hypothetical indexes from real ones in plan output.
Why EXPLAIN, not EXPLAIN ANALYZE
Allow me to address the most common mistake people make with HypoPG, because it is the kind of error that produces deeply confusing results.
EXPLAIN shows the planner's estimated plan — what it would do. EXPLAIN ANALYZE actually executes the query and reports what happened. The difference matters because hypothetical indexes do not physically exist. The planner can consider them during planning, but the executor cannot use them during execution.
When you run EXPLAIN ANALYZE with a hypothetical index present, PostgreSQL plans the query (potentially selecting the hypothetical index), then attempts to execute it. The executor discovers the index does not exist and falls back to the real execution path — typically a sequential scan. The result is confusing: the plan may not mention the hypothetical index at all, or the timings reflect the fallback path rather than the hypothetical one.
-- EXPLAIN works: the planner considers the hypothetical index
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using <13543>btree_orders_customer_id on orders ...
-- EXPLAIN ANALYZE executes the query.
-- The hypothetical index does not physically exist,
-- so PostgreSQL cannot use it for actual execution.
-- The plan falls back to the real execution path:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders (cost=0.00..25412.00 rows=128 width=64)
-- (actual time=0.03..142.87 rows=128 loops=1)
-- Filter: (customer_id = 42)
-- Rows Removed by Filter: 999872

The rule is simple: always use EXPLAIN (without ANALYZE) when testing hypothetical indexes. The cost estimates from the planner are reliable indicators of whether the real index would be selected and how much it would improve the query. You do not need actual execution to validate the index choice — the planner's cost model is the same model that will make the decision when the real index exists. Trust the rehearsal. That is the entire point.
Estimating index size before building
Knowing whether an index improves a query is half the decision. The other half is knowing what it will cost you. A covering index that drops query cost from 25,000 to 8 is appealing — but less so if it adds 12 GB to your disk footprint on a table that receives heavy writes. A responsible recommendation accounts for both sides of the ledger.
HypoPG provides hypopg_relation_size() to estimate the on-disk size of a hypothetical index without building it:
-- Estimate how large the hypothetical index would be on disk
SELECT pg_size_pretty(hypopg_relation_size(13543));
-- pg_size_pretty
-- ----------------
-- 2544 kB
-- For reference, check the table size
SELECT pg_size_pretty(pg_relation_size('orders'));
-- pg_size_pretty
-- ----------------
-- 806 MB

The estimate uses the same internal logic PostgreSQL uses for real index size predictions, based on table statistics. For the most accurate results, run ANALYZE on the table before creating the hypothetical index, so the statistics are current.
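Putting that advice into practice might look like this — a sketch that refreshes statistics and then sizes the candidate in one step (the returned OID varies per session, so this version reads it directly from the function's result set):

```sql
-- Refresh statistics so the size estimate reflects current data
ANALYZE orders;

-- Create the hypothetical index and estimate its size in one query
SELECT indexname,
       pg_size_pretty(hypopg_relation_size(indexrelid)) AS estimated_size
FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');
```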
You can also inspect all hypothetical indexes and retrieve their definitions:
-- List all hypothetical indexes in the current session
SELECT indexrelid, indexname, amname
FROM hypopg_list_indexes();
-- indexrelid | indexname | amname
-- ------------+----------------------------------+--------
-- 13543 | <13543>btree_orders_customer_id | btree
-- 13544 | <13544>btree_orders_status | btree
-- Get the CREATE INDEX statement for any hypothetical index
SELECT hypopg_get_indexdef(13543);
-- CREATE INDEX ON public.orders USING btree (customer_id)

The hypopg_get_indexdef() function returns the CREATE INDEX statement that would create the real index — ready to copy and execute once you have validated the candidate.
Supported index types
I should note that HypoPG does not support every PostgreSQL access method. The absence of GiST and GIN is the most notable gap. The supported types are:
| Access method | Status | Notes |
|---|---|---|
| B-tree | Supported | The default. Equality, range, and ORDER BY queries. |
| BRIN | Supported | Block Range Indexes for large, naturally ordered tables. |
| Hash | Supported | Equality-only lookups. Requires PostgreSQL 10 or later. |
| Bloom | Supported | Multi-column equality filters. Requires the bloom extension. |
| GiST | Not supported | Geometric, full-text, range type indexes cannot be tested with HypoPG. |
| GIN | Not supported | Full-text search, JSONB, and array indexes cannot be tested with HypoPG. |
Within the supported access methods, HypoPG handles composite indexes, partial indexes (with WHERE clauses), and covering indexes (with INCLUDE columns). Here are the access method examples:
B-tree
-- B-tree: equality and range queries (the default)
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id)'
);

BRIN
-- BRIN: large tables with naturally ordered data
-- Tiny index, good for append-only tables with correlated columns
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON events USING brin (created_at)'
);

BRIN indexes are a fraction of the size of B-tree indexes on large tables — often less than 1% of the table size. If your table has a naturally ordered column (like a timestamp on an append-only events table), a BRIN index can be a dramatically more space-efficient choice. HypoPG lets you verify that the planner would actually use it before committing.
Hash
-- Hash: equality-only lookups (PostgreSQL 10+)
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON sessions USING hash (session_token)'
);

Hash indexes support only equality operations (=). They cannot satisfy range queries, ORDER BY, or IS NULL checks. They are smaller than B-tree indexes for the same column but limited in utility. Since PostgreSQL 10, hash indexes are WAL-logged and crash-safe.
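To see the equality-only limitation for yourself, create a hypothetical hash index and compare an equality predicate with a range predicate — the planner should consider the index only for the former. (Table and column names here are illustrative, reusing the sessions example.)

```sql
-- Hypothetical hash index on a lookup column
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON sessions USING hash (session_token)'
);

-- Equality: the planner can use the hash index
EXPLAIN SELECT * FROM sessions WHERE session_token = 'abc123';

-- Range: a hash index cannot satisfy this, so expect no index scan
EXPLAIN SELECT * FROM sessions WHERE session_token > 'abc';
```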
Bloom
-- Bloom: multi-column equality filters on wide tables
-- Requires the bloom extension to be installed
CREATE EXTENSION IF NOT EXISTS bloom;
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON products USING bloom (category_id, brand_id, color_id)'
);

Bloom indexes are useful when queries filter on various combinations of columns from a wide table. A single bloom index on five columns can serve queries that filter on any two or three of those columns — where B-tree would require separate composite indexes for each combination. The trade-off is that bloom indexes support only equality checks and are lossy: they can return false positives, which PostgreSQL must recheck against the heap.
Advanced B-tree variants
Composite, partial, and covering indexes all use the B-tree access method. HypoPG supports all three:
-- Composite index
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (status, created_at DESC)'
);
-- Partial index
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id) WHERE status = ''pending'''
);
-- Covering index (INCLUDE)
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id) INCLUDE (total, created_at)'
);

Real example: three candidates, one winner
If you will permit me, I would like to walk through a proper diagnostic. This is where HypoPG earns its keep — not in the documentation, but in the moment when you have a slow query, three plausible index designs, and no appetite for building all three on a production table to see which one the planner prefers.
The scenario: a query fetches recent high-value orders for a specific customer. The table has one million rows and no relevant indexes.
-- The slow query: find recent high-value orders for a customer
EXPLAIN SELECT id, total, created_at
FROM orders
WHERE customer_id = 7821
AND status = 'completed'
AND total > 100.00
ORDER BY created_at DESC
LIMIT 20;
-- QUERY PLAN
-- ------------------------------------------------------------------
-- Limit (cost=26841.22..26841.27 rows=20 width=20)
-- -> Sort (cost=26841.22..26841.38 rows=64 width=20)
-- Sort Key: created_at DESC
-- -> Seq Scan on orders (cost=0.00..26841.00 rows=64 width=20)
-- Filter: ((customer_id = 7821) AND (status = 'completed')
--                     AND (total > 100.00))

Cost 26,841 with a sequential scan. Three columns in the WHERE clause, plus an ORDER BY. The question is not whether an index would help — that much is evident. The question is which index design gives the best result for the cost. Let us audition three candidates and see who deserves the role.
Candidate A: single-column index
-- Candidate A: single-column index on the highest-selectivity column
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id)'
);
EXPLAIN SELECT id, total, created_at FROM orders
WHERE customer_id = 7821 AND status = 'completed' AND total > 100.00
ORDER BY created_at DESC LIMIT 20;
-- Limit (cost=12.85..12.90 rows=20 width=20)
-- -> Sort (cost=12.85..12.87 rows=8 width=20)
-- Sort Key: created_at DESC
-- -> Index Scan using <13543>btree_orders_customer_id on orders
-- (cost=0.30..12.75 rows=8 width=20)
-- Index Cond: (customer_id = 7821)
-- Filter: ((status = 'completed') AND (total > 100.00))
SELECT pg_size_pretty(hypopg_relation_size(13543));
-- 2544 kB

Cost drops from 26,841 to 12.90. A considerable improvement — but observe the plan closely. The planner uses the index to find the customer's orders and then applies the remaining filters (status, total) as post-index conditions. The index is 2.5 MB. And there is still an explicit sort node — the index does not provide the ORDER BY ordering. Promising, but not yet polished.
Candidate B: composite index
-- Reset and test candidate B
SELECT hypopg_reset();
-- Candidate B: composite index matching all WHERE clause columns
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id, status, created_at DESC)'
);
EXPLAIN SELECT id, total, created_at FROM orders
WHERE customer_id = 7821 AND status = 'completed' AND total > 100.00
ORDER BY created_at DESC LIMIT 20;
-- Limit (cost=8.47..8.48 rows=4 width=20)
-- -> Index Scan using <13544>btree_orders_customer_id_status_created_at
-- on orders (cost=0.30..8.47 rows=8 width=20)
-- Index Cond: ((customer_id = 7821) AND (status = 'completed'))
-- Filter: (total > 100.00)
SELECT pg_size_pretty(hypopg_relation_size(13544));
-- 4216 kB

Cost drops further to 8.48. Now we are getting somewhere. The composite index on (customer_id, status, created_at DESC) satisfies both the WHERE clause and the ORDER BY. No explicit sort node in the plan — rows arrive in the correct order from the index itself. The index is 4.1 MB, roughly 65% larger than candidate A, but the elimination of the sort is worth noting.
Candidate C: covering index
-- Reset and test candidate C
SELECT hypopg_reset();
-- Candidate C: covering index — no heap access needed
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON orders (customer_id, status, created_at DESC)
INCLUDE (id, total)'
);
EXPLAIN SELECT id, total, created_at FROM orders
WHERE customer_id = 7821 AND status = 'completed' AND total > 100.00
ORDER BY created_at DESC LIMIT 20;
-- Limit (cost=0.30..0.82 rows=4 width=20)
-- -> Index Only Scan using <13545>btree_orders_customer_id_status_created_at
-- on orders (cost=0.30..8.12 rows=8 width=20)
-- Index Cond: ((customer_id = 7821) AND (status = 'completed'))
-- Filter: (total > 100.00)
SELECT pg_size_pretty(hypopg_relation_size(13545));
-- 5765 kB

Cost drops to 0.82. There it is. The covering index includes all columns the query needs (id and total via INCLUDE), so the planner uses an Index Only Scan — it never touches the heap table at all. The index is 5.6 MB. The query is answered entirely from the index, as if the table were not even in the room.
Comparison
| Candidate | Estimated cost | Scan type | Sort | Index size |
|---|---|---|---|---|
| No index (baseline) | 26,841 | Seq Scan | Explicit sort | - |
| A: (customer_id) | 12.90 | Index Scan | Explicit sort | 2.5 MB |
| B: (customer_id, status, created_at) | 8.48 | Index Scan | Index-ordered | 4.1 MB |
| C: B + INCLUDE (id, total) | 0.82 | Index Only Scan | Index-ordered | 5.6 MB |
Candidate C is the clear winner for read performance. The cost is 0.82 versus the baseline of 26,841 — a 32,000x improvement. The Index Only Scan means zero heap accesses, which matters under concurrent load when the buffer cache is contested.
The trade-off is index size and write amplification. Candidate C's covering index is 5.6 MB versus 2.5 MB for candidate A. Every INSERT and UPDATE to the indexed columns must maintain a larger index structure. For a read-heavy workload, candidate C is the right choice. For a write-heavy workload where this query runs infrequently, candidate A may be sufficient. There is no universal answer — but there is a right answer for your particular workload, and now you have the data to find it.
The entire comparison took seconds. Three candidates, each given a fair rehearsal, the winner identified with evidence rather than instinct. This is how index decisions should be made.
Workflow: pg_qualstats + HypoPG
HypoPG answers "would this index help?" But it does not tell you which indexes to test in the first place. A proper procedure requires both diagnosis and validation. That is where pg_qualstats comes in.
pg_qualstats is a companion extension (from the same team that builds HypoPG) that tracks which columns appear in WHERE clauses, JOIN conditions, and other predicates across your real workload. It identifies columns that are frequently filtered on but lack adequate index support — exactly the candidates you should then validate with HypoPG.
Unlike HypoPG, pg_qualstats does require shared_preload_libraries and a server restart:
-- pg_qualstats requires shared_preload_libraries (server restart)
-- postgresql.conf:
-- shared_preload_libraries = 'pg_qualstats, pg_stat_statements'
CREATE EXTENSION pg_qualstats;
CREATE EXTENSION pg_stat_statements;

After pg_qualstats has collected data from your real workload (hours or days of representative traffic), it can generate index suggestions. The pg_qualstats_index_advisor() function produces CREATE INDEX statements for columns that would benefit from indexes:
-- Step 1: Let pg_qualstats collect data from real workload
-- (Run your application for a representative period)
-- Step 2: Check what pg_qualstats recommends
SELECT v
FROM json_array_elements(
pg_qualstats_index_advisor(min_filter => 50)->'indexes'
) v;
-- "CREATE INDEX ON public.orders (customer_id)"
-- "CREATE INDEX ON public.orders (status, created_at)"
-- "CREATE INDEX ON public.line_items (order_id, product_id)"
-- Step 3: Test each suggestion with HypoPG before building
SELECT * FROM hypopg_create_index(
'CREATE INDEX ON public.orders (customer_id)'
);
-- Step 4: Verify the planner would use it
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Does the plan change? If yes, the index is worth building.
-- If no, skip it — save the disk space and write overhead.
-- Step 5: Build only the indexes that passed the test
CREATE INDEX CONCURRENTLY idx_orders_customer_id
ON orders (customer_id);

The combined workflow is:
- pg_qualstats collects predicate data from your real workload — no guesswork about which queries matter.
- pg_qualstats_index_advisor() proposes candidate indexes based on observed access patterns.
- HypoPG tests each candidate without building it — confirming the planner would actually use the index and estimating its size.
- You build only the validated indexes, using CREATE INDEX CONCURRENTLY to avoid locking the table.
This is the proper way to manage indexes: evidence-based identification, risk-free validation, and targeted implementation. Observe, rehearse, then commit. No guessing, no wasted builds, no "let us add an index and see what happens." That approach has always struck me as the database equivalent of buying a suit without trying it on.
Both pg_qualstats and HypoPG are components of the PoWA (PostgreSQL Workload Analyzer) suite, which provides a web interface that combines these steps into a single dashboard. If you prefer a graphical workflow, PoWA is worth investigating.
Cleanup and session scope
Hypothetical indexes are scoped to the session that created them. They are stored in the connection's private memory and are completely invisible to other sessions. When the session ends — whether by a normal disconnect, a connection pool recycle, or a crash — all hypothetical indexes vanish without a trace. The stage is struck automatically.
For explicit cleanup during a session:
-- Remove a specific hypothetical index
SELECT hypopg_drop_index(13543);
-- Remove all hypothetical indexes at once
SELECT hypopg_reset();
-- Or simply disconnect — all hypothetical indexes vanish with the session

A practical pattern when testing multiple candidates: call hypopg_reset() between each test to ensure a clean slate. If you create hypothetical indexes A, B, and C simultaneously, the planner considers all three at once and may choose a combination that would not exist in practice. Test one at a time — create, EXPLAIN, reset, next candidate. Each candidate deserves an individual audition, not a group reading.
The session-scoped design has two important implications:
- Safety: there is no way to accidentally leave a hypothetical index "installed" that affects other users or application queries. Disconnect and everything is clean.
- No persistence: if you want to repeat a test, you must recreate the hypothetical indexes. There is no way to save them across sessions. This is by design — hypothetical indexes are a diagnostic tool, not a deployment mechanism.
How Gold Lapel automates this
The workflow described in this guide — identify slow queries, propose index candidates, validate with HypoPG, build the winners — is effective but manual. Someone needs to run pg_qualstats, review the suggestions, test each candidate, make the build/skip decision, and execute CREATE INDEX CONCURRENTLY. Then repeat periodically as workload patterns shift. It is, if I may say so, the sort of work that benefits from having someone attend to it full-time.
Gold Lapel performs this cycle continuously. The proxy observes every query in flight, identifies access patterns that would benefit from an index, evaluates candidates using the planner's cost model, and applies the winning index automatically. When a workload shifts — a new feature launches, a report query changes, traffic patterns evolve — Gold Lapel adapts without waiting for someone to notice and intervene.
HypoPG remains an excellent tool for ad-hoc investigation: diagnosing a specific slow query during development, testing an index idea during incident response, or exploring optimization options during a performance review. Gold Lapel handles the same class of problem in production, continuously, so that the dress rehearsal and the opening night are one and the same.