HypoPG in CI/CD: Automated Index Testing for PostgreSQL
Every migration that adds an index, automatically tested for impact before it merges.
The problem — deploying indexes blind
I have observed, in a great many production systems, a pattern that deserves frank discussion. Most teams deploy indexes based on intuition. Someone notices a slow query, examines the query plan, picks a column that appears in the WHERE clause, and adds an index. Perhaps they run EXPLAIN ANALYZE in a staging environment first. Perhaps they do not.
The index lands in production. If it helps, nobody notices — the slow query disappears and life continues. If it hurts, the team discovers it hours or days later through elevated CPU, increased replication lag, or user complaints. If it simply does nothing — the planner ignores it in favor of a better plan — it sits on disk consuming space and slowing writes indefinitely, because nobody goes back to check.
This gap between "proposed an index" and "verified it helps" is the problem. Indexes are not free. On large tables, CREATE INDEX acquires locks, consumes I/O, and can take minutes to hours. Even CREATE INDEX CONCURRENTLY generates sustained write amplification. Every index that exists adds overhead to every INSERT, UPDATE, and DELETE on the table. Deploying a useless or harmful index carries a cost that accrues quietly.
HypoPG addresses this gap with considerable elegance. It creates hypothetical indexes that exist only in the planner's memory. No disk writes, no locks, no maintenance overhead. The PostgreSQL planner evaluates them as if they were real and reports whether it would use them and what cost reduction it estimates.
Running HypoPG interactively is valuable. Integrating it into CI/CD is transformative. Every migration that adds an index is automatically tested for impact before it merges. The pull request includes a report showing exactly which queries improved, which did not, and whether the planner would actually use the proposed index. Deploying a useless index becomes a conscious decision, not an accident.
HypoPG fundamentals — a brief refresher
For those already acquainted with HypoPG, this is a brief refresher. For a comprehensive introduction, see the HypoPG guide.
Create a hypothetical index:
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

 indexrelid |            indexname
------------+----------------------------------
     205401 | <205401>btree_orders_customer_id

The hypothetical index is visible only to the current session. It does not touch disk, does not acquire locks, and disappears when the session ends.
Check whether the planner would use the hypothetical index:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using <205401>btree_orders_customer_id on orders
   (cost=0.04..8.06 rows=1 width=48)
   Index Cond: (customer_id = 42)

An important note: use EXPLAIN, not EXPLAIN ANALYZE. Hypothetical indexes cannot be used for actual execution — they exist only in the planner's cost estimation. For a deeper understanding of reading plan output, see the EXPLAIN output guide.
List and clean up hypothetical indexes:
SELECT * FROM hypopg_list_indexes();

SELECT hypopg_drop_index(indexrelid) FROM hypopg_list_indexes();

The CI/CD integration architecture
What you need
Let me walk through the arrangement. The integration requires four components:
A PostgreSQL instance accessible from CI. Options include a Docker container spun up in the pipeline (simplest), a dedicated test database on a shared server, or a Neon branch created per pipeline run. The instance must have the current schema loaded — without the proposed index.
HypoPG installed on that instance. HypoPG requires no special permissions beyond CREATE EXTENSION and adds no overhead when not actively in use.
A set of representative queries. These are the queries the proposed index is meant to improve, plus other critical queries that could be affected. They can live in a dedicated file (e.g., queries/critical_queries.sql) or be extracted from your ORM's generated SQL.
A testing script. A script that creates hypothetical indexes, runs EXPLAIN on each query, compares costs, and reports results.
The testing flow
The pipeline follows a straightforward sequence:
- A migration file adds CREATE INDEX ... — this is the change being tested.
- The CI pipeline starts a PostgreSQL instance and loads the current schema (without the new index).
- The testing script parses the migration to extract the index definition.
- The script runs EXPLAIN on the critical query set — capturing baseline estimated costs.
- The script creates hypothetical indexes matching the migration's index definitions via HypoPG.
- The script runs EXPLAIN again on the same queries — capturing estimated costs with hypothetical indexes.
- The script compares costs and generates a report: which queries improved, by how much, and whether the planner would actually use the index.
- The report is posted as a PR comment or saved as a pipeline artifact.
The entire process adds seconds to the pipeline. Seconds. HypoPG creates hypothetical indexes in microseconds, and EXPLAIN (without ANALYZE) returns instantly because it only estimates costs without executing the query.
GitHub Actions implementation
The workflow file
This workflow triggers on pull requests that modify migration files. It spins up a PostgreSQL container with HypoPG, applies the current schema, and runs the index testing script.
# .github/workflows/index-test.yml
name: Index Impact Test

on:
  pull_request:
    paths:
      - 'migrations/**'
      - 'db/migrate/**'

jobs:
  test-indexes:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: index_test
          POSTGRES_USER: ci_user
          POSTGRES_PASSWORD: ci_password
        ports:
          - 5432:5432
        options: >-
          --health-cmd="pg_isready -U ci_user -d index_test"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5

    env:
      DATABASE_URL: postgresql://ci_user:ci_password@localhost:5432/index_test

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install HypoPG
        run: |
          # Install the extension inside the service container — packages
          # installed on the runner host are not visible to the postgres:16 image
          docker exec ${{ job.services.postgres.id }} bash -c \
            "apt-get update && apt-get install -y postgresql-16-hypopg"
          psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS hypopg;"

      - name: Apply current schema
        run: |
          # Apply all existing migrations (adjust for your migration tool)
          for f in migrations/*.sql; do
            psql "$DATABASE_URL" -f "$f"
          done

      - name: Identify new index migrations
        id: find-indexes
        run: |
          # Flatten to one space-separated line so it survives GITHUB_OUTPUT
          CHANGED_FILES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'migrations/*.sql' 'db/migrate/*.rb' | tr '\n' ' ')
          echo "changed_files=$CHANGED_FILES" >> "$GITHUB_OUTPUT"

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install psycopg2-binary

      - name: Run index impact test
        id: index-test
        run: |
          python scripts/test_index_impact.py \
            --database-url "$DATABASE_URL" \
            --migration-files ${{ steps.find-indexes.outputs.changed_files }} \
            --query-file queries/critical_queries.sql \
            --output-format markdown \
            --output-file index_report.md

      - name: Post results as PR comment
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('index_report.md', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: report
            });

      - name: Fail on regression
        run: |
          python scripts/test_index_impact.py \
            --database-url "$DATABASE_URL" \
            --migration-files ${{ steps.find-indexes.outputs.changed_files }} \
            --query-file queries/critical_queries.sql \
            --check-regression \
            --regression-threshold 10

The path filter (paths: migrations/**, db/migrate/**) ensures the workflow only runs when migration files change.
The index testing script
This Python script is the core of the pipeline. It accepts a migration file, extracts CREATE INDEX statements, and compares EXPLAIN costs with and without the hypothetical index.
#!/usr/bin/env python3
"""
test_index_impact.py — Tests index impact via HypoPG.
Compares EXPLAIN costs with and without hypothetical indexes.
"""
import argparse
import json
import re
import sys

import psycopg2


def extract_index_statements(migration_file):
    """Pull every CREATE INDEX statement out of a migration file."""
    with open(migration_file, "r") as f:
        content = f.read()
    pattern = (
        r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+(?:CONCURRENTLY\s+)?"
        r"(?:IF\s+NOT\s+EXISTS\s+)?\S+\s+ON\s+[^;]+;"
    )
    return re.findall(pattern, content, re.IGNORECASE | re.MULTILINE)


def get_explain_cost(cursor, query):
    """Run EXPLAIN (FORMAT JSON) and return the top node's cost and type."""
    # Crude placeholder substitution so parameterized queries can be planned
    explain_query = query.replace("$1", "'1'").replace("$2", "'1'")
    cursor.execute(f"EXPLAIN (FORMAT JSON) {explain_query}")
    plan = cursor.fetchone()[0]
    top_node = plan[0]["Plan"]
    return {
        "total_cost": top_node["Total Cost"],
        "node_type": top_node["Node Type"],
        "plan_text": json.dumps(plan, indent=2),
    }


def compare_costs(baseline, with_index):
    """Compare baseline vs. hypothetical-index estimates for one query."""
    if baseline["total_cost"] == 0:
        delta_pct = 0
    else:
        delta_pct = ((with_index["total_cost"] - baseline["total_cost"])
                     / baseline["total_cost"]) * 100
    return {
        "baseline_cost": baseline["total_cost"],
        "hypo_cost": with_index["total_cost"],
        "delta_pct": delta_pct,
        "improvement": delta_pct < 0,
        "regression": delta_pct > 10,
        "uses_index": "Index" in with_index["node_type"],
        "baseline_node": baseline["node_type"],
        "hypo_node": with_index["node_type"],
    }

Save the script as scripts/test_index_impact.py in your repository and make it executable.
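The excerpt above extracts the statements but does not show how they reach hypopg_create_index. One detail worth handling: clauses such as CONCURRENTLY and IF NOT EXISTS are meaningless for a hypothetical index and may not be accepted, so the script can normalize them away first. A minimal sketch (the helper name is mine, not part of the script above):

```python
import re


def normalize_for_hypopg(create_index_stmt: str) -> str:
    """Strip clauses that make no sense for a hypothetical index
    before passing the statement to hypopg_create_index()."""
    stmt = re.sub(r"\bCONCURRENTLY\s+", "", create_index_stmt, flags=re.IGNORECASE)
    stmt = re.sub(r"\bIF\s+NOT\s+EXISTS\s+", "", stmt, flags=re.IGNORECASE)
    return stmt.strip()


# The script would then run, for each extracted statement:
#   cursor.execute("SELECT * FROM hypopg_create_index(%s)", (normalized,))
```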
Posting results as a PR comment
The workflow uses actions/github-script to post the markdown report directly as a PR comment. The report includes a table with query name, baseline cost, hypothetical cost, delta percentage, plan change description, and whether the planner uses the index. A summary line reports how many queries improved and the average cost change. Regression warnings flag any query whose cost increased above the threshold.
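The report generator itself is not shown in the script excerpt; a minimal sketch of one, assuming the result dictionaries produced by compare_costs keyed by query name, might look like this:

```python
def build_markdown_report(results: dict) -> str:
    """Render compare_costs() results as the markdown table described above."""
    lines = [
        "## Index Impact Report",
        "",
        "| Query | Baseline cost | Hypo cost | Delta | Plan change | Uses index |",
        "|---|---|---|---|---|---|",
    ]
    for name, r in results.items():
        plan_change = f"{r['baseline_node']} -> {r['hypo_node']}"
        uses = "yes" if r["uses_index"] else "no"
        lines.append(
            f"| {name} | {r['baseline_cost']:.2f} | {r['hypo_cost']:.2f} "
            f"| {r['delta_pct']:+.1f}% | {plan_change} | {uses} |"
        )
    improved = sum(1 for r in results.values() if r["improvement"])
    avg = sum(r["delta_pct"] for r in results.values()) / len(results)
    lines += ["", f"{improved}/{len(results)} queries improved; "
                  f"average cost change {avg:+.1f}%."]
    # Flag regressions so they stand out in the PR comment
    for name, r in results.items():
        if r["regression"]:
            lines.append(f"WARNING — regression: {name} ({r['delta_pct']:+.1f}%)")
    return "\n".join(lines)
```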
Maintaining the critical query set
Which queries to include
The critical query set is the list of queries tested against every proposed index. Choosing the right queries determines whether the pipeline catches real problems or produces noise — and I have strong views on the distinction.
Start with the queries that define your application's performance.
Top queries by total execution time. Query pg_stat_statements on your staging or production database:
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

These 20 queries represent the largest share of your database's work. Any index change that affects them — positively or negatively — matters.
Queries sensitive to index changes. Look for queries with WHERE clauses on indexed columns, JOINs on foreign keys, and ORDER BY on indexed columns.
Queries that have caused production incidents. If a query has been the subject of a performance investigation, it belongs in the critical set permanently.
A focused set of 10-30 queries is manageable and meaningful. Include an annotation describing the expected access pattern:
-- name: Orders by customer
-- expected: Index Scan on orders(customer_id)
SELECT * FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 20;
-- name: Recent high-value orders
-- expected: Index Scan on orders(total, created_at)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.total > 1000
  AND o.created_at > NOW() - INTERVAL '30 days';

Keeping queries current
The critical query set drifts as the application evolves. Review the critical query set quarterly, or when pg_stat_statements shows a new query in the top 20 by execution time. Version the query file alongside the application code. It is part of your project's performance contract — as fundamental as the schema itself. A contract that is not maintained is not a contract.
Integrating with pg_qualstats for index recommendations
pg_qualstats tracks which columns appear in WHERE clauses, JOIN conditions, and ORDER BY during actual query execution. The integration creates a closed loop:
- pg_qualstats identifies candidate indexes on staging or production.
- A developer writes a migration to add the recommended index.
- HypoPG in CI validates the candidate. The pipeline creates a hypothetical version and tests it against the critical query set.
- The migration deploys only if validated. If the planner ignores the hypothetical index or if it causes regressions, the PR is flagged for review.
This workflow prevents a failure mode I have seen more often than I would prefer: pg_qualstats or a similar tool recommends an index based on predicate frequency, a developer adds it without testing, and it turns out the planner prefers a different plan entirely.
For the detailed pg_qualstats recommendation workflow, see the pg_qualstats + HypoPG guide.
Regression detection — catching index changes that hurt
What regression looks like
I'm afraid this section contains unwelcome news for those who view indexes as purely beneficial. Adding an index does not always improve performance. In some cases, it makes specific queries slower.
Low-cardinality trap. Adding an index on orders(status) where status has 3 distinct values across millions of rows. The planner might choose an Index Scan when a Sequential Scan with a filter would be faster:
-- Before: Sequential Scan (fast for low selectivity)
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
-- Seq Scan on orders (cost=0.00..45000.00 rows=1650000 width=48)
-- Filter: (status = 'shipped')
-- After adding index on status: Index Scan (slower)
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
-- Index Scan using idx_orders_status on orders (cost=0.43..52000.00 rows=1650000 width=48)
-- Index Cond: (status = 'shipped')

The estimated cost increased. The planner chose the index, but the index is slower for this query because it returns too many rows.
For guidance on designing composite indexes and choosing the right index type, see the linked guides.
Automated regression flags
The CI script should flag any query where the estimated cost increases after the hypothetical index is added. Set a threshold — small cost fluctuations (under 10%) can be planner estimation noise:
# In the CI script
REGRESSION_THRESHOLD = 10.0  # percent

for result in results:
    if result["delta_pct"] > REGRESSION_THRESHOLD:
        print(f"REGRESSION: {result['query_name']} "
              f"cost increased {result['delta_pct']:.1f}%")

Block the PR on regression. The CI workflow's final step checks for regressions and exits with a non-zero code if any are found.
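The blocking behavior itself is a small piece of glue; a sketch of how the --check-regression path (the flag name used in the workflow) might end, assuming results keyed by query name:

```python
import sys


def check_regressions(results: dict, threshold: float = 10.0) -> list[str]:
    """Return the names of queries whose estimated cost rose above the threshold."""
    return [name for name, r in results.items() if r["delta_pct"] > threshold]


# In the --check-regression path:
#   regressions = check_regressions(results, threshold=args.regression_threshold)
#   sys.exit(1 if regressions else 0)
```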
For cases where the regression is acceptable — the index helps 9 queries and hurts 1 insignificant one — use a "regression acknowledged" mechanism:
-- regression-acknowledged: query "low-value status check" regresses 15%
-- reason: this query runs <100 times/day and the cost increase is marginal
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);

The key is making regressions visible and intentional, not silent. A known trade-off is a decision. An unknown trade-off is an accident waiting to present itself.
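The comment convention above is one I am proposing, not a standard; one way to honor it is to have the CI script collect acknowledged query names from the migration file and exclude them from the blocking check. A sketch:

```python
import re


def acknowledged_regressions(migration_sql: str) -> set:
    """Collect query names from '-- regression-acknowledged: query "..."' comments."""
    pattern = r'--\s*regression-acknowledged:\s*query\s+"([^"]+)"'
    return set(re.findall(pattern, migration_sql))


# The blocking check then skips acknowledged queries:
#   blocking = [name for name in regressions
#               if name not in acknowledged_regressions(migration_text)]
```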
Beyond indexes — testing other schema changes
The EXPLAIN comparison technique is not limited to indexes. Any schema change that affects query plans can be tested the same way.
Partitioning. Compare EXPLAIN output before and after applying a partition scheme. Verify that partition pruning works for the critical queries:
-- Before partitioning: full table scan
EXPLAIN SELECT * FROM orders WHERE created_at > '2026-01-01';
-- Seq Scan on orders (cost=0.00..45000.00 rows=500000 width=48)
-- After partitioning by created_at range: partition pruning
EXPLAIN SELECT * FROM orders WHERE created_at > '2026-01-01';
-- Append (cost=0.00..5000.00 rows=500000 width=48)
--   ->  Seq Scan on orders_2026_q1 (cost=0.00..2500.00 rows=250000 width=48)
--   ->  Seq Scan on orders_2026_q2 (cost=0.00..2500.00 rows=250000 width=48)

If the planner does not prune partitions for a critical query, the partition scheme needs adjustment before deployment.
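The pruning check can be automated with the same EXPLAIN (FORMAT JSON) output the index script uses: walk the plan tree, collect every relation scanned, and verify that no partition outside the predicate's range appears. A sketch (function names are mine):

```python
def scanned_relations(plan_node: dict) -> set:
    """Walk an EXPLAIN (FORMAT JSON) plan tree, collecting every relation scanned."""
    rels = set()
    if "Relation Name" in plan_node:
        rels.add(plan_node["Relation Name"])
    for child in plan_node.get("Plans", []):
        rels |= scanned_relations(child)
    return rels


def pruning_works(plan_node: dict, expected_partitions, all_partitions) -> bool:
    """True if the plan touches only the partitions the predicate should reach."""
    scanned = scanned_relations(plan_node)
    return scanned & set(all_partitions) <= set(expected_partitions)
```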
Materialized views. Verify that queries intended to hit a materialized view actually use it.
Column additions. Adding wide columns to frequently scanned tables can affect plan choices by changing the cost of sequential scans.
The principle: schema changes that affect query planning should be tested against the critical query set before deployment. The cost of running EXPLAIN against 20-30 queries is negligible. The cost of discovering a plan regression in production at three in the morning is decidedly not.
Honest counterpoint — limitations of this approach
I should be honest about the boundaries of what this approach can guarantee. EXPLAIN without ANALYZE uses estimated costs, not actual execution times. Estimates can be wrong. The planner relies on table statistics (row counts, value distributions, null fractions) collected by ANALYZE. If the test database has stale statistics or unrepresentative data distribution, the cost estimates may not reflect production behavior.
The test database must have realistic data. An empty test database produces useless EXPLAIN estimates. A table with 100 rows will never trigger an index scan. Populate the test database with representative data: realistic row counts, realistic distributions, and up-to-date statistics. Running ANALYZE after loading data is essential.
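A sketch of a seeding helper that emits the kind of SQL this paragraph calls for: generate_series for volume, randomized columns for distribution, and a closing ANALYZE. The table and column names are illustrative, not prescriptive; adapt them to your schema.

```python
def seed_statements(table: str, n_rows: int) -> list:
    """Emit SQL that loads randomized sample data and refreshes statistics.

    Illustrative only: assumes an orders-like table with these columns.
    """
    insert = f"""INSERT INTO {table} (customer_id, status, total, created_at)
SELECT
    (random() * 10000)::int,
    (ARRAY['pending','shipped','shipped','cancelled'])[1 + (random() * 3)::int],
    (random() * 2000)::numeric(10, 2),
    NOW() - random() * INTERVAL '365 days'
FROM generate_series(1, {n_rows});"""
    # ANALYZE after loading is essential: EXPLAIN costs come from these statistics
    return [insert, f"ANALYZE {table};"]
```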
HypoPG cannot test everything about an index. It validates that the planner would use the index and estimates a cost reduction. It cannot test index build time, index size on disk, write amplification, or impact on autovacuum and I/O.
This is a strong signal, not a proof. CI testing validates that the planner estimates a cost reduction and would use the proposed index. It does not guarantee the index will improve production performance. For high-stakes index changes on large tables, complement CI testing with a staging environment test using EXPLAIN ANALYZE on realistic data. CI catches the obvious mistakes automatically. Staging catches the subtle ones with human review. The two approaches are complementary, not redundant — and together, they represent a standard of care that your production database deserves.