PostgreSQL Missing Index Detection: Automatic Recommendations
Your database has been cataloging its own performance characteristics since the moment you started it. The data was always there, waiting for someone to ask the right questions.
The signals your database is already sending
I should tell you something encouraging: your database already knows which indexes it wishes it had. PostgreSQL tracks every table access in detail — every sequential scan, every index scan, every row fetched. These statistics accumulate quietly in system catalogs, available to anyone who takes a moment to query them. Most teams have not had reason to look — until now.
The starting point is pg_stat_user_tables. This view exposes per-table statistics that reveal how PostgreSQL is accessing your data. Two columns matter most for missing index detection:
- seq_scan — the number of sequential scans initiated against the table
- idx_scan — the number of index scans initiated against the table
A table with a high seq_scan count and a low or zero idx_scan count is a candidate for investigation. If PostgreSQL is repeatedly scanning the entire table to satisfy queries, a missing index may be the cause.
If you'll permit me, a quick scan of your database:
```sql
SELECT
    schemaname,
    relname,
    seq_scan,
    idx_scan,
    seq_tup_read,
    n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

This query surfaces the tables reading the most rows via sequential scans. But — and this is worth noting before we proceed — a high seq_scan count alone is not proof of a missing index. Several perfectly legitimate situations produce high sequential scan counts:
- Small tables. A table with a few hundred rows fits in one or two pages. A sequential scan is faster than an index lookup because there is no index traversal overhead.
- Full-table analytics. Aggregate queries (COUNT(*), SUM(), reporting dashboards) are expected to scan every row. No index helps a query that needs all the data. For precomputing these results, see materialized views.
- Bulk operations. ETL pipelines and batch jobs that process entire tables will produce sequential scans by design.
The threshold heuristic: investigate when a table has a large n_live_tup count (thousands of rows or more), a high seq_tup_read value, and a seq_scan to idx_scan ratio that suggests most accesses are unindexed. Tables with fewer than roughly 1,000 rows can generally be left in peace — a sequential scan on a small table is often the fastest path available.
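If it helps, the heuristic can be expressed as a small function. This is a sketch in Python over rows already fetched from pg_stat_user_tables; the thresholds (1,000 rows, a 1:1 scan ratio) are illustrative assumptions, not PostgreSQL defaults:

```python
# A sketch of the threshold heuristic above, applied to rows already
# fetched from pg_stat_user_tables. The thresholds are illustrative
# assumptions, not PostgreSQL defaults.

def is_candidate(n_live_tup, seq_scan, idx_scan, seq_tup_read,
                 min_rows=1000, min_ratio=1.0):
    """True if the table's statistics suggest a missing index."""
    if n_live_tup < min_rows or seq_scan == 0:
        return False  # small or unscanned tables: leave them in peace
    ratio = seq_scan / idx_scan if idx_scan else float("inf")
    # Most accesses unindexed, and a meaningful volume of rows read
    return ratio >= min_ratio and seq_tup_read >= n_live_tup

stats = [
    # (relname, n_live_tup, seq_scan, idx_scan, seq_tup_read)
    ("orders",      1_500_000, 48_201,  12_840, 7_230_150_000),
    ("sessions",       40_000, 22_890, 285_000,   915_600_000),
    ("tiny_lookup",       300,  9_000,       0,     2_700_000),
]
for name, rows, seq, idx, read in stats:
    print(name, is_candidate(rows, seq, idx, read))
# orders → True, sessions → False, tiny_lookup → False
```

The same filter appears again as a WHERE clause in the diagnostic query below; having it as a function is mainly useful when post-processing statistics exported from many databases.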
Step 1 — Identify candidates with pg_stat_user_tables
Good news first: pg_stat_user_tables is a built-in view — no extensions required, no configuration, no restart. It has been collecting data since your database started. The relevant columns (documented in the PostgreSQL statistics collector documentation):
| Column | Description |
|---|---|
| seq_scan | Number of sequential scans initiated on the table |
| seq_tup_read | Total rows read by sequential scans |
| idx_scan | Number of index scans initiated on the table |
| idx_tup_fetch | Total rows fetched by index scans |
| n_live_tup | Estimated number of live rows in the table |
The most useful diagnostic query ranks tables by "wasted sequential reads" — the volume of rows read sequentially on tables large enough that an index should be serving those reads:
```sql
SELECT
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    n_live_tup,
    ROUND(
        CASE WHEN idx_scan > 0
             THEN seq_scan::numeric / idx_scan
             ELSE seq_scan
        END, 2
    ) AS seq_to_idx_ratio,
    pg_size_pretty(pg_relation_size(schemaname || '.' || relname)) AS table_size
FROM pg_stat_user_tables
WHERE n_live_tup > 5000
  AND seq_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 15;
```

Example output:
```
 table_name | seq_scan | seq_tup_read | idx_scan | n_live_tup | seq_to_idx_ratio | table_size
------------+----------+--------------+----------+------------+------------------+-----------
 orders     |    48201 |   7230150000 |    12840 |    1500000 |             3.75 | 98 MB
 line_items |    31045 |   4967200000 |      980 |     620000 |            31.68 | 210 MB
 sessions   |    22890 |    915600000 |   285000 |      40000 |             0.08 | 18 MB
 audit_log  |     8450 |   2535000000 |        0 |     300000 |          8450.00 | 340 MB
```

Allow me to walk through this:
- orders has a moderate ratio (3.75) but a very high seq_tup_read. Worth investigating — some queries on this large table are not finding their way to an index.
- line_items has a high ratio (31.68) on a large table. This one warrants prompt attention.
- sessions has a low ratio (0.08) — most accesses are indexed. The sequential scans are likely legitimate (cleanup jobs, analytics). No cause for concern.
- audit_log has zero index scans and 300,000 rows. I'm afraid this table is working rather harder than it needs to. Either it has no indexes at all, or every query against it is doing a full scan.
Resetting statistics
One detail worth noting: statistics accumulate since the last reset (or since the database was created). If your application's workload has changed significantly — a new version deployed, a feature removed, traffic patterns shifted — the accumulated statistics may reflect outdated access patterns rather than current ones. For a broader diagnostic workflow, see Your PostgreSQL Query Is Slow.
```sql
SELECT pg_stat_reset();

-- Or reset a single table:
SELECT pg_stat_reset_single_table_counters('orders'::regclass);
```

Reset statistics after major workload changes, then let the new patterns accumulate for at least 24–48 hours (or one full traffic cycle) before drawing conclusions.
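If you would rather not discard history, an alternative to resetting is to snapshot the counters twice and work with the deltas. A sketch, with made-up values standing in for two reads of pg_stat_user_tables:

```python
# Delta-based alternative to pg_stat_reset(): record (seq_scan, idx_scan)
# per table at two points in time and compare. The snapshot contents below
# are made up for illustration; in practice each dict would be populated
# from a query against pg_stat_user_tables.

def scan_deltas(before, after):
    """Per-table (seq_scan delta, idx_scan delta) between two snapshots."""
    return {
        table: (after[table][0] - before[table][0],
                after[table][1] - before[table][1])
        for table in after if table in before
    }

before = {"orders": (48_000, 12_800), "sessions": (22_000, 280_000)}
after  = {"orders": (48_201, 12_840), "sessions": (22_890, 285_000)}

print(scan_deltas(before, after))
# → {'orders': (201, 40), 'sessions': (890, 5000)}
```

The deltas reflect only the interval between snapshots, so they answer "what is the workload doing now?" without throwing away the accumulated totals.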
Limitations
pg_stat_user_tables tells you which tables are being scanned sequentially, but it does not tell you which columns or predicates are causing those scans. It identifies the rooms that need attention — but not the specific furnishings that are out of place. For that, you need pg_qualstats.
Step 2 — Discover missing predicates with pg_qualstats
Now we arrive at the more revealing instrument. pg_qualstats is a PostgreSQL extension (available on GitHub) that records WHERE clause predicates and JOIN conditions at runtime. Where pg_stat_user_tables tells you which tables need indexes, pg_qualstats tells you which columns those indexes should cover.
Installation and configuration
pg_qualstats must be loaded as a shared preload library, which requires a server restart:
```
# postgresql.conf
shared_preload_libraries = 'pg_qualstats'

# Optional: configure sampling rate (default is 1/max_connections)
pg_qualstats.sample_rate = 0.1       # Sample 10% of queries
pg_qualstats.track_constants = on    # Track actual filter values
```

After restarting PostgreSQL, create the extension:

```sql
CREATE EXTENSION IF NOT EXISTS pg_qualstats;
```

Querying pg_qualstats
The core query joins pg_qualstats_all with system catalogs to produce human-readable output showing which predicates are used most often and whether they are backed by an index:
```sql
SELECT
    qs.relid::regclass AS table_name,
    a.attname AS column_name,
    qs.opno::regoper AS operator,
    qs.occurences AS occurrences,
    qs.execution_count,
    qs.nbfiltered AS rows_filtered,
    CASE
        WHEN EXISTS (
            SELECT 1 FROM pg_index pi
            JOIN pg_attribute pa ON pa.attrelid = pi.indrelid
                                AND pa.attnum = ANY(pi.indkey)
            WHERE pi.indrelid = qs.relid
              AND pa.attnum = qs.attnum
        ) THEN 'indexed'
        ELSE 'NOT INDEXED'
    END AS index_status
FROM pg_qualstats_all qs
JOIN pg_attribute a ON a.attrelid = qs.relid AND a.attnum = qs.attnum
WHERE qs.relid IS NOT NULL
ORDER BY qs.occurences DESC
LIMIT 20;
```

Example output:
```
 table_name | column_name | operator | occurrences | execution_count | rows_filtered | index_status
------------+-------------+----------+-------------+-----------------+---------------+--------------
 orders     | customer_id | =        |       48200 |           48200 |       7230000 | NOT INDEXED
 orders     | status      | =        |       31000 |           31000 |        620000 | indexed
 line_items | order_id    | =        |       29800 |           29800 |       4768000 | NOT INDEXED
 audit_log  | created_at  | >=       |        8400 |            8400 |       2520000 | NOT INDEXED
 audit_log  | entity_type | =        |        8400 |            8400 |       2520000 | NOT INDEXED
```

Now the picture becomes considerably clearer:
- orders.customer_id is filtered on 48,200 times and has no index. There is your primary culprit from the pg_stat_user_tables analysis — confirmed.
- orders.status is already indexed. No action needed here.
- line_items.order_id is filtered on 29,800 times without an index. This explains the high sequential scan count on line_items.
- audit_log.created_at and audit_log.entity_type both appear 8,400 times — these are very likely used together in the same queries, which suggests a composite index candidate.
Interpreting multi-column candidates
When multiple columns from the same table appear with the same occurences count, they are likely filtered together in the same queries. pg_qualstats tracks predicates individually, but correlated occurrence counts reveal composite index opportunities.
In the example above, audit_log.created_at and audit_log.entity_type both have exactly 8,400 occurrences. This strongly suggests queries like:
```sql
SELECT * FROM audit_log
WHERE entity_type = 'order'
  AND created_at >= '2026-01-01';
```

A composite index on (entity_type, created_at) would serve this pattern better than two separate single-column indexes. Column order matters: the equality column (entity_type) should come first, followed by the range column (created_at). For more on this decision, see the composite indexes guide.
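The correlated-count heuristic can be sketched mechanically: group unindexed predicates by table and occurrence count, then order equality columns before range columns. The grouping-by-exact-count rule is my assumption here, not something pg_qualstats computes for you:

```python
from collections import defaultdict

# Each row: (table, column, operator, occurences), as read from the
# pg_qualstats output above. Grouping columns that share an exact
# occurrence count is a heuristic for "these predicates appear together
# in the same queries".
rows = [
    ("audit_log", "created_at",  ">=", 8400),
    ("audit_log", "entity_type", "=",  8400),
    ("orders",    "customer_id", "=",  48200),
]

def composite_candidates(rows):
    groups = defaultdict(list)
    for table, col, op, occ in rows:
        groups[(table, occ)].append((col, op))
    candidates = []
    for (table, _), cols in groups.items():
        if len(cols) < 2:
            continue  # a single column: plain single-column index instead
        # Equality predicates first, range predicates second (B-tree rule)
        ordered = sorted(cols, key=lambda c: c[1] != "=")
        candidates.append((table, [c for c, _ in ordered]))
    return candidates

print(composite_candidates(rows))
# → [('audit_log', ['entity_type', 'created_at'])]
```

In real data the counts rarely match exactly (sampling introduces jitter), so a production version would group counts within a small tolerance rather than requiring equality.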
Performance overhead
A reasonable question: what does this cost? pg_qualstats adds overhead to query execution because it inspects predicate nodes in every sampled query's plan. The overhead depends on the sampling rate:
- Default sampling rate (1/max_connections, typically 0.5–1%): overhead is negligible, usually immeasurable.
- 10% sampling rate: overhead is typically 1–3% of total query execution time. Acceptable for most production workloads.
- 100% sampling rate (sample_rate = 1.0): overhead can reach 5–8% on query-intensive workloads. Use this only for short diagnostic sessions, not permanently in production.
For production use, a 10% sampling rate provides enough data to identify patterns within a few hours of normal traffic.
Combining pg_stat_user_tables + pg_qualstats
The complete workflow uses both views together: pg_stat_user_tables identifies which tables are suffering from sequential scans, and pg_qualstats identifies which predicates on those tables are unindexed.
A combined query that produces a candidate index list:
```sql
SELECT
    t.relname AS table_name,
    a.attname AS column_name,
    qs.opno::regoper AS operator,
    qs.occurences AS predicate_uses,
    t.seq_scan AS table_seq_scans,
    t.seq_tup_read AS rows_scanned_sequentially,
    t.n_live_tup AS table_rows
FROM pg_qualstats_all qs
JOIN pg_stat_user_tables t ON t.relid = qs.relid
JOIN pg_attribute a ON a.attrelid = qs.relid AND a.attnum = qs.attnum
WHERE NOT EXISTS (
    SELECT 1 FROM pg_index pi
    JOIN pg_attribute pa ON pa.attrelid = pi.indrelid
                        AND pa.attnum = ANY(pi.indkey)
    WHERE pi.indrelid = qs.relid
      AND pa.attnum = qs.attnum
)
  AND t.n_live_tup > 5000
  AND qs.occurences > 100
ORDER BY qs.occurences * t.seq_tup_read DESC
LIMIT 10;
```

This surfaces the highest-impact missing index candidates: columns that are filtered frequently (occurences) on tables where sequential scans are reading the most rows (seq_tup_read).
For a deeper dive on pg_qualstats configuration and advanced usage, see the pg_qualstats how-to guide.
Step 3 — Validate with HypoPG before you commit
Before we proceed to creating anything, a word of caution. Indexes are not free. Each one consumes disk space, slows down INSERT, UPDATE, and DELETE operations (because the index must be maintained), and adds to vacuum workload. Creating an index based on a hunch — without validating that the query planner will actually use it — risks adding cost without benefit. One does not furnish a room without first measuring the space.
HypoPG (available on GitHub) solves this neatly by creating hypothetical indexes that exist only in the planner's metadata. The planner considers them when generating query plans, but no physical index is created. You can test whether an index would help before committing to the disk and write overhead.
Installation
```sql
CREATE EXTENSION IF NOT EXISTS hypopg;
```

HypoPG does not require shared_preload_libraries — it can be installed and used immediately without a server restart.
Basic workflow
Create a hypothetical index, then run EXPLAIN on the query you want to optimize:
```sql
-- Create a hypothetical index on orders(customer_id)
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');
```

```
 indexrelid | indexname
------------+----------------------------------
     205462 | <205462>btree_orders_customer_id
```

Now run EXPLAIN on the problem query:

```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 8472;
```

```
                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using <205462>btree_orders_customer_id on orders
   (cost=0.05..8.07 rows=1 width=120)
   Index Cond: (customer_id = 8472)
```

Compare this to the plan without the hypothetical index (drop it first):
```sql
SELECT * FROM hypopg_drop_index(205462);
EXPLAIN SELECT * FROM orders WHERE customer_id = 8472;
```

```
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..28734.00 rows=18 width=120)
   Filter: (customer_id = 8472)
```

The cost dropped from 28,734 to 8.07 — a 3,500x reduction. The planner switched from a sequential scan to an index scan. Quite the improvement. This hypothetical index is worth creating.
Testing multi-column and partial indexes
HypoPG accepts the same syntax as CREATE INDEX, including composite and partial indexes:
```sql
-- Hypothetical composite index
SELECT * FROM hypopg_create_index(
    'CREATE INDEX ON audit_log (entity_type, created_at)'
);

-- Hypothetical partial index
SELECT * FROM hypopg_create_index(
    'CREATE INDEX ON orders (created_at) WHERE status = ''pending'''
);
```

Test each with EXPLAIN to see if the planner uses them. For more HypoPG examples and advanced usage, see the HypoPG how-to guide.
Cleanup
Remove all hypothetical indexes when you are done testing:
```sql
SELECT hypopg_reset();
```

Limitations
I should be forthcoming about what HypoPG cannot tell you. Hypothetical indexes inform the planner's cost model but do not account for several real-world factors:
- I/O patterns. A real index changes the ratio of sequential to random I/O. HypoPG cannot simulate the actual I/O behavior.
- Cache effects. Frequently accessed indexes stay in the buffer cache, improving performance beyond what cost estimates predict. HypoPG cannot model cache warming.
- Write amplification. Every INSERT, UPDATE, and DELETE must also update the index. HypoPG does not account for this cost because it only evaluates read queries.
- Index build time. Creating an index on a 100 million row table may take minutes and lock the table (unless you use CREATE INDEX CONCURRENTLY). HypoPG does not simulate the build process.
HypoPG answers the question "will the planner use this index?" — not "will this index improve overall system performance?" The former is a necessary condition for the latter, but not sufficient.
The complete workflow — from signal to index
Allow me to bring this together with a complete example. We will walk through the entire pipeline on a concrete case: an e-commerce application with an orders table that has become slow as the dataset has grown.
The schema
```sql
CREATE TABLE orders (
    id           BIGSERIAL PRIMARY KEY,
    customer_id  BIGINT NOT NULL,
    status       VARCHAR(20) NOT NULL DEFAULT 'pending',
    total_amount NUMERIC(12,2) NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Existing indexes: only the primary key
-- Table has 1.5 million rows
```

Step 1: Identify the table
```sql
SELECT relname, seq_scan, seq_tup_read, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';
```

```
 relname | seq_scan | seq_tup_read | idx_scan | n_live_tup
---------+----------+--------------+----------+------------
 orders  |    48201 |   7230150000 |    12840 |    1500000
```

48,201 sequential scans on a 1.5 million row table, reading 7.2 billion rows total. This table has been working extraordinarily hard on your behalf. It deserves our attention.
Step 2: Find the predicates
```sql
SELECT
    a.attname AS column_name,
    qs.opno::regoper AS operator,
    qs.occurences,
    qs.execution_count
FROM pg_qualstats_all qs
JOIN pg_attribute a ON a.attrelid = qs.relid AND a.attnum = qs.attnum
WHERE qs.relid = 'orders'::regclass
ORDER BY qs.occurences DESC;
```

```
 column_name | operator | occurences | execution_count
-------------+----------+------------+-----------------
 customer_id | =        |      38400 |           38400
 status      | =        |       9200 |            9200
 created_at  | >=       |       4800 |            4800
 created_at  | <=       |       4800 |            4800
```

customer_id equality lookups account for the majority of sequential scans. created_at range queries are also significant.
Step 3: Validate with HypoPG
Test a B-tree index on customer_id:
```sql
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');
```

Run EXPLAIN on the representative query. Note: use EXPLAIN without ANALYZE here — HypoPG indexes exist only in the planner's metadata and are not visible to the executor:
```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 8472;
```

With hypothetical index:

```
                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using <205462>btree_orders_customer_id on orders
   (cost=0.05..12.42 rows=15 width=120)
   Index Cond: (customer_id = 8472)
```

Without hypothetical index (after dropping it):
```
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..38921.00 rows=15 width=120)
   Filter: (customer_id = 8472)
```

The estimated cost dropped from 38,921 to 12.42 — a 3,100x reduction. The planner switched from a sequential scan to an index scan. The evidence is clear.
To confirm the real-world impact, create the actual index and then run EXPLAIN ANALYZE:
```sql
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
ANALYZE orders;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 8472;
```

```
 Index Scan using idx_orders_customer_id on orders
   (cost=0.43..12.42 rows=15 width=120)
   (actual time=0.04..0.09 rows=12 loops=1)
   Index Cond: (customer_id = 8472)
   Buffers: shared hit=4
 Planning Time: 0.22 ms
 Execution Time: 0.14 ms
```

Execution time: 0.14 ms, reading 4 buffer pages. Compare this to the sequential scan that was reading 30,000+ pages. One index. A 3,100x improvement. This index is very much worth keeping.
For more on interpreting EXPLAIN output, see the EXPLAIN ANALYZE guide.
Step 4: Consider additional indexes
With the customer_id index confirmed and created, review the remaining pg_qualstats output for additional opportunities.
When to use a composite index instead
If queries frequently filter on multiple columns together, a composite index may be more effective than separate single-column indexes. From the pg_qualstats data, if status and created_at consistently appear in the same queries:
```sql
-- Common query pattern
SELECT * FROM orders
WHERE status = 'pending'
  AND created_at >= '2026-01-01'
  AND created_at < '2026-02-01';
```

A composite index serves this pattern efficiently:
```sql
CREATE INDEX CONCURRENTLY idx_orders_status_created
    ON orders (status, created_at);
```

The equality column (status) comes first, the range column (created_at) second. This ordering allows the index to narrow to the matching status value and then perform a range scan on the date. See the composite indexes guide for the full rationale on column ordering.
When to use a partial index instead
If the vast majority of queries against orders filter for a specific status value, a partial index covers only the relevant rows and is smaller and faster:
```sql
CREATE INDEX CONCURRENTLY idx_orders_pending_created
    ON orders (created_at)
    WHERE status = 'pending';
```

This index is smaller than a full index on (status, created_at) because it excludes all rows that are not pending. If 95% of your queries are looking for pending orders, this partial index is a better choice. If queries target multiple status values, a full composite index is more appropriate.
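The decision can be framed as a back-of-envelope rule. A sketch; the 90% threshold, table, and index names are illustrative assumptions, not PostgreSQL guidelines:

```python
# Back-of-envelope chooser between a partial and a composite index, based
# on the fraction of queries that target one hot status value. The table,
# index names, and 90% threshold are illustrative assumptions.

def choose_index(hot_fraction, hot_value="pending", threshold=0.9):
    if hot_fraction >= threshold:
        # Nearly all queries want one value: a partial index is smaller
        return (f"CREATE INDEX CONCURRENTLY idx_orders_{hot_value}_created "
                f"ON orders (created_at) WHERE status = '{hot_value}'")
    # Queries spread across statuses: a composite index covers them all
    return ("CREATE INDEX CONCURRENTLY idx_orders_status_created "
            "ON orders (status, created_at)")

print(choose_index(0.95))  # partial index wins
print(choose_index(0.40))  # composite index wins
```

With track_constants enabled, pg_qualstats records the actual filter values, which is how you would estimate the hot fraction in the first place.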
For more on choosing the right index type, see the index types guide.
What Gold Lapel automates
The workflow above — identify candidate tables, discover unindexed predicates, validate with hypothetical indexes, create the index — is effective but manual. It requires installing extensions, running diagnostic queries, interpreting results, and repeating the process whenever workloads change. It is excellent work, and I would encourage you to do it at least once for the understanding it builds.
Gold Lapel runs this same pipeline continuously as a PostgreSQL proxy. Because it sits between the application and the database, it observes every query in real time — not a statistical sample, but the actual workload. This gives it two advantages over the manual approach:
- Workload-aware recommendations. The proxy sees which queries run most often, which are slowest, and how query patterns change over time. It weighs index recommendations by actual query frequency and impact, not just predicate occurrence counts.
- Continuous monitoring. New features, traffic changes, and schema migrations constantly shift which indexes are needed. The proxy detects these shifts automatically rather than waiting for someone to run diagnostic queries.
The manual approach taught in this article is the right starting point. It works, it is free, and it builds the intuition needed to evaluate any tool's recommendations — including ours. For teams who want the same pipeline running hands-free in production, Gold Lapel is available.
Common mistakes and edge cases
I should mention several patterns I encounter frequently. None of these are causes for self-reproach — they are the natural result of working with incomplete information, which is precisely what this article aims to correct.
Creating indexes on low-cardinality columns
A column like is_active (boolean) or status (with three possible values) has low cardinality. An index on a low-cardinality column is rarely useful for equality lookups because each value matches a large fraction of the table. The planner will often choose a sequential scan even when the index exists, because the cost of random I/O to fetch 30% of the table via index exceeds the cost of a sequential scan. The PostgreSQL documentation on examining index usage explains this trade-off in detail.
Exceptions exist: if one value is rare (1% of rows are status = 'failed'), a partial index on that rare value is effective. A full B-tree index on the entire column is not.
Ignoring write amplification on hot tables
This is the counterbalance to everything above, and I would be doing you a disservice to gloss over it. Every index on a table adds overhead to every INSERT, UPDATE (if the indexed column changes), and DELETE. A table receiving 10,000 writes per second with five indexes pays the maintenance cost five times per write — a phenomenon known as write amplification. Adding a sixth index for a query that runs once per minute may not be a net improvement.
Before creating an index, consider the table's write volume and evaluate whether the read improvement justifies the write cost. For the full analysis of index types and their write costs, including B-tree, GIN, and GiST maintenance characteristics, see the index types guide.
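A rough way to weigh this trade-off is to compare time saved on reads against time spent maintaining the index on writes. All figures in this sketch are illustrative assumptions; measure your own workload before deciding:

```python
# Rough net-benefit estimate for adding one more index to a hot table.
# All figures are illustrative assumptions, not measured costs.

def net_benefit_per_sec(reads_per_sec, read_saving_ms,
                        writes_per_sec, write_cost_ms_per_index):
    """Positive → the index likely pays for itself; negative → it may not."""
    saved = reads_per_sec * read_saving_ms
    paid = writes_per_sec * write_cost_ms_per_index
    return saved - paid

# A query running once per minute saving 200 ms, on a table doing
# 10,000 writes/sec where each extra index costs ~0.05 ms per write:
print(net_benefit_per_sec(1 / 60, 200.0, 10_000, 0.05))  # negative → skip
# The same index on a read-heavy table (50 reads/sec, 100 writes/sec):
print(net_benefit_per_sec(50, 200.0, 100, 0.05))         # positive → create
```

The point is not the exact numbers but the shape of the comparison: index value scales with read frequency and savings, while index cost scales with write volume.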
Over-indexing
The temptation, having found this workflow, is to index everything. I would gently counsel restraint. More indexes are not always better. Each index:
- Consumes disk space (often 10–30% of the table size per index)
- Slows down INSERTs, UPDATEs, and DELETEs
- Increases VACUUM workload (dead index tuples must be cleaned up alongside dead heap tuples)
- Adds to backup size and replication lag
A table with 15 indexes is worth a closer look. Review whether older indexes are still earning their keep by querying pg_stat_user_indexes:
```sql
SELECT
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan ASC;
```

Indexes with zero idx_scan since the last statistics reset are candidates for removal.
Sampling rate too low
A subtlety worth attending to: if pg_qualstats.sample_rate is set too low, infrequent but expensive queries may not be captured. A query that runs once per hour but takes 30 seconds each time is a significant performance problem, but at a 1% sampling rate, pg_qualstats may never observe it.
For thorough coverage, run a diagnostic session at a higher sampling rate (25–50%) for 24 hours, then reduce the rate for ongoing monitoring. Alternatively, use pg_stat_statements to identify slow queries by total execution time and manually check whether their predicates are indexed.
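The arithmetic behind that warning is worth seeing. Modeling each execution as an independent sampling event (a simplification of how pg_qualstats samples), the chance of observing a query at least once is 1 − (1 − rate)^executions:

```python
# Probability that a sampled-statistics extension observes a query at
# least once, given a sampling rate and the number of executions in the
# observation window. Independence per execution is a simplification.

def capture_probability(sample_rate, executions):
    return 1 - (1 - sample_rate) ** executions

# A once-per-hour query over a 24-hour window at a 1% sampling rate:
print(round(capture_probability(0.01, 24), 3))  # ≈ 0.214 — easy to miss
# The same query at a 25% diagnostic sampling rate:
print(round(capture_probability(0.25, 24), 3))  # ≈ 0.999 — near certain
```

This is why the diagnostic-session-at-high-rate approach works: even rare queries accumulate enough sampling opportunities within a day.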
Forgetting to ANALYZE after creating the index
A small step that makes all the difference: after creating a new index, PostgreSQL needs updated table statistics to accurately estimate the index's selectivity. Without running ANALYZE, the planner may not choose the new index because its cost estimates are based on stale data.
```sql
ANALYZE orders;
```

Autovacuum will eventually run ANALYZE, but the delay can be minutes to hours depending on your autovacuum configuration. Running it manually ensures the new index is used immediately.
The tools described here — pg_stat_user_tables, pg_qualstats, HypoPG — are not exotic. They are the built-in diagnostic instruments of a database that has been quietly cataloging its own performance characteristics since the moment you started it. The data was always there, waiting for someone to ask the right questions. Now you know how to ask them.