Covering index
An index that carries everything a query needs — so PostgreSQL never has to leave the index to ask the table for help.
A covering index is an index that contains all the columns a query references — in the WHERE clause, the SELECT list, and any JOIN or ORDER BY expressions. When PostgreSQL can satisfy a query entirely from the index, it performs an index-only scan, skipping the table heap entirely. Since PostgreSQL 11, the INCLUDE clause lets you attach non-key columns to an index for precisely this purpose. The result is an index that answers questions on its own, without troubling the table. I find self-sufficiency an admirable quality in an index.
What a covering index is
In a standard B-tree index, each leaf entry stores the indexed column values plus a pointer (TID) to the corresponding row in the table. When a query needs columns that are not in the index, PostgreSQL must follow those pointers to the table heap to retrieve the missing data — a heap fetch. Think of it as an index that knows where to find the answer but still has to walk across the house to fetch it.
A covering index eliminates that errand. PostgreSQL 11 introduced the INCLUDE clause, a precision tool that lets you attach non-key columns directly to the index's leaf pages. These included columns are not part of the B-tree's sort order — they cannot be used for filtering or ordering. They are payload, carried along so that queries can read them without visiting the table. The index arrives with everything in hand.
-- A standard B-tree index on customer_id
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- A covering index that also stores order_total and status
CREATE INDEX idx_orders_customer_covering ON orders (customer_id)
INCLUDE (order_total, status); The first index stores only customer_id and a TID. The second stores customer_id, order_total, and status. Any query that filters on customer_id and selects only those two columns can be answered entirely from the second index — no heap visit required.
Why covering indexes matter
The performance difference between an index scan and an index-only scan can be substantial. An index scan must make a random I/O trip to the table heap for every row it returns. On a cold cache, each heap fetch means a disk seek. On a query that returns 10,000 rows, that is 10,000 random reads. One shudders.
An index-only scan reads only the index. The data is already there in the leaf pages. No heap fetches, no random I/O to the table. For read-heavy queries that return many rows from a large table, this is often the difference between a query that takes milliseconds and one that takes seconds.
-- This query can be answered entirely from the covering index
EXPLAIN ANALYZE
SELECT order_total, status
FROM orders
WHERE customer_id = 42;
-- Plan output (simplified):
-- Index Only Scan using idx_orders_customer_covering on orders
-- Index Cond: (customer_id = 42)
-- Heap Fetches: 0 The number to watch in the EXPLAIN output is Heap Fetches: 0. When you see that, the covering index is doing its job — PostgreSQL answered without ever touching the table. That zero is one of the more satisfying numbers in a query plan.
Creating covering indexes
If you'll allow me, the syntax here is worth appreciating. It draws a clean line between key columns (used for lookup and sorting) and included columns (carried as payload):
-- Composite index: all columns are key columns, sorted and searchable
CREATE INDEX idx_composite ON orders (customer_id, status, order_total);
-- Covering index: only customer_id is a key column; the rest are payload
CREATE INDEX idx_covering ON orders (customer_id) INCLUDE (status, order_total);
-- The composite index is larger because it sorts by all three columns.
-- The covering index is smaller — INCLUDE columns are stored only in
-- leaf pages and are not part of the tree's sort order. Key columns go in the parentheses after the table name. These are the columns that form the B-tree structure — they determine sort order and can be used for WHERE, ORDER BY, and range scans. Included columns go in the INCLUDE clause. They are stored only in leaf pages and add no overhead to the tree's internal pages.
This distinction matters for index size. A composite index on (customer_id, status, order_total) sorts by all three columns at every level of the tree. A covering index on (customer_id) INCLUDE (status, order_total) sorts only by customer_id — the other columns exist only at the leaf level. The covering index is smaller, fits in the buffer cache more readily, and is faster to scan. Economy and precision in a single clause.
When to use them
Covering indexes earn their keep when a query pattern is both frequent and narrow — it filters on one or two columns and selects a small, predictable set of output columns. Common candidates:
- Dashboard queries — queries that SELECT a few columns for display, filtered by a user ID or status. These run constantly and benefit from eliminating heap fetches.
- Filtered aggregations —
COUNT(*),SUM(), orAVG()with a WHERE clause. An index-only scan avoids reading every matching row from the table. - API list endpoints — paginated queries that return a subset of columns for a list view, filtered by a foreign key.
- Existence checks — queries that check
WHERE EXISTS (SELECT 1 FROM ... WHERE ...). The subquery can use an index-only scan if the index covers the filter columns.
-- Covering index for a filtered COUNT
CREATE INDEX idx_orders_active ON orders (status) INCLUDE (id);
-- This COUNT can use an index-only scan
SELECT COUNT(*)
FROM orders
WHERE status = 'active'; I should note where covering indexes offer diminishing returns: when queries SELECT many columns (you would need to include most of the table, at which point the index is scarcely smaller than the table itself), when the table is small enough to fit in the buffer cache regardless, or when the workload is write-heavy and the overhead of maintaining included columns outweighs the read benefit.
Visibility map and index-only scans
A covering index is necessary but not sufficient for an index-only scan. There is a second condition, and it catches people off guard. PostgreSQL also needs to know that the rows on a table page are visible to all current transactions. This information lives in the visibility map — a bitmap with one bit per table page.
When a page is marked all-visible, PostgreSQL knows that every row on that page is visible to all transactions and can skip the heap fetch. When a page is not all-visible — because it contains recently inserted, updated, or deleted rows — PostgreSQL must check the heap to verify visibility, even if the index contains all the data the query needs. A well-prepared index, undermined by stale housekeeping.
-- Check how much of a table is visible (all-visible pages)
SELECT
relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- If dead_pct is high, VACUUM the table to update the visibility map
VACUUM orders; This is why VACUUM matters for covering index performance. VACUUM marks pages as all-visible after removing dead tuples. A table that is not vacuumed frequently will have many pages that are not all-visible, and your carefully constructed covering index will degrade into an index scan with heap fetches — which rather defeats the purpose.
-- Check heap fetches on an index-only scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_total, status FROM orders WHERE customer_id = 42;
-- Look for "Heap Fetches: N" in the output.
-- Heap Fetches: 0 means the visibility map is clean — pure index-only scan.
-- Heap Fetches: 1500 means 1,500 rows required a trip to the table heap
-- because their pages were not marked all-visible. If you create a covering index and still see high heap fetch counts, the remedy is usually to run VACUUM or to tune autovacuum to run more aggressively on that table. The index did its part. The visibility map needs to do its.
How Gold Lapel relates
Gold Lapel observes every query that passes through the proxy and tracks which columns each query filters on, selects, and joins. When it detects a pattern where the same query shape repeatedly triggers heap fetches — an index scan returning columns not present in the index — it recommends a covering index that would eliminate those fetches. The query told us exactly what it needed; we simply ensure the index is prepared to provide it.
Gold Lapel also monitors heap fetch counts on existing index-only scans. If a table's visibility map is falling behind and causing index-only scans to degrade, Gold Lapel flags the table as a candidate for more aggressive autovacuum tuning. A covering index is only as effective as the visibility map allows — we keep an eye on both sides of that equation.