PostgreSQL Partial Indexes: An Elegant Solution for Particular Queries
You are indexing 10 million rows to find 50,000. A partial index corrects this inefficiency rather neatly.
Good evening. I notice you have rather more index than you need.
Allow me a brief observation. Your orders table contains 10 million rows. Your application queries pending orders — constantly, urgently, dozens of times per second. There are 50,000 pending orders. Your index on status covers all 10 million rows, 9,950,000 of which your pending-order queries will never, under any circumstances, touch.
This is the infrastructural equivalent of maintaining a filing cabinet for every document the household has ever received, when you only ever retrieve this month's correspondence. The cabinet is immaculate. It is also 200 times larger than it needs to be.
PostgreSQL's own documentation on partial indexes covers the syntax. A partial index adds a WHERE clause to the index definition itself, instructing PostgreSQL to index only the rows that match a given condition. The result is a dramatically smaller, more targeted index that costs less to maintain, occupies less memory, and is faster to scan. It has been available since PostgreSQL 7.2 — released in 2002 — and remains, in my experience, one of the most underused features in the entire system.
-- Full index: indexes every row in the table.
CREATE INDEX idx_orders_status ON orders (status);
-- Indexes all 10M rows. Size: ~214MB.
-- Partial index: indexes only the rows that match a WHERE clause.
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
-- Indexes only the ~50K pending orders. Size: ~1.1MB.
-- 200x smaller. Faster to scan, faster to maintain.

One megabyte instead of 214. The numbers are not metaphorical.
When does PostgreSQL use a partial index?
This is where most confusion begins, so allow me to be precise. PostgreSQL uses a partial index when the query's WHERE clause matches or logically implies the index's WHERE clause. The planner must be able to prove — at plan time, not at execution time — that every row the query could possibly return is covered by the index.
-- The partial index is used when the query's WHERE clause
-- matches or implies the index's WHERE clause.
-- This query uses the partial index:
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > '2025-01-01';
-- Index Scan using idx_orders_pending
-- (actual time=0.02..0.14 rows=847 loops=1)
-- This query does NOT use the partial index:
SELECT * FROM orders
WHERE status = 'shipped'
AND created_at > '2025-01-01';
-- 'shipped' does not match WHERE status = 'pending'.
-- Falls back to a different index or sequential scan.

The mechanism is straightforward: the planner compares the query's predicates against the index's predicate using a process called predicate implication. If the query's conditions are a logical subset of the index's condition — meaning the query can only return rows that the index contains — the index is eligible.
If your application queries WHERE status = 'pending', the index WHERE status = 'pending' matches exactly. If your application queries WHERE status = 'pending' AND created_at > '2025-01-01', the index still matches — the additional condition narrows further within the index's coverage.
But if your application queries WHERE status IN ('pending', 'processing'), neither a partial index on pending nor one on processing will be used. The planner cannot prove the IN clause is fully covered by either single-value partial index. This is not a limitation of intelligence — it is a guarantee of correctness. The planner will not use an index that might miss rows.
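If the multi-status query genuinely matters, the predicate can be widened to cover it exactly. A sketch, using the same hypothetical orders table — the planner's implication prover does handle short IN lists:

```sql
-- A partial index whose predicate covers the IN list exactly:
CREATE INDEX idx_orders_open ON orders (created_at)
WHERE status IN ('pending', 'processing');

-- Now the planner can prove coverage for the two-status query:
SELECT * FROM orders
WHERE status IN ('pending', 'processing')
  AND created_at > '2025-01-01';

-- It can also prove it for the narrower single-status query,
-- since status = 'pending' implies the IN list:
SELECT * FROM orders
WHERE status = 'pending'
  AND created_at > '2025-01-01';
```

The trade-off: the index now carries both statuses, so it is somewhat larger, and single-status scans read past entries for the other status.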
The subtleties of predicate implication
The planner's ability to prove implication has boundaries that are worth understanding, because they will affect whether your carefully crafted partial index is actually used.
-- The planner needs to PROVE your query is a subset of the index.
-- Simple equality: works.
-- Implied conditions: works in some cases.
-- This partial index:
CREATE INDEX idx_recent ON events (created_at)
WHERE created_at > '2025-01-01';
-- This query uses it (the planner can prove the subset):
SELECT * FROM events
WHERE created_at > '2025-06-01';
-- '2025-06-01' > '2025-01-01', so every matching row
-- is within the index's range.
-- This query does NOT use it:
SELECT * FROM events
WHERE created_at > now() - interval '30 days';
-- The planner cannot prove at plan time that
-- now() - '30 days' > '2025-01-01'.
-- It evaluates to true today, but the planner
-- doesn't do runtime arithmetic on index predicates.

This distinction matters. A partial index with a static predicate like WHERE created_at > '2025-01-01' will not match a query that uses now() - interval '30 days', even when the runtime value falls within the index's range. The planner resolves implication at plan time using the literal expressions, not their evaluated results.
The practical consequence: partial index predicates should use constants or conditions that your queries will replicate exactly. WHERE status = 'pending' works because your application sends WHERE status = 'pending'. WHERE created_at > now() - interval '1 year' as an index predicate will almost never match, because no query will contain that identical expression with the same evaluation.
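One common workaround for the moving-window case — sketched here with hypothetical index names, not prescribed — is to keep the predicate constant and rebuild the index on a schedule, while the application includes the same constant as an explicit floor:

```sql
-- Rebuild periodically (e.g. monthly) with a fresh constant:
CREATE INDEX CONCURRENTLY idx_recent_2025_06 ON events (created_at)
WHERE created_at > '2025-06-01';
DROP INDEX CONCURRENTLY idx_recent;  -- retire the stale predecessor

-- Application queries repeat the constant so implication holds,
-- then apply the real moving filter on top of it:
SELECT * FROM events
WHERE created_at > '2025-06-01'                 -- matches the index predicate
  AND created_at > now() - interval '30 days';  -- the actual window
```

Both conditions are applied, so the results are correct regardless of the date; the literal floor exists purely so the planner can prove the partial index covers the query.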
How to verify your partial index is being used
Trust, but verify. I have seen partial indexes sit unused for months because the query pattern shifted slightly, or because the ORM generated a different predicate than expected. EXPLAIN ANALYZE is the definitive answer.
-- Always verify with EXPLAIN ANALYZE.
-- Look for the index name in the plan output.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > '2025-01-01';
-- Good — partial index is used:
-- Index Scan using idx_orders_pending on orders
-- (cost=0.29..8.31 rows=847 width=64)
-- (actual time=0.02..0.14 rows=847 loops=1)
-- Index Cond: (created_at > '2025-01-01')
-- Buffers: shared hit=12
-- Note: the WHERE status = 'pending' does NOT appear
-- as a Filter line. The planner knows every row in the
-- index already satisfies that condition. One fewer
-- check per row. One fewer thing that can go wrong.

Notice what is absent from the plan: there is no Filter: (status = 'pending') line. When PostgreSQL uses a partial index, the index predicate is already satisfied — every row in the index matches the condition. The planner eliminates the redundant filter check entirely. This is a small but meaningful optimisation: one fewer comparison per row returned.
If you see your partial index name in the plan output, it is working. If you see a sequential scan or a different index, something has gone wrong — most likely a predicate mismatch between your query and the index, or a prepared statement interaction. Both are addressed later in this article.
How much smaller are partial indexes?
The savings are proportional to the selectivity of the WHERE clause. The fewer rows the index covers, the smaller it is. This is not a subtle effect.
| Scenario | Full index | Partial index | Size savings | Write impact |
|---|---|---|---|---|
| Status filter (status = pending, 0.5% of rows) | 214MB | 1.1MB | 99.5% | Only pending inserts update index |
| Soft delete (is_deleted = false, 5% of rows) | 214MB | 11MB | 95% | Only live rows update index |
| Active users (last_seen > 30 days ago, 20%) | 214MB | 43MB | 80% | Only recent users update index |
| Non-null filter (column IS NOT NULL, 60% of rows) | 214MB | 128MB | 40% | Only non-null rows update index |
The size savings tell half the story. The write impact column tells the other half, and it may be the more important one.
A full index must be updated on every INSERT, UPDATE, or DELETE to any row in the table — regardless of whether that row is relevant to the queries the index serves. Every time an order ships, every time a status changes, every time a row is deleted, the full index is maintained. On a write-heavy table, this maintenance cost is continuous and cumulative.
A partial index is updated only when the affected row falls within the index's WHERE clause. On a table where 99.5% of rows are not pending, the partial index on pending rows is updated for 0.5% of writes. That is a 200x reduction in index maintenance cost. On tables with dozens of indexes — and I have seen tables with 40 — the aggregate write amplification savings can be transformative.
There is a secondary benefit that is easy to overlook: a smaller index fits in memory. PostgreSQL caches frequently accessed index pages in shared_buffers. A 1.1MB partial index fits entirely in cache on any production system. A 214MB full index competes with every other index and table page for buffer space. When the partial index is cached and the full index is not, the performance difference is not 200x — it is the difference between a memory read and a disk read. On a system with spinning disks, that can be 10,000x.
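Comparing the two indexes on your own system is a single catalog query — the index names here are the ones from the example above:

```sql
-- Compare on-disk sizes of the full and partial index:
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_orders_status', 'idx_orders_pending');
```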
Partial unique indexes: enforcing business rules with precision
If partial indexes for performance are useful, partial indexes for correctness are indispensable. A partial unique index enforces uniqueness only on the rows that match its WHERE clause — which allows you to express business rules that a full unique index cannot.
-- Partial unique indexes: enforce uniqueness on a subset of rows.
-- Only one active subscription per user:
CREATE UNIQUE INDEX idx_one_active_sub ON subscriptions (user_id)
WHERE status = 'active';
-- This allows:
-- user_id=1, status='active' (allowed)
-- user_id=1, status='canceled' (allowed — not covered by index)
-- user_id=1, status='active' (REJECTED — duplicate)
-- A full unique index on (user_id) would prevent multiple
-- subscriptions entirely. The partial index prevents multiple
-- ACTIVE subscriptions while allowing history.

This pattern appears with remarkable frequency once you start looking for it. One active session per user. One draft document per author. One open ticket per customer. One pending payment per invoice. In each case, the business rule is not "there shall be only one" — it is "there shall be only one in this state." Historical records, cancelled entries, closed tickets — these may duplicate freely. Only the active subset requires uniqueness.
A full unique index on (user_id) would prevent historical records entirely. You would need to hard-delete the cancelled subscription before creating a new one, losing the audit trail. A partial unique index preserves the history while enforcing the constraint where it matters.
-- More partial unique patterns that solve real business rules:
-- One draft article per author (published articles are unlimited):
CREATE UNIQUE INDEX idx_one_draft ON articles (author_id)
WHERE status = 'draft';
-- One open support ticket per customer:
CREATE UNIQUE INDEX idx_one_open_ticket ON tickets (customer_id)
WHERE resolved_at IS NULL;
-- Unique email only among non-deleted users:
CREATE UNIQUE INDEX idx_unique_email_active ON users (email)
WHERE deleted_at IS NULL;
-- Allows a user to delete their account and re-register
-- with the same email. The old row's uniqueness constraint
-- simply ceases to apply.

The email uniqueness example deserves particular attention. Without a partial unique index, allowing users to delete their account and re-register with the same email requires either hard-deleting the user record (losing all associated data) or implementing a separate email-uniqueness table with manual management. The partial index on WHERE deleted_at IS NULL handles it in a single line of DDL.
I should note the honest limitation: partial unique indexes are enforced at the database level, which means your application must set the WHERE column correctly before the insert. If a bug causes deleted_at to remain NULL on a user who should be deleted, the uniqueness constraint will prevent re-registration. The database enforces the rule faithfully — it does not compensate for application errors. This is a feature, not a limitation, but it does require that your application's state transitions are correct.
Soft-deleted rows: index only what matters
-- Soft-deleted rows: index only the live ones.
-- Your table has millions of rows, 95% soft-deleted:
-- is_deleted = true: 9.5M rows (don't query these)
-- is_deleted = false: 500K rows (query these constantly)
-- Full index:
CREATE INDEX idx_items_name ON items (name);
-- 10M rows indexed. 214MB. 95% wasted.
-- Partial index:
CREATE INDEX idx_items_name_live ON items (name)
WHERE is_deleted = false;
-- 500K rows indexed. ~11MB. Every entry is relevant.

Soft deletion is one of the highest-value use cases for partial indexes, and it is worth examining in depth because the pattern is so common and the waste so significant.
Applications that soft-delete rather than hard-delete accumulate large volumes of "dead" rows that are never queried but are included in every index. A social media platform that soft-deletes posts, a SaaS application that soft-deletes user accounts, an e-commerce system that soft-deletes products — each accumulates a graveyard of rows that inflates every index, slows every write, and wastes buffer cache space.
The standard approach — a boolean is_deleted or a nullable deleted_at timestamp — creates a natural partition between queried data and archived data. A partial index on WHERE is_deleted = false (or WHERE deleted_at IS NULL) excludes the archived rows entirely.
-- Soft-delete tables often need multiple partial indexes.
-- Every query your application runs against "live" data
-- should have a partial index that excludes the dead rows.
CREATE INDEX idx_items_name_live ON items (name)
WHERE is_deleted = false;
CREATE INDEX idx_items_category_live ON items (category)
WHERE is_deleted = false;
CREATE INDEX idx_items_created_live ON items (created_at)
WHERE is_deleted = false;
-- Each of these is ~20x smaller than its full equivalent.
-- Three partial indexes totaling ~33MB vs three full indexes
-- totaling ~642MB.

The savings compound with every index you create on the table. If you have five indexes and 95% of your data is soft-deleted, converting all five to partial indexes reduces total index storage by 95%. On a table with 100 million rows and 5 million live rows, that is the difference between a gigabyte of indexes and 50 megabytes.
I should be honest about when this pattern falls short. If your application regularly queries soft-deleted data — for admin views, analytics, or compliance searches — you will need either full indexes to serve those queries or separate partial indexes for the deleted subset. The savings apply only when the excluded rows are genuinely unqueried. If 20% of your queries touch deleted data, the situation is more nuanced than a clean partitioning of concerns.
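When admin or compliance queries do hit the deleted rows, the same technique serves the other side of the partition. A sketch, reusing the items table from the example above:

```sql
-- A separate partial index for the deleted subset, serving
-- admin/compliance lookups without bloating the live index:
CREATE INDEX idx_items_name_deleted ON items (name)
WHERE is_deleted = true;

-- Live queries use idx_items_name_live; admin queries that filter
-- WHERE is_deleted = true use this one. Neither index carries
-- rows that the other's queries need.
```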
Indexing NULL and non-NULL values selectively
-- Index only non-null values:
CREATE INDEX idx_orders_shipped_at ON orders (shipped_at)
WHERE shipped_at IS NOT NULL;
-- Useful when 70% of orders haven't shipped yet.
-- The index covers only the 30% that have a shipped_at value.
-- Queries filtering on shipped_at inherently need non-null values.

Partial indexes on IS NOT NULL are useful when a column is NULL for the majority of rows and you only query the non-null values. Common examples: shipped_at (most orders haven't shipped), deleted_at (most rows aren't deleted), verified_at (most accounts are unverified), completed_at (most tasks are still in progress).
But the more interesting pattern — and the one I see most often in production — is indexing the null rows. The "null as queue" pattern.
-- The "null as queue" pattern:
-- A common design where NULL means "not yet processed."
CREATE TABLE jobs (
id bigserial PRIMARY KEY,
payload jsonb NOT NULL,
created_at timestamptz DEFAULT now(),
processed_at timestamptz -- NULL = pending
);
-- Your worker queries:
SELECT * FROM jobs
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100;
-- Full index on processed_at: indexes all rows.
-- 99% of rows have a processed_at value (already done).
-- The 1% you actually query are the NULLs.
CREATE INDEX idx_jobs_pending ON jobs (created_at)
WHERE processed_at IS NULL;
-- This index contains only the unprocessed jobs.
-- As jobs are processed, they leave the index automatically.
-- The index stays small regardless of how many total jobs exist.
-- 50M jobs processed, 12K pending: the index covers 12K rows.

This is among the most elegant applications of partial indexing. The index represents the work queue itself. As items are processed and processed_at receives a timestamp, they leave the index automatically. The index is self-maintaining: it grows when work arrives and shrinks when work completes. No background job to rebuild it. No scheduled cleanup. The data model and the index are in perfect alignment.
I have observed this pattern serving job queues with 50 million total rows and 10,000 pending items. The index occupies a few hundred kilobytes. A full index on processed_at would occupy several gigabytes and be utterly irrelevant to the only query that matters.
Multiple partial indexes vs one large index
-- Multiple values with partial indexes (instead of one large index):
-- Separate indexes for frequently-queried statuses:
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
CREATE INDEX idx_orders_processing ON orders (created_at)
WHERE status = 'processing';
-- No index for 'shipped' or 'delivered' (rarely queried).
-- Total: 2 small indexes instead of 1 large index.
-- Each index is tiny and perfectly targeted.

Instead of one index on status that covers all 10 million rows, you can create targeted partial indexes for only the statuses you actually query. This is especially effective when different statuses have vastly different cardinalities and query frequencies.
Consider a typical order lifecycle: pending (0.5% of rows, queried constantly), processing (0.3%, queried frequently), shipped (15%, queried occasionally), delivered (84%, queried rarely). A single index on status indexes all 10 million rows to serve queries that predominantly target 80,000 of them. Two partial indexes — one for pending, one for processing — cover the queries that matter while ignoring the 99.2% of rows that are queried once a quarter for a report.
The trade-off is management complexity. More indexes to create, to track, to drop when they are no longer needed. For 2-3 statuses, the benefit is unambiguous. For 20 statuses with varying query patterns, a single full index is simpler and probably adequate — the overhead of managing 20 partial indexes exceeds the storage savings.
There is a sharper trade-off to acknowledge: if your status distribution changes, your partial indexes become stale. A partial index on WHERE status = 'pending' is useful when pending is 0.5% of the table. If a processing backlog causes pending to grow to 40% of the table, the partial index is no longer dramatically smaller than a full index, and you are paying the management overhead for diminished returns. Partial indexes work best when the data distribution is stable and well-understood.
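Whether the distribution is still skewed is a one-query check, worth running before creating the index and periodically afterwards:

```sql
-- How selective would each status predicate be right now?
SELECT status,
       count(*) AS rows,
       round(100.0 * count(*) / sum(count(*)) OVER (), 2) AS pct
FROM orders
GROUP BY status
ORDER BY pct DESC;
```

If the status you indexed has crept from half a percent to double digits, it is time to reconsider the strategy.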
"The right index is not the one with the most columns. It is the one whose column order mirrors the way your queries actually filter and sort. This distinction is where most indexing strategies succeed or fail."
— from You Don't Need Redis, Chapter 18: The PostgreSQL Performance Decision Framework
Combining partial indexes with composite columns
Partial indexes compose beautifully with composite column ordering — and the combination can produce indexes of extraordinary precision. The same rules apply within the index: equality columns first, range columns last.
-- Partial indexes compose beautifully with composite columns.
-- Equality-first column ordering still applies within the index.
CREATE INDEX idx_orders_pending_customer ON orders (customer_id, created_at)
WHERE status = 'pending';
-- Serves: "Show me this customer's pending orders, newest first."
SELECT * FROM orders
WHERE status = 'pending'
AND customer_id = 42
ORDER BY created_at DESC;
-- Index Scan Backward using idx_orders_pending_customer
-- Index Cond: (customer_id = 42)
-- (actual time=0.01..0.03 rows=7 loops=1)
-- The status filter is satisfied by the index predicate.
-- The customer_id filter uses the first index column.
-- The ORDER BY uses the second index column.
-- No filter, no sort. Everything resolved in the index.

The partial predicate eliminates the irrelevant rows. The composite columns serve the specific query within the remaining subset. The result is an index that does three jobs in one: it filters by status (via the predicate), looks up by customer (via the first column), and sorts by date (via the second column). No wasted entries, no post-index filtering, no sort operation.
For queries that return specific columns, add INCLUDE to create a covering partial index — the smallest possible index that answers the query without touching the table heap.
-- Partial + covering: the full combination.
-- INCLUDE adds columns for index-only scans.
CREATE INDEX idx_orders_pending_covering
ON orders (customer_id, created_at)
INCLUDE (total, currency)
WHERE status = 'pending';
-- This query never touches the table heap:
SELECT customer_id, created_at, total, currency
FROM orders
WHERE status = 'pending'
AND customer_id = 42
ORDER BY created_at DESC;
-- Index Only Scan using idx_orders_pending_covering
-- Heap Fetches: 0
-- The index is still tiny — only pending orders are in it.
-- But it carries everything this query needs.

A covering partial index is the most targeted access path PostgreSQL can offer. It indexes only the rows that match, organises them by the query's filter and sort columns, and carries the output columns the query needs. Zero heap fetches, minimal I/O, minimal cache footprint. If you know your query pattern and your data distribution, this is as good as it gets.
Partial indexes with GIN, GiST, and BRIN
Partial indexing is not limited to B-tree. The WHERE clause can be applied to any index type, and the savings are often more dramatic on expensive index types like GIN.
-- Partial indexing works with GIN indexes too.
-- GIN on JSONB is expensive. Partial GIN is surgical.
-- A products table with a metadata JSONB column:
-- 8M products total, 200K are 'active' and searchable.
-- Full GIN index:
CREATE INDEX idx_products_meta ON products USING gin (metadata);
-- Size: ~1.2GB. Indexes every product's metadata.
-- Partial GIN index:
CREATE INDEX idx_products_meta_active ON products USING gin (metadata)
WHERE status = 'active';
-- Size: ~30MB. Only active products. 40x smaller.
-- Your search query:
SELECT * FROM products
WHERE status = 'active'
AND metadata @> '{"color": "blue"}';
-- Uses the partial GIN index.

GIN indexes on JSONB columns are notoriously large — they index every key and value in every JSON document. On a table with 8 million rows and rich metadata, a full GIN index can easily exceed a gigabyte. A partial GIN index that covers only the active products reduces that to a fraction.
The same principle applies to GiST indexes for geometric or full-text search data, and to BRIN indexes on time-series tables where you want to cover only recent data. The WHERE clause is orthogonal to the index type — it simply determines which rows enter the index structure, regardless of how that structure is organised internally.
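For completeness, the same clause on a BRIN index — a sketch against a hypothetical append-only time-series table:

```sql
-- Partial BRIN: block-range index over only the hot recent data.
-- Assumes rows are appended in rough timestamp order, which is
-- what makes BRIN effective in the first place.
CREATE INDEX idx_readings_recent ON sensor_readings
USING brin (recorded_at)
WHERE recorded_at > '2025-01-01';
```

The static-constant caveat from earlier applies here too: queries must include a literal that implies the predicate, and the index needs periodic recreation as the window moves.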
Creating partial indexes safely in production
A brief but important practical note.
-- Creating partial indexes is safe in production.
-- Use CONCURRENTLY to avoid locking the table.
-- This blocks writes for the duration of the build:
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
-- This does not:
CREATE INDEX CONCURRENTLY idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
-- CONCURRENTLY takes longer (two table scans instead of one)
-- but allows reads and writes to continue throughout.
-- On a 10M-row table where the partial index covers 50K rows,
-- the build completes in seconds either way.
-- On a 500M-row table, CONCURRENTLY is not optional.

CREATE INDEX CONCURRENTLY is the only acceptable way to add indexes to production tables. This applies doubly to partial indexes, which you may be creating in response to performance observations on live systems. A regular CREATE INDEX acquires a SHARE lock on the table, blocking all writes for the duration of the build. On a large table, "the duration" can be minutes.
The encouraging detail: because partial indexes are smaller, CREATE INDEX CONCURRENTLY completes faster. Building a partial index over 50,000 rows is materially faster than building a full index over 10 million. The very feature that makes the index useful also makes it safer to deploy.
After deployment: verify and clean up
-- After creating a partial index, verify it is being used.
-- Check usage statistics:
SELECT indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_pending';
-- idx_scan: how many times the index has been scanned.
-- If this is 0 after a week of production traffic,
-- the index is not being used. Drop it.
-- Also check whether the OLD full index is still needed:
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan ASC;
-- If the full index on status has zero scans after
-- the partial indexes are in place, it is dead weight.

Creating a partial index is the beginning, not the end. After deployment, verify two things: first, that the new partial index is being used; second, that the old full index is no longer needed.
Check pg_stat_user_indexes after a representative period of production traffic — at least a full business cycle, ideally a week. If idx_scan for your partial index is zero, something is wrong. The most common cause is a predicate mismatch — your queries are not matching the index's WHERE clause. The second most common cause is parameterised queries and generic plans, which I address below.
Equally important: check whether the old full index still has scans. If you have replaced a full index with one or more partial indexes that serve all the same queries, the full index is dead weight. It consumes storage, slows writes, and occupies buffer cache — all for zero benefit. Drop it. Unused indexes are not insurance; they are liabilities.
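Dropping it is as simple as creating it, and CONCURRENTLY applies here too:

```sql
-- Remove the superseded full index without blocking writes:
DROP INDEX CONCURRENTLY idx_orders_status;
-- Like CREATE INDEX CONCURRENTLY, this cannot run
-- inside a transaction block.
```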
The ORM and prepared statement pitfall
I'm afraid this section contains news that is unwelcome but necessary. ORMs and prepared statements can interfere with partial index usage in ways that are not immediately obvious.
-- Common pitfall: your ORM generates queries that
-- don't match the partial index predicate.
-- Your partial index:
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
-- Your ORM generates:
SELECT * FROM orders
WHERE status = $1 -- parameterized
AND created_at > $2;
-- PostgreSQL sees a parameter ($1), not the literal 'pending'.
-- At plan time, the planner doesn't know if $1 = 'pending'.
-- With a generic plan, the partial index may not be chosen.
-- This is a prepared statement interaction.
-- For the first 5 executions, PostgreSQL creates custom plans
-- (where it knows $1 = 'pending') and uses the partial index.
-- After that, it MAY switch to a generic plan that cannot.
-- Workaround: ensure your ORM sends the status as a literal
-- in the query text, or use plan_cache_mode = force_custom_plan
-- for critical queries (PostgreSQL 12+).

The core issue: PostgreSQL's prepared statement mechanism generates a "generic plan" after the first several executions of a prepared statement. A generic plan does not know the specific parameter values — it plans for any possible value of $1. Since the planner cannot prove that an unknown $1 equals 'pending', it cannot prove the query is covered by the partial index, and it falls back to the full index or a sequential scan.
This is not a bug. It is a reasonable design choice — generic plans avoid the overhead of re-planning every execution. But it means that if your ORM parameterises the column that your partial index filters on, you may lose the partial index after the first few executions.
Three approaches to address this:
- Inline the constant. If your ORM allows it, send the status value as a literal in the SQL text rather than as a parameter: WHERE status = 'pending' in the query text, not WHERE status = $1 with a parameter binding. Not all ORMs support this cleanly.
- Force custom plans. SET plan_cache_mode = force_custom_plan (PostgreSQL 12+) disables generic plans entirely for the session. This ensures the planner always sees the actual parameter values, but at the cost of re-planning every execution. For queries that run thousands of times per second, the planning overhead is measurable.
- Design the partial index around the parameterised columns. If the ORM parameterises status, do not use status in the partial index predicate. Instead, use a different column — perhaps one the ORM does not parameterise — or restructure the query pattern.
This interaction between prepared statements and partial indexes is the single most common reason I observe partial indexes sitting unused in production. It is worth verifying after deployment.
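You can observe the generic-plan switch directly in psql. A sketch against the example index, simulating what a parameterising ORM sends:

```sql
-- Prepare the statement the ORM would send:
PREPARE pending_orders(text, timestamptz) AS
SELECT * FROM orders
WHERE status = $1 AND created_at > $2;

-- Run it several times, inspecting the plan each time:
EXPLAIN (ANALYZE) EXECUTE pending_orders('pending', '2025-01-01');
-- Early executions use custom plans: Index Scan using idx_orders_pending.
-- From roughly the sixth execution onward, watch for a switch to a
-- generic plan that no longer touches the partial index.

-- Forcing custom plans restores the partial index (PostgreSQL 12+):
SET plan_cache_mode = force_custom_plan;
```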
When partial indexes are not the answer
A waiter who overstates his case is no waiter at all. Partial indexes are not universally appropriate, and I should be clear about where they fall short.
Uniform data distributions. If every status has roughly equal cardinality — 20% pending, 20% processing, 20% shipped, 20% delivered, 20% cancelled — a partial index on any single status saves only 80%. That is meaningful but not dramatic, and you would need five partial indexes to cover all queries, together totalling roughly the size of the single full index they replace. The management overhead outweighs the benefit. A single full index on status is simpler and almost as effective.
Volatile predicates. If the condition that defines the partial index changes frequently — today you query pending orders, next month the team starts querying processing orders — you are creating and dropping indexes regularly. Partial indexes work best when query patterns are stable. If patterns shift quarterly, the full index is more durable.
Ad hoc queries. Analytics teams, data scientists, and reporting tools issue queries that you cannot predict. A partial index on WHERE status = 'pending' is useless to an analyst running WHERE status = 'delivered' AND total > 500. If a significant fraction of your query traffic is ad hoc, full indexes serve this workload better.
Low-selectivity predicates. A partial index predicate should be highly selective — covering a small fraction of the table. A partial index on WHERE country = 'US' when 60% of your users are American saves 40%. A full index is 40% larger but requires zero additional thought about predicate matching, ORM interactions, or planner behaviour. The savings must justify the complexity.
The honest summary: partial indexes excel when data is skewed, query patterns are known and stable, and the indexed subset is a small fraction of the total. The further you move from that profile, the less benefit they provide relative to their management cost.
A checklist before you create one
If you are considering a partial index, these questions will tell you whether it is the right tool.
- Is the predicate highly selective? Does the WHERE clause cover less than 20% of the table? Below 5%, the savings are dramatic. Between 5% and 20%, they are solid. Above 20%, consider whether the complexity is justified.
- Is the query pattern stable? Will your application still be issuing WHERE status = 'pending' in six months? If query patterns shift frequently, partial indexes require ongoing maintenance.
- Does your ORM match the predicate? Run EXPLAIN ANALYZE on the actual query your application sends — not the query you think it sends. If the ORM parameterises the predicate column, test with the generic plan.
- Can you drop the old full index? The full benefit of a partial index is realised when the old full index is removed. If other queries still need the full index, you have added an index without removing one — a net increase in write amplification.
- Will the data distribution remain stable? A partial index on WHERE status = 'pending' assumes pending remains a small fraction of the table. If a processing bottleneck causes pending to swell, the index loses its advantage.
If you answered yes to the first three and have a plan for the last two, the partial index will serve you well.
The traffic reveals the opportunity
The ideal partial index depends on your actual query patterns: which WHERE clauses appear most frequently, which filter values are the most selective, and which tables have the most skewed data distributions. These patterns live in your query traffic — not in your schema documentation, not in your ORM configuration, and not in your best intentions about how the application should be queried.
I have observed teams create partial indexes based on assumptions about their workload, only to discover — weeks later, via pg_stat_user_indexes showing zero scans — that the actual queries used different predicates, different column orders, or different parameter styles than expected. The gap between the intended query pattern and the actual query pattern is where partial indexes go to be ignored.
Gold Lapel identifies partial index opportunities by observing repeated queries with constant filter values on columns with highly skewed distributions. When 95% of queries on the orders table filter on status = 'pending' and pending orders are 0.5% of the table, the partial index opportunity is evident in the traffic. The proxy creates targeted partial indexes that serve the actual workload, not hypothetical access patterns — and it does so using the exact predicate forms that the application sends, avoiding the ORM mismatch problem entirely.
That said, a partial index is a remarkably simple thing. One line of DDL. If you know your queries and your data distribution, you do not need a proxy to create one. You need five minutes and EXPLAIN ANALYZE to verify it. The technique predates Gold Lapel by two decades. It will outlast us all.
I have taken the liberty of preparing a broader survey that places partial indexes in context alongside their cousins. Our guide to PostgreSQL index types covers B-tree, GIN, GiST, BRIN, and hash — each with the specific workloads and query patterns where they earn their place in the schema.