
Planner (query planner)

The decision-maker behind every query. If you'll permit me, I'd like to introduce you to the most important colleague you've never met.

Concept · March 21, 2026 · 9 min read

The query planner is the part of PostgreSQL that sits between parsing and execution. It takes the parsed SQL, considers every possible way to execute it — different scan methods, join orders, index choices — estimates the cost of each option, and picks the cheapest plan. These cost estimates are based on table statistics collected by ANALYZE. When those statistics are accurate, the planner nearly always makes the right call. When they are stale or missing, it makes decisions based on wrong assumptions — and queries that should take milliseconds take seconds. The planner is not wrong in these cases. It is misinformed. An important distinction, and one I find most engineers discover too late.

What the query planner does

Every SQL statement goes through the same pipeline: parse, analyze, rewrite, plan, execute. The planner handles the fourth step — and it is, if I may say so, the step that earns its keep. It receives a query tree — an abstract representation of the SQL — and produces an execution plan: a tree of physical operations that the executor will carry out.

For a simple query like SELECT * FROM orders WHERE id = 42, there are only a few options: scan the table sequentially and filter, or use an index. For a query joining five tables with multiple WHERE conditions, the space of possible plans explodes. The planner must consider every permutation of join order (5! = 120 orderings), multiple join strategies for each pair (nested loop, hash join, merge join), and different access methods for each table (sequential scan, index scan, bitmap scan). Even a moderately complex query can have thousands of candidate plans. The planner evaluates all of them in milliseconds. Competent management, when it has good data to work with.

The planner does not try every combination exhaustively for large joins. When the number of tables exceeds geqo_threshold (default 12), it switches from exhaustive search to a genetic algorithm that samples the plan space. For most real-world queries — involving fewer than 12 tables — the planner evaluates the full space and finds the plan that is optimal under its cost model.
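Both behaviors are controlled by ordinary settings you can inspect and adjust per session — a quick sketch:

SQL
-- Check when the genetic optimizer takes over
SHOW geqo;            -- genetic query optimization enabled (default: on)
SHOW geqo_threshold;  -- FROM-item count at which GEQO replaces exhaustive search (default: 12)

-- Raising the threshold buys exhaustive search for larger joins,
-- at the price of longer planning time
SET geqo_threshold = 14;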

SQL
-- See the planner's chosen plan without running the query
EXPLAIN SELECT o.*, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '7 days'
ORDER BY o.total DESC;

-- Output:
-- Sort  (cost=1234.56..1237.89 rows=150 width=96)
--   Sort Key: o.total DESC
--   ->  Nested Loop  (cost=0.85..1180.23 rows=150 width=96)
--         ->  Index Scan using orders_created_at_idx on orders o
--               (cost=0.43..456.78 rows=150 width=64)
--               Index Cond: (created_at > ...)
--         ->  Index Scan using customers_pkey on customers c
--               (cost=0.42..4.82 rows=1 width=32)
--               Index Cond: (id = o.customer_id)

The plan above shows the planner's decisions: use the orders_created_at_idx index to find recent orders, join each order to its customer via the primary key index, then sort by total. Each node includes a cost estimate that the planner used to compare this plan against alternatives.

How cost estimation works

The planner assigns a cost to each candidate plan and picks the one with the lowest total cost. Cost is measured in arbitrary units — not milliseconds, not dollars, just relative cost — calibrated by a set of configurable parameters that tell the planner what your hardware looks like.

The fundamental parameters are:

  • seq_page_cost (default 1.0) — the cost of reading one 8 KB page sequentially from disk
  • random_page_cost (default 4.0) — the cost of reading one page at a random position. Higher than sequential because spinning disks must seek. On SSDs, this should be much lower (1.1–1.5)
  • cpu_tuple_cost (default 0.01) — the cost of processing one row
  • cpu_index_tuple_cost (default 0.005) — the cost of processing one index entry
  • cpu_operator_cost (default 0.0025) — the cost of evaluating one operator or function
  • effective_cache_size (default 4 GB) — the planner's estimate of how much memory is available for caching data (shared_buffers + OS page cache). Does not allocate memory; only influences cost estimates for index scans

For a sequential scan of a table with 10,000 pages and 1 million rows, the planner estimates the cost as: (10,000 * seq_page_cost) + (1,000,000 * cpu_tuple_cost) = 10,000 + 10,000 = 20,000. For an index scan returning 100 rows from a 3-level B-tree index, the cost is much lower — a few random page reads plus a handful of tuple processing costs. The planner compares these totals and picks the cheaper option.

SQL / postgresql.conf
-- View current cost parameters
SHOW seq_page_cost;          -- cost of a sequential page read (default: 1.0)
SHOW random_page_cost;       -- cost of a random page read (default: 4.0)
SHOW cpu_tuple_cost;         -- cost of processing one row (default: 0.01)
SHOW cpu_index_tuple_cost;   -- cost of processing one index entry (default: 0.005)
SHOW cpu_operator_cost;      -- cost of evaluating one operator (default: 0.0025)
SHOW effective_cache_size;   -- planner's estimate of available cache (default: 4GB)

-- On SSDs, random reads are much closer to sequential reads.
-- Lowering random_page_cost makes the planner favor index scans more.
SET random_page_cost = 1.1;  -- typical SSD setting

The most impactful tuning for modern hardware is lowering random_page_cost. The default of 4.0 reflects spinning disks from the 1990s. On SSDs, random reads are nearly as fast as sequential reads, and the planner deserves to know that. I have seen more suboptimal plans caused by this single default than by any other configuration issue. Setting it to 1.1 makes the planner more willing to use index scans, which is almost always the right call on SSD-backed storage.
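If the setting proves out, it can be persisted cluster-wide, or scoped to just the storage that is actually SSD-backed — the tablespace name below is illustrative:

SQL
-- Persist the change for all sessions (writes postgresql.auto.conf)
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Or override it only for data on fast storage
-- (fast_ssd is a hypothetical tablespace name)
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);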

Table statistics and ANALYZE

Allow me to be direct about something: the planner's cost estimates are only as good as the row count estimates they depend on. The planner needs to know how many rows each step of the plan will produce, because that determines downstream costs — a nested loop join is fast when the outer side has 10 rows and catastrophic when it has 100,000.

Row estimates come from table statistics stored in pg_statistic (and exposed through the friendlier pg_stats view). For each column, PostgreSQL tracks:

  • n_distinct — the number of distinct values (or a ratio if negative, meaning the fraction of distinct values relative to row count)
  • most_common_vals / most_common_freqs — the most frequent values and their frequencies. For a status column with 95% "active" and 5% "pending," these arrays tell the planner that filtering on "pending" returns about 5% of rows
  • histogram_bounds — equally-populated histogram buckets for values not in the most-common-values list. Used for range queries (WHERE created_at > '2026-01-01')
  • null_frac — the fraction of null values in the column
  • correlation — how well the physical order of rows on disk matches the logical order of the column. High correlation means an index range scan will read pages sequentially; low correlation means random I/O

SQL
-- Check table statistics that the planner uses
SELECT
  attname,
  n_distinct,
  most_common_vals,
  most_common_freqs,
  correlation
FROM pg_stats
WHERE tablename = 'orders'
  AND attname IN ('status', 'customer_id', 'created_at');

-- See estimated row counts for a table
SELECT
  relname,
  reltuples::bigint AS estimated_rows,
  relpages AS pages
FROM pg_class
WHERE relname = 'orders';

These statistics are collected by the ANALYZE command, which samples 300 × default_statistics_target rows — 30,000 rows at the default target of 100 — and computes the distribution summaries. Autovacuum runs ANALYZE automatically when a table has been modified enough: by default, when 10% of rows (plus a fixed threshold of 50) have changed since the last ANALYZE.

SQL
-- Update statistics for a specific table
ANALYZE orders;

-- Update statistics for a specific column with higher resolution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- Check when autovacuum last ran ANALYZE
SELECT
  relname,
  last_analyze,
  last_autoanalyze,
  n_mod_since_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_mod_since_analyze DESC
LIMIT 10;

The default statistics target of 100 — which caps both the number of histogram buckets and the size of the most-common-values list — is sufficient for most columns. For columns with highly skewed distributions or many distinct values — like a user_id column in a multi-tenant table — raising the target to 500 or 1000 gives the planner a more detailed picture of the data distribution.

When the planner gets it wrong

I should note that when engineers say "the planner chose a bad plan," what they nearly always mean is "the planner chose the best plan for the data it was shown, and that data was wrong." The planner is a cost-based optimizer, not an oracle. It makes the best decision it can given the statistics it has. When those statistics are wrong or incomplete, the plan it chooses can be dramatically suboptimal.

Stale statistics

The most common cause of bad plans, and the most forgivable. A table receives a large batch of inserts, the data distribution shifts, but autovacuum has not yet run ANALYZE. The planner bases its estimates on the old distribution and underestimates (or overestimates) how many rows a filter will return. The fix is mercifully simple: run ANALYZE.

SQL
-- The planner expected 1 row but got 45,000 — chose a nested loop
-- that ran the inner side 45,000 times instead of a hash join
EXPLAIN ANALYZE
SELECT o.*, p.name
FROM orders o
JOIN products p ON p.id = o.product_id
WHERE o.status = 'pending';

-- Nested Loop  (cost=0.42..25436.87 rows=1 width=128)
--              (actual time=0.034..8923.456 rows=45000 loops=1)
--   ->  Seq Scan on orders o  (cost=0.00..25432.00 rows=1 width=96)
--                              (actual time=0.012..187.345 rows=45000 loops=1)
--         Filter: (status = 'pending')
--   ->  Index Scan using products_pkey on products p
--         (cost=0.42..4.44 rows=1 width=32)
--         (actual time=0.002..0.002 rows=1 loops=45000)

-- Fix: run ANALYZE orders; — the planner will see the real distribution
-- of status values and choose a hash join instead.

Correlated columns

The planner assumes that WHERE conditions on different columns are independent. If you filter on WHERE country = 'US' AND state = 'California', the planner multiplies the selectivity of each condition separately — even though knowing the country is US makes California far more likely. This leads to underestimated row counts. PostgreSQL does not track cross-column correlations by default. Extended statistics (CREATE STATISTICS) can address this for specific column combinations.
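A sketch of the fix for the country/state case — the table and column names here are illustrative:

SQL
-- Track functional dependencies between the two columns
CREATE STATISTICS addresses_country_state (dependencies)
  ON country, state FROM addresses;

-- Extended statistics are populated by the next ANALYZE
ANALYZE addresses;

-- Confirm the statistics object exists
SELECT stxname FROM pg_statistic_ext
WHERE stxname = 'addresses_country_state';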

Row estimate errors that cascade

This is where things become genuinely alarming. A small estimation error at one node in the plan can cascade into a catastrophic choice at a higher level. If the planner estimates 10 rows from a subquery but the actual result is 50,000, a nested loop join that was optimal for 10 rows becomes disastrous for 50,000 — the inner side executes 50,000 times instead of 10. This cascading effect is why even moderate estimation errors can cause orders-of-magnitude performance differences. One wrong number, and the entire plan unravels.

Bad join order

For multi-table joins, the order in which tables are joined matters enormously. Joining the most selective table first can reduce intermediate result sizes from millions to hundreds. The planner gets this right when row estimates are accurate, but wrong estimates propagate into wrong join ordering. A query joining five tables with one bad row estimate can end up with a plan that processes millions of intermediate rows unnecessarily. I have seen five-table joins run in 12 milliseconds with good statistics and 45 seconds without. Same query. Same data. Same indexes.
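Two settings bound how hard the planner works on join ordering — both default to 8, which is worth knowing when reading plans for wide joins:

SQL
SHOW join_collapse_limit;  -- max JOIN items the planner will reorder (default: 8)
SHOW from_collapse_limit;  -- max FROM items merged into one join problem (default: 8)

-- Setting join_collapse_limit = 1 freezes the join order as written —
-- a blunt diagnostic tool, not a production setting
SET join_collapse_limit = 1;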

Influencing the planner

When the planner makes a bad choice, the right response is almost always to fix the information it works with rather than override its decisions. I cannot stress this enough. Fix the input, not the output.

Run ANALYZE

The first thing to try. If statistics are stale, updating them is the simplest and most effective fix. For tables that change rapidly, lower autovacuum_analyze_scale_factor so ANALYZE runs more often.
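For the rapid-change case, the scale factor can be lowered per table rather than globally — orders here stands in for any hot table:

SQL
-- Analyze after ~1% of rows change instead of the default 10%
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 0.01);

-- After a bulk load, do not wait for autovacuum at all
ANALYZE orders;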

Tune cost parameters

If the planner consistently undervalues index scans, check random_page_cost. On SSDs, lowering it from 4.0 to 1.1 often fixes plans that should use indexes but choose sequential scans instead. Adjust effective_cache_size to reflect the actual amount of memory available for caching — a higher value makes the planner more optimistic about data being in cache.

Create extended statistics

For correlated columns, CREATE STATISTICS (PostgreSQL 10+) tells the planner to track correlations between specific column groups. This fixes estimation errors from the independence assumption without any change to the query.

Disable plan types for diagnosis

The enable_seqscan, enable_nestloop, enable_hashjoin, and similar settings let you disable specific plan strategies to see if an alternative plan is faster. This is a diagnostic tool — use it to confirm your hypothesis, then fix the underlying cause so the planner chooses the right plan on its own.

SQL
-- Disable specific plan types to force alternatives (debugging only)
SET enable_seqscan = off;       -- force index scans when possible
SET enable_nestloop = off;      -- force hash or merge joins
SET enable_hashjoin = off;      -- force nested loop or merge joins

-- These are diagnostic tools, not production settings.
-- Use them to confirm that a different plan is faster,
-- then fix the underlying cause (statistics, indexes, cost params).

-- Reset to defaults
RESET enable_seqscan;
RESET enable_nestloop;
RESET enable_hashjoin;

pg_hint_plan as a last resort

The pg_hint_plan extension adds Oracle-style hints to PostgreSQL, letting you specify index choices, join methods, and join order directly. It is a legitimate tool for cases where the planner genuinely cannot be given enough information to make the right choice — complex queries with multiple layers of correlation, or queries where the optimal plan depends on runtime parameter values that statistics cannot capture. But it should be a last resort. Overriding the planner's decisions rather than improving its information is the infrastructural equivalent of doing someone's job for them instead of training them properly. Hints are brittle: they break when tables are renamed, indexes are dropped, or data distributions change. Fix the statistics first.
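For completeness, a sketch of the hint syntax — this assumes the extension is installed and loaded, and reuses the earlier orders/products example:

SQL
/*+ HashJoin(o p) */
SELECT o.*, p.name
FROM orders o
JOIN products p ON p.id = o.product_id
WHERE o.status = 'pending';

-- The leading comment forces a hash join between the aliases o and p,
-- regardless of what the statistics suggest.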

How Gold Lapel relates

Gold Lapel does not replace the planner, nor would I presume to. The planner is exceptionally good at its job. What Gold Lapel does is ensure the planner has the best possible information to work with.

Operating as a query proxy between your application and PostgreSQL, Gold Lapel sees every query before it reaches the planner and observes the results after execution. Over time, it builds a model of your workload — which queries run often, which are slow, and where the planner's estimates diverge from reality.

When Gold Lapel detects a query pattern that consistently produces suboptimal plans — high execution times, row estimate mismatches, sequential scans on large tables — it applies complementary optimizations. It may create a materialized view that pre-computes an expensive join, recommend an index that the planner would use if it existed, or restructure a query to give the planner better information to work with. These are not overrides. They improve the landscape the planner surveys — better indexes, pre-computed results, more favorable query structures — so the planner arrives at a good plan on its own. The distinction matters. A well-informed planner needs no hints.

Frequently asked questions