PostgreSQL Query Plan Nodes: What Each One Means
The illustrator submitted a comprehensive catalog of every brushstroke in his repertoire. We commissioned an artwork. He delivered a reference manual. It is, I concede, a useful one.
How to use this reference
I should like to conduct a brief inventory, if you don't mind.
This is a companion to the visual guide to reading EXPLAIN output. That article teaches you to read plans — how to interpret costs, actual times, loops, and row counts. This article catalogs every node you will encounter and tells you what each one means.
Each node entry follows a consistent structure:
- What it does — the operation the node performs
- When the planner chooses it — the conditions that make this node the cheapest option
- When it is normal — situations where this node is expected and healthy
- When it is a concern — situations where this node indicates a performance problem
- EXPLAIN snippet — a realistic fragment showing the node in context
Nodes are organized by category: scan nodes, join nodes, aggregation nodes, sort and limit nodes, and utility nodes.
You do not need to read this top-to-bottom. Use it as a reference when you encounter an unfamiliar node in a plan. For optimization strategies based on what you find, see the comprehensive EXPLAIN ANALYZE guide.
Scan nodes — how PostgreSQL reads tables
Scan nodes are the leaf nodes of every query plan. They read data from tables, indexes, functions, or subqueries. Every plan starts with at least one scan node.
Seq Scan (Sequential Scan)
What it does: Reads every row in the table from beginning to end, applying any filter conditions after fetching each row.
When the planner chooses it: When no suitable index exists, when the query returns a large fraction of the table, or when the table is small enough that sequential access is faster than index overhead.
When it is a concern: Large tables where the filter removes 90% or more of the rows. The Rows Removed by Filter count is the diagnostic signal.
Seq Scan on orders (cost=0.00..25432.00 rows=87 width=64) (actual time=0.031..142.650 rows=92 loops=1)
Filter: (status = 'cancelled'::text)
Rows Removed by Filter: 1048476
Fix: Create an index on the filtered column(s):
CREATE INDEX idx_orders_status ON orders (status);
For more on fixing sequential scans, see the sequential scan fix guide.
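The mechanics are simple enough to sketch. This is an illustrative Python sketch, not PostgreSQL's implementation; `predicate` stands in for the Filter condition:

```python
def seq_scan(table, predicate):
    # Read every row from beginning to end, applying the filter
    # after each fetch. Rows that fail the filter are what
    # EXPLAIN reports as 'Rows Removed by Filter'.
    kept, removed = [], 0
    for row in table:
        if predicate(row):
            kept.append(row)
        else:
            removed += 1
    return kept, removed
```

The cost is proportional to the whole table regardless of how few rows survive the filter, which is why a high removed count is the diagnostic signal.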
Index Scan
What it does: Traverses a B-tree (or other) index to locate matching rows, then fetches the full row from the table heap.
When it is a concern: Watch for a Filter: line in addition to the Index Cond: line. The filter means the index is not fully covering the query's conditions — a composite index may help.
Index Scan using idx_orders_customer_id on orders (cost=0.42..8.44 rows=5 width=64) (actual time=0.028..0.036 rows=5 loops=1)
Index Cond: (customer_id = 4217)
Filter: (status = 'shipped'::text)
Rows Removed by Filter: 3
Index Only Scan
What it does: Reads data entirely from the index without visiting the table heap. Typically the fastest scan type, because heap access is skipped entirely.
When it is a concern: Check the Heap Fetches count. If heap fetches are high, the visibility map is stale — run VACUUM.
Index Only Scan using idx_orders_customer_status on orders (cost=0.42..4.22 rows=5 width=12) (actual time=0.019..0.023 rows=5 loops=1)
Index Cond: (customer_id = 4217)
Heap Fetches: 0
When the visibility map is stale, the same scan must visit the heap after all:
Index Only Scan using idx_orders_customer_status on orders (cost=0.42..4.22 rows=5 width=12) (actual time=0.052..0.071 rows=5 loops=1)
Index Cond: (customer_id = 4217)
Heap Fetches: 5
Bitmap Index Scan + Bitmap Heap Scan
What it does: A two-phase scan. Phase 1 traverses the index and builds a bitmap of matching row locations. Phase 2 reads those table pages in physical order.
When it is a concern: The Heap Blocks line can show lossy blocks, meaning the bitmap exceeded work_mem.
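The two phases can be sketched in Python. This is schematic only: representing the index as (value, page) pairs and the heap as a page-to-rows dict is a simplification of the real data structures:

```python
def bitmap_scan(index_entries, heap, cond):
    # Phase 1 (Bitmap Index Scan): scan the index and set one
    # bit per heap page that contains a matching row
    bitmap = {page for value, page in index_entries if cond(value)}
    # Phase 2 (Bitmap Heap Scan): visit those pages in physical
    # order, rechecking the condition ('Recheck Cond' in the plan)
    for page in sorted(bitmap):
        for row in heap[page]:
            if cond(row):
                yield row
```

Visiting pages in sorted physical order is the whole point: each page is read at most once, even when many matching rows share a page.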
Bitmap Heap Scan on orders (cost=52.18..3412.72 rows=2400 width=64) (actual time=1.205..8.340 rows=2387 loops=1)
Recheck Cond: (amount > 1000)
Heap Blocks: exact=1842
-> Bitmap Index Scan on idx_orders_amount (cost=0.00..51.58 rows=2400 width=0) (actual time=0.892..0.892 rows=2387 loops=1)
Index Cond: (amount > 1000)
BitmapAnd and BitmapOr: PostgreSQL can combine bitmaps from multiple indexes:
Bitmap Heap Scan on orders (cost=104.36..6825.44 rows=4800 width=64)
-> BitmapOr (cost=104.36..104.36 rows=4800 width=0)
-> Bitmap Index Scan on idx_orders_status (cost=0.00..51.58 rows=2400 width=0)
Index Cond: (status = 'cancelled'::text)
-> Bitmap Index Scan on idx_orders_amount (cost=0.00..51.58 rows=2400 width=0)
Index Cond: (amount > 5000)
Function Scan
What it does: Calls a set-returning function and treats its output as a virtual table.
Function Scan on generate_series g (cost=0.00..10.00 rows=1000 width=4) (actual time=0.032..0.187 rows=1000 loops=1)
CTE Scan
What it does: Reads from a materialized Common Table Expression (a WITH clause). In PostgreSQL 12+, CTEs that are referenced only once and have no side effects are automatically inlined.
Subquery Scan
What it does: Wraps a subquery result so the outer query can reference it. Often a structural node that the planner adds for correctness with minimal cost on its own.
Join nodes — how PostgreSQL combines tables
PostgreSQL supports three join algorithms: Nested Loop, Hash Join, and Merge Join. The planner selects the cheapest option based on row estimates, available indexes, and table sizes.
Nested Loop
What it does: For each row from the outer input, scans the inner input for matching rows. This is the simplest join strategy.
When it is a concern: A high loops count on the inner node — loops=10000 or more means the inner side is being scanned thousands of times.
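The algorithm is simple. A schematic Python sketch (not PostgreSQL's implementation) shows why the inner side's cost is multiplied by the outer row count:

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    # For every outer row, re-scan the inner input. The inner
    # node therefore executes len(outer) times, which is exactly
    # the 'loops' count EXPLAIN ANALYZE reports on it.
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                yield (o, i)
```

In the healthy case the inner "scan" is an index lookup, so each loop is cheap; the pathology is thousands of loops over a sequential scan.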
Nested Loop (cost=0.42..84.65 rows=5 width=128) (actual time=0.035..0.089 rows=5 loops=1)
-> Index Scan using idx_orders_customer_id on orders (cost=0.42..8.44 rows=5 width=64) (actual time=0.028..0.036 rows=5 loops=1)
Index Cond: (customer_id = 4217)
-> Index Scan using customers_pkey on customers (cost=0.29..0.31 rows=1 width=64) (actual time=0.008..0.008 rows=1 loops=5)
Index Cond: (id = orders.customer_id)
Hash Join
What it does: Builds an in-memory hash table from the smaller input, then probes it with each row from the larger input.
When it is a concern: Check Batches. Batches: 1 means the hash table fit in memory. Batches: N where N > 1 means it spilled to disk.
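The two phases map directly onto the plan shape, with the Hash node as phase 1. An illustrative Python sketch (not PostgreSQL's implementation):

```python
def hash_join(build, probe, build_key, probe_key):
    # Phase 1: hash the smaller (build) input into memory;
    # this is the Hash node nested under the Hash Join
    table = {}
    for row in build:
        table.setdefault(row[build_key], []).append(row)
    # Phase 2: probe the table with each row of the larger input
    for row in probe:
        for match in table.get(row[probe_key], []):
            yield (match, row)
```

The build side must fit in work_mem for Batches: 1; when it does not, PostgreSQL partitions both inputs to disk and joins batch by batch.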
Hash Join (cost=3245.00..18726.00 rows=50000 width=128) (actual time=42.150..198.340 rows=49873 loops=1)
Hash Cond: (orders.customer_id = customers.id)
-> Seq Scan on orders (cost=0.00..12500.00 rows=500000 width=64) (actual time=0.018..62.340 rows=500000 loops=1)
-> Hash (cost=2145.00..2145.00 rows=50000 width=64) (actual time=41.920..41.920 rows=50000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 4612kB
-> Seq Scan on customers (cost=0.00..2145.00 rows=50000 width=64) (actual time=0.012..18.340 rows=50000 loops=1)
Fix for spilling:
SET work_mem = '256MB';
Merge Join
What it does: Walks through both inputs in sorted order simultaneously, matching rows as it goes. Very efficient when the input is pre-sorted.
When it is a concern: If the planner adds a Sort node below the Merge Join, the sort cost may negate the merge advantage.
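The walk through both sorted inputs can be sketched in Python. For brevity this illustrative version assumes the inner side's keys are unique (e.g. a primary key); PostgreSQL's real implementation also handles duplicates on both sides:

```python
def merge_join(outer, inner, outer_key, inner_key):
    # Both inputs must arrive sorted on the join key.
    it = iter(inner)
    cur = next(it, None)
    for o in outer:
        # advance the inner cursor until it catches up
        while cur is not None and cur[inner_key] < o[outer_key]:
            cur = next(it, None)
        if cur is not None and cur[inner_key] == o[outer_key]:
            yield (o, cur)
```

Each input is read exactly once, which is why Merge Join shines when both sides come from index scans that deliver sorted order for free.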
Merge Join (cost=0.85..8245.20 rows=50000 width=128) (actual time=0.042..78.650 rows=49873 loops=1)
Merge Cond: (orders.customer_id = customers.id)
-> Index Scan using idx_orders_customer_id on orders (cost=0.42..4120.42 rows=500000 width=64) (actual time=0.022..42.310 rows=500000 loops=1)
-> Index Scan using customers_pkey on customers (cost=0.29..2145.29 rows=50000 width=64) (actual time=0.015..12.890 rows=50000 loops=1)
Aggregation and grouping nodes
Aggregate
What it does: Computes aggregate functions over the entire input set. Used when there is no GROUP BY clause.
Aggregate (cost=12500.00..12500.01 rows=1 width=8) (actual time=68.420..68.420 rows=1 loops=1)
-> Seq Scan on orders (cost=0.00..12500.00 rows=500000 width=0) (actual time=0.012..42.150 rows=500000 loops=1)
HashAggregate
What it does: Groups rows by hashing the GROUP BY key values. Each unique combination gets its own entry in a hash table.
When it is a concern: Check for Batches greater than 1 or Disk Usage — the hash table exceeded work_mem.
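A schematic Python sketch (illustrative only, summing an assumed amount column) makes the memory behavior clear:

```python
def hash_aggregate(rows, group_key, value_key):
    # One hash-table entry per distinct group key; memory grows
    # with the number of groups, not the number of input rows.
    groups = {}
    for r in rows:
        groups[r[group_key]] = groups.get(r[group_key], 0) + r[value_key]
    return groups
```

This is why the planner's group-count estimate matters: underestimating distinct keys is what makes the hash table blow past work_mem at runtime.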
HashAggregate (cost=15000.00..15050.00 rows=5000 width=16) (actual time=112.340..113.890 rows=5000 loops=1)
Group Key: region
Batches: 1 Memory Usage: 640kB
-> Seq Scan on orders (cost=0.00..12500.00 rows=500000 width=12) (actual time=0.012..52.340 rows=500000 loops=1)
GroupAggregate
What it does: Groups rows that arrive pre-sorted by the GROUP BY keys. A streaming operation that uses constant memory regardless of the number of groups.
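Python's itertools.groupby mirrors the streaming behavior closely (an illustrative sketch, summing an assumed amount column):

```python
from itertools import groupby

def group_aggregate(sorted_rows, group_key, value_key):
    # Input must already be sorted by the group key; only the
    # current group is held in memory, so usage stays constant
    # no matter how many groups there are.
    for key, grp in groupby(sorted_rows, key=lambda r: r[group_key]):
        yield key, sum(r[value_key] for r in grp)
```

Like PostgreSQL's node, this silently produces wrong groupings if the input is not actually sorted, which is why the planner only picks GroupAggregate when sorted order is guaranteed.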
GroupAggregate (cost=0.42..18234.50 rows=50000 width=16) (actual time=0.038..142.670 rows=50000 loops=1)
Group Key: customer_id
-> Index Scan using idx_orders_customer_id on orders (cost=0.42..16234.42 rows=500000 width=12) (actual time=0.022..98.340 rows=500000 loops=1)
Sort and limit nodes
Sort
Sort methods in EXPLAIN ANALYZE:
quicksort — sorted entirely in memory. The normal, efficient case:
Sort (cost=1425.00..1427.50 rows=1000 width=64) (actual time=8.120..8.340 rows=1000 loops=1)
Sort Key: created_at DESC
Sort Method: quicksort Memory: 102kB
top-N heapsort — optimized for LIMIT queries:
Sort (cost=1425.00..1427.50 rows=1000 width=64) (actual time=6.280..6.290 rows=10 loops=1)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 26kB
external merge — the data exceeded work_mem and spilled to disk:
Sort (cost=142500.00..143750.00 rows=500000 width=64) (actual time=1842.120..2105.340 rows=500000 loops=1)
Sort Key: created_at DESC
Sort Method: external merge Disk: 35840kB
Incremental Sort (PostgreSQL 13+)
What it does: Exploits partially sorted input. When data is already sorted by the first key but not subsequent keys, Incremental Sort groups rows by the first key and sorts each group independently.
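A schematic Python sketch (illustrative, not PostgreSQL's implementation) of the group-at-a-time strategy:

```python
from itertools import groupby

def incremental_sort(rows, presorted_key, remaining_key):
    # rows already arrive sorted by presorted_key; only each
    # group needs to be sorted by the remaining key(s), so the
    # largest sort is bounded by the largest group, not the input
    for _, grp in groupby(rows, key=presorted_key):
        yield from sorted(grp, key=remaining_key)
```

Because each per-group sort is small, memory stays low and the first rows stream out long before the whole input has been consumed.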
Incremental Sort (cost=0.46..28450.62 rows=500000 width=64) (actual time=0.068..142.340 rows=500000 loops=1)
Sort Key: customer_id, created_at DESC
Presorted Key: customer_id
Full-sort Groups: 12500 Sort Method: quicksort Average Memory: 8kB Peak Memory: 12kB
Limit
Key insight: Limit pushes down through the plan tree. A Limit above a Sort enables top-N heapsort. A Limit above a Nested Loop stops the loop early.
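The top-N idea is the same trick Python's heapq module implements: keep a bounded heap of the n best rows instead of sorting everything. An illustrative sketch:

```python
import heapq

def top_n(rows, n, key):
    # equivalent in spirit to ORDER BY key DESC LIMIT n:
    # a heap of size n replaces a full sort of all rows
    return heapq.nlargest(n, rows, key=key)
```

For 500,000 rows and LIMIT 10, the heap never holds more than 10 entries, which is why the top-N heapsort snippet above reports kilobytes of memory where a full sort would need megabytes.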
Limit (cost=0.42..4.22 rows=10 width=64) (actual time=0.028..0.052 rows=10 loops=1)
-> Index Scan using idx_orders_created_at on orders (cost=0.42..16234.42 rows=500000 width=64) (actual time=0.025..0.048 rows=10 loops=1)
Unique
What it does: Removes consecutive duplicate rows from sorted input.
Unique (cost=0.42..14234.42 rows=50000 width=4) (actual time=0.025..72.340 rows=50000 loops=1)
-> Index Only Scan using idx_orders_customer_id on orders (cost=0.42..12734.42 rows=500000 width=4) (actual time=0.022..58.120 rows=500000 loops=1)
Utility nodes
Materialize
What it does: Reads the entire result of its child node into memory so the result can be re-scanned without re-executing the child plan.
Nested Loop (cost=0.00..245.50 rows=500 width=128) (actual time=0.045..1.230 rows=500 loops=1)
-> Seq Scan on categories (cost=0.00..1.10 rows=10 width=64) (actual time=0.012..0.015 rows=10 loops=1)
-> Materialize (cost=0.00..22.75 rows=50 width=64) (actual time=0.003..0.012 rows=50 loops=10)
-> Seq Scan on products (cost=0.00..22.50 rows=50 width=64) (actual time=0.008..0.042 rows=50 loops=1)
Append
What it does: Concatenates results from multiple child plans. The plan-level implementation of UNION ALL. Standard for partitioned tables.
Append (cost=0.00..45.00 rows=1500 width=64) (actual time=0.018..2.340 rows=1487 loops=1)
-> Seq Scan on orders_2026q1 (cost=0.00..15.00 rows=500 width=64) (actual time=0.015..0.780 rows=492 loops=1)
Filter: (created_at >= '2026-01-01' AND created_at < '2026-04-01')
-> Seq Scan on orders_2026q2 (cost=0.00..15.00 rows=500 width=64) (actual time=0.010..0.820 rows=498 loops=1)
Filter: (created_at >= '2026-01-01' AND created_at < '2026-04-01')
-> Seq Scan on orders_2026q3 (cost=0.00..15.00 rows=500 width=64) (actual time=0.008..0.740 rows=497 loops=1)
Filter: (created_at >= '2026-01-01' AND created_at < '2026-04-01')
Memoize (PostgreSQL 14+)
What it does: Caches results of the inner side of a Nested Loop, keyed by the join parameter. If the same parameter value appears again, Memoize returns the cached result.
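The caching logic can be sketched in a few lines of Python (illustrative; the names and the stats dict are inventions for the sketch, not PostgreSQL internals):

```python
def memoized_lookup(fetch, cache, stats, key):
    # cache is keyed by the join parameter; a repeated key is
    # served from memory ('Hits') instead of re-running the
    # inner scan ('Misses')
    if key in cache:
        stats["hits"] += 1
    else:
        stats["misses"] += 1
        cache[key] = fetch(key)
    return cache[key]
```

The payoff depends entirely on repetition in the join key: many repeats give a high hit ratio like the 4800/200 split in the snippet below, while mostly-distinct keys make the cache pure overhead.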
Nested Loop (cost=0.46..1245.30 rows=5000 width=128) (actual time=0.042..12.340 rows=5000 loops=1)
-> Seq Scan on orders (cost=0.00..125.00 rows=5000 width=64) (actual time=0.012..1.230 rows=5000 loops=1)
-> Memoize (cost=0.30..0.32 rows=1 width=64) (actual time=0.001..0.001 rows=1 loops=5000)
Cache Key: orders.customer_id
Cache Mode: logical
Hits: 4800 Misses: 200 Evictions: 0 Overflows: 0 Memory Usage: 26kB
-> Index Scan using customers_pkey on customers (cost=0.29..0.31 rows=1 width=64) (actual time=0.005..0.005 rows=1 loops=200)
Index Cond: (id = orders.customer_id)
Gather / Gather Merge
What it does: Collects results from parallel worker processes. Gather collects in arbitrary order; Gather Merge preserves sort order.
Gather (cost=1000.00..25432.00 rows=500000 width=64) (actual time=0.452..142.340 rows=500000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on orders (cost=0.00..12432.00 rows=208333 width=64) (actual time=0.022..52.340 rows=166667 loops=3)
Result
What it does: Returns a computed result without scanning any table. Used for SELECT 1, SELECT now(), etc.
Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Reading multi-node plans — putting it together
Now that you are acquainted with the individual nodes, allow me to address how they compose into full plans.
A query plan is a tree. Data flows from leaf nodes (scan nodes at the deepest indentation) through intermediate nodes (joins, sorts, aggregations) to the root node, which produces the final result.
Reading order: Start at the most-indented nodes — these are the first operations to execute. Follow the data flow upward toward the root.
Identifying bottlenecks: For each node, calculate its real total time:
- Take the actual time total (the second number)
- Multiply by loops
- Subtract the time consumed by child nodes
The node with the largest value is where optimization effort will have the most impact.
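The calculation is mechanical enough to write down. A minimal sketch, assuming a node is represented as a dict with hypothetical actual_total_ms, loops, and children fields:

```python
def self_time(node):
    # real time attributable to this node alone: its actual
    # total time scaled by loops, minus the same figure
    # computed for each direct child
    total = node["actual_total_ms"] * node["loops"]
    children = sum(c["actual_total_ms"] * c["loops"]
                   for c in node.get("children", []))
    return total - children
```

Running this over every node in the tree and ranking by the result gives you the ordered list of optimization targets.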
When the plan looks fine but the query is slow: Check Planning Time. Complex queries can spend significant time in the planning phase. If Planning Time is high relative to Execution Time, consider simplifying the query structure.
For a detailed walkthrough of reading full plans, see the visual guide to reading EXPLAIN output. For optimization strategies, see the EXPLAIN ANALYZE guide.
What Gold Lapel reads in every plan
Gold Lapel's proxy processes query plans using exactly the vocabulary this reference teaches. When the proxy reports that a Hash Join spilled to disk, that a Nested Loop ran 50,000 iterations, or that a Seq Scan removed 99% of rows by filter, you know what those findings mean because you have read their entries above.
Understanding plan nodes makes Gold Lapel's observations interpretable. The proxy does not operate in a black box — it speaks the same language as EXPLAIN ANALYZE.