PostgreSQL Query Plan Nodes: What Each One Means
The illustrator submitted a comprehensive catalog of every brushstroke in his repertoire. We commissioned an artwork. He delivered a reference manual. It is, I concede, a useful one.
How to use this reference
I should like to conduct a brief inventory, if you don't mind.
This is a companion to the visual guide to reading EXPLAIN output. That article teaches you to read plans — how to interpret costs, actual times, loops, and row counts. This article catalogs every node you will encounter and tells you what each one means.
Each node entry follows a consistent structure:
- What it does — the operation the node performs
- When the planner chooses it — the conditions that make this node the cheapest option
- When it is normal — situations where this node is expected and healthy
- When it is a concern — situations where this node indicates a performance problem
- EXPLAIN snippet — a realistic fragment showing the node in context
Nodes are organized by category: scan nodes, join nodes, aggregation nodes, sort and limit nodes, and utility nodes.
You do not need to read this top-to-bottom. Use it as a reference when you encounter an unfamiliar node in a plan. For optimization strategies based on what you find, see the comprehensive EXPLAIN ANALYZE guide.
Scan nodes — how PostgreSQL reads tables
Scan nodes are the leaf nodes of every query plan. They read data from tables, indexes, functions, or subqueries. Every plan starts with at least one scan node.
Seq Scan (Sequential Scan)
What it does: Reads every row in the table from beginning to end, applying any filter conditions after fetching each row.
When the planner chooses it: When no suitable index exists, when the query returns a large fraction of the table, or when the table is small enough that sequential access is faster than index overhead.
When it is a concern: Large tables where the filter removes 90% or more of the rows. The Rows Removed by Filter count is the diagnostic signal.
Seq Scan on orders (cost=0.00..25432.00 rows=87 width=64) (actual time=0.031..142.650 rows=92 loops=1)
Filter: (status = 'cancelled'::text)
Rows Removed by Filter: 1048476
Fix: Create an index on the filtered column(s):
CREATE INDEX idx_orders_status ON orders (status);
For more on fixing sequential scans, see the sequential scan fix guide.
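The mechanics are simple enough to sketch. This is an illustrative Python sketch, not PostgreSQL's implementation; `predicate` stands in for the Filter condition:

```python
def seq_scan(table, predicate):
    # Read every row from beginning to end, applying the filter
    # after each fetch. Rows that fail the filter are what
    # EXPLAIN reports as 'Rows Removed by Filter'.
    kept, removed = [], 0
    for row in table:
        if predicate(row):
            kept.append(row)
        else:
            removed += 1
    return kept, removed
```

The cost is proportional to the whole table regardless of how few rows survive the filter, which is why a high removed count is the diagnostic signal.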
Index Scan
What it does: Traverses a B-tree (or other) index to locate matching rows, then fetches the full row from the table heap.
When it is a concern: Watch for a Filter: line in addition to the Index Cond: line. The filter means the index is not fully covering the query's conditions — a composite index may help.
Index Scan using idx_orders_customer_id on orders (cost=0.42..8.44 rows=5 width=64) (actual time=0.028..0.036 rows=5 loops=1)
Index Cond: (customer_id = 4217)
Filter: (status = 'shipped'::text)
Rows Removed by Filter: 3
Index Only Scan
What it does: Reads data entirely from the index without visiting the table heap. Typically the fastest scan type, because heap access is skipped entirely.
When it is a concern: Check the Heap Fetches count. If heap fetches are high, the visibility map is stale — run VACUUM.
Index Only Scan using idx_orders_customer_status on orders (cost=0.42..4.22 rows=5 width=12) (actual time=0.019..0.023 rows=5 loops=1)
Index Cond: (customer_id = 4217)
Heap Fetches: 0
When the visibility map is stale, the same scan must visit the heap after all:
Index Only Scan using idx_orders_customer_status on orders (cost=0.42..4.22 rows=5 width=12) (actual time=0.052..0.071 rows=5 loops=1)
Index Cond: (customer_id = 4217)
Heap Fetches: 5
Bitmap Index Scan + Bitmap Heap Scan
What it does: A two-phase scan. Phase 1 traverses the index and builds a bitmap of matching row locations. Phase 2 reads those table pages in physical order.
When it is a concern: The Heap Blocks line can show lossy blocks, meaning the bitmap exceeded work_mem.
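The two phases can be sketched in Python. This is schematic only: representing the index as (value, page) pairs and the heap as a page-to-rows dict is a simplification of the real data structures:

```python
def bitmap_scan(index_entries, heap, cond):
    # Phase 1 (Bitmap Index Scan): scan the index and set one
    # bit per heap page that contains a matching row
    bitmap = {page for value, page in index_entries if cond(value)}
    # Phase 2 (Bitmap Heap Scan): visit those pages in physical
    # order, rechecking the condition ('Recheck Cond' in the plan)
    for page in sorted(bitmap):
        for row in heap[page]:
            if cond(row):
                yield row
```

Visiting pages in sorted physical order is the whole point: each page is read at most once, even when many matching rows share a page.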
Bitmap Heap Scan on orders (cost=52.18..3412.72 rows=2400 width=64) (actual time=1.205..8.340 rows=2387 loops=1)
Recheck Cond: (amount > 1000)
Heap Blocks: exact=1842
-> Bitmap Index Scan on idx_orders_amount (cost=0.00..51.58 rows=2400 width=0) (actual time=0.892..0.892 rows=2387 loops=1)
Index Cond: (amount > 1000)
BitmapAnd and BitmapOr: PostgreSQL can combine bitmaps from multiple indexes:
Bitmap Heap Scan on orders (cost=104.36..6825.44 rows=4800 width=64)
-> BitmapOr (cost=104.36..104.36 rows=4800 width=0)
-> Bitmap Index Scan on idx_orders_status (cost=0.00..51.58 rows=2400 width=0)
Index Cond: (status = 'cancelled'::text)
-> Bitmap Index Scan on idx_orders_amount (cost=0.00..51.58 rows=2400 width=0)
Index Cond: (amount > 5000)
Function Scan
What it does: Calls a set-returning function and treats its output as a virtual table.
Function Scan on generate_series g (cost=0.00..10.00 rows=1000 width=4) (actual time=0.032..0.187 rows=1000 loops=1)
CTE Scan
What it does: Reads from a materialized Common Table Expression (a WITH clause). In PostgreSQL 12+, CTEs that are referenced only once and have no side effects are automatically inlined.
Subquery Scan
What it does: Wraps a subquery result so the outer query can reference it. Often a structural node that the planner adds for correctness with minimal cost on its own.
Join nodes — how PostgreSQL combines tables
PostgreSQL supports three join algorithms: Nested Loop, Hash Join, and Merge Join. The planner selects the cheapest option based on row estimates, available indexes, and table sizes.
Nested Loop
What it does: For each row from the outer input, scans the inner input for matching rows. This is the simplest join strategy.
When it is a concern: A high loops count on the inner node — loops=10000 or more means the inner side is being scanned thousands of times.
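The algorithm is simple. A schematic Python sketch (not PostgreSQL's implementation) shows why the inner side's cost is multiplied by the outer row count:

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    # For every outer row, re-scan the inner input. The inner
    # node therefore executes len(outer) times, which is exactly
    # the 'loops' count EXPLAIN ANALYZE reports on it.
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                yield (o, i)
```

In the healthy case the inner "scan" is an index lookup, so each loop is cheap; the pathology is thousands of loops over a sequential scan.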
Nested Loop (cost=0.42..84.65 rows=5 width=128) (actual time=0.035..0.089 rows=5 loops=1)
-> Index Scan using idx_orders_customer_id on orders (cost=0.42..8.44 rows=5 width=64) (actual time=0.028..0.036 rows=5 loops=1)
Index Cond: (customer_id = 4217)
-> Index Scan using customers_pkey on customers (cost=0.29..0.31 rows=1 width=64) (actual time=0.008..0.008 rows=1 loops=5)
Index Cond: (id = orders.customer_id)
Hash Join
What it does: Builds an in-memory hash table from the smaller input, then probes it with each row from the larger input.
When it is a concern: Check Batches. Batches: 1 means the hash table fit in memory. Batches: N where N > 1 means it spilled to disk.
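The two phases map directly onto the plan shape, with the Hash node as phase 1. An illustrative Python sketch (not PostgreSQL's implementation):

```python
def hash_join(build, probe, build_key, probe_key):
    # Phase 1: hash the smaller (build) input into memory;
    # this is the Hash node nested under the Hash Join
    table = {}
    for row in build:
        table.setdefault(row[build_key], []).append(row)
    # Phase 2: probe the table with each row of the larger input
    for row in probe:
        for match in table.get(row[probe_key], []):
            yield (match, row)
```

The build side must fit in work_mem for Batches: 1; when it does not, PostgreSQL partitions both inputs to disk and joins batch by batch.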
Hash Join (cost=3245.00..18726.00 rows=50000 width=128) (actual time=42.150..198.340 rows=49873 loops=1)
Hash Cond: (orders.customer_id = customers.id)
-> Seq Scan on orders (cost=0.00..12500.00 rows=500000 width=64) (actual time=0.018..62.340 rows=500000 loops=1)
-> Hash (cost=2145.00..2145.00 rows=50000 width=64) (actual time=41.920..41.920 rows=50000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 4612kB
-> Seq Scan on customers (cost=0.00..2145.00 rows=50000 width=64) (actual time=0.012..18.340 rows=50000 loops=1)
Fix for spilling:
SET work_mem = '256MB';
Merge Join
What it does: Walks through both inputs in sorted order simultaneously, matching rows as it goes. Very efficient when the input is pre-sorted.
When it is a concern: If the planner adds a Sort node below the Merge Join, the sort cost may negate the merge advantage.
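The walk through both sorted inputs can be sketched in Python. For brevity this illustrative version assumes the inner side's keys are unique (e.g. a primary key); PostgreSQL's real implementation also handles duplicates on both sides:

```python
def merge_join(outer, inner, outer_key, inner_key):
    # Both inputs must arrive sorted on the join key.
    it = iter(inner)
    cur = next(it, None)
    for o in outer:
        # advance the inner cursor until it catches up
        while cur is not None and cur[inner_key] < o[outer_key]:
            cur = next(it, None)
        if cur is not None and cur[inner_key] == o[outer_key]:
            yield (o, cur)
```

Each input is read exactly once, which is why Merge Join shines when both sides come from index scans that deliver sorted order for free.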
Merge Join (cost=0.85..8245.20 rows=50000 width=128) (actual time=0.042..78.650 rows=49873 loops=1)
Merge Cond: (orders.customer_id = customers.id)
-> Index Scan using idx_orders_customer_id on orders (cost=0.42..4120.42 rows=500000 width=64) (actual time=0.022..42.310 rows=500000 loops=1)
-> Index Scan using customers_pkey on customers (cost=0.29..2145.29 rows=50000 width=64) (actual time=0.015..12.890 rows=50000 loops=1)
Aggregation and grouping nodes
Aggregate
What it does: Computes aggregate functions over the entire input set. Used when there is no GROUP BY clause.
Aggregate (cost=12500.00..12500.01 rows=1 width=8) (actual time=68.420..68.420 rows=1 loops=1)
-> Seq Scan on orders (cost=0.00..12500.00 rows=500000 width=0) (actual time=0.012..42.150 rows=500000 loops=1)
HashAggregate
What it does: Groups rows by hashing the GROUP BY key values. Each unique combination gets its own entry in a hash table.
When it is a concern: Check for Batches greater than 1 or Disk Usage — the hash table exceeded work_mem.
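A schematic Python sketch (illustrative only, summing an assumed amount column) makes the memory behavior clear:

```python
def hash_aggregate(rows, group_key, value_key):
    # One hash-table entry per distinct group key; memory grows
    # with the number of groups, not the number of input rows.
    groups = {}
    for r in rows:
        groups[r[group_key]] = groups.get(r[group_key], 0) + r[value_key]
    return groups
```

This is why the planner's group-count estimate matters: underestimating distinct keys is what makes the hash table blow past work_mem at runtime.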
HashAggregate (cost=15000.00..15050.00 rows=5000 width=16) (actual time=112.340..113.890 rows=5000 loops=1)
Group Key: region
Batches: 1 Memory Usage: 640kB
-> Seq Scan on orders (cost=0.00..12500.00 rows=500000 width=12) (actual time=0.012..52.340 rows=500000 loops=1)
GroupAggregate
What it does: Groups rows that arrive pre-sorted by the GROUP BY keys. A streaming operation that uses constant memory regardless of the number of groups.
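Python's itertools.groupby mirrors the streaming behavior closely (an illustrative sketch, summing an assumed amount column):

```python
from itertools import groupby

def group_aggregate(sorted_rows, group_key, value_key):
    # Input must already be sorted by the group key; only the
    # current group is held in memory, so usage stays constant
    # no matter how many groups there are.
    for key, grp in groupby(sorted_rows, key=lambda r: r[group_key]):
        yield key, sum(r[value_key] for r in grp)
```

Like PostgreSQL's node, this silently produces wrong groupings if the input is not actually sorted, which is why the planner only picks GroupAggregate when sorted order is guaranteed.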
GroupAggregate (cost=0.42..18234.50 rows=50000 width=16) (actual time=0.038..142.670 rows=50000 loops=1)
Group Key: customer_id
-> Index Scan using idx_orders_customer_id on orders (cost=0.42..16234.42 rows=500000 width=12) (actual time=0.022..98.340 rows=500000 loops=1)
Sort and limit nodes
Sort
Sort methods in EXPLAIN ANALYZE:
quicksort — sorted entirely in memory. The normal, efficient case:
Sort (cost=1425.00..1427.50 rows=1000 width=64) (actual time=8.120..8.340 rows=1000 loops=1)
Sort Key: created_at DESC
Sort Method: quicksort Memory: 102kB
top-N heapsort — optimized for LIMIT queries:
Sort (cost=1425.00..1427.50 rows=1000 width=64) (actual time=6.280..6.290 rows=10 loops=1)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 26kB
external merge — the data exceeded work_mem and spilled to disk:
Sort (cost=142500.00..143750.00 rows=500000 width=64) (actual time=1842.120..2105.340 rows=500000 loops=1)
Sort Key: created_at DESC
Sort Method: external merge Disk: 35840kB
Incremental Sort (PostgreSQL 13+)
What it does: Exploits partially sorted input. When data is already sorted by the first key but not subsequent keys, Incremental Sort groups rows by the first key and sorts each group independently.
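A schematic Python sketch (illustrative, not PostgreSQL's implementation) of the group-at-a-time strategy:

```python
from itertools import groupby

def incremental_sort(rows, presorted_key, remaining_key):
    # rows already arrive sorted by presorted_key; only each
    # group needs to be sorted by the remaining key(s), so the
    # largest sort is bounded by the largest group, not the input
    for _, grp in groupby(rows, key=presorted_key):
        yield from sorted(grp, key=remaining_key)
```

Because each per-group sort is small, memory stays low and the first rows stream out long before the whole input has been consumed.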
Incremental Sort (cost=0.46..28450.62 rows=500000 width=64) (actual time=0.068..142.340 rows=500000 loops=1)
Sort Key: customer_id, created_at DESC
Presorted Key: customer_id
Full-sort Groups: 12500 Sort Method: quicksort Average Memory: 8kB Peak Memory: 12kB
Limit
Key insight: Limit pushes down through the plan tree. A Limit above a Sort enables top-N heapsort. A Limit above a Nested Loop stops the loop early.
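The top-N idea is the same trick Python's heapq module implements: keep a bounded heap of the n best rows instead of sorting everything. An illustrative sketch:

```python
import heapq

def top_n(rows, n, key):
    # equivalent in spirit to ORDER BY key DESC LIMIT n:
    # a heap of size n replaces a full sort of all rows
    return heapq.nlargest(n, rows, key=key)
```

For 500,000 rows and LIMIT 10, the heap never holds more than 10 entries, which is why the top-N heapsort snippet above reports kilobytes of memory where a full sort would need megabytes.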
Limit (cost=0.42..4.22 rows=10 width=64) (actual time=0.028..0.052 rows=10 loops=1)
-> Index Scan using idx_orders_created_at on orders (cost=0.42..16234.42 rows=500000 width=64) (actual time=0.025..0.048 rows=10 loops=1)
Unique
What it does: Removes consecutive duplicate rows from sorted input.
Unique (cost=0.42..14234.42 rows=50000 width=4) (actual time=0.025..72.340 rows=50000 loops=1)
-> Index Only Scan using idx_orders_customer_id on orders (cost=0.42..12734.42 rows=500000 width=4) (actual time=0.022..58.120 rows=500000 loops=1)
Utility nodes
Materialize
What it does: Reads the entire result of its child node into memory so the result can be re-scanned without re-executing the child plan.
Nested Loop (cost=0.00..245.50 rows=500 width=128) (actual time=0.045..1.230 rows=500 loops=1)
-> Seq Scan on categories (cost=0.00..1.10 rows=10 width=64) (actual time=0.012..0.015 rows=10 loops=1)
-> Materialize (cost=0.00..22.75 rows=50 width=64) (actual time=0.003..0.012 rows=50 loops=10)
-> Seq Scan on products (cost=0.00..22.50 rows=50 width=64) (actual time=0.008..0.042 rows=50 loops=1)
Append
What it does: Concatenates results from multiple child plans. The plan-level implementation of UNION ALL. Standard for partitioned tables.
Append (cost=0.00..45.00 rows=1500 width=64) (actual time=0.018..2.340 rows=1487 loops=1)
-> Seq Scan on orders_2026q1 (cost=0.00..15.00 rows=500 width=64) (actual time=0.015..0.780 rows=492 loops=1)
Filter: (created_at >= '2026-01-01' AND created_at < '2026-04-01')
-> Seq Scan on orders_2026q2 (cost=0.00..15.00 rows=500 width=64) (actual time=0.010..0.820 rows=498 loops=1)
Filter: (created_at >= '2026-01-01' AND created_at < '2026-04-01')
-> Seq Scan on orders_2026q3 (cost=0.00..15.00 rows=500 width=64) (actual time=0.008..0.740 rows=497 loops=1)
Filter: (created_at >= '2026-01-01' AND created_at < '2026-04-01')
Memoize (PostgreSQL 14+)
What it does: Caches results of the inner side of a Nested Loop, keyed by the join parameter. If the same parameter value appears again, Memoize returns the cached result.
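The caching logic can be sketched in a few lines of Python (illustrative; the names and the stats dict are inventions for the sketch, not PostgreSQL internals):

```python
def memoized_lookup(fetch, cache, stats, key):
    # cache is keyed by the join parameter; a repeated key is
    # served from memory ('Hits') instead of re-running the
    # inner scan ('Misses')
    if key in cache:
        stats["hits"] += 1
    else:
        stats["misses"] += 1
        cache[key] = fetch(key)
    return cache[key]
```

The payoff depends entirely on repetition in the join key: many repeats give a high hit ratio like the 4800/200 split in the snippet below, while mostly-distinct keys make the cache pure overhead.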
Nested Loop (cost=0.46..1245.30 rows=5000 width=128) (actual time=0.042..12.340 rows=5000 loops=1)
-> Seq Scan on orders (cost=0.00..125.00 rows=5000 width=64) (actual time=0.012..1.230 rows=5000 loops=1)
-> Memoize (cost=0.30..0.32 rows=1 width=64) (actual time=0.001..0.001 rows=1 loops=5000)
Cache Key: orders.customer_id
Cache Mode: logical
Hits: 4800 Misses: 200 Evictions: 0 Overflows: 0 Memory Usage: 26kB
-> Index Scan using customers_pkey on customers (cost=0.29..0.31 rows=1 width=64) (actual time=0.005..0.005 rows=1 loops=200)
Index Cond: (id = orders.customer_id)
Gather / Gather Merge
What it does: Collects results from parallel worker processes. Gather collects in arbitrary order; Gather Merge preserves sort order.
Gather (cost=1000.00..25432.00 rows=500000 width=64) (actual time=0.452..142.340 rows=500000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on orders (cost=0.00..12432.00 rows=208333 width=64) (actual time=0.022..52.340 rows=166667 loops=3)
Result
What it does: Returns a computed result without scanning any table. Used for SELECT 1, SELECT now(), etc.
Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Reading multi-node plans — putting it together
Now that you are acquainted with the individual nodes, allow me to address how they compose into full plans.
A query plan is a tree. Data flows from leaf nodes (scan nodes at the deepest indentation) through intermediate nodes (joins, sorts, aggregations) to the root node, which produces the final result.
Reading order: Start at the most-indented nodes — these are the first operations to execute. Follow the data flow upward toward the root.
Identifying bottlenecks: For each node, calculate its real total time:
- Take the actual time total (the second number)
- Multiply by loops
- Subtract the time consumed by child nodes
The node with the largest value is where optimization effort will have the most impact.
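The calculation is mechanical enough to write down. A minimal sketch, assuming a node is represented as a dict with hypothetical actual_total_ms, loops, and children fields:

```python
def self_time(node):
    # real time attributable to this node alone: its actual
    # total time scaled by loops, minus the same figure
    # computed for each direct child
    total = node["actual_total_ms"] * node["loops"]
    children = sum(c["actual_total_ms"] * c["loops"]
                   for c in node.get("children", []))
    return total - children
```

Running this over every node in the tree and ranking by the result gives you the ordered list of optimization targets.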
When the plan looks fine but the query is slow: Check Planning Time. Complex queries can spend significant time in the planning phase. If Planning Time is high relative to Execution Time, consider simplifying the query structure.
For a detailed walkthrough of reading full plans, see the visual guide to reading EXPLAIN output. For optimization strategies, see the EXPLAIN ANALYZE guide.
What Gold Lapel reads in every plan
Gold Lapel's proxy processes query plans using exactly the vocabulary this reference teaches. When the proxy reports that a Hash Join spilled to disk, that a Nested Loop ran 50,000 iterations, or that a Seq Scan removed 99% of rows by filter, you know what those findings mean because you have read their entries above.
Understanding plan nodes makes Gold Lapel's observations interpretable. The proxy does not operate in a black box — it speaks the same language as EXPLAIN ANALYZE.