
Reading PostgreSQL EXPLAIN Output: A Visual Guide

The sketch arrived as a deeply indented tree. The innermost brushstrokes were applied first. Data flows upward from the leaves. I am told this is how one reads it.

March 27, 2026 · 18 min read

The 60-second orientation

Ah. You ran EXPLAIN ANALYZE and received a wall of text. I had rather hoped we might avoid this, but since we are here, allow me to walk you through it one field at a time.

The essential mental model: a query plan is a tree. The innermost (most-indented) nodes execute first. Data flows upward from leaf nodes to the root. The root node produces the final result set.

Every node has:

  • A type — what it does (Seq Scan, Index Scan, Hash Join, Sort, etc.)
  • A cost estimate — what the planner predicted
  • Actual execution numbers — what really happened (only with ANALYZE)

You read a plan from the inside out, bottom to top. The most indented lines are the first operations; the least indented line is the final operation that produces the query result.

This guide covers six real query plans of increasing complexity. Each plan builds on concepts from the previous one. By Plan #6, every field in the output will be familiar.

This guide teaches you to read the output itself — what each field means and how the pieces fit together. For optimization strategies based on what you find, see the comprehensive EXPLAIN ANALYZE guide.
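The reading order can be made concrete with a short Python sketch. The plan lines below are invented for illustration; listing nodes from most indented to least indented approximates execution order (siblings at equal depth can in fact start in a different order, as later plans show):

```python
plan = [
    "Limit",
    "  ->  Sort",
    "        ->  Hash Join",
    "              ->  Seq Scan on orders",
    "              ->  Hash",
    "                    ->  Seq Scan on users",
]

# Indentation depth stands in for tree depth: deeper nodes run first,
# and their rows flow upward to the least-indented (root) node.
def indent(line: str) -> int:
    return len(line) - len(line.lstrip())

reading_order = [
    line.strip().removeprefix("->  ")
    for line in sorted(plan, key=indent, reverse=True)
]
print(reading_order[0])   # Seq Scan on users  (an innermost node: runs first)
print(reading_order[-1])  # Limit              (the root: produces the result)
```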

Plan #1 — the sequential scan

We begin with the simplest case. A query on a table with 100,000 rows:

SQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Output:

EXPLAIN ANALYZE output
Seq Scan on users  (cost=0.00..1520.00 rows=1 width=128) (actual time=0.015..12.340 rows=1 loops=1)
  Filter: (email = 'alice@example.com'::text)
  Rows Removed by Filter: 99999
Planning Time: 0.085 ms
Execution Time: 12.456 ms

Field-by-field breakdown:

Seq Scan on users — the node type and the table it operates on. A Sequential Scan reads every row in the table from the first page to the last.

cost=0.00..1520.00 — the planner's estimated cost. The first number (0.00) is the startup cost — how much work before the first row can be returned. The second number (1520.00) is the total cost — the estimated cost to complete the entire operation. These are in planner cost units, not milliseconds.

rows=1 (in the cost parentheses) — the planner's estimate of how many rows this node will return.

width=128 — the planner's estimate of the average row width in bytes.

actual time=0.015..12.340 — the real execution time in milliseconds. The first number is the time to the first row returned. The second number is the time to complete the operation.

rows=1 (in the actual parentheses) — the actual number of rows returned. In this case, the planner's estimate matched reality.

loops=1 — how many times this node executed. For a standalone query, this is always 1. In joins, inner nodes may execute many times.

Filter: (email = 'alice@example.com'::text) — the condition applied during the scan. The database read every row and checked this condition on each one.

Rows Removed by Filter: 99999 — the rows that were read but did not match the filter. The database read 100,000 rows to find 1.

Planning Time: 0.085 ms — time the planner spent choosing this execution plan.

Execution Time: 12.456 ms — total wall-clock time to execute the query. This is the number that matters for application performance.

The concern here is plain: scanning 100,000 rows to find 1. This table needs an index on the email column.

What the cost numbers actually mean

Costs are not milliseconds. They are arbitrary units based on the planner's cost model — sequential page reads, random page reads, CPU processing per row, and CPU processing per operator.

cost=0.00..1520.00 means:

  • Startup cost 0.00 — no work needed before the first row can be returned. Seq Scans can return the first row immediately.
  • Total cost 1520.00 — the estimated total cost to read the entire table and apply the filter.

Startup cost matters for queries with LIMIT. A node with a high startup cost (like a Sort) must finish a substantial amount of work before returning the first row. A node with a low startup cost (like a Seq Scan) can return rows immediately.

You compare costs between alternative plans — the planner does this internally to choose the cheapest plan. The relationship between cost and actual time should be correlated but is not linear.
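To make the units concrete, here is the Seq Scan portion of the cost model computed by hand in Python, using PostgreSQL's default cost parameters (seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025). The page count of 270 is an assumption chosen to reproduce the 1520.00 figure from Plan #1, not a value the plan reports:

```python
# PostgreSQL default planner cost parameters.
seq_page_cost = 1.0         # sequential read of one 8kB page
cpu_tuple_cost = 0.01       # processing one row
cpu_operator_cost = 0.0025  # evaluating one operator (the WHERE comparison)

pages = 270      # assumed table size in pages (chosen to match the example)
rows = 100_000   # rows scanned

# Seq Scan: read every page, process every row, apply the filter to every row.
total_cost = (pages * seq_page_cost
              + rows * cpu_tuple_cost
              + rows * cpu_operator_cost)
print(round(total_cost, 2))  # 1520.0
```

Startup cost is 0.00 because none of this work is needed before the first row can be returned; a Sort, by contrast, carries nearly its entire cost as startup cost.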

Plan #2 — the index scan

Now the same query, after we attend to the missing index:

SQL
CREATE INDEX idx_users_email ON users(email);
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Output:

EXPLAIN ANALYZE output
Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=128) (actual time=0.025..0.027 rows=1 loops=1)
  Index Cond: (email = 'alice@example.com'::text)
Planning Time: 0.092 ms
Execution Time: 0.045 ms

The improvement is substantial:

Index Scan using idx_users_email on users — the planner chose to use the B-tree index instead of scanning the entire table.

cost=0.42..8.44 — dramatically lower than the Seq Scan's 1520. The startup cost (0.42) is the cost of descending the B-tree to find the first matching entry.

actual time=0.025..0.027 — 0.027ms versus 12.340ms. Approximately 450x faster.

Index Cond: (email = 'alice@example.com'::text) — the condition that drove the index lookup. This is different from Filter. Index Cond means the index was used to locate the matching rows directly. No unnecessary rows were read.

The critical distinction: Index Cond vs Filter. Index Cond means the index handled this condition — only matching rows were fetched. Filter means rows were fetched first and then checked — non-matching rows were read and discarded.

When you see both on the same node:

Index Cond + Filter on the same node
Index Scan using idx_orders_customer_id on orders  (cost=0.42..520.00 rows=5 width=96) (actual time=0.030..1.250 rows=5 loops=1)
  Index Cond: (customer_id = 42)
  Filter: (status = 'shipped')
  Rows Removed by Filter: 85

This means the index found 90 rows matching customer_id = 42, but 85 of them were rejected by the additional status = 'shipped' filter. Consider a composite index on (customer_id, status) to eliminate the filter entirely.

Plan #3 — the bitmap scan

Now a query that returns multiple rows, where the index covers only part of the condition:

SQL
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

Output (assuming an index only on customer_id):

EXPLAIN ANALYZE output
Bitmap Heap Scan on orders  (cost=4.65..320.15 rows=5 width=96) (actual time=0.150..1.800 rows=5 loops=1)
  Recheck Cond: (customer_id = 42)
  Filter: (status = 'shipped')
  Rows Removed by Filter: 85
  Heap Blocks: exact=15
  ->  Bitmap Index Scan on idx_orders_customer_id  (cost=0.00..4.65 rows=90 width=0) (actual time=0.100..0.100 rows=90 loops=1)
        Index Cond: (customer_id = 42)
Planning Time: 0.110 ms
Execution Time: 1.920 ms

This is a two-phase operation — an inner node and an outer node.

Phase 1: Bitmap Index Scan (the inner, more-indented node). The index scan finds all row locations matching customer_id = 42 and stores them as a bitmap of heap pages. Note rows=90 — 90 rows matched in the index. The width=0 indicates this node returns page locations, not actual row data.

Phase 2: Bitmap Heap Scan (the outer node). The heap scan reads the actual table pages identified by the bitmap, in physical order. It then applies additional conditions:

  • Recheck Cond — the index condition, re-verified on fetched rows. With an exact bitmap this is largely a formality; it becomes essential when the bitmap is lossy (see below).
  • Filter — the second condition is not in the index, so it is applied as a post-fetch filter.
  • Rows Removed by Filter: 85 — 85 of the 90 rows were rejected.
  • Heap Blocks: exact=15 — 15 table blocks were read.

The optimization opportunity is clear: a composite index on (customer_id, status) would eliminate the filter and reduce the rows fetched from 90 to 5.
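A minimal Python sketch of the two phases, with toy in-memory data shaped to reproduce the counts above (90 index matches, 5 results, 85 removed, 15 heap pages); the table layout is invented:

```python
from collections import defaultdict

# Toy heap: row_id -> (heap_page, customer_id, status).
table = {row_id: (row_id % 15, 42 if row_id < 90 else 7,
                  "shipped" if row_id % 18 == 0 else "pending")
         for row_id in range(200)}

# Phase 1 - Bitmap Index Scan: collect matching row locations per heap page.
bitmap = defaultdict(list)
for row_id, (page, customer_id, _) in table.items():
    if customer_id == 42:            # Index Cond
        bitmap[page].append(row_id)

# Phase 2 - Bitmap Heap Scan: visit pages in physical (sorted) order,
# then apply the non-indexed condition as a Filter.
results, removed_by_filter = [], 0
for page in sorted(bitmap):
    for row_id in bitmap[page]:
        _, _, status = table[row_id]
        if status == "shipped":      # Filter
            results.append(row_id)
        else:
            removed_by_filter += 1

# 15 distinct pages visited, matching Heap Blocks: exact=15.
print(len(results), removed_by_filter, len(bitmap))  # 5 85 15
```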

Lossy vs exact bitmap

Exact bitmap: Heap Blocks: exact=15. The bitmap stored the exact row locations within each page.

Lossy bitmap: When the bitmap exceeds work_mem, PostgreSQL drops from row-level precision to page-level precision for some blocks:

Lossy bitmap scan
Bitmap Heap Scan on orders  (cost=500.00..15000.00 rows=50000 width=96) (actual time=25.000..350.000 rows=48500 loops=1)
  Recheck Cond: (customer_id = ANY ('{1,2,3,4,5}'::integer[]))
  Rows Removed by Index Recheck: 15000
  Heap Blocks: exact=1000 lossy=5000

5,000 blocks used lossy precision, causing 15,000 extra rows to be fetched and rechecked. Raising work_mem allows the bitmap to use exact precision for all blocks.

Plan #4 — the nested loop join

If you'll follow me to a slightly more complex arrangement — a two-table query with LIMIT:

SQL
EXPLAIN ANALYZE SELECT o.id, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.created_at > '2026-01-01'
LIMIT 10;

Output:

EXPLAIN ANALYZE output
Limit  (cost=0.85..15.30 rows=10 width=36) (actual time=0.050..0.120 rows=10 loops=1)
  ->  Nested Loop  (cost=0.85..14500.00 rows=10000 width=36) (actual time=0.048..0.115 rows=10 loops=1)
        ->  Index Scan using idx_orders_created_at on orders o  (cost=0.42..7250.00 rows=10000 width=16) (actual time=0.030..0.055 rows=10 loops=1)
              Index Cond: (created_at > '2026-01-01'::date)
        ->  Index Scan using users_pkey on users u  (cost=0.42..0.44 rows=1 width=24) (actual time=0.005..0.005 rows=1 loops=10)
              Index Cond: (id = o.user_id)
Planning Time: 0.250 ms
Execution Time: 0.145 ms

Reading top to bottom:

Limit — the outermost node. It stops execution after 10 rows.

Nested Loop — the join strategy. A Nested Loop works like a for-loop: for each row from the outer side, look up the matching row(s) on the inner side.

Index Scan using idx_orders_created_at on orders o — the outer side (the driving table). Because of the LIMIT, it only produced 10 rows before the Limit node stopped it.

Index Scan using users_pkey on users u — the inner side (the lookup table). Notice loops=10 — this node executed 10 times, once for each order row.

How to read actual time with loops > 1. The reported actual time is the per-loop average. actual time=0.005..0.005 rows=1 loops=10 means each individual lookup took about 0.005ms, and the total across all 10 lookups was approximately 0.05ms.
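The per-loop accounting can be sketched directly as the for-loop it describes (toy data; the names are invented):

```python
# Outer side: 10 order rows that survived the date filter.
orders = [{"id": n, "user_id": n % 4} for n in range(10)]
# Inner side: the users primary-key index, modeled as a dict.
users_pkey = {0: "alice", 1: "bob", 2: "carol", 3: "dave"}

rows, inner_loops = [], 0
for o in orders:                     # one iteration per outer row
    inner_loops += 1                 # this is what loops=10 counts
    name = users_pkey[o["user_id"]]  # one index probe per iteration
    rows.append((o["id"], name))

print(inner_loops)  # 10
# Total inner-node time ~ 0.005 ms per loop x 10 loops = 0.05 ms.
```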

When Nested Loop is a concern: watch for a very high loops count:

High loop count concern
->  Index Scan using users_pkey on users u  (cost=0.42..0.44 rows=1 width=24) (actual time=0.400..0.410 rows=1 loops=100000)
      Index Cond: (id = o.user_id)

loops=100000 means 100,000 individual index lookups. Even at 0.4ms each, that is 40 seconds total. A Hash Join would process the same data in a single pass.

Plan #5 — the hash join

Now an analytical query — two tables joined with aggregation:

SQL
EXPLAIN ANALYZE SELECT u.name, count(*)
FROM orders o
JOIN users u ON u.id = o.user_id
GROUP BY u.name;

Output:

EXPLAIN ANALYZE output
HashAggregate  (cost=3520.00..3570.00 rows=5000 width=20) (actual time=85.000..87.500 rows=4850 loops=1)
  Group Key: u.name
  Batches: 1  Memory Usage: 625kB
  ->  Hash Join  (cost=230.00..3020.00 rows=100000 width=12) (actual time=5.000..55.000 rows=100000 loops=1)
        Hash Cond: (o.user_id = u.id)
        ->  Seq Scan on orders o  (cost=0.00..1640.00 rows=100000 width=4) (actual time=0.010..12.000 rows=100000 loops=1)
        ->  Hash  (cost=155.00..155.00 rows=5000 width=24) (actual time=4.500..4.500 rows=5000 loops=1)
              Buckets: 8192  Batches: 1  Memory Usage: 290kB
              ->  Seq Scan on users u  (cost=0.00..155.00 rows=5000 width=24) (actual time=0.010..2.000 rows=5000 loops=1)
Planning Time: 0.300 ms
Execution Time: 88.500 ms

Reading from the inside out:

Seq Scan on users u — the innermost node. Reads all 5,000 users.

Hash — builds an in-memory hash table from the users rows. Key details:

  • Buckets: 8192 — the hash table has 8,192 buckets
  • Batches: 1 — the entire hash table fit in memory. This is what you want.
  • Memory Usage: 290kB — the hash table uses 290KB of memory

Batches: 1 vs Batches: N: If Batches is greater than 1, the hash table exceeded work_mem and spilled to disk. This significantly slows the join.

Hash Join — the join strategy. The Hash Join builds a hash table from the smaller table (users), then probes it with each row from the larger table (orders).

HashAggregate — performs the GROUP BY aggregation:

  • Group Key: u.name — the grouping column
  • Batches: 1 — the aggregation fit in memory
  • Memory Usage: 625kB — memory consumed by the aggregation hash table
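The whole pipeline, build, probe, then aggregate, fits in a few lines of Python (toy data with invented names):

```python
# Smaller side: 5 users. Larger side: 100 orders referencing them.
users = [(uid, f"user{uid}") for uid in range(5)]
orders = [(oid, oid % 5) for oid in range(100)]  # (order_id, user_id)

# Hash (build phase): hash the smaller table on the join key.
hash_table = dict(users)

# Hash Join (probe phase): one pass over the larger table, O(1) per row --
# contrast with a Nested Loop, which does one index descent per outer row.
joined = [(oid, hash_table[uid]) for oid, uid in orders if uid in hash_table]

# HashAggregate: GROUP BY name via a second hash table of running counts.
counts = {}
for _, name in joined:
    counts[name] = counts.get(name, 0) + 1

print(len(joined), counts["user0"])  # 100 20
```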

When you see HashAggregate with disk spill:

HashAggregate disk spill
HashAggregate  (cost=5000.00..5500.00 rows=500000 width=20) (actual time=350.000..500.000 rows=485000 loops=1)
  Group Key: u.name
  Planned Partitions: 4  Peak Memory Usage: 4145kB  Disk Usage: 15360kB

The aggregation has 500,000 distinct groups — too many for work_mem. Consider raising work_mem or adding an index that enables GroupAggregate with sorted access.

Plan #6 — the complex plan (Sort, CTE, Subquery)

And now, if I may, a real-world query that brings several elements together — a CTE, join, sort, and limit:

SQL
EXPLAIN ANALYZE
WITH recent_orders AS (
  SELECT * FROM orders WHERE created_at > now() - interval '30 days'
)
SELECT u.name, ro.total
FROM recent_orders ro
JOIN users u ON u.id = ro.user_id
ORDER BY ro.total DESC
LIMIT 20;

Output:

EXPLAIN ANALYZE output
Limit  (cost=2890.00..2890.05 rows=20 width=28) (actual time=35.000..35.010 rows=20 loops=1)
  ->  Sort  (cost=2890.00..2940.00 rows=20000 width=28) (actual time=34.990..35.000 rows=20 loops=1)
        Sort Key: ro.total DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Hash Join  (cost=230.00..2640.00 rows=20000 width=28) (actual time=5.500..28.000 rows=19500 loops=1)
              Hash Cond: (ro.user_id = u.id)
              ->  CTE Scan on recent_orders ro  (cost=0.00..400.00 rows=20000 width=12) (actual time=0.015..8.000 rows=19500 loops=1)
              ->  Hash  (cost=155.00..155.00 rows=5000 width=24) (actual time=5.000..5.000 rows=5000 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 290kB
                    ->  Seq Scan on users u  (cost=0.00..155.00 rows=5000 width=24) (actual time=0.010..2.500 rows=5000 loops=1)
Planning Time: 0.400 ms
Execution Time: 35.200 ms

Reading from the inside out:

Seq Scan on users u — reads all users.

Hash — builds a hash table from users.

CTE Scan on recent_orders ro — reads from the materialized CTE. In PostgreSQL 12 and later, a CTE referenced only once and free of side effects is inlined by default, so a CTE Scan node like this one usually means the CTE was materialized (for example, via WITH ... AS MATERIALIZED).

Hash Join — joins the recent orders with users via the hash table.

Sort — sorts the joined results by ro.total DESC. Key details:

  • Sort Key: ro.total DESC — the column and direction
  • Sort Method: top-N heapsort — an optimized sort for LIMIT queries. Instead of sorting all 19,500 rows, it maintains a heap of the top 20.
  • Memory: 27kB — the heap uses a trivial amount of memory

Limit — returns only the top 20 rows.
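The top-N heapsort strategy has a direct analogue in Python's heapq, which likewise keeps only an N-element heap instead of sorting everything (the totals here are random stand-ins for ro.total):

```python
import heapq
import random

random.seed(1)
totals = [random.randint(1, 100_000) for _ in range(19_500)]

# Maintain a heap of the current top 20 instead of sorting all 19,500 values.
top20 = heapq.nlargest(20, totals)

# Same result as a full sort followed by LIMIT, at a fraction of the memory.
assert top20 == sorted(totals, reverse=True)[:20]
print(len(top20))  # 20
```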

When you see external merge Disk:

Sort spilled to disk
Sort  (cost=250000.00..252500.00 rows=1000000 width=64) (actual time=4500.000..5200.000 rows=1000000 loops=1)
  Sort Key: created_at DESC
  Sort Method: external merge  Disk: 153600kB

The sort spilled 150MB to disk because the data exceeded work_mem. Three possible fixes:

  1. Raise work_mem to accommodate the sort in memory
  2. Add an index on created_at DESC to provide pre-sorted access, eliminating the Sort node entirely
  3. Restructure the query to reduce the data volume before the sort

The cheat sheet — fields at a glance

For your reference — every field you will encounter in EXPLAIN ANALYZE output, gathered in one place:

Cost and estimates

  • cost=startup..total — Planner's estimated cost in arbitrary units. Compare between plans, not as absolute time.
  • rows=N (estimated) — Planner's prediction of rows returned by this node
  • width=N — Estimated average row width in bytes

Actual execution (ANALYZE only)

  • actual time=startup..total — Real execution time in milliseconds
  • rows=N (actual) — Real row count returned
  • loops=N — Number of times this node executed. Multiply actual time x loops for total time.

Scan and filter fields

  • Index Cond: — Condition that drove the index lookup; rows were found via the index
  • Filter: — Condition applied after fetching rows; rows were read then checked
  • Rows Removed by Filter: — Rows fetched but rejected by the Filter condition
  • Recheck Cond: — Condition rechecked after a bitmap or lossy index scan

Sort fields

  • Sort Method: quicksort — Sorted in memory; normal sort
  • Sort Method: top-N heapsort — Optimized in-memory sort for LIMIT queries
  • Sort Method: external merge Disk: NkB — Sort spilled to disk; work_mem exceeded

Hash and join fields

  • Hash Cond: — The condition used for hash join matching
  • Batches: 1 — Hash table fit in memory (good)
  • Batches: N (N > 1) — Hash table spilled to disk (raise work_mem)
  • Memory Usage: — Memory consumed by hash table or sort

Summary fields

  • Planning Time: — Time spent choosing the plan (not executing)
  • Execution Time: — Total wall-clock time to execute the query

Buffer fields (with BUFFERS option)

  • Buffers: shared hit=N — Pages found in PostgreSQL's buffer cache (fast)
  • Buffers: shared read=N — Pages read from the operating system (possibly disk)
  • Buffers: shared dirtied=N — Pages modified in the buffer cache
  • Buffers: shared written=N — Pages written to disk during this query

The three things to check first

When you want a quick assessment — "is this plan sound?" — check these three things in order.

1. Estimated vs actual rows

Compare rows=N in the cost estimate with rows=N in the actual execution for each node. If they differ by 10x or more, the planner is making decisions based on wrong information.

5,000x row estimate mismatch
->  Index Scan using idx_orders_status on orders  (cost=0.42..8.44 rows=10 width=96) (actual time=0.030..450.000 rows=50000 loops=1)

The planner estimated 10 rows; the actual was 50,000. Table statistics are stale — run ANALYZE orders; to update.
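Checking this mechanically is straightforward. Here is a sketch that pulls the two rows= figures out of a node line; it assumes the standard single-line node format shown throughout this guide:

```python
import re

line = ("->  Index Scan using idx_orders_status on orders  "
        "(cost=0.42..8.44 rows=10 width=96) "
        "(actual time=0.030..450.000 rows=50000 loops=1)")

# The first rows= is the planner's estimate, the second the actual count.
estimated, actual = (int(n) for n in re.findall(r"rows=(\d+)", line))

ratio = max(estimated, actual) / max(1, min(estimated, actual))
if ratio >= 10:
    print(f"estimate off by {ratio:.0f}x; statistics likely stale")
```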

2. Seq Scan on large tables with high filter removal

Missing index pattern
Seq Scan on users  (cost=0.00..1520.00 rows=1 width=128) (actual time=0.015..12.340 rows=1 loops=1)
  Filter: (email = 'alice@example.com'::text)
  Rows Removed by Filter: 99999

If Rows Removed by Filter is 90% or more of the rows scanned, an index on the filter column would eliminate the wasted I/O. Note: a Seq Scan is not always bad. On small tables, it is often faster than an Index Scan. See fixing sequential scans for detailed strategies.

3. Sort or Hash with disk usage

Any Sort Method: external merge Disk or Batches > 1 on a Hash means the operation exceeded work_mem and spilled to disk. Options:

  • Raise work_mem for this query: SET work_mem = '256MB';
  • Add an index to avoid the sort entirely
  • Restructure the query to reduce the data volume before the sort or hash

These three checks — row estimate accuracy, high filter removal on Seq Scans, and disk spills — catch the vast majority of EXPLAIN-visible performance problems.

Where to go next

You can now read every field in EXPLAIN ANALYZE output. The next steps, should you wish to continue:

  • Optimization strategies: For what to do about the patterns you find, see the comprehensive EXPLAIN ANALYZE guide.
  • Row estimation: For a deeper understanding of how the planner predicts row counts, see the row estimation glossary.
  • Slow query diagnosis: If you arrived here because a specific query is slow, the slow query diagnostic walks through a complete diagnostic workflow.
  • Performance tuning: For the full picture, see the performance tuning guide.

Gold Lapel reads query plans the way this article taught you to — continuously and automatically. It watches for the patterns described here (row estimate mismatches, unnecessary Seq Scans, disk spills) across every query your application runs, and attends to them without manual EXPLAIN sessions.

Frequently asked questions