
PostgreSQL Slow Query Diagnostic Flowchart

We ran EXPLAIN ANALYZE on the missing artwork. The bottleneck, it turns out, was the artist.

The Butler of Gold Lapel · March 26, 2026 · 22 min read

Before You Guess, Measure

I have seen more time lost to premature tuning than to the slow queries themselves. Adjusting work_mem, adding speculative indexes, increasing shared_buffers — all without first reading the query plan. This rarely helps. It occasionally makes things worse.

This article is a diagnostic decision tree. Follow the steps in order, and stop when you find the cause. Each step either resolves the issue or directs you to the next one.

How to use this guide: Start at Step 1. Run the diagnostic SQL. Check the result against the criteria listed. If the step identifies the problem, apply the fix and verify. If not, proceed to the next step. The answer is always in the plan — one simply has to read it.

For a broader treatment of why queries slow down in the first place, you may wish to visit Why Is My PostgreSQL Query Slow?. This guide, by contrast, is the runbook you follow while the problem is actively happening.

Step 1 — Run EXPLAIN (ANALYZE, BUFFERS)

Every slow query investigation begins with the query plan. No exceptions. I cannot stress this enough — the plan is not a suggestion, it is the starting point.

The starting point
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ... your slow query here ...;

A word of caution: ANALYZE actually executes the query. On a write query (UPDATE, DELETE), wrap it in a transaction and roll back:

Safe EXPLAIN for write queries
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
UPDATE orders SET status = 'shipped' WHERE id = 42;
ROLLBACK;

What to Look at First

Two numbers in the plan output deserve your immediate attention:

  1. Actual time vs. estimated cost. The actual time=X..Y values show real execution time in milliseconds. The cost=X..Y values are the planner's abstract estimates. When these diverge significantly, the planner is working with incomplete information.
  2. Actual rows vs. estimated rows. Each plan node shows rows=N (estimated) and actual ... rows=M. A mismatch of 10x or more is a reliable sign of stale statistics or a planning limitation.

Three Red Flags in Any Plan

Red flag 1 — Sequential scan on a large table with low selectivity. The plan reads every row in the table and returns only a handful:

Sequential scan red flag
Seq Scan on orders  (cost=0.00..25432.00 rows=15 width=120)
                    (actual time=312.44..4891.22 rows=12 loops=1)
  Filter: (customer_id = 9847)
  Rows Removed by Filter: 1248993
  Buffers: shared hit=8432 read=14200

This scan read 1.2 million rows to return 12. That is an extraordinary amount of effort for a modest result. Go to Step 2.

Red flag 2 — Nested loop with high loop count. The inner side of the loop executes thousands of times:

Nested loop red flag
Nested Loop  (cost=0.43..48291.50 rows=5000 width=200)
             (actual time=0.05..8842.31 rows=5000 loops=1)
  ->  Seq Scan on customers  (actual time=0.02..12.10 rows=5000 loops=1)
  ->  Index Scan on orders   (actual time=1.74..1.76 rows=1 loops=5000)
        Index Cond: (customer_id = customers.id)

The inner index scan runs 5,000 times. Multiply actual time by loops for the true cost: 1.76 ms × 5,000 = 8,800 ms. The per-loop time looks perfectly reasonable — it is the repetition that creates the problem.

Red flag 3 — Sort or hash spilling to disk. Look for Sort Method: external merge Disk or Batches: N in hash joins:

Disk sort red flag
Sort  (cost=89201.44..89451.44 rows=100000 width=64)
      (actual time=4521.22..5102.88 rows=100000 loops=1)
  Sort Key: created_at
  Sort Method: external merge  Disk: 15224kB

This sort exceeded available memory and resorted to disk. The word "external" in that output is never a welcome sight. Go to Step 6 to check work_mem.

Decision Points

  • The plan shows a sequential scan on a large table → go to Step 2
  • The plan shows bad row estimates (10x+ mismatch) → go to Step 3
  • The plan looks reasonable but the query is still slow → go to Step 4

Reading the Plan — The 60-Second Version

For a thorough treatment, see How to Read EXPLAIN ANALYZE Output. The minimum you need to follow this diagnostic:

Start from the innermost node. The most indented node executes first. Read inside out.

actual time=X..Y — Y is the cumulative time for that node, per loop. X is the time to the first row.

rows=N (estimated) vs. rows=M (actual) — A mismatch greater than 10x means the planner built its strategy on a faulty assumption.

Buffers: shared hit=X read=Y. Here hit is pages served from memory and read is pages fetched from the OS. A high read count suggests the working set does not fit in memory.

loops=N — The node executed N times. All time and row values are per-loop. Multiply actual time by loops for the true cost.

Step 2 — Is the Right Index Present?

If the query plan reveals a sequential scan on a large table, the first question to ask is whether an appropriate index exists.

Check Existing Indexes

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders'
ORDER BY indexname;

Common Index Mismatches

These are the situations I encounter most frequently. The index exists, the query looks correct, and yet the two do not connect.

Function on the indexed column. A B-tree index on email will not be used for WHERE LOWER(email) = 'user@example.com'. Fix: create an expression index on LOWER(email).

Type mismatch. Comparing a text column to an integer value forces an implicit cast that prevents index use.
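A quick sketch of the pattern (order_ref is a hypothetical text column holding numeric references; the column names are illustrative, not from a schema defined in this article):

Type mismatch example
-- Slow: the cast sits on the column, so the B-tree index on order_ref is unusable
SELECT * FROM orders WHERE order_ref::integer = 12345;

-- Fast: keep the column bare and cast the parameter instead
SELECT * FROM orders WHERE order_ref = 12345::text;

The rule of thumb: any function or cast applied to the column side of a comparison defeats a plain index on that column.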

OR conditions across columns. WHERE customer_id = 42 OR status = 'pending' cannot use a single B-tree index efficiently. Fix: rewrite as a UNION ALL:

UNION ALL for OR conditions
SELECT * FROM orders WHERE customer_id = 42
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND customer_id != 42;

Leading wildcard in LIKE. WHERE name LIKE '%smith' defeats a B-tree index. Fix: use a pg_trgm GIN index for substring matching:

Trigram index for LIKE patterns
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_name_trgm ON customers USING GIN (name gin_trgm_ops);

After Creating the Index

Re-run EXPLAIN (ANALYZE, BUFFERS) and confirm the plan changed from Seq Scan to Index Scan or Index Only Scan. Trust, but verify.

If the index exists and PostgreSQL still chooses a sequential scan, the problem lies elsewhere. Go to Step 3.

Partial and Expression Indexes

Partial indexes include only a subset of rows — smaller, faster to scan, and less costly to maintain:

Partial index
CREATE INDEX idx_active_orders ON orders (customer_id)
WHERE status = 'active';

Expression indexes match functions or casts in the WHERE clause:

Expression index
CREATE INDEX idx_lower_email ON users (LOWER(email));

Covering indexes include additional columns so that index-only scans can satisfy the query without visiting the table at all:

Covering index
CREATE INDEX idx_orders_covering ON orders (customer_id)
INCLUDE (total, created_at);

Step 3 — Are Table Statistics Current?

PostgreSQL's query planner relies on table statistics to estimate how many rows each operation will produce. When they are wrong, the planner builds a strategy for a table that no longer resembles reality.

Check When Statistics Were Last Updated

SELECT
  relname,
  last_analyze,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

If last_analyze and last_autoanalyze are both NULL, the table has never been analyzed — the planner has been working with no reliable information at all.

Check Row Estimate Accuracy

Compare estimated vs actual rows
-- What the planner thinks
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';

-- What the table actually contains
SELECT count(*) AS actual_rows FROM orders;

If these differ by more than 10x, the statistics are stale.

Fix: Run ANALYZE

ANALYZE orders;

After running ANALYZE, re-run your EXPLAIN (ANALYZE, BUFFERS) query. If the row estimates improved and the plan changed, you have found your culprit — and the remedy was a single command.

When Autoanalyze Falls Behind

After bulk loads, large deletes, or COPY operations, the autovacuum daemon may not have triggered autoanalyze yet. You can tune this per-table:

Per-table autoanalyze tuning
ALTER TABLE orders SET (
  autovacuum_analyze_threshold = 1000,
  autovacuum_analyze_scale_factor = 0.02
);

Extended Statistics for Correlated Columns

If statistics are fresh but row estimates remain stubbornly wrong, the cause may be column correlation:

Extended statistics for correlated columns
CREATE STATISTICS customers_stats (dependencies)
ON city, zip_code FROM customers;

ANALYZE customers;

If statistics are fresh and estimates look reasonable but the query persists in its slowness, go to Step 4.

Step 4 — Is a Lock Blocking the Query?

A query with an excellent plan can still appear slow if it is waiting for a lock held by another session. The query itself is not the problem — it is simply queued behind someone who has not yet finished their business.

Diagnostic: Find Waiting Queries

Find lock-waiting queries
SELECT
  pid,
  usename,
  state,
  wait_event_type,
  wait_event,
  query_start,
  LEFT(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event_type = 'Lock'
ORDER BY query_start;

Find What Is Blocking What

Identify blocker and blocked
SELECT
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query,
  blocker.pid AS blocker_pid,
  blocker.query AS blocker_query,
  blocker.state AS blocker_state
FROM pg_stat_activity AS blocked
JOIN LATERAL (
  SELECT unnest(pg_blocking_pids(blocked.pid)) AS pid
) AS blocking_pids ON true
JOIN pg_stat_activity AS blocker ON blocker.pid = blocking_pids.pid
WHERE blocked.wait_event_type = 'Lock';

Common Lock Scenarios

Long-running transaction holding row locks. An UPDATE or DELETE inside an uncommitted transaction blocks other sessions attempting to modify the same rows. The blocker's state will show idle in transaction.

ALTER TABLE waiting for AccessExclusiveLock. Schema changes require an exclusive lock on the table, which blocks all other queries — including SELECT.

Autovacuum blocked by long transactions. Autovacuum cannot clean dead tuples when a long-running transaction holds a snapshot that sees those tuples as potentially still visible.
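For the first scenario, a session listing filtered to idle in transaction usually identifies the offender directly (state values are as reported by pg_stat_activity):

Find idle-in-transaction sessions
SELECT
  pid,
  usename,
  now() - xact_start AS xact_age,
  LEFT(query, 80) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

The query column shows the last statement the session ran before going idle, which is often enough to trace it back to the application code that forgot to commit.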

Fix: Terminate the Blocker

Cancel or terminate the blocker
-- Try cancelling the query first (graceful)
SELECT pg_cancel_backend(12345);

-- If that doesn't work, terminate the connection
SELECT pg_terminate_backend(12345);

Prevention: Timeout Settings

Timeout guardrails
-- Cancel any statement that runs longer than 30 seconds
SET statement_timeout = '30s';

-- Abort if a lock cannot be acquired within 5 seconds
SET lock_timeout = '5s';

-- Kill sessions that sit idle in a transaction for more than 5 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

For a more thorough treatment of lock types and contention patterns, see PostgreSQL Lock Contention.

If no locks are present, the delay lies elsewhere. Go to Step 5.

Step 5 — Is Table Bloat the Problem?

Table bloat is the accumulation of dead tuples — rows that have been updated or deleted but not yet cleaned up by vacuum. They occupy space between live rows, forcing sequential scans to read more pages than necessary.

Diagnostic: Check Dead Tuple Count

Quick bloat check
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';

If dead_pct is above 20%, bloat is very likely contributing to the slowness you are investigating.

Detailed Diagnostic with pgstattuple

Precise bloat measurement
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
  dead_tuple_count,
  dead_tuple_percent,
  free_space,
  free_percent
FROM pgstattuple('orders');

This scans the entire table, so run it during low-traffic periods.

Fix: Remove Bloat

VACUUM FULL rewrites the table without dead tuples, reclaiming disk space. It requires an AccessExclusiveLock, which means the table is completely unavailable for the duration:

VACUUM FULL orders;

pg_repack accomplishes the same result online, without holding a long exclusive lock.
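If pg_repack is available, a typical invocation looks like this (mydb and orders are placeholders; the extension must be created in the target database before the command-line tool will run):

Online repack with pg_repack
-- Install the extension once per database
CREATE EXTENSION pg_repack;

-- Then, from the shell:
-- $ pg_repack --dbname=mydb --table=orders

The table remains readable and writable during the repack; only brief locks are taken at the start and end.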

Index Bloat

Indexes accumulate their own form of bloat over time:

Check index sizes
SELECT
  indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

If an index is significantly larger than expected, rebuild it without downtime:

Online index rebuild
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

For a more thorough treatment, see Table Bloat and Autovacuum Tuning.

If bloat is not the issue, we continue. Go to Step 6.

Step 6 — Is It a Configuration Problem?

If you have arrived here — the plan is reasonable, the right index exists, statistics are fresh, no locks are present, and bloat is under control — the problem may reside in PostgreSQL's configuration. The defaults are conservative by design.

shared_buffers

SHOW shared_buffers;

PostgreSQL's dedicated buffer cache. A common starting point is 25% of total system RAM. If it is set too low, look for high Buffers: ... read= values relative to shared hit= in EXPLAIN output.
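To raise it (the 4GB value is illustrative, roughly 25% of a 16 GB server), note that unlike most settings in this step, shared_buffers only takes effect after a full server restart, not a reload:

Raise shared_buffers (restart required)
ALTER SYSTEM SET shared_buffers = '4GB';
-- pg_reload_conf() is NOT enough for this parameter; restart the server, e.g.:
-- $ pg_ctl restart -D /path/to/data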

work_mem

SHOW work_mem;

The default is 4 MB. If EXPLAIN shows Sort Method: external merge Disk, the sort exceeded work_mem. Test a higher value:

Test work_mem per-session
-- Test per-session first
SET work_mem = '64MB';
-- Then re-run the query with EXPLAIN (ANALYZE, BUFFERS)

Do not set work_mem globally to very high values. Each connection can use multiple work_mem allocations simultaneously: one per sort or hash node in an executing plan, so the worst case is roughly connections × nodes × work_mem.
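Once you have found a value that keeps the sort in memory, you can scope it to a single transaction with SET LOCAL, so the rest of the workload keeps the conservative default:

Scope work_mem to one transaction
BEGIN;
SET LOCAL work_mem = '64MB';
SELECT ... your memory-hungry query here ...;
COMMIT;
-- work_mem reverts to the session default automatically at COMMIT or ROLLBACK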

effective_cache_size

A hint to the planner about how much memory is available for caching. Set to approximately 75% of total system RAM. If set too low, the planner assumes most data will require disk access and biases toward sequential scans.
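Unlike shared_buffers, this setting allocates no actual memory; it is purely a planner hint, and it takes effect on reload (the 12GB value assumes roughly 75% of a 16 GB server):

Set effective_cache_size
ALTER SYSTEM SET effective_cache_size = '12GB';
SELECT pg_reload_conf();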

random_page_cost

The default is 4.0 — a value calibrated for spinning hard drives. On SSDs, set this to 1.1–1.5:

Fix random_page_cost for SSDs
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

A high random_page_cost on SSD storage is one of the most common configuration-related causes of unnecessary sequential scans.

effective_io_concurrency

The default is 1. For SSDs:

ALTER SYSTEM SET effective_io_concurrency = 200;
SELECT pg_reload_conf();

If configuration is already well-tuned, there is one possibility remaining. Go to Step 7.

Step 7 — Is It the Query Itself?

If you have arrived at this step — indexes present, statistics fresh, no locks, bloat controlled, configuration tuned — then the problem may be the query itself. It is doing more work than necessary. No amount of indexing or configuration tuning will resolve a query that is fundamentally asking for too much.

Patterns That Indicate a Query-Level Problem

Selecting more columns than needed. SELECT * on a table with 50 columns when the application uses 3.

Missing LIMIT on exploratory queries. Queries that return unbounded result sets when the application displays only the first 20 rows.

Correlated subqueries that should be JOINs:

Correlated subquery vs JOIN
-- Slow: correlated subquery (runs for each customer)
SELECT c.name,
  (SELECT SUM(total) FROM orders o WHERE o.customer_id = c.id) AS lifetime_total
FROM customers c;

-- Faster: JOIN with aggregation
SELECT c.name, COALESCE(SUM(o.total), 0) AS lifetime_total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

DISTINCT hiding a join problem. SELECT DISTINCT on a large result set is often a signal that the query has an unintended many-to-many join producing duplicate rows.
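When the join exists only to filter rows, an EXISTS subquery removes both the duplicates and the DISTINCT. A sketch, reusing the customers/orders schema from the earlier examples:

EXISTS instead of DISTINCT
-- Produces one row per matching order, then deduplicates after the fact
SELECT DISTINCT c.id, c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'pending';

-- Filters without producing duplicates in the first place
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id
    AND o.status = 'pending'
);

The EXISTS form also lets the planner stop probing orders as soon as it finds the first match for each customer.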

Rewrite Strategies

Lateral joins for correlated access:

Lateral join for top-N
SELECT c.name, latest.total, latest.created_at
FROM customers c
CROSS JOIN LATERAL (
  SELECT o.total, o.created_at
  FROM orders o
  WHERE o.customer_id = c.id
  ORDER BY o.created_at DESC
  LIMIT 3
) AS latest;

Materialized views for repeated expensive aggregations:

Materialized view
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
  DATE_TRUNC('month', created_at) AS month,
  SUM(total) AS revenue,
  COUNT(*) AS order_count
FROM orders
GROUP BY DATE_TRUNC('month', created_at);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;

When the answer is "this query needs a different approach" rather than "this query needs tuning," no amount of diagnostic work will resolve it. The query must be redesigned.

The Diagnostic Cheat Sheet

A summary of the decision tree above, designed to be bookmarked and consulted during an incident.

Step | Diagnostic SQL | What to Look For | Fix | If Not Resolved
-----|----------------|------------------|-----|----------------
1 | EXPLAIN (ANALYZE, BUFFERS) | Seq scan, bad row estimates, disk sorts | Identifies the category of problem | Go to Step 2, 3, or 4
2 | SELECT * FROM pg_indexes WHERE tablename = 't' | Missing or mismatched index | CREATE INDEX | Go to Step 3
3 | SELECT last_analyze FROM pg_stat_user_tables | Stale or missing statistics | ANALYZE tablename; | Go to Step 4
4 | SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock' | Lock waits, blocked queries | pg_cancel_backend() + timeouts | Go to Step 5
5 | SELECT n_dead_tup FROM pg_stat_user_tables | Dead tuple ratio > 20% | VACUUM FULL or pg_repack | Go to Step 6
6 | SHOW work_mem; SHOW random_page_cost; | Disk sorts, high random_page_cost | Adjust configuration | Go to Step 7
7 | Read the query | SELECT *, missing LIMIT, correlated subqueries | Rewrite the query | (end of flowchart)
