Beyond auto_explain: From Logging Slow Query Plans to Fixing Them
A slow query at 3am will not wait for you to arrive with EXPLAIN ANALYZE. Allow me to introduce the member of staff who never sleeps.
The problem: you cannot EXPLAIN ANALYZE a query that already finished
EXPLAIN ANALYZE is PostgreSQL's finest diagnostic instrument. It tells you precisely what happened inside a query — which nodes executed, how many rows they processed, where the time went. It has one rather significant limitation: you must be present, at the right terminal, running the right query, at the exact moment the problem occurs.
Intermittent slow queries do not extend this courtesy. A query that runs in 3ms throughout the workday but spikes to 8 seconds will not spike while you sit there with EXPLAIN ANALYZE at the ready. It spikes at 3am on a Saturday, when a particular combination of parameter values hits a bad plan, or when autovacuum has not run recently enough, or when a concurrent lock creates just enough contention to trigger a sequential scan. By Monday morning, the conditions have changed and the query is fast again. The culprit has left the scene. The evidence has evaporated.
This is the problem auto_explain was built to solve. It is a PostgreSQL contrib module that watches every query's execution time and automatically logs the execution plan for any statement exceeding a configurable duration threshold. No manual intervention. No need to predict which query will misbehave. auto_explain captures the plan at the moment of the offence, regardless of whether anyone is watching. It embodies two virtues I hold in the highest regard: patience and diligence.
This guide covers how to set it up, how to tune it for production, how to read what it captures, and — the part most auto_explain guides neglect — how to combine it with pg_stat_statements to build a complete diagnostic workflow from "something is slow" to "it is not slow anymore."
How do you install auto_explain?
auto_explain ships with PostgreSQL as a contrib module — there is nothing to download or compile. It has been part of the household since PostgreSQL 8.4. It can be loaded in three ways, each suited to a different situation.
Server-wide via shared_preload_libraries
This is the standard approach for persistent, server-wide monitoring — the night shift that never calls in sick. Add auto_explain to shared_preload_libraries in postgresql.conf:
# postgresql.conf — requires a restart
shared_preload_libraries = 'auto_explain'
# Set the minimum duration threshold (milliseconds)
auto_explain.log_min_duration = 250
-- After restarting PostgreSQL, verify it loaded:
SHOW shared_preload_libraries;
-- Result:
-- shared_preload_libraries
-- --------------------------
-- auto_explain
This requires a PostgreSQL restart — an inconvenience, but a one-time one. After the restart, auto_explain is active for every session. Any query exceeding the threshold has its plan written to the server log, quietly and without fuss.
Per-session via LOAD
For ad-hoc debugging without restarting the server:
-- Load auto_explain for this session only (superuser required, no restart)
LOAD 'auto_explain';
-- Configure for this session
SET auto_explain.log_min_duration = 100;
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
-- Now run your suspect query — the plan appears in the server log
This requires superuser privileges and affects only the current session. When the session ends, auto_explain departs with it. This is the right approach when you know which session is problematic, or when you wish to test your configuration before committing to a server-wide arrangement.
Per-role via session_preload_libraries
A middle ground between server-wide and per-session. Load auto_explain automatically for connections by a specific role:
-- Load auto_explain automatically for a specific role
ALTER ROLE app_readonly SET session_preload_libraries = 'auto_explain';
ALTER ROLE app_readonly SET auto_explain.log_min_duration = 500;
-- Takes effect on the next connection by that role
This does not require a restart; it takes effect on the next connection made by that role. I find this particularly useful when only certain application roles generate the queries you need to investigate — you receive plan logging for the relevant traffic without the noise of monitoring administrative sessions.
What configuration parameters does auto_explain offer?
auto_explain exposes a family of GUC parameters; the nine below are the ones that matter most in practice (recent releases add a few more, such as log_triggers, log_settings, and log_wal). All can be set in postgresql.conf for server-wide defaults, or changed per-session by superusers.
# Core settings
auto_explain.log_min_duration = 250 # log plans for queries exceeding 250ms (-1 disables)
auto_explain.log_analyze = off # include actual timings (adds overhead)
auto_explain.log_buffers = off # include buffer usage stats (requires log_analyze)
auto_explain.log_timing = on # per-node timing (requires log_analyze)
# Output control
auto_explain.log_verbose = off # schema-qualified names, output target lists
auto_explain.log_format = text # text, xml, json, yaml
auto_explain.log_level = LOG # DEBUG5, DEBUG4, ... DEBUG1, INFO, NOTICE, WARNING, LOG
# Scope control
auto_explain.log_nested_statements = off # log plans inside PL/pgSQL functions
auto_explain.sample_rate = 1 # fraction of queries to explain (0.0 to 1.0)
log_min_duration: the threshold that matters most
This is the most important parameter. It controls the minimum execution time (in milliseconds) that triggers plan logging. Setting it too low produces overwhelming log volume. Setting it too high misses the intermittent slowdowns that are, in my experience, the ones that cause the most grief.
A reasonable starting point for most production systems is 250ms to 500ms. This captures queries that are genuinely slow while ignoring the normal sub-millisecond traffic. Set it to 0 to log all plans (useful in development, reckless in production). Set it to -1 to disable logging entirely.
log_analyze: the overhead trade-off
If I may be candid: this parameter is the central decision in any auto_explain configuration, and it deserves a thorough understanding before you enable it.
With log_analyze = off (the default), auto_explain logs the planner's estimated plan — the same output as EXPLAIN without ANALYZE. You see estimated costs, estimated row counts, and the access methods the planner chose. The overhead is negligible because PostgreSQL already computed this plan before execution began. You are logging what was already known.
With log_analyze = on, auto_explain instruments every node in the execution plan to capture actual timings and row counts — the same output as EXPLAIN ANALYZE. This is dramatically more useful: you see the gap between estimated and actual rows, per-node execution times, and buffer usage. But I should be forthright about the cost. The instrumentation is not free. The executor must read the system clock at every plan node boundary, and the bookkeeping adds measurable overhead to every qualifying query.
The overhead scales with plan complexity. A simple index scan with 3 nodes adds very little. A complex join with 30 nodes and nested loops adds more. On workloads dominated by very short queries (sub-millisecond), the instrumentation can double the execution time of those queries — though they would not be captured by auto_explain unless your threshold is extremely low.
The practical question, then, is not whether to enable log_analyze, but under what conditions. I would counsel the following:
- Development and staging: Enable it freely. The overhead matters less than the diagnostic value.
- Production with a high threshold (500ms+): Enable it. Queries that take 500ms will not notice an additional millisecond of instrumentation overhead.
- Production with a low threshold (50-100ms): Use sample_rate to limit the fraction of queries that get instrumented, or leave log_analyze off and work with estimated plans.
log_timing: the fine-grained overhead control
When log_analyze is on, you can set log_timing = off to skip per-node timing while still capturing actual row counts and buffer statistics. This reduces the overhead of log_analyze because the system clock calls are the expensive part. You lose per-node timing breakdowns but keep the most actionable data: estimated vs. actual rows, and buffer hit/read ratios.
log_buffers: the I/O diagnosis
Requires log_analyze = on. When enabled, the logged plan includes Buffers: shared hit=N read=N per node, which tells you whether the query is I/O-bound or compute-bound. In my experience, this is often the most actionable piece of information in the entire plan — a node with read=48291 is pulling 377MB from disk, and that is usually the real bottleneck regardless of what the timings suggest.
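That 377MB figure is simple arithmetic worth internalizing: PostgreSQL's default block size is 8kB, so a page count in a Buffers line converts directly to bytes. A quick sketch of the conversion (in Python, since this is the sort of thing you compute while reading logs):

```python
# PostgreSQL's default block size (BLCKSZ) is 8 kB. It is a compile-time
# option, so verify with SHOW block_size if your build may differ.
BLOCK_SIZE_KB = 8

def pages_to_mb(pages: int) -> float:
    """Convert a Buffers page count (shared hit/read) to megabytes."""
    return pages * BLOCK_SIZE_KB / 1024

# The read=48291 node mentioned above: roughly 377 MB pulled from disk
print(round(pages_to_mb(48291), 1))  # 377.3
```

The same arithmetic applies to hit counts, temp blocks, and local blocks; only the interpretation changes.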
log_nested_statements: seeing inside functions
By default, auto_explain only logs top-level SQL statements. If your application calls PL/pgSQL functions that contain slow queries, those inner queries are invisible — tucked away behind the function's outer call like a problem concealed behind a closed door. Enabling log_nested_statements captures plans for SQL executed inside functions and procedures, which is essential if your application logic lives in stored procedures.
sample_rate: the production safety valve
For production workloads, this is your most important ally. sample_rate controls the fraction of statements auto_explain considers, from 0.0 (none) to 1.0 (all). At 0.1, auto_explain explains roughly 10% of qualifying statements, chosen at random. This reduces both logging volume and the overhead of log_analyze instrumentation by an order of magnitude.
Note the sampling granularity: when sample_rate is less than 1, the decision is made per statement, not per session. Each qualifying statement independently has a sample_rate probability of being explained.
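A consequence of independent per-statement sampling is that a recurring spike is caught with geometrically increasing probability as it recurs. A small sketch of that arithmetic (illustrative only, not part of auto_explain itself):

```python
def capture_probability(sample_rate: float, occurrences: int) -> float:
    """Chance that at least one of `occurrences` qualifying executions
    is explained, given independent per-statement sampling."""
    return 1 - (1 - sample_rate) ** occurrences

# At sample_rate = 0.1, ten recurrences give roughly 65% odds of a capture;
# thirty recurrences push the odds past 95%.
print(round(capture_probability(0.1, 10), 2))  # 0.65
print(round(capture_probability(0.1, 30), 2))  # 0.96
```

This is why sampling is safe for intermittent problems: you trade immediacy for overhead, not the capture itself.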
log_format: text versus JSON
The default text format is human-readable. json format is machine-parseable and works well with log aggregation tools and plan visualization services like explain.depesz.com and explain.dalibo.com.
# Use JSON format for programmatic log analysis
auto_explain.log_format = json
# The logged plan becomes machine-parseable:
# LOG: duration: 487.231 ms plan:
# {
# "Query Text": "SELECT ...",
# "Plan": {
# "Node Type": "Hash Join",
# "Actual Total Time": 486.82,
# "Actual Rows": 2871,
# "Shared Hit Blocks": 1204,
# "Shared Read Blocks": 3891,
# "Plans": [ ... ]
# }
# }
If you intend to feed auto_explain output into any automated analysis pipeline, use JSON. If you are reading plans yourself from the server log, text is easier on human eyes.
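To illustrate the machine-parseable side, here is a minimal sketch (using the EXPLAIN JSON field names shown above) that walks a logged plan and attributes disk reads to the node that actually performed them. Buffer counts in EXPLAIN are cumulative, so each node's children must be subtracted out:

```python
import json

def exclusive_reads(plan: dict) -> list[tuple[str, int]]:
    """EXPLAIN buffer counts include children; subtract them to find
    which node actually performed the disk reads."""
    children = plan.get("Plans", [])
    own = plan.get("Shared Read Blocks", 0) - sum(
        c.get("Shared Read Blocks", 0) for c in children
    )
    nodes = [(plan.get("Node Type", "?"), own)]
    for child in children:
        nodes.extend(exclusive_reads(child))
    return nodes

# A pared-down version of the logged plan above:
logged = json.loads("""
{
  "Plan": {
    "Node Type": "Hash Join",
    "Shared Read Blocks": 3891,
    "Plans": [
      {"Node Type": "Seq Scan", "Shared Read Blocks": 3891},
      {"Node Type": "Hash", "Shared Read Blocks": 0}
    ]
  }
}
""")
print(max(exclusive_reads(logged["Plan"]), key=lambda n: n[1]))
# ('Seq Scan', 3891): the sequential scan did all the disk I/O
```

In a real pipeline, the same walk would run over every plan your log aggregator extracts, flagging the heaviest reader per query.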
log_level: controlling where output appears
The default LOG level writes at a priority that always reaches the main server log. You can lower auto_explain's level so that downstream tools can filter it by severity, but mind the interaction with log_min_messages: the output survives only if log_min_messages is at or below the level you choose. Set auto_explain.log_level = DEBUG1 while log_min_messages sits at its default of WARNING, and the plans are silently discarded.
# Emit auto_explain output at DEBUG1 severity so pgBadger or a
# log aggregator can filter it by level
auto_explain.log_level = DEBUG1
log_min_messages = DEBUG1 # must include DEBUG1, or the plans never reach the log
log_min_duration_statement = -1 # avoid duplicate duration-only log lines
What does auto_explain output look like?
Allow me to show you what arrives in the server log. The output varies considerably depending on whether log_analyze is on or off, and the difference matters.
With log_analyze off (estimated plans only)
LOG: duration: 487.231 ms plan:
Query Text: SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '30 days'
Hash Join (cost=8.20..4312.72 rows=3120 width=52)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0.00..4190.00 rows=3120 width=24)
Filter: (created_at > (now() - '30 days'::interval))
-> Hash (cost=5.80..5.80 rows=192 width=36)
-> Seq Scan on customers c (cost=0.00..5.80 rows=192 width=36)
This tells you the planner's chosen strategy and cost estimates, but not what actually happened during execution. You can see that the planner chose a Seq Scan on orders and a Hash Join — which may or may not be the right plan for the actual data. Without actual row counts, you are reading a plan that was proposed, not a plan that was proven. Sometimes that is sufficient. Often it is not.
With log_analyze on (actual execution data)
LOG: duration: 487.231 ms plan:
Query Text: SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '30 days'
Hash Join (cost=8.20..4312.72 rows=3120 width=52)
(actual time=0.18..486.82 rows=2871 loops=1)
Hash Cond: (o.customer_id = c.id)
Buffers: shared hit=1204 read=3891
-> Seq Scan on orders o (cost=0.00..4190.00 rows=3120 width=24)
(actual time=0.04..485.21 rows=2871 loops=1)
Filter: (created_at > (now() - '30 days'::interval))
Rows Removed by Filter: 497129
Buffers: shared hit=312 read=3891
-> Hash (cost=5.80..5.80 rows=192 width=36)
(actual time=0.09..0.09 rows=192 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 18kB
Buffers: shared hit=4
Planning Time: 0.21 ms
Execution Time: 487.04 ms
Now the picture is complete. The planner estimated 3,120 rows from orders; the actual count was 2,871. Reasonably close — the plan was appropriate for the join. But I would draw your attention to Rows Removed by Filter: 497129. The Seq Scan read half a million rows and discarded 497,129 of them. The query returned 2,871 rows out of 500,000 scanned. A 0.6% hit rate — a textbook case for an index on created_at.
The buffer statistics confirm the diagnosis: shared read=3891 means 3,891 pages read from disk (about 30MB). An index scan on created_at would read perhaps 20 pages for the same result. This is exactly the sort of finding that auto_explain catches while you sleep — the kind you would never think to look for, because in a quick manual test the query runs fine.
Nested statement output
With log_nested_statements = on, auto_explain opens those closed doors and captures plans inside PL/pgSQL functions — the queries that are invisible to every other monitoring tool:
-- A PL/pgSQL function that hides slow queries from top-level monitoring
CREATE FUNCTION get_customer_orders(cust_id int)
RETURNS TABLE(order_id int, total numeric) AS $$
BEGIN
RETURN QUERY
SELECT o.id, o.total
FROM orders o
WHERE o.customer_id = cust_id
AND o.status = 'pending';
END;
$$ LANGUAGE plpgsql;
-- Without log_nested_statements, calling this function logs nothing
-- even if the inner query takes 2 seconds.
-- With it enabled:
SET auto_explain.log_nested_statements = on;

LOG: duration: 2134.891 ms plan:
Query Text: SELECT o.id, o.total
FROM orders o
WHERE o.customer_id = cust_id
AND o.status = 'pending'
Seq Scan on orders o (cost=0.00..41900.00 rows=5 width=20)
(actual time=1842.31..2134.12 rows=3 loops=1)
Filter: ((customer_id = 42) AND (status = 'pending'))
Rows Removed by Filter: 499997
Buffers: shared hit=2104 read=19796
The function's inner query ran a Seq Scan over 500,000 rows to find 3 matches. A half-million rows examined to produce three. Without log_nested_statements, this query would appear only as its function wrapper — with execution time attributed to the function call, not to the slow scan inside it. The actual offender would remain entirely hidden from view.
What should you look for in auto_explain output?
Reading auto_explain output uses the same skills as reading EXPLAIN ANALYZE output. The complete guide to reading query plans covers this in depth. If you will permit me, here is a focused checklist for the patterns auto_explain is most likely to surface:
- Estimated vs. actual row divergence. If the planner estimated 50 rows and the executor processed 247,000, the plan was chosen under false assumptions. Run ANALYZE on the table to update statistics. If the divergence persists, consider extended statistics for correlated columns.
- Sequential scans with high Rows Removed by Filter. A Seq Scan that reads 500,000 rows and returns 2,871 is doing roughly 170x more work than necessary. An index on the filter column eliminates the waste.
- Buffer reads far exceeding buffer hits. shared read=48291 means the query pulled 377MB from disk. Either the table is too large for the buffer cache, or the query is reading far more data than it needs.
- Nested loops with high loop counts. A node with loops=48 and actual time=182.41 per loop contributes roughly 8,756ms of total time. The per-loop time looks modest; the total is catastrophic. Always multiply actual time by loops — this is the arithmetic that matters most.
- Sort Method: external merge Disk. The sort exceeded work_mem and spilled to temporary files. Consider an index that provides pre-sorted output, or increase work_mem for the affected session.
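The two arithmetic habits in that checklist, multiplying per-loop time by loop count and computing a filter's hit rate, can be sketched as follows (numbers taken from the example plans above):

```python
def looped_total_ms(per_loop_ms: float, loops: int) -> float:
    """EXPLAIN reports actual time per loop; the true cost is the product."""
    return per_loop_ms * loops

def filter_hit_rate(rows_returned: int, rows_removed: int) -> float:
    """Fraction of scanned rows the filter actually kept."""
    return rows_returned / (rows_returned + rows_removed)

# loops=48 at 182.41ms per loop: nearly nine seconds inside one node
print(round(looped_total_ms(182.41, 48)))  # 8756
# A Seq Scan returning 2,871 rows while discarding 497,129
print(f"{filter_hit_rate(2871, 497129):.1%}")  # 0.6%
```

Neither computation is deep, but both are easy to skip when reading a plan at speed, and both change the diagnosis.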
The proper diagnostic procedure: pg_stat_statements + auto_explain
If I may walk you through the toolkit I would recommend for any production system: auto_explain is most powerful when combined with pg_stat_statements. They answer complementary questions — pg_stat_statements tells you which queries are expensive; auto_explain tells you why. Together, they form a complete diagnostic procedure.
| Step | Tool | Action |
|---|---|---|
| 1. Identify | pg_stat_statements | Find queries with highest total_exec_time, worst max_exec_time, or large call counts |
| 2. Configure | auto_explain | Set log_min_duration to catch the target query when it spikes |
| 3. Collect | Server logs | Wait for the problem to recur — auto_explain captures the plan automatically |
| 4. Analyze | EXPLAIN output | Read the logged plan: check estimates vs actuals, scan types, buffer I/O |
| 5. Fix | DDL / query rewrite | Add indexes, rewrite queries, create materialized views, update statistics |
| 6. Verify | pg_stat_statements | Confirm mean and max execution times dropped after the fix |
Step 1: Identify the expensive queries
-- Step 1: Find the most expensive queries by total time
SELECT
queryid,
calls,
round(total_exec_time::numeric, 1) AS total_ms,
round(mean_exec_time::numeric, 1) AS mean_ms,
round(max_exec_time::numeric, 1) AS max_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Example output:
-- queryid | calls | total_ms | mean_ms | max_ms | rows
-- ------------+--------+------------+---------+----------+---------
-- 1827459201 | 482910 | 894201.3 | 1.9 | 4872.1 | 482910
-- 3918274510 | 4201 | 312847.2 | 74.5 | 12841.3 | 189045
-- 7291038471 | 891 | 187412.9 | 210.3 | 8921.4 | 44550
The first query has the highest total time but a low mean — fast per execution, called 482,910 times. The second has a moderate mean (74.5ms) but spikes to 12,841ms. The third is slow on every call, averaging 210ms. Each requires a different investigation strategy, but the second — intermittent spikes — is precisely the sort of problem that vanishes whenever you look directly at it. This is where auto_explain earns its keep.
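The triage reasoning above can be made mechanical. Here is a sketch of a first-pass classifier over pg_stat_statements rows; the thresholds are illustrative assumptions, not canonical values:

```python
def triage(calls: int, mean_ms: float, max_ms: float) -> str:
    """Rough first-pass classification of a pg_stat_statements row.
    Thresholds here are illustrative; tune them to your workload."""
    if mean_ms < 5 and calls > 100_000:
        return "hot path: cheap per call, expensive in aggregate"
    if max_ms > 10 * mean_ms and max_ms > 1_000:
        return "intermittent spike: set auto_explain.log_min_duration below max_ms"
    return "consistently slow: reproduce and EXPLAIN ANALYZE directly"

# The first two rows from the example output above:
print(triage(482910, 1.9, 4872.1))   # classified as a hot path
print(triage(4201, 74.5, 12841.3))   # classified as an intermittent spike
```

The point is not the exact cutoffs but the shape of the decision: mean, max, and call count each pull the investigation in a different direction.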
Step 2: Configure auto_explain to capture the spike
-- Step 2: The second query (queryid 3918274510) has a mean of 74ms
-- but a max of 12,841ms — intermittent spikes.
-- Set auto_explain to capture its plans when it spikes.
-- Use ALTER SYSTEM so the setting applies server-wide: a plain SET
-- covers only the current session, and the spike may arrive on any connection.
ALTER SYSTEM SET auto_explain.log_min_duration = 500; -- catch the spikes, not the normal runs
ALTER SYSTEM SET auto_explain.log_analyze = on;
ALTER SYSTEM SET auto_explain.log_buffers = on;
SELECT pg_reload_conf(); -- apply without a restart
Step 3: Wait for the problem to recur
This is the step that auto_explain was built for, and it embodies a principle I hold dear: proper diagnostics require patience. You do not need to reproduce the problem. You do not need to guess at parameters. You configure the threshold, and auto_explain watches — with the kind of tireless attention that no human can sustain. When the query spikes again, the plan is captured in the server log, timestamped and complete.
Step 4: Analyze the captured plan
-- What auto_explain captures when the spike occurs:
LOG: duration: 8921.442 ms plan:
Query Text: SELECT p.name, p.price, array_agg(r.body)
FROM products p
JOIN reviews r ON r.product_id = p.id
WHERE p.category_id = $1
GROUP BY p.id
GroupAggregate (cost=1245.82..1892.41 rows=50 width=128)
(actual time=2.41..8920.12 rows=48 loops=1)
Group Key: p.id
Buffers: shared hit=142 read=48291
-> Nested Loop (cost=0.85..1245.32 rows=2500 width=128)
(actual time=0.08..8841.23 rows=247891 loops=1)
Buffers: shared hit=142 read=48291
-> Index Scan using products_category_id_idx on products p
(cost=0.42..12.50 rows=50 width=36)
(actual time=0.03..0.21 rows=48 loops=1)
Index Cond: (category_id = 7)
-> Index Scan using reviews_product_id_idx on reviews r
(cost=0.43..23.41 rows=50 width=100)
(actual time=0.02..182.41 rows=5164 loops=48)
Index Cond: (product_id = p.id)
Buffers: shared hit=94 read=48291
And there it is. The plan reveals what no amount of manual testing could have surfaced. The planner estimated 50 reviews per product. The actual count was 5,164. This particular category (category_id = 7) has products with unusually large review counts. The nested loop was chosen for an estimated 2,500 rows but processed 247,891. The 48,291 buffer reads (377MB from disk) confirm that the reviews for this category do not fit in cache. The mystery of the 3am spike is solved.
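The decisive number in that diagnosis is the misestimation factor, which is worth computing explicitly (values from the captured plan):

```python
def misestimation(estimated_rows: int, actual_rows: int) -> float:
    """How many times off the planner's row estimate was."""
    return actual_rows / estimated_rows

# Inner index scan: estimated 50 reviews per product, actual 5,164
print(round(misestimation(50, 5164)))      # 103
# Nested loop: estimated 2,500 rows, actual 247,891
print(round(misestimation(2500, 247891)))  # 99
```

A factor near 100x is what pushes the planner into a join strategy that would be correct at 2,500 rows and catastrophic at 247,891.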
Step 5: Fix and verify
-- The diagnosis: estimated 50 reviews per product, actual 5,164.
-- 48 products x ~5,164 reviews each ≈ 247,891 rows processed.
-- The nested loop ran 48 times, reading 48,291 pages from disk.
-- The reviews table is too large to be cached for this category.
-- The fix: a materialized view for product review aggregates
CREATE MATERIALIZED VIEW product_review_summary AS
SELECT
p.id AS product_id,
p.name,
p.price,
p.category_id,
array_agg(r.body ORDER BY r.created_at DESC) AS review_bodies,
count(r.id) AS review_count
FROM products p
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY p.id;
CREATE INDEX idx_prs_category ON product_review_summary (category_id);
-- The replacement query:
SELECT name, price, review_bodies
FROM product_review_summary
WHERE category_id = 7;
-- From 8,921ms to 0.4ms.

-- Step 5: Verify the fix held up
-- After deploying, check pg_stat_statements for the new query:
SELECT
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(max_exec_time::numeric, 2) AS max_ms
FROM pg_stat_statements
WHERE query LIKE '%product_review_summary%'
AND query LIKE '%category_id%';
-- And confirm auto_explain is no longer logging plans for it
-- (because it no longer exceeds the threshold)
The materialized view pre-computes the aggregation. The query that took 8,921ms now takes 0.4ms. From nearly nine seconds to under half a millisecond. The periodic refresh cost of the materialized view is negligible compared to the aggregate savings across all requests for this data.
What does a production-safe configuration look like?
For high-traffic production systems, the goal is maximum diagnostic value with minimal overhead. A well-run household does not install surveillance cameras in every room — it places them at the entrances and the valuables. Here is a configuration that achieves the right balance:
# Production-safe configuration: high threshold + sampling
auto_explain.log_min_duration = 1000 # only queries over 1 second
auto_explain.log_analyze = on # actual timings — worth it at this threshold
auto_explain.log_buffers = on # buffer stats for I/O diagnosis
auto_explain.log_timing = off # skip per-node clock reads to reduce overhead
auto_explain.sample_rate = 0.1 # explain 10% of qualifying queries
auto_explain.log_format = json # machine-parseable for log aggregation
auto_explain.log_nested_statements = on
The reasoning behind each choice:
- log_min_duration = 1000 — only captures genuinely slow queries. On a healthy system, very few queries exceed 1 second.
- log_analyze = on — safe at this threshold, because queries taking 1+ second will not notice the instrumentation overhead.
- log_buffers = on — essential for I/O diagnosis. No meaningful additional overhead beyond what log_analyze already adds.
- log_timing = off — skips per-node clock reads. You still get actual row counts and buffer stats, which are usually more actionable than per-node timing anyway.
- sample_rate = 0.1 — 10% of qualifying queries are explained, reducing log volume and instrumentation overhead by 90%. Intermittent problems will still be captured — they simply need to recur a few times before the sample catches one. Patience, as I mentioned, is the operative virtue here.
- log_format = json — machine-parseable for automated analysis pipelines.
- log_nested_statements = on — captures slow queries hiding inside functions. Without this, a 5-second function call appears as a black box.
For lower-traffic systems or development environments, you can afford to be more thorough: lower the threshold to 100-250ms, increase sample_rate to 1.0, and enable log_timing. The overhead matters less when query volume is lower, and the additional detail pays for itself in faster diagnosis.
How does Gold Lapel approach the same problem?
I should note, for completeness, how Gold Lapel addresses the same diagnostic need. Gold Lapel performs plan analysis at the proxy layer. Every query that passes through the proxy has its execution plan captured via EXPLAIN — not by instrumenting the executor (as log_analyze does), but by querying the planner from outside the execution path. This means the plan capture has no impact on the query's actual execution time.
The practical differences:
- No executor overhead. auto_explain with log_analyze instruments every plan node during execution. Gold Lapel obtains the plan before execution and lets the query run uninstrumented. The diagnostic cost is paid at the proxy, not inside PostgreSQL's executor.
- Plan change tracking over time. auto_explain logs each plan in isolation. Gold Lapel tracks plan changes for the same query over time — if a query that was using an index scan switches to a sequential scan after an ANALYZE, the regression is detected automatically.
- From observation to action. auto_explain logs plans to the server log. Reading them, interpreting them, and deciding what to do is left to you. Gold Lapel turns plan analysis into recommendations — missing indexes, materialized view candidates, query rewrites — and in some cases applies optimizations directly.
auto_explain remains valuable alongside Gold Lapel — I would not recommend removing a diagnostic tool simply because another covers similar ground. The raw plan output in the server log is useful for ad-hoc debugging, compliance auditing, and deep investigation. Gold Lapel handles the continuous monitoring and automated response. They serve the same goal from different positions in the household.
# Gold Lapel captures the same information at the proxy layer:
#
# 1. Every query passing through the proxy is analyzed
# 2. Plans are captured via EXPLAIN (no log_analyze overhead)
# 3. Plan changes are tracked over time — regressions are detected
# 4. Recommendations are generated: indexes, materialized views, rewrites
#
# The key difference: auto_explain instruments the executor.
# Gold Lapel queries the planner from outside the execution path.
# Same diagnostic data, different overhead profile.