
auto_explain for Production: Safe Setup and Interpretation


March 27, 2026 · 18 min read
The illustration exceeded the configured duration threshold. Its full execution plan has been automatically logged to the file. The artist, it appears, nested three unnecessary loops.

Why auto_explain matters in production

The fundamental problem: you cannot EXPLAIN ANALYZE a query you cannot reproduce. Production workloads exhibit timing-dependent plan changes, parameter-dependent behavior, lock contention, and data distributions that development environments never replicate. A query that runs in 5ms on your local machine can take 30 seconds in production due to table statistics, cache state, concurrent load, or simply different data volumes.

auto_explain solves this by capturing execution plans automatically when queries exceed a configurable duration threshold. No code changes. No manual intervention. When a slow query happens, the plan that PostgreSQL chose is already waiting in your logs.

The common concern is that logging EXPLAIN output slows down the database. I should address that honestly: with the default configuration the overhead is negligible, but some features (log_analyze in particular) instrument every statement and can be expensive. The key is knowing which settings to turn on, which to leave off, and how to lower the threshold gradually as you fix the queries that surface.

This guide covers the production setup, safety considerations, and operational details. For a guide on interpreting auto_explain output to optimize specific query patterns, see the auto_explain optimization guide.

Installation — two approaches

Server-wide via shared_preload_libraries (recommended)

For production, I would recommend loading auto_explain at server startup:

postgresql.conf
# postgresql.conf
shared_preload_libraries = 'auto_explain'

This requires a PostgreSQL restart. After restart, auto_explain is available for all sessions and configurable via postgresql.conf or ALTER SYSTEM.

If you already load other extensions at startup, combine them:

postgresql.conf
shared_preload_libraries = 'pg_stat_statements, auto_explain'

Verify that auto_explain loaded successfully:

SQL
SHOW shared_preload_libraries;
-- Result: pg_stat_statements, auto_explain

Once loaded via shared_preload_libraries, all configuration changes can be applied without a restart — using ALTER SYSTEM followed by SELECT pg_reload_conf();.

Per-session via LOAD (for testing)

For testing auto_explain before committing to a restart, load it into a single session:

SQL
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000;  -- 1 second
SET auto_explain.log_analyze = off;

This takes effect immediately, requires no restart, and is scoped to the current session only. Settings must be configured with SET for each session — they are not persisted.

Limitations: per-session loading only captures queries in the session that loaded it. It is not suitable for production monitoring, but it is valuable for testing configuration settings before applying them server-wide.

The critical configuration settings

If you'll permit me, I should like to walk through each setting and its implications.

log_min_duration — the threshold

SQL
-- Set via ALTER SYSTEM (persisted, requires reload)
ALTER SYSTEM SET auto_explain.log_min_duration = 5000;
SELECT pg_reload_conf();

-- Or set per-session
SET auto_explain.log_min_duration = 5000;

This is the single most important setting. It defines the minimum execution time (in milliseconds) for a query to trigger plan logging. Every query that finishes faster than this threshold is completely unaffected by auto_explain.

Recommended starting points:

| Environment | Threshold | Rationale |
| --- | --- | --- |
| Conservative production | 5000 (5s) | Captures only clearly problematic queries |
| Active tuning | 1000 (1s) | Captures anything a user might notice |
| Aggressive | 500 (500ms) | Captures most optimization opportunities |
| Development | 100 (100ms) | Captures everything worth looking at |

The strategy: start at 5000ms. Fix the queries that auto_explain surfaces. Lower the threshold to 1000ms. Fix those. Continue lowering. Each step reveals a new layer of optimization opportunities.

Special values:

  • -1 disables auto_explain entirely (queries still execute normally, nothing is logged)
  • 0 logs every query — I must be direct: never do this in production. The log volume alone can degrade performance.

log_analyze — the overhead decision

SQL
ALTER SYSTEM SET auto_explain.log_analyze = off;

This setting controls whether auto_explain captures estimated plans (off, the default) or actual execution plans (on).

With log_analyze = off, auto_explain runs the equivalent of EXPLAIN — the plan with estimated costs, estimated row counts, and the chosen strategy. No additional execution overhead is added.

With log_analyze = on, auto_explain runs the equivalent of EXPLAIN ANALYZE — the plan includes actual execution times, actual row counts, and actual memory usage. This adds instrumentation overhead because PostgreSQL must call timing functions for each plan node.

The key detail: with log_analyze = on, timing instrumentation applies to every statement, not only the slow ones. PostgreSQL cannot know in advance which queries will cross the threshold, so per-node timing occurs for all executions, and the collected data is simply discarded when a query finishes quickly. The official documentation warns that this can have a strongly negative impact on performance. If you need actual execution data continuously, auto_explain.sample_rate lets you instrument only a fraction of statements per session.

Recommendation: start with log_analyze = off. Estimated plans still reveal sequential scans, poor join strategies, and missing indexes. Enable log_analyze = on when you need actual row counts and timing to diagnose specific queries.

log_buffers — I/O insight

SQL
ALTER SYSTEM SET auto_explain.log_buffers = on;

-- Plan output includes:
-- Buffers: shared hit=1234 read=567 dirtied=12 written=3

This distinguishes between "slow because of computation" and "slow because of disk I/O." A query with high shared read relative to shared hit is bottlenecked on I/O — the data was not in PostgreSQL's buffer cache.

Requires log_analyze = on to take effect. When you enable log_analyze, enable log_buffers as well — the additional overhead beyond log_analyze is minimal, and the I/O insight is valuable.
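As a quick sanity check on a logged plan, the hit/read split can be turned into a cache hit ratio. A minimal sketch in Python, assuming a Buffers line shaped like the one above (the helper name is my own, not part of auto_explain):

```python
import re

# Illustrative helper: compute the buffer-cache hit ratio from a
# "Buffers:" line in auto_explain output. A low ratio means the query
# was bottlenecked on disk I/O rather than computation.
def buffer_hit_ratio(line):
    hit = int(re.search(r"hit=(\d+)", line).group(1))
    read = int(re.search(r"read=(\d+)", line).group(1))
    total = hit + read
    return hit / total if total else 1.0

ratio = buffer_hit_ratio("Buffers: shared hit=1234 read=567 dirtied=12 written=3")
# 1234 of 1801 blocks came from cache: roughly 0.69
```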

log_timing — per-node timing

SQL
ALTER SYSTEM SET auto_explain.log_timing = on;

When log_analyze is on, log_timing controls whether per-node execution times are reported. It defaults to on when log_analyze is enabled.

The overhead concern: with timing enabled, the clock is read at each plan node, potentially for every row the node processes. For complex plans pushing large row counts, this can add measurable overhead.

Recommendation: leave it on unless you have very complex queries with 100+ plan nodes and you measure a noticeable overhead.
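To get a feel for why node count and row count both matter, here is a back-of-envelope sketch in Python. The 25 ns per clock reading is an illustrative assumption, not a measured figure; real costs vary by platform and clock source.

```python
# Back-of-envelope estimate of per-node timing overhead.
# Assumption (illustrative): two clock readings per row at each
# instrumented plan node, at roughly 25 ns per reading.
def timing_overhead_ms(rows_per_node, ns_per_read=25):
    """rows_per_node: rows processed by each instrumented plan node."""
    reads = sum(2 * rows for rows in rows_per_node)
    return reads * ns_per_read / 1_000_000

# A two-node plan pushing a million rows through each node:
# 4 million clock readings, on the order of 100 ms of pure timing cost.
overhead = timing_overhead_ms([1_000_000, 1_000_000])
```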

log_nested_statements — functions and procedures

SQL
ALTER SYSTEM SET auto_explain.log_nested_statements = off;

The default (off) logs only top-level queries. When set to on, auto_explain also captures plans for queries executed inside PL/pgSQL functions, procedures, and DO blocks.

This is essential when slow queries hide inside stored procedures. A CALL process_orders() might finish in 50ms at the top level, but internally execute a query that takes 10 seconds. Without log_nested_statements, that inner query is invisible to auto_explain.

A word of caution: functions that execute many queries can generate substantial log volume when this setting is enabled.

log_format — choosing the output format

SQL
ALTER SYSTEM SET auto_explain.log_format = 'text';

-- text format output:
-- Seq Scan on users  (cost=0.00..1520.00 rows=1 width=128) (actual time=0.015..12.340 rows=1 loops=1)
--   Filter: (email = 'alice@example.com'::text)
--   Rows Removed by Filter: 99999

Options: text (default), json, xml, yaml.

Recommendation: use json if you have log processing infrastructure. Use text if you read PostgreSQL logs manually in a terminal. The format affects readability and parseability, not what data is captured.
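With json format, each logged plan is a structured document that can be loaded directly. A minimal sketch, assuming a plan body shaped like auto_explain's JSON output; the query and cost values below are invented for illustration:

```python
import json

# Parse one auto_explain JSON plan body (the part after "plan:" in the
# log entry). "Query Text" and "Plan" are the top-level keys; the
# values here are invented.
entry = json.loads("""
{
  "Query Text": "SELECT * FROM users WHERE email = $1",
  "Plan": {
    "Node Type": "Seq Scan",
    "Relation Name": "users",
    "Total Cost": 1520.0,
    "Plan Rows": 1
  }
}
""")
top_node = entry["Plan"]["Node Type"]          # the root plan node
relation = entry["Plan"].get("Relation Name")  # table it scans, if any
```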

log_level — controlling where output goes

SQL
ALTER SYSTEM SET auto_explain.log_level = 'LOG';

The default LOG level sends auto_explain output to the PostgreSQL log file. For most deployments, keep this at LOG.

A safe production configuration

Conservative — starting point

For a first deployment, I would suggest starting with minimal overhead:

postgresql.conf
# postgresql.conf
shared_preload_libraries = 'auto_explain'

auto_explain.log_min_duration = 5000    # 5 seconds
auto_explain.log_analyze = off
auto_explain.log_buffers = off
auto_explain.log_timing = on
auto_explain.log_nested_statements = off
auto_explain.log_format = text
auto_explain.log_level = LOG

This configuration captures estimated plans for queries exceeding 5 seconds. The overhead is negligible: with log_analyze off, nothing extra is measured during execution, and the estimated plan is printed only for queries that cross the threshold.

Moderate — active tuning

After fixing the worst offenders:

postgresql.conf
auto_explain.log_min_duration = 1000    # 1 second
auto_explain.log_analyze = off
auto_explain.log_buffers = off
auto_explain.log_format = json

JSON format enables automated analysis. The 1-second threshold captures queries that users would notice.

Thorough — full diagnostic

When you need actual execution data for specific investigation:

postgresql.conf
auto_explain.log_min_duration = 1000    # 1 second
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_format = json

This adds instrumentation overhead to every statement, not only those exceeding 1 second, because timing must be collected before auto_explain knows whether a query will cross the threshold. Treat this as a temporary diagnostic configuration: enable it, capture the plans you need, then return to log_analyze = off (or limit the cost with auto_explain.sample_rate).

Changing settings without restart

After the initial restart to add auto_explain to shared_preload_libraries, all settings can be changed without a restart:

SQL
-- Change the threshold
ALTER SYSTEM SET auto_explain.log_min_duration = 1000;
SELECT pg_reload_conf();

-- Enable analyze for a deeper look
ALTER SYSTEM SET auto_explain.log_analyze = on;
ALTER SYSTEM SET auto_explain.log_buffers = on;
SELECT pg_reload_conf();

Per-role and per-database overrides

Fine-tune thresholds for specific databases or roles:

SQL
-- Lower threshold for the main application database
ALTER DATABASE myapp SET auto_explain.log_min_duration = 500;

-- Disable auto_explain for a reporting role (expected slow queries)
ALTER ROLE reporting SET auto_explain.log_min_duration = -1;

-- Aggressive threshold for a specific debugging session
SET auto_explain.log_min_duration = 100;

Reading auto_explain output

Finding the plans in your logs

auto_explain writes to the PostgreSQL log file, typically located in $PGDATA/log/ or the directory specified by log_directory. Plans appear as log entries with the query duration and the execution plan:

PostgreSQL log output
-- In PostgreSQL log:
-- 2026-03-26 14:23:45.678 UTC [12345] LOG:  duration: 5234.123 ms  plan:
--   Query Text: SELECT * FROM orders WHERE customer_id = $1 AND created_at > $2
--   Seq Scan on orders  (cost=0.00..45123.00 rows=500 width=96)
--     Filter: ((customer_id = 42) AND (created_at > '2026-01-01'::date))
--     Rows Removed by Filter: 1999500

For text-format logs, search with:

Shell
grep "duration:.*plan:" postgresql-*.log
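Beyond grep, a small parser can pull the durations out for ranking. A sketch in Python, assuming text-format log lines shaped like the example above (the regex and function name are my own):

```python
import re

# Extract plan durations from text-format log lines such as:
#   ... LOG:  duration: 5234.123 ms  plan:
DURATION_RE = re.compile(r"duration: ([\d.]+) ms\s+plan:")

def plan_durations(log_lines):
    """Yield the duration in milliseconds of every logged plan."""
    for line in log_lines:
        m = DURATION_RE.search(line)
        if m:
            yield float(m.group(1))

line = "2026-03-26 14:23:45.678 UTC [12345] LOG:  duration: 5234.123 ms  plan:"
durations = list(plan_durations([line, "unrelated log line"]))
```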

What to look for

Six patterns appear most frequently in auto_explain output. I have seen each of them more times than I care to count:

Seq Scan on large tables — the most common finding. A sequential scan on a table with millions of rows, with a Filter that removes most of them, indicates a missing index.

Nested Loop with high actual loops — a Nested Loop join where the inner side executes thousands of times. This suggests an N+1 pattern or a missing index on the join column.

Sort with external merge Disk — the sort operation spilled to disk because the data exceeded work_mem. Either raise work_mem for this query or add an index that provides pre-sorted access.

Hash Join with Batches > 1 — the hash table spilled to disk. Raise work_mem to allow the hash table to fit in memory.

Estimated vs actual row mismatch — when estimated rows and actual rows differ by 10x or more, the planner is making decisions based on stale statistics. Run ANALYZE on the affected table.

Index Scan with Filter removing most rows — the planner used an index, but the index did not cover all the filter conditions. Consider a composite index that covers the additional conditions.
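The first and third of these checks lend themselves to automation against JSON-format plans. A sketch that walks a plan tree, assuming the standard EXPLAIN JSON keys ("Node Type", "Plans", "Rows Removed by Filter", "Sort Method"); the thresholds are arbitrary illustrations, not recommendations:

```python
# Walk a JSON plan tree and flag two of the patterns above: large
# filtered Seq Scans and sorts that spilled to disk.
def find_issues(node, issues=None):
    if issues is None:
        issues = []
    if (node.get("Node Type") == "Seq Scan"
            and node.get("Rows Removed by Filter", 0) > 100_000):
        issues.append(f"Seq Scan on {node.get('Relation Name')}: "
                      f"filter removed {node['Rows Removed by Filter']} rows")
    if (node.get("Node Type") == "Sort"
            and node.get("Sort Method", "").startswith("external")):
        issues.append("Sort spilled to disk; consider raising work_mem")
    for child in node.get("Plans", []):  # recurse into child nodes
        find_issues(child, issues)
    return issues

plan = {
    "Node Type": "Sort",
    "Sort Method": "external merge",
    "Plans": [{
        "Node Type": "Seq Scan",
        "Relation Name": "users",
        "Rows Removed by Filter": 999999,
    }],
}
issues = find_issues(plan)
```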

Common patterns and what they mean

Pattern: missing index

-- Pattern: Seq Scan, rows=1, Rows Removed by Filter: 999999
-- Seq Scan on users  (cost=0.00..22456.00 rows=1 width=128) (actual time=0.021..145.678 rows=1 loops=1)
--   Filter: (email = 'alice@example.com'::text)
--   Rows Removed by Filter: 999999
-- Diagnosis: missing index on the email column.

Pattern: nested loop overhead

-- Pattern: Nested Loop, actual loops=1000
-- Nested Loop  (cost=0.57..8500.00 rows=1000 width=200) (actual time=0.050..450.000 rows=1000 loops=1)
--   -> Seq Scan on order_items  (cost=0.00..150.00 rows=1000 width=100) (actual time=0.010..1.500 rows=1000 loops=1)
--   -> Index Scan using orders_pkey on orders  (cost=0.57..8.34 rows=1 width=100) (actual time=0.400..0.410 rows=1 loops=1000)
-- Diagnosis: 1,000 index lookups on the inner side.

Pattern: sort spilling to disk

-- Pattern: Sort Method: external merge Disk: 150MB
-- Sort  (cost=250000.00..252500.00 rows=1000000 width=64) (actual time=4500.000..5200.000 rows=1000000 loops=1)
--   Sort Key: created_at DESC
--   Sort Method: external merge  Disk: 153600kB
-- Diagnosis: raise work_mem or add a pre-sorted index.

Pattern: lossy bitmap

-- Pattern: Bitmap Heap Scan, lossy=true
-- Bitmap Heap Scan on orders  (cost=500.00..15000.00 rows=50000 width=96) (actual time=25.000..350.000 rows=48500 loops=1)
--   Recheck Cond: (customer_id = ANY ('{1,2,3,4,5}'::integer[]))
--   Rows Removed by Index Recheck: 15000
--   Heap Blocks: exact=1000 lossy=5000
-- Diagnosis: bitmap exceeded work_mem, dropped to lossy precision.

Filtering noise

Once you lower the threshold below 5 seconds, auto_explain captures more plans — including queries you may not care about. Allow me to suggest several approaches for managing the noise:

Focus on top offenders. Sort logged plans by duration. Fix the slowest queries first. Each fix reduces log volume and clears the way for the next round.

Per-role filtering. Disable auto_explain for roles whose queries are expected to be slow:

SQL — per-role filtering
-- Reporting queries are expected to take minutes; do not log plans
ALTER ROLE reporting SET auto_explain.log_min_duration = -1;

-- The application role gets a lower threshold
ALTER ROLE app SET auto_explain.log_min_duration = 500;

Per-database filtering. Different databases can have different thresholds:

SQL — per-database filtering
ALTER DATABASE analytics SET auto_explain.log_min_duration = 30000;  -- 30s
ALTER DATABASE production SET auto_explain.log_min_duration = 1000;  -- 1s

Log rotation and retention. Configure log rotation to keep auto_explain output for 7-14 days:

postgresql.conf
# postgresql.conf
log_rotation_age = 1d
log_rotation_size = 100MB

Structured analysis. With JSON-format logs, pipe into a dashboard that groups by query pattern and shows frequency and duration distribution. This turns raw log entries into actionable rankings.
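The grouping step can be sketched in a few lines. Assuming (query_text, duration_ms) pairs have already been parsed from the JSON logs upstream (the function name is my own):

```python
from collections import defaultdict
from statistics import mean

# Rank query patterns by total logged time; count and mean duration
# come along for context.
def rank_patterns(entries):
    groups = defaultdict(list)
    for query, ms in entries:
        groups[query].append(ms)
    ranked = [
        {"query": q, "count": len(ds), "mean_ms": mean(ds), "total_ms": sum(ds)}
        for q, ds in groups.items()
    ]
    return sorted(ranked, key=lambda r: r["total_ms"], reverse=True)

top = rank_patterns([
    ("SELECT * FROM orders WHERE customer_id = $1", 5200.0),
    ("SELECT * FROM orders WHERE customer_id = $1", 4800.0),
    ("SELECT count(*) FROM events", 1500.0),
])
```

Note that this groups on exact query text; in practice you would normalize parameters first (as pg_stat_statements does) so that variants of the same pattern collapse into one row.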

auto_explain vs pg_stat_statements — complementary, not competing

pg_stat_statements and auto_explain serve different purposes, and I would recommend enabling both.

pg_stat_statements tells you which queries are slow. It aggregates statistics — total execution time, mean time, call count, rows returned — across all executions of each query pattern. It answers: "Which queries consume the most database time?"

auto_explain tells you why those queries are slow. It captures the execution plan that PostgreSQL chose for a specific slow execution. It answers: "What plan did the planner choose, and where did the time go?"

The workflow:

  1. Query pg_stat_statements to identify the top queries by total time or mean time.
  2. Check auto_explain logs for the plans of those queries.
  3. The plan reveals whether the problem is a missing index, a bad join strategy, stale statistics, or a sort spilling to disk.
  4. Fix the plan, and verify with pg_stat_statements that the metrics improve.

For pg_stat_monitor users: pg_stat_monitor captures actual execution plans natively, reducing but not eliminating the need for auto_explain. See the pg_stat_monitor guide for details.

For a comprehensive overview of the complete monitoring stack, including pg_stat_statements, auto_explain, and other diagnostic extensions, see the monitoring stack guide.

What Gold Lapel captures that auto_explain cannot

auto_explain is a threshold-based tool — it captures plans when queries exceed a duration. This means it misses queries that are consistently moderate (800ms every time, with a 1-second threshold) and cannot track how plans change over time as data grows. I should be forthcoming about those limits.

Gold Lapel operates as a proxy between your application and PostgreSQL, observing every query plan as part of its optimization pipeline. It captures plans for every query pattern it tracks, regardless of execution time, and monitors how plans evolve as data distributions change. When a query that used an Index Scan last week silently switches to a Seq Scan because table statistics shifted, Gold Lapel detects the regression.

auto_explain writes to logs for human review. Gold Lapel acts on the plans — identifying optimization opportunities and applying them. The manual approach this article teaches is the foundation. Understanding it makes you better equipped to interpret what any automated tool does on your behalf.

Frequently asked questions