
PostgreSQL Monitoring Stack: pg_stat_statements + pg_qualstats + auto_explain

Three members of the household staff who, working together, will tell you everything worth knowing about your PostgreSQL workload — without a single third-party agent crossing the threshold.

The Butler of Gold Lapel · March 26, 2026 · 22 min read
A watercolour of three extensions working in concert was commissioned. The artist delivered three separate canvases and insists they are best appreciated individually. He has, perhaps unintentionally, illustrated the very problem this article solves.

The case for a native monitoring stack

Most teams reach for external monitoring — Datadog, pganalyze, New Relic — before exploring what PostgreSQL already provides. A reasonable instinct. External tools offer dashboards, alerting, and historical trend analysis out of the box. But every one of those tools reads from the same underlying data source: the statistics views and extensions that PostgreSQL exposes natively.

Three extensions, properly configured, give you:

  • Query performance tracking. Every query executed, with call counts, execution times, row counts, and I/O statistics.
  • Missing index detection. Which columns your queries filter on, how often, and whether those columns have indexes.
  • Slow query plan capture. The actual execution plan for any query exceeding a time threshold, logged automatically.

The native stack has practical advantages beyond cost. No data leaves your network. There is no agent to install, upgrade, or troubleshoot. The instrumentation runs inside the database process itself, with overhead measured in low single-digit percentages. And because every external monitoring tool builds on these same extensions, understanding them directly means you can debug problems even when your monitoring vendor is unreachable. That last point deserves emphasis — vendor dashboards are lovely until the vendor is down and the database is not.

pg_stat_statements — The foundation

pg_stat_statements is the single most important monitoring extension in PostgreSQL. It tracks execution statistics for every normalized query: how many times it ran, how long it took, how many rows it returned, and how much I/O it consumed. If you install only one monitoring extension — this is the one. Everything else in this guide builds on it.

What it captures

pg_stat_statements records a row for each unique, normalized query. Parameters are replaced with $1, $2, etc., so SELECT * FROM orders WHERE id = 42 and SELECT * FROM orders WHERE id = 99 collapse into a single entry.

For each normalized query, it tracks:

Metric               Description
calls                Number of times the query has been executed
total_exec_time      Cumulative execution time in milliseconds
min_exec_time        Shortest single execution
max_exec_time        Longest single execution
mean_exec_time       Average execution time
rows                 Total number of rows returned
shared_blks_hit      Buffer cache hits
shared_blks_read     Blocks read from disk (or OS cache)
temp_blks_read       Temporary blocks read (spill to disk)
temp_blks_written    Temporary blocks written (spill to disk)

PostgreSQL 13 and later add planning time tracking (total_plan_time, min_plan_time, etc.), and PostgreSQL 15 adds JIT compilation statistics.

Installation and configuration

pg_stat_statements requires loading into shared memory at server startup:

postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

This change requires a PostgreSQL restart. After restarting:

Create extension
CREATE EXTENSION pg_stat_statements;

Key configuration parameters:

postgresql.conf settings
# Number of distinct queries to track (default: 5000)
pg_stat_statements.max = 10000

# Which statements to track: top (default), all, none
pg_stat_statements.track = top

# Whether to track utility commands (DDL, VACUUM, etc.)
pg_stat_statements.track_utility = on

# Separate planning time tracking (PostgreSQL 13+)
pg_stat_statements.track_planning = on

The pg_stat_statements.max setting controls the in-memory hash table size. The default of 5,000 is sufficient for many workloads, but databases with diverse query patterns — particularly those using ORMs — may need 10,000 or more. Each entry consumes approximately 2KB of shared memory.
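One way to check whether the hash table is large enough is the pg_stat_statements_info view (PostgreSQL 14 and later), which counts evictions:

```sql
-- A steadily climbing dealloc means the hash table is full and
-- least-used entries are being discarded; consider raising
-- pg_stat_statements.max
SELECT dealloc, stats_reset
FROM pg_stat_statements_info;
```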

Essential queries

Top 10 queries by total execution time — the queries consuming the most cumulative wall clock time. These are your highest-impact optimization targets:

Top by total time
SELECT
    queryid,
    LEFT(query, 80) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_time_ms,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Top 10 queries by call count — the most frequently executed. A 2ms query running 500,000 times an hour consumes nearly 17 minutes of cumulative execution time per hour:

Top by call count
SELECT
    queryid,
    LEFT(query, 80) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_time_ms,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

Top 10 queries by mean execution time — the slowest individual queries, causing the worst tail latency:

Top by mean time
SELECT
    queryid,
    LEFT(query, 80) AS query_preview,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round(max_exec_time::numeric, 2) AS max_time_ms,
    rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
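On PostgreSQL 13 and later, with pg_stat_statements.track_planning enabled, the same ranking idea applies to planning time, which is useful when queries are replanned frequently. A sketch using the planning-time columns:

```sql
-- Top 10 queries by cumulative planning time
-- (requires pg_stat_statements.track_planning = on)
SELECT
    queryid,
    LEFT(query, 80) AS query_preview,
    calls,
    round(total_plan_time::numeric, 2) AS total_plan_ms,
    round(mean_plan_time::numeric, 2) AS avg_plan_ms
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 10;
```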

Top 10 queries by I/O — the queries hitting disk most often:

Top by I/O
SELECT
    queryid,
    LEFT(query, 80) AS query_preview,
    calls,
    shared_blks_hit,
    shared_blks_read,
    round(
        shared_blks_hit::numeric /
        NULLIF(shared_blks_hit + shared_blks_read, 0) * 100,
        2
    ) AS cache_hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;

Cache hit ratio per query — queries that consistently miss the buffer cache:

Cache hit ratio
SELECT
    queryid,
    LEFT(query, 80) AS query_preview,
    calls,
    shared_blks_hit + shared_blks_read AS total_blocks,
    round(
        shared_blks_hit::numeric /
        NULLIF(shared_blks_hit + shared_blks_read, 0) * 100,
        2
    ) AS cache_hit_pct
FROM pg_stat_statements
WHERE shared_blks_hit + shared_blks_read > 100
ORDER BY cache_hit_pct ASC
LIMIT 10;

Resetting statistics. After deploying a major change or completing an optimization cycle, reset to establish a clean baseline:

Reset statistics
SELECT pg_stat_statements_reset();

-- PostgreSQL 12+: reset for a specific query
SELECT pg_stat_statements_reset(
    userid => 0,
    dbid => 0,
    queryid => 1234567890
);

What pg_stat_statements cannot tell you

I should be forthright about its limits:

  • Which parameter values were used. Queries are normalized — WHERE id = $1 tells you the query shape but not whether the slow execution was for id = 1 or id = 999999.
  • Which columns were filtered on. The normalized query text does not break out predicate analysis by column.
  • The execution plan. You see how long a query took, but not whether it used a sequential scan, nested loop join, or hash join.

Three blind spots. Three gaps that the remaining two extensions fill precisely.

pg_qualstats — Predicate analysis and missing index detection

pg_qualstats answers the question that pg_stat_statements politely declines to: which columns are your queries actually filtering on, and how often? This is the data you need to make informed indexing decisions — decisions grounded in your actual workload, not in guesswork.

What it captures

pg_qualstats hooks into the query executor and records every qual (WHERE clause predicate) that PostgreSQL evaluates. For each qual, it tracks the table and column referenced, the operator used, how many times that predicate was executed, the number of rows filtered, and the queryid from pg_stat_statements for correlation.

Installation and configuration

pg_qualstats depends on pg_stat_statements and must be loaded alongside it:

postgresql.conf
shared_preload_libraries = 'pg_stat_statements, pg_qualstats'

Restart PostgreSQL, then:

Create extension
CREATE EXTENSION pg_qualstats;
Configuration
# Enable or disable tracking
pg_qualstats.enabled = on

# Whether to track constant values in predicates
pg_qualstats.track_constants = on

# Sampling rate: 1.0 = every query, 0.1 = 10% of queries
pg_qualstats.sample_rate = 1.0

The sample_rate parameter is the primary knob for controlling overhead. On high-traffic systems, setting sample_rate = 0.1 captures a representative sample at one-tenth the cost.
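Since these are ordinary configuration parameters, the sampling rate can be adjusted without a restart. A sketch, assuming superuser access:

```sql
-- Lower the sampling rate to 10% and reload the configuration
ALTER SYSTEM SET pg_qualstats.sample_rate = 0.1;
SELECT pg_reload_conf();

-- Verify the running value
SHOW pg_qualstats.sample_rate;
```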

Essential queries

Most frequently filtered columns — your strongest candidates for indexing:

Most filtered columns
SELECT
    relname AS table_name,
    attname AS column_name,
    opno::regoper AS operator,
    sum(execution_count) AS total_filter_count
FROM pg_qualstats_all AS q
JOIN pg_catalog.pg_class AS c ON q.lrelid = c.oid
JOIN pg_catalog.pg_attribute AS a
    ON a.attrelid = c.oid AND a.attnum = q.lattnum
WHERE lrelid IS NOT NULL
GROUP BY relname, attname, opno
ORDER BY total_filter_count DESC
LIMIT 20;

Frequently filtered columns that lack an index — this is the query I would run first if I could run only one:

Missing indexes
SELECT
    c.relname AS table_name,
    a.attname AS column_name,
    sum(q.execution_count) AS total_filter_count
FROM pg_qualstats_all AS q
JOIN pg_catalog.pg_class AS c ON q.lrelid = c.oid
JOIN pg_catalog.pg_attribute AS a
    ON a.attrelid = c.oid AND a.attnum = q.lattnum
WHERE lrelid IS NOT NULL
  AND NOT EXISTS (
      SELECT 1
      FROM pg_catalog.pg_index AS i
      JOIN pg_catalog.pg_attribute AS ia
          ON ia.attrelid = i.indrelid
          AND ia.attnum = ANY(i.indkey)
      WHERE i.indrelid = c.oid
        AND ia.attname = a.attname
  )
GROUP BY c.relname, a.attname
ORDER BY total_filter_count DESC
LIMIT 20;

Joining pg_qualstats with pg_stat_statements — "which queries are slow" combined with "which columns are they filtering on":

Slow queries + predicate analysis
SELECT
    LEFT(s.query, 80) AS query_preview,
    c.relname AS table_name,
    a.attname AS column_name,
    s.calls,
    round(s.mean_exec_time::numeric, 2) AS avg_time_ms,
    sum(q.execution_count) AS filter_count
FROM pg_qualstats_all AS q
JOIN pg_stat_statements AS s ON q.queryid = s.queryid
JOIN pg_catalog.pg_class AS c ON q.lrelid = c.oid
JOIN pg_catalog.pg_attribute AS a
    ON a.attrelid = c.oid AND a.attnum = q.lattnum
WHERE s.mean_exec_time > 100
GROUP BY s.query, c.relname, a.attname, s.calls, s.mean_exec_time
ORDER BY s.mean_exec_time DESC
LIMIT 20;

Built-in index suggestions:

Automated index suggestions
SELECT * FROM pg_qualstats_indexes();

One function call, a set of suggested CREATE INDEX statements. I should note — this is a starting point, not a final answer. Each suggestion deserves validation.

From suggestion to validation with HypoPG

pg_qualstats tells you which indexes might help. HypoPG lets you test whether they actually would — without building the real index:

HypoPG validation workflow
-- 1. Create a hypothetical index
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- 2. Check if the planner would use it
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- 3. Remove the hypothetical index
SELECT hypopg_drop_index(indexrelid)
FROM hypopg_list_indexes();

-- 4. Create the real index only if validated
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

Detect with pg_qualstats. Validate with HypoPG. Create only when validated. For the complete three-step workflow, see the PostgreSQL missing index detection guide. For deeper walkthroughs of each tool, see the pg_qualstats guide and the HypoPG guide.

auto_explain — Execution plans for slow queries

auto_explain attends to the last gap in the monitoring stack: capturing execution plans automatically for queries that exceed a time threshold. Without auto_explain, you must manually run EXPLAIN ANALYZE — which requires knowing in advance which query to analyze and being able to reproduce it. In production, neither condition is guaranteed.

What it captures

When a query exceeds the configured threshold, auto_explain logs the full EXPLAIN output. Depending on configuration, this includes the execution plan tree, actual row counts, buffer usage, per-node timing, and plans for statements inside functions.

Installation and configuration

auto_explain is a contrib module that ships with PostgreSQL:

postgresql.conf
shared_preload_libraries = 'pg_stat_statements, pg_qualstats, auto_explain'

Unlike pg_stat_statements and pg_qualstats, auto_explain does not require CREATE EXTENSION — it activates through configuration alone.

For ad-hoc debugging without a restart:

Per-session activation
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- Log all queries in this session

Key configuration:

auto_explain settings
# Minimum execution time (ms) to trigger plan logging
auto_explain.log_min_duration = 1000

# Include actual execution times (adds overhead)
auto_explain.log_analyze = off

# Include buffer usage statistics
auto_explain.log_buffers = on

# Include per-node timing
auto_explain.log_timing = off

# Capture plans for statements inside functions/procedures
auto_explain.log_nested_statements = on

# Output format: text, xml, json, yaml
auto_explain.log_format = json

The relationship between log_analyze and overhead deserves emphasis. When log_analyze is off, auto_explain logs the plan with estimated row counts only. When on, it runs the equivalent of EXPLAIN ANALYZE internally, adding roughly 5-15% to the execution time of each logged query. For production, start with log_analyze = off.
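When log volume rather than per-query overhead is the concern, auto_explain can also sample which statements it considers at all. A sketch; auto_explain.sample_rate is available on modern PostgreSQL versions:

```ini
# Explain only this fraction of statements in each session
# (1.0 = all statements; the duration threshold still applies)
auto_explain.sample_rate = 0.25
```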

Reading auto_explain output

auto_explain output appears in the PostgreSQL log files:

auto_explain log entry
LOG:  duration: 2345.678 ms  plan:
    Query Text: SELECT o.*, c.name
                FROM orders o
                JOIN customers c ON c.id = o.customer_id
                WHERE o.status = 'pending'
                  AND o.created_at > '2026-01-01'
    Nested Loop  (cost=0.43..15234.56 rows=120 width=210)
      ->  Seq Scan on orders o  (cost=0.00..12456.00 rows=120 width=96)
            Filter: ((status = 'pending') AND (created_at > '2026-01-01'))
            Rows Removed by Filter: 498880
      ->  Index Scan using customers_pkey on customers c  (cost=0.43..8.45 rows=1 width=114)
            Index Cond: (id = o.customer_id)

The patterns worth watching for:

  • Seq Scan on large tables. A sequential scan means PostgreSQL is reading every row. On tables with hundreds of thousands of rows, this usually points to a missing index.
  • Nested Loop with high loop counts. If the outer side returns 10,000 rows, the inner side runs 10,000 times.
  • Sort with external merge (disk sort). The data to sort exceeded work_mem.
  • Hash Join with multiple batches. The hash table exceeded work_mem and split into batches on disk.
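These patterns can be spotted programmatically once log_format = json is set. A minimal sketch in Python; the node keys follow EXPLAIN (FORMAT JSON) output, but the sample plan, threshold, and helper name are invented for illustration:

```python
# Sketch: walk a JSON plan tree from auto_explain and flag common
# trouble patterns. Node keys ("Node Type", "Plans", "Actual Rows")
# follow EXPLAIN (FORMAT JSON); the threshold is arbitrary.

def find_issues(node, issues=None):
    """Recursively collect suspicious nodes from a plan tree."""
    if issues is None:
        issues = []
    node_type = node.get("Node Type", "")
    if node_type == "Seq Scan" and node.get("Actual Rows", 0) > 100_000:
        issues.append(f"Seq Scan on {node.get('Relation Name')} "
                      f"({node['Actual Rows']} rows)")
    if node_type == "Sort" and node.get("Sort Method") == "external merge":
        issues.append("Sort spilled to disk (exceeds work_mem)")
    for child in node.get("Plans", []):
        find_issues(child, issues)
    return issues

# Invented sample mirroring the log entry above
sample_plan = {
    "Node Type": "Nested Loop",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders",
         "Actual Rows": 498880},
        {"Node Type": "Index Scan", "Relation Name": "customers",
         "Actual Rows": 1},
    ],
}

print(find_issues(sample_plan))
# → ['Seq Scan on orders (498880 rows)']
```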

pg_stat_statements tells you which queries are slow. The plan tells you why. For a comprehensive guide, see the EXPLAIN ANALYZE guide.

Production considerations

Log volume. Start with a high threshold (5,000ms or more) and lower it gradually as you address the slowest queries.

log_analyze overhead. Begin with off in production — you still get the plan structure and estimates.

log_timing overhead. Adds gettimeofday() calls to every plan node. Enable selectively.

Log rotation. auto_explain can grow log files quickly. Ensure rotation is configured via log_rotation_age and log_rotation_size.

JSON format for automation. Set log_format = json for programmatic parsing.

Wiring the stack together

The combined configuration

All three extensions working in concert:

Complete postgresql.conf
# -- Monitoring Stack: shared_preload_libraries --
shared_preload_libraries = 'pg_stat_statements, pg_qualstats, auto_explain'

# -- pg_stat_statements --
pg_stat_statements.max = 10000
pg_stat_statements.track = top
pg_stat_statements.track_utility = on
pg_stat_statements.track_planning = on

# -- pg_qualstats --
pg_qualstats.enabled = on
pg_qualstats.track_constants = on
pg_qualstats.sample_rate = 1.0

# -- auto_explain --
auto_explain.log_min_duration = 1000
auto_explain.log_analyze = off
auto_explain.log_buffers = on
auto_explain.log_timing = off
auto_explain.log_nested_statements = on
auto_explain.log_format = json
Verify installation
-- Restart PostgreSQL, then create extensions:
-- psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
-- psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS pg_qualstats;"

-- Verify all extensions are loaded:
SELECT name, installed_version, default_version
FROM pg_available_extensions
WHERE name IN ('pg_stat_statements', 'pg_qualstats', 'auto_explain')
ORDER BY name;

-- Confirm pg_stat_statements is collecting data:
SELECT count(*) FROM pg_stat_statements;

-- Confirm pg_qualstats is collecting data:
SELECT count(*) FROM pg_qualstats();

The monitoring workflow

With all three extensions in service, the monitoring workflow follows a natural sequence — each step informing the next:

  1. pg_stat_statements identifies the expensive queries (the WHAT). Sort by total_exec_time, mean_exec_time, and calls for a prioritized list of optimization targets.
  2. pg_qualstats identifies predicate patterns and missing indexes (the WHERE). For the expensive queries, check which columns they filter on. Cross-reference with existing indexes to find gaps.
  3. auto_explain captures execution plans for the slowest queries (the WHY). Review the logs for sequential scans, poor join strategies, disk spills, or row estimate mismatches.
  4. HypoPG validates index suggestions (the TEST). Before creating an index on a large table, use HypoPG to confirm the planner would use it.
  5. Create indexes, reset stats, measure improvement (the VERIFY). After making changes, reset pg_stat_statements and let the system run under normal load.

This cycle repeats. As you address the most expensive queries, the next tier rises to the top. Lower the auto_explain threshold as the worst offenders are resolved. The database gets faster, and your standards get higher. A virtuous arrangement.

Building a dashboard

pg_stat_statements statistics are cumulative. They survive clean restarts by default (pg_stat_statements.save = on) but are lost after a crash or a manual reset, and a single running total cannot show trends in any case. To track trends over time, snapshot the data into a history table.

History table
CREATE TABLE pg_stat_statements_history (
    snapshot_time     timestamptz NOT NULL DEFAULT now(),
    queryid           bigint,
    query             text,
    calls             bigint,
    total_exec_time   double precision,
    mean_exec_time    double precision,
    rows              bigint,
    shared_blks_hit   bigint,
    shared_blks_read  bigint
);

CREATE INDEX idx_pgss_history_time ON pg_stat_statements_history (snapshot_time);
CREATE INDEX idx_pgss_history_queryid ON pg_stat_statements_history (queryid, snapshot_time);
Hourly snapshot with pg_cron
-- Using pg_cron to snapshot every hour
SELECT cron.schedule(
    'pgss-snapshot',
    '0 * * * *',
    $$
    INSERT INTO pg_stat_statements_history
        (queryid, query, calls, total_exec_time, mean_exec_time, rows,
         shared_blks_hit, shared_blks_read)
    SELECT queryid, query, calls, total_exec_time, mean_exec_time, rows,
           shared_blks_hit, shared_blks_read
    FROM pg_stat_statements
    WHERE calls > 0;
    $$
);
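The history table grows without bound, so pair the snapshot job with a pruning job. The 30-day retention window here is an arbitrary choice:

```sql
-- Daily prune at 03:00, keeping 30 days of snapshots
SELECT cron.schedule(
    'pgss-history-prune',
    '0 3 * * *',
    $$DELETE FROM pg_stat_statements_history
      WHERE snapshot_time < now() - interval '30 days';$$
);
```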

Key metrics to track over time:

  • Total query time per snapshot interval. If it trends upward without a corresponding increase in traffic, something is degrading.
  • Cache hit ratio. sum(shared_blks_hit) / sum(shared_blks_hit + shared_blks_read) — a healthy ratio is above 99% for OLTP workloads.
  • Top queries by total time. New entries appearing in the top 10 may indicate a regression.
  • New slow queries. Compare consecutive snapshots for query IDs with significantly increased mean execution time.
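Because the counters are cumulative, trend analysis means differencing consecutive snapshots per queryid. A minimal sketch in Python; the dicts stand in for rows fetched from the history table, and the function name is invented:

```python
# Sketch: pg_stat_statements counters only ever grow, so per-interval
# activity is the difference between two snapshots for each queryid.

def snapshot_delta(prev, curr):
    """Per-query deltas between two snapshots keyed by queryid.
    A counter that went backwards means stats were reset; skip it."""
    deltas = {}
    for qid, row in curr.items():
        if qid not in prev:
            continue  # new query: no baseline to diff against
        d_calls = row["calls"] - prev[qid]["calls"]
        d_time = row["total_exec_time"] - prev[qid]["total_exec_time"]
        if d_calls < 0 or d_time < 0:
            continue  # pg_stat_statements_reset() ran between snapshots
        deltas[qid] = {
            "calls": d_calls,
            "total_exec_time": d_time,
            "mean_exec_time": d_time / d_calls if d_calls else 0.0,
        }
    return deltas

prev = {101: {"calls": 1000, "total_exec_time": 2000.0}}
curr = {101: {"calls": 1500, "total_exec_time": 3500.0},
        202: {"calls": 10, "total_exec_time": 50.0}}

print(snapshot_delta(prev, curr))
# → {101: {'calls': 500, 'total_exec_time': 1500.0, 'mean_exec_time': 3.0}}
```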

Grafana + Prometheus. For teams already using Grafana, the postgres_exporter for Prometheus can expose pg_stat_statements data as metrics. The exporter reads from pg_stat_statements directly, so the native monitoring stack is the prerequisite regardless.

Simple alternative: scheduled SQL report. If Grafana and Prometheus represent more infrastructure than you need — and there is no shame in that — a cron job that runs the "top 10 by total time" query provides 80% of the value at a fraction of the complexity:

Slack report script
#!/bin/bash
REPORT=$(psql -d your_database -t -A -F'|' -c "
    SELECT
        LEFT(query, 60) AS query,
        calls,
        round(total_exec_time::numeric / 1000, 1) AS total_sec,
        round(mean_exec_time::numeric, 1) AS avg_ms
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;
")

# Send to Slack via webhook
curl -s -X POST "$SLACK_WEBHOOK_URL" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"Top 10 queries by total time:\n\`\`\`\n${REPORT}\n\`\`\`\"}"

Beyond the basics — additional monitoring extensions

The core trio covers the essential needs. For specialized observability:

pg_stat_monitor — Percona's alternative to pg_stat_statements with time-based aggregation buckets, histogram distributions, and query plan storage. The tradeoff is ecosystem support — pg_stat_statements is the standard that external tools build against. For a comparison, see the pg_stat_monitor optimization guide.

pg_wait_sampling — profiles wait events: why queries pause during execution. Valuable when execution times increase without any change in the query or data volume.

pg_stat_kcache — adds OS-level cache statistics, distinguishing between reads from the OS page cache and reads from physical disk.

pg_store_plans — stores execution plans alongside pg_stat_statements data. Useful for detecting plan regressions.

pg_buffercache — inspect the current contents of shared buffer cache. See the pg_buffercache analysis guide.

Honest counterpoint — when you need a vendor tool

I should be forthcoming about where this approach falls short:

  • Historical dashboards out of the box. You must build your own snapshotting, storage, and visualization.
  • Alerting. No built-in mechanism to notify you when a query regresses.
  • Anomaly detection. Commercial tools apply statistical models to detect unusual patterns.
  • Team-wide query review UI. Commercial tools provide web interfaces where anyone on the team can browse query statistics.

If your team needs a polished UI with trend graphs and Slack alerts, commercial tools like pganalyze, Datadog, or Tembo are well-engineered choices. They are not a substitute for understanding the native stack — they are a layer on top of it.

The native stack is the foundation regardless. Every vendor tool reads from pg_stat_statements under the hood. Knowing how to query these extensions directly means you can debug when the vendor dashboard is unavailable. For a broader comparison, see the monitoring tools comparison.

Frequently asked questions