
From Monitoring to Action: Using pg_stat_monitor to Optimize Queries

If you'll permit me a frank assessment: pg_stat_statements tells you what happened. pg_stat_monitor tells you what happened, when it happened, to whom, and under what plan.

The Butler of Gold Lapel · March 21, 2026 · 18 min read
The commissioned illustration has yet to arrive. We are told the artist is monitoring progress.

Why pg_stat_statements is not enough, if you'll forgive my candour

Good evening. I see you've arrived with questions about your query monitoring, and I should like to attend to them properly. pg_stat_statements is the standard tool for identifying slow queries in PostgreSQL, and it does that job well. It accumulates statistics — total execution time, call count, rows returned — for every normalized query pattern since the last reset. When you need to answer "which queries cost the most overall," it is the right tool.

But it has a structural limitation that I confess I find genuinely vexing: the statistics are cumulative. Each query gets a single row that aggregates everything from the moment counters were last reset. A query that averaged 5ms for three months and then jumped to 500ms in the last hour still shows a comfortable average in pg_stat_statements. The regression is diluted into the lifetime aggregate, invisible unless you happened to snapshot the statistics before the problem started. Three months of good behaviour providing cover for an hour of catastrophe. One does not run a household this way.

I should be fair: this is not a bug — it is a design choice. pg_stat_statements was built for aggregate analysis, not temporal analysis. If you want to know when a query regressed, you need to either reset statistics manually (losing all historical data in the process, which rather defeats the purpose) or build an external system that snapshots the view periodically and computes deltas. Neither option is what I would call elegant.

pg_stat_monitor, developed by Percona, takes a different — and in my estimation, rather more civilised — approach. It divides statistics into time buckets — configurable windows (default: 5 minutes) that each hold independent counters. When a bucket expires, the extension rotates to the next one. The result is a built-in sliding window of recent query performance: you can compare the last 5 minutes to the previous 5 minutes directly from SQL, without external tooling, without resetting anything. Context is preserved. This is the difference between being told "something happened" and being told "this happened, twelve minutes ago, in the drawing room."

That is merely the beginning. pg_stat_monitor also captures the actual execution plan for each query, records the client IP and application name that originated it, provides response time histograms, and detects when PostgreSQL switches to a different execution plan for the same query. These are capabilities that pg_stat_statements does not offer — not because it was poorly designed, but because they were outside its original brief. Allow me to walk you through each of them.

How do you install pg_stat_monitor?

pg_stat_monitor is not included in PostgreSQL's contrib modules — a minor inconvenience, but one easily resolved. It must be installed separately, from Percona's repositories, compiled from source, or installed via PGXN. Like pg_stat_statements, it requires a spot in shared_preload_libraries, which means a PostgreSQL restart when adding it for the first time. I appreciate that restarts are not to be taken lightly, so let us ensure the configuration is right before you commit to one.

postgresql.conf + SQL
# 1. Add to postgresql.conf (requires restart)
# (note: postgresql.conf comments use '#', not SQL's '--')
shared_preload_libraries = 'pg_stat_monitor'

-- 2. Restart PostgreSQL, then create the extension
CREATE EXTENSION pg_stat_monitor;

-- 3. Verify it is collecting data
SELECT count(*) FROM pg_stat_monitor;

After the restart and extension creation, pg_stat_monitor begins tracking every query immediately. No further configuration is required for basic operation — the defaults (10 buckets at 300 seconds each) give you approximately 50 minutes of rolling history. It begins its duties at once, without further instruction.

Running alongside pg_stat_statements

You do not need to choose between the two. Both extensions can coexist in shared_preload_libraries — but the load order matters. pg_stat_statements must be listed first:

postgresql.conf + SQL
# Running alongside pg_stat_statements — order matters
# pg_stat_statements must be listed first
shared_preload_libraries = 'pg_stat_statements, pg_stat_monitor'

-- After restart, create both extensions
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pg_stat_monitor;

Why run both? A reasonable question. Many third-party monitoring tools (pganalyze, Datadog, PMM) expect pg_stat_statements and query it directly. Removing it would break those integrations — and one does not dismiss reliable staff simply because new staff have arrived. Running pg_stat_monitor alongside gives you the time-bucketed analysis and plan capture without disrupting your existing monitoring stack.

One caveat noted in Percona's documentation: memory and WAL statistics may show inconsistencies when both extensions are active, because they intercept the same hooks and account for the same data differently. For most practical purposes, this does not affect query timing analysis — the primary use case for pg_stat_monitor.
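Before relying on either extension, it is worth a moment to confirm that both libraries actually loaded and both extensions exist in the current database. These checks use only standard PostgreSQL catalogs and settings, nothing extension-specific:

```sql
-- Confirm both libraries are present in shared_preload_libraries
SHOW shared_preload_libraries;

-- Confirm both extensions are created in the current database
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pg_stat_statements', 'pg_stat_monitor');
```

If the second query returns fewer than two rows, the library loaded but CREATE EXTENSION has not yet been run in this database.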

Configuration: setting the household in order

All parameters are prefixed with pg_stat_monitor. and visible in pg_settings. Some require a restart; others take effect after a configuration reload. Allow me to draw your attention to the ones that matter most.

SQL
-- View all pg_stat_monitor configuration parameters
SELECT name, setting, short_desc
FROM pg_settings
WHERE name LIKE 'pg_stat_monitor%'
ORDER BY name;

Here is a recommended starting configuration with commentary on each parameter:

postgresql.conf
# postgresql.conf — recommended starting configuration
# (comments in postgresql.conf use '#', not SQL's '--')

# Time bucketing: 60-second buckets, 30 retained = 30 minutes of history
pg_stat_monitor.pgsm_bucket_time = 60
pg_stat_monitor.pgsm_max_buckets = 30

# Query plan capture: off by default, enable when investigating
pg_stat_monitor.pgsm_enable_query_plan = no

# Histogram: 1ms to 10s range, 20 buckets
pg_stat_monitor.pgsm_histogram_min = 1
pg_stat_monitor.pgsm_histogram_max = 10000
pg_stat_monitor.pgsm_histogram_buckets = 20

# Show actual parameter values instead of placeholders
pg_stat_monitor.pgsm_normalized_query = no

# Shared memory for statistics: 256MB is usually sufficient
pg_stat_monitor.pgsm_max = 256

Tuning bucket time and count

The two most important parameters are pgsm_bucket_time and pgsm_max_buckets. Together, they define your monitoring window:

  • Total history = pgsm_bucket_time × pgsm_max_buckets
  • Defaults (300s × 10) = 50 minutes
  • Recommended for active monitoring (60s × 30) = 30 minutes with finer granularity
  • For longer retention (60s × 60) = 1 hour, but uses more shared memory

Shorter bucket times give finer resolution for detecting regressions — a 60-second bucket can catch a spike that a 300-second bucket would smooth into obscurity. The trade-off is memory: more buckets with more distinct queries means more rows in shared memory. I would start with 60-second buckets and 30 of them. You can always adjust once you understand your workload's character.
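To watch the rotation in action, you can list the buckets currently held in shared memory. A sketch, assuming the bucket identifier column is named bucket as in recent pg_stat_monitor releases:

```sql
-- List the buckets currently retained, oldest first; with 60-second
-- buckets you should see up to pgsm_max_buckets distinct start times
SELECT bucket, bucket_start_time, count(*) AS tracked_entries
FROM pg_stat_monitor
GROUP BY bucket, bucket_start_time
ORDER BY bucket_start_time;
```

Run it twice, a minute apart, and you should see the newest bucket_start_time advance — confirmation that the window is sliding as configured.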

Query plan capture

This is the feature that earns pg_stat_monitor its place in the household. When enabled, the extension stores the text representation of the execution plan that PostgreSQL chose for each query execution. This means you can see whether a query used an index scan or a sequential scan without manually running EXPLAIN — the evidence is already collected and waiting for your inspection.

SQL — toggle plan capture
-- Enable query plan capture for active investigation
ALTER SYSTEM SET pg_stat_monitor.pgsm_enable_query_plan = 'yes';
SELECT pg_reload_conf();

-- ... investigate the issue ...

-- Disable when done — plan capture adds overhead
ALTER SYSTEM SET pg_stat_monitor.pgsm_enable_query_plan = 'no';
SELECT pg_reload_conf();

Let me be forthcoming about the cost: plan serialization runs on every query execution, adding CPU overhead proportional to query throughput. On an OLTP workload doing 10,000 queries per second, you will notice. On an analytical workload doing 50 queries per second, you will not. The recommended pattern — and the one I would insist upon — is to keep it off by default and enable it surgically when you need to investigate a specific problem.

When plan capture is enabled, pg_stat_monitor creates separate rows for each distinct plan used for the same query — identified by the planid column. This is how you detect plan changes: if the same queryid appears with multiple planid values across buckets, PostgreSQL switched plans.

Histogram configuration

The response time histogram is controlled by three parameters:

  • pgsm_histogram_min — the lower bound in milliseconds (default: 0)
  • pgsm_histogram_max — the upper bound in milliseconds (default: 100000, i.e. 100 seconds)
  • pgsm_histogram_buckets — the number of buckets between min and max (default: 10)

The default range of 0-100s with 10 buckets creates 10-second-wide histogram buckets — far too coarse for most OLTP workloads where you care about the difference between 1ms and 100ms. I'm afraid this default is rather like having a thermometer that only reads "cold" and "hot." Setting pgsm_histogram_min = 1, pgsm_histogram_max = 10000, and pgsm_histogram_buckets = 20 gives you 500ms-wide buckets from 1ms to 10s, plus two overflow buckets (below 1ms and above 10s) in pg_stat_monitor 2.0+. Much more informative.

Time-bucketed analysis: what happened in the last five minutes

This is the capability that, in my view, justifies the extension entirely. The core advantage of pg_stat_monitor over pg_stat_statements is the ability to compare query performance across time windows. Instead of a single cumulative row per query — a lifetime average that loses all sense of when — you get independent statistics for each time bucket. You get temporal context. You get the ability to say "this query was fine at 14:05 and catastrophic at 14:10."

Current bucket: what is slow right now

SQL — top queries in the active bucket
-- Top 10 queries by total execution time in the most recent bucket
SELECT
  bucket_start_time,
  substr(query, 1, 80) AS query_preview,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round(max_exec_time::numeric, 2) AS max_ms
FROM pg_stat_monitor
WHERE bucket_start_time = (
  SELECT max(bucket_start_time) FROM pg_stat_monitor
)
ORDER BY total_exec_time DESC
LIMIT 10;

This is the equivalent of checking pg_stat_statements immediately after a reset — but without actually resetting anything, and without losing your historical data in the bargain. The bucket isolates recent activity automatically. No sacrifice required.

Regression detection: what got slower

The real power, however, lies in comparison. By joining the current bucket against the previous bucket on queryid, you can identify queries whose performance changed — and the specificity of that comparison is what makes this worth the shared memory:

SQL — detect regressions between buckets
-- Compare a query's performance across the two most recent buckets
-- A spike in mean_exec_time between buckets signals a regression
WITH recent_buckets AS (
  SELECT DISTINCT bucket_start_time
  FROM pg_stat_monitor
  ORDER BY bucket_start_time DESC
  LIMIT 2
),
current AS (
  SELECT queryid, query, calls, mean_exec_time, total_exec_time
  FROM pg_stat_monitor
  WHERE bucket_start_time = (SELECT max(bucket_start_time) FROM recent_buckets)
),
previous AS (
  SELECT queryid, calls, mean_exec_time, total_exec_time
  FROM pg_stat_monitor
  WHERE bucket_start_time = (SELECT min(bucket_start_time) FROM recent_buckets)
)
SELECT
  substr(c.query, 1, 60) AS query_preview,
  p.calls AS prev_calls,
  c.calls AS curr_calls,
  round(p.mean_exec_time::numeric, 2) AS prev_mean_ms,
  round(c.mean_exec_time::numeric, 2) AS curr_mean_ms,
  round(
    ((c.mean_exec_time - p.mean_exec_time) / NULLIF(p.mean_exec_time, 0)) * 100
  , 1) AS pct_change
FROM current c
JOIN previous p USING (queryid)
WHERE p.mean_exec_time > 0
  AND c.mean_exec_time > p.mean_exec_time * 1.5  -- 50%+ regression
ORDER BY (c.mean_exec_time - p.mean_exec_time) * c.calls DESC
LIMIT 10;

This query surfaces every query whose mean execution time increased by 50% or more between the two most recent buckets, sorted by total impact (time increase multiplied by call count). A query that went from 2ms to 5ms but runs 100,000 times per bucket appears higher than one that went from 100ms to 10s but runs once: the former adds roughly 3ms × 100,000 = 300 seconds of load to the bucket, the latter only about 9.9 seconds.

With pg_stat_statements, this analysis requires an external system that snapshots the view periodically and computes the deltas — infrastructure you must build, maintain, and trust to have been running before the problem started. With pg_stat_monitor, it is a single SQL query against the built-in view. The data was already being collected. It was waiting for you to ask.

Per-query plan capture: seeing the actual execution plan

Identifying a regression is step one. Understanding why the regression happened is step two — and the more consequential of the pair. This is where plan capture earns its place on the staff.

When pgsm_enable_query_plan is set to yes, the query_plan column contains the text representation of the execution plan PostgreSQL used. This is the same output you would get from EXPLAIN — but captured automatically for every execution, without anyone having to run EXPLAIN manually.

SQL — retrieve captured plans
-- Retrieve stored execution plans for the slowest queries
-- Requires: pgsm_enable_query_plan = yes
SELECT
  substr(query, 1, 60) AS query_preview,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  planid,
  query_plan
FROM pg_stat_monitor
WHERE query_plan IS NOT NULL
  AND bucket_start_time = (
    SELECT max(bucket_start_time) FROM pg_stat_monitor
  )
ORDER BY mean_exec_time DESC
LIMIT 5;

Detecting plan changes

One of the most common causes of sudden query regressions — and one of the most maddening to diagnose without the right tools — is a plan change. PostgreSQL decides to use a different execution strategy, often after an ANALYZE updates table statistics or after data volume crosses a threshold that changes the planner's cost estimates. The query text hasn't changed. The schema hasn't changed. But the plan has, and performance has gone with it.

The planid column in pg_stat_monitor is a hash of the execution plan. When the same queryid appears with different planid values, the query's execution strategy changed:

SQL — detect plan changes
-- Detect queries that changed execution plans between buckets
-- A new planid for the same queryid often explains sudden regressions
SELECT
  queryid,
  substr(query, 1, 60) AS query_preview,
  bucket_start_time,
  planid,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_monitor
WHERE queryid IN (
  SELECT queryid
  FROM pg_stat_monitor
  GROUP BY queryid
  HAVING count(DISTINCT planid) > 1
)
ORDER BY queryid, bucket_start_time DESC;

This is diagnostic information that pg_stat_statements simply cannot provide. It has no concept of plans — it groups by query text alone. If a query slows down because PostgreSQL switched from an index scan to a sequential scan after an ANALYZE, pg_stat_statements shows only the timing change — the symptom without the cause. pg_stat_monitor shows the timing change and the plan change, linking cause to effect. The difference between knowing that something went wrong and understanding why is the difference between helplessness and a fix.

For a deeper understanding of execution plans and how to read them, see the EXPLAIN ANALYZE guide.

Client and application attribution: who is responsible for this

pg_stat_statements groups statistics by user and database — which is rather like logging complaints by floor of the building without noting which room. pg_stat_monitor goes further: it records the client_ip and application_name for each query, allowing you to attribute database load to specific services, hosts, or connection pools.

SQL — load by client and application
-- Which applications and hosts are generating the most load?
SELECT
  client_ip,
  application_name,
  count(DISTINCT queryid) AS distinct_queries,
  sum(calls) AS total_calls,
  round(sum(total_exec_time)::numeric, 2) AS total_ms,
  round(avg(mean_exec_time)::numeric, 2) AS avg_mean_ms
FROM pg_stat_monitor
WHERE bucket_start_time = (
  SELECT max(bucket_start_time) FROM pg_stat_monitor
)
GROUP BY client_ip, application_name
ORDER BY total_ms DESC;

This answers questions that are difficult to answer with pg_stat_statements alone:

  • Which microservice is generating the most load? — group by application_name (assuming each service sets it in the connection string)
  • Is the regression coming from a specific host? — group by client_ip to see if a single application server is responsible
  • Did a particular deployment cause the regression? — if only one application host shows increased load after a deploy, the problem is likely in the code on that host, not the database

I should note: for this to be useful, your applications must set application_name in their connection strings. If they do not, you will see blank entries — and I will refrain from commenting on what that says about the household's discipline. Most PostgreSQL drivers support this:

  • Python (psycopg/asyncpg): application_name=myapp-api in the connection string
  • Node.js (pg): application_name in the connection config
  • Java (pgjdbc): ApplicationName=myapp-api in the JDBC URL
  • Go (pgx): application_name=myapp-api in the DSN
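If you are unsure whether a given service sets it, the check is trivial from any session, and a session-level fallback exists when a driver offers no DSN option. Both statements are standard PostgreSQL; the name myapp-api is illustrative:

```sql
-- What does the current session report? Blank means the driver set nothing.
SHOW application_name;

-- Set it at the session level if the driver offers no connection option
SET application_name = 'myapp-api';
```

Setting it in the connection string remains preferable: a SET must be issued on every connection, and pooled connections can lose it on reset.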

Histogram analysis: the mean is not telling you the whole truth

I have a particular grievance with mean execution time. It is not that the metric is useless — it is that it is selectively honest. A query with a mean of 10ms might be consistently 10ms — or it might be 1ms 99% of the time and 1,000ms the other 1%. Your users experiencing that 1% do not care about the mean. The operational impact of those two scenarios is profoundly different, and the mean cannot distinguish between them. It presents a tidy number and omits the chaos beneath.

pg_stat_monitor provides the resp_calls column — a PostgreSQL array that represents the response time histogram. Each element is the count of query executions whose response time fell within that histogram bucket.

SQL — response time histograms
-- Response time histogram for high-frequency queries
-- resp_calls is an array of call counts per histogram bucket
SELECT
  substr(query, 1, 60) AS query_preview,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  resp_calls
FROM pg_stat_monitor
WHERE calls > 100
  AND bucket_start_time = (
    SELECT max(bucket_start_time) FROM pg_stat_monitor
  )
ORDER BY calls DESC
LIMIT 5;

Reading a histogram
-- Example resp_calls output (20 histogram buckets, 1ms-10s range):
-- {0, 45, 312, 89, 12, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0}
--  ^                                                              ^
--  |                                                              |
--  <1ms                                                           ~9.5-10s
--
-- This query ran 462 times. 312 calls (67%) fell in the 3rd bucket,
-- but one call landed in bucket 19 — nearly 10 seconds.
-- A mean averages those together, revealing neither the typical
-- call nor the outlier; only the histogram shows both.

The histogram reveals the patterns that a well-mannered average would prefer you not see:

  • Bimodal distribution — most calls are fast, but a minority are very slow. This often indicates lock contention, buffer cache misses, or plan instability.
  • Long tail — the vast majority of calls cluster in the fast buckets, but occasional outliers reach the high end. This affects p99 latency and may indicate periodic table bloat or autovacuum interference.
  • Uniform spread — calls are distributed across multiple buckets. This is unusual and may indicate that the query's performance depends heavily on the specific parameter values.

To be fair, pg_stat_statements provides min_exec_time, max_exec_time, mean_exec_time, and stddev_exec_time — enough to calculate a rough spread. But the histogram shows the full shape of the distribution. The difference is the difference between being told "the average temperature in this room is comfortable" and being shown that one corner is freezing and another is on fire.
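When eyeballing the raw array grows tiresome, it can be expanded into one row per histogram slot with standard array functions. A sketch, assuming resp_calls is an integer array as described above (if your build exposes it as text, cast it first):

```sql
-- Expand resp_calls into one row per histogram slot, with its share of calls
SELECT
  substr(query, 1, 40) AS query_preview,
  slot,
  resp_calls[slot] AS calls_in_slot,
  round(100.0 * resp_calls[slot] / NULLIF(calls, 0), 1) AS pct_of_calls
FROM pg_stat_monitor,
     LATERAL generate_subscripts(resp_calls, 1) AS slot
WHERE calls > 100
  AND resp_calls[slot] > 0
ORDER BY query_preview, slot;
```

The row-per-slot form feeds naturally into dashboards or a percentile estimate, which the raw array does not.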

Additional monitoring capabilities

Error tracking

The elevel and message columns track queries that generated errors or warnings — a feature I find quietly indispensable. This lets you correlate error frequency with query patterns, useful for catching issues like constraint violations that spike after a code deployment:

SQL — error tracking
-- Queries that generated errors in the current bucket
SELECT
  substr(query, 1, 60) AS query_preview,
  elevel,
  message,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_monitor
WHERE elevel > 0
  AND bucket_start_time = (
    SELECT max(bucket_start_time) FROM pg_stat_monitor
  )
ORDER BY calls DESC;

Command type breakdown

The cmd_type column categorizes each query as SELECT, INSERT, UPDATE, DELETE, or UTILITY — a quick inventory of what your database is actually spending its time on. This gives you a read/write ratio breakdown and helps identify whether load shifts are coming from read or write traffic:

SQL — command type breakdown
-- Break down load by command type
SELECT
  cmd_type,
  count(DISTINCT queryid) AS distinct_queries,
  sum(calls) AS total_calls,
  round(sum(total_exec_time)::numeric, 2) AS total_ms,
  round(avg(mean_exec_time)::numeric, 2) AS avg_mean_ms
FROM pg_stat_monitor
WHERE bucket_start_time = (
  SELECT max(bucket_start_time) FROM pg_stat_monitor
)
GROUP BY cmd_type
ORDER BY total_ms DESC;

The proper procedure: deploy, monitor, detect, fix

If you'll follow me, I should like to walk you through the complete procedure. The individual queries above are useful, but they are most powerful as part of a systematic workflow — a protocol, if you will, for maintaining the household's standards around code deployments, which are the event most likely to introduce query regressions:

The deploy-monitor-fix cycle
-- A complete workflow: deploy, monitor, detect, investigate

-- Step 1: Before deploying, note the current bucket baseline
SELECT max(bucket_start_time) AS pre_deploy_bucket FROM pg_stat_monitor;

-- Step 2: Deploy the code change

-- Step 3: Wait for one full bucket to fill (pgsm_bucket_time seconds)

-- Step 4: Compare post-deploy bucket to pre-deploy bucket
-- (use the regression detection query from the previous section)

-- Step 5: If a regression is found, enable plan capture
ALTER SYSTEM SET pg_stat_monitor.pgsm_enable_query_plan = 'yes';
SELECT pg_reload_conf();

-- Step 6: Wait for the next bucket to fill with plans attached

-- Step 7: Examine the plan for the regressed query
SELECT query, query_plan
FROM pg_stat_monitor
WHERE queryid = '<regressed_queryid>'
  AND query_plan IS NOT NULL
ORDER BY bucket_start_time DESC
LIMIT 1;

-- Step 8: Fix the query or add the missing index

-- Step 9: Disable plan capture
ALTER SYSTEM SET pg_stat_monitor.pgsm_enable_query_plan = 'no';
SELECT pg_reload_conf();

The key insight — and it is one I consider non-negotiable for any team deploying to production regularly — is that pg_stat_monitor turns regression detection from a reactive process ("something is slow, let me check the database") into a proactive one ("I deployed code, let me verify the queries are still in good order"). With 60-second buckets, you can detect a regression within minutes of deploying. Before users report it. Before the on-call page. Before the incident channel fills with questions no one can answer yet.

The complete investigation sequence, step by step

  1. Detect — use the regression detection query to identify queries whose performance changed between buckets
  2. Attribute — use client attribution to confirm which application or host is responsible for the changed query pattern
  3. Diagnose — enable plan capture, wait for the next bucket, and examine the execution plan for the regressed query
  4. Check for plan changes — use the planid comparison to see if PostgreSQL changed execution strategies
  5. Fix — add the missing index, update statistics with ANALYZE, or fix the application code
  6. Verify — wait for the next bucket and confirm the regression is resolved

Each step is a SQL query against the same pg_stat_monitor view. No external tools required, no log parsing, no manual EXPLAIN runs. The data is already there, collected quietly while you were attending to other matters. A well-run household anticipates what will be needed.
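Step 6 — verification — is the same bucket view queried once more. A sketch, with '<regressed_queryid>' standing in for the identifier found in step 1:

```sql
-- Track one query's mean across recent buckets to confirm the fix took hold
SELECT
  bucket_start_time,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_monitor
WHERE queryid = '<regressed_queryid>'
ORDER BY bucket_start_time DESC
LIMIT 5;
```

A mean that returns to its pre-deploy level in the newest bucket, and stays there for the next one, closes the incident properly.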

A proper comparison: pg_stat_monitor vs. pg_stat_statements

Both extensions intercept the same PostgreSQL hooks and track the same core metrics. The difference is in what additional data they capture and how they organize it. I have prepared an honest summary. Percona also maintains a detailed comparison in their documentation.

Feature                              pg_stat_monitor   pg_stat_statements
Time-bucketed statistics             Yes               No
Cumulative lifetime stats            No                Yes
Execution plan capture               Yes               No
Plan change detection (planid)       Yes               No
Client IP tracking                   Yes               No
Application name grouping            Yes               No
Response time histogram              Yes               No
Error/warning tracking               Yes               No
Command type (SELECT/INSERT/...)     Yes               No
Included in PostgreSQL contrib       No                Yes
Available on all managed providers   No                Yes
Lower memory overhead                No                Yes
Rows/shared blocks per query         Yes               Yes
Min/max/mean/stddev timing           Yes               Yes
WAL statistics                       Yes               Yes

When pg_stat_statements alone will serve you well

  • You need the lightest possible monitoring overhead
  • You are on a managed provider that does not offer pg_stat_monitor (RDS, Aurora, Cloud SQL, Azure)
  • Your monitoring tools (pganalyze, Datadog) integrate with pg_stat_statements and you do not need time-bucketed analysis
  • Your workload is simple enough that "top 10 queries by total time" is sufficient

When pg_stat_monitor becomes essential

  • You need to detect regressions quickly after deployments
  • You want to understand latency distribution, not just averages
  • You need to attribute database load to specific applications or hosts
  • You want automatic execution plan capture without running EXPLAIN manually
  • You need to detect and correlate plan changes with performance regressions
  • You are running self-managed PostgreSQL or Percona Distribution for PostgreSQL

Cloud availability — an honest accounting

I'm afraid the news here is mixed. Because pg_stat_monitor is a third-party extension (not part of PostgreSQL's contrib modules), availability on managed platforms is limited:

Provider                               Status
Self-managed PostgreSQL                Fully supported — install from Percona repos or source
Percona Distribution for PostgreSQL    Included and supported
Supabase                               Available via the extensions dashboard
Amazon RDS / Aurora                    Not available
Google Cloud SQL                       Not available
Azure Database for PostgreSQL          Not available
Neon                                   Not available

If your provider does not support pg_stat_monitor, the situation is not hopeless — merely less convenient. You can snapshot pg_stat_statements periodically for time-based comparison, use auto_explain for plan capture (with log-based collection), and check pg_stat_activity for client attribution. It requires more effort and more infrastructure, but the information is available through other channels. One makes do with the staff one has.
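For the first of those workarounds, a minimal sketch of snapshot-and-delta against pg_stat_statements. The pgss_snapshots table and its schedule are assumptions you would adapt (run the INSERT from cron, pg_cron, or your scheduler of choice, every few minutes):

```sql
-- One-time setup: a table shaped like the columns we want to track
CREATE TABLE IF NOT EXISTS pgss_snapshots AS
SELECT now() AS snapped_at, queryid, query, calls, total_exec_time
FROM pg_stat_statements
WITH NO DATA;

-- Run periodically: append a snapshot of current cumulative stats
INSERT INTO pgss_snapshots
SELECT now(), queryid, query, calls, total_exec_time
FROM pg_stat_statements;

-- Delta between the two most recent snapshots: calls and time added
-- (note: queryid alone can collide across users/databases; a production
-- version would join on userid and dbid as well)
WITH two AS (
  SELECT DISTINCT snapped_at FROM pgss_snapshots
  ORDER BY snapped_at DESC LIMIT 2
)
SELECT
  substr(curr.query, 1, 60) AS query_preview,
  curr.calls - prev.calls AS calls_delta,
  round((curr.total_exec_time - prev.total_exec_time)::numeric, 2) AS ms_delta
FROM pgss_snapshots curr
JOIN pgss_snapshots prev USING (queryid)
WHERE curr.snapped_at = (SELECT max(snapped_at) FROM two)
  AND prev.snapped_at = (SELECT min(snapped_at) FROM two)
ORDER BY ms_delta DESC
LIMIT 10;
```

This reproduces the bucket comparison in spirit, though with the caveat noted earlier: it only helps if the snapshots were being taken before the problem started.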

How Gold Lapel approaches the same problem

I would be remiss if I did not mention that Gold Lapel addresses several of these same concerns from a different vantage point. Gold Lapel monitors every query at the proxy level — normalizing query text, tracking execution statistics across configurable time windows, and acting on what it finds. Time-bucketed analysis, client attribution, and regression detection are built into the proxy itself, without requiring an extension installed inside PostgreSQL.

There are advantages to both approaches. pg_stat_monitor observes from inside the database engine — it sees planning time, internal background operations, and queries from any client regardless of whether they pass through a proxy. Gold Lapel observes from the network layer — it sees application-level patterns like N+1 queries, connection pool behavior, and the gap between what the application intended and what the database received.

The two perspectives are complementary. pg_stat_monitor tells you what happened inside PostgreSQL. Gold Lapel tells you what happened in the space between the application and PostgreSQL — and acts on it. If you can run both, you have two independent sources of truth about the same workload, which is a healthy position to be in when diagnosing production issues at inconvenient hours.

For teams on managed providers where pg_stat_monitor is not available, Gold Lapel's proxy-level monitoring provides the time-bucketed analysis, regression detection, and client attribution that would otherwise require the extension — no database-side installation or restart needed. But I should be clear: everything in this guide works on its own, without Gold Lapel, and is worth implementing regardless. Good monitoring is good monitoring. The database was never the problem. It was the visibility.

Frequently asked questions