pg_wait_sampling
If you'll permit me — knowing a query is slow is one thing. Knowing what it was waiting for is quite another.
When a query is slow, the natural question is where did the time go? I have found that most teams stop at "it took 800ms" and begin guessing. pg_wait_sampling removes the guesswork. It is a PostgreSQL extension that continuously samples wait events across all backend processes at configurable intervals, recording what each backend is waiting on — locks, I/O, IPC, client responses — and accumulating those samples into a statistical profile. Where pg_stat_statements tells you which queries are slow, pg_wait_sampling tells you what they were doing instead of working.
What pg_wait_sampling does
pg_wait_sampling launches a background worker that periodically checks the wait state of every PostgreSQL backend process. Each sample records the process ID, the wait event type (such as Lock, IO, or LWLock), and the specific wait event (such as WALWrite or relation). These samples are collected into two in-memory structures:
- History ring buffer — a fixed-size circular buffer of recent individual samples, each timestamped. When it fills up, the oldest entries are overwritten. This gives you a sliding window of recent wait activity.
- Profile hash table — an aggregated count of how many times each (process, wait event type, wait event) combination has been sampled. This accumulates over time and gives you a statistical picture of where backends spend their time.
The sampling approach is the key design choice — and an admirably restrained one. Rather than instrumenting every wait event inline (which would add overhead to every operation), pg_wait_sampling reads the existing wait event state that PostgreSQL already maintains in shared memory. The background worker simply reads these values at regular intervals — a lightweight operation that does not interfere with the sampled backends. It observes without disturbing, which is a quality I value highly.
When to use pg_wait_sampling
pg_wait_sampling is most valuable when you know something is slow but not why — which, in my experience, describes the majority of performance investigations that arrive at my door. Specific scenarios:
- Diagnosing lock contention — if queries are blocked on `Lock` wait events, the profile shows which lock types are involved and how frequently they occur
- Identifying I/O bottlenecks — high counts of `IO` wait events (like `DataFileRead` or `WALWrite`) reveal when backends spend time waiting for disk
- Understanding lightweight lock pressure — `LWLock` waits on events like `buffer_content` or `WALInsert` indicate contention on internal PostgreSQL data structures
- Investigating idle-in-transaction connections — `Client` wait events show backends waiting for the application to send the next command
- Correlating waits with queries — when `profile_queries` is enabled alongside `pg_stat_statements`, you can see which query patterns generate which wait events
- Building wait event dashboards — periodically export the profile to a monitoring system for historical trending
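The query-correlation case deserves a concrete illustration. When `profile_queries` is enabled, the profile view carries a `queryid` that can be joined against `pg_stat_statements`. A minimal sketch, assuming both extensions are loaded and `profile_queries` is set to `top`:

```sql
-- Sketch: attribute wait events to query text via queryid
-- (assumes pg_wait_sampling.profile_queries = top and pg_stat_statements loaded)
SELECT
    left(s.query, 60) AS query,
    p.event_type,
    p.event,
    sum(p.count) AS samples
FROM pg_wait_sampling_profile p
JOIN pg_stat_statements s ON p.queryid = s.queryid
GROUP BY s.query, p.event_type, p.event
ORDER BY samples DESC
LIMIT 10;
```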
Installation and setup
pg_wait_sampling is a third-party extension developed by Postgres Professional. It is not included in the PostgreSQL contrib modules — you need to install it from source or from a package repository that carries it (such as PGDG). It requires PostgreSQL 13 or later.
Because it launches a background worker and allocates shared memory, it must be loaded via shared_preload_libraries, which requires a server restart.
```sql
-- 1. Add to postgresql.conf (requires restart)
shared_preload_libraries = 'pg_wait_sampling'

-- 2. Restart PostgreSQL, then create the extension
CREATE EXTENSION pg_wait_sampling;

-- 3. Verify it's working
SELECT * FROM pg_wait_sampling_profile LIMIT 5;
```

After the restart, the background worker begins sampling immediately with the default settings. No further configuration is required — it simply starts paying attention, which is more than can be said for most monitoring approaches.
Configuration
All parameters are set in postgresql.conf or via ALTER SYSTEM. Changes to most parameters take effect after a server reload, but history_size requires a restart because it controls shared memory allocation.
```sql
-- Key configuration parameters (set in postgresql.conf)
pg_wait_sampling.history_size = 5000     -- ring buffer size (number of samples)
pg_wait_sampling.history_period = 10     -- history sampling interval (ms)
pg_wait_sampling.profile_period = 10     -- profile sampling interval (ms)
pg_wait_sampling.profile_pid = on        -- collect per-PID profile data
pg_wait_sampling.profile_queries = none  -- none | top | all (requires pg_stat_statements)
pg_wait_sampling.sample_cpu = off        -- include non-waiting (CPU) samples
```

The default 10ms sampling period is a good starting point. Lower values give finer resolution but consume more buffer space; higher values reduce memory usage at the cost of missing short-lived wait events. For most production workloads, the defaults are reasonable.
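Since the reload-able parameters can also be changed through `ALTER SYSTEM`, a change need not involve editing postgresql.conf by hand. A sketch (the specific value is illustrative):

```sql
-- Adjust the profile sampling interval at runtime
ALTER SYSTEM SET pg_wait_sampling.profile_period = 20;
SELECT pg_reload_conf();  -- apply without a restart

-- Confirm the new value
SHOW pg_wait_sampling.profile_period;
```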
Views
pg_wait_sampling exposes three views:
pg_wait_sampling_current
Shows what each backend is waiting on right now. This is similar to querying pg_stat_activity for wait events, but scoped to the extension's perspective.
```sql
-- See what every backend is waiting on right now
SELECT
    pid,
    event_type,
    event
FROM pg_wait_sampling_current
WHERE event IS NOT NULL
ORDER BY pid;
```

pg_wait_sampling_history
The ring buffer of recent samples. Each row is a single sample with a timestamp, process ID, and wait event. Useful for investigating what happened in the last few seconds or minutes.
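Because each history row is timestamped, the buffer can also be bucketed into time slices to spot wait spikes rather than read sample by sample. A sketch:

```sql
-- Bucket recent samples into one-second slices per event type
SELECT
    date_trunc('second', ts) AS slice,
    event_type,
    count(*) AS samples
FROM pg_wait_sampling_history
WHERE event IS NOT NULL
GROUP BY slice, event_type
ORDER BY slice DESC, samples DESC;
```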
```sql
-- Recent wait event samples from the ring buffer
SELECT
    pid,
    ts,
    event_type,
    event
FROM pg_wait_sampling_history
ORDER BY ts DESC
LIMIT 20;
```

pg_wait_sampling_profile
The aggregated profile — how many times each wait event has been sampled per process since the last reset. This is the most useful view for understanding overall wait patterns.
```sql
-- Aggregated wait event profile — where backends spend their time
SELECT
    event_type,
    event,
    sum(count) AS total_samples
FROM pg_wait_sampling_profile
GROUP BY event_type, event
ORDER BY total_samples DESC
LIMIT 20;
```

One query, and you know where the time went. I find that rather satisfying.
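Raw counts become easier to compare as proportions; a window function over the aggregate does the arithmetic. A sketch:

```sql
-- Express each wait event as a share of all sampled waits
SELECT
    event_type,
    event,
    sum(count) AS samples,
    round(100.0 * sum(count) / sum(sum(count)) OVER (), 1) AS pct
FROM pg_wait_sampling_profile
GROUP BY event_type, event
ORDER BY samples DESC
LIMIT 20;
```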
Filtering by backend
To focus on a specific process, filter the profile by PID.
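A PID worth profiling can be picked out of `pg_stat_activity` first, for instance the longest-running active sessions. A sketch:

```sql
-- Find candidate backends: longest-running non-idle sessions
SELECT
    pid,
    now() - query_start AS runtime,
    state,
    left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST
LIMIT 10;
```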
```sql
-- Wait profile for a specific backend process
SELECT
    event_type,
    event,
    count
FROM pg_wait_sampling_profile
WHERE pid = 12345  -- replace with your backend PID
ORDER BY count DESC;
```

Resetting statistics
Reset the accumulated profile to get a clean baseline — useful after deployments or configuration changes.
```sql
-- Reset all collected statistics
SELECT pg_wait_sampling_reset_profile();
```

Wait event types
PostgreSQL categorizes wait events into types. Understanding these types is essential for interpreting pg_wait_sampling output:
| Type | Meaning |
|---|---|
| `LWLock` | Lightweight lock — contention on internal shared memory structures (buffer mapping, WAL insertion, etc.) |
| `Lock` | Heavyweight lock — waiting for row locks, table locks, advisory locks, or other SQL-visible locks |
| `BufferPin` | Waiting to pin a buffer — another process is examining the same buffer page |
| `Activity` | Background process idle in its main loop (autovacuum sleeping, WAL writer idle, etc.) |
| `Client` | Waiting for the client to send a command — idle connections or slow application response |
| `Extension` | Wait events defined by extensions |
| `IPC` | Inter-process communication — waiting for another backend to complete an operation |
| `Timeout` | Waiting for a timeout to expire (vacuum delay, checkpoint delay, etc.) |
| `IO` | Waiting for I/O — disk reads, writes, WAL sync, and other storage operations |
In a well-run household — if you'll allow me — you expect most of the staff to be standing ready, not struggling with locked doors or waiting for deliveries. The same principle applies here. In a healthy database, the profile shows mostly Activity (idle background workers) and Client (idle connections). High counts of Lock, IO, or LWLock events are the signals worth investigating.
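That interpretation suggests a quick health check: exclude the expected idle categories and see what remains. A sketch:

```sql
-- Share of samples that represent genuine waits, by type
SELECT
    event_type,
    sum(count) AS samples
FROM pg_wait_sampling_profile
WHERE event_type NOT IN ('Activity', 'Client')
GROUP BY event_type
ORDER BY samples DESC;
```

If this result is small relative to the total profile, the staff are standing ready; if `Lock`, `IO`, or `LWLock` dominate it, you have found the locked door.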
Cloud availability
| Provider | Status |
|---|---|
| Amazon RDS / Aurora | Not available — use Performance Insights for wait event analysis |
| Google Cloud SQL | Not available — use Query Insights for wait event monitoring |
| Azure Database for PostgreSQL | Not available — use Query Performance Insight for wait statistics |
| Supabase | Not available — shared_preload_libraries not user-configurable |
| Neon | Not available — shared_preload_libraries not user-configurable |
pg_wait_sampling requires shared_preload_libraries access and installs a background worker, which most managed providers do not allow. Each major provider offers its own wait event analysis tooling as an alternative. For self-managed PostgreSQL or platforms like Postgres.AI DBLab, the extension installs normally.
How Gold Lapel relates
Allow me to be precise about the boundary. Gold Lapel's proxy sits between your application and PostgreSQL, observing every query before it reaches the database — tracking patterns, execution statistics, and timing data at the proxy level. I see what is being asked and how long it takes. I can act on that: materialized views, index recommendations, pattern analysis.
What I cannot see from the outside is what happens inside the database engine while that query runs. Is the backend waiting for a lock? Stalled on disk I/O? Blocked on WAL sync? That is pg_wait_sampling's domain, and I would not presume to duplicate it.
The two are complementary in the way that a maître d' and a kitchen manager are complementary. I know which dishes are taking too long to reach the table. pg_wait_sampling knows whether the delay was the stove, the pantry, or a bottleneck at the prep station. Together, you have the full picture. Neither alone is sufficient.