pg_stat_kcache
Track OS-level cache and disk I/O statistics per query. Your shared buffer stats have been keeping things from you.
Allow me to draw your attention to an uncomfortable truth: when pg_stat_statements reports a "shared buffer read," it cannot tell you whether that read was served from the OS page cache in microseconds or dragged from a spinning platter in milliseconds. It simply does not know. pg_stat_kcache knows.
pg_stat_kcache is a third-party PostgreSQL extension that uses the kernel's getrusage() system call to capture actual disk reads, disk writes, and CPU time for every query. It joins with pg_stat_statements via queryid, filling in the gap that shared buffer statistics leave: whether a "buffer read" came from the fast OS page cache or from a slow physical disk.
What pg_stat_kcache does
PostgreSQL's built-in statistics and pg_stat_statements tell you how many shared buffer hits and reads each query generates. A "shared buffer read" means the block was not in PostgreSQL's shared buffer cache — but it may still have been in the operating system's page cache, served in microseconds rather than milliseconds. From PostgreSQL's perspective, both cases look the same. From a performance perspective, they are completely different. One might say the household has been filing expense reports without distinguishing between the heating bill and the staff wages.
pg_stat_kcache resolves this ambiguity. By calling getrusage(2) before and after each query execution, it captures the kernel's accounting of actual physical I/O and CPU consumption. The result is a set of per-query counters — bytes read from disk, bytes written, user-mode CPU time, kernel-mode CPU time, page faults, and context switches — that reveal what is really happening at the OS level.
The extension stores its data in shared memory and exposes it through two views: pg_stat_kcache (aggregated per database) and pg_stat_kcache_detail (per query, joining with pg_stat_statements). The detail view is where the diagnostic value lives — it lets you correlate a specific query pattern with its true I/O and CPU footprint.
When to use pg_stat_kcache
pg_stat_kcache is most valuable when you need to go deeper than shared buffer statistics. Specific scenarios:
- Identifying truly I/O-bound queries — a query with high
shared_blks_readmight be entirely served from OS cache.pg_stat_kcachetells you the truth by reporting actual bytes read from disk. - Separating OS cache hits from real disk reads — compare
pg_stat_statements.shared_blks_readwithpg_stat_kcache_detail.exec_reads_blksto calculate the OS page cache hit ratio per query. - Finding CPU-bound queries — some queries burn CPU without touching disk.
exec_user_timeandexec_system_timesurface these directly. - Diagnosing context switch storms — high
exec_nivcsws(involuntary context switches) can indicate CPU contention; highexec_nvcsws(voluntary) can indicate I/O waits. - Validating shared_buffers sizing — if most "buffer reads" are hitting the OS cache (low
exec_readsrelative toshared_blks_read), your working set may be larger thanshared_buffersbut still fits in RAM.
Installation and setup
pg_stat_kcache is a third-party extension maintained by the PoWA team. It must be loaded via shared_preload_libraries alongside pg_stat_statements, which is a hard dependency. The order matters — pg_stat_statements must appear before pg_stat_kcache. Sequence, in this household, is not optional.
-- 1. Add to postgresql.conf (requires restart)
-- pg_stat_kcache must be loaded AFTER pg_stat_statements
shared_preload_libraries = 'pg_stat_statements,pg_stat_kcache'
-- 2. Restart PostgreSQL, then create both extensions
CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION pg_stat_kcache;
-- 3. Verify it's working
SELECT * FROM pg_stat_kcache LIMIT 5; On Debian/Ubuntu systems with PGDG repositories, you can install the package directly: apt install postgresql-16-pg-stat-kcache. On RHEL/Rocky: yum install pg_stat_kcache_16. Replace 16 with your PostgreSQL major version.
After the restart, the extension begins tracking immediately using default settings. The pg_stat_kcache.track parameter (default top) controls whether nested statements (inside functions) are tracked separately. The pg_stat_kcache.track_planning parameter (default off, PostgreSQL 13+) enables separate tracking of planning-phase resource usage.
Key columns
The pg_stat_kcache_detail view is the primary diagnostic interface. It joins with pg_stat_statements by queryid and provides per-query OS-level metrics.
-- Key columns in pg_stat_kcache_detail (per-query view)
-- Identity
queryid -- matches pg_stat_statements.queryid
datname -- database name
toplevel -- true if top-level statement, false if nested
-- Disk I/O (execution phase)
exec_reads -- bytes read from disk during execution
exec_reads_blks -- 8K blocks read from disk during execution
exec_writes -- bytes written to disk during execution
exec_writes_blks -- 8K blocks written to disk during execution
-- CPU time (execution phase)
exec_user_time -- CPU time spent in user mode (seconds)
exec_system_time -- CPU time spent in kernel mode (seconds)
-- Context switches (execution phase)
exec_nvcsws -- voluntary context switches (I/O waits)
exec_nivcsws -- involuntary context switches (CPU preemption)
-- Page faults (execution phase)
exec_minflts -- soft page faults (served from OS cache)
exec_majflts -- hard page faults (required disk read)
-- Planning phase equivalents (PostgreSQL 13+, requires track_planning)
plan_reads, plan_writes, plan_user_time, plan_system_time, ... The aggregated pg_stat_kcache view sums these metrics per database. For query-level analysis, always use pg_stat_kcache_detail.
Note: on platforms where getrusage(2) does not maintain certain fields, those columns will be zero or NULL. On Linux, the I/O and CPU fields are reliably populated. Page fault and context switch counters depend on kernel version.
Practical queries
Most I/O-intensive queries
Find queries generating the most actual physical disk reads — not shared buffer reads, not OS cache reads, but real disk I/O as reported by the kernel.
-- Find the most I/O-intensive queries (actual disk reads)
SELECT
k.datname,
left(s.query, 80) AS query,
s.calls,
pg_size_pretty(k.exec_reads) AS disk_read,
k.exec_reads_blks AS blks_read,
round(k.exec_user_time::numeric, 3) AS cpu_user_s,
round(k.exec_system_time::numeric, 3) AS cpu_sys_s
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
AND k.exec_reads > 0
ORDER BY k.exec_reads DESC
LIMIT 10; OS cache hit ratio per query
This is the query I would run first. It answers the question that pg_stat_statements alone cannot: how much of your "buffer reads" actually hit the OS page cache vs. requiring physical disk access. A high OS cache hit percentage means the data fits in RAM even though it was not in shared buffers. You were worrying about the wrong thing.
-- Compare shared buffer reads vs actual disk reads per query
-- This reveals whether "buffer reads" came from OS cache or real disk
SELECT
left(s.query, 60) AS query,
s.calls,
s.shared_blks_read AS buffer_reads,
k.exec_reads_blks AS disk_reads,
CASE
WHEN s.shared_blks_read > 0
THEN round(
(1.0 - k.exec_reads_blks::numeric / s.shared_blks_read) * 100, 1
)
ELSE NULL
END AS os_cache_hit_pct
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
AND s.shared_blks_read > 100
ORDER BY s.shared_blks_read DESC
LIMIT 10; CPU-intensive queries
Not every slow query is waiting on disk. Some are simply thinking too hard. This surfaces the ones burning CPU on complex joins, sorts, or aggregations — the kind that benefit from a rewrite rather than a larger buffer pool.
-- Find CPU-intensive queries
SELECT
left(s.query, 80) AS query,
s.calls,
round(s.total_exec_time::numeric, 2) AS total_exec_ms,
round(k.exec_user_time::numeric, 3) AS cpu_user_s,
round(k.exec_system_time::numeric, 3) AS cpu_sys_s,
round((k.exec_user_time + k.exec_system_time)::numeric, 3) AS cpu_total_s
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
AND (k.exec_user_time + k.exec_system_time) > 0
ORDER BY (k.exec_user_time + k.exec_system_time) DESC
LIMIT 10; Resetting statistics
Reset pg_stat_kcache counters, typically alongside pg_stat_statements.
-- Reset pg_stat_kcache statistics
SELECT pg_stat_kcache_reset();
-- You typically want to reset pg_stat_statements at the same time
SELECT pg_stat_statements_reset(); Cloud availability
| Provider | Status |
|---|---|
| Amazon RDS / Aurora | Not available — requires loading a third-party shared library, which RDS does not support |
| Google Cloud SQL | Not available — not in the supported extensions list |
| Azure Database for PostgreSQL | Not available — not in the supported extensions list |
| Supabase | Not available — not in the supported extensions list |
| Neon | Not available — not in the supported extensions list |
| Self-hosted / VMs | Available — install from PGDG packages or build from source |
pg_stat_kcache is a third-party extension that requires shared_preload_libraries access. Most managed PostgreSQL services do not support it because they restrict which shared libraries can be loaded. It is primarily available on self-hosted PostgreSQL, VMs, and container-based deployments where you control postgresql.conf. If you manage your own PostgreSQL, this is one of the privileges that comes with it — and I would not let it go unused.
How Gold Lapel relates
I should be straightforward about what Gold Lapel can and cannot see. Gold Lapel observes every query at the SQL protocol layer — timing, frequency, patterns. What it cannot observe is what happens beneath PostgreSQL, at the kernel level. That is pg_stat_kcache's domain, and I would not presume to duplicate it.
The two are natural companions. When Gold Lapel identifies a query pattern worth optimizing, pg_stat_kcache can confirm whether the cost is driven by physical I/O, CPU, or both — and that distinction determines whether the correct remedy is an index, a materialized view, or a rewrite. Gold Lapel proposes the optimization. pg_stat_kcache validates the result at the OS level. Each attends to its own floor of the household.
For self-hosted deployments where pg_stat_kcache is available, I would encourage installing both. Gold Lapel handles the optimization; pg_stat_kcache provides the kernel-level confirmation that the optimization actually reduced physical I/O, not merely rearranged which cache served it.