← PostgreSQL Extensions

pg_stat_kcache

Track OS-level cache and disk I/O statistics per query. Your shared buffer stats have been keeping things from you.

Extension · March 21, 2026 · 7 min read

Allow me to draw your attention to an uncomfortable truth: when pg_stat_statements reports a "shared buffer read," it cannot tell you whether that read was served from the OS page cache in microseconds or dragged from a spinning platter in milliseconds. It simply does not know. pg_stat_kcache knows.

pg_stat_kcache is a third-party PostgreSQL extension that uses the kernel's getrusage() system call to capture actual disk reads, disk writes, and CPU time for every query. It joins with pg_stat_statements via queryid, filling in the gap that shared buffer statistics leave: whether a "buffer read" came from the fast OS page cache or from a slow physical disk.

What pg_stat_kcache does

PostgreSQL's built-in statistics and pg_stat_statements tell you how many shared buffer hits and reads each query generates. A "shared buffer read" means the block was not in PostgreSQL's shared buffer cache — but it may still have been in the operating system's page cache, served in microseconds rather than milliseconds. From PostgreSQL's perspective, both cases look the same. From a performance perspective, they are completely different. One might say the household has been filing expense reports without distinguishing between the heating bill and the staff wages.

pg_stat_kcache resolves this ambiguity. By calling getrusage(2) before and after each query execution, it captures the kernel's accounting of actual physical I/O and CPU consumption. The result is a set of per-query counters — bytes read from disk, bytes written, user-mode CPU time, kernel-mode CPU time, page faults, and context switches — that reveal what is really happening at the OS level.

The extension stores its data in shared memory and exposes it through two views: pg_stat_kcache (aggregated per database) and pg_stat_kcache_detail (per query, joining with pg_stat_statements). The detail view is where the diagnostic value lives — it lets you correlate a specific query pattern with its true I/O and CPU footprint.

When to use pg_stat_kcache

pg_stat_kcache is most valuable when you need to go deeper than shared buffer statistics. Specific scenarios:

  • Identifying truly I/O-bound queries — a query with high shared_blks_read might be entirely served from OS cache. pg_stat_kcache tells you the truth by reporting actual bytes read from disk.
  • Separating OS cache hits from real disk reads — compare pg_stat_statements.shared_blks_read with pg_stat_kcache_detail.exec_reads_blks to calculate the OS page cache hit ratio per query.
  • Finding CPU-bound queries — some queries burn CPU without touching disk. exec_user_time and exec_system_time surface these directly.
  • Diagnosing context switch storms — high exec_nivcsws (involuntary context switches) can indicate CPU contention; high exec_nvcsws (voluntary) can indicate I/O waits.
  • Validating shared_buffers sizing — if most "buffer reads" are hitting the OS cache (low exec_reads relative to shared_blks_read), your working set may be larger than shared_buffers but still fits in RAM.

Installation and setup

pg_stat_kcache is a third-party extension maintained by the PoWA team. It must be loaded via shared_preload_libraries alongside pg_stat_statements, which is a hard dependency. The order matters — pg_stat_statements must appear before pg_stat_kcache. Sequence, in this household, is not optional.

postgresql.conf + SQL
-- 1. Add to postgresql.conf (requires restart)
-- pg_stat_kcache must be loaded AFTER pg_stat_statements
shared_preload_libraries = 'pg_stat_statements,pg_stat_kcache'

-- 2. Restart PostgreSQL, then create both extensions
CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION pg_stat_kcache;

-- 3. Verify it's working
SELECT * FROM pg_stat_kcache LIMIT 5;

On Debian/Ubuntu systems with PGDG repositories, you can install the package directly: apt install postgresql-16-pg-stat-kcache. On RHEL/Rocky: yum install pg_stat_kcache_16. Replace 16 with your PostgreSQL major version.

After the restart, the extension begins tracking immediately using default settings. The pg_stat_kcache.track parameter (default top) controls whether nested statements (inside functions) are tracked separately. The pg_stat_kcache.track_planning parameter (default off, PostgreSQL 13+) enables separate tracking of planning-phase resource usage.

Key columns

The pg_stat_kcache_detail view is the primary diagnostic interface. It joins with pg_stat_statements by queryid and provides per-query OS-level metrics.

Column reference
-- Key columns in pg_stat_kcache_detail (per-query view)
-- Identity
queryid            -- matches pg_stat_statements.queryid
datname            -- database name
toplevel           -- true if top-level statement, false if nested

-- Disk I/O (execution phase)
exec_reads         -- bytes read from disk during execution
exec_reads_blks    -- 8K blocks read from disk during execution
exec_writes        -- bytes written to disk during execution
exec_writes_blks   -- 8K blocks written to disk during execution

-- CPU time (execution phase)
exec_user_time     -- CPU time spent in user mode (seconds)
exec_system_time   -- CPU time spent in kernel mode (seconds)

-- Context switches (execution phase)
exec_nvcsws        -- voluntary context switches (I/O waits)
exec_nivcsws       -- involuntary context switches (CPU preemption)

-- Page faults (execution phase)
exec_minflts       -- soft page faults (served from OS cache)
exec_majflts       -- hard page faults (required disk read)

-- Planning phase equivalents (PostgreSQL 13+, requires track_planning)
plan_reads, plan_writes, plan_user_time, plan_system_time, ...

The aggregated pg_stat_kcache view sums these metrics per database. For query-level analysis, always use pg_stat_kcache_detail.

Note: on platforms where getrusage(2) does not maintain certain fields, those columns will be zero or NULL. On Linux, the I/O and CPU fields are reliably populated. Page fault and context switch counters depend on kernel version.

Practical queries

Most I/O-intensive queries

Find queries generating the most actual physical disk reads — not shared buffer reads, not OS cache reads, but real disk I/O as reported by the kernel.

SQL
-- Find the most I/O-intensive queries (actual disk reads)
SELECT
  k.datname,
  left(s.query, 80) AS query,
  s.calls,
  pg_size_pretty(k.exec_reads) AS disk_read,
  k.exec_reads_blks AS blks_read,
  round(k.exec_user_time::numeric, 3) AS cpu_user_s,
  round(k.exec_system_time::numeric, 3) AS cpu_sys_s
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
  ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
  AND k.exec_reads > 0
ORDER BY k.exec_reads DESC
LIMIT 10;

OS cache hit ratio per query

This is the query I would run first. It answers the question that pg_stat_statements alone cannot: how much of your "buffer reads" actually hit the OS page cache vs. requiring physical disk access. A high OS cache hit percentage means the data fits in RAM even though it was not in shared buffers. You were worrying about the wrong thing.

SQL
-- Compare shared buffer reads vs actual disk reads per query
-- This reveals whether "buffer reads" came from OS cache or real disk
SELECT
  left(s.query, 60) AS query,
  s.calls,
  s.shared_blks_read AS buffer_reads,
  k.exec_reads_blks AS disk_reads,
  CASE
    WHEN s.shared_blks_read > 0
    THEN round(
      (1.0 - k.exec_reads_blks::numeric / s.shared_blks_read) * 100, 1
    )
    ELSE NULL
  END AS os_cache_hit_pct
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
  ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
  AND s.shared_blks_read > 100
ORDER BY s.shared_blks_read DESC
LIMIT 10;

CPU-intensive queries

Not every slow query is waiting on disk. Some are simply thinking too hard. This surfaces the ones burning CPU on complex joins, sorts, or aggregations — the kind that benefit from a rewrite rather than a larger buffer pool.

SQL
-- Find CPU-intensive queries
SELECT
  left(s.query, 80) AS query,
  s.calls,
  round(s.total_exec_time::numeric, 2) AS total_exec_ms,
  round(k.exec_user_time::numeric, 3) AS cpu_user_s,
  round(k.exec_system_time::numeric, 3) AS cpu_sys_s,
  round((k.exec_user_time + k.exec_system_time)::numeric, 3) AS cpu_total_s
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
  ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
  AND (k.exec_user_time + k.exec_system_time) > 0
ORDER BY (k.exec_user_time + k.exec_system_time) DESC
LIMIT 10;

Resetting statistics

Reset pg_stat_kcache counters, typically alongside pg_stat_statements.

SQL
-- Reset pg_stat_kcache statistics
SELECT pg_stat_kcache_reset();

-- You typically want to reset pg_stat_statements at the same time
SELECT pg_stat_statements_reset();

Cloud availability

ProviderStatus
Amazon RDS / AuroraNot available — requires loading a third-party shared library, which RDS does not support
Google Cloud SQLNot available — not in the supported extensions list
Azure Database for PostgreSQLNot available — not in the supported extensions list
SupabaseNot available — not in the supported extensions list
NeonNot available — not in the supported extensions list
Self-hosted / VMsAvailable — install from PGDG packages or build from source

pg_stat_kcache is a third-party extension that requires shared_preload_libraries access. Most managed PostgreSQL services do not support it because they restrict which shared libraries can be loaded. It is primarily available on self-hosted PostgreSQL, VMs, and container-based deployments where you control postgresql.conf. If you manage your own PostgreSQL, this is one of the privileges that comes with it — and I would not let it go unused.

How Gold Lapel relates

I should be straightforward about what Gold Lapel can and cannot see. Gold Lapel observes every query at the SQL protocol layer — timing, frequency, patterns. What it cannot observe is what happens beneath PostgreSQL, at the kernel level. That is pg_stat_kcache's domain, and I would not presume to duplicate it.

The two are natural companions. When Gold Lapel identifies a query pattern worth optimizing, pg_stat_kcache can confirm whether the cost is driven by physical I/O, CPU, or both — and that distinction determines whether the correct remedy is an index, a materialized view, or a rewrite. Gold Lapel proposes the optimization. pg_stat_kcache validates the result at the OS level. Each attends to its own floor of the household.

For self-hosted deployments where pg_stat_kcache is available, I would encourage installing both. Gold Lapel handles the optimization; pg_stat_kcache provides the kernel-level confirmation that the optimization actually reduced physical I/O, not merely rearranged which cache served it.

Frequently asked questions