pg_stat_kcache

Q: What is the difference between pg_stat_statements shared_blks_read and pg_stat_kcache exec_reads?

This is the central question, and the answer is worth understanding precisely. pg_stat_statements shared_blks_read counts blocks that were not in PostgreSQL shared buffers — but those blocks may still have been in the operating system page cache, served quickly and quietly. pg_stat_kcache exec_reads reports the bytes that actually required physical disk I/O, as reported by the kernel via getrusage(). The difference between the two reveals how much of your supposed "disk reads" never touched a disk at all.

Q: Does pg_stat_kcache add noticeable overhead?

Negligible. pg_stat_kcache calls getrusage() before and after each query — a lightweight system call that the kernel handles with minimal fuss. The cost is comparable to pg_stat_statements itself. On most workloads, you will not find it on any chart. The diagnostic value comfortably outweighs the cost.

Q: Why are reads and writes showing zero or NULL?

The getrusage() system call is platform-dependent, and not all platforms are equally forthcoming. On Linux, the reads and writes fields (ru_inblock and ru_outblock) are reliably populated by the kernel. On some other platforms, they may not be maintained — NULL values indicate your platform declines to report them. Zeros, on the other hand, are good news: they mean the queries were served entirely from the OS page cache with no physical disk I/O.

Q: Does pg_stat_kcache work without pg_stat_statements?

No, and this is not negotiable. pg_stat_kcache depends on pg_stat_statements for query identification — it hooks into the same query processing pipeline and uses the queryid to associate resource usage with specific query patterns. Both must be listed in shared_preload_libraries, with pg_stat_statements appearing first. They are a pair, and each relies on the other.

Q: What does track_planning do in pg_stat_kcache?

When pg_stat_kcache.track_planning is enabled (requires PostgreSQL 13+), the extension records resource usage during the planning phase separately from execution. This surfaces as plan_reads, plan_writes, plan_user_time, and plan_system_time columns. It is off by default, and rightly so — planning overhead tracking is rarely the bottleneck you are looking for. But when you suspect the planner itself is consuming resources on complex queries with many joins or partitions, this is how you confirm it.

Track OS-level cache and disk I/O statistics per query — the layer of detail that shared buffer stats alone cannot provide.

The Waiter of Gold Lapel · Updated Mar 30, 2026 Published Mar 21, 2026 · 7 min read

Allow me to draw your attention to an uncomfortable truth: when pg_stat_statements reports a "shared buffer read," it cannot tell you whether that read was served from the OS page cache in microseconds or dragged from a spinning platter in milliseconds. It simply does not know. pg_stat_kcache knows.

pg_stat_kcache is a third-party PostgreSQL extension that uses the kernel's getrusage() system call to capture actual disk reads, disk writes, and CPU time for every query. It joins with pg_stat_statements via queryid, filling in the gap that shared buffer statistics leave: whether a "buffer read" came from the fast OS page cache or from a slow physical disk.

What pg_stat_kcache does

PostgreSQL's built-in statistics and pg_stat_statements tell you how many shared buffer hits and reads each query generates. A "shared buffer read" means the block was not in PostgreSQL's shared buffer cache — but it may still have been in the operating system's page cache, served in microseconds rather than milliseconds. From PostgreSQL's perspective, both cases look the same. From a performance perspective, they could not be more different.

pg_stat_kcache resolves this ambiguity. By calling getrusage(2) before and after each query execution, it captures the kernel's accounting of actual physical I/O and CPU consumption. The result is a set of per-query counters — bytes read from disk, bytes written, user-mode CPU time, kernel-mode CPU time, page faults, and context switches — that reveal what is really happening at the OS level.

The extension stores its data in shared memory and exposes it through two views: pg_stat_kcache (aggregated per database) and pg_stat_kcache_detail (per query, joining with pg_stat_statements). The detail view is where the diagnostic value lives — it lets you correlate a specific query pattern with its true I/O and CPU footprint.

When to use pg_stat_kcache

pg_stat_kcache is most valuable when you need to go deeper than shared buffer statistics. Specific scenarios:

Identifying truly I/O-bound queries — a query with high shared_blks_read might be entirely served from OS cache. pg_stat_kcache tells you the truth by reporting actual bytes read from disk.
Separating OS cache hits from real disk reads — compare pg_stat_statements.shared_blks_read with pg_stat_kcache_detail.exec_reads_blks to calculate the OS page cache hit ratio per query.
Finding CPU-bound queries — some queries burn CPU without touching disk. exec_user_time and exec_system_time surface these directly.
Diagnosing context switch storms — high exec_nivcsws (involuntary context switches) can indicate CPU contention; high exec_nvcsws (voluntary) can indicate I/O waits.
Validating shared_buffers sizing — if most "buffer reads" are hitting the OS cache (low exec_reads relative to shared_blks_read), your working set may be larger than shared_buffers but still fits in RAM.

Installation and setup

pg_stat_kcache is a third-party extension maintained by the PoWA team. It must be loaded via shared_preload_libraries alongside pg_stat_statements, which is a hard dependency. The order matters — pg_stat_statements must appear before pg_stat_kcache.

postgresql.conf + SQL

-- 1. Add to postgresql.conf (requires restart)
-- pg_stat_kcache must be loaded AFTER pg_stat_statements
shared_preload_libraries = 'pg_stat_statements,pg_stat_kcache'

-- 2. Restart PostgreSQL, then create both extensions
CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION pg_stat_kcache;

-- 3. Verify it's working
SELECT * FROM pg_stat_kcache LIMIT 5;

On Debian/Ubuntu systems with PGDG repositories, you can install the package directly: apt install postgresql-16-pg-stat-kcache. On RHEL/Rocky: yum install pg_stat_kcache_16. Replace 16 with your PostgreSQL major version.

After the restart, the extension begins tracking immediately using default settings. The pg_stat_kcache.track parameter (default top) controls whether nested statements (inside functions) are tracked separately. The pg_stat_kcache.track_planning parameter (default off, PostgreSQL 13+) enables separate tracking of planning-phase resource usage.

Key columns

The pg_stat_kcache_detail view is the primary diagnostic interface. It joins with pg_stat_statements by queryid and provides per-query OS-level metrics.

Column reference

-- Key columns in pg_stat_kcache_detail (per-query view)
-- Identity
queryid            -- matches pg_stat_statements.queryid
datname            -- database name
toplevel           -- true if top-level statement, false if nested

-- Disk I/O (execution phase)
exec_reads         -- bytes read from disk during execution
exec_reads_blks    -- 8K blocks read from disk during execution
exec_writes        -- bytes written to disk during execution
exec_writes_blks   -- 8K blocks written to disk during execution

-- CPU time (execution phase)
exec_user_time     -- CPU time spent in user mode (seconds)
exec_system_time   -- CPU time spent in kernel mode (seconds)

-- Context switches (execution phase)
exec_nvcsws        -- voluntary context switches (I/O waits)
exec_nivcsws       -- involuntary context switches (CPU preemption)

-- Page faults (execution phase)
exec_minflts       -- soft page faults (served from OS cache)
exec_majflts       -- hard page faults (required disk read)

-- Planning phase equivalents (PostgreSQL 13+, requires track_planning)
plan_reads, plan_writes, plan_user_time, plan_system_time, ...

The aggregated pg_stat_kcache view sums these metrics per database. For query-level analysis, always use pg_stat_kcache_detail. The full column reference is documented in the project's README.

Note: on platforms where getrusage(2) does not maintain certain fields, those columns will be zero or NULL. On Linux, the I/O and CPU fields are reliably populated. Page fault and context switch counters depend on kernel version.

Practical queries

Most I/O-intensive queries

Find queries generating the most actual physical disk reads — not shared buffer reads, not OS cache reads, but real disk I/O as reported by the kernel.

SQL

-- Find the most I/O-intensive queries (actual disk reads)
SELECT
  k.datname,
  left(s.query, 80) AS query,
  s.calls,
  pg_size_pretty(k.exec_reads) AS disk_read,
  k.exec_reads_blks AS blks_read,
  round(k.exec_user_time::numeric, 3) AS cpu_user_s,
  round(k.exec_system_time::numeric, 3) AS cpu_sys_s
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
  ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
  AND k.exec_reads > 0
ORDER BY k.exec_reads DESC
LIMIT 10;

OS cache hit ratio per query

This is the query I would run first. It answers the question that pg_stat_statements alone cannot: how much of your "buffer reads" actually hit the OS page cache vs. requiring physical disk access. A high OS cache hit percentage means the data fits in RAM even though it was not in shared buffers — a distinction that changes the diagnosis entirely.

SQL

-- Compare shared buffer reads vs actual disk reads per query
-- This reveals whether "buffer reads" came from OS cache or real disk
SELECT
  left(s.query, 60) AS query,
  s.calls,
  s.shared_blks_read AS buffer_reads,
  k.exec_reads_blks AS disk_reads,
  CASE
    WHEN s.shared_blks_read > 0
    THEN round(
      (1.0 - k.exec_reads_blks::numeric / s.shared_blks_read) * 100, 1
    )
    ELSE NULL
  END AS os_cache_hit_pct
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
  ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
  AND s.shared_blks_read > 100
ORDER BY s.shared_blks_read DESC
LIMIT 10;

CPU-intensive queries

Not every slow query is waiting on disk. Some are simply thinking too hard. This surfaces the ones burning CPU on complex joins, sorts, or aggregations — the kind that benefit from a rewrite rather than a larger buffer pool.

SQL

-- Find CPU-intensive queries
SELECT
  left(s.query, 80) AS query,
  s.calls,
  round(s.total_exec_time::numeric, 2) AS total_exec_ms,
  round(k.exec_user_time::numeric, 3) AS cpu_user_s,
  round(k.exec_system_time::numeric, 3) AS cpu_sys_s,
  round((k.exec_user_time + k.exec_system_time)::numeric, 3) AS cpu_total_s
FROM pg_stat_kcache_detail k
JOIN pg_stat_statements s
  ON k.queryid = s.queryid AND k.datname = current_database()
WHERE k.toplevel = true
  AND (k.exec_user_time + k.exec_system_time) > 0
ORDER BY (k.exec_user_time + k.exec_system_time) DESC
LIMIT 10;

Resetting statistics

Reset pg_stat_kcache counters, typically alongside pg_stat_statements.

SQL

-- Reset pg_stat_kcache statistics
SELECT pg_stat_kcache_reset();

-- You typically want to reset pg_stat_statements at the same time
SELECT pg_stat_statements_reset();

Cloud availability

Provider	Status
Amazon RDS / Aurora	Not available — requires loading a third-party shared library, which RDS does not support
Google Cloud SQL	Not available — not in the supported extensions list
Azure Database for PostgreSQL	Not available — not in the supported extensions list
Supabase	Not available — not in the supported extensions list
Neon	Not available — not in the supported extensions list
Self-hosted / VMs	Available — install from PGDG packages or build from source

pg_stat_kcache is a third-party extension that requires shared_preload_libraries access. Most managed PostgreSQL services do not support it because they restrict which shared libraries can be loaded. It is primarily available on self-hosted PostgreSQL, VMs, and container-based deployments where you control postgresql.conf. If you manage your own PostgreSQL, this is one of the privileges that comes with it — and I would not let it go unused.

How Gold Lapel relates

A word about what Gold Lapel can and cannot see. Gold Lapel observes every query at the SQL protocol layer — timing, frequency, patterns. What it cannot observe is what happens beneath PostgreSQL, at the kernel level. That is pg_stat_kcache's domain, and I would not presume to duplicate it.

The two are natural companions. When Gold Lapel identifies a query pattern worth optimizing, pg_stat_kcache can confirm whether the cost is driven by physical I/O, CPU, or both — and that distinction determines whether the correct remedy is an index, a materialized view, or a rewrite. Gold Lapel proposes the optimization. pg_stat_kcache validates the result at the OS level. Each attends to its own floor of the household.

For self-hosted deployments where pg_stat_kcache is available, I would encourage installing both. Gold Lapel handles the optimization; pg_stat_kcache provides the kernel-level confirmation that the optimization actually reduced physical I/O, not merely rearranged which cache served it.

pg_stat_kcache

What pg_stat_kcache does

When to use pg_stat_kcache

Installation and setup

Key columns

Practical queries

Most I/O-intensive queries

OS cache hit ratio per query

CPU-intensive queries

Resetting statistics

Cloud availability

How Gold Lapel relates

Frequently asked questions

What is the difference between pg_stat_statements shared_blks_read and pg_stat_kcache exec_reads?

Does pg_stat_kcache add noticeable overhead?

Why are reads and writes showing zero or NULL?

Does pg_stat_kcache work without pg_stat_statements?

What does track_planning do in pg_stat_kcache?

Related content