PostgreSQL Buffer Cache Analysis with pg_buffercache
If I may, a proper inventory of what PostgreSQL keeps in memory — and what it reveals about the state of your household.
What is in the PostgreSQL buffer cache?
I should like to conduct a brief inventory. Every well-run household knows what is in its larder at any given moment — and PostgreSQL is no different. It does not read data directly from disk for every query. It maintains a region of shared memory called the shared buffer cache, controlled by the shared_buffers parameter, where it keeps copies of disk pages in RAM. Every table read, index lookup, and write operation passes through this cache. When a query needs a page that is already cached, it reads from memory. When the page is not cached, PostgreSQL reads it from disk (or the OS page cache) into a buffer slot before processing it.
The cache is organized as a fixed-size array of 8KB pages — the same page size PostgreSQL uses on disk. Each slot in the array can hold one page from any relation in any database on the server. All backends (connections) share this same pool. A slot holding a page from the orders table is indistinguishable in structure from one holding a page from an index or a TOAST table. The cache does not know or care about tables — it knows about pages.
This design has a consequence that matters for analysis: you cannot tell from standard statistics which specific pages are cached at any given moment. You know the overall hit ratio from pg_statio_user_tables, but that is an aggregate over time. It does not tell you whether your most critical table is fully resident in memory right now, or whether a bulk import just evicted half of it.
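For reference, that aggregate hit ratio is computed from the cumulative block counters — a sketch, since your table names will differ:

```sql
-- Aggregate cache hit ratio per table, from pg_statio_user_tables.
-- Cumulative since the last stats reset: a trend, not a snapshot.
SELECT
    relname,
    heap_blks_hit,
    heap_blks_read,
    round(
        100.0 * heap_blks_hit /
        nullif(heap_blks_hit + heap_blks_read, 0), 1
    ) AS hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
```

A table can show a flattering lifetime hit ratio while being almost entirely evicted right now. That is precisely the blind spot.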
That is the gap pg_buffercache fills. It gives you a row-by-row view of every slot in the buffer cache — which relation owns it, which page number it represents, whether it has been modified, and how recently it was accessed. Instead of inferring cache behaviour from hit ratios, you can open the pantry door and look. I find the difference between guessing and knowing to be rather material.
How do you install pg_buffercache?
Allow me to put you at ease. pg_buffercache is a contrib module that ships with every PostgreSQL installation. It does not require shared_preload_libraries, so there is no server restart. One statement and the view is available.
-- pg_buffercache ships with PostgreSQL — no external packages needed
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- Confirm it is working
SELECT count(*) FROM pg_buffercache;

The extension requires the pg_monitor role (or superuser privileges) to read the view. On managed cloud platforms — Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL, Supabase, and Neon — it is available and can be installed with CREATE EXTENSION. Azure requires adding it to the azure.extensions allowlist first.
What are the pg_buffercache view columns?
The pg_buffercache view exposes one row per buffer slot — think of it as a manifest for every shelf in the larder. Unused slots appear with null values in all columns except bufferid. These are the columns you will work with.
-- The pg_buffercache view: one row per buffer slot in shared_buffers
bufferid -- integer: buffer slot number (1 .. shared_buffers)
relfilenode -- oid: filenode of the cached relation (null if unused)
reltablespace -- oid: tablespace OID of the relation
reldatabase -- oid: database OID (0 for shared catalogs)
relforknumber -- smallint: fork number within the relation
relblocknumber -- bigint: page number within the relation
isdirty -- boolean: page modified in memory, not yet flushed to disk
usagecount -- smallint: clock-sweep access count (0-5)
pinning_backends -- integer: backends currently pinning this buffer

Three columns deserve special attention:
- usagecount reflects PostgreSQL's clock-sweep eviction algorithm. Each time a backend accesses a buffer, its usage count increments, capped at 5. The clock-sweep hand rotates through all buffers, decrementing counts as it goes, and evicts the first buffer it finds at 0. A page at usagecount 5 has been accessed repeatedly and recently — it is well-established. A page at 0 is next in line for eviction.
- isdirty indicates the page has been modified in memory but not yet written back to disk. Dirty pages are flushed by the background writer, the checkpointer, or when a backend needs the slot and no clean buffer is available. A high concentration of dirty buffers means the write rate is outpacing the flushing rate.
- pinning_backends counts how many backends are currently using (pinning) the buffer. A pinned buffer cannot be evicted. Under normal operations, this is usually 0 or 1. Persistently high pin counts on specific buffers may indicate contention.
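To see these columns in the wild, one can inspect the individual buffers belonging to a single relation — a sketch, with the table name purely illustrative:

```sql
-- Per-buffer detail for one relation: page number, dirty flag,
-- usage count, and pin count ('orders' is an illustrative name)
SELECT b.relblocknumber, b.isdirty, b.usagecount, b.pinning_backends
FROM pg_buffercache b
WHERE b.relfilenode = pg_relation_filenode('orders')
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
ORDER BY b.relblocknumber
LIMIT 10;
```

A run of pages at usagecount 5 with isdirty true tells a very different story from the same pages clean at usagecount 1.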
How much of each table is cached?
This is the question that brings most guests to my door. Which tables and indexes are actually resident in memory, and what fraction of each is cached? One does not manage a household by assuming the larder is well-stocked. One checks. This query joins the buffer cache to pg_class to translate filenodes into relation names.
SELECT
c.relname,
count(*) AS buffers,
pg_size_pretty(count(*) * 8192::bigint) AS cached_size,
pg_size_pretty(pg_relation_size(c.oid)) AS total_size,
round(
100.0 * count(*) * 8192 / pg_relation_size(c.oid), 1
) AS pct_cached
FROM pg_buffercache b
JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
WHERE pg_relation_size(c.oid) > 0
GROUP BY c.oid, c.relname
ORDER BY buffers DESC
LIMIT 20;

Sample output on a system with 660MB of shared_buffers:
relname | buffers | cached_size | total_size | pct_cached
------------------+---------+-------------+------------+-----------
orders | 48210 | 377 MB | 1124 MB | 33.5
idx_orders_cust | 12847 | 100 MB | 284 MB | 35.3
line_items | 9403 | 73 MB | 892 MB | 8.2
customers | 4812 | 38 MB | 38 MB | 100.0
products | 1203 | 9416 kB | 42 MB | 22.4

This tells you several things at a glance. The customers table is 100% cached — every page fits in memory, and the table is accessed often enough that nothing has been evicted. The orders table is only 33.5% cached, meaning two-thirds of its pages must be read from disk (or the OS page cache) when accessed. The line_items table has the worst coverage at 8.2% — queries against it are likely I/O-bound.
The idx_orders_cust index consuming 100MB of cache is worth noting. Index pages are cached alongside table pages, competing for the same buffer slots. An index that dominates the cache may be pushing table pages out. Whether that is acceptable depends on whether the index is earning its keep — a member of staff who consumes a generous salary ought to be producing proportionate results.
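Whether an index is earning its keep can be checked against the statistics collector — a sketch, reusing the illustrative index name from the sample output above:

```sql
-- Is a cache-heavy index actually being used?
-- idx_scan counts index scans since the last stats reset.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_cust';
```

An index occupying 100MB of cache with an idx_scan of zero is a salaried employee who never reports for duty.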
Which relations dominate the cache?
A broader view — ranking all cached relations by page count, distinguishing between tables, indexes, TOAST tables, and materialized views.
SELECT
c.relname,
CASE c.relkind
WHEN 'r' THEN 'table'
WHEN 'i' THEN 'index'
WHEN 't' THEN 'toast'
WHEN 'm' THEN 'matview'
ELSE c.relkind::text
END AS kind,
count(*) AS cached_pages,
pg_size_pretty(count(*) * 8192::bigint) AS cached_size,
pg_size_pretty(pg_relation_size(c.oid)) AS total_size
FROM pg_buffercache b
JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
GROUP BY c.oid, c.relname, c.relkind
ORDER BY cached_pages DESC
LIMIT 20;

This query is particularly revealing after a bulk data load, a migration, or a change in access patterns. You may discover that an index you rarely use occupies 15% of the cache, or that a TOAST table from a text-heavy column is consuming significant memory. The cache does not play favourites — it stocks whatever was requested most recently, regardless of whether that request was a one-time bulk scan or a critical production query. It is, in this regard, a pantry without a housekeeper. We shall be the housekeeper.
Where are the dirty buffers?
Dirty buffers are pages that have been modified in memory but not yet flushed to disk. They represent pending I/O — work the background writer or checkpointer must attend to. If you will permit the analogy: these are dishes that have been used but not yet washed. A few in the basin is normal. A growing pile warrants concern.
SELECT
c.relname,
count(*) FILTER (WHERE b.isdirty) AS dirty_buffers,
count(*) AS total_buffers,
round(
100.0 * count(*) FILTER (WHERE b.isdirty) / count(*), 1
) AS dirty_pct
FROM pg_buffercache b
JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
GROUP BY c.relname
HAVING count(*) FILTER (WHERE b.isdirty) > 0
ORDER BY dirty_buffers DESC
LIMIT 15; A table with a high dirty_pct is receiving heavy write traffic. If a table shows 40-50% or more dirty buffers, the background writer or checkpointer may not be keeping pace. This can lead to problems at checkpoint time — when the checkpointer must flush all dirty buffers, it can cause a burst of I/O that affects read performance. Tuning bgwriter_lru_maxpages, bgwriter_delay, or checkpoint_completion_target can spread the write load more evenly.
Conversely, if very few buffers are dirty, writes are being flushed promptly. This is usually the desired state.
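To see who is actually doing the washing-up, the cumulative statistics views break flushes down by actor. A sketch — these column names apply through PostgreSQL 16; version 17 moves the checkpoint counters to pg_stat_checkpointer and backend writes to pg_stat_io:

```sql
-- Who flushed the dirty buffers? (PostgreSQL 16 and earlier)
SELECT
    buffers_checkpoint,  -- written by the checkpointer
    buffers_clean,       -- written by the background writer
    buffers_backend      -- written by ordinary backends needing a free slot
FROM pg_stat_bgwriter;
```

A buffers_backend figure rivalling the other two means backends are being made to wash their own dishes before they can cook — a sign the background writer needs more aggressive settings.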
What does the usage count distribution tell you?
If the per-table view is a manifest of what is in the larder, the usage count distribution is a record of what gets reached for regularly and what has been gathering dust. It tells you whether pages are staying resident (high counts) or being constantly cycled through (low counts). I consider this one of the most informative views into cache health.
SELECT
usagecount,
count(*) AS buffers,
round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM pg_buffercache
WHERE reldatabase IS NOT NULL
GROUP BY usagecount
ORDER BY usagecount;

A healthy distribution on a production system:
usagecount | buffers | pct
-----------+---------+------
0 | 1204 | 1.5
1 | 12847 | 15.7
2 | 9403 | 11.5
3 | 18241 | 22.3
4 | 14102 | 17.2
5 | 26003 | 31.8

This distribution is healthy. Nearly 50% of buffers are at usagecount 4-5, meaning they are accessed frequently and will survive multiple clock-sweep passes. Only 1.5% are at 0 (immediate eviction candidates). The working set fits comfortably.
A distribution under pressure looks rather different, and not in a way I care for. When 30-40% or more of buffers are at usagecount 0-1, pages are being loaded and evicted before they can be reused. The working set exceeds available cache, and queries are paying the cost in disk reads. The cache is churning — provisions arrive and are discarded before anyone can make use of them. This is not economy. It is waste.
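One quick way to put a number on "the working set exceeds available cache" is to compare the configured cache size with what is currently resident:

```sql
-- Configured cache size vs. buffers currently holding pages
SELECT
    current_setting('shared_buffers') AS shared_buffers,
    pg_size_pretty(
        count(*) FILTER (WHERE reldatabase IS NOT NULL) * 8192::bigint
    ) AS resident
FROM pg_buffercache;
```

If resident is pinned at the configured ceiling while the low usage counts dominate, the pantry is full and still not large enough.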
How do you get faster stats in PostgreSQL 16+?
Scanning the full pg_buffercache view on a system with 32GB of shared_buffers means examining over 4 million buffer headers. PostgreSQL 16 added two functions that provide aggregate statistics without that overhead.
pg_buffercache_summary()
Returns a single-row summary of the entire cache.
-- PostgreSQL 16+: aggregate stats without scanning every buffer
SELECT * FROM pg_buffercache_summary();
Sample output:

buffers_used | buffers_unused | buffers_dirty | buffers_pinned | usagecount_avg
-------------+----------------+---------------+----------------+---------------
81800 | 2584 | 4201 | 3 | 3.21

From one row you can see: total used and unused buffers, dirty buffer count, pinned buffers, and the average usage count across all slots. The usagecount_avg of 3.21 indicates a reasonably healthy cache. An average below 2.0 suggests eviction pressure. An average above 4.0 suggests the working set fits comfortably with room to spare.
pg_buffercache_usage_counts()
Returns the usage count distribution in one call, with dirty and pinned breakdowns per count level.
-- PostgreSQL 16+: usage count distribution in one call
SELECT * FROM pg_buffercache_usage_counts(); usage_count | buffers | dirty | pinned
-------------+---------+-------+-------
0 | 1204 | 12 | 0
1 | 12847 | 841 | 0
2 | 9403 | 602 | 0
3 | 18241 | 1103 | 1
4 | 14102 | 812 | 0
5 | 26003 | 831 | 2

These functions are cheap enough to call every few minutes from a monitoring system. Unlike the full view, they do not acquire locks on individual buffer headers, making them suitable for continuous observation without measurable impact on normal operations. A discreet glance at the shelves, rather than a full inventory that requires closing the kitchen.
Is your shared_buffers too small, too large, or just right?
I confess this is a subject on which I hold firm views. Sizing shared_buffers by rule of thumb — "set it to 25% of RAM" — is the infrastructural equivalent of buying groceries without checking what is already in the pantry. It may work out. It often does not. pg_buffercache gives you the data to answer this question with evidence, and I find guesswork on matters of memory allocation to be entirely avoidable.
Check for unused space
-- How much of shared_buffers is actually in use?
SELECT
count(*) FILTER (WHERE reldatabase IS NOT NULL) AS used,
count(*) FILTER (WHERE reldatabase IS NULL) AS unused,
count(*) AS total,
round(
100.0 * count(*) FILTER (WHERE reldatabase IS NULL) / count(*), 1
) AS pct_unused
FROM pg_buffercache;

used | unused | total | pct_unused
--------+--------+-------+-----------
81800 | 2584 | 84384 | 3.1

3.1% unused is fine — the cache is well-utilized without being completely saturated. If you see 15-20% or more unused during peak hours, shared_buffers is likely oversized. That memory would serve better as OS page cache or work_mem.
Check for eviction pressure
-- Is the cache under eviction pressure?
-- A healthy cache has most buffers at usagecount 3-5
-- Eviction pressure: many buffers at 0-1
SELECT
CASE
WHEN usagecount <= 1 THEN 'low (eviction candidates)'
WHEN usagecount <= 3 THEN 'moderate'
ELSE 'high (well-established)'
END AS category,
count(*) AS buffers,
round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM pg_buffercache
WHERE reldatabase IS NOT NULL
GROUP BY
CASE
WHEN usagecount <= 1 THEN 'low (eviction candidates)'
WHEN usagecount <= 3 THEN 'moderate'
ELSE 'high (well-established)'
END
ORDER BY min(usagecount);

If the "low (eviction candidates)" category holds more than 25-30% of buffers during normal operations, the cache is under pressure. Pages are being loaded and evicted before they accumulate enough accesses to survive the clock sweep. This is the signal that shared_buffers may benefit from an increase.
Three scenarios summarized:
| Indicator | Diagnosis | Action |
|---|---|---|
| 15%+ unused buffers at peak | Cache oversized | Reduce shared_buffers or accept the overhead |
| 30%+ at usagecount 0-1 | Eviction pressure | Increase shared_buffers or reduce working set |
| 95%+ used, avg usagecount 3+ | Well-sized | No change needed |
I should note that PostgreSQL also benefits from the OS page cache — a secondary larder, if you will. Pages evicted from shared_buffers may still be in the OS cache, making the "miss" faster than a true disk read. The effective cache is larger than shared_buffers alone — which is why the parameter effective_cache_size exists, to tell the planner about total expected cache (typically set to 50-75% of system RAM). The household has more storage than the pantry alone.
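The pantry and the household total can be inspected side by side — a small sketch:

```sql
-- shared_buffers: PostgreSQL's own cache
-- effective_cache_size: the planner's estimate of total caching (incl. OS)
SELECT name, current_setting(name) AS value
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size');
```

effective_cache_size allocates nothing; it merely informs the planner how likely repeated page fetches are to be cheap, which influences its preference for index scans.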
How do you identify cache hogs?
A cache hog is a relation that occupies a disproportionate share of the buffer cache relative to its importance to the workload. I have seen this pattern enough times to find it personally vexing. A batch job runs a sequential scan over a large table, loading millions of pages into the cache, which evicts the pages of tables that active queries depend on. The batch job finishes and moves on — having consumed every shelf in the pantry to prepare a single meal that no one ordered. The production workload pays the price for minutes afterward as it re-reads evicted pages from disk.
-- Tables consuming the most cache relative to their importance
-- Compare cached percentage against actual query frequency
SELECT
c.relname,
count(*) AS cached_pages,
pg_size_pretty(count(*) * 8192::bigint) AS cached_size,
round(
100.0 * count(*) /
(SELECT count(*) FROM pg_buffercache
WHERE reldatabase IS NOT NULL), 1
) AS pct_of_cache,
s.seq_scan + s.idx_scan AS total_scans
FROM pg_buffercache b
JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
JOIN pg_stat_user_tables s ON s.relid = c.oid
GROUP BY c.oid, c.relname, s.seq_scan, s.idx_scan
ORDER BY cached_pages DESC
LIMIT 15;

What to look for:
- A table consuming 20%+ of the cache but with low total_scans — it was loaded by a bulk operation, not by the normal workload
- An index consuming significant cache that does not correspond to frequent index scans — it may be unused or redundant
- A table with high cache occupancy but accessed only via sequential scans (check pg_stat_user_tables.seq_scan) — this is the classic full-table-scan cache pollution
Remediation depends on the cause, and the cause is nearly always discoverable. If a batch job is polluting the cache, running it during off-peak hours helps — the equivalent of scheduling deliveries for when the kitchen is closed. If a large table is accessed via sequential scans that could be index scans, adding the appropriate index both speeds up the query and reduces cache pressure — a double benefit that I find deeply satisfying. If the table genuinely needs to be scanned in full, take some comfort: PostgreSQL limits the damage on its own, routing sequential scans of tables larger than about a quarter of shared_buffers through a small ring buffer rather than the general cache, so a single large scan usually cannot evict everything.
How does pg_buffercache work with pg_prewarm?
pg_buffercache and pg_prewarm are natural companions, and I endorse the pairing without reservation. One takes inventory of the larder; the other restocks it. The practical workflow around server restarts, planned maintenance, and failovers combines both — because restarting a database and hoping the cache sorts itself out is not a plan. It is an abdication.
Before a restart: inspect what matters
Capture the current cache contents, filtered to pages with a usagecount of 3 or higher — the established provisions that have earned their place on the shelf, not the transient visitors passing through.
-- Before a planned restart: record what is cached
-- Save this output to decide what to prewarm afterward
SELECT c.relname, count(*) AS pages
FROM pg_buffercache b
JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
WHERE b.usagecount >= 3
GROUP BY c.relname
ORDER BY pages DESC
LIMIT 20;

After a restart: restore the cache
Armed with the list of important relations, prewarm them explicitly. Or, better, let autoprewarm handle it automatically.
-- After restart: prewarm the critical relations
SELECT pg_prewarm('orders');
SELECT pg_prewarm('idx_orders_cust');
SELECT pg_prewarm('customers');
-- Or enable autoprewarm (recommended):
-- In postgresql.conf:
-- shared_preload_libraries = 'pg_prewarm'
-- pg_prewarm.autoprewarm = on
-- pg_prewarm.autoprewarm_interval = 300

With autoprewarm enabled, PostgreSQL periodically saves the list of cached pages to a file (autoprewarm.blocks) and restores them after a restart using background workers. This makes the manual inspect-and-prewarm cycle unnecessary for routine restarts. You still want pg_buffercache for ad-hoc analysis and verification — to confirm that after a restart, the cache has been restored to the expected state.
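For very large relations, pg_prewarm also accepts a mode, a fork, and a block range, so only the portion that deserves shelf space is loaded. A sketch — the relation name and block range are illustrative:

```sql
-- Warm only the first 10,000 pages of a large table.
-- mode 'buffer' loads into shared_buffers;
-- 'prefetch' merely asks the OS to read ahead into its page cache.
SELECT pg_prewarm('orders', 'buffer', 'main', 0, 9999);
```

Prewarming a 1TB table in full would simply recreate the eviction problem one was trying to avoid; a block range keeps the restocking proportionate.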
How do you correlate cache misses with slow queries?
Knowing what is in the larder is only half the picture. You also need to know who keeps coming to the kitchen and finding the shelves bare. pg_buffercache tells you what is in the cache. pg_stat_statements tells you which queries are missing the cache. Combining them gives you the complete account: the queries causing the most disk I/O, and the tables they are reading.
-- Correlate buffer misses with slow queries
-- Requires pg_stat_statements to be installed
SELECT
substr(query, 1, 80) AS query_preview,
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
shared_blks_hit,
shared_blks_read,
round(
100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0), 1
) AS hit_ratio
FROM pg_stat_statements
WHERE shared_blks_read > 1000
ORDER BY shared_blks_read DESC
LIMIT 15;

Sample output:
query_preview | calls | avg_ms | shared_blks_hit | shared_blks_read | hit_ratio
----------------------------------------------------+-------+--------+-----------------+------------------+----------
SELECT l.*, p.name FROM line_items l JOIN products | 4812 | 12.40 | 184201 | 892410 | 17.1
SELECT * FROM orders WHERE created_at > $1 AND sta | 1247 | 48.21 | 42891 | 289312 | 12.9
SELECT count(*) FROM events WHERE event_type = $1  |   892 | 142.80 |            8412 |           482103 |       1.7

The hit_ratio column is rather forthcoming. The events count query has a 1.7% hit ratio — nearly every page it reads comes from disk. The line_items join is at 17.1%. These queries are not merely slow; they are slow because they arrive at the pantry to find their provisions missing. The distinction matters for diagnosis.
The diagnosis then splits into two paths:
- The working set is too large for the cache — the tables these queries touch are bigger than shared_buffers can hold. Increasing shared_buffers or adding indexes to reduce the number of pages scanned will help.
- The queries are scanning more data than necessary — a sequential scan on a 10GB table to find 200 rows reads far more pages than an index scan would. Adding the right index reduces both execution time and cache pressure simultaneously.
For each query with poor hit ratios, run EXPLAIN (ANALYZE, BUFFERS) to see exactly where the reads are happening in the query plan. The Buffers: shared hit=... read=... line in each plan node tells you which operation is generating the cache misses.
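A sketch of the invocation, using the illustrative events table from the sample output — the literal parameter value stands in for the real one:

```sql
-- BUFFERS adds shared hit/read counts to every plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM events WHERE event_type = 'login';

-- Each plan node then reports a line of the form:
--   Buffers: shared hit=... read=...
-- A node where read dwarfs hit is where the cache misses originate.
```

Reading the plan bottom-up, the first node with a large read count is usually the scan to fix — either with an index or by revisiting cache sizing.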
How does Gold Lapel reduce buffer cache pressure?
pg_buffercache shows you the physical layer — what is in memory right now, regardless of who put it there. It does not distinguish between pages loaded by your application's critical path and pages loaded by a one-time analytics query or autovacuum. It is the inventory, not the housekeeper.
Gold Lapel operates at the query layer. As a proxy between your application and PostgreSQL, it observes the actual query patterns — which queries run most frequently, which parameters produce the most data, and which access patterns are responsible for cache pressure. When it identifies a query that would benefit from an index, the index reduces the number of pages scanned, which directly reduces cache consumption. A query that previously required a sequential scan over 100,000 pages, evicting cached data along the way, becomes an index scan touching 50 pages. The cache pressure disappears because the root cause — excessive page reads — is eliminated.
The combination is straightforward: use pg_buffercache to see what is happening in the larder, and let Gold Lapel address the query patterns that caused the problem. The cache does not need to be larger if the queries need fewer pages.
A practical checklist for buffer cache analysis
If you will permit me, a methodical procedure for when you suspect memory-related performance issues. I have found that a structured approach to cache investigation prevents the most common oversight: treating symptoms while the cause persists undisturbed.
- Install the extension if not already present. CREATE EXTENSION pg_buffercache; — no restart needed.
- Check overall utilization. What percentage of shared_buffers is in use? If unused buffers exceed 15% at peak, the cache is oversized. If 100% is in use, proceed to step 3.
- Examine the usage count distribution. Are most buffers at usagecount 3-5 (healthy) or 0-1 (eviction pressure)? In PostgreSQL 16+, pg_buffercache_summary() gives you the average in one call.
- Identify the top cached relations. Which tables and indexes dominate the cache? Are they the ones your workload depends on?
- Check per-table cache coverage. Are critical tables fully cached? Are unimportant tables consuming disproportionate space?
- Look at dirty buffer distribution. Is one table generating most of the dirty buffers? Is the background writer keeping pace?
- Cross-reference with pg_stat_statements. Which queries have the worst cache hit ratios? Do they correspond to the tables with low cache coverage?
- Run EXPLAIN (ANALYZE, BUFFERS) on the queries identified in step 7. Find the specific plan nodes generating cache misses.
- Act on the findings. Add indexes to reduce page reads. Increase shared_buffers if the working set genuinely exceeds cache. Schedule bulk operations during off-peak hours to avoid cache pollution.
The pantry, once properly inventoried, will tell you exactly what it needs. One simply has to ask.