
Buffer cache

The pantry of the PostgreSQL household. What's stocked here is served instantly; what isn't requires a trip to the cellar.

Concept · March 21, 2026 · 8 min read

The buffer cache is a region of shared memory, controlled by the shared_buffers setting, where PostgreSQL keeps recently accessed data pages. Think of it as the household pantry — stocked with everything the kitchen needs for tonight's service. Every table and index in PostgreSQL is stored as a series of 8KB pages on disk. When a query needs a page, PostgreSQL checks the pantry first. If the page is there (a cache hit), it reads directly from memory in microseconds. If it is not (a cache miss), someone must descend to the cellar — disk — which takes anywhere from 1ms to 10ms depending on storage. A well-stocked pantry means the vast majority of reads never require the trip.

What the buffer cache is

PostgreSQL allocates the buffer cache as a fixed block of shared memory at startup. Its size is set by the shared_buffers parameter in postgresql.conf and cannot be changed without a restart. The default is typically 128MB — which, if I may be direct, is rather like stocking a pantry with a single shelf. Far too little for any production household.

The cache is organized as an array of 8KB slots, each holding one page of table or index data. When PostgreSQL needs a page, it computes a hash of the page's identity (relation file, fork, block number) and checks whether that page is already resident. If it is, the access is serviced entirely from RAM. If not, PostgreSQL reads the page from disk into a free or evictable slot.
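The page size and a table's page count can be inspected directly. A small sketch — the orders table here is a hypothetical name; substitute any table of your own:

```sql
-- Pages are block_size bytes each: 8192 (8KB) unless PostgreSQL was compiled otherwise
SHOW block_size;

-- How many pages a table occupied as of its last VACUUM or ANALYZE
-- (relpages is an estimate maintained by those commands)
SELECT relname, relpages
FROM pg_class
WHERE relname = 'orders';  -- hypothetical table name
```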

Eviction follows a clock-sweep algorithm — a variant of the classic clock (second-chance) page replacement. Each buffer has a usage counter that increments when the page is accessed and decrements each time the clock hand passes over it. Pages with a zero counter are candidates for eviction. This is simpler than LRU but performs well in practice because frequently accessed pages accumulate higher usage counts and survive many sweeps. The items you reach for most often stay at the front of the pantry. The ones nobody has touched in a while make room for new arrivals.
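The usage counters are visible through the pg_buffercache extension, which ships with PostgreSQL as a contrib module. A quick way to survey the clock-sweep state of the whole pantry:

```sql
-- Requires the pg_buffercache contrib extension
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Distribution of usage counts: 0 = candidate for eviction, 5 = heavily used,
-- NULL = buffer slot not currently in use
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```

A healthy cache typically shows a spread: hot pages holding steady near 5, and a pool of zero-count candidates ready to make room for new arrivals.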

Pages in the buffer cache can be clean (identical to the on-disk copy) or dirty (modified in memory but not yet flushed to disk). Dirty pages are written back to disk by the background writer process, during checkpoints, and — when neither keeps up — by ordinary backends as they hunt for an evictable buffer. This decoupling is important: writes to a page in the buffer cache return immediately without waiting for disk I/O.
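You can see who is doing the flushing in the pg_stat_bgwriter view. The column names below are as of PostgreSQL 15 and 16; note that PostgreSQL 17 moved the checkpoint counters into a separate pg_stat_checkpointer view:

```sql
-- Who wrote the dirty pages out?
SELECT
  buffers_checkpoint,  -- flushed during checkpoints
  buffers_clean,       -- flushed by the background writer
  buffers_backend      -- flushed by ordinary backends; persistently high values
                       -- suggest the background writer is not keeping up
FROM pg_stat_bgwriter;
```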

Why it matters

The performance difference between a cache hit and a cache miss spans several orders of magnitude. Reading a page from shared memory takes on the order of microseconds. Reading the same page from an SSD takes 1–5ms. From a spinning disk, 5–10ms. For a query that touches thousands of pages, these numbers compound quickly. Every trip to the cellar costs time. A well-run household minimizes the trips.

Consider a query that performs an index scan reading 500 pages. With a 99% cache hit ratio, 495 pages come from memory and 5 from disk. Total I/O wait: roughly 5–25ms. With an 80% cache hit ratio, 100 pages come from disk. Total I/O wait: 100–500ms. Same query, same data, same indexes — the only difference is how much of the working set fits in the buffer cache.
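You need not take such arithmetic on faith — EXPLAIN with the BUFFERS option reports hits and reads for a single query. The table and filter here are hypothetical:

```sql
-- "shared hit" = pages served from the buffer cache
-- "read"       = pages fetched from the OS (disk or OS page cache)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;  -- hypothetical query

-- Look for a line such as:
--   Buffers: shared hit=495 read=5
```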

The cache hit ratio is the primary metric for buffer cache health — the number I check first when assessing the efficiency of a household. It measures the fraction of page requests satisfied from memory versus disk. For OLTP workloads (web applications, APIs, transactional systems), a healthy ratio is above 99%. For analytical workloads that scan large amounts of data, lower ratios are expected and not necessarily problematic.

Checking your cache hit ratio

PostgreSQL tracks buffer hits and disk reads in pg_stat_database. Allow me to share the single most useful query for assessing the state of your pantry.

SQL
-- Check buffer cache hit ratio per database
SELECT
  datname,
  blks_hit,
  blks_read,
  round(
    blks_hit::numeric / nullif(blks_hit + blks_read, 0) * 100, 2
  ) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();

A ratio above 99% means your working set fits comfortably in memory. Between 95% and 99% warrants a closer look — some tables may be too large for the cache, or queries may be scanning more data than necessary. Below 95% on an OLTP workload is the kind of number that would prompt me to clear my afternoon schedule.

To identify which specific tables are responsible for the trips to the cellar, check per-table statistics.

SQL — per-table hit ratios
-- Cache hit ratio per table (find the worst offenders)
SELECT
  schemaname,
  relname,
  heap_blks_hit,
  heap_blks_read,
  round(
    heap_blks_hit::numeric /
    nullif(heap_blks_hit + heap_blks_read, 0) * 100, 2
  ) AS hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 1000
ORDER BY hit_ratio ASC
LIMIT 15;

Tables with low hit ratios are either too large to fit in memory, or their access patterns are not cache-friendly (sequential scans over large tables with no filtering, for instance). The remedy depends on context: sometimes it is more memory, sometimes it is a missing index, sometimes it is a query that has no business scanning that table at all.
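Indexes deserve the same scrutiny, since a bloated or rarely cached index generates cellar trips of its own. The companion view pg_statio_user_indexes breaks I/O down per index:

```sql
-- Indexes responsible for the most disk reads
SELECT
  schemaname,
  relname,
  indexrelname,
  idx_blks_hit,
  idx_blks_read
FROM pg_statio_user_indexes
WHERE idx_blks_hit + idx_blks_read > 1000
ORDER BY idx_blks_read DESC
LIMIT 15;
```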

Sizing shared_buffers

The standard recommendation is to set shared_buffers to 25% of total system RAM. On a server with 64GB of RAM, that means 16GB. This is a sensible starting point for the majority of workloads, and many production databases never need to deviate from it.

Why 25% and not more? Because PostgreSQL is not the only member of the household that requires provisions. The operating system needs RAM for its own page cache, which acts as a second layer of caching for PostgreSQL's data files. Background processes, connection overhead, and work_mem allocations for sorts and hash joins all consume additional memory. Setting shared_buffers too high starves these other consumers — and a household where the pantry has been expanded at the expense of the kitchen is not a household running well.

When more helps:

  • Large working set — if your actively queried data exceeds 25% of RAM, increasing shared_buffers up to 40% can improve hit ratios.
  • Dedicated database server — when PostgreSQL is the only significant process, you have more headroom.
  • Write-heavy workloads — larger buffer cache means more dirty pages can accumulate, reducing checkpoint frequency and spreading I/O more evenly.

When more hurts:

  • Shared servers — other applications need RAM too.
  • Very large databases — if the database is many times larger than RAM, a bigger buffer cache just means more memory dedicated to caching a tiny fraction of the data. The OS page cache would use that RAM more effectively.
  • Many connections with high work_mem — each sort or hash join allocates up to work_mem per operation. With hundreds of connections, this adds up fast.

SQL
-- Check current shared_buffers setting
SHOW shared_buffers;

-- Check actual usage of the buffer cache (requires pg_buffercache extension)
SELECT
  count(*) AS total_buffers,
  sum(case when reldatabase is not null then 1 else 0 end) AS used_buffers,
  round(
    sum(case when reldatabase is not null then 1 else 0 end)::numeric /
    count(*) * 100, 2
  ) AS usage_pct
FROM pg_buffercache;

The double-caching problem

Here is a detail I find worth understanding properly: PostgreSQL's data files sit on the filesystem, and the operating system caches filesystem reads in its own page cache. This means the same data page can exist in two places simultaneously — PostgreSQL's buffer cache and the OS page cache. Two pantries, stocked with the same provisions. This is known as double caching, and it is the fundamental tension in PostgreSQL memory tuning.

When PostgreSQL reads a page from "disk," it often does not actually hit the physical drive — the OS serves it from its own cache. This is fast (microseconds, not milliseconds), but it is not as fast as reading from PostgreSQL's shared memory, because it still involves a system call and a memory copy. Knowing where the redundancy lives is the first step to managing it.

The effective_cache_size parameter exists to account for this. It tells the query planner how much total cache is available — shared_buffers plus the estimated OS page cache. The planner uses this to decide whether an index scan is likely to find its pages in cache (favoring the index scan) or whether the data will need to come from disk (potentially favoring a sequential scan instead).

SQL
-- Check effective_cache_size (planner hint, not an allocation)
SHOW effective_cache_size;

-- A typical setting: shared_buffers + estimated OS page cache
-- Example for a 32GB server:
--   shared_buffers = 8GB
--   effective_cache_size = 24GB  (8GB shared_buffers + ~16GB OS cache)

effective_cache_size does not allocate any memory. It is purely an estimate that influences planning decisions. Setting it too low makes the planner pessimistic about caching, leading it to undervalue index scans. Setting it too high makes the planner optimistic, which can lead to index scans on data that is not actually cached. A value of 50–75% of total RAM is typical. Getting this right is a matter of knowing the household's full inventory — not just the pantry, but the larder downstairs as well.

How Gold Lapel relates

Gold Lapel sits between your application and PostgreSQL, observing every query that passes through. One of the principal ways it improves performance is by reducing the number of pages a query needs to read in the first place. A smaller appetite means less pressure on the pantry.

When Gold Lapel creates a materialized view to cache the results of an expensive aggregation, it replaces a query that might scan millions of rows across multiple tables with a read from a compact, pre-computed result set. Fewer pages read means fewer pages competing for space in the buffer cache, which means better hit ratios for everything else in the household.

Index recommendations work similarly. A query performing a sequential scan on a large table reads every page. Adding the right index turns that into a targeted read of a handful of pages. The cascade through the buffer cache is significant — the pages that were being loaded, evicting other useful pages, and then being evicted themselves on the next clock sweep are simply never loaded at all.
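A sketch of the before-and-after, with a hypothetical table and illustrative buffer counts:

```sql
-- Before: a sequential scan touches every page of the table
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';
--   Seq Scan on orders ... Buffers: shared hit=1200 read=8300   (illustrative)

-- Add the index the query was missing
CREATE INDEX idx_orders_status ON orders (status);

-- After: a targeted read of a handful of pages
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';
--   Index Scan using idx_orders_status ... Buffers: shared hit=12   (illustrative)
```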

Gold Lapel does not tune shared_buffers or other memory parameters directly. Those are server-level settings that depend on hardware and workload mix. What it does is reduce the demands your queries place on the buffer cache — and in my experience, reducing demand is more effective than expanding supply.

Frequently asked questions