
Checkpoint

The periodic operation that flushes dirty pages from memory to disk. Checkpoints are where PostgreSQL's in-memory changes become permanent in the data files.

Concept · March 21, 2026 · 8 min read

A checkpoint is a periodic operation where PostgreSQL writes all dirty pages (modified data that exists only in the buffer cache) to the underlying data files on disk. It then updates the control file to record where in the WAL stream the checkpoint occurred. After a crash, PostgreSQL replays WAL only from the last checkpoint forward — everything before it is already safely on disk. Frequent checkpoints mean less WAL to replay but more I/O during normal operation. Infrequent checkpoints reduce I/O overhead but increase the time needed to recover from a crash.

What a checkpoint is

PostgreSQL does not write data changes directly to tables and indexes on disk at commit time. Instead, it writes a WAL record (ensuring durability) and modifies the page in the shared buffer cache. The actual data files lag behind — they contain a mix of current and stale data, with the WAL holding the authoritative record of recent changes.

A checkpoint closes this gap. The checkpointer process scans the buffer cache, identifies every dirty page (one that has been modified since it was last written to disk), and flushes all of them to the data files. Once all dirty pages are written and an fsync confirms they are on stable storage, PostgreSQL updates the pg_control file with the checkpoint's WAL position. This position is the recovery starting point — the guarantee that everything before it is safely persisted in the data files.

After a checkpoint completes, the WAL segments that predate it are no longer needed for crash recovery. PostgreSQL recycles these segments, reusing the disk space for new WAL writes. This is how WAL disk usage stays bounded rather than growing without limit.
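You can observe the WAL footprint directly. This query is a sketch: pg_ls_waldir() is available from PostgreSQL 10 onward and is restricted to superusers and members of predefined monitoring roles.

SQL
-- Count WAL segment files and their total size
-- (recycling keeps this roughly between min_wal_size and max_wal_size)
SELECT count(*) AS segments,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();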

Why checkpoints matter

Checkpoints sit at the center of a fundamental trade-off: crash recovery speed versus steady-state I/O load.

  • Crash recovery — after an unclean shutdown, PostgreSQL replays all WAL generated since the last checkpoint. A checkpoint that completed 2 minutes ago means 2 minutes of WAL to replay. A checkpoint that completed 30 minutes ago means 30 minutes of WAL. The checkpoint interval directly controls your worst-case recovery time.
  • I/O overhead — each checkpoint flushes potentially thousands of dirty pages to disk. More frequent checkpoints mean this I/O burst happens more often. On I/O-constrained systems, checkpoint activity can compete with query workloads and cause periodic latency spikes.
  • WAL volume — PostgreSQL performs a full-page write (the entire 8 KB page) to WAL the first time a page is modified after a checkpoint. Frequent checkpoints mean more full-page writes, which increases WAL volume and can affect replication bandwidth and archive storage.

The goal is to find a checkpoint frequency that keeps recovery time acceptable without creating noticeable I/O interference during normal operation. Most production systems land on a checkpoint interval between 5 and 15 minutes.
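To reason about this trade-off concretely, it helps to know your WAL generation rate. A rough sketch: capture the current WAL position, wait a representative period, then diff the two positions (the LSN shown below is illustrative; substitute the value you captured).

SQL
-- Snapshot the current WAL position...
SELECT pg_current_wal_lsn();

-- ...wait a representative interval (say 5 minutes), then measure
-- how much WAL was generated; replace '0/3A017A88' with the LSN above
SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3A017A88')
) AS wal_generated;

That rate, multiplied by your checkpoint interval, approximates the worst-case amount of WAL crash recovery would need to replay.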

Key configuration

Three settings control when and how checkpoints happen. A fourth controls visibility into their behavior.

postgresql.conf
# Key checkpoint settings (shown with typical production values)

# Maximum time between automatic checkpoints
checkpoint_timeout = 10min

# WAL size that triggers an early checkpoint
max_wal_size = 4GB

# Minimum WAL retained after a checkpoint
min_wal_size = 1GB

# Spread writes over this fraction of the checkpoint interval
checkpoint_completion_target = 0.9

# Log checkpoint start, completion, and statistics
log_checkpoints = on

checkpoint_timeout

The maximum time between automatic checkpoints. The default is 5 minutes. When this timer expires, PostgreSQL starts a new checkpoint regardless of how much WAL has been generated. Increasing this to 10 or 15 minutes reduces checkpoint frequency and the associated I/O, at the cost of longer crash recovery.
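The setting can be changed without a restart; a reload is enough. For example:

SQL
-- Raise the checkpoint interval (takes effect on configuration reload)
ALTER SYSTEM SET checkpoint_timeout = '15min';
SELECT pg_reload_conf();

-- Verify the new value
SHOW checkpoint_timeout;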

max_wal_size

The soft limit on WAL size between checkpoints. If WAL accumulates beyond this threshold before checkpoint_timeout fires, PostgreSQL forces an early checkpoint. These forced checkpoints appear as checkpoints_req in pg_stat_bgwriter. If you see many requested checkpoints, raise max_wal_size so the timeout triggers first.
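A quick way to check whether this is happening is to compute the share of requested checkpoints. (Note: on PostgreSQL 17 and later these counters moved to the pg_stat_checkpointer view, as num_timed and num_requested.)

SQL
-- Fraction of checkpoints forced by WAL volume rather than the timer
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 1)
         AS pct_requested
FROM pg_stat_bgwriter;

A pct_requested in the single digits is healthy; much higher suggests raising max_wal_size.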

checkpoint_completion_target

Controls how aggressively the checkpointer writes dirty pages. A value of 0.9 means PostgreSQL spreads checkpoint writes over 90% of the checkpoint interval. This smooths out the I/O load rather than writing everything in a burst at the start.

SQL
-- checkpoint_completion_target controls I/O spreading
-- With checkpoint_timeout = 10min and completion_target = 0.9:
-- PostgreSQL spreads dirty page writes over 9 minutes (90% of interval)
-- This prevents a burst of I/O at checkpoint time

-- A lower value (e.g., 0.5) writes faster but creates I/O spikes
-- A higher value (e.g., 0.9) spreads writes more evenly — recommended

-- Check current setting
SHOW checkpoint_completion_target;

The default changed from 0.5 to 0.9 in PostgreSQL 14, reflecting the consensus that spreading writes is almost always preferable. If you are on an older version, setting this to 0.9 manually is one of the easiest performance improvements available.
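On those older versions, the change is a one-liner followed by a reload:

SQL
-- On PostgreSQL 13 and earlier, override the old 0.5 default
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();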

log_checkpoints

When enabled, PostgreSQL logs detailed information at the start and completion of every checkpoint — how many buffers were written, how long the write and sync phases took, and the WAL distance covered. This is essential for understanding checkpoint behavior.

PostgreSQL log output
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 8234 buffers (50.2%);
      0 WAL file(s) added, 3 removed, 2 recycled;
      write=53.012 s, sync=0.089 s, total=53.241 s;
      sync files=412, longest=0.014 s, average=0.001 s;
      distance=524288 kB, estimate=524288 kB

Monitoring checkpoints

The pg_stat_bgwriter view is the primary source of checkpoint statistics. It accumulates counters since the last statistics reset. (On PostgreSQL 17 and later, the checkpoint counters live in the separate pg_stat_checkpointer view.)

SQL
-- Check checkpoint activity
SELECT
  checkpoints_timed,
  checkpoints_req,
  checkpoint_write_time / 1000 AS write_seconds,
  checkpoint_sync_time / 1000 AS sync_seconds,
  buffers_checkpoint,
  buffers_clean,
  buffers_backend
FROM pg_stat_bgwriter;

-- checkpoints_timed:  scheduled checkpoints (normal, triggered by checkpoint_timeout)
-- checkpoints_req:    requested checkpoints (triggered by max_wal_size or manual CHECKPOINT)
-- buffers_backend:    pages written directly by backends (should be low)

checkpoints_timed vs checkpoints_req — in a well-tuned system, nearly all checkpoints should be timed (scheduled). A high ratio of requested checkpoints means WAL is hitting max_wal_size before the timeout, which creates more frequent and potentially less efficient checkpoints. Increase max_wal_size or checkpoint_timeout to correct this.

buffers_checkpoint vs buffers_backend — dirty pages should be flushed by the checkpointer (buffers_checkpoint) or the background writer (buffers_clean), not by backend processes running queries (buffers_backend). High buffers_backend indicates that the buffer cache is under pressure and backends are being forced to evict dirty pages themselves, which adds latency to the queries those backends are running.
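The ratio is easy to compute. This query applies to PostgreSQL 16 and earlier; from version 17 onward, per-backend write statistics are reported in pg_stat_io instead.

SQL
-- Who is writing dirty pages? Backend writes should be a small share.
SELECT buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       round(100.0 * buffers_backend
             / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
         AS pct_backend_writes
FROM pg_stat_bgwriter;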

SQL
-- Force an immediate checkpoint (use sparingly)
CHECKPOINT;

-- Check when the last checkpoint occurred
SELECT
  checkpoint_time,
  redo_lsn,
  redo_wal_file
FROM pg_control_checkpoint();

The pg_control_checkpoint() function shows when the last checkpoint occurred and the WAL position it recorded. This tells you exactly where crash recovery would begin if the server went down right now.
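Combining that position with the current WAL insert location gives the replay distance directly:

SQL
-- How much WAL would crash recovery replay if the server died right now?
SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), redo_lsn)
) AS wal_since_checkpoint
FROM pg_control_checkpoint();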

How Gold Lapel relates

Gold Lapel operates above the checkpoint layer — it works at the query level, sitting between your application and PostgreSQL. It does not trigger, configure, or directly interact with checkpoints.

That said, query optimization has downstream effects on checkpoint behavior. When Gold Lapel routes a frequently executed aggregation to a materialized view instead of re-executing the underlying joins and sorts, the avoided work dirties fewer pages in the buffer cache. Fewer dirty pages mean less work for each checkpoint and less I/O contention. Similarly, when Gold Lapel recommends an index that replaces a sequential scan, the more targeted reads reduce buffer cache churn — pages stay useful longer and are less likely to be evicted and re-read in a pattern that amplifies checkpoint writes.

The relationship is indirect but real: more efficient queries mean a calmer buffer cache, and a calmer buffer cache means smoother checkpoints.

Frequently asked questions