MVCC (Multi-Version Concurrency Control)
PostgreSQL's concurrency model. Readers never block writers. Writers never block readers. Someone, however, must attend to the dead tuples.
MVCC is the mechanism PostgreSQL uses to handle concurrent access to data. Rather than locking rows when they are read, each transaction sees a consistent snapshot of the database as it existed when the transaction began. When a row is updated, PostgreSQL does not overwrite it — it creates a new version and marks the old one as dead. Readers never block writers. Writers never block readers. The household keeps a record of everything that happened, but someone must periodically clear away the old records. That someone is VACUUM.
What MVCC is
If you'll permit me a moment of admiration for the architecture: MVCC is, at its core, a simple idea. Instead of one version of each row, keep multiple versions. Every INSERT creates a tuple. Every UPDATE creates a new tuple and marks the old one for eventual removal. Every DELETE marks a tuple as dead. No data is overwritten in place.
Each tuple carries two hidden fields: xmin and xmax. xmin records the transaction ID that created the tuple. xmax records the transaction ID that deleted it (or zero if it is still live). When a transaction reads a table, it uses these fields — combined with its own snapshot of which transactions have committed — to determine which version of each row is visible.
-- Every row in PostgreSQL carries hidden system columns
SELECT xmin, xmax, ctid, *
FROM your_table
LIMIT 5;
-- xmin: the transaction ID that created this row version
-- xmax: the transaction ID that deleted or locked this row version (0 if still live)
-- ctid: the physical location of the tuple (page, offset)

This is what makes PostgreSQL's concurrency model work. Every transaction sees a self-consistent view of the data, even while other transactions are actively modifying it. No dirty reads. No read locks. The foundation of the manor, if you will — everything else is built on top of this.
Why MVCC matters
In a traditional locking system, a reader that wants to access a row must wait if a writer is modifying it. A writer that wants to modify a row must wait if a reader is accessing it. Under load, this creates contention — transactions queue behind each other, and throughput drops. It is, to be frank, an unseemly arrangement.
MVCC eliminates read-write contention entirely. A SELECT never acquires a row-level lock. It reads from its snapshot, which is immutable. A concurrent UPDATE creates a new version of the row, but the reader does not see it — it continues reading the version that was current when its transaction began. Neither transaction waits for the other.
-- Each transaction sees a consistent snapshot
-- Session A:
BEGIN;
SELECT count(*) FROM orders; -- returns 1000
-- Session B (concurrently):
INSERT INTO orders (item) VALUES ('widget');
COMMIT;
-- Session A (still in its transaction):
SELECT count(*) FROM orders; -- still returns 1000
COMMIT;
-- Session A sees the snapshot from when its transaction began,
-- not the row Session B inserted.

Write-write conflicts still exist — I should be honest about that. Two transactions cannot update the same row at the same time; the second will block until the first commits or rolls back. But read-write conflicts, which are far more common in typical workloads, are gone entirely. This is why PostgreSQL handles mixed read/write workloads with such composure.
How it works in practice
Allow me to walk through the mechanism. Every transaction that modifies data is assigned a transaction ID (a 32-bit integer that increments monotonically; read-only transactions make do with a virtual ID). When a transaction starts, it takes a snapshot: a record of which transaction IDs are committed, which are in-progress, and which have not started yet.
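The snapshot itself can be inspected. A small sketch, assuming PostgreSQL 13 or later (older versions spell these functions txid_current() and txid_current_snapshot()):

```sql
-- Inspect the current transaction's ID and snapshot (PostgreSQL 13+)
SELECT pg_current_xact_id() AS my_txid,
       pg_current_snapshot() AS snapshot;
-- The snapshot prints as xmin:xmax:xip_list, e.g. 751:755:751,753
--   xmin: every transaction below this ID has finished
--   xmax: the next ID to be assigned; this ID and above are invisible
--   xip_list: transactions still in progress when the snapshot was taken
```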
To determine whether a tuple is visible, PostgreSQL checks:
- Is xmin committed and visible in my snapshot? If not, the tuple does not exist yet from my perspective.
- Is xmax zero, or is the deleting transaction not yet committed in my snapshot? If so, the tuple is still live.
- If both conditions pass, the tuple is visible. Otherwise, it is not.
This check happens for every tuple PostgreSQL accesses. It is fast — the snapshot comparison is essentially a set membership check — but it is not free. Nothing in a well-run household is truly free. The visibility map, which tracks pages where all tuples are known-visible to all transactions, lets PostgreSQL skip these checks entirely for clean pages. This is what makes index-only scans possible.
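The visibility map can be examined directly with the pg_visibility contrib extension. A sketch, assuming the extension is available and the orders table from earlier exists:

```sql
-- Requires the pg_visibility contrib extension
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- One row per heap page of the table
SELECT blkno, all_visible, all_frozen
FROM pg_visibility_map('orders');
-- all_visible: every tuple on the page is visible to all transactions,
--   so scans (including index-only scans) can skip visibility checks
-- all_frozen: every tuple on the page is frozen, so anti-wraparound
--   VACUUM can skip the page entirely
```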
PostgreSQL's default isolation level is Read Committed: each statement within a transaction gets a fresh snapshot. Under Repeatable Read or Serializable, the snapshot is taken at the transaction's first statement and reused for every statement after it. The MVCC machinery is the same in either case — only the snapshot timing changes.
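The difference in snapshot timing is easy to observe, reusing the orders table from the earlier example:

```sql
-- Session A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM orders;  -- snapshot taken at this first statement

-- Session B (concurrently, autocommitted):
INSERT INTO orders (item) VALUES ('gadget');

-- Session A:
SELECT count(*) FROM orders;  -- unchanged: the snapshot is reused
COMMIT;
-- Under the default Read Committed, the second SELECT would take a
-- fresh snapshot and see Session B's committed row.
```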
The cost of MVCC
I should be forthcoming about the trade-off, because pretending it does not exist would be a disservice. MVCC's elegance has a cost, and that cost is dead tuples.
Every UPDATE leaves behind the old version of the row. Every DELETE leaves behind the entire row. These dead tuples are invisible to new transactions but still occupy physical space on disk — old records that no one will consult again, yet they remain in the filing cabinet, taking up space. They must be scanned over during sequential scans, and their index entries must be checked during index scans. Until VACUUM removes them, they are dead weight.
-- Check dead tuple counts per table
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 10;

If dead tuples accumulate faster than VACUUM removes them, tables and indexes grow — a condition called bloat. Bloated tables slow down sequential scans because PostgreSQL must read through pages full of dead rows. Bloated indexes slow down index scans and waste memory in the buffer cache.
The second cost is transaction ID management. Since transaction IDs are 32-bit integers, PostgreSQL has roughly 4 billion of them before the counter wraps around. To prevent old rows from suddenly appearing to be from the future, PostgreSQL must periodically "freeze" rows — replacing their xmin with a special frozen transaction ID that is always considered visible. This is what aggressive vacuuming and VACUUM FREEZE do.
-- Current transaction ID and age of the oldest unfrozen row
SELECT
txid_current() AS current_txid,
age(datfrozenxid) AS oldest_unfrozen_age,
datname
FROM pg_database
WHERE datname = current_database();
-- Autovacuum forces an aggressive, freezing VACUUM once a table's age
-- exceeds autovacuum_freeze_max_age (200 million transactions by default),
-- well before the hard wraparound limit near 2 billion.

If freezing falls behind — because autovacuum is too slow, or long-running transactions prevent it from making progress — PostgreSQL will eventually refuse new connections and shut down to protect against wraparound corruption. This is the most severe consequence of neglected MVCC maintenance. It is also entirely preventable. One simply has to pay attention to the vacuum health of the household.
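The database-wide figure above can be broken down per table using the system catalogs:

```sql
-- Tables closest to their anti-wraparound threshold
SELECT relname,
       age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'           -- ordinary tables only
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
-- Compare xid_age against autovacuum_freeze_max_age (200 million by
-- default) to see which tables will trigger a forced vacuum next.
```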
How Gold Lapel relates
Gold Lapel operates at the proxy level, between your application and PostgreSQL. It does not alter how MVCC works — nor would you want it to; the architecture deserves respect, not revision. But it deals directly with MVCC's consequences.
Table bloat caused by dead tuple accumulation is one of the primary performance problems Gold Lapel monitors for. When bloat reaches a level that materially affects query performance, Gold Lapel surfaces it alongside the queries most affected. It also tracks vacuum health — whether autovacuum is keeping pace with write volume, and whether any tables are approaching transaction ID wraparound thresholds.
The materialized views Gold Lapel creates are themselves subject to MVCC. When a materialized view is refreshed, the old data becomes dead tuples that need vacuuming. Gold Lapel accounts for this in its refresh scheduling — it would be rather embarrassing to create the very problem one is hired to prevent.