You Don't Need MongoDB

Chapter 13: Change Streams, TTL, and Capped Collections

The Waiter of Gold Lapel · Published Apr 19, 2026 · 11 min

The previous chapter closed on aggregation; this one turns, with the triggers warming up, to a category of feature that has nothing to do with documents — which is rather the point.

The previous ten chapters attended to the document model itself: storing JSONB, querying it, indexing it, aggregating it. This chapter turns to the three features that elevate MongoDB from a database to something that feels like a full platform: MongoDB change streams for reacting to writes, the MongoDB TTL index for expiring rows automatically, and the capped collection for bounded append-only storage. Each is a legitimate convenience. MongoDB implements each thoughtfully, and I should say so plainly before going any further — these features were not accidents. They are craft.

Each also already exists in PostgreSQL. Not as a bolt-on extension, not as a patched-in afterthought, but as a natural expression of the trigger system PostgreSQL has shipped for a quarter-century. The operational features are the same features. The primitives are simply older.

Why Triggers Are PostgreSQL's Native Event System

MongoDB ships change streams, TTL monitors, and capped file preallocation as first-class primitives because its architecture requires it — three distinct subsystems, each purpose-built. Change streams tail the replication oplog. TTL relies on a background monitor thread that wakes every sixty seconds. Capped collections preallocate a fixed file and maintain an insertion-order cursor. Three rooms, three staircases, three separate sets of keys. Each is excellent in its way. Each is also its own operational behavior to learn.

PostgreSQL reaches the same destination by a different route. Row-level triggers have shipped since version 6.4 (October 1998), and PL/pgSQL has been installed by default since 9.0 (September 2010) — though the language itself has been available since the late 6.x series. Asynchronous LISTEN/NOTIFY was rewritten to use an SLRU-backed async queue in version 9.0, making it reliable enough for production event fanout — over fifteen years of deployment experience at the time of writing.

Combine a BEFORE or AFTER trigger with a pg_notify() call, add a functional index over a JSONB field, and the same semantics MongoDB offers fall out for free. No polling. No external scheduler. No sidecar process. The trigger system is already the event system; you are only choosing what to do with each event.

A small mercy, that — when one already owns the tool, one need not shop for it.

Change Streams — doc_watch and doc_unwatch

What MongoDB Change Streams Give You

A MongoDB change stream is an event stream of {operationType, documentKey, fullDocument, updateDescription} records emitted whenever a document is inserted, updated, replaced, or deleted, backed by the replication oplog. I will say it directly: this is a genuinely useful feature. Cache invalidation, downstream search indexing, audit trails, webhook fanout, materialized-view refresh, and cross-service notifications all benefit enormously from a database that tells you when something changed instead of requiring you to ask.

A database that volunteers information is a considerate database. MongoDB was early to this particular courtesy, and the industry is better for it.

PostgreSQL LISTEN/NOTIFY — The Same Mechanism, Already Built In

Every PostgreSQL connection can issue LISTEN channel_name and receive asynchronous payloads as other connections call NOTIFY channel_name, 'payload' (or the function form, pg_notify('channel_name', 'payload')). The payload is delivered as a TEXT value up to 8000 bytes long, and delivery happens when the notifying transaction commits. This is the LISTEN/NOTIFY pattern, and when durability requirements are modest it is the backbone of change data capture in PostgreSQL.

Gold Lapel's doc_watch(collection, callback) installs a single trigger function on the target table and has the caller subscribe to the channel doc_<collection>. The trigger constructs a JSON payload in the same shape a MongoDB change stream event carries, calls pg_notify, and returns. doc_unwatch drops the trigger or simply unsubscribes the listener, depending on whether other consumers remain.

The trigger function itself is short enough to read over a single cup of tea:

Change-notify trigger function
CREATE OR REPLACE FUNCTION doc_notify_change()
RETURNS TRIGGER AS $$
DECLARE
  payload  jsonb;
  op       text;
  doc_id   text;
  full_doc jsonb;
BEGIN
  op := CASE TG_OP
          WHEN 'INSERT' THEN 'insert'
          WHEN 'UPDATE' THEN 'update'
          WHEN 'DELETE' THEN 'delete'
        END;

  IF TG_OP = 'DELETE' THEN
    doc_id   := OLD._id::text;
    full_doc := NULL;
  ELSE
    doc_id   := NEW._id::text;
    full_doc := NEW.data;
  END IF;

  payload := jsonb_build_object(
    'operationType', op,
    '_id',           doc_id,
    'fullDocument',  full_doc,
    'ns',            jsonb_build_object('coll', TG_TABLE_NAME)
  );

  PERFORM pg_notify('doc_' || TG_TABLE_NAME, payload::text);

  IF TG_OP = 'DELETE' THEN
    RETURN OLD;
  ELSE
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_doc_notify
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION doc_notify_change();

Two details, if I may draw your attention to them. First, NEW is null in a DELETE trigger — so any attempt to read NEW.data or NEW._id in that branch would raise, and the function correctly reads from OLD. Returning OLD rather than NEW on DELETE is the idiomatic convention; for AFTER triggers the return value is ignored either way, but consistency matters when the same pattern is copied into a BEFORE trigger later. A small thing that, left unnoticed, produces loud failures at three in the morning. Second, this is FOR EACH ROW because change streams are per-document. The TTL and capped collection triggers later in this chapter are FOR EACH STATEMENT because they operate on the table as a whole. The distinction is load-bearing; I would not have you conflate them.

The Event Shape — Identical to MongoDB Change Streams

Applications that already consume MongoDB change streams should require minimal adaptation. Compare what MongoDB emits:

MongoDB
{
  "operationType": "insert",
  "documentKey":   { "_id": "a7f2c..." },
  "fullDocument":  { "_id": "a7f2c...", "total": 42, "status": "paid" },
  "ns":            { "db": "shop", "coll": "orders" }
}

to what Gold Lapel emits:

Gold Lapel
{
  "operationType": "insert",
  "_id":           "a7f2c...",
  "fullDocument":  { "_id": "a7f2c...", "total": 42, "status": "paid" },
  "ns":            { "coll": "orders" }
}

The divergence is small and named explicitly. Gold Lapel flattens documentKey._id to a top-level _id, and ns.db is omitted because the connection already knows its database. Every field a MongoDB consumer reads today reads identically; any documentKey._id access becomes _id. One line of adapter code. I have known less forgiving ports.
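That one line of adapter code can be shown concretely. A minimal Python sketch, assuming events arrive as already-parsed dictionaries; the function name and the db default are illustrative, not part of any shipped API:

```python
def to_mongo_shape(event, db="shop"):
    """Re-nest a Gold Lapel change event into the MongoDB change stream
    shape, so a consumer written against MongoDB keeps working unchanged."""
    return {
        "operationType": event["operationType"],
        "documentKey": {"_id": event["_id"]},      # the one real difference
        "fullDocument": event["fullDocument"],
        "ns": {"db": db, "coll": event["ns"]["coll"]},
    }

gl_event = {
    "operationType": "insert",
    "_id": "a7f2c",
    "fullDocument": {"_id": "a7f2c", "total": 42, "status": "paid"},
    "ns": {"coll": "orders"},
}

mongo_event = to_mongo_shape(gl_event)
```

Run the adapter at the edge of the listener and nothing downstream need know the events changed provenance.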

Language-Appropriate Delivery Across Seven Languages

The raw LISTEN/NOTIFY channel is wrapped idiomatically per language — a courtesy to the reader who should not have to translate from database to application in their head every time. Python exposes a callback, Go exposes a buffered channel, JavaScript and TypeScript expose an async iterator, Java exposes a listener interface, Ruby takes a block, and C# takes a delegate. The wire format is identical; only the shape of the handoff changes.

Three are shown below; the other four follow the same pattern in their host idiom.

Python
def on_change(event):
    print(event["operationType"], event["_id"])

doc_watch("orders", on_change)
Go
ch, cancel := gl.DocWatch(ctx, "orders")
defer cancel()
for event := range ch {
    fmt.Println(event.OperationType, event.ID)
}
JavaScript
for await (const event of docWatch("orders")) {
  console.log(event.operationType, event._id);
}

Java registers a ChangeListener<Order> against the collection handle. Ruby uses doc_watch("orders") { |event| ... }. C# uses DocWatch("orders", evt => ...). Under all seven, a single database connection dedicated to notifications dispatches events to the handler. One connection, seven idioms, one mechanism underneath.
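The fan-out layer underneath those idioms is small enough to sketch. A hypothetical Python version follows; the Dispatcher class and its method names are illustrative, and the LISTEN connection that would feed dispatch() is elided:

```python
import json
from collections import defaultdict

class Dispatcher:
    """Fan notifications from one LISTEN connection out to per-channel
    handlers. The database wiring is omitted; dispatch() would be called
    once for each notification the dedicated connection receives."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._handlers[channel].append(handler)

    def dispatch(self, channel, payload):
        event = json.loads(payload)          # payloads arrive as JSON text
        for handler in self._handlers[channel]:
            handler(event)

seen = []
d = Dispatcher()
d.subscribe("doc_orders", lambda e: seen.append(e["operationType"]))
d.dispatch("doc_orders", '{"operationType": "insert", "_id": "a7f2c"}')
```

The per-language wrappers are, in essence, this loop dressed in the host language's clothing.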

Honest Boundary — What LISTEN/NOTIFY Doesn't Do

I would be remiss if I did not confess the limits plainly. LISTEN/NOTIFY is not a replacement for a durable event log, and pretending otherwise would disappoint you precisely when it mattered most.

  • Not durable. If no connection is actively listening on the channel when the notifying transaction commits, the notification is dropped. MongoDB change streams carry a resumeToken backed by oplog position, so a consumer that reconnects can pick up where it left off. Gold Lapel's change streams do not replay missed events. This is a real difference, and it is to MongoDB's credit.
  • Payload size limit. The documented maximum is 8000 bytes per notification. Large documents won't fit; the pragmatic pattern is to emit an ID-only event and have the listener fetch the current row with a follow-up SELECT. The example trigger above sends fullDocument for convenience; a variant that emits only {operationType, _id, ns} is the right choice for tables with documents larger than a few kilobytes.
  • Per-connection, best-effort delivery. Notifications are buffered in a server-side async queue. Under sustained load with slow or stalled listeners the queue can fill: PostgreSQL logs warnings once it is half full, and if it fills completely, transactions calling NOTIFY begin to fail at commit. Worth knowing before you discover it empirically.
  • For durability, use an outbox. The standard pattern is a transactional outbox table — rows inserted alongside the business write, consumed by a worker that marks them processed. Logical replication via pgoutput or wal2json is the heavier alternative when you need ordered, replayable delivery. Chapter 15 covers when to reach for these.
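The ID-only fallback from the second bullet can be automated as a size gate. A Python illustration of the decision (the helper is hypothetical; in practice the same branch would live in the trigger function): serialize the full event, and drop fullDocument when the result would not fit under the NOTIFY limit.

```python
import json

NOTIFY_LIMIT = 8000  # documented NOTIFY payload ceiling, in bytes

def build_payload(op, doc_id, full_doc, coll):
    """Prefer the full-document event; fall back to an ID-only event
    when the serialized payload would exceed the NOTIFY limit."""
    full = {"operationType": op, "_id": doc_id,
            "fullDocument": full_doc, "ns": {"coll": coll}}
    text = json.dumps(full)
    if len(text.encode("utf-8")) < NOTIFY_LIMIT:
        return text
    # too big: the listener will re-fetch the row by _id instead
    slim = {"operationType": op, "_id": doc_id,
            "fullDocument": None, "ns": {"coll": coll}}
    return json.dumps(slim)

small = build_payload("insert", "a7f2c", {"total": 42}, "orders")
big = build_payload("update", "a7f2c", {"blob": "x" * 20000}, "orders")
```

The listener treats a null fullDocument as an instruction to SELECT the current row, which also neatly sidesteps staleness for hot documents.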

If you require replayable events, build an outbox. If you require fire-and-forget notification of changes, LISTEN/NOTIFY is the answer the database has shipped for fifteen years. Choose the tool to fit the requirement — not the other way round.

TTL Indexes — doc_create_ttl_index and doc_remove_ttl_index

What MongoDB TTL Indexes Give You

A MongoDB TTL index expires documents automatically, and the convenience of that cannot be overstated. You index a date-typed field and declare expireAfterSeconds: 3600, and a background monitor thread scans the index approximately every sixty seconds, deleting documents whose timestamp plus the expiry interval is in the past. Sessions, rate-limit counters, cached API responses, temporary uploads, email verification tokens — anything with a natural deadline fits this pattern cleanly. A row that knows when to take its leave is a considerate row indeed.

Trigger-Based TTL in PostgreSQL — Expire Rows Automatically

doc_create_ttl_index("sessions", field="createdAt", expire_after_seconds=3600) creates two things. The first is a B-tree index on the extracted TTL field so that expiry queries don't scan the whole table. The second is a statement-level BEFORE INSERT trigger that deletes expired rows as part of each insert. This is the architectural choice worth dwelling on: MongoDB uses a background sweep, while Gold Lapel does lazy cleanup at write time. The difference shows up in the honest boundary below — and it is a genuine difference, not a cosmetic one.

TTL trigger
-- The bare text-to-timestamptz cast is only STABLE (it reads the
-- TimeZone setting), and CREATE INDEX rejects non-IMMUTABLE
-- expressions. Wrapping the cast and declaring it IMMUTABLE is safe
-- so long as timestamps are stored as ISO 8601 UTC strings, which is
-- the convention for these documents.
CREATE OR REPLACE FUNCTION doc_ts(val text)
RETURNS timestamptz AS $$
  SELECT val::timestamptz
$$ LANGUAGE sql IMMUTABLE;

CREATE INDEX idx_sessions_ttl
  ON sessions (doc_ts(data->>'createdAt'));

CREATE OR REPLACE FUNCTION sessions_ttl_sweep()
RETURNS TRIGGER AS $$
BEGIN
  -- same expression as the index, so the delete can use it
  DELETE FROM sessions
  WHERE doc_ts(data->>'createdAt') < NOW() - INTERVAL '3600 seconds';
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sessions_ttl_before_insert
BEFORE INSERT ON sessions
FOR EACH STATEMENT
EXECUTE FUNCTION sessions_ttl_sweep();

Three details repay a moment's attention. The index casts data->>'createdAt' to timestamptz to match the predicate — a plain text index would force a sequential scan, which is precisely the fate we are trying to avoid. The trigger is FOR EACH STATEMENT, not FOR EACH ROW, so a COPY of ten thousand rows fires the sweep once rather than ten thousand times. And the function returns NULL because the return value of statement-level triggers is ignored, and NULL is the conventional choice there.

One function. One index. One trigger. The whole feature.

The Python/JS/Go API Surface

Across all seven languages the operation is one call — as it should be:

TTL API
doc_create_ttl_index("sessions", field="createdAt", expire_after_seconds=3600)
doc_remove_ttl_index("sessions")

Go exposes DocCreateTTLIndex, Java uses docCreateTtlIndex, Ruby and Python use the snake-case form, C# uses DocCreateTtlIndex, and TypeScript and JavaScript use docCreateTtlIndex. The arguments and behavior are identical; only the casing follows the host language's conventions. A reader fluent in one will read the others without a glossary.

Honest Boundary — Write-Time Cleanup vs Background Sweep

The architectural difference matters, and I would rather you know it now than discover it later.

  • MongoDB sweeps in the background. A collection receiving no writes still gets cleaned within roughly sixty seconds, because the monitor thread does not care whether your application is active. This is a genuine strength of MongoDB's design.
  • Gold Lapel sweeps on write. A table with no inserts accumulates expired rows indefinitely. For session tables, rate limiters, caches, and anything actively being written, this is entirely fine — often better, in fact, because the cleanup amortises into work you were already doing, and a cold table costs you zero cycles.
  • For collections with sparse writes, schedule pg_cron (or any external scheduler) to call doc_ttl_sweep("collection") every N minutes. Gold Lapel ships this as an optional helper. You add one moving part; you get MongoDB-style eager cleanup.
  • Recommendation. For the ninety percent case — sessions, caches, rate limits, ephemeral application state — trigger-based cleanup is the correct default. For archival retention, compliance-driven expiry, or anything where exact timing of deletion is load-bearing, schedule the sweep explicitly with pg_cron so the behaviour matches the requirement.
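For anyone wiring up that scheduled sweep, the predicate both paths must agree on is worth pinning down. A small Python sketch of the same rule (the helper name is illustrative): a row expires once its timestamp is older than now minus the expiry interval.

```python
from datetime import datetime, timedelta, timezone

def is_expired(created_at, expire_after_seconds, now=None):
    """Mirror of the trigger predicate:
    createdAt < NOW() - INTERVAL 'expire_after_seconds seconds'."""
    now = now or datetime.now(timezone.utc)
    return created_at < now - timedelta(seconds=expire_after_seconds)

now = datetime(2026, 4, 19, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2026, 4, 19, 11, 30, tzinfo=timezone.utc)  # 30 min old
stale = datetime(2026, 4, 19, 10, 0, tzinfo=timezone.utc)   # 2 hours old
```

Keeping the trigger and any external sweep on one shared predicate means rows cannot be alive in one code path and dead in the other.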

MongoDB's background sweep is a different design suited to a different architecture. It is not wrong. It is simply not what PostgreSQL needs, and the honest version of this chapter says so.

Capped Collections — doc_create_capped and doc_remove_cap

What MongoDB Capped Collections Give You

A capped collection is fixed-size storage with insertion-order guarantees and automatic oldest-document eviction once the cap is reached. MongoDB preallocates the underlying files and maintains a tailable cursor so that consumers can stream new inserts in order. Classic uses are rolling application logs, ring buffers for events, and recent-activity feeds — the kind of data that wants to be kept, briefly, and then graciously forgotten. MongoDB itself has quietly nudged new workloads toward time-series collections since 5.0, but the capped pattern — keep the last N items and drop the rest — remains a clean fit for plenty of problems.

Capped Collection PostgreSQL — Trigger-Based Row Limits

doc_create_capped("logs", max_documents=10000) installs an AFTER INSERT statement-level trigger that enforces the cap by deleting the oldest rows whenever the row count exceeds the limit. An optional max_bytes argument gates on pg_total_relation_size instead, for byte-bounded caps.

The clever bit is in the first SELECT: pg_class.reltuples is the planner's row-count estimate, maintained by ANALYZE and autovacuum, and it costs nothing to read. We consult it first and only fall back to an exact count(*) when the estimate says we might be over.

Capped collection trigger
-- The bare text-to-timestamptz cast is only STABLE, and CREATE INDEX
-- rejects non-IMMUTABLE expressions, so the cast goes through a
-- wrapper. Declaring it IMMUTABLE is safe so long as timestamps are
-- stored as ISO 8601 UTC strings. (CREATE OR REPLACE makes repeating
-- the definition across sections harmless.)
CREATE OR REPLACE FUNCTION doc_ts(val text)
RETURNS timestamptz AS $$
  SELECT val::timestamptz
$$ LANGUAGE sql IMMUTABLE;

CREATE OR REPLACE FUNCTION logs_enforce_cap()
RETURNS TRIGGER AS $$
DECLARE
  est_rows bigint;
  excess   bigint;
BEGIN
  -- cheap O(1) gate before committing to an exact count; note that
  -- reltuples is -1 on a table that has never been vacuumed or
  -- analyzed, so the cap is not enforced until the first ANALYZE runs
  SELECT reltuples::bigint INTO est_rows
  FROM pg_class
  WHERE oid = 'logs'::regclass;

  IF est_rows > 10000 THEN
    SELECT count(*) INTO est_rows FROM logs;
    excess := est_rows - 10000;
    IF excess > 0 THEN
      DELETE FROM logs
      WHERE _id IN (
        SELECT _id FROM logs
        ORDER BY doc_ts(data->>'_ts') ASC
        LIMIT excess
      );
    END IF;
  END IF;

  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE INDEX idx_logs_ts
  ON logs (doc_ts(data->>'_ts'));

CREATE TRIGGER logs_cap_after_insert
AFTER INSERT ON logs
FOR EACH STATEMENT
EXECUTE FUNCTION logs_enforce_cap();

On a healthy table the estimate tracks reality closely enough to spare most unnecessary counts. The index on the ordering key is mandatory; without it, the ORDER BY in the delete subquery would sequentially scan the table on every batch of inserts, which is the sort of thing one notices at two in the morning and not before.
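The gate itself can be sketched in a few lines of Python, with the exact count injected as a callable so the cheap path stays visible (the function names are illustrative, not part of the Gold Lapel API):

```python
def rows_to_evict(estimate, cap, exact_count):
    """Consult the cheap estimate first; pay for an exact count only
    when the estimate says the cap might be exceeded."""
    if estimate <= cap:
        return 0               # cheap path: no count(*), no delete
    actual = exact_count()     # expensive path: the real count
    return max(actual - cap, 0)

calls = []
def counter():
    calls.append(1)
    return 10250

under = rows_to_evict(9000, 10000, counter)    # estimate under the cap
over = rows_to_evict(10100, 10000, counter)    # estimate trips the gate
```

Note that the exact count never runs on the cheap path; that is the entire economy of the design.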

Ask the estimate first. Ask the exact count only when the estimate hints you must.

When a Capped Collection Is the Right Answer

Capped collections are the right default for append-only logs with a bounded retention count — "keep the last 100,000 events" — rather than a bounded retention age, which is what TTL gives you. For rolling application logs, recent-activity feeds, debug ring buffers, and anything where "how many" matters more than "how old," the capped pattern is cleaner than TTL because the disk footprint is genuinely bounded regardless of write rate.
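As an in-process mental model for the semantics being promised, Python's collections.deque with a maxlen behaves the same way: keep the last N, silently evict the oldest. The trigger enforces exactly this contract at the table level.

```python
from collections import deque

# a cap of three "documents"; once full, each append evicts the oldest
recent = deque(maxlen=3)
for event in ["a", "b", "c", "d", "e"]:
    recent.append(event)
```

The analogy is only a model — the table version amortises eviction across statements rather than per insert — but the contract is the same.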

Now, a candid word on scale. For very high write rates — tens of thousands of inserts per second sustained — the count-and-delete pattern has real overhead even with the reltuples gate, because the delete itself holds locks and generates WAL. At that tier, partitioned tables with a scheduled DROP PARTITION give you the same rolling-window semantics without touching individual rows. A partition is dropped as a single catalog operation; individual rows are never disturbed. Chapter 15 revisits this trade-off in detail. Different instruments for different tempos.

The Pattern — All Three Use Triggers

Comparison Table — MongoDB Feature to PostgreSQL Primitive

MongoDB Feature     | Gold Lapel API                              | PostgreSQL Primitive                    | Fires
Change streams      | doc_watch / doc_unwatch                     | pg_notify + row trigger                 | AFTER INSERT OR UPDATE OR DELETE, row-level
TTL indexes         | doc_create_ttl_index / doc_remove_ttl_index | B-tree functional index + sweep trigger | BEFORE INSERT, statement-level
Capped collections  | doc_create_capped / doc_remove_cap          | B-tree index + row-count trigger        | AFTER INSERT, statement-level

Why This Architecture Holds Up

MongoDB implements these three features with three distinct subsystems: the replication oplog for change streams, the TTL monitor thread for expiry, and preallocated capped files with a tailing cursor for bounded collections. Each is well-engineered. Each is a separate thing to learn, operate, monitor, and tune.

PostgreSQL implements these three features with three compositions of one subsystem. The trigger system is the common substrate; pg_notify is a function you call from a trigger; functional B-tree indexes are the same indexes that serve every other query. Nothing in this chapter introduced a new component to the database. If you already understand how PostgreSQL triggers fire, you already understand change streams, TTL expiry, and capped collections on PostgreSQL.

This is the recurring argument across these books, now applied to a third domain: cache, search, and document operational features — all composed from primitives the database already ships. One set of mechanics, three features. Three features, one thing to learn.


Change streams, TTL, and capped collections close the operational parity story, and I should like to mark the moment before we move on. Twenty-one methods across seven languages now cover the MongoDB surface the majority of applications actually use, and every one of them resolves to SQL a DBA can read over morning coffee without a decoder ring. That is a good place to have arrived.

For thirteen chapters the ledger has been pointed one way — showing where PostgreSQL can match what MongoDB does. Chapter 14 turns the ledger around and names, politely and with receipts, what MongoDB by the shape of its design cannot match back. ACID transactions spanning a JSONB collection and a relational table in one BEGIN/COMMIT. Native SQL joins between documents and the rest of the schema, planned rather than pipelined. PostGIS, pgvector, and tsvector running against the same JSONB column in a single query plan. Row-level security that enforces itself on every SELECT forever. pg_dump capturing the whole world in one file. Six capabilities, each a consequence of the document rows living inside a relational database rather than next to one.

The parity half of the argument is complete. The next chapter makes the other half. If you'll permit me, I shall hold the door.