
wal2json

Every committed change, rendered as structured JSON. If you'll allow me, I find this one rather elegant.

Extension · March 21, 2026 · 8 min read

PostgreSQL already records every change it makes — that is what the write-ahead log is for. The question is whether anyone outside the household can read it. wal2json is a logical decoding output plugin that translates those WAL entries into JSON objects. When a row is inserted, updated, or deleted, wal2json emits the change as structured JSON that downstream consumers — CDC tools, event pipelines, audit systems — can parse without understanding PostgreSQL's internal binary format.

What wal2json does

PostgreSQL's write-ahead log records every data change the database makes — a meticulous ledger, maintained without being asked. Logical decoding is the mechanism that reads those WAL entries and translates them into a format that external systems can consume. wal2json is one such translator: it takes the raw logical decoding stream and outputs JSON.

The plugin sits between PostgreSQL's logical decoding infrastructure and whatever tool consumes the changes. You create a logical replication slot specifying wal2json as the output plugin, and from that point forward, every committed DML change (INSERT, UPDATE, DELETE, TRUNCATE) in the database is available as a JSON object from that slot.

wal2json supports two output formats. Format version 1 groups all changes within a transaction into a single JSON object — useful when you need transactional atomicity in the consumer. Format version 2 emits one JSON object per individual row change, with separate begin and commit markers — better for streaming because the consumer can process changes incrementally without buffering entire transactions in memory.

When to use wal2json

wal2json is specifically designed for change data capture. Use it when you need to get data changes out of PostgreSQL and into another system.

  • CDC pipelines — stream database changes to Kafka, Redis, or other message brokers for downstream processing
  • Data synchronization — replicate changes from PostgreSQL to a data warehouse, search index, or cache
  • Audit trails — capture a complete history of every row-level change with before and after values
  • Event sourcing — derive domain events from database mutations for event-driven architectures
  • Debugging and inspection — watch changes in real time during development with human-readable JSON output

If your CDC tool supports pgoutput (PostgreSQL's built-in logical replication protocol, available since PostgreSQL 10), that is often the more efficient choice. wal2json is valuable when you specifically need JSON output, when your tooling requires it, or when the human-readable format simplifies debugging.

Installation and setup

wal2json does not require shared_preload_libraries. It is loaded dynamically when a replication slot is created. The only server-level prerequisite is setting wal_level = logical in postgresql.conf, which does require a restart.

On self-managed PostgreSQL, install wal2json from source or via your distribution's package manager. If using the PGDG apt repository: apt install postgresql-17-wal2json. For yum: yum install wal2json_17. On most managed cloud providers, wal2json is pre-installed — you only need to enable logical replication.
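After installing, you can confirm the plugin is visible to the server before creating any slots by checking for the shared library in PostgreSQL's library directory — a quick sanity check, assuming pg_config is on your PATH:

```shell
# Locate the wal2json shared library (exact filename varies by platform)
ls "$(pg_config --pkglibdir)"/wal2json*
```

If nothing is listed, slot creation will fail with a "could not access file" error, which is the usual symptom of a missing or mis-installed plugin.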

postgresql.conf + SQL
# 1. Set wal_level in postgresql.conf (requires restart)
wal_level = logical
max_replication_slots = 10   # at least 1 per consumer
max_wal_senders = 10         # at least 1 per consumer

-- 2. Create a logical replication slot using wal2json
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'wal2json');

Once the slot is created, changes begin accumulating. Nothing is lost between the time the slot is created and the time a consumer connects — PostgreSQL retains WAL segments until the slot is read. A diligent record-keeper, though one that will fill the pantry with unsorted correspondence if nobody collects the post.

Consuming changes with pg_recvlogical

The simplest way to consume changes from a wal2json slot is pg_recvlogical, a command-line tool that ships with PostgreSQL. It connects to a replication slot and streams decoded changes to stdout or a file.

Shell
# Stream changes to stdout in format version 1 (one JSON object per transaction)
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o pretty-print=1 \
  -o include-timestamp=1 \
  -f -

# Stream changes in format version 2 (one JSON object per change)
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o format-version=2 \
  -o include-timestamp=1 \
  -f -

Options are passed via -o key=value flags. Common options include include-timestamp, include-xids, include-lsn, and pretty-print.
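pg_recvlogical can also create and drop the slot itself, rather than going through SQL — a sketch, assuming the same database and slot name as above:

```shell
# Create the slot (this is where wal2json is bound as the output plugin)
pg_recvlogical -d mydb --slot my_slot --create-slot --plugin=wal2json

# Drop the slot again when finished, so WAL stops accumulating
pg_recvlogical -d mydb --slot my_slot --drop-slot
```

Note that the plugin is fixed at slot creation; --start simply resumes streaming from whatever plugin the slot was created with.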

Output format: v1 vs v2

The two format versions serve different consumption patterns.

Format version 1

Groups all changes in a transaction into a single JSON object with a change array. The entire transaction is one message.

JSON (format-version 1)
{
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "orders",
      "columnnames": ["id", "customer_id", "total"],
      "columntypes": ["integer", "integer", "numeric"],
      "columnvalues": [1001, 42, 99.95]
    },
    {
      "kind": "update",
      "schema": "public",
      "table": "customers",
      "columnnames": ["id", "last_order_at"],
      "columntypes": ["integer", "timestamp with time zone"],
      "columnvalues": [42, "2026-03-21 14:30:00+00"],
      "oldkeys": {
        "keynames": ["id"],
        "keytypes": ["integer"],
        "keyvalues": [42]
      }
    }
  ]
}

Format version 2

Emits one JSON object per row change. Transactions are delimited by B (begin) and C (commit) action markers. Each change uses a columns array with name, type, and value for each field.

JSON (format-version 2)
{"action":"B"}
{"action":"I","schema":"public","table":"orders",
 "columns":[
   {"name":"id","type":"integer","value":1001},
   {"name":"customer_id","type":"integer","value":42},
   {"name":"total","type":"numeric","value":99.95}
 ]}
{"action":"U","schema":"public","table":"customers",
 "columns":[
   {"name":"id","type":"integer","value":42},
   {"name":"last_order_at","type":"timestamp with time zone","value":"2026-03-21 14:30:00+00"}
 ],
 "identity":[
   {"name":"id","type":"integer","value":42}
 ]}
{"action":"C"}

Version 2 is generally preferred for production CDC pipelines because it allows incremental processing and uses less memory on both the producer and consumer sides. Version 1 is convenient for debugging or when transactional grouping is needed by the consumer. I appreciate when a tool offers both — the choice between batched elegance and streaming pragmatism should belong to you, not the plugin author.
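When a version 2 consumer does need transactional atomicity, it can reassemble transactions itself from the B and C markers. A minimal sketch in Python — the group_transactions helper and the sample stream are illustrative, not part of wal2json:

```python
import json

def group_transactions(lines):
    """Group wal2json format-version 2 output lines into transactions.

    Each line is a standalone JSON object. "B" begins a transaction,
    "C" commits it; everything in between is a row change.
    """
    current = None
    for line in lines:
        msg = json.loads(line)
        action = msg["action"]
        if action == "B":
            current = []           # begin: start buffering changes
        elif action == "C":
            if current is not None:
                yield current      # commit: emit the buffered changes
            current = None
        elif current is not None:
            current.append(msg)    # I/U/D/T row change

# Sample stream, shaped like the format-version 2 output above
stream = [
    '{"action":"B"}',
    '{"action":"I","schema":"public","table":"orders",'
    '"columns":[{"name":"id","type":"integer","value":1001}]}',
    '{"action":"C"}',
]
for txn in group_transactions(stream):
    for change in txn:
        print(change["action"], change["table"])  # prints: I orders
```

The buffering cost this reintroduces is exactly what format version 2 lets you avoid when atomicity is not required — which is why the choice belongs with the consumer.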

Using the SQL API

In addition to pg_recvlogical, you can consume changes directly via SQL using pg_logical_slot_peek_changes (non-destructive) and pg_logical_slot_get_changes (advances the slot).

SQL
-- Peek at changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes(
  'my_slot', NULL, NULL,
  'format-version', '2',
  'include-timestamp', '1'
);

-- Consume changes (advances the slot position)
SELECT * FROM pg_logical_slot_get_changes(
  'my_slot', NULL, NULL,
  'format-version', '2',
  'include-timestamp', '1'
);

The SQL API is useful for testing and for applications that poll for changes on a schedule rather than maintaining a persistent streaming connection.
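The second and third arguments — the NULL, NULL in the examples above — are upto_lsn and upto_nchanges. Bounding the latter is a sensible way to cap batch size when polling; a sketch using the same slot name:

```sql
-- Consume at most ~100 changes per poll
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes(
  'my_slot', NULL, 100,
  'format-version', '2'
);
```

PostgreSQL may slightly overshoot the limit in order to finish decoding the current transaction, so treat it as a soft cap rather than an exact one.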

Table filtering

wal2json supports filtering at the table level, so you can capture changes from only the tables you care about.

Shell
# Only capture changes from specific tables
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o format-version=2 \
  -o add-tables=public.orders,public.customers \
  -f -

# Exclude specific tables
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o format-version=2 \
  -o filter-tables=public.audit_log,public.sessions \
  -f -

Table names are specified as schema.table pairs separated by commas. Use add-tables for an allow list or filter-tables for a deny list.
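The same filtering works through the SQL API, since each -o key=value pair maps to an option-name, value pair in the function call:

```sql
-- Peek at changes from public.orders only
SELECT data FROM pg_logical_slot_peek_changes(
  'my_slot', NULL, NULL,
  'format-version', '2',
  'add-tables', 'public.orders'
);
```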

Slot management

Replication slots retain WAL on the server until a consumer reads and acknowledges the changes. An unused or abandoned slot will cause WAL to accumulate indefinitely, eventually filling the disk. I have seen this particular oversight take down production databases that were otherwise running perfectly well. Always monitor and clean up slots you no longer need.

SQL
-- List active replication slots
SELECT slot_name, plugin, active, restart_lsn
FROM pg_replication_slots;

-- Drop a replication slot when no longer needed
SELECT pg_drop_replication_slot('my_slot');
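To spot the unsorted-correspondence problem before the pantry fills, measure how much WAL each slot is holding back — a monitoring query built from standard catalog functions:

```sql
-- WAL retained behind each slot; large, growing values deserve attention
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```

An inactive slot whose retained_wal keeps growing is the classic signature of an abandoned consumer.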

Cloud availability

  • Amazon RDS / Aurora — Available; enable via the rds.logical_replication parameter
  • Google Cloud SQL — Available; enable via the cloudsql.logical_decoding flag
  • Azure Database for PostgreSQL — Available; enable via azure.replication_support = logical
  • Supabase — Available; used internally for Realtime; enable logical replication in the dashboard
  • Neon — Available; pre-installed; enable logical replication in project settings

How Gold Lapel relates

I should be straightforward: Gold Lapel and wal2json address different concerns entirely. wal2json reads the write-ahead log after changes have been committed — it is a post-transaction tool for getting data out of PostgreSQL. Gold Lapel sits in front of PostgreSQL as a proxy, analyzing and optimizing queries before they reach the database. One watches what leaves the kitchen; the other attends to what arrives at the door.

There is no direct integration between the two, and I see no reason to pretend otherwise. If you are using wal2json for CDC, Gold Lapel will not interfere with your replication slots or logical decoding setup — the proxy layer is transparent to replication traffic. They coexist without conflict, each attending to its own duties.

Frequently asked questions