
wal2json

Every committed change, rendered as structured JSON. If you'll allow me, I find this one rather elegant.

Extension · March 21, 2026 · 8 min read

PostgreSQL already records every change it makes — that is what the write-ahead log is for. The question is whether anyone outside the household can read it. wal2json is a logical decoding output plugin that translates those WAL entries into JSON objects. When a row is inserted, updated, or deleted, wal2json emits the change as structured JSON that downstream consumers — CDC tools, event pipelines, audit systems — can parse without understanding PostgreSQL's internal binary format.

What wal2json does

PostgreSQL's write-ahead log records every data change the database makes — a meticulous ledger, maintained without being asked. Logical decoding is the mechanism that reads those WAL entries and translates them into a format that external systems can consume. wal2json is one such translator: it takes the raw logical decoding stream and outputs JSON.

The plugin sits between PostgreSQL's logical decoding infrastructure and whatever tool consumes the changes. You create a logical replication slot specifying wal2json as the output plugin, and from that point forward, every committed DML change (INSERT, UPDATE, DELETE, TRUNCATE) in the database is available as a JSON object from that slot.

wal2json supports two output formats. Format version 1 groups all changes within a transaction into a single JSON object — useful when you need transactional atomicity in the consumer. Format version 2 emits one JSON object per individual row change, with separate begin and commit markers — better for streaming because the consumer can process changes incrementally without buffering entire transactions in memory.

When to use wal2json

wal2json is specifically designed for change data capture. Use it when you need to get data changes out of PostgreSQL and into another system.

  • CDC pipelines — stream database changes to Kafka, Redis, or other message brokers for downstream processing
  • Data synchronization — replicate changes from PostgreSQL to a data warehouse, search index, or cache
  • Audit trails — capture a complete history of every row-level change with before and after values
  • Event sourcing — derive domain events from database mutations for event-driven architectures
  • Debugging and inspection — watch changes in real time during development with human-readable JSON output

If your CDC tool supports pgoutput (PostgreSQL's built-in logical replication protocol, available since PostgreSQL 10), that is often the more efficient choice. wal2json is valuable when you specifically need JSON output, when your tooling requires it, or when the human-readable format simplifies debugging.

Installation and setup

wal2json does not require shared_preload_libraries. It is loaded dynamically when a replication slot is created. The only server-level prerequisite is setting wal_level = logical in postgresql.conf, which does require a restart.

On self-managed PostgreSQL, install wal2json from source or via your distribution's package manager. If using the PGDG apt repository: apt install postgresql-17-wal2json. For yum: yum install wal2json_17. On most managed cloud providers, wal2json is pre-installed — you only need to enable logical replication.
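After installing, you can confirm the plugin is visible to the server before creating any slots by checking for the shared library in PostgreSQL's library directory — a quick sanity check, assuming pg_config is on your PATH:

```shell
# Locate the wal2json shared library (exact filename varies by platform)
ls "$(pg_config --pkglibdir)"/wal2json*
```

If nothing is listed, slot creation will fail with a "could not access file" error, which is the usual symptom of a missing or mis-installed plugin.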

postgresql.conf + SQL
# 1. Set wal_level in postgresql.conf (requires restart)
wal_level = logical
max_replication_slots = 10   # at least 1 per consumer
max_wal_senders = 10         # at least 1 per consumer

-- 2. Create a logical replication slot using wal2json
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'wal2json');

Once the slot is created, changes begin accumulating. Nothing is lost between the time the slot is created and the time a consumer connects — PostgreSQL retains WAL segments until the slot is read. A diligent record-keeper, though one that will fill the pantry with unsorted correspondence if nobody collects the post.

Consuming changes with pg_recvlogical

The simplest way to consume changes from a wal2json slot is pg_recvlogical, a command-line tool that ships with PostgreSQL. It connects to a replication slot and streams decoded changes to stdout or a file.

Shell
# Stream changes to stdout in format version 1 (one JSON object per transaction)
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o pretty-print=1 \
  -o include-timestamp=1 \
  -f -

# Stream changes in format version 2 (one JSON object per change)
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o format-version=2 \
  -o include-timestamp=1 \
  -f -

Options are passed via -o key=value flags. Common options include include-timestamp, include-xids, include-lsn, and pretty-print.
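pg_recvlogical can also create and drop the slot itself, rather than going through SQL — a sketch, assuming the same database and slot name as above:

```shell
# Create the slot (this is where wal2json is bound as the output plugin)
pg_recvlogical -d mydb --slot my_slot --create-slot --plugin=wal2json

# Drop the slot again when finished, so WAL stops accumulating
pg_recvlogical -d mydb --slot my_slot --drop-slot
```

Note that the plugin is fixed at slot creation; --start simply resumes streaming from whatever plugin the slot was created with.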

Output format: v1 vs v2

The two format versions serve different consumption patterns.

Format version 1

Groups all changes in a transaction into a single JSON object with a change array. The entire transaction is one message.

JSON (format-version 1)
{
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "orders",
      "columnnames": ["id", "customer_id", "total"],
      "columntypes": ["integer", "integer", "numeric"],
      "columnvalues": [1001, 42, 99.95]
    },
    {
      "kind": "update",
      "schema": "public",
      "table": "customers",
      "columnnames": ["id", "last_order_at"],
      "columntypes": ["integer", "timestamp with time zone"],
      "columnvalues": [42, "2026-03-21 14:30:00+00"],
      "oldkeys": {
        "keynames": ["id"],
        "keytypes": ["integer"],
        "keyvalues": [42]
      }
    }
  ]
}

Format version 2

Emits one JSON object per row change. Transactions are delimited by B (begin) and C (commit) action markers. Each change uses a columns array with name, type, and value for each field.

JSON (format-version 2)
{"action":"B"}
{"action":"I","schema":"public","table":"orders",
 "columns":[
   {"name":"id","type":"integer","value":1001},
   {"name":"customer_id","type":"integer","value":42},
   {"name":"total","type":"numeric","value":99.95}
 ]}
{"action":"U","schema":"public","table":"customers",
 "columns":[
   {"name":"id","type":"integer","value":42},
   {"name":"last_order_at","type":"timestamp with time zone","value":"2026-03-21 14:30:00+00"}
 ],
 "identity":[
   {"name":"id","type":"integer","value":42}
 ]}
{"action":"C"}

Version 2 is generally preferred for production CDC pipelines because it allows incremental processing and uses less memory on both the producer and consumer sides. Version 1 is convenient for debugging or when transactional grouping is needed by the consumer. I appreciate when a tool offers both — the choice between batched elegance and streaming pragmatism should belong to you, not the plugin author.
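When a version 2 consumer does need transactional atomicity, it can reassemble transactions itself from the B and C markers. A minimal sketch in Python — the group_transactions helper and the sample stream are illustrative, not part of wal2json:

```python
import json

def group_transactions(lines):
    """Group wal2json format-version 2 output lines into transactions.

    Each line is a standalone JSON object. "B" begins a transaction,
    "C" commits it; everything in between is a row change.
    """
    current = None
    for line in lines:
        msg = json.loads(line)
        action = msg["action"]
        if action == "B":
            current = []           # begin: start buffering changes
        elif action == "C":
            if current is not None:
                yield current      # commit: emit the buffered changes
            current = None
        elif current is not None:
            current.append(msg)    # I/U/D/T row change

# Sample stream, shaped like the format-version 2 output above
stream = [
    '{"action":"B"}',
    '{"action":"I","schema":"public","table":"orders",'
    '"columns":[{"name":"id","type":"integer","value":1001}]}',
    '{"action":"C"}',
]
for txn in group_transactions(stream):
    for change in txn:
        print(change["action"], change["table"])  # prints: I orders
```

The buffering cost this reintroduces is exactly what format version 2 lets you avoid when atomicity is not required — which is why the choice belongs with the consumer.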

Using the SQL API

In addition to pg_recvlogical, you can consume changes directly via SQL using pg_logical_slot_peek_changes (non-destructive) and pg_logical_slot_get_changes (advances the slot).

SQL
-- Peek at changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes(
  'my_slot', NULL, NULL,
  'format-version', '2',
  'include-timestamp', '1'
);

-- Consume changes (advances the slot position)
SELECT * FROM pg_logical_slot_get_changes(
  'my_slot', NULL, NULL,
  'format-version', '2',
  'include-timestamp', '1'
);

The SQL API is useful for testing and for applications that poll for changes on a schedule rather than maintaining a persistent streaming connection.
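The second and third arguments — the NULL, NULL in the examples above — are upto_lsn and upto_nchanges. Bounding the latter is a sensible way to cap batch size when polling; a sketch using the same slot name:

```sql
-- Consume at most ~100 changes per poll
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes(
  'my_slot', NULL, 100,
  'format-version', '2'
);
```

PostgreSQL may slightly overshoot the limit in order to finish decoding the current transaction, so treat it as a soft cap rather than an exact one.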

Table filtering

wal2json supports filtering at the table level, so you can capture changes from only the tables you care about.

Shell
# Only capture changes from specific tables
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o format-version=2 \
  -o add-tables=public.orders,public.customers \
  -f -

# Exclude specific tables
pg_recvlogical -d mydb --slot my_slot --start --plugin=wal2json \
  -o format-version=2 \
  -o filter-tables=public.audit_log,public.sessions \
  -f -

Table names are specified as schema.table pairs separated by commas. Use add-tables for an allow list or filter-tables for a deny list.
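The same filtering works through the SQL API, since each -o key=value pair maps to an option-name, value pair in the function call:

```sql
-- Peek at changes from public.orders only
SELECT data FROM pg_logical_slot_peek_changes(
  'my_slot', NULL, NULL,
  'format-version', '2',
  'add-tables', 'public.orders'
);
```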

Slot management

Replication slots retain WAL on the server until a consumer reads and acknowledges the changes. An unused or abandoned slot will cause WAL to accumulate indefinitely, eventually filling the disk. I have seen this particular oversight take down production databases that were otherwise running perfectly well. Always monitor and clean up slots you no longer need.

SQL
-- List active replication slots
SELECT slot_name, plugin, active, restart_lsn
FROM pg_replication_slots;

-- Drop a replication slot when no longer needed
SELECT pg_drop_replication_slot('my_slot');
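To spot the unsorted-correspondence problem before the pantry fills, measure how much WAL each slot is holding back — a monitoring query built from standard catalog functions:

```sql
-- WAL retained behind each slot; large, growing values deserve attention
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```

An inactive slot whose retained_wal keeps growing is the classic signature of an abandoned consumer.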

Cloud availability

  • Amazon RDS / Aurora — Available; enable via the rds.logical_replication parameter
  • Google Cloud SQL — Available; enable via the cloudsql.logical_decoding flag
  • Azure Database for PostgreSQL — Available; enable via azure.replication_support = logical
  • Supabase — Available; used internally for Realtime; enable logical replication in the dashboard
  • Neon — Available; pre-installed; enable logical replication in project settings

How Gold Lapel relates

I should be straightforward: Gold Lapel and wal2json address different concerns entirely. wal2json reads the write-ahead log after changes have been committed — it is a post-transaction tool for getting data out of PostgreSQL. Gold Lapel sits in front of PostgreSQL as a proxy, analyzing and optimizing queries before they reach the database. One watches what leaves the kitchen; the other attends to what arrives at the door.

There is no direct integration between the two, and I see no reason to pretend otherwise. If you are using wal2json for CDC, Gold Lapel will not interfere with your replication slots or logical decoding setup — the proxy layer is transparent to replication traffic. They coexist without conflict, each attending to its own duties.

Frequently asked questions