Symfony Messenger's PostgreSQL Transport: LISTEN/NOTIFY Performance and the Scaling Pitfalls
Each Messenger worker holds one persistent database connection. That connection cannot be pooled, cannot be shared, and cannot be released. Here is what that means at 40 workers — and what to do about it.
Good evening. Your message queue is hiding inside your database.
Symfony Messenger is one of the more thoughtfully designed queue components in PHP. It dispatches messages to transports, consumes them with workers, retries failures, and handles serialization with a degree of care that suggests someone on the Symfony team has operated a message queue at three in the morning and would prefer not to repeat the experience.
It supports Redis, RabbitMQ, Amazon SQS, and Doctrine — the usual suspects. Each of these transports carries its own infrastructure burden: a Redis server to manage, a RabbitMQ cluster to keep healthy, an AWS account to fund. For a team that already runs PostgreSQL, each additional service is another system to deploy, monitor, back up, and explain to the on-call engineer at 2 AM when it stops accepting connections.
But there is a fifth option that gets less attention: the native PostgreSQL transport. Not Doctrine's DBAL transport operating against a PostgreSQL database — though the two are related and frequently confused. The dedicated PostgreSQL transport that uses LISTEN/NOTIFY for real-time message delivery.
If you already run PostgreSQL, this transport is genuinely appealing. No additional infrastructure. No Redis to manage. No RabbitMQ cluster to keep healthy. Your messages live in the same database as your application data, with the same backup strategy, the same monitoring, the same connection credentials. One system. One operational surface.
It works well — until you scale it. And the problems that emerge are not bugs. They are not regressions or missing features. They are architectural constraints baked into PostgreSQL's connection model that become visible at exactly the wrong time: when you need more throughput and add more workers.
I should like to walk you through the full picture. How the transport works internally. Where the performance limits live. What EXPLAIN ANALYZE reveals about the polling query. Why DBAL versions matter more than they should. And what you can do about the connection problem before it becomes urgent.
If you will permit me, I shall be thorough.
How does the PostgreSQL transport store messages?
Whether you use the Doctrine transport or the native PostgreSQL transport, messages end up in the same place: a table. This is PostgreSQL's answer to the question every message queue must answer — where do messages wait between dispatch and consumption?
-- Symfony creates this table automatically (auto_setup=true)
-- or you can create it yourself:
CREATE TABLE messenger_messages (
    id BIGSERIAL PRIMARY KEY,
    body TEXT NOT NULL,
    headers TEXT NOT NULL,
    queue_name VARCHAR(190) NOT NULL,
    created_at TIMESTAMP(0) WITHOUT TIME ZONE NOT NULL,
    available_at TIMESTAMP(0) WITHOUT TIME ZONE NOT NULL,
    delivered_at TIMESTAMP(0) WITHOUT TIME ZONE DEFAULT NULL
);

CREATE INDEX idx_messenger_available ON messenger_messages
    (queue_name, available_at)
    WHERE delivered_at IS NULL;
-- This partial index is doing real work. It narrows the scan
-- to only undelivered messages, ordered by availability.
-- Without it, every poll query scans the full table.

The schema is straightforward, and I mean that as a compliment. Each message is a row. The body column holds the serialized PHP object — your SendInvoice or ResizeImage message, serialized through Symfony's serializer component. The headers column holds Messenger's metadata: stamps, routing information, retry count, the class name of the message. The queue_name column partitions messages across logical queues without needing separate tables.
The partial index on (queue_name, available_at) WHERE delivered_at IS NULL is the performance-critical piece. Every poll query filters on exactly these columns. Without the partial index, the query scans every row in the table, including the thousands — or hundreds of thousands — of already-delivered messages that are waiting for cleanup.
Allow me to demonstrate precisely how critical this index is.
-- EXPLAIN ANALYZE on the polling query with the partial index:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT m.* FROM messenger_messages m
WHERE m.queue_name = 'default'
AND m.available_at <= NOW()
AND m.delivered_at IS NULL
ORDER BY m.available_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- QUERY PLAN
-- ---------------------------------------------------------
-- Limit  (cost=0.28..4.30 rows=1 width=540)
--        (actual time=0.031..0.032 rows=1 loops=1)
--   ->  LockRows  (cost=0.28..4.30 rows=1 width=540)
--                 (actual time=0.030..0.031 rows=1 loops=1)
--         ->  Index Scan using idx_messenger_available
--               on messenger_messages m
--               (cost=0.28..4.29 rows=1 width=540)
--               (actual time=0.021..0.022 rows=1 loops=1)
--               Index Cond: ((queue_name = 'default')
--                            AND (available_at <= now()))
--               Filter: (delivered_at IS NULL)
-- Buffers: shared hit=4
-- Planning Time: 0.082 ms
-- Execution Time: 0.048 ms
--
-- 0.048 ms. Four buffer hits, all from shared memory.
-- This query could run 20,000 times per second
-- without inconveniencing anyone.

0.048 milliseconds. Four shared buffer hits, all served from memory. The partial index narrows the scan to only undelivered messages, and since LIMIT 1 stops at the first qualifying row, the query touches at most a handful of index entries. This query could run twenty thousand times per second on modest hardware. It is, if you will forgive the enthusiasm, rather efficient.
Now observe what happens without it.
-- The same query WITHOUT the partial index.
-- A cautionary tale.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT m.* FROM messenger_messages m
WHERE m.queue_name = 'default'
AND m.available_at <= NOW()
AND m.delivered_at IS NULL
ORDER BY m.available_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- QUERY PLAN
-- ---------------------------------------------------------
-- Limit  (cost=0.00..1847.33 rows=1 width=540)
--        (actual time=42.187..42.188 rows=1 loops=1)
--   ->  LockRows  (cost=0.00..1847.33 rows=1 width=540)
--                 (actual time=42.186..42.187 rows=1 loops=1)
--         ->  Sort  (cost=1847.32..1847.33 rows=1 width=540)
--                   (actual time=42.170..42.171 rows=1 loops=1)
--               Sort Key: available_at
--               Sort Method: top-N heapsort  Memory: 25kB
--               ->  Seq Scan on messenger_messages m
--                     (cost=0.00..1847.31 rows=1 width=540)
--                     (actual time=0.015..42.089 rows=1 loops=1)
--                     Filter: ((delivered_at IS NULL)
--                              AND (queue_name = 'default')
--                              AND (available_at <= now()))
--                     Rows Removed by Filter: 87429
-- Buffers: shared hit=1247
-- Planning Time: 0.094 ms
-- Execution Time: 42.221 ms
--
-- 42ms. Sequential scan across 87,429 rows.
-- 1,247 buffer hits. At 40 workers polling once per second,
-- that is 49,880 buffer hits per second -- for nothing.
-- The partial index reduces this by 99.7%.

42 milliseconds. A sequential scan across 87,429 rows. 1,247 shared buffer hits. The query must examine every row in the table, filter out the 87,428 already-delivered messages, sort the remaining one by available_at, and then lock it. At 40 workers polling once per second, that is 40 sequential scans per second — 49,880 buffer hits per second, and roughly 1.7 seconds of CPU time burned per wall-clock second. On a busy database, this is the kind of load that makes everything else slower without an obvious culprit in the slow query log, because each individual query is only 42ms.
The partial index reduces the work by 99.7%. If you take one thing from this section: verify the index exists. In production, create it in a migration and set auto_setup=false. The auto_setup=true default is convenient for development, but it means the index creation runs on first boot — potentially under load, potentially racing with other workers. A migration gives you control over when and how the index is created.
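In configuration terms, that advice looks like the following messenger.yaml fragment. The option names (auto_setup, use_notify, check_delayed_interval) are the Doctrine transport's documented DSN options; the transport name, connection name, and values here are illustrative:

```yaml
# config/packages/messenger.yaml (illustrative values)
framework:
    messenger:
        transports:
            async:
                dsn: 'doctrine://default?queue_name=default'
                options:
                    auto_setup: false             # table + index come from a migration
                    use_notify: true              # LISTEN/NOTIFY instead of pure polling
                    check_delayed_interval: 1000  # polling fallback, in milliseconds
```

With auto_setup disabled, the CREATE TABLE and CREATE INDEX statements above belong in a migration, where you control when they run.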
LISTEN/NOTIFY: how PostgreSQL delivers messages without polling
The difference between the Doctrine transport and the native PostgreSQL transport comes down to one feature: LISTEN/NOTIFY. It is the difference between checking your letterbox every hour and having the postman ring the doorbell.
-- How LISTEN/NOTIFY works in PostgreSQL:
-- Worker process (consumer) subscribes to a channel:
LISTEN messenger_default;
-- When a new message is inserted, a trigger fires:
NOTIFY messenger_default;
-- The worker receives the notification without polling.
-- No repeated SELECT queries. No wasted cycles.
-- The notification carries no payload in Symfony's case --
-- it is a "wake up and check the table" signal.
-- What the worker does when notified:
SELECT m.* FROM messenger_messages m
WHERE m.queue_name = 'default'
AND m.available_at <= NOW()
AND m.delivered_at IS NULL
ORDER BY m.available_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- FOR UPDATE SKIP LOCKED is the concurrency primitive.
-- Multiple workers can poll simultaneously.
-- Each gets a different message. No duplicates. No blocking.

Without LISTEN/NOTIFY, a Messenger worker polls the database on a fixed interval. Every second (or every 5 seconds, or every 60 seconds — depending on your configuration), the worker executes a SELECT query against the messenger_messages table. Most of the time, the query returns zero rows. The worker is burning a query cycle to learn that nothing happened. This is not a crisis — we have just seen that the polling query with the partial index runs in 0.048ms. But it is wasteful in the way that checking an empty letterbox is wasteful: the effort is small, the cumulative effect is real, and there is a better way.
With LISTEN/NOTIFY, the worker subscribes to a PostgreSQL notification channel and waits. When a new message is inserted, a trigger fires pg_notify(), and PostgreSQL pushes a notification to every listening connection. The worker wakes up immediately, fetches the message, and processes it. No polling. No wasted queries. No latency beyond the network round trip.
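You can watch the mechanism directly from two psql sessions, no Symfony required. The channel name below matches the examples in this article; note that psql only surfaces pending notifications the next time it talks to the server (for instance, after you press Enter on an empty line or run any statement):

```sql
-- Session 1 (the "worker"):
LISTEN messenger_default;

-- Session 2 (the "producer"):
NOTIFY messenger_default, '';

-- Session 1, on its next interaction with the server, prints
-- something like:
-- Asynchronous notification "messenger_default" received
--   from server process with PID 12345.
```

A real worker blocks on the socket rather than typing at a prompt, so it receives the notification immediately.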
-- Latency comparison: LISTEN/NOTIFY vs polling intervals.
--
-- Message dispatched at T=0. When does the worker pick it up?
--
-- Transport                 | Best case | Average  | Worst case
-- --------------------------+-----------+----------+-----------
-- LISTEN/NOTIFY             | ~1 ms     | ~5 ms    | ~50 ms
-- Polling @ 100ms interval  | ~1 ms     | ~50 ms   | ~100 ms
-- Polling @ 1s interval     | ~1 ms     | ~500 ms  | ~1000 ms
-- Polling @ 5s interval     | ~1 ms     | ~2500 ms | ~5000 ms
-- Polling @ 60s (default)   | ~1 ms     | ~30 s    | ~60 s
--
-- The best case is identical: the poll fires at the exact moment
-- the message arrives. But the average tells the real story.
-- LISTEN/NOTIFY delivers in 5ms on average.
-- The default Symfony polling interval: 30 seconds on average.
-- That is a 6,000x difference in typical delivery latency.

The numbers are plain. LISTEN/NOTIFY delivers in 5ms on average. The default Symfony polling interval delivers in 30 seconds on average. That is a 6,000x difference in typical delivery latency. For an invoice email, 30 seconds may be acceptable. For a real-time notification, a webhook callback, or an inventory sync that blocks checkout — it is not.
-- Symfony's PostgreSQL transport creates this trigger
-- when use_notify is enabled:
CREATE OR REPLACE FUNCTION notify_messenger_default()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify('messenger_default', '');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER messenger_default_notify
    AFTER INSERT ON messenger_messages
    FOR EACH ROW
    WHEN (NEW.queue_name = 'default')
    EXECUTE FUNCTION notify_messenger_default();
-- One trigger per queue_name. If you have three queues
-- (default, priority, bulk), you get three triggers
-- and three LISTEN channels.

One important detail that deserves emphasis: the notification carries no payload in Symfony's implementation. It does not contain the message body, the message ID, or even the queue name (though the channel name encodes the queue). The notification is purely a signal: "something was inserted, come check." The worker still executes the SELECT ... FOR UPDATE SKIP LOCKED query to actually retrieve the message.
This design is deliberate, and I approve of it. Keeping the notification payload-free means the trigger is simple, the notification is small (no serialization concerns, no size limits), and the actual message retrieval goes through the same transactional path regardless of whether it was triggered by a notification or by polling. The notification is a performance optimization, not a delivery mechanism. The table is always the source of truth.
When does NOTIFY actually fire? A subtlety about transactions.
There is a timing detail about NOTIFY that trips up even experienced PostgreSQL users, and it is particularly relevant for Symfony applications where message dispatch often occurs inside a Doctrine transaction.
-- A subtlety that trips up even experienced PostgreSQL users:
-- NOTIFY is delivered when the transaction COMMITS, not when
-- pg_notify() is called.
BEGIN;
INSERT INTO messenger_messages (body, headers, queue_name,
                                created_at, available_at)
VALUES ('{"class":"SendInvoice"}', '{}', 'default',
        NOW(), NOW());
-- The AFTER INSERT trigger fires pg_notify() here.
-- But the notification is NOT delivered yet.
-- ... more work happens in the same transaction ...
INSERT INTO audit_log (event, created_at)
VALUES ('invoice_dispatched', NOW());
COMMIT;
-- NOW the notification is delivered to all LISTENing connections.
-- This is correct behavior. If the transaction rolled back,
-- the message row would not exist, and waking a worker
-- to find nothing would be wasteful.
--
-- But it means that long-running transactions that dispatch
-- messages will delay notification delivery. A transaction
-- that takes 5 seconds to commit delays the NOTIFY by 5 seconds.

The trigger calls pg_notify() on INSERT. But PostgreSQL does not deliver the notification to listening connections until the enclosing transaction commits. This is correct behavior — if the transaction rolled back, the message row would not exist, and waking a worker to find nothing would be wasteful. But it means that message delivery latency includes the time between INSERT and COMMIT.
In most Symfony applications, this gap is negligible. The message dispatch happens inside a controller action or a service method, the Doctrine flush commits shortly after, and the notification fires. Perhaps 10-50ms of additional latency.
But consider a batch operation. A Symfony command that imports 10,000 products, dispatching a SyncInventory message for each one, all inside a single transaction. The 10,000 NOTIFY calls queue up inside the transaction. None of them fire until the final COMMIT. If the import takes 30 seconds, the first message dispatched waits 30 seconds for its notification. The last message dispatched gets notified almost immediately after commit.
This is not a bug. It is a consequence of PostgreSQL's transactional notification semantics, and it is the right behavior for data consistency. But if you are dispatching messages inside long-running transactions and wondering why delivery latency is higher than expected, this is the explanation. The solution is to break long-running operations into smaller transactions, or to dispatch messages outside the main transaction boundary using Symfony's dispatch-after-handler-completes pattern.
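For the batch-import case, breaking the work into smaller transactions can be sketched as chunked commits: each commit releases its queued NOTIFYs, so workers start processing while the import is still running. This is a sketch, not Symfony's code — it assumes Doctrine ORM's wrapInTransaction() (transactional() in older versions), and SyncInventory is the hypothetical message class from the example above:

```php
<?php
// $em is a Doctrine\ORM\EntityManagerInterface,
// $bus a Symfony\Component\Messenger\MessageBusInterface,
// $products the 10,000 items from the example above.
$batchSize = 500; // illustrative; tune to your workload

foreach (array_chunk($products, $batchSize) as $chunk) {
    $em->wrapInTransaction(function () use ($em, $bus, $chunk) {
        foreach ($chunk as $product) {
            $em->persist($product);
            // Dispatched inside the transaction: the NOTIFY queues up
            // and fires when this short transaction commits, not 30
            // seconds from now.
            $bus->dispatch(new SyncInventory($product->getSku()));
        }
    });
    $em->clear(); // keep the unit of work small between chunks
}
```

The first worker notification now arrives after the first chunk commits, rather than after the entire import.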
The connection problem: one worker, one persistent connection
Here is where the architectural constraint appears. I have been building toward this, and I am afraid the news is not entirely welcome.
PostgreSQL's LISTEN command subscribes the current connection to a notification channel. The subscription is tied to the session — the backend process that serves this particular connection. If the connection is returned to a pool, the subscription is lost. If the connection is handed to a different client, that client inherits the subscription — which it did not ask for and does not expect. If the connection is closed, the subscription evaporates.
This means every Messenger worker that uses LISTEN/NOTIFY must hold a dedicated, persistent, never-pooled connection to PostgreSQL for the entire lifetime of the worker process. One worker, one connection, no exceptions, no negotiation.
# Each Symfony Messenger worker holds one persistent connection.
# The connection must stay open to receive LISTEN notifications.
# Worker 1: LISTEN messenger_default (connection held open)
# Worker 2: LISTEN messenger_default (connection held open)
# Worker 3: LISTEN messenger_default (connection held open)
# Worker 4: LISTEN messenger_priority (connection held open)
# Worker 5: LISTEN messenger_bulk (connection held open)
# 5 workers = 5 persistent PostgreSQL connections.
# These connections cannot be returned to a pool.
# They cannot be shared. They cannot be time-sliced.
# Each one sits in pg_stat_activity, waiting.
# On a managed PostgreSQL instance with max_connections = 100:
# - 20 web server connections (PHP-FPM pool)
# - 5 worker connections (Messenger consumers)
# - 3 connections (cron jobs, admin tools, migrations)
# - 5 reserved (superuser_reserved_connections)
# = 33 connections used
#
# Comfortable. Now scale to 40 workers:
# - 20 web + 40 workers + 3 admin + 5 reserved = 68
#
# And the web traffic doubles:
# - 40 web + 40 workers + 3 admin + 5 reserved = 88
#
# You are now negotiating with max_connections.

At 5 workers, this is unremarkable. At 40 workers, you are consuming 40 of your PostgreSQL max_connections slots just for message listening — connections that spend 99.9% of their time idle, waiting for notifications that arrive perhaps a few times per second.
-- What each idle LISTEN connection actually costs PostgreSQL:
--
-- Component                         | Per connection
-- ----------------------------------+----------------
-- Backend process (RSS)             | ~5-10 MB
-- work_mem (per sort/hash op,       | up to 4 MB
--   only while a query executes)    |   (default)
-- temp_buffers (allocated lazily    | up to 8 MB
--   on first temp-table use)        |   (default)
-- Catalog cache (populated)         | ~1-3 MB
-- Lock table entries                | ~2 KB
-- pg_stat_activity row              | ~1 KB
-- Process scheduling overhead       | kernel-dependent
-- ----------------------------------+----------------
-- Worst case per idle connection    | ~10-20 MB
--
-- Note: work_mem and temp_buffers are limits, not
-- pre-allocations -- a truly idle backend sits near the
-- low end of the range.
--
-- 40 idle LISTEN connections: up to 400-800 MB of RAM
-- doing absolutely nothing except waiting for pg_notify.
--
-- That is RAM that could fund shared_buffers,
-- which would actually improve query performance.
SELECT count(*) AS backend_count,
       count(*) FILTER (WHERE state = 'idle') AS idle_count,
       count(*) FILTER (WHERE query LIKE '%LISTEN%') AS listen_count
FROM pg_stat_activity
WHERE datname = current_database();

-- backend_count | idle_count | listen_count
-- --------------+------------+--------------
--            53 |         42 |           40

Each of those idle connections still costs PostgreSQL resources. On the server side, every connection is a dedicated backend process. Each connection holds session state and a populated catalog cache, and carries its work_mem and temp_buffers budgets for whenever it does run queries. The pg_stat_activity view shows them sitting in state idle, wait_event ClientRead. They are doing nothing. They are costing something.
Forty idle LISTEN connections can consume up to 400-800 MB of RAM on the PostgreSQL server. That is RAM that is not available for shared_buffers, which is where PostgreSQL caches table and index data for all queries. You are, in effect, trading query performance for message delivery latency. The trade is often worth making. But you should make it knowingly.
I should be honest about the broader context: this is not a Symfony problem. It is not a PHP problem. It is a PostgreSQL LISTEN/NOTIFY problem. Any system that uses LISTEN for real-time notification delivery faces the same constraint: Redis Pub/Sub connections, RabbitMQ consumer connections, ZeroMQ subscriber sockets. The difference is that PostgreSQL connections are expensive. Redis connections are cheap — a Redis server handles 10,000 concurrent connections with minimal memory overhead. RabbitMQ connections multiplex over channels, allowing hundreds of logical consumers on a single TCP connection. PostgreSQL does not have that luxury. Each connection is a process. Each process costs memory. This is the fundamental architectural tension.
DBAL 3 vs DBAL 4: the compatibility minefield
If you have migrated a Symfony application between DBAL versions, you may have encountered this already. If you have not — and you are planning to upgrade — consider this fair warning from someone who has seen the aftermath.
# The DBAL compatibility issue -- a known pain point.
#
# Symfony's Doctrine transport relies on DBAL for database access.
# The PostgreSQL LISTEN/NOTIFY transport has had documented issues
# across DBAL versions:
#
# DBAL 3.x:
# - Used PDO under the hood
# - LISTEN/NOTIFY required the native pgsql extension
# - Switching between PDO and native for the same connection
# caused subtle connection state bugs
#
# DBAL 4.x:
# - Dropped the PDO abstraction for PostgreSQL
# - Native driver only (pgsql extension required)
# - Fixed some LISTEN issues, introduced others
# - Connection reset behavior changed
#
# The practical impact:
# - Workers occasionally miss notifications after reconnect
# - Idle connections may be killed by the server,
# and the LISTEN subscription is lost on reconnect
# - The transport falls back to polling when LISTEN fails
#
# This is why check_delayed_interval exists -- it is
# the polling fallback interval (in milliseconds).
# Even with LISTEN/NOTIFY enabled, Symfony polls
# at this interval as a safety net.

The core issue is that LISTEN/NOTIFY requires the native pgsql PHP extension's connection handle. Doctrine DBAL has historically abstracted over both PDO and native drivers, and the boundary between them is where things go wrong. It is the sort of abstraction leak that looks manageable in documentation and becomes harrowing in production.
In DBAL 3, the PostgreSQL transport sometimes needed to reach "underneath" the PDO layer to access the native connection for LISTEN. This worked, but it was fragile. Connection recycling, error recovery, and reconnection logic all had edge cases where the native handle and the PDO handle got out of sync. The Doctrine DBAL issue tracker contains several reports from Messenger users encountering silent notification failures after connection recovery events.
DBAL 4 simplified this by dropping PDO support for PostgreSQL entirely, requiring the native pgsql driver. This fixed the abstraction mismatch but changed connection lifecycle behavior. Workers that relied on specific reconnection patterns broke in subtle ways — the kind of subtle where the worker appears healthy in logs and monitoring but is not actually receiving notifications.
<?php
// The reconnection problem in practice.
//
// Scenario: a managed PostgreSQL setup kills idle connections
// after ~5 minutes -- via a proxy/NAT idle timeout, or
// PostgreSQL 14+'s idle_session_timeout. (Not
// idle_in_transaction_session_timeout: a waiting LISTEN
// connection is idle outside any transaction.)
// Your worker has been sitting idle for 6 minutes.
// What SHOULD happen:
// 1. pg_notify arrives on the channel
// 2. Worker wakes, finds connection dead
// 3. Worker reconnects
// 4. Worker re-issues LISTEN messenger_default
// 5. Worker fetches the message
// What ACTUALLY happens in some DBAL 3.x configurations:
// 1. pg_notify fires, but nobody is listening (connection dead)
// 2. Notification is discarded (PostgreSQL does not queue them)
// 3. Worker's internal socket read eventually times out or errors
// 4. DBAL catches the error, creates a new connection
// 5. New connection does NOT re-issue LISTEN
// 6. Worker is now connected but deaf to notifications
// 7. Worker falls back to polling at check_delayed_interval
// 8. If check_delayed_interval is 60s, messages wait up to 60s
// DBAL 4.x improved this, but the reconnect-and-resubscribe
// sequence still has edge cases. The --time-limit flag is
// the pragmatic fix: restart the worker periodically,
// and the fresh process gets a clean connection and LISTEN.

The reconnection scenario is particularly insidious because it occurs exactly when you would least expect it: during quiet periods. Your worker has been running for hours, processing messages promptly. Traffic dies down. The worker goes idle. After 5-10 minutes of inactivity, something between the worker and the database — a NAT gateway, a proxy, or an idle-session timeout on the managed instance (AWS RDS, Google Cloud SQL, Azure Database) — kills the idle connection. The worker's next attempt to read from the socket encounters a dead connection. DBAL catches the error, creates a new connection — but the new connection does not automatically re-issue the LISTEN command. The worker is now connected but deaf.
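There is a direct way to check for the connected-but-deaf state: PostgreSQL's pg_listening_channels() function lists the channels the current session is subscribed to. Because the subscription is per-session, this only tells you anything when run on the worker's own connection:

```sql
-- Run on the worker's own connection, not from a separate psql:
SELECT * FROM pg_listening_channels();

--  pg_listening_channels
-- -----------------------
--  messenger_default

-- An empty result after a reconnect means the connection is
-- healthy but no longer subscribed: connected, but deaf.
```

A worker health check that runs this query on the consumer's connection catches the failure mode that logs and uptime monitoring miss.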
DBAL 4 improved the reconnection handling, but edge cases remain. The --time-limit flag on the messenger:consume command is the pragmatic solution: restart the worker periodically, and the fresh process gets a clean connection with a proper LISTEN subscription. One hour (--time-limit=3600) is conventional. For critical queues where notification reliability matters, 15-30 minutes is safer.
The practical advice: if you use the PostgreSQL transport with LISTEN/NOTIFY, use DBAL 4 with the native pgsql driver. Set --time-limit on your workers. And always keep the polling fallback enabled — it is your safety net when LISTEN fails silently.
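In practice, that advice is a consume command along these lines. The flags are documented messenger:consume options; the transport name and limits are illustrative:

```shell
# Restart each worker hourly so the fresh process re-establishes
# its connection and its LISTEN subscription. Run this under a
# process supervisor (systemd, supervisord) that restarts it
# when it exits.
php bin/console messenger:consume async \
    --time-limit=3600 \
    --memory-limit=128M \
    --sleep=1
```

The supervisor is not optional: --time-limit makes the worker exit cleanly, and something must start it again.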
NOTIFY under load: what happens at 500 messages per second?
LISTEN/NOTIFY scales well on the PostgreSQL side. Notifications are delivered through shared memory, not written to WAL. The cost of a pg_notify() call is negligible compared to the INSERT that triggered it. If the INSERT costs 0.5ms, the NOTIFY adds perhaps 5 microseconds. At 500 messages per second, that is 2.5 milliseconds of total NOTIFY overhead per second. Negligible.
-- NOTIFY under heavy message throughput:
-- Scenario: 500 messages/second dispatched to the same queue.
-- Each INSERT fires the trigger. Each trigger calls pg_notify().
-- That is 500 NOTIFY calls per second.
-- PostgreSQL handles this efficiently -- notifications are
-- delivered through shared memory, not the WAL. They are
-- essentially free compared to the INSERT itself.
-- But there is a subtlety. If no worker is listening,
-- notifications are silently discarded. PostgreSQL does not
-- queue them. There is no replay. Miss it and it is gone.
-- This matters during:
-- 1. Worker restarts (brief window with no LISTEN)
-- 2. Connection drops (network blip, server restart)
-- 3. Worker deployment (rolling restart of all consumers)
-- In all three cases, messages are NOT lost -- they are still
-- in the messenger_messages table. But workers will not know
-- to check until the next polling interval fires.
-- With check_delayed_interval: 60000, that is up to 60 seconds
-- of latency after a missed notification.
-- Recommendation: set check_delayed_interval to 1000 (1 second)
-- if latency matters. The polling query is cheap because of
-- the partial index. One lightweight SELECT per second per
-- worker is negligible.

The concern is not throughput. The concern is reliability.
PostgreSQL notifications are fire-and-forget. There is no delivery guarantee beyond "the connection was listening at the moment the notification was delivered." There is no acknowledgment protocol. There is no replay buffer. There is no dead letter queue for missed notifications. If a worker is not listening at the moment the notification is sent — because it is restarting, because its connection dropped, because the network hiccupped, because the DBAL reconnection logic did not re-issue LISTEN — that notification is gone.
I want to be emphatic about what "gone" means and what it does not mean. The messages themselves are safe. They are rows in a table, protected by PostgreSQL's full transactional guarantees. ACID semantics. WAL-logged. Backed up. Nothing is lost. But the notification that would have told a worker to check for those messages is lost. The worker will not know to look until its next polling cycle.
-- Monitoring the notification queue.
-- The notification queue can grow to 8 GB before NOTIFY fails.
-- Under normal Messenger workloads, you will never approach this.
-- But if a consumer is slow to drain notifications, they accumulate.
SELECT pg_notification_queue_usage() AS queue_usage;

-- queue_usage
-- -------------
-- 0.0000 (healthy: essentially empty)

-- The function returns the fraction of the queue in use (0 to 1).
-- If this number climbs above 0.01 (1%), investigate.
-- Common causes:
-- 1. A connection is LISTENing but not reading notifications
-- (stuck worker, deadlocked handler)
-- 2. NOTIFY rate vastly exceeds consumer throughput
-- 3. A long-running transaction on a LISTENing connection
-- prevents notification delivery (they queue until
-- the transaction commits or rolls back)

This is why the check_delayed_interval configuration exists. It is not just for delayed messages, despite the name. It is the polling fallback that catches anything LISTEN/NOTIFY missed. The default of 60,000 milliseconds (one minute) is conservative — perhaps too conservative for applications where message latency matters. For those applications, 1000ms (one second) is a reasonable setting. The polling query is a single indexed SELECT returning in 0.048ms — its cost is measured in microseconds, not milliseconds. One lightweight query per second per worker is the kind of overhead that does not register on any monitoring dashboard.
At very high throughput (thousands of messages per second), you will also want to monitor PostgreSQL's notification queue size. The pg_notification_queue_usage() function returns the fraction of the 8 GB notification queue currently in use. In normal Messenger workloads, this will read 0.0000. If it climbs above 0.01, you likely have a listening connection that is not draining its notifications — a stuck worker, a deadlocked handler, or a long-running transaction on a listening connection that prevents notification delivery.
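When the usage figure does climb, pg_stat_activity can point at the backend that is not draining its notifications. A diagnostic sketch — the query-text filter mirrors the earlier monitoring query and is a heuristic, since a listening worker's last executed statement may be its SELECT rather than LISTEN:

```sql
-- Listening backends, how long each has been in its current state,
-- and whether any is stuck inside a transaction (an open
-- transaction on a listening connection blocks delivery to it):
SELECT pid,
       state,
       wait_event,
       now() - state_change AS in_state_for,
       now() - xact_start   AS in_transaction_for
FROM pg_stat_activity
WHERE datname = current_database()
  AND query ILIKE '%listen%'
ORDER BY state_change;
```

A backend with a large in_transaction_for and a non-null xact_start is the usual culprit.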
FOR UPDATE SKIP LOCKED: how 40 workers avoid duplicate processing
Multiple workers consuming from the same queue need a way to avoid processing the same message twice. This is the fundamental concurrency problem in any work queue, and Symfony Messenger solves it with one of PostgreSQL's most elegant concurrency primitives.
-- What happens when 40 workers compete for messages?
-- FOR UPDATE SKIP LOCKED at scale.
-- Scenario: 10 messages in the queue, 40 workers wake up
-- simultaneously after a batch insert fires NOTIFY.
-- Worker 1: SELECT ... FOR UPDATE SKIP LOCKED -> gets msg #1
-- Worker 2: SELECT ... FOR UPDATE SKIP LOCKED -> gets msg #2
-- Worker 3: SELECT ... FOR UPDATE SKIP LOCKED -> gets msg #3
-- ...
-- Worker 10: SELECT ... FOR UPDATE SKIP LOCKED -> gets msg #10
-- Worker 11: SELECT ... FOR UPDATE SKIP LOCKED -> 0 rows
-- ...
-- Worker 40: SELECT ... FOR UPDATE SKIP LOCKED -> 0 rows
-- 40 queries executed. 10 returned a message. 30 returned nothing.
-- Each query: ~0.05ms with the partial index.
-- Total database time: ~2ms. Negligible.
-- But consider: each of those 40 queries came on a separate
-- connection. That is 40 round trips. On a local connection,
-- each round trip is ~0.1ms. Over a network (separate host),
-- each round trip is 0.5-2ms.
-- 40 workers x 2ms network round trip = 80ms of wall time
-- just in network overhead, to distribute 10 messages.

FOR UPDATE SKIP LOCKED is PostgreSQL's built-in solution for the job queue pattern. When Worker A locks a row with FOR UPDATE, Worker B's query would normally block, waiting for Worker A's transaction to complete. With SKIP LOCKED, Worker B skips the locked row entirely and grabs the next available message. No blocking. No retries. No duplicate processing. No race conditions.
The behavior under contention is important to understand. When a batch of messages arrives and NOTIFY wakes 40 workers simultaneously, all 40 execute the same SELECT query. The first worker locks row #1. The second worker skips row #1 (locked), takes row #2. The third skips #1 and #2, takes #3. This cascades until either every message is claimed or workers run out of available messages and get zero rows.
The database cost of this contention is remarkably low. Each SKIP LOCKED check is a lock table lookup, not a disk operation. With the partial index, each query touches only the handful of index entries pointing to undelivered messages. Even when 40 queries execute nearly simultaneously, the total database time is approximately 2ms — 40 queries at 0.05ms each. The contention is resolved in the lock manager, not in I/O.
The real cost is network round trips. Each of those 40 queries arrives on a separate connection, requiring a separate round trip. On a local connection (Unix socket or localhost TCP), each round trip is 0.1ms — negligible. Over a network (database on a separate host, which is the norm in cloud deployments), each round trip is 0.5-2ms. Forty workers times 2ms is 80ms of wall time in network overhead to distribute 10 messages. Still fast. But it is the network, not the database, that bounds the distribution speed.
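The back-of-envelope numbers can be reproduced directly. All figures here are the estimates quoted above (0.05ms per query, 0.1ms local and 2ms remote round trips); substitute your own measurements:

```python
# Wall-time cost of waking N workers to distribute a batch of messages.
# Database time scales with per-query cost; wall time is bounded by the
# per-connection network round trip.

def distribution_cost_ms(workers, db_time_per_query_ms, round_trip_ms):
    db_time = workers * db_time_per_query_ms
    network_time = workers * round_trip_ms
    return db_time, network_time

db_ms, net_local = distribution_cost_ms(40, 0.05, 0.1)   # local socket
_, net_remote = distribution_cost_ms(40, 0.05, 2.0)      # separate host

print(f"database time:  {db_ms} ms")      # 2.0 ms -- negligible
print(f"local network:  {net_local} ms")  # 4.0 ms
print(f"remote network: {net_remote} ms") # 80.0 ms -- the real bound
```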
Advisory locks: the coordination layer you did not know was there
FOR UPDATE SKIP LOCKED handles message-level contention. But Symfony Messenger uses a second concurrency primitive for operations that should not run on every worker simultaneously.
-- Advisory locks: the other concurrency mechanism
-- FOR UPDATE SKIP LOCKED handles row-level contention.
-- But Symfony also uses advisory locks for certain operations:
-- 1. Delayed message redelivery (checking available_at)
SELECT pg_try_advisory_lock(
hashtext('symfony_messenger_default')
);
-- Only one worker per queue acquires this lock.
-- That worker is responsible for moving delayed messages
-- from "not yet available" to "available" status.
-- 2. Table setup (auto_setup=true)
SELECT pg_try_advisory_lock(
hashtext('symfony_messenger_setup')
);
-- Prevents multiple workers from racing to create
-- the messenger_messages table on first boot.
-- Advisory locks do not block -- pg_try_advisory_lock returns
-- false immediately if the lock is held. But each lock attempt
-- is a round trip to PostgreSQL. With 40 workers checking
-- every second, that is 40 advisory lock queries per second
-- on top of the 40 message-fetch queries.
-- These locks are session-level by default.
-- If the connection drops, the lock is released.
-- If the connection is pooled (PgBouncer transaction mode),
-- advisory locks break -- they are bound to the backend
-- connection, not the transaction.
Advisory locks serve a different purpose from row locks. They coordinate singleton operations — things that only one worker should do at a time, regardless of how many workers are running.
The delayed message check is the primary use case. Symfony's PostgreSQL transport supports delayed messages: messages dispatched with a DelayStamp that sets available_at to some future time. These messages sit in the table, invisible to the normal fetch query (which filters on available_at <= NOW()), until their time arrives. One worker per queue acquires the advisory lock and checks for messages whose available_at has passed. This prevents 40 workers from all running the same delayed-message scan simultaneously — a scan that, unlike the normal fetch query, may need to examine multiple rows.
The scaling concern with advisory locks is minor but worth knowing. Each pg_try_advisory_lock() call is a round trip to PostgreSQL. The lock check itself is nearly instantaneous — it is a hashtable lookup in shared memory. But with 40 workers, each checking once per polling interval, that is 40 additional round trips per second on top of the 40 message-fetch queries. Not expensive individually. Not negligible at scale.
A more serious concern: advisory locks are session-level. They are bound to the PostgreSQL backend connection, not to the transaction. If you run workers behind a connection pooler in transaction mode (PgBouncer, for example), the advisory lock is acquired on one backend connection, but the next query might execute on a different backend connection. The lock appears to be held by "your session," but your session is now on a different backend. The lock is orphaned on the original backend — held by no one who wants it, blocking everyone who needs it. This is a well-documented incompatibility between session-level advisory locks and transaction-mode pooling. It applies to Messenger workers just as it applies to any other advisory lock user.
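The orphaning failure mode can be sketched as a toy model. The Backend class and lock names below are illustrative, not PgBouncer or PostgreSQL internals; the point is that the lock lives on a backend connection the client no longer occupies:

```python
# Toy model of session-level advisory locks behind transaction pooling.
# Each lock is held by a *backend* connection. In transaction mode the
# pooler may route a client's next transaction to a different backend,
# so the lock is stranded on the old one.

class Backend:
    def __init__(self, name):
        self.name = name
        self.held_locks = set()

    def try_advisory_lock(self, key, all_backends):
        # pg_try_advisory_lock: returns False immediately if held anywhere
        if any(key in b.held_locks for b in all_backends):
            return False
        self.held_locks.add(key)
        return True

backends = [Backend("backend_1"), Backend("backend_2")]

# Client's first transaction lands on backend_1 and takes the lock.
assert backends[0].try_advisory_lock("delayed_check", backends)

# Transaction ends; the pooler routes the client's next transaction to
# backend_2. The client believes it holds the lock -- backend_2 does not.
client_now_on = backends[1]
print("delayed_check" in client_now_on.held_locks)  # False: orphaned

# And no session can reacquire it while backend_1 keeps the lock:
print(client_now_on.try_advisory_lock("delayed_check", backends))  # False
```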
Advisory lock patterns, row-level lock contention, and the interaction between locks and connection pooling are all covered in the PostgreSQL lock contention guide. It may prove useful if your workers are competing for the same rows.
"LISTEN/NOTIFY provides real-time event notification between database sessions. A trigger fires on INSERT, sends a notification, and a listening process receives it within milliseconds. No Redis pub/sub required. No RabbitMQ required."
— from You Don't Need Redis, Chapter 19: The Case for Simplicity
Transport comparison: PostgreSQL vs the alternatives
I should be forthcoming about the alternatives, because pretending they do not exist would be a disservice to you and an embarrassment to me. The PostgreSQL transport is not the right choice for every application. Here is an honest comparison.
| Transport | Latency | Throughput | Connections | Setup | Notes |
|---|---|---|---|---|---|
| PostgreSQL (LISTEN/NOTIFY) | < 50 ms | ~2,000 msg/s | 1 per worker (persistent) | Built-in | Best for existing Postgres apps |
| PostgreSQL (polling only) | 0.5-60 s | ~2,000 msg/s | Poolable | Built-in | Simpler, higher latency |
| Redis | < 10 ms | ~50,000 msg/s | 1 per worker | Requires Redis | Fastest, no persistence guarantee |
| RabbitMQ (AMQP) | < 10 ms | ~20,000 msg/s | 1 per worker | Requires RabbitMQ | Best for complex routing |
| Amazon SQS | 100-500 ms | Virtually unlimited | HTTP (stateless) | AWS account | Best for serverless/cloud |
The PostgreSQL LISTEN/NOTIFY transport wins on operational simplicity. No additional infrastructure to deploy, monitor, back up, or secure. For applications processing fewer than 1,000 messages per second — which describes most Symfony applications — the throughput is more than sufficient. The latency is excellent with LISTEN/NOTIFY enabled.
The PostgreSQL polling-only transport (Doctrine transport without use_notify) trades latency for connection flexibility. Without LISTEN, worker connections can be pooled normally through PgBouncer in transaction mode. If you can tolerate 1-5 seconds of message delivery latency, this is the operationally simpler option. You lose the doorbell but keep the postman.
Redis wins on raw throughput and latency, and I shall not pretend otherwise. If you already run Redis for caching — and most Symfony applications do — the marginal operational cost is low. Redis Streams (the recommended transport for Symfony 6.3+) provide consumer groups with at-least-once delivery guarantees, addressing the persistence concern that made earlier Redis transports (Pub/Sub-based) unsuitable for critical messages. The trade-off is that your message data now lives in a separate system from your application data, with a separate failure mode, a separate backup strategy, and a separate monitoring dashboard. For teams that already operate Redis confidently, this is manageable. For teams that are adding Redis solely for message queuing, it is additional complexity.
RabbitMQ wins on routing sophistication. If you need topic exchanges, dead letter queues with custom routing policies, priority queues with multiple priority levels, or the ability to inspect and replay messages through a management UI, RabbitMQ's AMQP protocol supports these natively. Symfony's PostgreSQL transport can approximate some of these patterns — multiple queues, basic retry with delay — but not all. If your messaging topology is complex, RabbitMQ earns its operational cost.
Amazon SQS wins on scale-to-zero and managed operations. No server to manage. No connections to count. Virtually unlimited throughput. The latency is higher (100-500ms) because SQS is an HTTP API, not a persistent connection. For serverless deployments, Lambda-triggered processing, or applications where operational simplicity outweighs latency requirements, SQS is difficult to argue against. You are paying Amazon to worry about message durability so you do not have to.
My honest assessment: the PostgreSQL transport is the right choice for applications that already run PostgreSQL, process fewer than 2,000 messages per second, value operational simplicity over raw throughput, and have a team that would rather master one system deeply than operate three systems adequately.
Designing for multiple queues: when to use LISTEN and when to poll
Most production Symfony applications do not run a single queue. They run two, three, or five — separating work by priority, latency requirement, or resource consumption. The PostgreSQL transport handles multiple queues through the queue_name column, and you can configure each queue independently.
# config/packages/messenger.yaml -- multiple queues
framework:
    messenger:
        transports:
            async_default:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    use_notify: true
                    check_delayed_interval: 1000
                    queue_name: default
            async_priority:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    use_notify: true
                    check_delayed_interval: 1000
                    queue_name: priority
            async_bulk:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    use_notify: false
                    check_delayed_interval: 5000
                    queue_name: bulk
        routing:
            'App\Message\SendInvoice': async_priority
            'App\Message\SendWelcomeEmail': async_default
            'App\Message\ResizeImage': async_bulk
            'App\Message\SyncInventory': async_bulk
            'App\Message\GenerateReport': async_bulk
The insight here is that not every queue needs LISTEN/NOTIFY. A priority queue that handles payment webhooks needs sub-second delivery. A bulk queue that resizes uploaded images can tolerate 5-second polling intervals. By enabling use_notify only on latency-sensitive queues and using polling for bulk queues, you reduce the number of persistent LISTEN connections without sacrificing performance where it matters.
# Supervisor configuration for three queues
# /etc/supervisor/conf.d/messenger-workers.conf
[program:messenger-priority]
command=php /var/www/app/bin/console messenger:consume async_priority --time-limit=3600 --memory-limit=128M
numprocs=2
process_name=%(program_name)s_%(process_num_02)d
autostart=true
autorestart=true
[program:messenger-default]
command=php /var/www/app/bin/console messenger:consume async_default --time-limit=3600 --memory-limit=128M
numprocs=4
process_name=%(program_name)s_%(process_num_02)d
autostart=true
autorestart=true
[program:messenger-bulk]
command=php /var/www/app/bin/console messenger:consume async_bulk --time-limit=3600 --memory-limit=256M
numprocs=8
process_name=%(program_name)s_%(process_num_02)d
autostart=true
autorestart=true
# Connection accounting:
# priority: 2 workers x 1 LISTEN conn = 2 persistent
# default: 4 workers x 1 LISTEN conn = 4 persistent
# bulk: 8 workers x 0 LISTEN conn = 0 persistent (polling only)
# Total LISTEN connections: 6
# Total workers: 14
#
# The bulk queue uses polling only (use_notify: false)
# because bulk jobs tolerate 5-second latency
# and those 8 workers do not need persistent connections.
# That is 8 connections saved.
In this configuration, the priority queue has 2 workers with LISTEN/NOTIFY — 2 persistent connections, instant delivery. The default queue has 4 workers with LISTEN/NOTIFY — 4 persistent connections, instant delivery. The bulk queue has 8 workers with polling only — 0 persistent LISTEN connections, 5-second maximum latency. The total persistent connection count drops from 14 (if all queues used LISTEN) to 6.
This is the kind of optimization that costs nothing to implement and saves 8 persistent connections. Eight connections may not sound like much, but on a managed PostgreSQL instance where max_connections is capped at 100 or 200, every connection matters. And the bulk queue workers — which are doing CPU-intensive work like image resizing or PDF generation — do not benefit from instant notification delivery anyway. Their bottleneck is handler execution time, not message fetch latency.
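The accounting is simple enough to script. A sketch of the arithmetic for the three-queue layout above (queue names and worker counts taken from the supervisor configuration):

```python
# Persistent LISTEN connection accounting: a worker holds a persistent
# connection only when its transport has use_notify enabled.
# Polling-only workers can be pooled.

def listen_connections(queues):
    """Sum workers on queues where use_notify is enabled."""
    return sum(workers for workers, use_notify in queues.values() if use_notify)

queues = {
    "priority": (2, True),   # 2 workers, LISTEN/NOTIFY
    "default":  (4, True),   # 4 workers, LISTEN/NOTIFY
    "bulk":     (8, False),  # 8 workers, polling only
}

total_workers = sum(w for w, _ in queues.values())
persistent = listen_connections(queues)
all_listen = listen_connections({q: (w, True) for q, (w, _) in queues.items()})

print(total_workers)            # 14 workers
print(persistent)               # 6 persistent LISTEN connections
print(all_listen - persistent)  # 8 connections saved vs all-LISTEN
```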
Worker deployment: supervisor, restarts, and the LISTEN gap
Messenger workers are long-running PHP processes. In production, they run under a process supervisor — typically Supervisor, systemd, or container orchestration. This is not optional. A PHP process that exits silently at 3 AM and is not restarted is a queue that stops processing at 3 AM.
# Typical Symfony Messenger supervisor configuration
# /etc/supervisor/conf.d/messenger-worker.conf
[program:messenger-default]
command=php /var/www/app/bin/console messenger:consume async --time-limit=3600 --memory-limit=128M
autostart=true
autorestart=true
numprocs=4
process_name=%(program_name)s_%(process_num_02)d
startsecs=0
redirect_stderr=true
stdout_logfile=/var/log/supervisor/messenger-default.log
# 4 workers for the "async" transport.
# Each holds one persistent database connection.
# --time-limit=3600: worker restarts every hour
# (releases memory, refreshes connection state)
# --memory-limit=128M: safety net for leaky handlers
#
# During the restart window (~1-2 seconds per worker),
# that worker misses LISTEN notifications.
# The other 3 workers continue receiving normally.
# Staggered restarts (supervisor's default) help here.
The --time-limit=3600 flag tells the worker to gracefully exit after one hour. Supervisor immediately restarts it. This pattern serves three purposes, and all three are important enough to enumerate.
- Memory management. PHP was not designed for processes that run for days. Long-running workers accumulate memory from handlers that do not fully clean up — Doctrine's identity map, logged exceptions, cached service instances. The periodic restart is a pragmatic solution that acknowledges PHP's memory model without fighting it. --memory-limit=128M provides a safety net for particularly leaky handlers.
- Connection health. Database connections go stale. Firewalls kill idle connections. Managed PostgreSQL instances enforce idle connection timeouts. The worker's LISTEN subscription may silently stop receiving after a connection recovery event (see the DBAL section above). A restart refreshes everything: new process, new connection, new LISTEN subscription, clean state.
- Code deployment. When you deploy new code, workers need to restart to pick up the changes. The --time-limit ensures that even without a manual restart signal (like messenger:stop-workers), workers cycle within the configured interval. During deployment, you can signal an immediate graceful stop. Between deployments, the time limit handles the rest.
During the restart window — typically 1-2 seconds per worker — that worker is not listening. If a message arrives during that window, the NOTIFY is delivered to the remaining workers on the same queue. This is why running multiple workers per queue matters: if one worker restarts, the others continue receiving notifications. If all workers restart simultaneously — a deployment, a Supervisor restart, a container orchestration rollout — there is a brief period with no listeners. Messages are not lost, but they wait until a worker comes back and either receives a notification or polls.
Staggered restarts help. Supervisor restarts processes sequentially by default. Container orchestration systems support rolling deployments with configurable maxUnavailable. The goal is to always have at least one worker listening per queue during any operational event. This is achievable with standard deployment tooling. It simply requires awareness.
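Whether a restart schedule ever leaves a queue with zero listeners can be checked with a timeline sketch. This toy model (restart duration and times are illustrative) contrasts staggered and simultaneous restarts of four workers:

```python
# Rolling vs simultaneous restarts: is at least one listener always up?
# Each worker is down for `restart_s` seconds starting at its scheduled
# restart time. One queue, 1-second timeline steps.

def min_listeners(restart_times, restart_s=2, horizon=12):
    """Minimum number of listening workers over the timeline."""
    counts = []
    for t in range(horizon):
        down = sum(1 for r in restart_times if r <= t < r + restart_s)
        counts.append(len(restart_times) - down)
    return min(counts)

# 4 workers restarted one after another (staggered, supervisor-style):
print(min_listeners([0, 3, 6, 9]))   # 3 -- never fewer than 3 listening

# 4 workers restarted at once (naive deployment):
print(min_listeners([0, 0, 0, 0]))   # 0 -- a window with no listeners
```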
Message handler patterns that help (and patterns that hurt)
The transport configuration determines how messages are delivered. But the message handler determines how long the worker's connections are held. A poorly written handler can negate every transport optimization.
<?php
// Message handler best practices for the PostgreSQL transport.

// GOOD: Fast handler, commits quickly, releases resources.
#[AsMessageHandler]
class SendInvoiceHandler
{
    public function __construct(
        private readonly InvoiceMailer $mailer,
        private readonly EntityManagerInterface $em,
    ) {}

    public function __invoke(SendInvoice $message): void
    {
        $invoice = $this->em->find(
            Invoice::class, $message->invoiceId
        );
        if (!$invoice) {
            return; // Message references deleted entity.
        }
        $this->mailer->send($invoice);
        $invoice->setEmailSentAt(new \DateTimeImmutable());
        $this->em->flush();
        // Handler completes. Transaction commits.
        // Total time: ~50-200ms (dominated by the email send).
    }
}

// BAD: Long-running handler that holds a connection for minutes.
#[AsMessageHandler]
class GenerateMonthlyReportHandler
{
    public function __invoke(
        GenerateMonthlyReport $message
    ): void {
        // Queries 500,000 rows, aggregates in PHP,
        // generates a PDF, uploads to S3, updates the DB.
        // Total time: 45-120 seconds.
        //
        // During those 45-120 seconds, the worker's query
        // connection is HELD for the entire duration if
        // EntityManager wraps the queries in an implicit transaction.
        //
        // Solution: break the work into stages.
        // Dispatch sub-messages. Or put this handler on a
        // polling-only transport so its long execution does
        // not hold a LISTEN connection.
    }
}
The ideal handler is fast: fetch data, do work, update state, return. 50-500ms. The worker's LISTEN connection remains idle throughout — it is only used for notifications, not for handler work. The handler's database queries go through a separate connection (Doctrine's EntityManager connection), which can be pooled normally.
The problematic handler is slow: long-running queries, external API calls with timeouts, file processing that takes minutes. While the handler runs, the worker cannot process other messages from the queue. If the handler's database queries run inside an implicit Doctrine transaction, the query connection is held for the full duration. At 40 workers with 30-second handlers, you have 40 connections occupied for most of every 30-second cycle.
The solution for slow handlers is architectural, not configurational. Break large jobs into stages using message chaining: the first handler does the quick work and dispatches a follow-up message for the slow work. Or assign slow handlers to a dedicated transport with polling-only connections, so they do not consume LISTEN connections that could serve faster handlers.
One pattern I see frequently in Symfony applications deserves specific mention: handlers that call external HTTP APIs synchronously. A handler that sends an email through an API, waits for the response, then updates the database. If the email API is slow (500ms-2s), the worker is idle — waiting on network I/O, holding a database connection it is not using. For this pattern, consider using Symfony's HTTP client with retries and timeouts, and separate the "call API" step from the "update database" step with a follow-up message.
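The split can be sketched language-agnostically. Python here for brevity; the message classes, bus, and database are hypothetical stand-ins for the Symfony equivalents, not real Messenger APIs:

```python
# Sketch: separate the slow "call API" step from the fast "update DB"
# step with a follow-up message, so the slow network I/O never runs
# inside a database transaction.

from dataclasses import dataclass

bus = []  # stand-in for the message bus
db = {}   # stand-in for the database

@dataclass
class SendEmail:
    invoice_id: int

@dataclass
class RecordEmailSent:
    invoice_id: int
    provider_id: str

def handle_send_email(msg):
    # Slow network I/O happens here, with NO open DB transaction.
    provider_id = f"prov-{msg.invoice_id}"  # pretend API response id
    bus.append(RecordEmailSent(msg.invoice_id, provider_id))

def handle_record_email_sent(msg):
    # Fast DB write; a transaction is open only for this short step.
    db[msg.invoice_id] = msg.provider_id

handle_send_email(SendEmail(invoice_id=42))
handle_record_email_sent(bus.pop())
print(db)  # {42: 'prov-42'}
```

Each handler now holds its resources for only as long as its own step requires: the first holds no connection while waiting on the API, and the second commits in milliseconds.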
Table maintenance: the messenger_messages table grows without bound
This is the section that people discover six months after deploying Messenger to production. The table is growing. Autovacuum is running more frequently. Disk usage is climbing. Nobody set up pruning because the table was small when they deployed and the documentation mentions it only briefly.
-- The messenger_messages table grows without bound.
-- Processed messages are marked (delivered_at IS NOT NULL)
-- but not deleted. Over time, the table bloats.
-- Check current table size and row counts:
SELECT
pg_size_pretty(pg_total_relation_size('messenger_messages'))
AS total_size,
pg_size_pretty(pg_relation_size('messenger_messages'))
AS table_size,
pg_size_pretty(
pg_total_relation_size('messenger_messages')
- pg_relation_size('messenger_messages')
) AS index_size,
count(*) AS total_rows,
count(*) FILTER (WHERE delivered_at IS NULL) AS pending,
count(*) FILTER (WHERE delivered_at IS NOT NULL) AS processed
FROM messenger_messages;
-- total_size | table_size | index_size | total_rows | pending | processed
-- -----------+------------+------------+------------+---------+----------
-- 847 MB | 720 MB | 127 MB | 1247831 | 23 | 1247808
-- 1.2 million processed rows consuming 720 MB of table space.
-- The partial index is still fast (it only covers the 23 pending rows).
-- But autovacuum is working overtime on 720 MB of dead tuples.
-- Symfony provides:
-- php bin/console messenger:prune-messages --older-than=7days
-- Or, if you prefer SQL:
DELETE FROM messenger_messages
WHERE delivered_at IS NOT NULL
AND delivered_at < NOW() - INTERVAL '7 days';
-- After pruning, consider a manual VACUUM:
VACUUM (VERBOSE) messenger_messages;
The messenger_messages table is an append-only log by nature. Every message is inserted. When a worker processes it, delivered_at is set, but the row is not deleted. Over weeks and months, the table accumulates hundreds of thousands or millions of processed rows that serve no purpose.
The partial index (WHERE delivered_at IS NULL) keeps the polling query fast regardless of total table size. A table with 23 pending messages and 1,247,808 processed messages returns results in the same 0.048ms. The partial index covers only the 23 pending rows. The query planner does not care about the other 1,247,808.
But the table itself cares. Autovacuum must scan the full table to mark dead tuples and update the visibility map. As the table grows, autovacuum runs more frequently and takes longer. pg_stat_user_tables shows increasing n_dead_tup counts. The table's on-disk size grows beyond what is necessary, consuming storage and slowing full-table operations like pg_dump.
Symfony provides the messenger:prune-messages console command. Run it on a schedule — daily via cron or Symfony Scheduler — with --older-than=7days or whatever retention period your compliance requirements dictate. After pruning, a standard VACUUM reclaims space for reuse within PostgreSQL (but does not return it to the operating system). For tables that have grown very large, VACUUM FULL rewrites the table and reclaims disk space, but requires an ACCESS EXCLUSIVE lock — schedule it during a maintenance window.
For a deeper discussion of table bloat, autovacuum tuning, and the operational patterns around growing tables, the guides on PostgreSQL table bloat and autovacuum tuning cover these concerns in depth.
Monitoring worker connections and queue health
A message queue that you cannot observe is a message queue that will surprise you. PostgreSQL provides the observability tools. The question is whether you are using them.
-- Monitoring worker connections in pg_stat_activity:
SELECT pid, state, query, wait_event_type, wait_event,
backend_start, state_change,
age(now(), state_change) AS idle_duration
FROM pg_stat_activity
WHERE application_name LIKE '%messenger%'
OR query LIKE '%LISTEN%'
ORDER BY state_change;
-- Healthy output:
-- pid | state | query | wait_event_type | wait_event | idle_duration
-- -----+-------+---------------------+-----------------+----------------+--------------
-- 1234 | idle | LISTEN messenger_* | Client | ClientRead | 00:00:12
-- 1235 | idle | LISTEN messenger_* | Client | ClientRead | 00:00:08
-- 1236 | active| SELECT m.* FROM ... | (null) | (null) | 00:00:00
-- 1237 | idle | LISTEN messenger_* | Client | ClientRead | 00:00:45
-- What to watch for:
-- 1. idle_duration > check_delayed_interval * 3 -> connection may be dead
-- 2. state = 'idle in transaction' -> handler not committing (leak)
-- 3. wait_event = 'Lock' -> advisory lock or row lock contention
The pg_stat_activity view is your primary diagnostic tool for worker connections. Healthy workers show state = idle, wait_event = ClientRead, with the last query being a LISTEN command. They are waiting for notifications. This is correct behavior — they are supposed to be idle.
The danger signals are specific. state = 'idle in transaction' means a handler started a transaction and did not commit or roll back. The connection is held, the transaction is open, and row locks acquired by that transaction are blocking other queries. This is typically a handler that threw an exception before committing, and the error handling code path did not include a rollback. It should be treated as a critical alert.
wait_event = 'Lock' means a worker is blocked waiting for a lock — either a row lock from FOR UPDATE (which should not happen with SKIP LOCKED) or an advisory lock. If multiple workers show lock waits, investigate whether advisory lock contention is throttling delayed-message processing.
-- A comprehensive health check query for Messenger workers.
WITH worker_stats AS (
SELECT
count(*) AS total_connections,
count(*) FILTER (WHERE state = 'idle'
AND query LIKE '%LISTEN%') AS listening,
count(*) FILTER (WHERE state = 'active') AS active,
count(*) FILTER (
WHERE state = 'idle in transaction') AS leaked,
count(*) FILTER (WHERE state = 'idle'
AND age(now(), state_change)
> interval '5 minutes') AS stale
FROM pg_stat_activity
WHERE application_name LIKE '%messenger%'
),
queue_stats AS (
SELECT
queue_name,
count(*) FILTER (
WHERE delivered_at IS NULL) AS pending,
count(*) FILTER (WHERE delivered_at IS NULL
AND available_at < NOW()
- interval '30 seconds') AS stuck,
max(age(now(), created_at)) FILTER (
WHERE delivered_at IS NULL) AS oldest_pending
FROM messenger_messages
GROUP BY queue_name
)
SELECT w.total_connections, w.listening, w.active,
w.leaked, w.stale,
q.queue_name, q.pending, q.stuck, q.oldest_pending
FROM worker_stats w
CROSS JOIN queue_stats q;
-- Alerts to configure:
-- leaked > 0 -> CRITICAL: handler not committing
-- stale > 0 -> WARNING: connection may need restart
-- stuck > 0 -> WARNING: messages not being processed
-- oldest_pending > 5m -> CRITICAL: queue is backing up
The comprehensive health query above crosses worker connection state with queue state. It answers two questions simultaneously: are the workers healthy? And is the queue being drained? A healthy worker count with a growing queue indicates that handlers are too slow. A shrinking worker count with an empty queue indicates that workers are crashing — check supervisor logs.
The stuck count in the queue stats deserves particular attention. A message with available_at more than 30 seconds in the past that has not been delivered suggests a worker shortage, a handler bottleneck, or a LISTEN delivery failure. If stuck > 0 persists across multiple checks, investigate whether workers are receiving notifications.
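The stuck classification from the health query, restated as a plain function (timestamps modeled as seconds-since-epoch for simplicity; the 30-second threshold matches the SQL above):

```python
# A pending message counts as "stuck" once its available_at is more
# than threshold_s seconds in the past and it still has no delivered_at.

def count_stuck(messages, now, threshold_s=30):
    """messages: iterable of (available_at, delivered_at) tuples."""
    return sum(
        1 for available_at, delivered_at in messages
        if delivered_at is None and available_at < now - threshold_s
    )

now = 1_000
messages = [
    (995, None),   # pending, 5s old  -> normal backlog
    (960, None),   # pending, 40s old -> stuck
    (900, 905),    # delivered        -> ignored
    (965, None),   # pending, 35s old -> stuck
]
print(count_stuck(messages, now))  # 2
```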
Managing worker connections with a proxy
The fundamental tension with the PostgreSQL transport is this: LISTEN connections must be persistent, but persistent connections do not scale. You need both — persistent connections for notification delivery, pooled connections for transactional queries — and PostgreSQL's connection model does not distinguish between them.
A connection proxy can resolve this tension by managing two classes of upstream connections: pinned connections for LISTEN channels and shared connections for everything else.
# Gold Lapel sits between your PHP workers and PostgreSQL.
# Workers connect to Gold Lapel; Gold Lapel connects to PostgreSQL.
# .env -- composer require goldlapel/goldlapel, then configure the connection
MESSENGER_TRANSPORT_DSN=postgresql://app:secret@127.0.0.1:6432/myapp
# Gold Lapel configuration (goldlapel.toml)
[listen]
address = "127.0.0.1:6432"
[upstream]
host = "your-postgres-host.example.com"
port = 5432
database = "myapp"
# LISTEN/NOTIFY connections are automatically detected.
# Gold Lapel pins these connections (they cannot be pooled)
# but manages the upstream connection lifecycle:
# - Automatic reconnection on upstream failure
# - LISTEN resubscription after reconnect
# - Health monitoring of idle LISTEN connections
#
# Non-LISTEN connections (the message fetch queries,
# advisory lock checks, web traffic) are fully pooled.
# 40 workers sharing 10 upstream connections for their
# transactional queries, while each maintaining a
# dedicated LISTEN channel.
When a worker sends LISTEN messenger_default, the proxy detects the LISTEN command and pins that downstream connection to a dedicated upstream connection. The LISTEN subscription is preserved for the lifetime of the worker process. Notifications flow through the proxy transparently — PostgreSQL sends the notification to the upstream connection, the proxy relays it to the downstream worker. The worker does not know or care that a proxy exists between it and PostgreSQL.
When the same worker (or another worker, or a web request) sends a transactional query — the SELECT ... FOR UPDATE SKIP LOCKED to fetch a message, an advisory lock check, a Doctrine query from a message handler — that query goes through the shared pool. Forty workers might share 10 upstream connections for their transactional work, because those queries are fast (sub-millisecond) and the connections are immediately returned to the pool.
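Why such a small pool suffices follows from Little's law: average concurrent demand equals arrival rate times query duration. A sketch with illustrative figures (the per-query latency and headroom factor are assumptions, not measurements):

```python
# Pool sizing via Little's law: L = arrival rate x avg duration.
# Short queries mean concurrent demand stays far below worker count.

def needed_connections(queries_per_sec, avg_query_sec, headroom=2.0):
    """Average concurrent queries, padded with a safety factor."""
    concurrent = queries_per_sec * avg_query_sec
    return max(1, round(concurrent * headroom))

# 40 workers each issuing ~1 fetch + 1 advisory-lock check per second,
# each query ~5 ms end to end including the round trip:
print(needed_connections(queries_per_sec=80, avg_query_sec=0.005))   # 1

# Even at 10x the query rate, the pool stays small:
print(needed_connections(queries_per_sec=800, avg_query_sec=0.005))  # 8
```

This is why 10 shared upstream connections comfortably serve 40 workers plus web traffic: the pool size tracks query concurrency, not client count.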
# Connection accounting: without proxy vs with proxy.
#
# Scenario: 40 Messenger workers + 30 PHP-FPM web processes
#
# --- Without proxy ---
# LISTEN connections: 40 (one per worker, persistent)
# Worker query conns: 40 (one per worker for SELECT/advisory)
# Web traffic conns: 30 (one per PHP-FPM process)
# Admin/monitoring: 5
# Total upstream: 115 connections to PostgreSQL
# max_connections needed: 120+ (with headroom)
#
# --- With Gold Lapel ---
# LISTEN connections: 40 (still persistent, must be pinned)
# Pooled query conns: 10 (shared by all 40 workers + 30 web)
# Admin/monitoring: 5
# Total upstream: 55 connections to PostgreSQL
# max_connections needed: 60 (with headroom)
#
# Savings: 60 connections. That is 600 MB of RAM
# returned to PostgreSQL for useful work.
The arithmetic changes substantially. Without a proxy: 40 workers plus 30 web processes need 115 connections. With a proxy: the same workload needs 55 connections. Sixty connections saved. On a managed PostgreSQL instance where max_connections defaults to 100, this is the difference between "works" and "does not work."
More importantly, the proxy adds connection lifecycle management that PHP workers lack. If the upstream PostgreSQL connection drops — network blip, server restart, failover — the proxy reconnects and resubscribes to the LISTEN channel. The worker does not need to know. It keeps its downstream connection open, receives notifications as before, and never executes the reconnection logic that DBAL sometimes handles imperfectly. This is particularly valuable on managed PostgreSQL instances that enforce idle connection timeouts: the proxy keeps the upstream LISTEN connection alive with periodic keepalive probes, preventing the timeout from firing.
Gold Lapel handles this natively. LISTEN connections are detected automatically — no configuration needed to tell the proxy which connections are LISTEN connections and which are transactional. The proxy observes the query stream, identifies the LISTEN command, and pins the connection. Transactional queries are pooled. The upstream connection count stays stable regardless of how many workers you run. And because Gold Lapel observes the full query stream, it can identify patterns specific to Messenger — the polling query frequency, advisory lock contention, message processing latency, queue depth trends — and surface them alongside the rest of your PostgreSQL performance data.
For PHP applications, where connection management has always been complicated by the tension between PHP-FPM's request-lifecycle model and long-running workers' different needs, having the proxy handle the distinction between LISTEN and transactional connections removes a category of operational complexity that PHP cannot solve internally.
Practical recommendations
If you are running or considering the PostgreSQL transport for Symfony Messenger, here is the configuration that works. I shall be direct.
# config/packages/messenger.yaml
framework:
    messenger:
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    use_notify: true
                    check_delayed_interval: 1000
                    queue_name: default
                retry_strategy:
                    max_retries: 3
                    delay: 1000
                    multiplier: 2
                    max_delay: 60000
        routing:
            'App\Message\SendInvoice': async
            'App\Message\ResizeImage': async
            'App\Message\SyncInventory': async
# .env
# The DSN tells Symfony to use the PostgreSQL transport
MESSENGER_TRANSPORT_DSN=doctrine://default?auto_setup=true
# Or, if you want the native PostgreSQL connection directly:
MESSENGER_TRANSPORT_DSN=postgresql://app:secret@127.0.0.1:5432/myapp
Use LISTEN/NOTIFY (use_notify: true) on queues where message latency matters. Keep check_delayed_interval at 1000-5000ms as a polling fallback. Do not set it to 60000ms unless you genuinely do not care about delivery latency.
Use polling only (use_notify: false) on bulk queues where 1-5 seconds of delivery latency is acceptable. This eliminates persistent LISTEN connections for those workers, freeing connection slots for queues that need them.
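A polling-only bulk transport is a small variation on the configuration above. The transport name, queue name, and interval here are illustrative:

```yaml
# config/packages/messenger.yaml -- illustrative second transport for bulk work
framework:
    messenger:
        transports:
            async_bulk:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    use_notify: false        # no persistent LISTEN connection held
                    check_delayed_interval: 3000
                    queue_name: bulk
```

Workers consuming async_bulk poll instead of listening, so they hold a connection only while querying — the trade is a few seconds of delivery latency for a freed connection slot.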
Set worker time limits. --time-limit=3600 and --memory-limit=128M prevent the two failure modes that catch people: memory leaks and stale connections. For critical queues, consider --time-limit=1800 to refresh connection state more frequently.
Run multiple workers per queue. At minimum two per LISTEN-enabled queue. This ensures that when one worker restarts, the other continues receiving notifications. For high-throughput queues, 4-8 workers provide both redundancy and parallelism.
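Both recommendations — time limits and worker redundancy — are typically enforced by a process supervisor. A sketch assuming Supervisor; the program name, paths, and log locations are illustrative:

```ini
; /etc/supervisor/conf.d/messenger.conf -- names and paths are illustrative
[program:messenger-async]
command=php /srv/app/bin/console messenger:consume async --time-limit=3600 --memory-limit=128M
numprocs=2                       ; two workers: one keeps listening while the other restarts
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true                 ; restart when --time-limit or --memory-limit stops a worker
stopwaitsecs=20                  ; give in-flight handlers time to finish on shutdown
```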
Monitor pg_stat_activity. Watch for workers stuck in idle in transaction state — that indicates a handler that is not committing. Watch for connection counts approaching max_connections. Watch for stale connections (idle for longer than 3x the polling interval).
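Two queries cover the checks above. These are sketches against the standard pg_stat_activity view — adjust the thresholds to your polling interval:

```sql
-- Handlers that opened a transaction and never committed
SELECT pid, application_name,
       now() - xact_start AS open_for,
       left(query, 60)    AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '1 minute';

-- Connection headroom: how close are we to max_connections?
SELECT count(*)                                          AS in_use,
       current_setting('max_connections')::int            AS max_allowed,
       current_setting('max_connections')::int - count(*) AS headroom
FROM pg_stat_activity;
```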
Clean up processed messages. The messenger_messages table grows without bound. Run messenger:prune-messages --older-than=7days on a daily schedule. Without it, the partial index stays efficient for reads, but table bloat accumulates, autovacuum cost increases, and disk usage climbs.
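Wired into cron, the daily cleanup might look like this — the path, schedule, and log destination are placeholders for your own setup:

```shell
# Illustrative crontab entry: prune messages older than 7 days, daily at 03:15
15 3 * * * cd /srv/app && php bin/console messenger:prune-messages --older-than=7days >> /var/log/messenger-prune.log 2>&1
```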
Verify the partial index exists. Run \di+ idx_messenger_available in psql. If it does not exist, create it. The polling query is 880x slower without it.
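If the index is missing, it can be recreated along these lines. The column list is an assumption inferred from the transport's polling predicate (queue, undelivered, available) — check it against the definition given earlier in this article before running it:

```sql
-- Assumed shape of the partial index; columns inferred from the polling query.
-- CONCURRENTLY avoids locking the table against writes while building.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messenger_available
    ON messenger_messages (queue_name, available_at)
    WHERE delivered_at IS NULL;
```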
Use a connection proxy if you run more than 10 workers. The connection overhead is manageable at small scale. At 20+ workers, especially alongside a busy web tier, a proxy like Gold Lapel turns a max_connections planning exercise into a non-issue. LISTEN semantics are preserved. Connection pooling handles the rest.
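Routing workers through a proxy is, from Messenger's point of view, just a DSN change. The port below is a placeholder — use whatever your proxy actually listens on:

```shell
# .env -- point workers at the proxy instead of PostgreSQL directly
# (6432 is a placeholder port, not a Gold Lapel default)
MESSENGER_TRANSPORT_DSN=postgresql://app:secret@127.0.0.1:6432/myapp
```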
Your messages are safe in PostgreSQL. Your LISTEN channels are delivering notifications in under 50 milliseconds. The partial index is doing its job in 0.048ms. The only question is how many telephone lines you are willing to run to the database — and whether you might prefer a switchboard.
The connection management question at the heart of this guide has a broader treatment. I have written about PostgreSQL max_connections tuning — how to set it, why the default of 100 is almost always wrong, and what happens when 40 Messenger workers meet a connection limit that was never designed for persistent listeners.