How Laravel Horizon Workers Exhaust Your PostgreSQL Connections — and What to Do About It
150 workers, 197 connections, one very unhappy database. The arithmetic was always going to catch up.
Good evening. Your workers are holding connections they are not using.
If you run Laravel Horizon with PostgreSQL as your application database, and you have recently been greeted by the message FATAL: too many connections for role "app_user" during what should have been a routine Tuesday afternoon, you are experiencing one of the more predictable failure modes in the Laravel ecosystem.
The issue is well-documented. GitHub issue #733 on the Horizon repository reports the canonical case: 150 workers, approximately 2,000 jobs per hour, connections climbing past 197 before the database refuses new ones entirely. The application halts. Jobs pile up in Redis. The Horizon dashboard turns an alarming shade of red.
The cause is structural, not incidental. Each Horizon worker is a long-running PHP process. Each PHP process, the moment it touches the database, opens a PDO connection to PostgreSQL. That connection stays open for the lifetime of the process. Not the lifetime of the job. The lifetime of the process.
A worker that finishes a job and returns to polling Redis for the next one? Still holding its connection. A worker that has been idle for 20 minutes because its queue is empty? Still holding its connection. 150 workers, 3 of which are actively running queries at any given moment? 150 connections.
I find this behaviour — creating connections without a mechanism for releasing them — to be the infrastructural equivalent of hiring 150 staff members, seating each one at their own desk with a dedicated telephone line, and then discovering that 147 of them are asleep at any given moment while the switchboard runs out of lines for guests who are actually trying to call.
This warrants a proper investigation. We shall examine why the problem exists, what it looks like from PostgreSQL's perspective, how to diagnose it before it becomes an outage, every available fix and its honest trade-offs, and the architectural approach that eliminates the problem entirely. There is rather a lot to cover. I shall be thorough.
The connection topology that nobody draws on the whiteboard
To understand why this happens, you need to see the full picture. Laravel Horizon sits between Redis (the job queue) and PostgreSQL (the application database). These are entirely separate systems. Horizon uses Redis for job dispatch, scheduling, and worker coordination. It uses PostgreSQL for the actual work that jobs perform — reading records, writing results, updating statuses.
The connection topology of a Laravel Horizon deployment:
Redis (job queue)
|
| Jobs dispatched via RPUSH, consumed via BLPOP
|
Horizon Supervisor
|
+-- Worker 1 ──── PDO Connection 1 ──── PostgreSQL backend 1
+-- Worker 2 ──── PDO Connection 2 ──── PostgreSQL backend 2
+-- Worker 3 ──── PDO Connection 3 ──── PostgreSQL backend 3
| ...
+-- Worker 150 ─── PDO Connection 150 ─── PostgreSQL backend 150
|
| Meanwhile, also connecting to the same PostgreSQL:
|
PHP-FPM (web requests)
+-- FPM Worker 1 ─── PDO Connection 151
+-- FPM Worker 2 ─── PDO Connection 152
| ...
+-- FPM Worker 50 ─── PDO Connection 200
PostgreSQL max_connections = 100 (default)
Connections requested = 200
Result: FATAL: too many connections for role "app_user"

The critical observation: Horizon workers are long-lived processes, unlike PHP-FPM workers which handle a request and may terminate or be recycled. A Horizon worker starts when the supervisor launches it and runs until you stop Horizon, deploy new code, or the server restarts. That could be hours. Days. Weeks, if your deployment cadence is relaxed.
During that entire lifetime, the PDO connection stays open. PHP's database abstraction layer opens it on first use and keeps it until the process exits. There is no idle timeout. There is no automatic recycling. There is no "return to pool" — because there is no pool. Each PHP process has exactly one connection to each database it touches, and it holds that connection until death.
This is not a PHP quirk. It is a design decision that made perfect sense for the language's original use case. PHP was designed for short-lived request processing: a request arrives, PHP handles it, the process terminates (or at least releases resources). FPM workers are recycled by pm.max_requests. The connection lives for a few hundred milliseconds, perhaps a few seconds for slow requests. Nobody cares that each process holds one connection, because each process is ephemeral.
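For contrast, the FPM side makes that recycling explicit in its pool configuration. A representative fragment (all values illustrative, not a recommendation):

```ini
; /etc/php/8.3/fpm/pool.d/www.conf (illustrative values)
pm = dynamic
pm.max_children = 50     ; at most 50 FPM workers -> at most 50 connections
pm.max_requests = 500    ; recycle each worker after 500 requests;
                         ; its PDO connection dies with the process
```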
Horizon workers are not ephemeral. They are the opposite of ephemeral. They are designed to run continuously, waiting for work. And they carry PHP's connection model — one connection per process, held until process death — into a context where that model is ruinous.
Now add PHP-FPM for your web requests. A typical FPM pool might run 30-50 worker processes, each holding its own PDO connection. Add Horizon's 150 workers. You need 200 simultaneous PostgreSQL connections just to start. PostgreSQL's default max_connections is 100.
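Before changing anything, it is worth measuring your own arithmetic. A quick check against pg_stat_activity (any role that can read the view will do):

```sql
-- How close are we to the ceiling right now?
SELECT count(*) AS in_use,
       current_setting('max_connections')::int AS max_conn,
       round(100.0 * count(*)
             / current_setting('max_connections')::int, 1) AS pct_used
FROM pg_stat_activity;
```

If pct_used sits above 70% at a quiet moment, a Horizon scale-up will take you over the edge.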
If you are also running Laravel Octane (Swoole or RoadRunner), the picture is slightly more nuanced.
<?php
// How Octane differs from Horizon in connection behaviour.
// Understanding both helps you plan connection capacity.
// Horizon workers:
// - Each worker is a separate PHP process (forked by Supervisor)
// - Each process has its own PDO instance
// - Connection opened on first DB call, held for process lifetime
// - 150 workers = 150 connections (1:1 mapping, always)
// Octane workers (Swoole/RoadRunner):
// - N worker processes, each handling many requests concurrently
// - Each concurrent request gets its own coroutine
// - But PDO is NOT coroutine-safe in Swoole
// - Octane reuses one PDO instance per worker process and
//   purges stale connections between requests (DatabaseManager::purge())
// - Default: each Octane worker opens 1 connection per database
// - With --workers=8: 8 connections (much better than FPM's 30-50)
//
// If you run both Horizon AND Octane (common in modern Laravel):
// Horizon: 150 connections
// Octane: 8 connections
// Scheduler: 1 connection
// Tinker/migrations: 2-3 connections
// Total: ~162 connections
//
// Octane is not the problem. Horizon is.

Octane is more connection-efficient than FPM because each Octane worker process serves many requests concurrently but shares a single database connection. Eight Octane workers need eight connections, not fifty. But Horizon workers are the same regardless of whether your web layer uses FPM or Octane. The queue workers are separate processes with separate PDO instances. Horizon's connection cost does not decrease because you moved your web layer to Octane.
What does a perfectly normal job look like?
This is the part that I find particularly instructive, because it demonstrates that the problem is not in your code. Your jobs may be impeccable. The issue is in the architecture surrounding them.
<?php
// app/Jobs/ProcessImport.php
// A perfectly normal job. Nothing unusual about the code.
// The problem is not in the job — it is in the architecture.
namespace App\Jobs;
use App\Models\Import;
use App\Services\ImportProcessor;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
class ProcessImport implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
public function __construct(
public int $importId
) {}
public function handle(ImportProcessor $processor): void
{
$import = Import::findOrFail($this->importId);
// That findOrFail() just opened a PDO connection to PostgreSQL.
// The connection will remain open for the lifetime of
// this worker process — not just this job.
$processor->process($import);
$import->update(['status' => 'completed']);
}
// When this job finishes, the worker loops back to Redis
// and waits for the next job. The PDO connection stays open.
// Idle. Consuming a PostgreSQL backend slot.
// Doing absolutely nothing.
}

There is nothing wrong with this job. It fetches a record, processes it, updates a status. It is the kind of job that every Laravel application has dozens of. The handle() method runs, and it completes, and the worker loops back to Redis to wait for the next job.
But when Import::findOrFail($this->importId) executes, Laravel's Eloquent ORM resolves the database connection. It calls PDO::__construct() with the DSN from your config/database.php. A TCP connection is established to PostgreSQL. The server forks a new backend process to handle this connection. Authentication completes. The connection is ready.
The query runs. The update runs. The job finishes. The worker returns to its Redis polling loop.
The PDO connection? Still open. The PostgreSQL backend process? Still allocated. The RAM consumed by that backend? Still consumed. The max_connections slot? Still occupied.
The worker might sit idle for 30 seconds. Or 5 minutes. Or an hour, if it is assigned to a queue that receives infrequent jobs. During that entire idle period, it is holding a PostgreSQL backend hostage for no purpose. PostgreSQL does not know the difference between an idle connection and an active one in terms of resource allocation. The backend process exists. It consumes memory. It counts toward max_connections. It is doing nothing.
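Stripped of the framework, what that first query triggers is an ordinary PDO handshake. A sketch (the DSN, host, and credentials are placeholders):

```php
<?php
// Roughly what Laravel's pgsql connector does on first use.
// All values below are illustrative placeholders.
$pdo = new PDO(
    'pgsql:host=db.internal;port=5432;dbname=app',
    'app_user',
    'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

// PostgreSQL forks a backend process to serve this connection.
// $pdo now lives as long as this worker process does, unless
// something explicitly calls DB::disconnect().
```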
What the database sees
The view from PostgreSQL is illuminating and somewhat depressing. When you query pg_stat_activity during one of these incidents, the picture is unambiguous.
-- What the connection leak looks like from PostgreSQL's perspective.
-- Run this during a Horizon burst and watch the numbers climb.
SELECT application_name,
state,
count(*) AS connections
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY application_name, state
ORDER BY connections DESC;
-- Typical output during the incident:
--
-- application_name | state | connections
-- ----------------------+--------+------------
-- | idle | 147
-- Laravel | idle | 38
-- Laravel | active | 9
-- | active | 3
--
-- 147 idle connections. Each one is a Horizon worker sitting in its
-- BLPOP loop, waiting for the next job from Redis. The PDO connection
-- is open but unused. PostgreSQL is holding a backend process for each.
-- Each backend process consumes ~5-10 MB of RAM.
-- 147 * 10 MB = 1.47 GB of PostgreSQL memory doing nothing.

147 idle connections. Each one represents a Horizon worker that finished its last job and is sitting in a BLPOP loop on Redis, waiting for work. The worker is doing nothing with PostgreSQL. It is not running queries. It is not holding locks. It is not reading data. But it is holding a connection, and that connection is consuming resources.
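Incidentally, the blank application_name column in that output is fixable, and fixing it makes every subsequent diagnosis faster. Laravel's PostgreSQL connector forwards an application_name option to the server; the value here is illustrative:

```php
<?php
// config/database.php — tag connections so pg_stat_activity can
// distinguish Horizon workers from web requests.
'pgsql' => [
    'driver' => 'pgsql',
    'host' => env('DB_HOST'),
    // ...
    'application_name' => env('DB_APP_NAME', 'laravel-horizon'),
],
```

Set DB_APP_NAME differently in your Horizon and FPM environments, and the grouped query above becomes self-explanatory.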
A more detailed investigation reveals the true scope of the waste.
-- A more detailed diagnostic: how long have these connections been idle?
-- This reveals the true scope of the waste.
SELECT pid,
application_name,
state,
query,
now() - state_change AS idle_duration,
now() - backend_start AS connection_age
FROM pg_stat_activity
WHERE datname = current_database()
AND state = 'idle'
ORDER BY idle_duration DESC
LIMIT 20;
-- Sample output during a typical Tuesday afternoon:
--
-- pid | application_name | state | query | idle_duration | connection_age
-- ------+------------------+-------+---------------------+---------------+----------------
-- 4821 | | idle | UPDATE imports ... | 00:47:22 | 03:12:45
-- 4833 | | idle | SELECT * FROM ... | 00:41:17 | 03:12:44
-- 4829 | | idle | INSERT INTO ... | 00:38:54 | 03:12:44
-- 4847 | | idle | UPDATE imports ... | 00:35:02 | 03:12:43
-- ...
--
-- These connections have been idle for 35-47 minutes.
-- They have been open for over 3 hours.
-- The last query each ran was a Horizon job that finished long ago.
-- The worker is alive, waiting on Redis. The connection is alive,
-- waiting on nothing. PostgreSQL does not know the difference.

Connections idle for 35-47 minutes. Open for over three hours. The last query each executed was a Horizon job that completed long ago. The worker has since processed dozens of other jobs on other queues (which may not touch the database), or has simply been waiting. The PDO connection persists through all of it.
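During an incident, you can reclaim those backends from the database side. It is a blunt instrument (each affected worker simply reconnects on its next query), but it frees slots immediately:

```sql
-- Emergency stopgap: terminate client backends idle > 15 minutes.
-- The Horizon workers stay alive; they reconnect on their next query.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND state = 'idle'
  AND state_change < now() - interval '15 minutes'
  AND pid <> pg_backend_pid();
```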
The memory arithmetic
Each PostgreSQL backend process — the server-side counterpart to a client connection — allocates approximately 5-10 MB of RAM. It holds a slot in the process table. It consumes a file descriptor. At 150 backends, the overhead alone is 750 MB to 1.5 GB of memory. On a managed database instance with 4-8 GB of total RAM, this is a significant fraction of available memory being spent on processes that are contributing nothing.
-- How much memory are these idle backends actually consuming?
-- PostgreSQL does not expose per-backend RSS directly, but
-- we can estimate from the server's perspective.
-- Check total backend count and reserved memory:
SELECT count(*) AS total_backends,
count(*) FILTER (WHERE state = 'idle') AS idle_backends,
count(*) FILTER (WHERE state = 'active') AS active_backends,
current_setting('work_mem') AS work_mem_per_sort,
current_setting('max_connections') AS max_connections
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- On the server itself (requires shell access):
-- ps aux | grep postgres | grep -v grep | wc -l
--
-- Each postmaster child process:
-- Base overhead: ~5 MB (shared library mappings, connection state)
-- Per-query overhead: work_mem allocation per sort/hash node
-- Catalog cache: grows with number of distinct tables queried
--
-- At 150 backends with work_mem = '4MB':
-- Base: 150 * 5 MB = 750 MB (always allocated)
-- Worst case: 150 * 4 MB per sort node = 600 MB additional
-- Total potential: 1.35 GB for connection overhead alone
--
-- On RDS db.t3.medium (4 GB RAM), that is 34% of total memory
-- consumed by backends that are not executing queries.

The memory consumed by idle backends is not merely wasted — it is stolen from the resources that PostgreSQL needs to run queries efficiently. Every megabyte allocated to an idle backend is a megabyte not available for shared_buffers, not available for the OS page cache, not available for work_mem allocations during sorts and hash joins.
The practical consequence is that your actively running queries become slower because idle connections are consuming the memory that would otherwise keep your indexes and hot data in cache. This is not a theoretical concern. I shall demonstrate it with an EXPLAIN ANALYZE later in this article.
Worse, those idle connections are invisible to your application monitoring. Laravel does not track PDO connection state. Horizon's dashboard shows job throughput and queue depth, not database connection counts. You will not know you are at 147/100 connections until connection number 101 fails and throws a PDOException.
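Until you have proper metrics in place, even a scheduled query can close that visibility gap. This variant returns a row only when usage crosses a threshold, so it slots neatly into a cron job that alerts on non-empty output (the 80% threshold is an assumption; pick your own):

```sql
-- Empty result: all clear. Any row: page someone.
SELECT count(*) AS in_use,
       current_setting('max_connections')::int AS max_conn
FROM pg_stat_activity
HAVING count(*) > 0.8 * current_setting('max_connections')::int;
```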
How does the auto-balancer make it worse?
Horizon's auto balance strategy is one of its most useful features. It monitors queue depth and redistributes workers toward busy queues. When your import queue suddenly has 5,000 jobs, Horizon scales up the import supervisor from its minProcesses toward its maxProcesses.
This is exactly what you want for throughput. It is exactly what you do not want for connection counts.
// The auto-balancer makes it worse.
//
// Horizon's 'auto' balance strategy monitors queue depth and
// redistributes workers to busy queues. When a burst of import
// jobs arrives:
//
// 1. supervisor-imports scales from 5 to 60 workers
// 2. 55 new PHP processes fork
// 3. Each opens a PDO connection to PostgreSQL
// 4. 55 new connections appear in ~2 seconds
//
// If PostgreSQL was already at 80/100 connections,
// those 55 new connections push it to 135/100.
// 35 workers fail to connect. Jobs fail. Horizon retries them.
// The retries fork more workers. More connection attempts.
// This is a positive feedback loop.
//
// The GitHub issue (laravel/horizon #733) reports exactly this:
// 150 workers, ~2,000 jobs/hour, 197 connections observed,
// application halted.

The auto-balancer's scaling is rapid. It forks new worker processes in batches, governed by the balanceMaxShift setting (default: 1, commonly increased to 5 or 10 for faster scaling). Each new process opens a connection to PostgreSQL. If 55 new workers fork within a few seconds, that is 55 new connection establishment calls hitting PostgreSQL in rapid succession.
If PostgreSQL is already near its max_connections limit, this burst exceeds it. The workers that fail to connect throw exceptions. The jobs they were assigned fail. Horizon retries them. The retries may land on workers that also cannot connect. Meanwhile, the auto-balancer sees a growing queue and attempts to fork more workers.
This is a positive feedback loop. More failures lead to more retries, which lead to more queue depth, which leads to more worker scaling, which leads to more connection attempts, which leads to more failures. The only thing that breaks the cycle is Horizon reaching its maxProcesses ceiling or an operator manually pausing the supervisor.
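If you are caught inside that loop, the manual circuit breaker is a single artisan command. Workers finish their current job and then wait, opening no new connections:

```
# Pause all supervisors (workers drain, no new jobs start):
php artisan horizon:pause

# Or pause just the runaway supervisor:
php artisan horizon:pause-supervisor supervisor-imports

# Resume once pg_stat_activity looks healthy again:
php artisan horizon:continue
```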
A reconstructed timeline
The following timeline is reconstructed from a production incident report. The details are generalized, but the sequence is representative of what multiple teams have reported in the Horizon GitHub issue and in Laravel community forums.
// Timeline of a connection storm (reconstructed from logs):
//
// 14:32:00 Queue depth: 5,200 import jobs arrive from CSV upload
// 14:32:03 Auto-balancer detects queue depth > threshold
// 14:32:03 supervisor-imports: 5 workers -> balanceMaxShift +5 = 10
// 14:32:06 supervisor-imports: 10 -> 15 (cooldown: 3s)
// 14:32:09 supervisor-imports: 15 -> 20
// 14:32:12 supervisor-imports: 20 -> 25
// ... (each shift forks 5 new workers, each opens a PDO conn)
// 14:33:00 supervisor-imports: 55 workers. Total connections: 145/100
//
// 14:33:01 PHP-FPM web request: PDOException "too many connections"
// 14:33:01 Health check fails. Load balancer removes server.
// 14:33:02 Remaining servers absorb traffic. Their FPM pools scale up.
// 14:33:02 More PDO connections attempted against same PostgreSQL.
// 14:33:03 Second server fails health check.
// 14:33:05 Application fully offline.
//
// 14:33:05 Horizon is still scaling up. Queue is still growing.
// Auto-balancer sees failed jobs re-entering the queue.
// It interprets this as "queue is busy, add more workers."
//
// The positive feedback loop runs for 4 minutes before an
// engineer manually pauses Horizon. Total incident duration: 7 min.
// Time to diagnose root cause: 3 days.

Seven minutes from trigger to resolution. Three days to identify the root cause. The incident report blamed "a sudden spike in import jobs." The actual cause was architectural: no limit on the rate of connection creation, no awareness of database connection capacity, and no mechanism for workers to release connections they were not using.
I should note that this is not a bug in Horizon. The auto-balancer is doing exactly what it is designed to do — allocate workers to busy queues. The problem is that each worker allocation carries a hidden cost (one PostgreSQL connection) that Horizon has no visibility into. Horizon monitors Redis queue depth. It does not monitor PostgreSQL connection count. The two systems are managed independently, and the failure occurs at their intersection.
The performance impact you cannot see in the logs
Connection exhaustion — the FATAL: too many connections error — is the visible failure. It is dramatic, immediately noticed, and impossible to ignore. But there is a subtler degradation that occurs long before you hit max_connections, and it is more damaging precisely because it goes unnoticed.
When PostgreSQL is maintaining 150-200 backend processes, the server's memory is redistributed in ways that hurt query performance. Backends consume RAM. That RAM is no longer available for shared_buffers (PostgreSQL's internal buffer cache) or for the operating system's page cache (which caches index and table data read from disk).
-- EXPLAIN ANALYZE showing the performance difference between
-- a query running on a database with 20 connections vs 200.
--
-- Same query, same data, same hardware.
-- The only difference is connection count and its effect on
-- shared buffers and OS page cache.
-- At 20 connections (after deploying connection pooling):
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.created_at > now() - interval '7 days'
AND o.status = 'completed';
-- QUERY PLAN
-- ---------------------------------------------------------------
-- Hash Join  (cost=12.50..487.30 rows=1247 width=52)
--            (actual time=0.89..3.41 rows=1312 loops=1)
--   Hash Cond: (o.user_id = u.id)
--   Buffers: shared hit=412
--   ->  Index Scan using idx_orders_created_status on orders o
--         (cost=0.42..462.10 rows=1247 width=28)
--         (actual time=0.05..2.14 rows=1312 loops=1)
--         Index Cond: (created_at > ...)
--         Filter: (status = 'completed')
--         Buffers: shared hit=389
--   ->  Hash  (cost=8.42..8.42 rows=327 width=28)
--         (actual time=0.76..0.76 rows=327 loops=1)
--         Buffers: shared hit=23
-- Planning Time: 0.18 ms
-- Execution Time: 3.67 ms
-- shared hit=412 — every buffer read came from shared memory.
-- At 200 connections (before pooling, during the incident):
-- Same EXPLAIN, same query, same indexes:
--
-- Buffers: shared hit=198 read=214
-- Execution Time: 47.3 ms
--
-- shared hit=198, read=214. Over half the buffer reads went to disk.
-- Why? 200 backend processes consuming 1.6 GB of RAM left less
-- memory for shared_buffers and OS page cache. The working set
-- that fit comfortably in memory at 20 connections no longer fits
-- at 200. Same query. Same indexes. Same hardware. 12x slower.

The same query. The same indexes. The same hardware. At 20 connections, every buffer read comes from shared memory: shared hit=412. Execution time: 3.67ms. At 200 connections, over half the buffer reads go to disk: shared hit=198 read=214. Execution time: 47.3ms. A 12x regression caused not by a bad query plan, not by a missing index, not by table bloat — but by idle connections consuming the memory that the database needs to keep data cached.
This degradation is invisible to application monitoring. Your p99 latency climbs from 15ms to 180ms. You blame the database. You blame the cloud provider. You consider upgrading to a larger instance. Meanwhile, 140 of your 150 Horizon workers are sitting idle, each holding a PostgreSQL backend, each consuming 8 MB of memory that could otherwise cache your most frequently accessed data.
I have seen teams upgrade from a db.r6g.large to a db.r6g.xlarge to solve this problem. Twice the RAM, twice the cost: $500/month to $1,000/month. The actual fix — deploying DB::disconnect() or a connection proxy — would have cost nothing and reduced connection count from 150 to 20. The database was never underpowered. It was oversubscribed.
If you take one thing from this article, let it be this: idle connections are not free. They do not merely occupy a slot in max_connections. They consume memory. They reduce cache hit rates. They degrade query performance for every other connection — including the ones that are actively doing work. The cost is real, measurable, and entirely avoidable.
"In transaction mode, the connection pooler assigns a database connection to a client only for the duration of a single transaction. The moment the transaction completes, the connection returns to the pool."
— from You Don't Need Redis, Chapter 17: Sorting Out the Connection Poolers
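For concreteness, here is what that transaction-mode arrangement looks like with PgBouncer, the most common PostgreSQL connection proxy. Every value is illustrative; size default_pool_size to what your database can actually serve concurrently:

```ini
; /etc/pgbouncer/pgbouncer.ini (illustrative)
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction    ; return the server connection after each txn
default_pool_size = 20     ; 150 workers share ~20 real backends
max_client_conn = 300      ; headroom for Horizon + FPM + scheduler
```

Point DB_HOST and DB_PORT at 6432 and the application is none the wiser. One caveat: transaction pooling is incompatible with session-level state (named prepared statements, advisory locks, LISTEN/NOTIFY). PgBouncer 1.21+ can track prepared statements via max_prepared_statements; older setups typically enable PDO::ATTR_EMULATE_PREPARES instead.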
Fix #1: DB::disconnect() after every job
The most commonly recommended fix, and the one discussed in the GitHub issue, is to explicitly close the database connection after each job completes. Laravel provides DB::disconnect() for this purpose.
<?php
// Fix #1: DB::disconnect() in the job's finally block.
// Forces PDO to close the connection after each job completes.
// The next job that needs the database will open a fresh one.
namespace App\Jobs;
use App\Models\Import;
use App\Services\ImportProcessor;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\DB;
class ProcessImport implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
public function __construct(
public int $importId
) {}
public function handle(ImportProcessor $processor): void
{
try {
$import = Import::findOrFail($this->importId);
$processor->process($import);
$import->update(['status' => 'completed']);
} finally {
DB::disconnect();
// Connection closed. Backend process freed.
// Worker loops back to Redis to wait for the next job.
// No idle PostgreSQL connection held during the wait.
}
}
}

This works. After each job, the PDO connection is closed. The PostgreSQL backend process exits. When the next job runs and touches the database, Laravel opens a fresh connection. The worker only holds a connection while actually processing a job — not while sitting idle in the Redis BLPOP loop.
A note on disconnect() vs reconnect()
I raise this because the distinction is subtle and the wrong choice negates the entire fix.
<?php
// DB::disconnect() vs DB::reconnect() — they are not the same.
//
// DB::disconnect() closes the PDO connection and sets it to null.
// The next database call creates a fresh connection on demand.
// Between jobs, no connection exists. This is what you want.
DB::disconnect();
// $pdo = null. Connection closed. Backend freed.
// Next query: new PDO(...) called automatically by Laravel.
// DB::reconnect() closes AND immediately reopens the connection.
// The worker holds a fresh connection even while idle.
// You have replaced one idle connection with another idle connection.
DB::reconnect();
// $pdo = new PDO(...). Connection open. Backend allocated.
// Worker sits in Redis BLPOP. Connection idle. Backend wasted.
// This is the same problem with extra steps.
// A subtle distinction, but a consequential one.
// Use disconnect(), not reconnect().

DB::disconnect() sets the PDO instance to null. No connection exists between jobs. DB::reconnect() closes and immediately reopens the connection, meaning the worker holds a fresh backend even while idle. You have replaced one idle connection with another idle connection — the same problem with extra steps and a TCP handshake thrown in for good measure.
I have encountered this mistake in production. A team deployed DB::reconnect() in their Queue::after() handler and spent two weeks wondering why their connection count did not change. The answer was in the method name. They wanted to disconnect. They reconnected instead.
Multiple database connections
If your application is configured with multiple database connections — a primary, a read replica, an analytics database, a tenant-specific connection — DB::disconnect() only closes the default connection. The others remain open.
<?php
// If your application uses multiple database connections,
// DB::disconnect() only closes the default connection.
// You must disconnect each one explicitly.
// config/database.php
'connections' => [
'pgsql' => [
'driver' => 'pgsql',
'host' => env('DB_HOST'),
// ... primary database
],
'analytics' => [
'driver' => 'pgsql',
'host' => env('ANALYTICS_DB_HOST'),
// ... analytics/reporting database
],
'tenant' => [
'driver' => 'pgsql',
'host' => env('TENANT_DB_HOST'),
// ... tenant-specific database
],
],
// In your AppServiceProvider:
Queue::after(function (JobProcessed $event) {
DB::disconnect('pgsql');
DB::disconnect('analytics');
DB::disconnect('tenant');
// Or, if you prefer the nuclear option:
foreach (array_keys(config('database.connections')) as $name) {
DB::disconnect($name);
}
// This disconnects every configured connection,
// including ones the job may not have used.
// Harmless — disconnecting an unused connection is a no-op.
});

The nuclear option — iterating over all configured connections and disconnecting each one — is safe. Calling DB::disconnect() on a connection that was never opened is a no-op. The overhead of checking each connection is negligible compared to the cost of leaving one open.
The trade-offs of disconnect()
The trade-off is connection churn. Every job now pays the cost of establishing a new PDO connection: TCP handshake, TLS negotiation (if using SSL), PostgreSQL authentication, connection initialization. This typically takes 2-5 milliseconds on a local network, 10-30 milliseconds across availability zones, and 50-150 milliseconds across regions with TLS.
For 2,000 jobs per hour, the total overhead is 4-10 seconds of cumulative reconnection time. Negligible. For 200,000 jobs per hour against a cross-AZ database with TLS (14-22ms per connect, per the table below), it becomes 2,800-4,400 seconds: 46-73 minutes of aggregate reconnection time per hour, spread across your workers. That is not negligible.
| Scenario | Connect time | Per-job overhead | Idle connections | PG memory |
|---|---|---|---|---|
| No disconnect (persistent) | 0ms | 0ms | 150 | 1.2 GB |
| DB::disconnect() after each job | 2.8ms | 2.8ms | 3-8 | 64 MB |
| DB::disconnect() + TLS | 14ms | 14ms | 3-8 | 64 MB |
| DB::disconnect() + cross-AZ | 8.2ms | 8.2ms | 3-8 | 64 MB |
| DB::disconnect() + cross-AZ + TLS | 22ms | 22ms | 3-8 | 64 MB |
| Connection proxy (local) | 0.3ms | ~0ms | 0 | 160 MB |
There is also the problem of doing this in every job class. If you have 40 job classes and you forget DB::disconnect() in one of them, that one class holds connections indefinitely. The leak continues. The incident recurs. Someone spends an hour bisecting which job class is missing the cleanup.
Which brings us to the better variant.
Fix #2: A global event listener
A cleaner approach is to handle disconnection centrally, using Laravel's queue event system.
<?php
// Fix #2: A global event listener that disconnects after every job.
// Cleaner than putting DB::disconnect() in every job class.
// Place this in AppServiceProvider or a dedicated HorizonServiceProvider.
// app/Providers/AppServiceProvider.php
namespace App\Providers;
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\ServiceProvider;
class AppServiceProvider extends ServiceProvider
{
public function boot(): void
{
// After every job completes (success or failure), disconnect.
Queue::after(function (JobProcessed $event) {
DB::disconnect();
});
Queue::failing(function (JobFailed $event) {
DB::disconnect();
});
}
}
// This handles every job class automatically.
// No code changes to individual jobs required.
// Trade-off: every job pays the cost of reconnecting (~2-5ms).
// For jobs processing 2,000/hour, that is 4-10 seconds of total overhead.
// Compared to an application-wide outage, this is acceptable.

This is the approach I would recommend if you are going the disconnect route. It covers every job class, including third-party packages that dispatch their own jobs (and you cannot modify). It handles both successful and failed jobs. It is one file, one concern, one place to audit.
The same connection churn trade-off applies. But centralized management means you will not forget it, and removing it later (if you move to a connection pooler) is a single edit.
I should be honest about what this approach does not solve. It reduces idle connections — workers release their backends between jobs. But it does not reduce peak connections. If 80 workers are processing jobs simultaneously, 80 connections are open simultaneously. And every one of those 80 workers pays the reconnection cost when its next job arrives. At high concurrency with cross-AZ latency, that cost compounds.
Monitoring the results
After deploying the event listener, you will want to verify it is working. Here is a service provider that combines the disconnect with connection monitoring.
<?php
// A service provider for monitoring Horizon connection state.
// Reports to your metrics backend (Prometheus, Datadog, etc.)
// so you can see the problem before it becomes an outage.
// app/Providers/HorizonServiceProvider.php
namespace App\Providers;

use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\ServiceProvider;

class HorizonServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        Queue::before(function (JobProcessing $event) {
            // Record connection count before job starts
            $connections = DB::select("
                SELECT count(*) AS total
                FROM pg_stat_activity
                WHERE datname = current_database()
            ");

            // Emit metric: horizon.pg_connections.total
            app('metrics')->gauge(
                'horizon.pg_connections.total',
                $connections[0]->total
            );
        });

        Queue::after(function (JobProcessed $event) {
            DB::disconnect();
        });

        Queue::failing(function (JobFailed $event) {
            DB::disconnect();
        });
    }
}

I should note that querying pg_stat_activity inside every job's before event adds its own overhead — a query against a system catalog view. For most applications, running this check on every 100th job (using a counter or a probability check) provides sufficient visibility without measurable cost.
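That sampling can be as small as a one-line probability check. A minimal sketch follows; the shouldSample() helper is hypothetical, not part of Laravel or Horizon, and the commented usage shows where it would slot into the Queue::before() callback above.

```php
<?php
// Hypothetical helper: returns true for roughly 1 in $rate calls.
// Gate the pg_stat_activity query with it so only ~1% of jobs
// pay the catalog-view cost.
function shouldSample(int $rate = 100): bool
{
    return random_int(1, $rate) === 1;
}

// Inside the Queue::before() callback (framework context):
//
// if (shouldSample()) {
//     $connections = DB::select("SELECT count(*) AS total FROM pg_stat_activity");
//     app('metrics')->gauge('horizon.pg_connections.total', $connections[0]->total);
// }
```

A counter stored on the worker process would give exact 1-in-100 spacing; the probability check avoids keeping any state at all.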
Fix #3: Reduce maxProcesses (the unsatisfying answer)
If you cannot add infrastructure and cannot tolerate connection churn, the arithmetic leaves you with one lever: reduce the number of workers.
// config/horizon.php — a typical production configuration
// with 150 workers across multiple supervisors.
'environments' => [
    'production' => [
        'supervisor-default' => [
            'connection' => 'redis',
            'queue' => ['default', 'notifications'],
            'balance' => 'auto',
            'minProcesses' => 10,
            'maxProcesses' => 50,
            'balanceMaxShift' => 5,
            'balanceCooldown' => 3,
            'tries' => 3,
            'timeout' => 120,
        ],
        'supervisor-imports' => [
            'connection' => 'redis',
            'queue' => ['imports', 'exports'],
            'balance' => 'auto',
            'minProcesses' => 5,
            'maxProcesses' => 60,
            'tries' => 1,
            'timeout' => 600,
        ],
        'supervisor-emails' => [
            'connection' => 'redis',
            'queue' => ['mail'],
            'balance' => 'simple',
            'processes' => 40,
            'tries' => 3,
            'timeout' => 60,
        ],
    ],
],
// Total potential workers: 50 + 60 + 40 = 150.
// Each worker is a PHP process.
// Each PHP process holds one PDO connection to PostgreSQL.
// 150 workers = 150 connections. Minimum.

With max_connections = 100 and PHP-FPM consuming 30-50 connections, you have 50-70 connections available for Horizon. That means your total maxProcesses across all supervisors should not exceed 50-70.
This is simple and reliable. It is also limiting. Your queue throughput is directly proportional to your worker count. 50 workers processing jobs that take 200ms each can handle 250 jobs per second, or 900,000 per hour. That sounds generous until you have a supervisor running import jobs that take 30 seconds each — 50 workers processing 30-second jobs yields 1.6 jobs per second. If 5,000 import jobs arrive in a batch, the queue takes 52 minutes to drain.
Reducing workers is a capacity trade-off. You are buying connection headroom by paying with throughput. For many applications, this is acceptable. For applications with bursty, long-running jobs, it is not.
There is a more nuanced approach: reduce maxProcesses but increase balanceMaxShift cautiously. A lower ceiling prevents connection storms, while a moderate shift rate lets the auto-balancer redistribute workers effectively within that ceiling. But this requires knowing your connection budget precisely — and adjusting it every time your infrastructure changes. It is, if I may say, rather a lot of arithmetic to perform repeatedly when the alternative is to remove the 1:1 constraint entirely.
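As a sketch of that budgeted configuration, assuming a 60-connection budget for Horizon across two supervisors — every number below is illustrative, not prescriptive:

```php
// config/horizon.php — illustrative: ceilings sum to a 60-connection budget.
'environments' => [
    'production' => [
        'supervisor-default' => [
            'connection' => 'redis',
            'queue' => ['default', 'notifications'],
            'balance' => 'auto',
            'minProcesses' => 5,
            'maxProcesses' => 25,   // lower ceiling: no connection storms
            'balanceMaxShift' => 5, // moderate shift: responsive within the ceiling
            'balanceCooldown' => 3,
        ],
        'supervisor-imports' => [
            'connection' => 'redis',
            'queue' => ['imports', 'exports'],
            'balance' => 'auto',
            'minProcesses' => 5,
            'maxProcesses' => 35,   // 25 + 35 = 60 total potential connections
        ],
    ],
],
```

Recompute those ceilings whenever max_connections, PHP-FPM worker counts, or the number of services sharing the database changes.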
Fix #4: Increase max_connections (the tempting non-fix)
This is what everyone does first. It is what the on-call engineer does at 3 AM when the application is down and the queue is backing up and the Horizon dashboard looks like a disaster movie.
# postgresql.conf — the "just increase it" non-fix.
#
# Default:
max_connections = 100
# After the incident:
max_connections = 300
# This "works" in the sense that the error goes away.
# It fails in the sense that:
#
# 1. Each connection consumes ~5-10 MB of RAM for the backend process.
# 300 connections = 1.5 - 3 GB of RAM just for process overhead.
# On an RDS db.t3.medium (4 GB total), that is most of your memory.
#
# 2. PostgreSQL performance degrades past ~50 active connections
# on a typical 4-core instance. Those 300 backends contend for
# CPU, shared buffers, and WAL locks.
#
# 3. You are treating the symptom. The next time Horizon auto-scales
# up, or you add another supervisor, you will be back here again
# changing 300 to 500.
#
# 4. shared_buffers, effective_cache_size, and work_mem all need to be
# recalculated when max_connections changes. Most people skip this.
# PostgreSQL allocates shared memory based on max_connections at
# startup. More connections = more shared memory = less OS cache.

Increasing max_connections from 100 to 300 does make the error go away. It also makes several other things happen, none of them good.
Each connection's backend process consumes memory. On a managed database like RDS db.t3.medium with 4 GB of RAM, the kernel, PostgreSQL's shared buffers, and the OS page cache are already competing for space. Adding 200 more backend processes — each with its own work_mem allocation, catalog cache, and sort buffers — squeezes everything else.
# The arithmetic of "just add more connections":
#
# Instance: RDS db.r6g.large (16 GB RAM)
#
# PostgreSQL memory allocation at max_connections = 100:
# shared_buffers = 4 GB (25% of RAM, standard)
# OS page cache ≈ 8 GB (what the OS uses for disk caching)
# Backend overhead = 100 * 8 MB = 800 MB
# Available for queries = comfortable
#
# PostgreSQL memory allocation at max_connections = 500:
# shared_buffers = 4 GB (unchanged, but should it be?)
# Backend overhead = 500 * 8 MB = 4 GB
# OS page cache ≈ 4 GB (halved — your index scans just
# started hitting disk)
# Available for queries = tight
#
# The per-backend RAM is not just the base allocation. Each backend
# can allocate up to work_mem per sort/hash node in a query plan.
# A query with 3 sort nodes and work_mem = 4MB uses up to 12 MB.
# 50 backends running such queries simultaneously: 600 MB.
# 200 backends: 2.4 GB. On top of the 4 GB base overhead.
#
# This is why PostgreSQL veterans set max_connections low and use
# connection pooling. Not because they are conservative.
# Because they have done the arithmetic.

When work_mem is set to 4 MB (a common default) and 200 connections each perform a sort operation, that is 800 MB of sort memory alone. If those sorts exceed work_mem, they spill to disk. Queries that ran in 5ms at 50 connections now take 50ms at 200 connections because they are sorting on disk instead of in memory.
PostgreSQL's throughput also degrades from lock contention at high connection counts. The ProcArrayLock, WALInsertLock, and buffer_content locks are all hot paths that see increasing contention as backend processes multiply. On a 4-core instance, performance typically peaks around 40-60 active connections and declines from there. The PostgreSQL documentation on connection settings is diplomatically silent on this, but benchmarks from Percona and pganalyze consistently show throughput declining past the 50-connection mark on typical hardware.
Note the word "active." Most of your 150 Horizon connections are idle. They are not contending for CPU or locks. But they are consuming memory, file descriptors, and max_connections slots that could be used by processes that actually need them — like your web application's PHP-FPM workers trying to serve user requests.
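To see that active/idle split on your own database, group pg_stat_activity by state. This diagnostic fragment is a sketch in the same spirit as the queries elsewhere in this article; run it from psql or any SQL client:

```sql
-- How many of your connections are actually doing work right now?
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
-- A pooled deployment shows mostly 'active' backends.
-- A raw 150-worker Horizon deployment typically shows an 'idle'
-- count well above 100.
```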
There is a further complication that teams frequently overlook: shared_buffers, effective_cache_size, and work_mem all need to be recalculated when max_connections changes. PostgreSQL allocates shared memory based on max_connections at startup. More connections means more shared memory for connection metadata, which leaves less physical memory for data caching. The parameters that were tuned for 100 connections are wrong at 300 connections. Most teams skip this recalculation. They change one number and restart PostgreSQL. Then they wonder why the queries that were fast yesterday are slow today.
Fix #5: A connection proxy (the structural answer)
Every fix discussed so far treats the connection problem on the client side. Disconnect after jobs: client-side. Reduce workers: client-side. Increase max_connections: server-side, but treating the symptom rather than the cause.
The cause is that PHP has no built-in connection pooling. Each process gets one connection, and it holds it until the process ends. This is a language-level architectural decision that dates to PHP's origins as a request-scoped scripting language. Horizon's long-lived workers expose this limitation in a way that short-lived PHP-FPM requests do not.
A connection proxy sits between your application processes and PostgreSQL. Your 150 Horizon workers connect to the proxy. The proxy maintains a small pool of actual PostgreSQL connections — 15, 20, 25 — and multiplexes the 150 client connections across them.
PgBouncer: the established option
I should give PgBouncer its due. It is the most widely deployed PostgreSQL connection pooler, it is battle-tested at significant scale, and for many Horizon deployments it works well.
# PgBouncer in transaction mode — the most common proxy solution.
#
# pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_port = 6432
pool_mode = transaction
default_pool_size = 20
max_client_conn = 300
# How it works:
# - 150 Horizon workers connect to PgBouncer (port 6432)
# - PgBouncer maintains 20 connections to PostgreSQL (port 5432)
# - When a worker starts a transaction, PgBouncer assigns a backend
# - When the transaction ends, the backend returns to the pool
#
# What works:
# - Connection multiplexing (150 -> 20)
# - Idle workers hold no PostgreSQL backend
# - Auto-balancer can scale freely
#
# What breaks in transaction mode:
# - Prepared statements (PARSE/BIND/EXECUTE protocol)
# Fix: PDO::ATTR_EMULATE_PREPARES = true
# Cost: boolean parameters sent as integers (see our article
# on the PgBouncer boolean prepared statement trap)
#
# - SET commands do not persist across transactions
# - Advisory locks released at transaction boundary
# - LISTEN/NOTIFY does not work
# - Temporary tables vanish between transactions
#
# For Horizon workers running simple CRUD jobs, transaction mode
# is usually fine. For workers that rely on session-level features,
# it requires careful testing.

PgBouncer in transaction mode achieves the connection multiplexing that solves the Horizon problem. 150 workers connect to PgBouncer. PgBouncer maintains 20 connections to PostgreSQL. Workers only consume a backend during active transactions. Between jobs, no backend is allocated.
The limitation is that transaction mode shuffles backends between transactions. Server-side prepared statements — which PDO uses by default — are bound to a specific backend. If your PARSE command runs on backend A and your EXECUTE command lands on backend B, PostgreSQL returns an error: prepared statement "pdo_stmt_00000001" does not exist.
The standard fix is to enable emulated prepares in your Laravel database.php configuration: PDO::ATTR_EMULATE_PREPARES => true. This tells PDO to handle parameter binding client-side, substituting values directly into the SQL string rather than using the PostgreSQL extended query protocol. It works, but it introduces a second problem: PDO's client-side binding converts PHP true/false to the integers 1/0 instead of the PostgreSQL boolean literals true/false. If you have boolean columns in your schema, your WHERE is_active = true queries become WHERE is_active = 1, which PostgreSQL rejects with operator does not exist: boolean = integer.
The full story of this boolean casting bug and its remedies is a matter I have attended to at length in a separate guide on the PgBouncer prepared statement trap. It is solvable — casting, query builder adjustments, PgBouncer 1.21+'s prepared statement support — but it requires awareness and testing.
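For reference, the emulated-prepares option sits in the connection's options array in config/database.php. This fragment is a sketch: the surrounding keys follow Laravel's standard pgsql connection template, and the port assumes PgBouncer listening on 6432.

```php
// config/database.php — pgsql connection with emulated prepares,
// needed when connecting through PgBouncer in transaction mode.
'pgsql' => [
    'driver' => 'pgsql',
    'host' => env('DB_HOST', '127.0.0.1'),
    'port' => env('DB_PORT', '6432'),   // PgBouncer's port, not 5432
    'database' => env('DB_DATABASE', 'myapp'),
    'username' => env('DB_USERNAME', 'app_user'),
    'password' => env('DB_PASSWORD', ''),
    'options' => [
        // Client-side parameter binding: no server-side PARSE/BIND,
        // so statements survive backend reassignment between transactions.
        PDO::ATTR_EMULATE_PREPARES => true,
    ],
],
```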
Gold Lapel: the drop-in alternative
# With Gold Lapel between Horizon and PostgreSQL:
#
# .env
DB_HOST=127.0.0.1 # Gold Lapel proxy
DB_PORT=6432 # Gold Lapel's port (not Postgres 5432)
DB_DATABASE=myapp
DB_USERNAME=app_user
DB_PASSWORD=secret
# What changes architecturally:
#
# Horizon Workers (150)
# |
# | Each worker opens a PDO connection to Gold Lapel
# |
# Gold Lapel (connection proxy)
# |
# | Maintains 20 actual PostgreSQL connections
# | Multiplexes 150 client connections onto 20 backends
# |
# PostgreSQL
# | Sees 20 connections. Not 150. Not 200. Twenty.
#
# When a Horizon worker is idle (waiting on Redis for the next job),
# its connection to Gold Lapel is idle too — but Gold Lapel returns
# the upstream PostgreSQL backend to the pool. The backend is available
# for other workers who are actively running queries.
#
# No DB::disconnect() needed. No connection churn. No RAM waste.
# The auto-balancer can scale to 300 workers if it wants.
# PostgreSQL still sees 20 connections.

With a proxy in place, the arithmetic changes completely:
- 150 Horizon workers open 150 connections to the proxy. Cost: minimal. The proxy handles lightweight client connections efficiently.
- The proxy maintains 20 connections to PostgreSQL. These are the only connections PostgreSQL sees.
- When a worker runs a query, the proxy assigns one of its 20 backend connections to handle it. When the query completes, the backend connection returns to the proxy's pool, available for the next worker.
- When a worker is idle (waiting on Redis), its proxy connection is idle too, but no PostgreSQL backend is allocated. The backend was returned to the pool when the last query finished.
No DB::disconnect() needed. No connection churn. No reconnection overhead per job. No limit on maxProcesses imposed by max_connections. The auto-balancer can scale to 300 workers during a burst, and PostgreSQL still sees 20 connections.
You do not need to change a single line of application code. Run composer require goldlapel/goldlapel-laravel and the service provider handles the rest. Laravel does not know the difference. PDO does not know the difference. Your jobs run exactly as before.
Gold Lapel's session-mode pooling preserves prepared statements, SET commands, advisory locks, and temporary tables within a session — features that PgBouncer's transaction mode discards between transactions. For Horizon workers that use any of these session-level PostgreSQL features, this distinction matters.
Visibility into the pool
# Gold Lapel exposes connection pool metrics that make
# the problem visible before it becomes an incident.
#
# curl http://localhost:9090/metrics
#
# goldlapel_client_connections_total 153
# goldlapel_upstream_connections_active 8
# goldlapel_upstream_connections_idle 12
# goldlapel_upstream_connections_total 20
# goldlapel_client_connections_waiting 0
# goldlapel_query_latency_p99_ms 4.2
#
# 153 client connections (150 Horizon + 3 FPM workers),
# but only 8 upstream connections actively running queries.
# 12 backends idle in the pool, ready for the next query.
# Zero clients waiting for a backend. No contention.
#
# Alert when goldlapel_client_connections_waiting > 0
# for more than 30 seconds. That means the pool is too small
# and clients are queuing. Increase default_pool_size.

The metrics endpoint provides the visibility that was missing from the original architecture. You can see exactly how many client connections exist, how many backend connections are in use, and whether any clients are waiting for a backend. An alert on goldlapel_client_connections_waiting > 0 tells you the pool is too small before it becomes a bottleneck. No more discovering the problem when max_connections is exceeded.
Comparing the options honestly
Each approach has a place, depending on your scale, your tolerance for operational complexity, and whether you have an engineer who enjoys tuning maxProcesses every time the traffic pattern changes.
| Solution | Connections to PG | Latency impact | Complexity | Trade-off |
|---|---|---|---|---|
| DB::disconnect() in finally block | Variable (0 to maxProcesses) | +2-5ms per job (reconnect cost) | Low — one line per job class | Connection churn; every job reconnects |
| Queue::after() event listener | Variable (0 to maxProcesses) | +2-5ms per job (reconnect cost) | Low — one listener, all jobs covered | Same churn, but centralized |
| Reduce maxProcesses | Lower ceiling, but still 1:1 | None (fewer connections) | None | Reduced throughput; jobs queue longer |
| Increase max_connections | Higher ceiling, still 1:1 | Performance degrades past ~50 active | None | RAM waste; treats symptom not cause |
| PgBouncer (transaction mode) | Multiplexed (e.g., 150 to 20) | Sub-millisecond proxy overhead | Medium — separate service to operate | No prepared statements in txn mode |
| Gold Lapel (session mode) | Multiplexed (e.g., 150 to 20) | Sub-millisecond proxy overhead | Low — drop-in, self-optimizing | Prepared statements work; session state preserved |
For small deployments — 20-30 workers, a managed database with a generous connection limit — Queue::after() with DB::disconnect() is perfectly adequate. The reconnection overhead is lost in the noise. The implementation is trivial. I would not suggest adding infrastructure to solve a problem that one event listener handles cleanly.
For larger deployments — 100+ workers, auto-balancing supervisors, multiple services sharing a database — a connection proxy is not optional. The 1:1 mapping of PHP processes to PostgreSQL connections does not scale. No amount of DB::disconnect() or max_connections tuning changes the fundamental architecture.
I should be forthcoming about a case where none of these application-level fixes is ideal: when your jobs use multiple database connections in rapid succession. A job that reads from a primary, writes to an analytics database, and updates a cache in a third PostgreSQL instance holds three connections simultaneously. DB::disconnect() must close all three. A connection proxy must pool all three. The complexity scales with the number of distinct databases your jobs touch. For this topology, a single proxy instance per database is the cleanest approach — each proxy manages its own pool independently.
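Under that one-proxy-per-database topology, the Laravel side reduces to one connection entry per proxy. The ports, hosts, and connection names below are illustrative assumptions, not a prescribed layout:

```php
// config/database.php — each connection targets its own proxy instance.
// Ports and connection names are illustrative.
'connections' => [
    'pgsql' => [        // primary, via proxy on 6432
        'driver' => 'pgsql',
        'host' => '127.0.0.1',
        'port' => '6432',
        'database' => 'myapp',
    ],
    'analytics' => [    // analytics DB, via proxy on 6433
        'driver' => 'pgsql',
        'host' => '127.0.0.1',
        'port' => '6433',
        'database' => 'analytics',
    ],
    'cache_pg' => [     // third instance, via proxy on 6434
        'driver' => 'pgsql',
        'host' => '127.0.0.1',
        'port' => '6434',
        'database' => 'cache',
    ],
],
```

Each proxy sizes its pool against its own database's max_connections, so one busy database cannot starve the others.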
PostgreSQL safety nets you should deploy regardless
Whether you choose DB::disconnect(), a connection proxy, or both, there are PostgreSQL-side configurations that provide defense in depth. These do not solve the connection exhaustion problem, but they limit the blast radius when things go wrong.
-- PostgreSQL safety net: kill connections stuck in transactions.
-- This catches jobs that crash mid-transaction without rollback.
-- postgresql.conf (or ALTER SYSTEM):
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();
-- What this does:
-- If a connection opens a transaction (BEGIN) and then does nothing
-- for 5 minutes, PostgreSQL terminates it. The transaction rolls back.
-- The connection closes. The backend process exits.
--
-- Why this matters for Horizon:
-- A job that calls DB::beginTransaction(), crashes in a way that
-- bypasses the finally block, and leaves the worker alive creates
-- a connection in the 'idle in transaction' state. That connection
-- holds row locks. It blocks autovacuum on the affected tables.
-- It prevents table maintenance. Over hours, table bloat grows.
--
-- Without this timeout, the connection persists until the worker
-- is manually restarted or Horizon is redeployed.
--
-- With the timeout, PostgreSQL cleans it up after 5 minutes.
-- The worker's next database call will get a new connection.
-- Check for idle-in-transaction connections right now:
SELECT pid, now() - xact_start AS transaction_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY transaction_age DESC;

The idle_in_transaction_session_timeout setting is particularly important for Horizon workers. If a job opens a transaction and then crashes in a way that bypasses the finally block (a segfault, an OOM kill, a SIGKILL from the process supervisor), the connection may persist in the idle in transaction state. That connection holds row locks. It blocks autovacuum on the affected tables. Over hours, table bloat grows, and queries slow down.
A 5-minute timeout ensures PostgreSQL terminates these abandoned transactions before they cause cascading damage. The connection is closed. The locks are released. Autovacuum can proceed. The worker's next database call will get a fresh connection — which, if you have DB::disconnect() deployed, will happen naturally at the start of the next job.
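If you need to clear abandoned transactions immediately rather than waiting for the timeout, pg_terminate_backend() can do it by hand. This is a sketch; the 5-minute threshold is an assumption you should match to your own timeout setting.

```sql
-- Manually terminate backends stuck 'idle in transaction' longer
-- than 5 minutes. Each terminated transaction rolls back and its
-- locks are released, exactly as the timeout would do automatically.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '5 minutes';
```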
Connection monitoring alerts
-- A simple alert query for your monitoring system.
-- Run this every 60 seconds. Alert if connection usage
-- exceeds 80% of max_connections.
SELECT count(*) AS current_connections,
current_setting('max_connections')::int AS max_connections,
round(
100.0 * count(*) /
current_setting('max_connections')::int,
1
) AS usage_pct
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- usage_pct > 80: warning — you have headroom, but not much.
-- usage_pct > 90: critical — one auto-scale event away from outage.
-- usage_pct > 95: page someone. Now.
-- Also check for connections that have been idle too long:
SELECT count(*) AS stale_connections
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > interval '10 minutes';
-- If stale_connections > 50, your Horizon workers are holding
-- connections between jobs. Deploy DB::disconnect() or a proxy.

The first alert — connection usage exceeding 80% of max_connections — gives you warning before the application halts. The second alert — more than 50 connections idle for over 10 minutes — tells you that Horizon workers are holding connections between jobs. Both are symptoms that DB::disconnect() or a proxy would resolve, but the alerts give you time to respond rather than discovering the problem via a customer support ticket.
A production checklist for Laravel Horizon + PostgreSQL
For those who prefer actionable steps over architectural discourse, here is what I would configure for a production Horizon deployment with PostgreSQL.
- Count your total worker processes. Sum maxProcesses across all supervisors, add your PHP-FPM pm.max_children (or Octane --workers), add 3 for superuser_reserved_connections, and add 5-10 for ad hoc connections (migrations, tinker, monitoring tools, pg_dump). That is your connection requirement.
- Compare against max_connections. If your requirement exceeds max_connections, do not increase max_connections as the first response. Read the next items.
- Add Queue::after() with DB::disconnect() as an immediate mitigation. This reduces steady-state connections from "all workers" to "actively processing workers." Deploy it today. It takes five minutes.
- If you use multiple database connections, disconnect all of them. Iterate over config('database.connections') in your Queue::after() handler. One missed connection negates the fix.
- Use DB::disconnect(), not DB::reconnect(). Reconnect opens a fresh connection immediately. You want no connection between jobs.
- Set balanceMaxShift conservatively. A value of 1-3 prevents the auto-balancer from forking 50 workers simultaneously. Slower scaling, but no connection storm.
- Set idle_in_transaction_session_timeout to 5 minutes on PostgreSQL. This catches jobs that crash mid-transaction and leave connections holding locks indefinitely.
- Monitor pg_stat_activity connection counts. Set an alert at 80% of max_connections. Set a second alert for more than 50 idle connections lasting over 10 minutes. If either fires, you have time to respond before the application halts.
- For 100+ workers, deploy a connection proxy. PgBouncer in transaction mode (with emulated prepares and boolean casting handled), or Gold Lapel in session mode (composer require goldlapel/goldlapel-laravel). Keep PostgreSQL's max_connections at 100 or less.
- After deploying any fix, verify with pg_stat_activity. Run the diagnostic query during peak queue activity. Confirm that idle connections have decreased to single digits, not triple digits.
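The multi-connection disconnect item in the checklist can be sketched as follows. The allConnectionNames() helper is hypothetical (introduced here so the name-extraction logic is testable on its own); the commented listener shows where it would sit in a service provider.

```php
<?php
// Hypothetical helper: the names of every configured connection,
// so the Queue::after() listener below can disconnect them all,
// not just the default.
function allConnectionNames(array $databaseConfig): array
{
    return array_keys($databaseConfig['connections'] ?? []);
}

// Inside AppServiceProvider::boot() (framework context, not runnable here):
//
// Queue::after(function (JobProcessed $event) {
//     foreach (allConnectionNames(config('database')) as $name) {
//         DB::disconnect($name);
//     }
// });
```

Disconnecting by name covers the primary, the analytics database, and any third instance in one pass; a listener that calls DB::disconnect() with no argument releases only the default connection.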
Why this problem will not be fixed in Horizon
The connection exhaustion problem in Laravel Horizon is not a bug. It is a consequence of combining long-lived PHP processes (which hold one connection per process, indefinitely) with a database that has a finite connection limit. The GitHub issue has been open since 2019. It is not going to be fixed in Horizon, because the fix is not in Horizon.
Horizon could, in theory, add a configuration option to disconnect the database after each job. Something like 'disconnectAfterJob' => true in the supervisor config. But this would paper over a fundamental PHP limitation — the absence of connection pooling — with application-level workarounds. And it would impose the reconnection overhead on every job, even in deployments where a proxy handles connection management more efficiently.
The Laravel team's position, as expressed in the issue thread, is reasonable: database connection management is a concern of the database infrastructure, not the queue worker framework. Horizon manages workers and queues. How those workers connect to external services — and how many connections they hold — is outside Horizon's scope.
I agree with this position, even though it means the problem persists for teams who deploy Horizon without understanding the connection implications. The fix is in the layer between your application and your database. That is the correct layer. A queue worker framework should not be in the business of managing database connection pools. That is what connection poolers are for.
Gold Lapel's connection multiplexing was designed for precisely this topology: many client processes, few queries at any moment, a database that performs best with a small number of actual connections. 150 Horizon workers become 20 PostgreSQL connections. The auto-balancer scales freely. The database breathes easily. And nobody gets paged at 3 AM because a worker count exceeded a connection limit.
That, if I may say, is a rather more pleasant arrangement for everyone involved — the workers, the database, and the engineer who would otherwise be awake at 3 AM wondering why 147 connections are idle.
If the question of connection pooling interests you beyond Horizon's particular demands, I have taken the liberty of preparing a comparison of PostgreSQL connection poolers — PgBouncer, Pgcat, PgPool-II, and the alternatives — so you may choose the one best suited to your household.