PostgreSQL Job Queues with SKIP LOCKED: Replace Redis + Sidekiq
Six artists arrived to claim the commission. One took it; the remaining five skipped to the next available surface. Concurrency, at last, without contention.
Why PostgreSQL can be your job queue
Good evening. I see you have arrived with a Redis instance running solely to process background jobs. Allow me to suggest a simplification.
Most applications already have PostgreSQL. Adding Redis for background jobs introduces a second stateful system to monitor, back up, and keep running — connection management, failover, monitoring, capacity planning, and on-call runbooks all multiply. If PostgreSQL can handle the job queue workload, that is one fewer system in the household to attend to.
The traditional objection is that databases are too slow for job queues. This was largely true before PostgreSQL 9.5 introduced SKIP LOCKED. Before that, concurrent workers competing for the next job created lock contention — one worker would lock a row, and every other worker would wait for that lock to release before it could claim a different job.
SKIP LOCKED changes the behavior entirely: when a row is already locked by another transaction, skip it and move to the next unlocked row. This is the primitive that makes PostgreSQL competitive with Redis-backed queues for most workloads. Workers no longer contend with each other — they naturally distribute work by skipping locked jobs and claiming available ones.
I should be forthcoming about the scope. This approach works well for hundreds to thousands of jobs per second. If you need tens of thousands of jobs per second sustained, Redis or a dedicated message broker (RabbitMQ, Amazon SQS) is the right tool — they were purpose-built for that volume, and I would not presume to argue otherwise. But most applications never reach that threshold, and for those, PostgreSQL is more than sufficient.
What this guide builds: a complete, production-ready job queue from scratch — table schema, enqueue, dequeue, retry with exponential backoff, dead letter handling, stale lock recovery, and concurrency patterns. No external dependencies beyond PostgreSQL.
For Rails teams, see the Solid Queue guide, which uses this same SKIP LOCKED pattern with Rails conventions. For a broader look at replacing Redis with PostgreSQL, see the Redis alternatives guide.
The job queue table schema
If you'll permit me, I'd like to walk through the design of this table with some care. A job queue's reliability begins with its schema.
Core table design
CREATE TYPE job_status AS ENUM ('pending', 'running', 'completed', 'failed', 'dead');
CREATE TABLE jobs (
id BIGSERIAL PRIMARY KEY,
queue TEXT NOT NULL DEFAULT 'default',
payload JSONB NOT NULL,
status job_status NOT NULL DEFAULT 'pending',
priority INTEGER NOT NULL DEFAULT 0,
scheduled_at TIMESTAMPTZ NOT NULL DEFAULT now(),
locked_at TIMESTAMPTZ,
locked_by TEXT,
attempts INTEGER NOT NULL DEFAULT 0,
max_attempts INTEGER NOT NULL DEFAULT 5,
last_error TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Column-by-column rationale:
id — BIGSERIAL to avoid integer overflow on high-volume queues. This is the primary key and the unique identifier for each job.
queue — a text column supporting multiple named queues from a single table. Workers can process specific queues (default, emails, reports) or poll multiple queues in priority order.
payload — JSONB stores arbitrary job data. JSONB is preferred over JSON because it supports indexing and equality comparisons if needed. The payload contains everything the worker needs to execute the job.
status — an enum that enforces a finite state machine: pending (waiting to be claimed), running (claimed by a worker), completed (finished successfully), failed (will be retried), dead (exceeded max attempts, no more retries).
priority — integer column where lower values are dequeued first. Priority 0 is higher priority than priority 10. This enables urgent jobs without a separate queue.
scheduled_at — when the job should be processed. Defaults to now() for immediate execution. Set to a future timestamp for delayed jobs (e.g., "send this email in 30 minutes").
locked_at — timestamp of when a worker claimed the job. Used by the stale lock detector to identify abandoned jobs.
locked_by — identifier of the worker that claimed the job (hostname, PID, or a UUID). Used for debugging and for the stale lock detector.
attempts — how many times this job has been attempted. Incremented on each dequeue.
max_attempts — the maximum number of attempts before the job moves to dead status.
last_error — the error message from the most recent failed attempt. Preserved for debugging.
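With the schema in place, enqueuing is a plain INSERT. A minimal sketch; the payload shapes here are illustrative, not prescribed:

```sql
-- Immediate job on the default queue
INSERT INTO jobs (payload)
VALUES ('{"task": "resize_image", "image_id": 7}');

-- Delayed job: send an email in 30 minutes
INSERT INTO jobs (queue, payload, scheduled_at)
VALUES ('emails',
        '{"to": "alice@example.com", "template": "reminder"}',
        now() + INTERVAL '30 minutes');
```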
The indexes that keep it fast
-- The dequeue query's access path
CREATE INDEX idx_jobs_dequeue
ON jobs (queue, priority, scheduled_at)
WHERE status = 'pending';
-- The stale lock detector's access path
CREATE INDEX idx_jobs_stale
ON jobs (locked_at)
WHERE status = 'running';
The partial index WHERE status = 'pending' is the design decision that makes the entire approach viable long-term. The dequeue query only ever looks for pending jobs. As the table grows with completed and dead jobs, this index stays small — it contains only rows with status = 'pending'. A table with 10 million completed jobs and 50 pending jobs has a dequeue index measured in kilobytes. Quite manageable.
Without the partial index, the dequeue index would include every row in the table. As completed jobs accumulate, the index grows, and dequeue performance degrades. The partial index eliminates this problem entirely.
The stale lock index is similarly filtered — it only indexes running jobs, which are a small fraction of the total table.
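If you wish to verify this as the table grows, PostgreSQL can report an index's size directly:

```sql
SELECT pg_size_pretty(pg_relation_size('idx_jobs_dequeue'));
```

A healthy pending-jobs index on a mostly-completed table should stay in the kilobyte range.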
The worker loop — dequeue with SKIP LOCKED
This is the heart of the arrangement. Allow me to present the dequeue query, then walk through it line by line.
The core dequeue query
WITH claimed AS (
SELECT id
FROM jobs
WHERE queue = 'default'
AND status = 'pending'
AND scheduled_at <= now()
ORDER BY priority, scheduled_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE jobs
SET
status = 'running',
locked_at = now(),
locked_by = 'worker-1',
attempts = attempts + 1,
updated_at = now()
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.*;
Line-by-line breakdown:
WHERE queue = 'default' — filter to the target queue. Workers can be dedicated to a single queue or poll multiple queues.
AND status = 'pending' — only claim jobs that are waiting to be processed. This condition hits the partial index.
AND scheduled_at <= now() — only claim jobs whose scheduled time has arrived. Delayed jobs remain in pending status until their time comes.
ORDER BY priority, scheduled_at — process higher-priority (lower number) jobs first. Within the same priority, process older jobs first (FIFO).
FOR UPDATE — lock the selected row. No other transaction can modify or claim this row until the current transaction commits or rolls back.
SKIP LOCKED — if the selected row is already locked by another worker, skip it and select the next qualifying row. This is the key to concurrent dequeue without contention.
LIMIT 1 — claim one job at a time. Batch patterns are discussed below.
The CTE and the UPDATE execute as a single atomic statement, transitioning the job from pending to running and recording the lock metadata. The RETURNING clause returns the full job row so the worker has the payload to process.
One CTE. One atomic transition. No race conditions.
The worker process loop
Python (asyncpg)
import asyncio
import asyncpg
import json
import signal
import socket
DEQUEUE_SQL = """
WITH claimed AS (
SELECT id
FROM jobs
WHERE queue = $1
AND status = 'pending'
AND scheduled_at <= now()
ORDER BY priority, scheduled_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE jobs
SET
status = 'running',
locked_at = now(),
locked_by = $2,
attempts = attempts + 1,
updated_at = now()
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.*;
"""
COMPLETE_SQL = """
UPDATE jobs
SET status = 'completed', updated_at = now()
WHERE id = $1;
"""
FAIL_SQL = """
UPDATE jobs
SET
status = CASE
WHEN attempts >= max_attempts THEN 'dead'::job_status
ELSE 'pending'::job_status
END,
last_error = $2,
scheduled_at = CASE
WHEN attempts >= max_attempts THEN scheduled_at
ELSE now() + ($3 || ' seconds')::interval
END,
locked_at = NULL,
locked_by = NULL,
updated_at = now()
WHERE id = $1;
"""
class Worker:
    def __init__(self, dsn, queue='default'):
        self.dsn = dsn
        self.queue = queue
        self.worker_id = None  # assigned in run(), once a task exists
        self.running = True

    async def run(self):
        pool = await asyncpg.create_pool(self.dsn, min_size=1, max_size=2)
        # asyncio.current_task() only returns a task inside a running event
        # loop, so the worker id is assigned here rather than in __init__
        self.worker_id = f"{socket.gethostname()}-{asyncio.current_task().get_name()}"
        # Graceful shutdown on SIGTERM
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(signal.SIGTERM, self.shutdown)
        while self.running:
            # Release the connection before processing; the row lock is
            # already gone once the dequeue statement commits
            async with pool.acquire() as conn:
                job = await conn.fetchrow(DEQUEUE_SQL, self.queue, self.worker_id)
            if job is None:
                await asyncio.sleep(1)  # No jobs available, wait and retry
                continue
            try:
                await self.process(job)
                async with pool.acquire() as conn:
                    await conn.execute(COMPLETE_SQL, job['id'])
            except Exception as e:
                backoff = 30 * (2 ** (job['attempts'] - 1))
                async with pool.acquire() as conn:
                    await conn.execute(FAIL_SQL, job['id'], str(e), str(backoff))
        await pool.close()

    async def process(self, job):
        payload = job['payload']
        # Process the job based on payload contents
        print(f"Processing job {job['id']}: {payload}")

    def shutdown(self):
        self.running = False
Ruby (pg gem)
require 'pg'
require 'json'
require 'socket'
DEQUEUE_SQL = <<~SQL
WITH claimed AS (
SELECT id
FROM jobs
WHERE queue = $1
AND status = 'pending'
AND scheduled_at <= now()
ORDER BY priority, scheduled_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE jobs
SET
status = 'running',
locked_at = now(),
locked_by = $2,
attempts = attempts + 1,
updated_at = now()
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.*;
SQL
class Worker
  def initialize(conn_string, queue: 'default')
    @conn = PG.connect(conn_string)
    @queue = queue
    @worker_id = "#{Socket.gethostname}-#{Process.pid}"
    @running = true
    Signal.trap('TERM') { @running = false }
  end

  def run
    while @running
      job = dequeue
      if job.nil?
        sleep 1
        next
      end
      begin
        process(job)
        @conn.exec_params(
          "UPDATE jobs SET status = 'completed', updated_at = now() WHERE id = $1",
          [job['id']]
        )
      rescue => e
        backoff = 30 * (2 ** (job['attempts'].to_i - 1))
        @conn.exec_params(
          "UPDATE jobs SET status = CASE WHEN attempts >= max_attempts THEN 'dead'::job_status ELSE 'pending'::job_status END, last_error = $2, scheduled_at = CASE WHEN attempts >= max_attempts THEN scheduled_at ELSE now() + ($3 || ' seconds')::interval END, locked_at = NULL, locked_by = NULL, updated_at = now() WHERE id = $1",
          [job['id'], e.message, backoff.to_s]
        )
      end
    end
  ensure
    @conn.close
  end

  private

  def dequeue
    result = @conn.exec_params(DEQUEUE_SQL, [@queue, @worker_id])
    result.ntuples > 0 ? result[0] : nil
  end

  def process(job)
    payload = JSON.parse(job['payload'])
    puts "Processing job #{job['id']}: #{payload}"
  end
end
A few design decisions in the worker loop worth noting:
Transaction boundaries. The dequeue and status update happen in a single atomic statement (the CTE). The job processing happens outside the transaction. This means that if the worker crashes during processing, the job remains in running status — the stale lock detector handles recovery. The arrangement is deliberate.
Graceful shutdown. The worker traps SIGTERM and sets a flag to stop polling after the current job finishes. This prevents jobs from being interrupted mid-processing during deployments. One does not leave a task half-finished.
Poll interval. When no jobs are available, the worker sleeps for 1 second before checking again. Simple and sufficient for most workloads. For lower latency, use LISTEN/NOTIFY to wake workers immediately when a job is enqueued.
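A minimal LISTEN/NOTIFY sketch, assuming PostgreSQL 11+ for EXECUTE FUNCTION (older versions use EXECUTE PROCEDURE); the per-queue channel naming is an assumption of this sketch, not a convention PostgreSQL imposes:

```sql
-- Wake listening workers whenever a job is enqueued
CREATE OR REPLACE FUNCTION notify_job_enqueued() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('jobs_' || NEW.queue, NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify
AFTER INSERT ON jobs
FOR EACH ROW EXECUTE FUNCTION notify_job_enqueued();
```

A worker then listens on jobs_default (asyncpg exposes this as conn.add_listener) and keeps the 1-second poll as a safety net, because notifications are not delivered to listeners that were disconnected when the NOTIFY fired.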
Batch dequeue — claiming multiple jobs
For high-throughput queues, you may wish to claim multiple jobs at once by increasing the LIMIT:
WITH claimed AS (
SELECT id
FROM jobs
WHERE queue = 'default'
AND status = 'pending'
AND scheduled_at <= now()
ORDER BY priority, scheduled_at
FOR UPDATE SKIP LOCKED
LIMIT 10
)
UPDATE jobs
SET
status = 'running',
locked_at = now(),
locked_by = 'worker-1',
attempts = attempts + 1,
updated_at = now()
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.*;
When to batch: high-throughput queues where per-job overhead (connection round-trip, transaction commit) matters. Batch dequeue amortizes the overhead across multiple jobs.
The trade-off deserves a candid word: larger batches hold more row locks for longer. If a worker claims 50 jobs and crashes, all 50 remain in running status until the stale lock detector recovers them. Moderation is advised.
Recommended batch sizes:
| Throughput | Batch Size | Rationale |
|---|---|---|
| < 100 jobs/second | 1 | Per-job overhead is negligible |
| 100–1,000 jobs/second | 5–10 | Meaningful overhead reduction |
| 1,000+ jobs/second | 10–50 | Amortizes round-trip cost significantly |
The general pattern: claim a batch in one transaction, process each job individually, mark each completed or failed as it finishes.
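The batch-size table above can be expressed as a small helper; the thresholds are this guide's recommendations rather than measured constants:

```python
def choose_batch_size(jobs_per_second: float) -> int:
    """Map observed queue throughput to a recommended dequeue LIMIT."""
    if jobs_per_second < 100:
        return 1   # per-job overhead is negligible
    if jobs_per_second < 1000:
        return 10  # meaningful overhead reduction
    return 50      # amortizes round-trip cost significantly
```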
Retry logic and failure handling
Jobs fail. This is not a character flaw — it is a fact of distributed systems. The question is how gracefully you recover.
Automatic retries with backoff
When a job fails, the worker resets it to pending with a future scheduled_at so it is not immediately retried:
UPDATE jobs
SET
status = CASE
WHEN attempts >= max_attempts THEN 'dead'::job_status
ELSE 'pending'::job_status
END,
last_error = 'Connection timeout to payment gateway',
scheduled_at = CASE
WHEN attempts >= max_attempts THEN scheduled_at
ELSE now() + make_interval(secs => 30 * power(2, attempts - 1))
END,
locked_at = NULL,
locked_by = NULL,
updated_at = now()
WHERE id = 42;
The exponential backoff formula is base_interval * 2^(attempts - 1):
| Attempt | Backoff | Next Retry |
|---|---|---|
| 1 | 30 seconds | 30s after failure |
| 2 | 60 seconds | 1 minute after failure |
| 3 | 2 minutes | 2 minutes after failure |
| 4 | 4 minutes | 4 minutes after failure |
| 5 | 8 minutes | 8 minutes after failure |
Optional jitter prevents thundering herd when many jobs fail at the same time. Add a random 0–30% to the backoff:
-- With jitter
now() + make_interval(secs => 30 * power(2, attempts - 1) * (1 + random() * 0.3))
When attempts >= max_attempts, the job transitions to dead status instead of retrying.
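The same arithmetic can live in the worker instead of in SQL. A sketch in Python, with jitter as an optional fraction:

```python
import random

def backoff_seconds(attempts: int, base: int = 30, jitter: float = 0.0) -> float:
    """Exponential backoff: base * 2^(attempts - 1), optionally inflated
    by a random 0..jitter fraction to avoid thundering herds."""
    delay = base * (2 ** (attempts - 1))
    return delay * (1 + random.random() * jitter)
```

With the defaults this reproduces the table above: attempt 1 waits 30 seconds, attempt 5 waits 8 minutes.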
Dead letter handling
Jobs that exceed max_attempts are marked status = 'dead'. They are not deleted and not retried automatically. They are, if you will, set aside for later attention — preserved for inspection, debugging, and manual retry after the root cause is fixed.
View dead jobs:
SELECT id, queue, payload, attempts, last_error, updated_at
FROM jobs
WHERE status = 'dead'
ORDER BY updated_at DESC;
Manually retry a dead job (after fixing the root cause):
UPDATE jobs
SET
status = 'pending',
attempts = 0,
last_error = NULL,
scheduled_at = now(),
locked_at = NULL,
locked_by = NULL,
updated_at = now()
WHERE id = 42;
Bulk retry all dead jobs from a specific queue:
UPDATE jobs
SET
status = 'pending',
attempts = 0,
last_error = NULL,
scheduled_at = now(),
locked_at = NULL,
locked_by = NULL,
updated_at = now()
WHERE status = 'dead'
AND queue = 'emails';
Cleanup: periodically archive or delete dead jobs older than a retention period:
DELETE FROM jobs
WHERE status = 'dead'
AND updated_at < now() - INTERVAL '30 days';
Stale lock detection
I'm afraid we must discuss what happens when a worker crashes mid-job. The row remains in running status with no one processing it — abandoned, as it were. The stale lock detector finds these orphaned jobs and resets them to pending.
The detection query:
UPDATE jobs
SET
status = 'pending',
locked_at = NULL,
locked_by = NULL,
updated_at = now()
WHERE status = 'running'
AND locked_at < now() - INTERVAL '10 minutes'
RETURNING id, queue, attempts;
Run this on a schedule — every 5 minutes via pg_cron, a system cron job, or your application's scheduler:
-- Using pg_cron
SELECT cron.schedule(
'recover-stale-jobs',
'*/5 * * * *',
$$
UPDATE jobs
SET status = 'pending', locked_at = NULL, locked_by = NULL, updated_at = now()
WHERE status = 'running' AND locked_at < now() - INTERVAL '10 minutes'
$$
);
The timeout must be longer than your longest-running job. If you have jobs that legitimately take 20 minutes, set the timeout to 30 minutes. A 10-minute timeout is typical for most workloads.
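For teams without pg_cron, the system cron alternative might look like this illustrative crontab entry (DATABASE_URL is an assumed environment variable):

```
*/5 * * * * psql "$DATABASE_URL" -c "UPDATE jobs SET status = 'pending', locked_at = NULL, locked_by = NULL, updated_at = now() WHERE status = 'running' AND locked_at < now() - INTERVAL '10 minutes'"
```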
Concurrency patterns
Multiple workers, same queue
SKIP LOCKED guarantees that no two workers claim the same job. You scale horizontally by running more worker processes — each connects to the same database, runs the same dequeue query, and processes different jobs.
No coordinator is needed. PostgreSQL is the coordinator, via row-level locking. Five workers polling the same queue will each claim different jobs and process them in parallel. The database manages the coordination so your application does not have to.
Multiple queues, dedicated workers
Named queues allow different workers to process different job types:
-- Enqueue to specific queues
INSERT INTO jobs (queue, payload) VALUES ('emails', '{"to": "alice@example.com", "template": "welcome"}');
INSERT INTO jobs (queue, payload) VALUES ('reports', '{"report_id": 42, "format": "pdf"}');
INSERT INTO jobs (queue, payload, priority) VALUES ('default', '{"task": "cleanup"}', 10);
Dedicated workers process their assigned queue:
# Email worker — processes only email jobs
email_worker = Worker(dsn, queue='emails')
# Report worker — processes only report jobs
report_worker = Worker(dsn, queue='reports')
Mixed workers poll multiple queues in priority order:
async def poll_queues(self, conn):
    # Try high-priority queue first
    job = await conn.fetchrow(DEQUEUE_SQL, 'emails', self.worker_id)
    if job:
        return job
    # Fall back to default queue
    return await conn.fetchrow(DEQUEUE_SQL, 'default', self.worker_id)
Unique jobs — preventing duplicates
A matter that deserves attention: enqueuing the same job twice before the first one completes. Two API requests both enqueue a "send welcome email to user 42" job, and the guest receives two welcome emails. This is not the kind of hospitality we aim for.
Solution 1: partial unique index
CREATE UNIQUE INDEX idx_jobs_unique_pending
ON jobs (queue, (payload->>'job_key'))
WHERE status IN ('pending', 'running');
This prevents duplicate active jobs with the same job_key. Completed and dead jobs do not block new enqueues.
Solution 2: advisory locks
-- Before enqueuing, try to acquire an advisory lock
SELECT pg_try_advisory_lock(hashtext('send-welcome-42'));
-- If true, enqueue the job
-- If false, the job already exists
Solution 3: INSERT ... ON CONFLICT DO NOTHING
INSERT INTO jobs (queue, payload)
VALUES ('emails', '{"job_key": "send-welcome-42", "user_id": 42}')
ON CONFLICT (queue, (payload->>'job_key'))
WHERE status IN ('pending', 'running')
DO NOTHING;
The trade-off is worth noting: exact deduplication requires the application to define what "same job" means. The job_key pattern puts this responsibility on the enqueuer — include a deduplication key in the payload, and PostgreSQL handles the rest.
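One convention for the deduplication key (an illustration, not a requirement of the pattern) is to derive job_key deterministically from the action and the entity when building the payload:

```python
import json

def build_payload(action: str, entity_id: int, **data) -> str:
    """Build a JSONB payload string with a deterministic deduplication key."""
    return json.dumps({"job_key": f"{action}-{entity_id}", **data})
```

Enqueuing the welcome email for user 42 twice then produces the same job_key, and the partial unique index rejects the second insert.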
Performance characteristics
Throughput. A single PostgreSQL instance with SKIP LOCKED comfortably handles 1,000–5,000 jobs per second with a single dequeue worker, depending on job complexity and hardware. With multiple workers, throughput scales linearly until the dequeue query itself becomes the bottleneck.
For comparison, Redis with BRPOP handles tens of thousands of dequeues per second. The gap is real — but for most applications, it is also irrelevant. If your queue processes 100 jobs per second, both solutions are orders of magnitude faster than needed.
Partial index size. The partial index on WHERE status = 'pending' stays small regardless of total table size. A table with 50 million completed jobs and 200 pending jobs has a dequeue index of a few kilobytes. This is the design decision that keeps the queue fast long-term.
Table maintenance. Completed jobs accumulate. While the partial index keeps dequeue fast, the growing table affects autovacuum work and backup size. Archive or delete completed jobs periodically:
-- Delete completed jobs older than 7 days
DELETE FROM jobs
WHERE status = 'completed'
AND updated_at < now() - INTERVAL '7 days';
-- Or move to an archive table
INSERT INTO jobs_archive
SELECT * FROM jobs
WHERE status = 'completed'
AND updated_at < now() - INTERVAL '7 days';
DELETE FROM jobs
WHERE status = 'completed'
AND updated_at < now() - INTERVAL '7 days';
Connection overhead. Each worker needs a database connection. If you run 20 workers, that is 20 connections to PostgreSQL. Use connection pooling (PgBouncer, pgcat) when running many workers to stay within PostgreSQL's connection limits.
Scaling ceiling. At very high throughput (10,000+ jobs per second sustained), the dequeue query — even with SKIP LOCKED and a partial index — can become the bottleneck. At that point, consider table partitioning by queue name or time range, or migrate the highest-volume queues to Redis while keeping lower-volume queues in PostgreSQL. There is no shame in using the right tool for each job. That is, in fact, the entire philosophy.
Comparison with Redis-backed queues
Allow me to present a frank comparison. Both approaches have genuine strengths, and I would be a poor guide if I obscured either side.
| Factor | PostgreSQL SKIP LOCKED | Redis + Sidekiq/Bull/Celery |
|---|---|---|
| Throughput ceiling | ~5,000 jobs/sec per queue | ~50,000+ jobs/sec |
| Transactional enqueue | Yes — enqueue + business data in same COMMIT | No — separate system, eventual consistency |
| Infrastructure | Uses existing PostgreSQL | Requires Redis instance |
| Persistence | Full ACID durability | Configurable (AOF, RDB snapshots) |
| Monitoring tools | SQL queries, pg_stat_activity | Mature dashboards (Sidekiq Web, Bull Board) |
| Retry/dead letter | Built-in (this article) | Built-in (mature, feature-rich) |
| Scheduling | scheduled_at column | Native scheduled jobs (Sidekiq, delayed jobs) |
| Priority queues | priority column + ORDER BY | Separate queues with priority weighting |
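On the PostgreSQL side, "monitoring tools" means ordinary SQL. A queue-depth query, for instance:

```sql
SELECT queue, status, count(*) AS jobs, min(scheduled_at) AS oldest
FROM jobs
GROUP BY queue, status
ORDER BY queue, status;
```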
PostgreSQL's key advantage: transactional enqueue. When your application creates an order and enqueues a "send confirmation email" job in the same transaction, both succeed or both fail. With Redis, the order could be created in PostgreSQL while the Redis enqueue fails (or vice versa), requiring compensation logic. This is not a minor distinction — it is the difference between guaranteed consistency and eventual hope.
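A sketch of what transactional enqueue looks like; the orders table and its columns are hypothetical:

```sql
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (42, 99.00);
INSERT INTO jobs (queue, payload)
VALUES ('emails', '{"job_key": "order-confirmation-42", "template": "order_confirmation"}');
COMMIT;  -- both rows become visible together, or neither does
```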
Redis's key advantage: throughput and ecosystem. Redis is excellent software, purpose-built for this workload. Libraries like Sidekiq, Bull, and Celery provide mature UIs, monitoring, rate limiting, batch operations, and workflow orchestration that would be significant effort to build on top of PostgreSQL. These are genuine capabilities that deserve respect.
The decision framework:
- If your job volume is under 5,000 per second and you already have PostgreSQL, start with the SKIP LOCKED approach. It is simpler to operate and provides transactional guarantees.
- If you need 50,000+ jobs per second, advanced routing, rate limiting, or workflow orchestration, Redis is the right tool. I would recommend it without hesitation.
- Hybrid approach: start with PostgreSQL. If specific queues outgrow it, migrate those queues to Redis while keeping lower-volume queues in PostgreSQL. The You Don't Need Redis chapter covers this migration path in detail.
What Gold Lapel sees in queue workloads
Job queue workloads have a distinctive query profile: the same dequeue query executes thousands of times per hour with identical structure but varying parameters. The partial index on WHERE status = 'pending' is critical to performance, and its effectiveness depends on the index remaining efficient as the table grows.
Gold Lapel observes this dequeue pattern through its proxy and ensures the SKIP LOCKED query consistently uses the optimal index access path. As the table grows and statistics shift, the planner might occasionally choose a different plan — Gold Lapel detects this and acts on it.
Queue workloads are predictable and repetitive — exactly the workload profile where a query-aware proxy adds the most value. The queue you build with this guide works independently. Gold Lapel attends to its performance as your table grows and your data distribution changes.