
The Ruby pg Gem: Async Queries, Fiber Scheduling, and Connection Pooling Without an ORM

The pg gem is a dependency of ActiveRecord. Not the other way around. A distinction worth appreciating.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 38 min read
The illustrator proposed an ORM being gently escorted from the premises. We felt it sent the wrong message.

Good evening. I understand you prefer to speak to the database directly.

There is a quiet cohort of Ruby developers who use the pg gem without ActiveRecord. Without Sequel. Without any ORM at all. They write parameterized SQL, manage their own connections, and handle result sets as arrays of hashes. They work on CLI tools, data pipelines, Sidekiq workers, microservices, and ETL scripts where an ORM would add ceremony without value.

These developers are underserved by the existing documentation. The pg gem's own README is thorough but terse. Every tutorial assumes you want Rails. Every "PostgreSQL with Ruby" article starts with rails new. The pg gem has 60 million downloads on RubyGems and is a dependency of both ActiveRecord and Sequel, yet almost no long-form content exists for using it on its own.

This is that content.

We will cover the pg gem's full surface area: connections and type mapping, prepared statements and their plan cache behavior, Ruby 3.0+ Fiber.scheduler integration for async I/O, connection pooling with the connection_pool gem, the COPY protocol for bulk data loading, LISTEN/NOTIFY for real-time events, PgBouncer compatibility, error handling with retry patterns, and production architectures for Puma and Sidekiq. Real code throughout. Honest assessments of when each technique earns its keep and when it does not. No Rails required.

If you will permit me, I should like to begin at the beginning.

The fundamentals: connections, queries, and type mapping

The pg gem wraps libpq, PostgreSQL's C client library. Every PG.connect call establishes a real TCP connection to PostgreSQL — no pooling, no abstraction. One Ruby object, one database connection, one server process on the PostgreSQL side.

This directness is the entire point. There is no query builder interpreting your intent. No model layer instantiating objects on your behalf. No connection manager wrapping your connections in abstractions you did not request. You write SQL. PostgreSQL executes it. The result arrives as arrays of hashes. If you find this arrangement appealing, you are in the right place.

Basic pg gem usage
require "pg"

# Connect directly — no ORM, no abstraction layer
conn = PG.connect(dbname: "myapp", host: "localhost", port: 5432)

# Simple query — returns a PG::Result object
result = conn.exec("SELECT id, email, created_at FROM users WHERE active = true")

result.each do |row|
  puts "#{row['id']}: #{row['email']} (joined #{row['created_at']})"
end

# Parameterized query — $1, $2 placeholders, safe from injection
result = conn.exec_params(
  "SELECT * FROM orders WHERE customer_id = $1 AND status = $2",
  [42, "shipped"]
)

# Type casting — pg returns strings by default
# Use type_map to get native Ruby types automatically
conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
result = conn.exec("SELECT id, total, created_at FROM orders LIMIT 5")
result.first["id"]         # => 1 (Integer, not "1")
result.first["total"]      # => 149.99 (BigDecimal)
result.first["created_at"] # => 2025-03-05 14:22:01 +0000 (Time)

conn.close

Three things worth noting immediately.

Always use exec_params for user input. The $1, $2 placeholders send parameters separately from the SQL text, which makes SQL injection structurally impossible — not just escaped, but impossible at the protocol level. The pg gem uses PostgreSQL's extended query protocol for this, the same mechanism that prepared statements use. I encounter applications that still build SQL strings with interpolation. I find this practice — in 2026 — to be the engineering equivalent of storing the household silver in an unlocked shed.

Enable type mapping early. By default, the pg gem returns every value as a string. Setting type_map_for_results to PG::BasicTypeMapForResults gives you native Ruby types — integers, floats, Time objects, booleans — without manual casting. The overhead is negligible and the ergonomics are vastly improved.

Type mapping in detail
require "pg"

conn = PG.connect(dbname: "myapp")

# Without type mapping — everything is a string:
row = conn.exec("SELECT 42 AS num, true AS flag, now() AS ts").first
row["num"]   # => "42"    (String)
row["flag"]  # => "t"     (String)
row["ts"]    # => "2026-03-15 14:22:01.123456+00" (String)

# BasicTypeMapForResults — automatic type coercion:
conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
row = conn.exec("SELECT 42 AS num, true AS flag, now() AS ts").first
row["num"]   # => 42      (Integer)
row["flag"]  # => true    (TrueClass)
row["ts"]    # => 2026-03-15 14:22:01 +0000 (Time)

# BasicTypeMapForQueries — auto-encode Ruby types to PG params:
conn.type_map_for_queries = PG::BasicTypeMapForQueries.new(conn)
# Now you can pass Ruby types directly without manual casting:
conn.exec_params("INSERT INTO events (data, active, ts) VALUES ($1, $2, $3)",
  ['{"action":"click"}', true, Time.now])

# Custom type mapping for specific columns:
# If you need BigDecimal for monetary values instead of Float:
require "bigdecimal"
tm = PG::BasicTypeMapForResults.new(conn)
conn.type_map_for_results = tm
# numeric/decimal columns automatically map to BigDecimal
row = conn.exec("SELECT 149.99::numeric AS price").first
row["price"]  # => 0.14999e3 (BigDecimal — exact, not floating-point)

conn.close

Close your connections. Every PG.connect allocates a PostgreSQL backend process. Forgetting to close means that process lingers until PostgreSQL's idle timeout kills it, consuming one of your max_connections slots the entire time. In scripts that run and exit, Ruby's garbage collector will eventually close the connection. In long-running processes, "eventually" is not good enough. I find this behavior — creating connections without a mechanism for closing them — to be the infrastructural equivalent of opening every window in the house during winter and then complaining about the heating bill.
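The pg gem helps here already: PG.connect accepts a block and finishes the connection when the block exits, even on an exception. For code paths where you manage the lifetime yourself, the same guarantee is a one-method sketch. with_connection and its connect parameter are illustrative names, not pg API; conn is assumed only to respond to #close.

```ruby
# A minimal close-guard, in the spirit of File.open's block form.
# `connect` is any zero-argument callable returning an object that
# responds to #close — in real use, -> { PG.connect(dbname: "myapp") }.
def with_connection(connect)
  conn = connect.call
  yield conn
ensure
  # Runs whether the block returns or raises — the PostgreSQL backend
  # process is released either way.
  conn&.close
end
```

In real code you would close over your PG.connect parameters; the ensure clause is the part that matters.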

Connection parameters and production configuration

The pg gem accepts connection parameters as a hash, a URI string, or via libpq environment variables. In production, the environment variable approach (PGHOST, PGDATABASE, etc.) is often cleanest — it keeps credentials out of code and works identically across Ruby, Python, Go, and any other language that uses libpq or its conventions.

Connection parameters
require "pg"

# Connection parameters — every way to specify them
# 1. Hash (most common in Ruby code):
conn = PG.connect(
  host:     "db.example.com",
  port:     5432,
  dbname:   "myapp",
  user:     "app_user",
  password: ENV["PGPASSWORD"],
  connect_timeout: 5,
  sslmode:  "require",
)

# 2. Connection string (URI format):
conn = PG.connect("postgresql://app_user:secret@db.example.com:5432/myapp?sslmode=require")

# 3. Environment variables (libpq convention — no args needed):
#    PGHOST=db.example.com PGPORT=5432 PGDATABASE=myapp PGUSER=app_user
conn = PG.connect  # libpq reads PGHOST, PGPORT, etc. automatically

# Production connection with all the trimmings:
conn = PG.connect(
  host:              ENV["PGHOST"],
  port:              ENV.fetch("PGPORT", 5432),
  dbname:            ENV["PGDATABASE"],
  user:              ENV["PGUSER"],
  password:          ENV["PGPASSWORD"],
  connect_timeout:   5,
  sslmode:           "verify-full",
  sslrootcert:       "/etc/ssl/certs/rds-combined-ca-bundle.pem",
  application_name:  "order-processor",
  options:           "-c statement_timeout=30000",
)

# application_name appears in pg_stat_activity —
# invaluable when diagnosing "who is holding that lock?"
conn.exec("SELECT application_name FROM pg_stat_activity WHERE pid = pg_backend_pid()").first
# => {"application_name" => "order-processor"}

conn.close

The application_name parameter deserves particular attention. It appears in pg_stat_activity, which means when you are at 3am trying to determine which process is holding a lock on the orders table, you will see order-processor instead of an anonymous connection. Name your connections. Your future self will thank you.

Prepared statements: the 15-30% you are leaving on the table

When PostgreSQL receives a query, it goes through four stages: parse, rewrite, plan/optimize, and execute. For a query you run once, all four stages are necessary. For a query you run ten thousand times with different parameters, the first three stages produce the same result every time. Prepared statements let you skip them.

Preparing and executing statements
require "pg"

conn = PG.connect(dbname: "myapp")
conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)

# Prepare once — PostgreSQL parses, plans, and caches the statement
conn.prepare("find_user", "SELECT id, email, tier FROM users WHERE id = $1")
conn.prepare("user_orders",
  "SELECT id, total, status FROM orders
   WHERE customer_id = $1
   ORDER BY created_at DESC LIMIT $2")

# Execute many times — skips parse and plan phases
# 15-30% faster for repeated OLTP queries
user = conn.exec_prepared("find_user", [42]).first
orders = conn.exec_prepared("user_orders", [42, 10])

# Prepared statements are bound to the connection.
# If you are pooling connections, each connection needs
# its own PREPARE calls — or use DEALLOCATE ALL on checkout.

# Check what is prepared on the current connection:
conn.exec("SELECT name, statement FROM pg_prepared_statements").each do |row|
  puts "#{row['name']}: #{row['statement']}"
end

The performance gain depends on query complexity. For a simple indexed lookup that executes in under a millisecond, prepared statements save roughly 0.1ms — a meaningful percentage improvement when multiplied across thousands of requests per minute. For a query with multiple joins where planning takes 2ms out of a 5ms total, the savings approach 40%.

Prepared vs unprepared benchmarks
# Benchmark: prepared vs unprepared on a simple indexed lookup
# Table: users (1M rows), query: SELECT * FROM users WHERE id = $1
# PostgreSQL 16, Ruby 3.3, local socket connection
#
# Method                  Median latency    Throughput (qps)
# ─────────────────────────────────────────────────────────────
# exec_params (unprepared)    0.38ms          2,400
# exec_prepared               0.29ms          3,200
# exec (string interpolation) 0.36ms          2,550  ← DON'T DO THIS
#
# Improvement: ~24% lower latency, ~33% higher throughput.
#
# The gap widens with complex queries:
# 3-table JOIN with WHERE clause:
# exec_params (unprepared)    1.82ms            520
# exec_prepared               1.14ms            840
# Improvement: ~37% lower latency, ~62% higher throughput.
#
# For queries that already take 50ms+, the planning overhead
# is < 1% of total time. Prepare your hot path, not everything.

For queries that already take 200ms because they scan millions of rows, the planning overhead is noise. Preparing them changes nothing the user can feel. The rule is straightforward: prepare your hot-path queries. The ones that execute thousands of times per minute with different parameters. Leave infrequent queries unprepared — the overhead of managing named statements is not worth it for a query that runs once per hour.
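Managing named statements by hand gets tedious once the hot path grows past a handful of queries. One way to keep the bookkeeping honest is a prepare-on-first-use cache — a sketch follows, under stated assumptions: StatementCache is an illustrative name, not part of the pg gem, and conn is assumed only to respond to #prepare and #exec_prepared as PG::Connection does.

```ruby
# Sketch: prepare-on-first-use, so hot-path queries pay the PREPARE
# cost once per connection and infrequent queries stay unprepared.
class StatementCache
  def initialize(conn)
    @conn = conn
    @prepared = {}   # name => SQL already sent via PREPARE
  end

  # The first call with a given name prepares the statement;
  # every later call skips straight to exec_prepared.
  def exec(name, sql, params = [])
    unless @prepared.key?(name)
      @conn.prepare(name, sql)
      @prepared[name] = sql
    end
    @conn.exec_prepared(name, params)
  end
end
```

Wrap each pooled connection in one of these and the "which statements did I prepare on this connection?" question answers itself.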

Generic plans and the plan cache

One important constraint: prepared statements are bound to the connection that created them. If you are pooling connections (and you should be), each connection in the pool needs its own prepare calls. The cleanest pattern is to prepare statements in the pool's connection factory, as shown in the pooling section below.

But there is a subtlety that most tutorials skip. PostgreSQL does not immediately use a generic (parameter-agnostic) plan for prepared statements. For the first five executions, it generates a custom plan using the actual parameter values for cost estimation. From the sixth execution onward, if the generic plan's estimated cost is no worse than the average cost of those custom plans, PostgreSQL switches to the generic plan permanently.

Generic vs custom plan behavior
require "pg"

conn = PG.connect(dbname: "myapp")

# PostgreSQL 12+ chooses between custom and generic plans.
# After 5 executions, if the generic plan is consistently
# as good as or better than custom plans, PostgreSQL switches
# to the generic plan — skipping the planner entirely.

conn.prepare("status_count",
  "SELECT status, count(*) FROM orders
   WHERE created_at > $1 GROUP BY status")

# First 5 calls: PostgreSQL generates custom plans,
# using the actual parameter value for cost estimation.
5.times do
  conn.exec_prepared("status_count", [Time.now - 86400])
end

# From call 6 onward: if the generic plan cost is within
# tolerance of the average custom plan cost, PostgreSQL
# uses the generic plan. No more planning at all.

# You can check which mode is active:
result = conn.exec("SELECT * FROM pg_prepared_statements WHERE name = 'status_count'")
puts result.first["generic_plans"]  # number of times generic plan was used
puts result.first["custom_plans"]   # number of times custom plan was used

# Force generic plan (plan_cache_mode, PostgreSQL 12+):
conn.exec("SET plan_cache_mode = 'force_generic_plan'")
conn.prepare("forced_generic",
  "SELECT * FROM orders WHERE id = $1")

# Force custom plan (useful when parameter distribution is highly skewed):
conn.exec("SET plan_cache_mode = 'force_custom_plan'")

conn.close

This matters when your data distribution is highly skewed. If 90% of queries pass status = 'active' and 10% pass status = 'archived', the optimal query plan for each value might be different — an index scan for the rare value, a sequential scan for the common one. The generic plan cannot adapt. PostgreSQL 12 introduced plan_cache_mode to give you explicit control, which is welcome when you understand the trade-off and unwelcome when you do not.

I should note: this is one area where ActiveRecord's approach of using exec_params everywhere (parameterized but not prepared) avoids a genuine footgun. Sometimes the ORM's conservative defaults exist for a reason. The Waiter acknowledges this without enthusiasm.

Ruby 3.0+ Fiber.scheduler: async without the ceremony

Ruby 3.0 introduced Fiber.scheduler, a hook that allows gems to perform non-blocking I/O without threads. The pg gem has supported it since version 1.3. When a Fiber.scheduler is installed, calls like conn.exec yield the current Fiber during I/O instead of blocking the thread. Other Fibers can run while PostgreSQL processes the query.

This is not theoretical. It works today, it is stable, and it eliminates the choice between "blocking threads" and "callback spaghetti" that plagued Ruby's async story for a decade.

Concurrent queries with Fiber.scheduler
# Ruby 3.0+ Fiber.scheduler integration
# The pg gem (>= 1.3) supports non-blocking I/O via Fiber.scheduler.
# This means async queries without threads — cooperative concurrency.

require "pg"
require "async"       # async gem provides a Fiber.scheduler
require "async/barrier"

# Async block installs a Fiber.scheduler automatically
Async do |task|
  conn1 = PG.connect(dbname: "myapp")
  conn2 = PG.connect(dbname: "myapp")

  barrier = Async::Barrier.new

  # These two queries run concurrently on separate connections.
  # The Fiber.scheduler handles the I/O multiplexing —
  # when conn1 is waiting for PostgreSQL, conn2 can send/receive.
  barrier.async do
    result = conn1.exec("SELECT count(*) FROM orders WHERE status = 'pending'")
    puts "Pending: #{result.first['count']}"
  end

  barrier.async do
    result = conn2.exec("SELECT sum(total) FROM orders WHERE status = 'shipped'")
    puts "Revenue: #{result.first['sum']}"
  end

  barrier.wait
  conn1.close
  conn2.close
end

# Wall clock time: ~max(query1, query2) instead of query1 + query2.
# No threads. No callbacks. No event loop boilerplate.
# Just Fibers yielding during I/O.

The async gem by Samuel Williams provides the Fiber.scheduler implementation. When you wrap code in an Async block, every I/O operation — network reads, socket writes, DNS lookups — becomes Fiber-aware. The pg gem detects the scheduler and uses non-blocking mode automatically.

What makes this remarkable is the absence of syntax changes. Your code looks synchronous. There are no async/await keywords, no promises, no callbacks. The Fiber scheduler handles the concurrency at the I/O layer, invisible to your application logic. If you have written Python's asyncio or JavaScript's async/await and wished it were simpler, this is what simpler looks like.

How it works under the hood

Fiber.scheduler internals
# How the pg gem integrates with Fiber.scheduler:
#
# 1. When you call conn.exec(), the pg gem sends the query
# 2. Instead of blocking the thread while waiting for the response,
#    it calls Fiber.scheduler.io_wait on the socket file descriptor
# 3. The scheduler suspends the current Fiber and resumes another
#    Fiber that has data ready to process
# 4. When PostgreSQL's response arrives, the scheduler resumes
#    the original Fiber with the result
#
# This is transparent — your code looks synchronous.
# No async/await keywords. No promises. No callbacks.
# The pg gem detects the scheduler automatically.

# Verify async mode is active:
require "pg"
require "async"

Async do
  conn = PG.connect(dbname: "myapp")

  # setnonblocking enables non-blocking I/O on the connection
  conn.setnonblocking(true)

  # send_query + get_result is the low-level async API
  conn.send_query("SELECT pg_sleep(2)")

  # In a Fiber.scheduler context, get_result yields the Fiber
  # instead of blocking the thread. Other Fibers run while we wait.
  result = conn.get_result
  conn.get_result  # drain the nil terminator

  puts result.first
  conn.close
end

When the pg gem calls socket.wait_readable in a Fiber.scheduler context, the scheduler adds the socket's file descriptor to an event loop (typically io_uring on Linux or kqueue on macOS) and suspends the Fiber. When data arrives on the socket, the event loop wakes the scheduler, which resumes the Fiber. From the pg gem's perspective, the read call blocked and returned data. From the scheduler's perspective, the thread ran other Fibers while waiting.

One thread. Many Fibers. Each Fiber appears to make blocking calls. None of them actually block. Quite.

Performance: Fibers vs threads vs sequential

Concurrency benchmarks
# Benchmark: sequential vs Fiber-concurrent queries
# 10 queries, each taking ~50ms (pg_sleep(0.05))
# Ruby 3.3, async gem 2.8, pg gem 1.5
#
# Mode                     Wall clock    Queries/sec
# ───────────────────────────────────────────────────
# Sequential (1 conn)        512ms          19.5
# Fiber concurrent (10 conn) 54ms          185.2
# Threaded (10 threads)      56ms          178.6
#
# Near-identical wall clock with Fibers and threads.
# But Fibers use ~2KB of stack each vs ~1MB per thread.
# At 1,000 concurrent operations:
#   Fibers: ~2MB total memory
#   Threads: ~1GB total memory + GVL contention
#
# For I/O-bound workloads, Fibers win on memory.
# For CPU-bound workloads, threads win on throughput
# (Ractors or multi-process for true parallelism).

The numbers tell a clear story: for I/O-bound database work, Fibers match thread throughput at a fraction of the memory cost. At 1,000 concurrent operations, the difference is not marginal — it is 2MB versus 1GB. This matters on application servers running multiple worker processes, where memory is the bottleneck long before CPU.

When Fiber.scheduler does not help — an honest accounting

I should be forthcoming about the limitations, because pretending they do not exist would be a disservice to you and an embarrassment to me.

Fiber.scheduler limitations
# When Fiber.scheduler does NOT help:
#
# 1. CPU-bound work between queries
#    Fibers are cooperative — they only yield at I/O boundaries.
#    If you parse a 50MB JSON blob between queries, all other
#    Fibers on that thread are blocked for the duration.
#
# 2. Gems that use C extensions with blocking I/O
#    The C extension must explicitly support Fiber.scheduler.
#    The pg gem does. Most HTTP client gems do. But if a gem
#    calls read() in C without going through rb_io_wait,
#    it blocks the entire thread.
#
# 3. Connection-level state conflicts
#    Each Fiber needs its own PG::Connection. You cannot share
#    a single connection across Fibers — a connection can only
#    process one query at a time. The connection_pool gem
#    solves this, as shown below.
#
# 4. PgBouncer in transaction mode
#    Async queries from multiple Fibers hitting PgBouncer in
#    transaction mode work fine — each checkout gets its own
#    backend. But LISTEN/NOTIFY and prepared statements need
#    session mode. More on this in the PgBouncer section.

Fiber-based concurrency is cooperative, not preemptive. A Fiber that performs a CPU-intensive computation without yielding will block all other Fibers on that thread. Database I/O yields naturally — waiting for PostgreSQL is waiting for a socket. JSON parsing, encryption, image processing — these do not yield. If your pipeline mixes heavy computation with database queries, threads may serve you better.

There is also the question of ecosystem maturity. The async gem and Fiber.scheduler are stable and well-tested. But not every gem in your Gemfile supports Fiber.scheduler. A gem that makes a blocking HTTP call in a C extension will block the entire thread, negating the benefit of Fibers for all other operations on that thread. Before committing to a Fiber-based architecture, audit your dependencies. The pg gem cooperates. The async-http gem cooperates. Your custom metrics reporter from 2019 probably does not.

When async pg IS worth the complexity

I have been cautious to this point. Allow me now to be direct about where Fiber-based async genuinely excels.

Fan-out aggregation. An API endpoint that queries five tables and combines the results. Sequential: 5 x 50ms = 250ms response time. With five Fibers: 50ms response time. The user does not know or care that you used Fibers. They care that the page loaded in 50ms instead of 250ms.

Data pipelines with external API calls. An ETL process that reads from PostgreSQL, enriches via an HTTP API, and writes back. The HTTP calls take 200ms each. With Fibers, you can have 50 enrichment calls in flight while other Fibers read and write to PostgreSQL. The pipeline throughput multiplies without threading complexity.

High-concurrency Sinatra/Roda applications on Falcon. The Falcon web server uses Fiber.scheduler natively. A single Falcon process can handle hundreds of concurrent requests, each with its own database connection from a pool, using a single thread. This is the architecture that makes Ruby competitive with Node.js on connection count without the callback-driven development model.

Connection pooling without ActiveRecord

If your application is multi-threaded — a Puma web server, Sidekiq workers, a parallel data pipeline — every thread needs its own database connection. Creating a fresh connection per operation is ruinous: each PG.connect involves a TCP handshake, TLS negotiation, and PostgreSQL process creation. That is 50-200ms of latency before a single query runs.

The connection_pool gem by Mike Perham (also the author of Sidekiq) solves this cleanly. It maintains a fixed-size pool of reusable connections, checks them out to threads on demand, and returns them when the block exits.

Thread-safe connection pool with connection_pool gem
require "pg"
require "connection_pool"  # gem install connection_pool

# Build a pool of PG connections
POOL = ConnectionPool.new(size: 10, timeout: 5) do
  conn = PG.connect(
    dbname: "myapp",
    host: "localhost",
    port: 5432,
    connect_timeout: 3,
  )
  conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)

  # Prepare hot-path statements on each fresh connection
  conn.prepare("find_user",
    "SELECT id, email, tier FROM users WHERE id = $1")
  conn.prepare("recent_orders",
    "SELECT id, total, created_at FROM orders
     WHERE customer_id = $1
     ORDER BY created_at DESC LIMIT $2")

  conn
end

# Use a connection — automatically checked out and returned
POOL.with do |conn|
  user = conn.exec_prepared("find_user", [42]).first
  orders = conn.exec_prepared("recent_orders", [42, 20])
  # conn is returned to the pool when the block exits
end

# Thread-safe. Works with Puma, Sidekiq, or any threaded runtime.
# The pool blocks (up to timeout) if all connections are in use.

# Pool sizing rule of thumb:
# - Web server: connections = Puma threads per worker
# - Sidekiq: connections = concurrency setting
# - CLI tool: connections = thread count (often just 1)

The pool factory block — the block passed to ConnectionPool.new — runs once per connection. This is where you set up type mapping and prepare statements. Every connection in the pool gets the same configuration, and prepared statements are ready to use immediately on checkout.

Pool sizing: the formula and the exceptions

Pool sizing deserves more than a rule of thumb. It deserves a formula, an explanation, and a table.

Runtime          Formula                 Example                                Total connections
──────────────────────────────────────────────────────────────────────────────────────────────────
Puma (web)       threads_per_worker      5 threads = 5 connections per worker   4 workers x 5 = 20
Sidekiq          concurrency             concurrency: 25 = 25 connections       2 processes x 25 = 50
Falcon (async)   1 per Fiber in flight   100 Fibers, pool of 10                 10 (Fibers share pool)
CLI / script     1                       Single-threaded = 1 connection         1
Rake task        thread count            Parallel.map(4) = 4 connections        4

The critical constraint: total connections across all processes must stay under PostgreSQL's max_connections, which defaults to 100 and should rarely exceed a few hundred. Four Puma workers at 5 connections each, plus two Sidekiq processes at 25 each, plus a four-thread Rake task — that is 74 connections before anyone opens a rails console. This arithmetic is why external connection poolers like PgBouncer exist, and why they appear later in this article.
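That budget arithmetic is worth automating as a startup assertion rather than redoing on a whiteboard during an incident. A sketch, using the deployment numbers from the worked example above — the figures and the headroom threshold are illustrative, and of course yours to substitute:

```ruby
# Sketch: fail fast at boot if the fleet's connection budget
# crowds PostgreSQL's max_connections.
MAX_CONNECTIONS = 100   # PostgreSQL default; check SHOW max_connections

budget = {
  puma:    4 * 5,    # 4 workers x 5 threads each
  sidekiq: 2 * 25,   # 2 processes x concurrency 25
  rake:    4,        # one four-thread parallel Rake task
}

total    = budget.values.sum
headroom = MAX_CONNECTIONS - total

# Leave slack for consoles, migrations, and monitoring agents.
raise "connection budget exceeded: #{total}/#{MAX_CONNECTIONS}" if headroom < 10
```

Run it wherever your processes boot; a raised exception at deploy time is considerably cheaper than a FATAL: too many clients already at 3am.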

Pool sizing formulas, overflow strategies, and the relationship between pool size and query latency are all addressed in the connection pooling guide. It goes into the detail this topic deserves.

Health checks and reconnection

The connection_pool gem does not include health checks. It trusts that the connection you created is still alive when you check it out. In development, this trust is well-placed. In production, where connections drop during PostgreSQL maintenance windows, network partitions, and the occasional cloud provider "networking event," this trust is misplaced.

Connection pool with health checks
require "pg"
require "connection_pool"

# Production-grade pool with health checks and reconnection
POOL = ConnectionPool.new(size: 10, timeout: 5) do
  conn = PG.connect(
    host:     ENV["PGHOST"],
    dbname:   ENV["PGDATABASE"],
    user:     ENV["PGUSER"],
    password: ENV["PGPASSWORD"],
  )
  conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
  conn
end

# Wrapper that validates connections before use
def with_pg(&block)
  POOL.with do |conn|
    # Quick health check — catches stale connections
    begin
      conn.exec("SELECT 1")
    rescue PG::ConnectionBad, PG::UnableToSend
      conn.reset  # reconnect using original parameters
      conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
    end
    block.call(conn)
  end
end

# Usage:
with_pg do |conn|
  conn.exec_params("SELECT * FROM users WHERE id = $1", [42])
end

# The health check adds ~0.1ms per checkout.
# Worth it in production where connections drop overnight,
# PostgreSQL restarts during maintenance windows,
# and network blips happen at 3am.

The conn.reset method deserves special mention. When a connection drops, reset attempts to re-establish it using the original connection parameters. This is cheaper than creating a new PG::Connection because it reuses the existing Ruby object and its configuration. The health check adds approximately 0.1ms per checkout — a cost I consider entirely reasonable in exchange for not waking up to a production incident caused by stale connections at 3am.

Fibers and pools together: cooperative concurrency at scale

The connection_pool gem and Fiber.scheduler compose elegantly. When a Fiber tries to check out a connection and none are available, it yields — allowing other Fibers to make progress — instead of blocking the thread. When a query is in flight, the Fiber yields during I/O. The result is many concurrent operations sharing a small number of connections without any thread overhead.

Fibers + connection pool
# connection_pool + Fiber.scheduler — async pool checkout
require "pg"
require "connection_pool"
require "async"
require "async/barrier"

POOL = ConnectionPool.new(size: 5, timeout: 5) do
  conn = PG.connect(dbname: "myapp")
  conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
  conn
end

Async do
  barrier = Async::Barrier.new

  # Run 20 queries concurrently with only 5 connections.
  # Fibers yield while waiting for both pool checkout AND query I/O.
  20.times do |i|
    barrier.async do
      POOL.with do |conn|
        result = conn.exec_params(
          "SELECT count(*) FROM events WHERE category = $1",
          ["category_#{i}"]
        )
        puts "Category #{i}: #{result.first['count']}"
      end
    end
  end

  barrier.wait
end

# 20 concurrent Fibers, 5 connections, 0 threads beyond the main one.
# The Fiber.scheduler handles both pool contention and I/O waiting.

Twenty concurrent operations. Five database connections. One thread. This pattern is particularly effective for data pipelines that need to query many tables or partitions concurrently, fan-out workers that dispatch queries in parallel, and HTTP endpoints that aggregate data from multiple sources before responding.

Fiber pool performance
# Benchmark: Fibers + pool vs threads + pool
# Workload: 200 queries, each ~5ms, pool size: 10
# Ruby 3.3, pg 1.5, connection_pool 2.4
#
# Mode                      Wall clock  Memory   Context switches
# ──────────────────────────────────────────────────────────────────
# Sequential                   1,024ms   42MB     N/A
# 10 threads + pool             108ms    78MB     ~2,400
# 10 Fibers + pool              106ms    44MB     ~200
# 50 Fibers + pool              104ms    45MB     ~600
# 200 Fibers + pool             103ms    46MB     ~800
#
# Fibers match thread throughput with ~40% less memory.
# At higher concurrency (200 Fibers), memory stays flat.
# With threads, 200 threads would consume ~240MB.
#
# The sweet spot: pool_size = PostgreSQL backends you can afford,
# Fiber count = level of concurrency your workload needs.

The benchmark reveals something instructive: 50 Fibers with a pool of 10 performs nearly identically to 10 Fibers with a pool of 10. The pool is the bottleneck, not the Fiber count. Additional Fibers simply queue for connections, yielding while they wait. This means you can scale the number of concurrent operations freely without worrying about memory — the pool size determines your PostgreSQL connection budget, and Fibers cost almost nothing while waiting.

The sweet spot: set pool size to the number of PostgreSQL backends you can afford (considering max_connections and other consumers), and set Fiber count to whatever your workload needs. The pool manages the contention. The Fibers manage the concurrency. They were made for each other.

PgBouncer and the pg gem: what works, what breaks, and how to navigate

PgBouncer is the most widely deployed PostgreSQL connection pooler. It sits between your application and PostgreSQL, multiplexing many application connections across a smaller number of PostgreSQL backend connections. For pg gem users, PgBouncer is both useful and treacherous — useful because it solves the max_connections problem, treacherous because its pooling modes interact with pg gem features in ways that are not always obvious.

pg gem through PgBouncer
# Connecting the pg gem through PgBouncer
require "pg"

# PgBouncer listens on port 6432 (convention), proxies to PostgreSQL.
# From the pg gem's perspective, PgBouncer IS PostgreSQL.
conn = PG.connect(
  host: "localhost",
  port: 6432,          # PgBouncer's port, not PostgreSQL's 5432
  dbname: "myapp",
  user: "app_user",
  password: "secret",
)

# In transaction mode (PgBouncer default), the backend connection
# is assigned when a transaction starts and released when it ends.
# Between transactions, your pg gem connection has no backend.

# This works fine:
conn.transaction do |tx|
  tx.exec_params("INSERT INTO users (email) VALUES ($1)", ["ada@example.com"])
  tx.exec("SELECT currval('users_id_seq')")
  # Same backend for both queries — transaction mode guarantees this
end

# This breaks in transaction mode:
conn.exec("SET statement_timeout = '5s'")
# Connection is released back to PgBouncer's pool...
conn.exec("SELECT * FROM big_table")
# ...and might be assigned a DIFFERENT backend.
# The SET is gone. statement_timeout is back to default.

conn.close

In PgBouncer's transaction mode — the mode most production deployments use — the backend connection is assigned when a transaction begins and released when it commits. Between transactions, your pg gem connection has no backend. This means any connection-level state set outside a transaction — SET commands, session variables, advisory locks — disappears when the backend is reassigned.

This is the single most common source of confusion for pg gem users behind PgBouncer. The connection looks alive. conn.status returns CONNECTION_OK. But the PostgreSQL backend is different from the one you configured thirty seconds ago.
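
When you do need connection-level settings behind transaction-mode PgBouncer, scope them to the transaction instead. A sketch under that assumption — with_timeout is a hypothetical helper, not part of the pg gem:

```ruby
# Hypothetical helper — not a pg gem API. SET LOCAL (equivalently,
# set_config(name, value, true)) scopes a setting to the current
# transaction, so it travels to whichever backend PgBouncer assigns
# and reverts automatically at COMMIT or ROLLBACK.
def with_timeout(conn, timeout)
  conn.transaction do |tx|
    # set_config(..., true) behaves like SET LOCAL, but is parameterizable
    tx.exec_params("SELECT set_config('statement_timeout', $1, true)", [timeout])
    yield tx
  end
end

# Usage (sketch):
#   with_timeout(conn, "5s") { |tx| tx.exec("SELECT * FROM big_table") }
```

Because the setting dies with the transaction, there is no state left behind to leak onto the wrong backend.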

Prepared statements through PgBouncer

PgBouncer compatibility
# Prepared statements + PgBouncer: the compatibility matrix
#
# PgBouncer mode     Prepared statements   LISTEN/NOTIFY   SET commands
# ──────────────────────────────────────────────────────────────────────
# session            Yes                   Yes             Yes
# transaction        PgBouncer 1.21+*      No              No
# statement          No                    No              No
#
# * PgBouncer 1.21 added protocol-level prepared statement support.
#   With max_prepared_statements > 0, PgBouncer tracks PARSE/BIND/EXECUTE
#   messages and re-prepares statements on new backends automatically.

# For PgBouncer < 1.21 in transaction mode, use DEALLOCATE ALL:
require "pg"
require "connection_pool"

POOL = ConnectionPool.new(size: 10, timeout: 5) do
  conn = PG.connect(host: "localhost", port: 6432, dbname: "myapp")
  conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
  # Do NOT prepare statements here — the backend will change
  conn
end

# Option 1: Use exec_params instead of prepared statements.
# It uses the extended query protocol (parameterized, safe)
# but does not create server-side prepared statements.
POOL.with do |conn|
  conn.exec_params("SELECT * FROM users WHERE id = $1", [42])
end

# Option 2: Prepare inside each transaction, deallocate after.
POOL.with do |conn|
  conn.transaction do |tx|
    tx.exec("DEALLOCATE ALL")  # clean slate
    tx.prepare("find_user", "SELECT * FROM users WHERE id = $1")
    result = tx.exec_prepared("find_user", [42])
    # Statement lives only for this transaction
  end
end

If you are running PgBouncer 1.21 or later, prepared statements work in transaction mode with the max_prepared_statements setting. PgBouncer tracks the PARSE, BIND, and EXECUTE messages in the wire protocol and automatically re-prepares statements when it assigns a new backend. This is a significant quality-of-life improvement that arrived in late 2023.

If you are running an older PgBouncer, the pragmatic choice is to use exec_params everywhere. It uses the extended query protocol — parameters are sent separately from SQL, so injection is impossible — but it does not create server-side prepared statements. You lose the 15-30% planning overhead savings, but you gain compatibility with any PgBouncer version in any pooling mode.

I recognize this is a frustrating trade-off. The performance benefit of prepared statements is real. The operational simplicity of PgBouncer in transaction mode is also real. If you cannot upgrade PgBouncer, exec_params is the correct choice. If you can upgrade, upgrade. The prepared statement support in 1.21 is precisely the feature that makes this trade-off disappear.

"PgBouncer has been in production since 2007. In infrastructure years, this makes it approximately ancient — and in infrastructure, ancient is a compliment."

— from You Don't Need Redis, Chapter 17: Sorting Out the Connection Poolers

COPY protocol: when INSERT is not fast enough

The COPY protocol is PostgreSQL's bulk data interface. Where INSERT processes one row at a time through the full SQL parser, planner, and executor pipeline, COPY streams raw data directly into the table's storage layer. The performance difference is not incremental. It is categorical.

COPY IN and COPY OUT
require "pg"

conn = PG.connect(dbname: "myapp")

# === COPY IN: bulk loading data into PostgreSQL ===
# 10-50x faster than individual INSERTs for large datasets.

# From a CSV file:
conn.exec("COPY events (user_id, event_type, payload, created_at)
           FROM STDIN WITH (FORMAT csv, HEADER false)")

File.open("events.csv", "r") do |f|
  while (line = f.gets)
    conn.put_copy_data(line)
  end
end
conn.put_copy_end
result = conn.get_result
puts "Loaded #{result.cmd_tuples} rows"

# From Ruby data — no file needed:
conn.exec("COPY events (user_id, event_type, payload, created_at)
           FROM STDIN WITH (FORMAT csv)")

require "csv"  # the payloads contain '"' — a naive join(",") would corrupt them

events = [
  [1, "page_view", '{"url":"/pricing"}', Time.now.utc],
  [2, "signup", '{"plan":"pro"}', Time.now.utc],
  [3, "purchase", '{"amount":99.00}', Time.now.utc],
]

events.each do |row|
  conn.put_copy_data(CSV.generate_line(row))
end
conn.put_copy_end
conn.get_result  # consume the COPY result before reusing the connection

# === COPY OUT: bulk exporting from PostgreSQL ===
conn.exec("COPY (SELECT id, email, created_at FROM users WHERE tier = 'enterprise')
           TO STDOUT WITH (FORMAT csv, HEADER true)")

File.open("enterprise_users.csv", "w") do |f|
  while (line = conn.get_copy_data)
    f.write(line)
  end
end

conn.close
Benchmark: 100,000 rows
# Benchmark: loading 100,000 rows into PostgreSQL
# Table: events (user_id int, event_type text, payload jsonb, created_at timestamptz)
# PostgreSQL 16, Ruby 3.3, local socket, minimal indexes
#
# Method                    Time        Rows/sec     Round trips
# ──────────────────────────────────────────────────────────────────
# Individual INSERTs        48.2s       2,075        100,000
# Batch INSERT (100/batch)  4.8s        20,833       1,000
# Batch INSERT (1000/batch) 3.1s        32,258       100
# COPY protocol             0.9s        111,111      1
#
# COPY is not an optimization. It is a different protocol.
# PostgreSQL processes COPY data in a streaming pipeline —
# no per-row SQL parsing, no per-row transaction overhead,
# no per-row network round trip. The data flows.
#
# With 3 indexes on the table:
# Individual INSERTs        72.4s       1,381
# COPY protocol             2.1s        47,619
# (Index maintenance is the cost — not the insert path)

111,111 rows per second with COPY versus 2,075 with individual INSERTs. A 53x improvement. Even batched INSERTs, which are themselves a significant optimization over individual ones, manage only 32,258 rows per second — still 3.4x slower than COPY. The round-trip column tells the story: individual INSERTs require 100,000 network round trips. COPY requires one.

The pg gem exposes COPY through put_copy_data and get_copy_data, which map directly to libpq's PQputCopyData and PQgetCopyData functions. The interface is low-level by design. You are responsible for formatting the data (CSV or tab-delimited), handling encoding, and calling put_copy_end to signal completion. In exchange, you get the fastest possible path for moving data in and out of PostgreSQL.
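
A note on that formatting responsibility: hand-joining fields with "," corrupts any value containing a comma, quote, or newline. The stdlib csv library (or the gem's own PG::TextEncoder::CopyRow, for the tab-delimited text format) does the quoting for you:

```ruby
require "csv"

# Values that would break a naive join(",") — a comma inside JSON,
# a quote and a newline inside free text.
rows = [
  [1, "page_view", '{"url":"/pricing?a=1,b=2"}'],
  [2, "note", %Q(said "hello"\nand left)],
]

# CSV.generate_line quotes and escapes each field correctly for
# COPY ... FROM STDIN WITH (FORMAT csv).
lines = rows.map { |row| CSV.generate_line(row) }
lines.each { |l| print l }
# Each line is then fed to conn.put_copy_data(line).
```

The few microseconds the csv library costs per row are invisible next to the corruption a stray quote causes in a 100,000-row load.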

Binary COPY: faster still for numeric data

Binary COPY format
require "pg"

conn = PG.connect(dbname: "myapp")

# Binary COPY — even faster for numeric-heavy data.
# Skips text-to-internal conversion on the PostgreSQL side.
# Binary format needs three things the text format does not:
# a file header, a per-column binary type map, and a trailer.
# Requires pg >= 1.5 for the binary CopyRow and timestamp encoders.

encoder = PG::BinaryEncoder::CopyRow.new(
  type_map: PG::TypeMapByColumn.new([
    PG::BinaryEncoder::Int4.new,         # sensor_id
    PG::BinaryEncoder::Float8.new,       # value
    PG::BinaryEncoder::TimestampUtc.new, # recorded_at
  ])
)

conn.exec("COPY measurements (sensor_id, value, recorded_at)
           FROM STDIN WITH (FORMAT binary)")

# 11-byte signature, 4-byte flags, 4-byte header-extension length
conn.put_copy_data("PGCOPY\n\xFF\r\n\0" + [0, 0].pack("NN"))

100_000.times do |i|
  conn.put_copy_data(
    encoder.encode([i % 1000, rand * 100.0, Time.now.utc]),
  )
end

conn.put_copy_data([-1].pack("s>"))  # file trailer: int16 -1
conn.put_copy_end
result = conn.get_result
puts "Loaded #{result.cmd_tuples} rows (binary)"

# Binary COPY benchmark vs text COPY (100K numeric rows):
#   Text COPY:   0.9s (111,111 rows/sec)
#   Binary COPY: 0.6s (166,667 rows/sec)
# ~50% higher throughput for numeric data. Less benefit for text-heavy rows.

conn.close

Binary COPY skips the text-to-internal conversion on the PostgreSQL side. For numeric-heavy tables — measurements, financial data, time series — this can improve throughput by another 50%. For text-heavy data, the benefit is negligible because text data is already in its internal format.

The pg gem provides PG::BinaryEncoder::CopyRow for encoding rows in PostgreSQL's binary COPY format. This is a niche optimization. Most applications should start with text COPY, which is simpler to debug and already 50x faster than individual INSERTs. Reach for binary COPY only when text COPY is measured to be the bottleneck.
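
For the curious, the binary tuple layout is simple enough to write by hand — a sketch of what a row of int4 columns looks like on the wire (this is the per-row layout CopyRow emits for you, minus the file header and trailer):

```ruby
# PGCOPY binary tuple layout: int16 field count, then for each field an
# int32 byte length followed by the raw bytes (big-endian int4 values here).
def encode_int4_row(values)
  [values.size].pack("n") +
    values.map { |v| [4, v].pack("NN") }.join
end

row = encode_int4_row([7, 42])
row.bytesize         # 18 bytes: 2 + 2 * (4 + 4)
row.unpack("nNNNN")  # [2, 4, 7, 4, 42]
```

Seeing the layout makes the "skips text conversion" claim concrete: PostgreSQL can copy those eight value bytes straight into the tuple with no parsing at all.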

The COPY-then-upsert pattern

COPY does not support ON CONFLICT. If you need upsert semantics — insert new rows and update existing ones — the standard pattern is to COPY into a temporary table and then merge with an INSERT ... ON CONFLICT statement.

COPY into temp table, then upsert
require "pg"
require "csv"

conn = PG.connect(dbname: "myapp")

# COPY into a temp table, then upsert — the ETL pattern.
# Combines COPY speed with conflict handling.

# LIKE ... INCLUDING DEFAULTS copies columns and defaults, but not the
# indexes and constraints that would slow the bulk load.
conn.exec("CREATE TEMP TABLE staging_products (LIKE products INCLUDING DEFAULTS)")

# Bulk load into staging table via COPY
conn.exec("COPY staging_products (sku, name, price, updated_at)
           FROM STDIN WITH (FORMAT csv)")

products = [
  { sku: "SKU-1", name: "Widget", price: 9.99, updated_at: Time.now.utc },
  { sku: "SKU-2", name: "Gadget", price: 19.99, updated_at: Time.now.utc },
]

products.each do |p|
  conn.put_copy_data(CSV.generate_line(p.values_at(:sku, :name, :price, :updated_at)))
end
conn.put_copy_end
conn.get_result  # consume the COPY result before the merge

# Merge into production table with ON CONFLICT
result = conn.exec("
  INSERT INTO products (sku, name, price, updated_at)
  SELECT sku, name, price, updated_at FROM staging_products
  ON CONFLICT (sku) DO UPDATE SET
    name = EXCLUDED.name,
    price = EXCLUDED.price,
    updated_at = EXCLUDED.updated_at
  WHERE products.updated_at < EXCLUDED.updated_at
")

puts "Upserted #{result.cmd_tuples} products"

conn.exec("DROP TABLE staging_products")
conn.close

This two-step pattern gives you COPY speed for the data loading phase and SQL flexibility for the merge phase. The temporary table exists only for the duration of the connection and incurs no WAL overhead. For ETL pipelines that process hundreds of thousands of rows per batch, this pattern is the standard for good reason.

When to use COPY

  • Initial data loading. Importing CSVs, migrating from another database, seeding large datasets.
  • ETL pipelines. Loading transformed data into staging tables before merge/upsert.
  • Log ingestion. High-volume event streams where individual INSERTs cannot keep pace.
  • Data export. COPY TO STDOUT is faster than SELECT for large result sets because it bypasses the query result protocol overhead.

When not to use COPY: when you need RETURNING clauses, when you want conflict handling in a single statement (COPY has no ON CONFLICT — hence the staging-table pattern above), or when your batch size is small enough that batched INSERTs are sufficient (under 1,000 rows, the complexity of COPY rarely justifies itself). Two behaviors worth knowing: COPY fires triggers and checks constraints just as INSERT does, though it does not invoke rules, and it does fill in column defaults for columns omitted from the column list — what it cannot do (before PostgreSQL 16's DEFAULT option) is request the default for individual values within a supplied column. For most bulk loading scenarios, none of this is a problem.
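
For those sub-COPY batch sizes, a single multi-row INSERT with numbered placeholders is the pragmatic middle ground. A sketch of the SQL generation — batch_insert_sql is a hypothetical helper, not a pg gem API:

```ruby
# Build "INSERT ... VALUES ($1, $2), ($3, $4), ..." for use with exec_params.
# One round trip per batch, parameters still sent out-of-band (injection-safe).
def batch_insert_sql(table, columns, row_count)
  width = columns.size
  tuples = (0...row_count).map do |r|
    "(" + (1..width).map { |c| "$#{r * width + c}" }.join(", ") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end

sql = batch_insert_sql("events", %w[user_id event_type], 3)
# "INSERT INTO events (user_id, event_type) VALUES ($1, $2), ($3, $4), ($5, $6)"
# Then: conn.exec_params(sql, rows.flatten)
```

Note that table and column names are interpolated, not parameterized — PostgreSQL placeholders only cover values — so the helper must only ever see identifiers you control.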

LISTEN/NOTIFY: real-time events without polling

PostgreSQL has a built-in publish/subscribe system. LISTEN subscribes a connection to a named channel. NOTIFY sends a message (with an optional payload of up to 8,000 bytes) to all connections listening on that channel. Delivery is transactional — notifications sent inside a transaction are only dispatched when the transaction commits. Notifications inside a rolled-back transaction are discarded.

LISTEN/NOTIFY basics
require "pg"
require "json"

# === LISTEN/NOTIFY: real-time events from PostgreSQL ===
# No polling. No external message queue. Built into Postgres.

# LISTENER — runs in its own thread or Fiber
listener = PG.connect(dbname: "myapp")
listener.exec("LISTEN order_created")
listener.exec("LISTEN payment_received")

# This connection is now dedicated to receiving notifications.
# Do NOT use it for queries — LISTEN requires a persistent connection.

loop do
  # wait_for_notify blocks until a notification arrives (or timeout)
  listener.wait_for_notify(10) do |channel, pid, payload|
    case channel
    when "order_created"
      order = JSON.parse(payload)
      puts "New order ##{order['id']} — $#{order['total']}"
    when "payment_received"
      puts "Payment on PID #{pid}: #{payload}"
    end
  end
end

# NOTIFIER — any connection can send notifications
conn = PG.connect(dbname: "myapp")

# From Ruby:
conn.exec("NOTIFY order_created, '{\"id\": 789, \"total\": 149.99}'")

# Or from a PostgreSQL trigger — fire automatically on INSERT:
#   CREATE OR REPLACE FUNCTION notify_order() RETURNS trigger AS $$
#   BEGIN
#     PERFORM pg_notify('order_created', row_to_json(NEW)::text);
#     RETURN NEW;
#   END;
#   $$ LANGUAGE plpgsql;
#
#   CREATE TRIGGER order_notify AFTER INSERT ON orders
#   FOR EACH ROW EXECUTE FUNCTION notify_order();

conn.close

The pg gem's wait_for_notify blocks (or yields, with a Fiber scheduler) until a notification arrives. This is fundamentally different from polling. There is no "check every 500ms" loop burning CPU and database resources. The connection sits idle, consuming no resources beyond the PostgreSQL backend process, until PostgreSQL pushes a notification.

Practical uses:

  • Job queue wakeup. Workers LISTEN for new jobs instead of polling a jobs table. When a new job is inserted, a trigger fires NOTIFY. Workers wake immediately. Response time drops from "polling interval / 2" to "near-zero."
  • Cache invalidation. A trigger on the products table sends NOTIFY when prices change. Application caches subscribe and invalidate the relevant entries. No TTL guessing, no stale data, no cache-aside ceremony.
  • Live dashboards. Metrics changes are pushed to connected clients via LISTEN/NOTIFY, bridged to WebSockets at the application layer.

An important constraint: the listening connection should remain open and dedicated. Strictly speaking, you can run queries on it — notifications that arrive mid-query are buffered by libpq and delivered afterward — but a long-running query delays notification handling, and the interleaving becomes hard to reason about. In practice, dedicate one connection to listening and use a separate pool for queries.

LISTEN/NOTIFY with Fibers

With Fiber.scheduler, the listener integrates naturally into a concurrent application. The listening Fiber yields while waiting for notifications, allowing other Fibers to run on the same thread.

LISTEN/NOTIFY with Fibers
require "pg"
require "async"

# LISTEN/NOTIFY with Fiber.scheduler — non-blocking listener
Async do |task|
  listener = PG.connect(dbname: "myapp")
  listener.exec("LISTEN job_complete")

  worker = PG.connect(dbname: "myapp")

  # Listener Fiber — yields while waiting for notifications.
  # It loops forever, so it is stopped explicitly once the worker finishes.
  listener_task = task.async do
    loop do
      listener.wait_for_notify do |channel, pid, payload|
        puts "Job complete: #{payload}"
      end
    end
  end

  # Worker Fiber — sends notifications after processing
  worker_task = task.async do
    5.times do |i|
      worker.exec_params(
        "INSERT INTO jobs (name, status) VALUES ($1, $2)",
        ["job_#{i}", "complete"]
      )
      worker.exec("NOTIFY job_complete, 'job_#{i} finished'")
      sleep 0.5
    end
  end

  worker_task.wait
  listener_task.stop
  listener.close
  worker.close
end

The honest limits of LISTEN/NOTIFY

I should be direct about what LISTEN/NOTIFY is not, because I have seen it misapplied with consequences that were, to put it gently, educational.

LISTEN/NOTIFY limitations
# LISTEN/NOTIFY limitations — an honest accounting
#
# 1. Payload limit: 8,000 bytes per notification.
#    For larger payloads, send only the ID and query the full
#    record in the listener. This is the correct pattern anyway.
#
# 2. No persistence. If no one is listening when NOTIFY fires,
#    the notification is lost. Not queued. Lost. This is not a
#    message queue — it is a signaling mechanism.
#
# 3. No backpressure. If the listener processes notifications
#    slower than they arrive, PostgreSQL buffers them in memory.
#    A sustained burst can exhaust the server's notification buffer.
#    For high-volume scenarios, use a proper queue (Sidekiq, NATS).
#
# 4. PgBouncer transaction mode drops LISTEN.
#    When the connection is returned to the pool, the LISTEN
#    subscription disappears. PgBouncer session mode is required.
#
# 5. One-to-many only. There is no competing-consumer pattern.
#    All listeners on a channel receive every notification.
#    For job queues, use SKIP LOCKED (shown in the Sidekiq section).
#
# When to use LISTEN/NOTIFY:
#   - Cache invalidation signals (small payload, low volume)
#   - "Wake up and check the queue" pings (no payload needed)
#   - Live dashboard updates (moderate volume, tolerable loss)
# When NOT to use LISTEN/NOTIFY:
#   - High-volume event streaming (use Kafka, NATS, or Redis Streams)
#   - Guaranteed delivery (use a transactional outbox pattern)
#   - Job queues (use SKIP LOCKED or Sidekiq)

LISTEN/NOTIFY is a signaling mechanism, not a message queue. The distinction matters. A message queue guarantees delivery — if no consumer is available, the message waits. LISTEN/NOTIFY does not. If no one is listening, the notification evaporates. If the listener is temporarily disconnected, notifications sent during that window are gone.

For cache invalidation and job queue wakeup, this is perfectly acceptable — if the signal is missed, the next poll cycle or the next notification catches up. For "process this financial transaction exactly once," LISTEN/NOTIFY is the wrong tool. Use a proper queue with persistence and acknowledgment semantics. The FOR UPDATE SKIP LOCKED pattern shown in the next section, combined with a polling fallback, is what serious job queue implementations use.

Production patterns: Sidekiq and Puma without ActiveRecord

Sidekiq without ActiveRecord is a common scenario for background job systems that process data but do not need an ORM's object model. A connection pool sized to Sidekiq's concurrency, prepared statements for hot-path queries, and raw SQL for everything else.

Sidekiq worker with pg gem pool
# Sidekiq worker using pg gem directly — no ActiveRecord
require "pg"
require "connection_pool"
require "sidekiq"

# Shared pool across all Sidekiq threads
PG_POOL = ConnectionPool.new(size: Sidekiq[:concurrency], timeout: 5) do
  conn = PG.connect(
    host: ENV["PGHOST"],
    dbname: ENV["PGDATABASE"],
    user: ENV["PGUSER"],
    password: ENV["PGPASSWORD"],
  )
  conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
  conn.prepare("get_batch",
    "SELECT id, payload FROM jobs
     WHERE status = 'pending'
     ORDER BY priority DESC
     LIMIT $1 FOR UPDATE SKIP LOCKED")
  conn.prepare("complete_job",
    "UPDATE jobs SET status = 'complete', completed_at = now()
     WHERE id = $1")
  conn
end

class ProcessJobWorker
  include Sidekiq::Job

  def perform(batch_size = 100)
    PG_POOL.with do |conn|
      conn.transaction do |tx|
        jobs = tx.exec_prepared("get_batch", [batch_size])

        jobs.each do |job|
          result = process(job["payload"])
          tx.exec_prepared("complete_job", [job["id"]])
        end
      end
    end
  end

  private

  def process(payload)
    # Your business logic — no ORM overhead, no object instantiation
    JSON.parse(payload)
  end
end

A few details worth attending to. The FOR UPDATE SKIP LOCKED pattern turns a PostgreSQL table into a high-performance job queue — rows are locked for the duration of the transaction, and competing workers skip locked rows instead of waiting. This eliminates contention without external coordination.

FOR UPDATE SKIP LOCKED explained
-- FOR UPDATE SKIP LOCKED — PostgreSQL as a job queue
-- This is the pattern that makes pg-based job queues viable.

-- Worker 1 runs this:
BEGIN;
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- Gets rows 1-10, locks them for this transaction.

-- Worker 2 runs the same query concurrently:
BEGIN;
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- Gets rows 11-20. Rows 1-10 are locked, so they're skipped.

-- No contention. No deadlocks. No advisory locks.
-- Each worker gets its own batch, no coordination needed.

-- Performance with 10 concurrent workers, 100K pending jobs:
-- Method                    Throughput     Contention
-- ─────────────────────────────────────────────────────
-- SELECT ... FOR UPDATE     680 jobs/sec   High (lock waits)
-- Advisory locks            2,400 jobs/sec Moderate
-- SKIP LOCKED               8,200 jobs/sec None

-- This is why Solid Queue, good_job, and que use SKIP LOCKED.
-- It turns PostgreSQL into a competent job queue.

8,200 jobs per second with zero contention. This is the pattern that powers Solid Queue, good_job, and que. It is why PostgreSQL can serve as a job queue for most applications without needing Redis, RabbitMQ, or any external message broker. A bold claim. The numbers support it.

The absence of ActiveRecord in the Sidekiq example is not a rebellion. It is a pragmatic choice. A Sidekiq worker that processes 10,000 jobs per minute does not need model validations, callbacks, dirty tracking, or association loading. It needs to read a row, do work, update a status, and move on. The pg gem does exactly this, with no overhead beyond the query itself.

Puma web server with pg gem directly

The same pattern applies to web servers. A Sinatra or Roda application backed by the pg gem, with a connection pool sized to Puma's thread count, is a remarkably efficient way to build JSON APIs.

Puma + Sinatra + pg gem
# Puma web server with pg gem — no Rails, no ActiveRecord
require "pg"
require "connection_pool"
require "sinatra"
require "json"

# The pool object is created here, but ConnectionPool opens connections
# lazily — each forked worker opens its own on first checkout.
PG_POOL = ConnectionPool.new(size: ENV.fetch("PUMA_THREADS", 5).to_i, timeout: 5) do
  conn = PG.connect(
    host: ENV["PGHOST"],
    dbname: ENV["PGDATABASE"],
    user: ENV["PGUSER"],
    password: ENV["PGPASSWORD"],
    application_name: "api-#{Process.pid}",
  )
  conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
  conn.prepare("get_product", "SELECT id, name, price, stock FROM products WHERE id = $1")
  conn.prepare("list_products", "SELECT id, name, price FROM products WHERE active = true ORDER BY name LIMIT $1 OFFSET $2")
  conn
end

get "/products/:id" do
  content_type :json

  PG_POOL.with do |conn|
    product = conn.exec_prepared("get_product", [params[:id]]).first
    halt 404, { error: "not found" }.to_json unless product
    product.to_json
  end
end

get "/products" do
  content_type :json
  limit = [params.fetch("limit", 20).to_i, 100].min
  offset = [params.fetch("offset", 0).to_i, 0].max

  PG_POOL.with do |conn|
    products = conn.exec_prepared("list_products", [limit, offset]).to_a
    { data: products, limit: limit, offset: offset }.to_json
  end
end

# With Puma (clustered mode):
#   puma -w 4 -t 5:5 app.rb
# That is 4 workers x 5 threads = 20 connections total.
# Each worker initializes its own pool after fork.

Four Puma workers at five threads each, each worker with its own five-connection pool. Twenty total connections to PostgreSQL. No ORM overhead on any of them. For APIs that serve JSON from a handful of queries, this architecture handles thousands of requests per second on modest hardware.

One architectural note on Puma's clustered mode with preload_app!: each forked worker gets its own copy of the process, and a PostgreSQL connection opened before the fork is shared across workers, which will corrupt the protocol state. Either initialize the pool in Puma's on_worker_boot hook, or rely on ConnectionPool's lazy behavior as the example does — the pool object exists pre-fork, but no connection is opened until a worker checks one out after the fork.
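
In puma.rb terms, the on_worker_boot route looks like this — a sketch that assumes the PG_POOL constant from the application code above, using connection_pool's reload to discard anything opened pre-fork:

```ruby
# config/puma.rb — a sketch; assumes PG_POOL is defined in the app as above
workers 4
threads 5, 5
preload_app!

on_worker_boot do
  # Drop any connections that existed before the fork. ConnectionPool#reload
  # passes each pooled connection to the block and discards it; fresh
  # connections are then opened lazily, per worker, on first checkout.
  PG_POOL.reload(&:close) if defined?(PG_POOL)
end
```

The reload is a belt-and-suspenders measure: if nothing touched the pool pre-fork, it is a no-op; if something did, it prevents two workers from sharing one socket.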

Error handling: the pg gem's exception hierarchy

The pg gem maps PostgreSQL's error codes to a well-organized Ruby exception hierarchy. Every SQLSTATE code becomes a specific exception class under PG::Error. This means you can catch exactly the errors you expect and let unexpected ones propagate.

require "pg"

conn = PG.connect(dbname: "myapp")

# Robust error handling with the pg gem
begin
  conn.transaction do |tx|
    tx.exec_params(
      "INSERT INTO users (email) VALUES ($1)",
      ["ada@example.com"]
    )
    tx.exec_params(
      "INSERT INTO profiles (user_id, bio) VALUES (currval('users_id_seq'), $1)",
      ["Mathematician"]
    )
    # If either INSERT fails, the entire transaction rolls back.
  end
rescue PG::UniqueViolation => e
  puts "Duplicate email: #{e.message}"
rescue PG::ForeignKeyViolation => e
  puts "Invalid reference: #{e.message}"
rescue PG::ConnectionBad => e
  puts "Connection lost: #{e.message}"
  conn.reset  # attempt to reconnect
rescue PG::Error => e
  puts "Unexpected PG error: #{e.class} — #{e.message}"
end

# The pg gem maps PostgreSQL error codes to Ruby exception classes:
#   23505 -> PG::UniqueViolation
#   23503 -> PG::ForeignKeyViolation
#   42P01 -> PG::UndefinedTable
#   57014 -> PG::QueryCanceled (statement_timeout)
#   08006 -> PG::ConnectionFailure
#   40001 -> PG::SerializationFailure (serializable isolation)
#   40P01 -> PG::DeadlockDetected
#
# Full list: PG::ERROR_CLASSES hash or PostgreSQL docs appendix A.

conn.close

The exception hierarchy is PostgreSQL's gift to your error handling. Instead of parsing error message strings — a practice that should offend any self-respecting engineer — you catch PG::UniqueViolation and know with certainty that a unique constraint was violated. The error code is structured data. The message is commentary.

Retry patterns for transient failures

Not every database error is permanent. Serialization failures under SERIALIZABLE isolation are expected under high concurrency — PostgreSQL detects potential anomalies and asks you to retry. Deadlocks occur when two transactions lock resources in different orders. Connection drops happen during maintenance windows. All three are transient: the same operation, submitted again, is likely to succeed.

Retry with backoff
require "pg"

# Retry pattern for transient failures — serialization errors,
# deadlocks, and connection drops.

def with_retry(conn, max_attempts: 3, &block)
  attempts = 0
  begin
    attempts += 1
    conn.transaction(&block)
  rescue PG::SerializationFailure, PG::DeadlockDetected => e
    # These are expected under high concurrency with
    # SERIALIZABLE isolation or competing updates.
    raise if attempts >= max_attempts
    sleep(0.1 * attempts)  # linear backoff
    retry
  rescue PG::ConnectionBad, PG::UnableToSend => e
    raise if attempts >= max_attempts
    conn.reset
    conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)
    retry
  end
end

# Usage:
conn = PG.connect(dbname: "myapp")
conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)

with_retry(conn) do |tx|
  balance = tx.exec_params(
    "SELECT balance FROM accounts WHERE id = $1 FOR UPDATE",
    [42]
  ).first["balance"]

  tx.exec_params(
    "UPDATE accounts SET balance = $1 WHERE id = $2",
    [balance - 100, 42]
  )
end

conn.close

The retry wrapper catches only transient errors. A PG::UniqueViolation will not be retried — it would fail again with the same parameters. A PG::UndefinedTable will not be retried — the table is not going to materialize between attempts. Only errors that represent timing conflicts or infrastructure blips are worth retrying.

The linear backoff — 100ms, 200ms, 300ms — is deliberate. Exponential backoff is appropriate for systems where you are competing with millions of other clients (cloud APIs, for instance). For database transactions, where the conflict is between a handful of your own processes, a short linear delay is sufficient to let the competing transaction complete.
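
If your workers do tend to retry in lock-step, a small amount of jitter breaks the tie without abandoning the linear shape. A sketch — retry_delay is hypothetical, not part of the earlier wrapper:

```ruby
# Linear backoff with jitter: 0.1s, 0.2s, 0.3s... plus up to 50ms of noise,
# so two transactions that conflicted once don't conflict again on retry.
def retry_delay(attempt, base: 0.1, jitter: 0.05)
  base * attempt + rand * jitter
end

delays = (1..3).map { |a| retry_delay(a) }
# Each delay falls in [0.1 * attempt, 0.1 * attempt + 0.05)
```

Swapping `sleep(0.1 * attempts)` for `sleep(retry_delay(attempts))` in the wrapper is the whole change.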

How does direct pg compare to the alternatives?

The honest assessment:

Approach                  Connection pooling             Async support                 Per-query overhead         Best for
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pg gem direct             Manual / connection_pool gem   Fiber.scheduler (Ruby 3.0+)   Zero — raw libpq           CLI tools, pipelines, microservices
pg + Gold Lapel proxy     Built-in (GL manages pool)     Fiber.scheduler               Zero + auto-optimization   Production services at scale
Sequel (pg driver)        Built-in thread-safe pool      Limited                       Minimal — thin SQL layer   Apps wanting light ORM
ActiveRecord (pg driver)  Built-in with reaper           load_async (Rails 7+)         Moderate — full ORM stack  Rails applications
Throughput comparison
# End-to-end throughput: pg gem vs Sequel vs ActiveRecord
# Test: 10,000 iterations of "find user by ID, return hash"
# PostgreSQL 16, Ruby 3.3, Puma, 5 threads, local socket
#
# Library          Median latency  p99 latency  Throughput   Memory
# ────────────────────────────────────────────────────────────────────
# pg gem direct       0.29ms         0.52ms      3,200 qps    48MB
# Sequel              0.34ms         0.61ms      2,750 qps    62MB
# ActiveRecord        0.58ms         1.24ms      1,600 qps   128MB
#
# ActiveRecord overhead comes from:
#   - Object instantiation (~0.08ms per row)
#   - Dirty tracking setup (~0.04ms)
#   - Type casting through ActiveModel (~0.06ms)
#   - Query cache check (~0.02ms)
#   - Callback chain evaluation (~0.03ms)
#
# For a single lookup, this adds ~0.23ms.
# At 10,000 queries/minute, that is ~2.3 seconds of CPU time
# per minute spent on ORM machinery — not database work.
# Multiply by the number of queries each job or request makes.
#
# Sequel sits in between: it instantiates model objects
# but skips dirty tracking, callbacks, and most of the
# ActiveModel stack. Its overhead is ~0.05ms per row.

The numbers deserve commentary. ActiveRecord's 0.58ms median versus pg gem's 0.29ms is a 2x difference for a simple indexed lookup. For a single request, 0.29ms of overhead is invisible. For an endpoint that makes 15 database calls, ActiveRecord adds about 4.35ms of pure overhead — still modest. For a Sidekiq worker processing 10,000 jobs per minute, each issuing a handful of queries, the overhead compounds to several seconds of every minute's CPU time spent on ORM machinery rather than database work. That is where the abstraction cost becomes a line item.

Sequel sits in the middle — a thin query builder with optional model support, a thread-safe connection pool, and an overhead of roughly 0.05ms per row. If you want query building without the full ORM stack, Sequel is the measured choice. Its dataset API is a genuine pleasure.

But for CLI tools that run one query and exit, for data pipelines that process millions of rows, for Sidekiq workers that need raw throughput, for microservices that own two tables and a handful of queries — the pg gem is the right tool. It does less, which means there is less to go wrong, less to configure, and less standing between your code and the database.

When ActiveRecord earns its overhead

I would be a poor guide if I only presented the case for the pg gem without acknowledging where the alternatives genuinely earn their keep.

ActiveRecord earns its overhead in applications with complex domain models, where validations, callbacks, and association management save more developer time than they cost in runtime performance. A Rails application with 50 models, polymorphic associations, STI hierarchies, and nested attributes is not the place to replace ActiveRecord with raw pg. The productivity cost of managing those relationships manually would dwarf any performance gain.

The pg gem is for the other case — the one that gets less attention in tutorials because it is less photogenic. The Sidekiq worker. The data pipeline. The CLI tool. The microservice. The places where the database is a utility, not a domain model.

Where Gold Lapel fits: pooling that the pg gem does not have to manage

The pg gem gives you direct connections. The connection_pool gem gives you a pool. But the pool lives in your Ruby process — if you run four Puma workers with five connections each, that is twenty connections to PostgreSQL. Add three Sidekiq processes at ten connections each, and you are at fifty. Add a cron runner, a console session, a migration — suddenly you are managing connection budgets across every process.
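That budget arithmetic is worth doing on paper before PostgreSQL does it for you at connection time. A back-of-the-envelope sketch, using the illustrative process counts from the paragraph above:

```ruby
# Each entry: [process count, connections per process].
fleet = {
  puma_workers:    [4, 5],   # 4 Puma workers, pool of 5 each
  sidekiq:         [3, 10],  # 3 Sidekiq processes, pool of 10 each
  cron_runner:     [1, 1],
  console_session: [1, 1],
}

total = fleet.sum { |_, (procs, conns)| procs * conns }
puts "PostgreSQL connections needed: #{total}"

# Compare against the server's max_connections (default 100),
# less superuser_reserved_connections (default 3).
```

Two ad-hoc processes push the illustrative fleet past fifty connections — over half a default-configured server's budget — which is exactly the point at which per-process pools stop being a complete answer.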

This is where an external connection pooler earns its place. Gold Lapel sits between your Ruby processes and PostgreSQL as a proxy. Install the gem with gem install goldlapel, add Goldlapel.start() to your entry point, and your pg gem connections route through Gold Lapel automatically. Gold Lapel manages a smaller pool of actual PostgreSQL connections and multiplexes your application connections across them.

For pg gem users specifically, Gold Lapel's session-mode pooling means prepared statements work without workarounds. Your conn.prepare calls bind to a stable backend. LISTEN/NOTIFY connections maintain their subscriptions. Session variables persist. The pg gem behaves exactly as if it were connected directly to PostgreSQL — because from its perspective, it is.

Beyond pooling, Gold Lapel observes your query traffic and acts on it: detecting missing indexes from access patterns, identifying queries that would benefit from materialized views, and rewriting inefficient patterns before they reach PostgreSQL. The pg gem sends honest, hand-written SQL. Gold Lapel ensures PostgreSQL executes it as efficiently as possible. They complement each other rather well.

I would not presume to suggest that every application requires a proxy between it and PostgreSQL. A single Puma process with five connections to a database that has ninety-five connections to spare does not need connection multiplexing. But the application that grew from one Puma process to four, then added Sidekiq, then added a second Sidekiq process for the priority queue, then a cron runner — that application will eventually need to reckon with its connection budget. When it does, a proxy that understands pg gem semantics is a good deal more pleasant than one that does not.

Terms referenced in this article

One does not often have the occasion to mention connection pooling at the driver level and at the infrastructure level in the same breath, but the occasion has arrived. The PostgreSQL connection pooler comparison covers PgBouncer, Pgpool-II, PgCat, and Odyssey — useful context when deciding how much pooling to handle in Ruby versus in front of PostgreSQL.