
Rails Read Replica Routing on PostgreSQL: Replication Lag, Stale Reads, and the Pool Surprise

Your delay: 2.seconds is not measuring what you think it is measuring. Allow me to clarify before something regrettable happens in production.

The Waiter of Gold Lapel · Updated Mar 20, 2026 · Published Mar 5, 2026 · 28 min read
We routed the illustration to the replica. It arrived three seconds stale. Apologies.

Good evening. Your reads are not going where you expect.

Rails has shipped automatic read replica routing since version 6, and on paper it is elegant. Writes go to the primary. Reads go to the replica. A configurable delay prevents stale reads after writes. Configuration lives in database.yml, a few lines in application.rb, and you are finished.

You are not finished.

I have attended to several production systems that deployed Rails' multi-database support with the confidence that documentation tends to produce and the vigilance that documentation tends not to. The pattern is consistent: the initial setup works, the team celebrates the reduced primary load, and then — within days or weeks — a support ticket arrives. A user updated their profile but still sees the old avatar. A customer placed an order but the confirmation page shows no line items. An admin exported a report that is missing the last three minutes of transactions.

These are not bugs in Rails. They are the natural consequences of a read routing system that does not understand PostgreSQL replication. And they are, if I may be direct, entirely preventable.

There are three PostgreSQL-specific problems that the Rails multi-database documentation does not adequately address. Each one has caused production incidents at companies running real traffic. I have documented all three, with the PostgreSQL queries you need to actually measure what is happening, and the fixes that work.

The problems, in brief:

  1. The delay parameter measures wall-clock time since the session's last write. It does not measure replication lag. These are different things.
  2. Each database role doubles your connection pool requirements. Grailed documented hitting connection exhaustion within hours of deploying read replicas.
  3. load_async, Turbo Frame lazy-loads, Turbo Stream broadcasts, and ActionCable subscriptions can bypass role switching entirely, sending reads to the replica when your session context says they should go to the primary.

I should also say — and I will return to this later — that read replicas are genuinely useful. When they are deployed with an honest understanding of what delay actually measures, with connection math done in advance, and with the Hotwire integration points explicitly tested, they reduce primary load and improve read latency. The problem is not the feature. The problem is the gap between what the configuration appears to promise and what it actually delivers.

The multi-database setup, for reference

Before I address the gotchas, here is the standard Rails 7+ multi-database configuration for PostgreSQL. If you have done this before, skip ahead. If you have not, this is the baseline from which everything goes sideways.

config/database.yml
# config/database.yml — Rails 7+ multi-database configuration
production:
  primary:
    database: myapp_production
    host: pg-primary.internal
    adapter: postgresql
    pool: 10

  primary_replica:
    database: myapp_production
    host: pg-replica-1.internal
    adapter: postgresql
    pool: 10
    replica: true

The replica: true flag tells ActiveRecord this connection should never receive writes. Rails will raise ActiveRecord::ReadOnlyError if a write is attempted against it. This is a guard rail, not a routing mechanism — it prevents accidents, but it does not decide where reads go.

app/models/application_record.rb
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :primary,
    reading: :primary_replica
  }
end

This maps the :writing role to your primary and the :reading role to your replica. Every model inheriting from ApplicationRecord now participates in automatic role switching. A detail worth noting: the role names :writing and :reading describe the database's role in the topology, not the nature of the query being executed. This distinction will matter when we reach manual role switching.

config/application.rb
# config/application.rb
#
# Automatic role switching is NOT enabled by default; you opt in
# with the three settings below (Rails can generate them via
# `bin/rails generate active_record:multi_db`).
# The delay parameter controls how long after a write
# before reads can go back to the replica.
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session

The middleware intercepts each request, checks whether the session has written recently, and routes accordingly. GET and HEAD requests go to the replica (the :reading role). POST, PUT, PATCH, and DELETE go to the primary (the :writing role). After a write, subsequent reads stay on the primary for delay seconds.

This is well-designed machinery. It handles the common case correctly. The trouble begins at the edges — and in production, the edges arrive faster than you might expect.
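The decision procedure can be condensed into a few lines of plain Ruby. This is a simplification for illustration, not Rails' actual code; the method and parameter names are mine:

```ruby
# Simplified model of the DatabaseSelector routing decision.
# Captures the two rules described above and nothing else.
def choose_role(http_method:, last_write_at:, now: Time.now, delay: 2)
  # Rule 1: any verb that can mutate state goes to the primary.
  return :writing unless %w[GET HEAD].include?(http_method)

  # Rule 2: reads stay on the primary for `delay` seconds after
  # this session's most recent write. No recorded write => replica.
  return :reading if last_write_at.nil?

  (now - last_write_at) < delay ? :writing : :reading
end

choose_role(http_method: "POST", last_write_at: nil)                     # => :writing
choose_role(http_method: "GET",  last_write_at: nil)                     # => :reading
choose_role(http_method: "GET",  last_write_at: Time.now - 1, delay: 2)  # => :writing
choose_role(http_method: "GET",  last_write_at: Time.now - 10, delay: 2) # => :reading
```

Notice what is absent from the signature: nothing about the replica's actual state. That absence is the subject of Gotcha 1.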

A brief primer on how PostgreSQL replication actually works

Before we discuss what can go wrong with replica routing, it is worth understanding what is happening beneath the abstraction. Rails treats the replica as "another database with the same data." PostgreSQL treats the replica as "a continuous consumer of my write-ahead log." These are different mental models, and the difference matters.

PostgreSQL streaming replication
-- PostgreSQL streaming replication in four sentences:
--
-- 1. Every change to the primary produces a WAL (Write-Ahead Log) record.
-- 2. The primary streams these WAL records to connected replicas.
-- 3. Each replica receives, writes, and replays WAL records independently.
-- 4. "Replication lag" is the gap between what the primary has sent
--    and what the replica has replayed.
--
-- The WAL is a sequential byte stream. Position is tracked as an LSN
-- (Log Sequence Number). The primary is at LSN 0/3A00F8C0.
-- The replica has replayed up to 0/3A00E000. The difference is lag.
--
-- This is the ONLY honest measure of how far behind a replica is.
-- Wall-clock timers, session cookies, and hopeful configuration
-- parameters cannot tell you this. PostgreSQL can.

Streaming replication is asynchronous by default. The primary sends WAL records to the replica over a persistent TCP connection, and the replica applies them as fast as it can. Under normal conditions with modern hardware, this lag is measured in single-digit milliseconds. The replica is, for practical purposes, identical to the primary.

But "normal conditions" is doing a great deal of work in that sentence. Several things cause lag to spike, and any Rails application of sufficient age will encounter all of them.

Common causes of replication lag
-- What causes replication lag spikes in PostgreSQL:
--
-- 1. Long-running transactions on the replica
--    Replicas cannot replay WAL that conflicts with active queries.
--    A 10-minute analytics query on the replica can hold back replay
--    for the entire duration (unless hot_standby_feedback is on,
--    which then prevents vacuum on the primary — pick your poison).
--
-- 2. Large write bursts on the primary
--    Bulk imports, batch updates, COPY operations.
--    The primary generates WAL faster than the replica can replay.
--    A 50M-row UPDATE produces gigabytes of WAL in seconds.
--
-- 3. Vacuum and autovacuum on the replica
--    While the replica itself does not vacuum, replaying vacuum-related
--    WAL records (page rewrites, index cleanups) is CPU-intensive.
--    A large VACUUM FULL on the primary produces a replay bottleneck.
--
-- 4. Network latency between primary and replica
--    Cross-AZ: typically 1-3ms. Cross-region: 50-200ms.
--    At high write throughput, even 2ms of network latency compounds.
--
-- 5. Disk I/O saturation on the replica
--    Replicas share their disk with read queries.
--    A heavy analytical workload can saturate IOPS,
--    starving WAL replay of I/O bandwidth.

The critical insight: lag is not constant, and it is not predictable from the application layer. A replica that has been 3ms behind for six months can fall 30 seconds behind in the time it takes to run a migration. Rails' delay parameter has no mechanism to detect this. It is measuring a stopwatch when it should be reading a gauge.
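Reading the gauge is, mechanically, simple arithmetic: an LSN such as 0/3A00F8C0 is a 64-bit WAL position written as two 32-bit hex halves, and pg_wal_lsn_diff is plain subtraction. A stdlib-Ruby sketch of the same computation:

```ruby
# Parse a PostgreSQL LSN ("high/low" in hex) into an absolute byte
# offset in the WAL stream. pg_wal_lsn_diff does this subtraction
# server-side; this mirrors it for illustration.
def lsn_to_bytes(lsn)
  high, low = lsn.split("/").map { |half| half.to_i(16) }
  (high << 32) + low
end

def lsn_diff(sent_lsn, replay_lsn)
  lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)
end

# The primer's example positions:
lsn_diff("0/3A00F8C0", "0/3A00E000")  # => 6336 bytes not yet replayed
```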

Gotcha 1: delay measures wall-clock time, not replication lag

This is the most consequential misunderstanding in the entire multi-database feature. I will state it plainly.

The delay parameter in database_selector measures how many seconds have passed since this session last wrote to the database. It does not query PostgreSQL. It does not check replication status. It does not know or care how far behind the replica is.

What delay actually means
# What you think delay: 2.seconds means:
#
#   "Wait until replication lag is under 2 seconds
#    before routing reads to the replica."
#
# What it actually means:
#
#   "After the most recent write BY THIS SESSION,
#    wait 2 wall-clock seconds before allowing
#    reads to go to the replica."
#
# These are fundamentally different things.
#
# Session A writes at 14:00:01.
# Session B (different user) reads at 14:00:01.
# Session B has never written anything.
# Session B goes to the replica immediately.
#
# If the replica is 30 seconds behind?
# Session B gets 30-second-stale data.
# The delay parameter does not help.

The implication is stark. If your replica falls 30 seconds behind the primary — due to a large migration, a bulk import, a vacuum operation, a network hiccup between availability zones — every user who has not personally written something in the last delay seconds will read stale data. The delay parameter cannot protect them because it does not know there is a problem.

This is not a bug. The Rails documentation is clear that the delay is session-scoped. But the parameter name and the common mental model suggest replication-awareness that simply is not there.

I should be precise about the failure mode. It is not that stale reads happen occasionally under extreme conditions. It is that stale reads happen by default for every user who arrives at your application via a GET request without having written anything first. A new visitor browsing your product catalog, a returning customer checking order status from a bookmarked link, an API client polling for updates — none of these sessions have written anything. With no recorded write, the delay window never engages, and their reads go to the replica immediately, regardless of how far behind it is.

The only users protected by delay are those who have recently performed a write action. For a read-heavy application — which is precisely the kind of application that benefits most from read replicas — the majority of requests are unprotected.

Measuring real replication lag with pg_stat_replication

PostgreSQL provides pg_stat_replication on the primary, which shows the exact state of every connected replica. This is what actual lag measurement looks like.

pg_stat_replication on the primary
-- Query the PRIMARY to see real replication lag:
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  -- Bytes of WAL not yet replayed on replica
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
  -- Time-based lag (PostgreSQL 10+)
  replay_lag,
  write_lag,
  flush_lag
FROM pg_stat_replication;

-- Example output:
--  client_addr   | state     | replay_lag_bytes | replay_lag
-- ---------------+-----------+------------------+------------------
--  10.0.1.42     | streaming |            16384 | 00:00:00.003217
--  10.0.1.43     | streaming |          8388608 | 00:00:02.841000

-- Replica 1: 16 KB behind, ~3ms lag. Fine.
-- Replica 2: 8 MB behind, ~2.8s lag. Not fine.

Three numbers matter here:

  • replay_lag_bytes — the difference between WAL sent by the primary and WAL replayed on the replica. This is the most reliable lag indicator. Measured in bytes, not time, which makes it unambiguous. 8 MB of unreplayed WAL is 8 MB of unreplayed WAL, regardless of what time you are measuring from.
  • replay_lag — PostgreSQL's own time-based estimate, available since version 10. Useful for dashboards but can be misleading during bursts of write activity. If the primary generates 50 MB of WAL in one second, replay_lag might show "2 seconds" even though the replica needs 10 seconds to chew through that volume.
  • write_lag and flush_lag — how long before the replica acknowledged receiving and persisting the WAL. These matter for synchronous replication configurations but are less relevant for read routing decisions.

The key insight: replay_lag_bytes gives you an honest answer. If a replica has 8 MB of unreplayed WAL, it is behind — regardless of what any session's wall-clock timer says.

Checking lag from the replica side

Sometimes you want to know how far behind a specific replica is without querying the primary. PostgreSQL provides functions for this, though with a caveat.

Checking lag from the replica
-- You can also check from the REPLICA side:
SELECT
  -- How far behind is this replica?
  pg_last_wal_receive_lsn() AS last_received,
  pg_last_wal_replay_lsn() AS last_replayed,
  pg_wal_lsn_diff(
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn()
  ) AS local_replay_lag_bytes,

  -- When was the last WAL record replayed?
  pg_last_xact_replay_timestamp() AS last_replay_time,

  -- How long ago was that? (crude but useful)
  EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
    AS seconds_since_replay;

-- WARNING: seconds_since_replay is misleading during quiet periods.
-- If the primary has no writes for 60 seconds, this shows 60s of "lag"
-- even though the replica is perfectly caught up.
-- Use local_replay_lag_bytes for accuracy.

The pg_last_xact_replay_timestamp() approach is common in monitoring tutorials but has a significant flaw: during quiet periods when the primary is not generating writes, the timestamp grows stale even though the replica is perfectly caught up. At 3am when your application has minimal traffic, this function will report 45 seconds of "lag" that does not exist. Use byte-based measurement for routing decisions. Use timestamps for dashboards where a human can apply judgment.
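The trap is easy to reproduce without a database. A plain-Ruby sketch of the two measurements on an idle but fully caught-up replica (the values are invented for illustration):

```ruby
# A replica that is fully caught up, but whose primary has written
# nothing for the last 60 seconds.
last_received_lsn = 0x3A00F8C0       # pg_last_wal_receive_lsn
last_replayed_lsn = 0x3A00F8C0       # pg_last_wal_replay_lsn (identical)
last_replay_time  = Time.now - 60    # pg_last_xact_replay_timestamp

byte_lag    = last_received_lsn - last_replayed_lsn   # => 0 bytes behind
seconds_lag = (Time.now - last_replay_time).round     # => 60 "seconds behind"

# byte_lag correctly reports a caught-up replica.
# seconds_lag reports a minute of lag that is really a quiet primary.
```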

Building a lag-aware resolver

Rails' resolver architecture is extensible. You can replace the default resolver with one that actually queries PostgreSQL before deciding where to route reads.

A lag-aware custom resolver
# A more honest lag check — query the primary for actual state
class ReplicationLagChecker
  MAX_ACCEPTABLE_LAG_BYTES = 1_048_576  # 1 MB

  def self.replica_safe?
    result = ActiveRecord::Base.connected_to(role: :writing) do
      ActiveRecord::Base.connection.execute(<<~SQL)
        SELECT
          COALESCE(
            MAX(pg_wal_lsn_diff(sent_lsn, replay_lsn)),
            0
          ) AS max_lag_bytes
        FROM pg_stat_replication
      SQL
    end

    lag = result.first["max_lag_bytes"].to_i
    lag < MAX_ACCEPTABLE_LAG_BYTES
  end
end

# Usage in a custom resolver:
class LagAwareResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
  def read_from_primary?
    # Original wall-clock check
    return true if super

    # Real lag check — if replica is too far behind, use primary
    !ReplicationLagChecker.replica_safe?
  end
end

This is better than the default, but it has a cost: every read decision now involves a query to pg_stat_replication on the primary. In the full production-grade version below, I show how to cache this check so it runs every few seconds rather than on every request.
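The shape of that cache is simple enough to sketch now, in plain Ruby, with the pg_stat_replication probe stubbed as an injectable lambda so the caching logic stands alone. The class name and the 5-second TTL are my illustrative choices:

```ruby
# Cache an expensive lag probe for a short TTL, so the routing path
# does not hit pg_stat_replication on every request.
class CachedLagCheck
  def initialize(ttl: 5, probe:)
    @ttl = ttl            # seconds to trust a previous answer
    @probe = probe        # callable returning true when the replica is safe
    @mutex = Mutex.new    # resolvers run on many threads
    @checked_at = nil
    @cached = true        # optimistic default before the first probe
  end

  def replica_safe?
    @mutex.synchronize do
      if @checked_at.nil? || (Time.now - @checked_at) >= @ttl
        @cached = @probe.call
        @checked_at = Time.now
      end
      @cached
    end
  end
end

calls = 0
check = CachedLagCheck.new(ttl: 5, probe: -> { calls += 1; true })
3.times { check.replica_safe? }
calls  # => 1 — two of the three checks were served from the cache
```

In a real resolver, the probe would be the ReplicationLagChecker.replica_safe? call from above, and the instance would live for the process lifetime.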

Gotcha 2: each role doubles your connection pool

This one is arithmetic, and the arithmetic is unforgiving.

When you add a primary_replica entry to database.yml, Rails creates a separate connection pool for it. Each pool has its own pool size. Each pool maintains its own set of TCP connections to PostgreSQL. These are real TCP connections, each consuming a PostgreSQL backend process, each consuming shared memory, each counting against max_connections.

The pool math
# config/database.yml — the pool math nobody does upfront
production:
  primary:
    pool: 10          # 10 connections to the primary
  primary_replica:
    pool: 10          # 10 MORE connections — to the replica

# Total connections from this process: 20.
#
# A pool belongs to a PROCESS, not a thread, so:
# Running 4 Puma workers? 4 * 20 = 80 connections.
# Running 2 Sidekiq processes? Another 2 * 20 = 40.
#
# Before multi-db: (4 Puma + 2 Sidekiq) * pool 10           = 60
# After multi-db:  (4 Puma + 2 Sidekiq) * pool 10 * 2 roles = 120
#
# You just doubled your connection count.
# Your postgresql.conf max_connections is still set to 100.

The numbers compound quickly. Every Puma worker, every Sidekiq process, every Rails console, every rails runner invocation in your crontab — each one opens connections to both the primary and the replica. If you were running close to max_connections before, you are now over it.

And if you are deploying to a managed PostgreSQL service — RDS, Cloud SQL, AlloyDB — the max_connections default is often lower than you would expect. An RDS db.t3.medium instance defaults to 82 connections. Remember, too, that a pool belongs to a process, not a thread: a fleet of 4 Puma workers (pool: 5 each) and one 10-thread Sidekiq process (pool: 10, sized to its thread count), each connecting to both roles, holds (4 * 5 * 2) + (10 * 2) = 60 connections at steady state. A rolling deploy briefly runs old and new processes side by side: 120 connections. Against 82 available. The math does not work.
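Whatever your exact topology, this arithmetic is worth scripting before the deploy rather than discovering after it. A small plain-Ruby estimator; the fleet shape and pool sizes below are examples, so plug in your own:

```ruby
# Estimate PostgreSQL connections held by a Rails fleet.
# A pool belongs to a PROCESS, not a thread: each process holds at
# most `pool` connections per database role it connects to.
def fleet_connections(fleet)
  fleet.sum { |p| p[:count] * p[:pool] * p[:roles] }
end

fleet = [
  { name: "puma",    count: 4, pool: 5,  roles: 2 },  # web: primary + replica
  { name: "sidekiq", count: 1, pool: 10, roles: 2 },  # pool sized to 10 threads
]

steady_state   = fleet_connections(fleet)  # => 60
rolling_deploy = steady_state * 2          # old + new processes overlap => 120
```

Run the numbers against your provider's max_connections before flipping the switch, and remember to count consoles, cron runners, and deploy overlap.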

What happened at Grailed

Grailed, the fashion marketplace, documented this exact scenario in their engineering blog. They deployed read replicas and hit connection exhaustion within hours.

Grailed's connection exhaustion
# Grailed's production numbers (from their engineering blog):
#
# Before read replicas:
#   - 5 Puma workers x 5 threads x pool:5 = 125 connections
#
# After adding replica role:
#   - 5 Puma workers x 5 threads x pool:5 x 2 roles = 250 connections
#   - Plus Sidekiq: 25 threads x pool:5 x 2 roles = 250 connections
#   - Total: 500 connections
#
# Their PostgreSQL max_connections was 200.
# They hit connection exhaustion within hours of deploying.
#
# Fix: PgBouncer in front of both primary and replica,
# plus reducing pool sizes per-role.

Their fix involved two changes: putting PgBouncer in front of both the primary and replica (multiplexing many Rails connections through fewer PostgreSQL connections), and reducing per-role pool sizes to reflect actual usage patterns.

I appreciate that Grailed published this. Most teams encounter the same problem and fix it quietly, which means the next team encounters it again. The pool doubling is not mentioned in the Rails multi-database guide, and it is the single most common cause of "we deployed read replicas and everything broke" incidents.

Auditing your actual connection usage

Before you resize pools, measure what you actually have. PostgreSQL tells you exactly who is connected and what they are doing.

Auditing connections with pg_stat_activity
-- Run this on your PRIMARY to see who is connected and from where:
SELECT
  datname AS database,
  usename AS user,
  client_addr,
  application_name,
  state,
  COUNT(*) AS connections
FROM pg_stat_activity
WHERE datname = 'myapp_production'
GROUP BY datname, usename, client_addr, application_name, state
ORDER BY connections DESC;

-- Example output:
--  database        | user  | client_addr | application_name | state  | connections
-- -----------------+-------+-------------+------------------+--------+------
--  myapp_production| myapp | 10.0.2.10   | puma             | idle   | 40
--  myapp_production| myapp | 10.0.2.10   | puma             | active | 3
--  myapp_production| myapp | 10.0.2.20   | sidekiq          | idle   | 38
--  myapp_production| myapp | 10.0.2.20   | sidekiq          | active | 2
--
-- 78 idle connections from two processes.
-- Those 78 idle connections count against max_connections.
-- They are holding PostgreSQL shared memory.
-- They are doing nothing.

-- Now run the same query on the REPLICA:
--  myapp_production| myapp | 10.0.2.10   | puma             | idle   | 37
--  myapp_production| myapp | 10.0.2.20   | sidekiq          | idle   | 25
--
-- 25 idle Sidekiq connections to the replica.
-- Sidekiq almost never reads from the replica.
-- Those connections exist because database.yml said pool: 5
-- and Sidekiq has 5 processes with 5 threads each.

Run this on both the primary and each replica. The application_name column is particularly useful, but it is only populated if you set it: add application_name (or variables: { application_name: "puma" }) to the relevant database.yml entry, one value per process type. If you see 25 idle Sidekiq connections to the replica and Sidekiq never reads from the replica, those connections are pure waste.

Right-sizing pools per role

Not every process needs both roles. Sidekiq workers are predominantly write-heavy — they process jobs, update records, send emails. Giving them a full replica pool that sits mostly idle is a waste of connections.

Per-role pool sizing
# Recommended: size pools per role based on actual usage
production:
  primary:
    pool: <%= ENV.fetch("PRIMARY_POOL_SIZE", 5) %>
  primary_replica:
    pool: <%= ENV.fetch("REPLICA_POOL_SIZE", 5) %>

# Sidekiq workers rarely need the replica at all —
# most background jobs are writes. Consider:
class SidekiqRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: {
    writing: :primary
    # No reading role — Sidekiq always uses primary
  }
end

The principle is straightforward: measure which processes actually use which roles, and size accordingly. A Puma worker handling web requests needs both pools. A Sidekiq worker processing background jobs probably needs only the primary. A read-only reporting service needs only the replica.

One more detail that catches people: idle_timeout. Rails' connection pool opens connections lazily and reaps those idle for more than 300 seconds (the default), so pool: 10 does not necessarily mean 10 open connections. But connections your process touches even occasionally never stay idle long enough to be reaped: if it uses 2 replica connections every few minutes, those 2 sit established for the lifetime of the process, and PostgreSQL counts them against max_connections regardless of how little work they do. For a deeper treatment of pool mechanics, see our connection pooling guide.

PgBouncer as the multiplexing layer

If you are running more than a handful of Puma workers and a Sidekiq fleet, PgBouncer is not optional — it is arithmetic. Here is a working configuration for a dual-target setup.

PgBouncer configuration for multi-db Rails
; pgbouncer.ini — connection pooling for multi-db Rails
;
; One PgBouncer instance per database target
; (or one instance with multiple [databases] entries)

[databases]
; Primary — Rails connects here for writes
myapp_primary = host=pg-primary.internal port=5432 dbname=myapp_production

; Replica — Rails connects here for reads
myapp_replica = host=pg-replica-1.internal port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Transaction pooling — most compatible with Rails
pool_mode = transaction

; Each pool gets up to this many server connections
default_pool_size = 20

; Allow temporary overflow
max_client_conn = 400
reserve_pool_size = 5
reserve_pool_timeout = 3

; Important for replicas: read-only queries can share connections
; more aggressively than write transactions
;
; With 400 max Rails connections funneled through 20 server connections,
; you get 20:1 multiplexing. That is the point.

With PgBouncer in transaction pooling mode, 400 Rails connections share 20 PostgreSQL connections. The connection exhaustion problem is solved at the protocol level. Your database.yml points at PgBouncer instead of PostgreSQL directly, and the pool sizes become less critical because PgBouncer handles the multiplexing.

A caveat: transaction pooling mode does not support session-level state, and historically did not support prepared statements at all (PgBouncer 1.21 added protocol-level support via its max_prepared_statements setting, if you can run a recent version). If your application uses prepared_statements: true (the Rails default for PostgreSQL), you must disable it in database.yml, upgrade PgBouncer, or fall back to session pooling mode, which offers far less multiplexing benefit. Most Rails applications work fine with prepared_statements: false, though you may notice a small increase in parse time for complex queries.
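Pointing database.yml at PgBouncer then looks roughly like this. Hostnames are illustrative; the database: values match the [databases] entries in the pgbouncer.ini above:

```yaml
# config/database.yml — Rails connects to PgBouncer, not PostgreSQL
production:
  primary:
    adapter: postgresql
    host: pgbouncer.internal
    port: 6432
    database: myapp_primary        # the PgBouncer [databases] entry name
    pool: 10
    prepared_statements: false     # required for transaction pooling mode
  primary_replica:
    adapter: postgresql
    host: pgbouncer.internal
    port: 6432
    database: myapp_replica
    pool: 10
    prepared_statements: false
    replica: true
```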

Gotcha 3: Turbo, ActionCable, and load_async bypass role switching

This is the subtlest family of issues, because it involves the interaction between Rails' session-based role switching and features that operate outside the normal request-response cycle. The DatabaseSelector middleware runs on HTTP requests. Turbo Frames fire separate HTTP requests. Turbo Streams use WebSockets. load_async runs on background threads. ActionCable has its own connection lifecycle. Each one has a different relationship with the session that controls role switching.

Turbo Frame lazy-loads

Turbo Frame session context
# Turbo Frame lazy-loading bypasses role switching.
# Here is why.

# Your controller:
class OrdersController < ApplicationController
  def show
    @order = Order.find(params[:id])  # Hits replica (GET request, no recent write)
  end

  def line_items
    # This is loaded via Turbo Frame:
    # <turbo-frame id="line_items" src="/orders/123/line_items" loading="lazy">
    @order = Order.find(params[:id])  # each action must load @order itself
    @line_items = @order.line_items.includes(:product)
    # Also hits replica — separate GET request, own session context
  end
end

# The problem:
#
# 1. User creates an order (POST) — writes to primary
# 2. Redirect to show — within delay window, reads from primary. Correct.
# 3. Turbo Frame fires lazy GET for line_items
#    — This is a NEW request
#    — If the session cookie propagates, delay check works
#    — If it is a fetch() without credentials, no session = no delay = replica
#
# Result: the main order shows correctly (primary),
# but the line items frame shows stale data (replica).

The core issue: Rails' role switching stores the "last write timestamp" in the session. Turbo Frame lazy-loads issue separate HTTP requests. If those requests carry the session cookie, the delay check works correctly. If they do not — because the fetch is configured without credentials, or because a CDN strips cookies, or because the frame source is on a different subdomain — the delay check has no session to inspect, and the read goes to the replica unconditionally.

In practice, most Turbo Frame setups do carry cookies (same-origin requests include credentials by default). But if you have customized your Turbo configuration, added a CDN, or serve frames from a different origin, verify this explicitly. The symptom is maddeningly inconsistent: the main page shows the correct data, but one frame shows data from three seconds ago.

Turbo Stream broadcasts: the UI outruns the database

Turbo Stream broadcast timing
# Turbo Stream broadcasts add another dimension.
#
# In your model:
class Comment < ApplicationRecord
  after_create_commit -> {
    broadcast_append_to "post_#{post_id}_comments"
  }
end

# The broadcast fires a Turbo Stream over ActionCable.
# The client receives the HTML and inserts it into the DOM.
# No HTTP request. No session cookie. No role switching at all.
#
# But here is the subtlety:
# The broadcast callback fires on the primary (it is an after_commit).
# The rendered HTML uses whatever connection is current — also primary.
# So the broadcast itself is fine.
#
# The problem is what happens NEXT:
# The user sees the new comment in the stream.
# They click on the comment to see its detail page (GET request).
# If the replica is behind, the comment does not exist there yet.
#
# User sees: "Comment posted!" then clicks through to a 404.
#
# The Turbo Stream gave the user data that the replica
# cannot yet confirm exists. The UI outran the database.

This is a different class of problem than the Turbo Frame issue. With frames, the concern is that the request goes to the wrong database. With broadcasts, the concern is that the user receives real-time data that the replica cannot yet confirm exists. The broadcast itself is fine — it fires from an after_commit callback on the primary. But the UI update gives the user confidence that the data exists, and their next navigation might hit a replica that has not yet replayed the WAL records for that write.

The fix depends on your application's tolerance for this. For most features — a new comment appearing, a status badge updating — the user will simply reload and the replica will have caught up. For transactional features — a payment confirmation, an order receipt — the link targets from broadcast-injected HTML should point to controller actions that read from the primary via connected_to(role: :writing).

load_async on background threads

load_async and role inheritance
# load_async can also produce surprising results with replicas.

class DashboardController < ApplicationController
  def show
    # These fire asynchronously on background threads:
    @recent_orders = Order.recent.load_async
    @stats = OrderStat.current.load_async

    # The main thread wrote something earlier in this request.
    # ActiveRecord's role switching is session-scoped.
    # But load_async runs on the async executor thread pool.
    #
    # Does the async thread inherit the session's "recent write" flag?
    #
    # In Rails 7.0: No. The async thread has no session context.
    #               It may go to the replica regardless of recent writes.
    #
    # In Rails 7.1+: Improved, but still depends on the resolver
    #                implementation. Custom resolvers may not propagate.
    #
    # If you are using load_async with read replicas,
    # test this explicitly in your specific Rails version.
  end
end

Rails 7.0 introduced load_async to execute queries on background threads, reducing page latency by parallelizing database work. The question is whether those background threads inherit the current request's session context — specifically, the "this session recently wrote" flag that prevents reads from going to the replica.

In Rails 7.0, they did not reliably inherit this context. In 7.1 and later, improvements were made, but the behavior depends on your resolver implementation. If you have written a custom resolver (as I recommended above for lag-awareness), you must ensure it handles threaded execution correctly. The prevent_writes check in your custom resolver must be thread-safe and must access the correct session context from the background thread.

The safest approach: test load_async with your specific Rails version and resolver. Write to the primary, then immediately load_async a read, and verify which database received the query. The PostgreSQL log will tell you.

ActionCable: a parallel universe

ActionCable and role switching
# ActionCable subscriptions have their own connection lifecycle.
# They do not participate in role switching at all.

class NotificationsChannel < ApplicationCable::Channel
  def subscribed
    # This query runs on... which database?
    # ActionCable connections are WebSocket-based.
    # They do not go through the DatabaseSelector middleware.
    # They use whatever connection pool ApplicationRecord defaults to.
    @user = User.find(params[:user_id])
    stream_for @user
  end

  def unread_count
    # This is a client-initiated action over the WebSocket.
    # No HTTP request. No session-based role switching.
    # Goes to whichever pool the model's connection handler resolves to.
    count = Notification.where(user: @user, read: false).count
    transmit(count: count)
  end
end

# If you want ActionCable queries to use the replica:
def unread_count
  ActiveRecord::Base.connected_to(role: :reading) do
    count = Notification.where(user: @user, read: false).count
    transmit(count: count)
  end
end

# If you want them to always use the primary (safe default):
def unread_count
  ActiveRecord::Base.connected_to(role: :writing) do
    count = Notification.where(user: @user, read: false).count
    transmit(count: count)
  end
end

ActionCable WebSocket connections do not go through the Rack middleware stack. They do not hit DatabaseSelector. They do not have session-based role switching. They use whatever connection the model resolves to by default, which depends on your ApplicationRecord configuration and whether the query is a read or a write.

For many applications, this is fine — ActionCable queries are typically lightweight (fetching a user record, checking a count) and staleness tolerance is high. But if your ActionCable channels are doing significant database work — loading complex associations, running aggregation queries — you should be explicit about which database they use. The connected_to block gives you that control.

Manual role switching: the naming trap and patterns that work

When automatic switching is insufficient, Rails provides connected_to for explicit control. There is a naming trap here worth a moment of your time.

The role naming trap
# When automatic switching is not enough:
class ReportsController < ApplicationController
  def generate
    # Force this entire action to read from primary,
    # because the report must reflect the latest data:
    ActiveRecord::Base.connected_to(role: :reading) do
      # Wait — this goes to the replica. That is the READING role.
    end

    # You wanted this:
    ActiveRecord::Base.connected_to(role: :writing) do
      @data = Order.where(created_at: 1.hour.ago..).to_a
    end

    # Common mistake: :reading is the replica, :writing is the primary.
    # The naming is about the ROLE, not your intent.
  end
end

The roles are named for their purpose in the connection topology, not for what you are doing. :reading means "the database designated for reads" — your replica. :writing means "the database designated for writes" — your primary. If you want to force a fresh read from the primary, you use role: :writing.

This is logical once you understand it, but the first time you write connected_to(role: :writing) to perform a read, it feels wrong. It is correct.

Here are patterns for the three most common manual switching scenarios, so you do not have to rediscover them.

Manual role switching patterns
# Patterns for manual role switching that actually work:

# Pattern 1: Force primary for a critical read
class OrdersController < ApplicationController
  def receipt
    # The user just paid. The receipt must show the correct total.
    # Do not trust the replica for this.
    ActiveRecord::Base.connected_to(role: :writing) do
      @order = Order.includes(:line_items, :payment).find(params[:id])
    end
  end
end

# Pattern 2: Force replica for an expensive read-only query
class AnalyticsController < ApplicationController
  def dashboard
    # This query scans millions of rows. Keep it off the primary.
    ActiveRecord::Base.connected_to(role: :reading) do
      @metrics = OrderMetric
        .where(period: 30.days.ago..Time.current)
        .group(:day)
        .select("date_trunc('day', created_at) AS day, SUM(total) AS revenue")
    end
  end
end

# Pattern 3: Mixed — read from primary, then expensive work on replica
class ShipmentsController < ApplicationController
  def create
    # Write to primary
    @shipment = Shipment.create!(shipment_params)

    # Heavy read for label generation — safe on replica
    # because we do not need the shipment we just created
    ActiveRecord::Base.connected_to(role: :reading) do
      @carrier_rates = CarrierRate.for_weight(@shipment.weight).to_a
    end
  end
end

Pattern 3 is worth highlighting. You can mix roles within a single controller action by using multiple connected_to blocks. The write goes to the primary. An unrelated expensive read goes to the replica. This is fine and sometimes the right approach — but it adds cognitive overhead. Every developer touching that controller must understand why different queries go to different databases. Comment generously.

The complete gotcha table

For reference, every gotcha documented above, its symptom, and the fix.

| Gotcha | Symptom | Fix |
| --- | --- | --- |
| delay measures wall-clock time, not replication lag | Stale reads for users who never wrote anything | Query pg_stat_replication for real lag; custom resolver |
| Each role doubles connection pool requirements | Connection exhaustion after deploying multi-db | Size pools per role; use PgBouncer; audit Sidekiq needs |
| Turbo Frame lazy-loads may bypass session context | Main page shows fresh data, frames show stale data | Ensure credentials: "same-origin" on Turbo fetches |
| Turbo Stream broadcasts outrun the replica | User clicks through from broadcast to a 404 | Link targets in broadcasts to primary-backed controllers |
| load_async may not inherit session write flag | Async queries hit replica despite recent write | Test async + replica in your exact Rails version |
| ActionCable bypasses DatabaseSelector middleware entirely | WebSocket queries go to unexpected database | Explicit connected_to in channel methods |
| :reading role means replica, not "I am reading" | Manual connected_to sends reads to wrong database | Use :writing role when freshness matters |
| Sidekiq opens pools for both roles even if it only writes | Background workers waste connections on unused replica pool | Separate abstract base class for workers with a writing-only configuration |

None of these are exotic. They are the natural consequences of a session-scoped wall-clock delay, connection pool arithmetic, and features that execute outside the request lifecycle. Any Rails application running read replicas on PostgreSQL will encounter at least one of them. Most will encounter three or four.

"Read replicas distribute traffic across multiple copies of your data. But add read replicas before fixing N+1 queries, and you are scaling the problem rather than solving it."

— from You Don't Need Redis, Chapter 16: Everything to Try Before You Shard

A production-grade lag-aware resolver

I promised the full implementation earlier. Here it is: a resolver that combines the standard session delay with real PostgreSQL replication lag measurement, cached to avoid per-request overhead.

config/initializers/database_selector.rb
# config/initializers/database_selector.rb
#
# A production-grade resolver that checks real replication lag
# instead of relying solely on session wall-clock delay.

class LagAwareResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
  STALE_THRESHOLD_BYTES = 1_048_576   # 1 MB of WAL
  LAG_CHECK_INTERVAL = 5.seconds       # Don't check on every request

  def read_from_primary?
    # First: the standard session-based delay check
    return true if prevent_writes

    # Second: periodic real lag measurement
    return true if replica_lagging?

    false
  end

  private

  def replica_lagging?
    # Cache the lag check to avoid hammering pg_stat_replication
    cached = Rails.cache.read("replication_lag_safe")
    return !cached unless cached.nil?

    safe = check_actual_lag
    Rails.cache.write("replication_lag_safe", safe, expires_in: LAG_CHECK_INTERVAL)
    !safe
  end

  def check_actual_lag
    result = ActiveRecord::Base.connected_to(role: :writing) do
      ActiveRecord::Base.connection.select_value(<<~SQL)
        SELECT COALESCE(
          MAX(pg_wal_lsn_diff(sent_lsn, replay_lsn)),
          0
        )
        FROM pg_stat_replication
      SQL
    end

    result.to_i < STALE_THRESHOLD_BYTES
  rescue => e
    Rails.logger.warn("Replication lag check failed: #{e.message}")
    false  # Fail safe: use primary if we cannot check
  end
end

# Wire it up:
# config.active_record.database_resolver = LagAwareResolver

This resolver checks two things. First, the standard session-based delay — if this user wrote recently, reads stay on the primary. Second, the actual replication state from pg_stat_replication, cached for 5 seconds to avoid hammering the primary with monitoring queries.

The fail-safe behavior matters: if the lag check query itself fails (network issue, permission problem), the resolver routes to the primary. Stale reads are worse than slightly higher primary load. This is a principle worth stating explicitly: when your routing logic encounters an error, always fail toward the primary. The primary is the source of truth. Failing toward the replica means serving stale data during exactly the moments when something has already gone wrong.
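One subtlety in replica_lagging? is worth spelling out: the cache stores a boolean, and a cached false ("replica is not safe") must be distinguishable from a cache miss (nil, meaning "go measure again"). That is why the resolver tests cached.nil? rather than truthiness. A minimal sketch, with a hypothetical in-memory TtlCache standing in for Rails.cache, makes the tri-state contract concrete:

```ruby
# Hypothetical in-memory stand-in for Rails.cache. A cached `false`
# ("replica is NOT safe") is a valid hit and must not look like a miss.
class TtlCache
  def initialize
    @store = {}
  end

  def read(key)
    value, expires_at = @store[key]
    return nil if expires_at.nil? || Time.now > expires_at
    value
  end

  def write(key, value, expires_in:)
    @store[key] = [value, Time.now + expires_in]
  end
end

cache = TtlCache.new
cache.write("replication_lag_safe", false, expires_in: 5)

cache.read("replication_lag_safe")  # => false — a real answer: fall back to primary
cache.read("missing_key")           # => nil — no answer: go measure the lag
```

If you swap in a cache store that cannot round-trip false (or a helper that coerces misses and false hits to the same value), the resolver will re-run the lag query on every request — exactly what the cache exists to prevent.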

Tuning the threshold

Choosing your lag threshold
# How to choose STALE_THRESHOLD_BYTES:
#
# Step 1: Measure your normal baseline lag.
#
# Run this every 10 seconds for a day and look at the distribution:
#   SELECT MAX(pg_wal_lsn_diff(sent_lsn, replay_lsn))
#   FROM pg_stat_replication;
#
# Typical results for a mid-traffic Rails app:
#   p50: 8 KB       (perfectly caught up)
#   p90: 64 KB      (brief micro-bursts)
#   p99: 512 KB     (background job flurry)
#   max: 4 MB       (deploy migration ran)
#
# Step 2: Set threshold above p99, below your pain point.
#
#   STALE_THRESHOLD_BYTES = 2_097_152  # 2 MB
#
#   This means: "route to replica unless it is more than 2 MB behind."
#   Normal traffic never triggers it. Migrations and bulk imports do.
#
# Step 3: Monitor and adjust.
#
#   If you see the resolver falling back to primary too often,
#   your threshold is too low. If users report stale data,
#   it is too high. There is no universal correct value.
#
# Rule of thumb per write volume:
#   < 100 KB/s WAL:    1 MB threshold
#   100 KB - 1 MB/s:   2-5 MB threshold
#   > 1 MB/s WAL:      5-10 MB threshold

The 1 MB default in the resolver above is conservative, which is appropriate for a starting point. If your application writes heavily, you may need to raise it to avoid constantly falling back to the primary and defeating the purpose of having a replica. The monitoring setup below will give you the data to make this decision with confidence rather than guesswork.
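The steps above reduce to arithmetic. Here is a hypothetical helper — the percentile math and the 4x-p99 headroom are illustrative choices of mine, not a standard — that turns a day of sampled lag values into a starting threshold, clamped between a 1 MB floor and a 10 MB ceiling per the rule of thumb:

```ruby
# Hypothetical helper: derive a starting STALE_THRESHOLD_BYTES from
# sampled lag values (bytes). Strategy: take p99, multiply by 4 for
# headroom, clamp to [1 MB, 10 MB].
def percentile(samples, p)
  sorted = samples.sort
  sorted[((sorted.length - 1) * p).round]
end

def suggest_threshold_bytes(samples, floor: 1_048_576, ceiling: 10_485_760)
  p99 = percentile(samples, 0.99)
  (p99 * 4).clamp(floor, ceiling)
end

# A distribution like the one in Step 1 — mostly caught up, with the
# top ~1% of samples around 512 KB — lands near the 2 MB recommendation:
samples = Array.new(989, 8_192) + Array.new(11, 524_288)
suggest_threshold_bytes(samples)  # => 2097152
```

Treat the output as a starting point for Step 3's monitor-and-adjust loop, not a final answer.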

Testing replica routing in development

I have observed a consistent pattern: teams deploy read replicas to production without ever testing them in development. The reasoning is understandable — setting up a local replica feels like overhead for a feature that "should just work." It does not just work. Not when lag spikes, not when Turbo fires, not when ActionCable bypasses the middleware. You need a local replica to test these scenarios.

Local replica testing with Docker
# Simulating replication lag in development
#
# You cannot test replica routing without a replica.
# Here is how to create one locally with Docker.

# docker-compose.yml
# services:
#   pg-primary:
#     image: postgres:16
#     environment:
#       POSTGRES_PASSWORD: dev
#     command: >
#       postgres
#       -c wal_level=replica
#       -c max_wal_senders=3
#       -c hot_standby=on
#     ports:
#       - "5432:5432"
#
#   pg-replica:
#     image: postgres:16
#     environment:
#       POSTGRES_PASSWORD: dev
#     depends_on:
#       - pg-primary
#     ports:
#       - "5433:5432"
#
# Note: the replica container also needs a standby setup before it will
# stream WAL — a base backup of the primary (pg_basebackup) and a
# standby.signal file. Those steps are omitted here for brevity.

# After setup, introduce deterministic lag by pausing WAL replay on
# the replica. (An open transaction only delays replay of records that
# conflict with it, and only up to max_standby_streaming_delay — it is
# not a reliable way to simulate lag.)

# Terminal 1 — on the replica (port 5433):
# SELECT pg_wal_replay_pause();
# -- run your test, then let the replica catch up:
# SELECT pg_wal_replay_resume();

# Terminal 2 — your Rails console:
# Write something to primary, then check what the replica sees.

# In Rails:
ActiveRecord::Base.connected_to(role: :writing) do
  Order.create!(total: 42.0, status: "test_lag")
end

# Immediately check the replica:
ActiveRecord::Base.connected_to(role: :reading) do
  order = Order.find_by(status: "test_lag")
  puts order.nil? ? "REPLICA BEHIND" : "REPLICA CAUGHT UP"
end

The technique for simulating lag is simple and deterministic: pg_wal_replay_pause() stops the replica from applying WAL entirely until you call pg_wal_replay_resume(). Write something to the primary while replay is paused, then check whether the replica can see it. If the replica is behind, the row will be missing.

This is the simplest possible integration test for your read replica setup, and I am continually surprised how many teams skip it. If your CI pipeline includes a PostgreSQL instance, adding a replica and running this check takes minutes to set up and catches the most common issues before they reach production.

Monitoring replication lag in production

A resolver that routes around lag is necessary. Knowing when lag occurs and why is equally necessary. Without monitoring, you are fixing symptoms without understanding causes.

Replication lag monitoring
# config/initializers/replication_monitoring.rb
#
# Emit lag metrics to your monitoring system.
# This runs independently of the resolver.

class ReplicationMonitor
  INTERVAL = 10.seconds

  def self.start
    Thread.new do
      loop do
        check_and_report
        sleep INTERVAL
      rescue => e
        Rails.logger.error("ReplicationMonitor: #{e.message}")
        sleep INTERVAL
      end
    end
  end

  def self.check_and_report
    ActiveRecord::Base.connected_to(role: :writing) do
      rows = ActiveRecord::Base.connection.execute(<<~SQL)
        SELECT
          client_addr,
          pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
          EXTRACT(EPOCH FROM replay_lag) AS lag_seconds
        FROM pg_stat_replication
      SQL

      rows.each do |row|
        addr = row["client_addr"]
        lag_bytes = row["lag_bytes"].to_i
        lag_seconds = row["lag_seconds"].to_f

        # Emit to StatsD, Datadog, Prometheus, etc.
        StatsD.gauge("postgres.replication.lag_bytes",
                     lag_bytes, tags: ["replica:#{addr}"])
        StatsD.gauge("postgres.replication.lag_seconds",
                     lag_seconds, tags: ["replica:#{addr}"])

        # Alert if lag exceeds threshold
        if lag_bytes > 10_000_000  # 10 MB
          Rails.logger.warn(
            "Replication lag alert: #{addr} is #{lag_bytes} bytes behind"
          )
        end
      end
    end
  end
end

# At the bottom of this initializer:
# ReplicationMonitor.start if Rails.env.production?
#
# Caveat: with a forking server such as Puma in clustered mode, start
# the thread from an on_worker_boot hook instead — background threads
# do not survive a fork.

Three metrics to track, in order of importance:

  1. replay_lag_bytes per replica — the primary metric. Graph it. Alert if it exceeds your resolver's threshold for more than 30 seconds. If it spikes during deploys, your migrations are generating large volumes of WAL. If it spikes at consistent times, you have a scheduled job or cron task producing write bursts.
  2. Resolver fallback rate — how often your lag-aware resolver routes reads to the primary instead of the replica. If this is above 5% during normal traffic, your threshold is too low or your replica is genuinely struggling. Either way, you are not getting the read offloading you deployed replicas for.
  3. Replica connection count vs. primary connection count — if these are equal and your application is write-heavy, you are wasting replica connections. If the replica count is near zero and your application is read-heavy, the resolver is not routing to the replica at all. Both are signals that something is misconfigured.
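Metric 2 is the one most teams have no instrumentation for at all. A hypothetical counter — the class and names are mine; in production you would emit increments to StatsD or Prometheus rather than hold integers in memory — that a resolver could feed from each routing decision:

```ruby
# Hypothetical fallback-rate counter for metric 2. Call record() once
# per routing decision with the resolver's read_from_primary? result.
class FallbackTracker
  attr_reader :total, :fallbacks

  def initialize
    @total = 0
    @fallbacks = 0
  end

  def record(read_from_primary)
    @total += 1
    @fallbacks += 1 if read_from_primary
  end

  # Fraction of reads that fell back to the primary. Sustained values
  # above ~0.05 in normal traffic mean the threshold is too low or the
  # replica is genuinely struggling.
  def rate
    @total.zero? ? 0.0 : @fallbacks.fdiv(@total)
  end
end

tracker = FallbackTracker.new
97.times { tracker.record(false) }
3.times  { tracker.record(true) }
tracker.rate  # => 0.03
```

Graph the rate alongside replay lag: a rising fallback rate with flat lag points at the threshold; both rising together points at the replica.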

The monitoring thread in the example above is simple but effective. For a more thorough approach, consider using your existing monitoring infrastructure — Prometheus Ruby client, Datadog's pg integration, or New Relic's database monitoring — which can query pg_stat_replication directly without application-layer code.

An honest assessment: when replicas are worth the complexity

I have spent considerable time documenting what can go wrong. I should be equally candid about when read replicas are the right choice, because dismissing them entirely would be as dishonest as overselling them.

Read replicas genuinely earn their keep when:

  • Your primary's CPU is saturated by read queries. If your primary is spending 70% of its CPU on SELECT statements and 30% on writes, moving those reads to a replica gives your writes breathing room. This is the textbook use case, and it works.
  • You have expensive analytical queries competing with transactional workload. A 30-second report query holding locks and consuming I/O on the same instance that handles user-facing transactions is a problem. Running that report on a replica eliminates the contention entirely.
  • You need geographic read latency reduction. A replica in us-west-2 serving reads for West Coast users while the primary lives in us-east-1 reduces read latency by 60-80ms. Cross-region replication lag is higher, but for many read patterns, 200ms of staleness is acceptable.
  • You want a hot standby for failover. Even if you never route reads to it, a streaming replica that can be promoted to primary in seconds is valuable disaster recovery infrastructure.

Read replicas are not worth the complexity when:

  • Your primary is not under read pressure. If your primary is at 20% CPU and your p99 read latency is 15ms, adding a replica adds operational complexity without solving a problem you have. Optimize your queries first. Add appropriate indexes. Consider connection pooling. Replicas should be a response to measured pressure, not a prophylactic.
  • Your application has strong read-after-write consistency requirements everywhere. If every page in your application shows data that the user may have just modified, the replica is getting very little traffic after the delay routing sends most reads to the primary anyway. You have the operational burden of a replica with minimal benefit.
  • You have one or two expensive queries that could be optimized. I have seen teams add a replica to offload a dashboard query that could have been fixed with a single index and a materialized view. The index costs nothing to operate. The replica costs money, connections, monitoring, and complexity.

The honest position: read replicas are a legitimate tool for scaling reads, not a substitute for understanding your query performance. Add them when you have measured the need and exhausted simpler alternatives. When you do add them, deploy them with the resolver, monitoring, and testing described in this article — not with the default delay: 2.seconds and a prayer.

What Gold Lapel does differently

The resolver above is a meaningful improvement over the Rails default, but it still operates at the request level. Every read in a request goes to the same place — either all to the primary, or all to the replica. There is no per-query granularity. And the resolver, the monitoring, the PgBouncer configuration, the Sidekiq pool separation, the Turbo Frame verification — that is a meaningful amount of infrastructure to build and maintain for a feature that should, one might argue, work correctly out of the box.

Gold Lapel sits between your Rails application and PostgreSQL as a database proxy. It sees every query individually and can make routing decisions at the query level, not the request level. It does not care about HTTP sessions, Turbo Frame boundaries, ActionCable lifecycles, or load_async thread inheritance. It sees SQL. It routes SQL.

For read replica routing, this means three things:

  1. Real lag measurement. Gold Lapel queries pg_stat_replication continuously, not on a cached 5-second interval. It knows the exact byte position of every replica at all times. When a replica falls behind, Gold Lapel stops sending it reads before any session delay would notice.
  2. Per-query routing. A single request might contain a read that tolerates staleness (a dashboard count) and a read that does not (the order the user just created). Gold Lapel can route these differently. The dashboard count goes to the replica. The order goes to the primary. Rails' role switching cannot distinguish between them.
  3. Connection pooling per role. Gold Lapel maintains its own connection pools to the primary and each replica. Your Rails application connects to one endpoint — the proxy — with one pool. The doubling problem disappears. Whether you have one replica or five, your Rails database.yml has one entry and one pool size.

Add gem "goldlapel-rails" to your Gemfile and bundle. No custom resolver to maintain. No database.yml changes. No PgBouncer to configure. No monitoring thread to run. Gold Lapel handles the lag measurement, the routing decisions, and the connection multiplexing at the protocol level, where it belongs.

Your delay: 2.seconds becomes irrelevant — not because the problem went away, but because something that actually understands PostgreSQL replication state is making the decision instead.

Point your Rails application at Gold Lapel and the read replica routing problem becomes someone else's problem. Specifically, ours. We are quite good at it.

The household in order

Read replica routing is, at its core, a trust problem. You are trusting that data read from one place is sufficiently consistent with data written to another. Rails' default mechanism bases that trust on a stopwatch. PostgreSQL offers you an actual measurement of the gap. The resolver, monitoring, and testing approach described here replaces trust with verification.

To summarize the work ahead, if you are deploying or already running read replicas on a Rails application:

  1. Replace the default resolver with one that queries pg_stat_replication. The implementation above is complete and production-tested.
  2. Audit your connection pool math. Run pg_stat_activity on both primary and replica. Right-size pools per role. Deploy PgBouncer if the numbers demand it.
  3. Test every Hotwire integration point: Turbo Frames, Turbo Streams, ActionCable channels, load_async. Each has a different relationship with session-based role switching. Verify each one explicitly.
  4. Monitor replay_lag_bytes continuously. Alert on sustained lag. Correlate spikes with deploy events, cron jobs, and bulk operations.
  5. Set up a local replica in development. Simulate lag. Break things safely before production breaks them for you.

A well-managed household does not simply hire additional staff and hope for the best. It understands each staff member's capabilities, assigns duties accordingly, monitors performance, and intervenes when standards slip. Your read replica deserves the same consideration. It is, after all, serving your guests.


A guest who found this useful tends also to find the connection pooler comparison worth a visit — when you are already thinking about how connections reach your replicas, the choice of pooler has rather significant implications.