Turbo Stream Broadcasts Are a PostgreSQL Query Multiplier: How One Write Becomes 100 Queries
You called Message.create! once. PostgreSQL received 601 queries. Shall I walk you through the arithmetic?
Good evening. Your broadcast has a multiplier problem.
Turbo Streams are genuinely elegant. A user posts a message, and every subscriber's page updates in real time. No JavaScript to write. No manual DOM manipulation. No WebSocket plumbing to maintain. A few lines of Ruby and the chat room works. The demos are convincing. The experience in development is effortless. I have no quarrel with the abstraction itself.
The elegance, however, conceals a cost that scales in a direction you are not watching.
When after_create_commit fires a Turbo Stream broadcast — the Hotwire team's handbook documents the mechanism — the server renders the partial template once per subscriber. Not once for all subscribers. Once per subscriber. Each render is independent. Each render instantiates its own ActiveRecord objects. Each render fires its own queries against PostgreSQL. There is no shared state between renders, no memoization across the subscriber list, no awareness that the previous render — completed 4 milliseconds ago — asked exactly the same questions and received exactly the same answers.
One message. One hundred viewers. Six queries per partial render. That is 601 queries from a single INSERT.
This is not a bug. This is how the system is designed. It is a reasonable design for a set of constraints that most applications do not actually have. The question is whether you have accounted for it — and in my experience, teams discover this arithmetic in production monitoring, not in code review.
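The arithmetic deserves to be explicit. A back-of-the-envelope sketch (the helper function is mine, purely illustrative):

```ruby
# Back-of-the-envelope cost of one broadcasted message:
# the original INSERT plus one full partial render per subscriber.
def broadcast_query_count(subscribers:, queries_per_render:)
  1 + subscribers * queries_per_render
end

broadcast_query_count(subscribers: 1,   queries_per_render: 6)  # => 7
broadcast_query_count(subscribers: 100, queries_per_render: 6)  # => 601
```

The subscriber count is the term teams forget to plug in, because in development it is almost always 1.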
What follows is a thorough look at precisely where the queries come from, why the framework renders this way, what it costs your database, and what you can do about it. I have several recommendations, each with honest trade-offs. I shall not pretend any of them is perfect.
The anatomy of a broadcast callback
Start with the simplest possible Turbo Stream broadcast. A chat message model with a single callback.
```ruby
class Message < ApplicationRecord
  belongs_to :room
  belongs_to :user

  after_create_commit -> {
    broadcast_append_to room,
      partial: "messages/message",
      locals: { message: self }
  }
end
```

When a message is created, after_create_commit renders messages/_message.html.erb and pushes the HTML to every subscriber of the room's stream. Clean. Readable. The kind of code that gets approved in review without comment. It is, in fact, the exact pattern from the Turbo Streams documentation — which makes it the exact pattern most teams adopt.
Now look at what that partial actually contains.
```erb
<%# app/views/messages/_message.html.erb %>
<div id="<%= dom_id(message) %>" class="message">
  <div class="message-header">
    <span class="author"><%= message.user.display_name %></span>
    <span class="timestamp"><%= l(message.created_at, format: :short) %></span>
    <span class="badge"><%= message.user.role_in(message.room) %></span>
  </div>
  <div class="message-body">
    <%= message.formatted_body %>
  </div>
  <div class="message-meta">
    <span class="reactions"><%= message.reactions.count %> reactions</span>
    <span class="room"><%= message.room.name %></span>
  </div>
</div>
```

Each line that touches an association is a potential database query. message.user.display_name loads the user. message.user.role_in(message.room) queries the memberships table. message.reactions.count runs a COUNT. message.room.name loads the room. None of these associations are preloaded by the broadcast callback — the default broadcast_append_to passes self as the message, which is a bare ActiveRecord instance with no eager-loaded relationships.
For a single render — one subscriber receiving one message — here is what PostgreSQL actually processes:
```sql
-- One message is created. One subscriber receives the broadcast.
-- Here is what PostgreSQL sees for that SINGLE render:

-- 1. The original INSERT (your code)
INSERT INTO messages (room_id, user_id, body, created_at)
VALUES (12, 42, 'Hello everyone', NOW());

-- Broadcast fires. Partial renders. Queries begin:

-- 2. message.user (association load)
SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;

-- 3. message.user.display_name (if not preloaded)
SELECT "users"."first_name", "users"."last_name"
FROM "users" WHERE "users"."id" = 42;

-- 4. message.user.role_in(message.room)
SELECT "memberships".* FROM "memberships"
WHERE "memberships"."user_id" = 42
  AND "memberships"."room_id" = 12 LIMIT 1;

-- 5. message.formatted_body (say it checks mentions)
SELECT "users"."id", "users"."username" FROM "users"
WHERE "users"."username" IN ('alice', 'bob');

-- 6. message.reactions.count
SELECT COUNT(*) FROM "reactions"
WHERE "reactions"."message_id" = 1847;

-- 7. message.room.name
SELECT "rooms"."name" FROM "rooms" WHERE "rooms"."id" = 12;

-- That is 6 queries to render ONE partial for ONE subscriber.
```

Six queries to render one partial for one person. Individually, each is fast. Sub-millisecond with proper indexes. The user lookup hits a primary key index. The membership check uses a composite index. The reaction count is a simple aggregate. Each query plans and executes in microseconds.
The problem is not the individual query. The problem is the multiplier.
The multiplication table nobody checks
Turbo Streams renders the partial separately for each subscriber. There is no "render once, distribute many" step in the default implementation. The Broadcastable concern — you can read the source yourself in the turbo-rails repository — iterates through subscribers and renders independently for each one.
The arithmetic is unforgiving.
| Concurrent viewers | Partial renders | Queries per render | Total queries | Scenario |
|---|---|---|---|---|
| 1 | 1 | 6 | 7 | Just the INSERT + 6 partial queries |
| 10 | 10 | 6 | 61 | Busy Slack channel |
| 50 | 50 | 6 | 301 | Team standup channel |
| 100 | 100 | 6 | 601 | Company-wide announcement |
| 500 | 500 | 6 | 3,001 | Large org broadcast |
A company-wide announcement channel with 500 subscribers. A single message generates 3,001 queries. If someone pastes a message and follows up with a correction — two messages — that is 6,002 queries in under a second.
I should note that these are not theoretical numbers. They are the direct consequence of the framework's documented behavior. The table is just multiplication. The uncomfortable part is that nobody does the multiplication before deploying to production.
The numbers above assume a modest partial with 6 queries. Production partials are frequently richer.
```erb
<%# A more realistic partial — a project update notification %>
<div class="activity-item">
  <div class="actor">
    <%= image_tag message.user.avatar_url, class: "avatar" %>
    <strong><%= message.user.display_name %></strong>
    <span class="role"><%= message.user.team.name %></span>
  </div>
  <div class="content">
    <p><%= message.formatted_body %></p>
    <% if message.attachments.any? %>
      <div class="attachments">
        <% message.attachments.each do |attachment| %>
          <%= render partial: "attachments/thumbnail", locals: { attachment: attachment } %>
        <% end %>
      </div>
    <% end %>
  </div>
  <div class="context">
    <span class="project"><%= message.room.project.name %></span>
    <span class="channel"><%= message.room.name %></span>
    <span class="timestamp"><%= time_ago_in_words(message.created_at) %> ago</span>
    <span class="read-count"><%= message.read_receipts.count %> read</span>
  </div>
</div>

<%# Queries per render of THIS partial:
  # 1. message.user (load user)
  # 2. message.user.avatar_url (may hit ActiveStorage)
  # 3. message.user.display_name
  # 4. message.user.team (load team)
  # 5. message.attachments (load attachments)
  # 6. Each attachment.thumbnail (N+1 within the partial)
  # 7. message.room.project (load project)
  # 8. message.room (load room)
  # 9. message.read_receipts.count (COUNT query)
  #
  # That is 9+ queries per render.
  # With 100 subscribers: 900+ queries from one message.
%>
```

Nine queries per render. With 100 subscribers, that is 901 queries from one message. With attachments triggering N+1 queries inside the partial, the number climbs further. I have seen production partials that fire 15 or more queries per render — avatar URLs through ActiveStorage, nested team hierarchies, permission checks, unread counts, mention parsing. Each feature adds a query. Each query multiplies across the subscriber list.
The cruelest part: every query is identical
This is the detail that separates broadcast multiplication from ordinary high-traffic query load. It is also what makes it such a distinctive waste.
When 100 subscribers are viewing the same room and the same message is broadcast, all 100 partial renders execute the same queries with the same parameters. The same user is loaded 100 times. The same reaction count is computed 100 times. The same room name is fetched 100 times.
```sql
-- Here is the truly painful part.
-- All 100 renders execute the SAME queries with the SAME parameters.
-- PostgreSQL's query log shows:

SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;
SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;
SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;
-- ... 97 more identical copies

SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;
SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;
SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;
-- ... 97 more identical copies

-- 600 queries. All identical. All returning the same rows.
-- PostgreSQL dutifully executes each one from scratch.
```

PostgreSQL has no concept of "I just answered this 12 microseconds ago." Each query arrives on its own connection (or the same connection sequentially), gets parsed, planned, and executed independently. The buffer cache helps — the data pages are warm after the first execution, so subsequent queries avoid disk I/O — but the planning and execution overhead still accumulates. Each query consumes CPU for parsing, plan generation, executor startup, tuple retrieval, and result serialization. The I/O is cached. The CPU work is not.
Six hundred queries. All returning the same rows. All consuming CPU cycles, buffer pin locks, and connection time for work that has already been done. If you ran EXPLAIN (ANALYZE, BUFFERS) on any of them, you would see Buffers: shared hit across the board — everything served from shared memory, no disk reads. The irony is that PostgreSQL's buffer cache is working perfectly. It is just being asked the same question six hundred times when one answer would suffice.
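The signature is easy to demonstrate with plain Ruby (the helper is mine, not tied to any gem): tally a query log and keep only statements that repeat verbatim.

```ruby
# Tally a query log and keep only statements that repeat word for word.
# Broadcast multiplication shows the same statement, same parameters,
# repeated once per subscriber.
def duplicate_queries(query_log)
  query_log.tally.select { |_statement, count| count > 1 }
end

log = Array.new(100) { 'SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;' } +
      Array.new(100) { 'SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;' }

duplicate_queries(log).values # => [100, 100]
```

In an ordinary N+1, the tally spreads across many distinct statements; here it piles onto two.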
How this differs from the N+1 problem
If you have encountered the N+1 query problem — the topic has its own dedicated guide — broadcast multiplication may look familiar. Both involve an unexpectedly large number of queries. Both stem from framework defaults that prioritize correctness over performance. But they are structurally different problems with different solutions.
| Dimension | N+1 problem | Broadcast multiplication |
|---|---|---|
| Pattern | N different queries (varying parameters) | N identical queries (same parameters) |
| Trigger | Iterating through a collection in application code | Turbo Stream rendering partial per subscriber |
| Detection | Many similar queries with different WHERE values | Many identical queries with the same WHERE values |
| Scaling factor | Grows with data size (row count) | Grows with audience size (subscriber count) |
| ORM fix | Eager loading (includes, select_related, joinedload) | Render-once pattern, debouncing, morphing |
| Proxy fix | Query batching (collapse N into 1 IN clause) | Local caching (serve identical results from memory) |
The critical distinction: N+1 queries scale with data size. Broadcast multiplication scales with audience size. An N+1 on a page with 200 orders generates 201 queries regardless of how many people view it. A broadcast with 200 subscribers generates 200 renders regardless of how many items are in each one.
And they compound. An N+1 inside a broadcast partial multiplies the inner problem by the outer one.
```ruby
# The broadcast multiplication problem COMPOUNDS with N+1 queries
# inside the partial. They are two different problems that multiply.

# Consider: message.attachments.each in the partial.
# If a message has 3 attachments, each render fires:
# 1. SELECT * FROM attachments WHERE message_id = 1847 (load collection)
# 2. SELECT * FROM active_storage_blobs WHERE id = 91 (attachment 1)
# 3. SELECT * FROM active_storage_blobs WHERE id = 92 (attachment 2)
# 4. SELECT * FROM active_storage_blobs WHERE id = 93 (attachment 3)
#
# That is 4 queries for attachments alone, per render.
# Add to the 6 base queries: 10 queries per render.
# With 100 subscribers: 1,000 queries from one message.
#
# The N+1 inside the partial is multiplied by the broadcast.
# Fix the N+1 (use includes(attachments: :blob))
# and you drop from 10 to 7 queries per render.
# But 7 × 100 is still 700 queries.
#
# The broadcast multiplication is the OUTER multiplier.
# The N+1 is the INNER multiplier.
# You need to address both.
```

Fix the N+1, and you reduce queries per render. Fix the broadcast multiplication, and you reduce the number of renders. For maximum effect, you need to address both. The N+1 is the inner multiplier. The broadcast is the outer multiplier. Reducing either one helps. Reducing both is transformative.
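In arithmetic terms (an illustrative helper, not framework API):

```ruby
# Inner multiplier (queries per render) times outer multiplier (subscribers).
def compounded_queries(subscribers:, base_queries:, n_plus_one_queries:)
  subscribers * (base_queries + n_plus_one_queries)
end

# Unfixed N+1 (4 attachment queries per render):
compounded_queries(subscribers: 100, base_queries: 6, n_plus_one_queries: 4) # => 1000

# N+1 fixed via eager loading (1 attachment query per render):
compounded_queries(subscribers: 100, base_queries: 6, n_plus_one_queries: 1) # => 700
```

Eager loading shrinks the inner factor; only a rendering-strategy change shrinks the outer one.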
Why it renders per subscriber, not once
A reasonable question: why does Turbo not render the partial once and send the same HTML to everyone?
```ruby
# Turbo Streams broadcasts render the partial once per subscriber.
# The ActionCable server maintains a list of subscribers for each stream.

# When broadcast_append_to fires:
# 1. ActionCable resolves subscriber list for "room_12"
# 2. For EACH subscriber:
#    a. Renders messages/_message.html.erb
#    b. Wraps HTML in <turbo-stream action="append">
#    c. Sends over WebSocket
#
# Each render is independent. Each render loads its own
# ActiveRecord objects. Each render fires its own queries.
#
# There is no shared cache between renders.
# There is no "render once, send many."
#
# This is by design — different subscribers might see different
# content based on permissions. But in practice, most partials
# render identically for every viewer.
```

The design rationale is sound. In applications with per-user permissions, the partial might render differently for each subscriber. An admin sees an edit button. A regular user does not. A moderator sees a flag link. A user who has been muted sees a different set of controls. The only way to guarantee correct output per user is to render per user.
This is a defensive design — it assumes the worst case (per-user customization) to prevent the worst outcome (leaking content to unauthorized users). It is the same reasoning that leads ORMs to default to lazy loading: correctness first, performance second. And like lazy loading, it is the correct default for the general case and the wrong behavior for the common case.
In practice, the vast majority of broadcast partials render identically for all subscribers. Chat messages, activity feeds, notification lists, status updates, typing indicators — the HTML is the same for everyone. You are paying the per-subscriber rendering cost for per-subscriber customization you are not using.
I want to be precise about this: the framework is not wrong to make this choice. If you do have per-user content in broadcast partials, per-subscriber rendering is the correct behavior. The problem is that the framework applies this expensive safety measure uniformly, and provides no mechanism to opt out. There is no broadcast_append_to room, render: :once option. The developer must build the opt-out themselves.
The connection pool pressure nobody sees
The query count is the visible cost. The connection pool pressure is the hidden one.
```sql
-- Broadcast multiplication does not just consume CPU.
-- It consumes connections.

-- A typical Rails app with Puma runs 5 threads per worker,
-- 2-4 workers. That is 10-20 threads competing for
-- a connection pool of (usually) 5-10 connections.

-- When a broadcast fires for 100 subscribers:
-- - If rendering is synchronous (default): the broadcast
--   holds a connection for all 100 renders sequentially.
--   Duration: 100 renders × ~3ms each = 300ms of connection time.
--
-- - If rendering is async (ActionCable async adapter):
--   each render may check out its own connection.
--   100 concurrent connection checkouts against a pool of 10.
--   90 renders wait for a connection. Timeouts begin.

-- In both cases, the connection pool is under pressure that
-- is invisible to request-level monitoring. The broadcast
-- happens in a callback, not in a controller action.
-- Your APM tool shows the request completing in 50ms.
-- The 300ms of broadcast rendering does not appear.
```

A typical Rails application running Puma maintains a connection pool sized to match its thread count — 5 to 20 connections, depending on configuration. When a broadcast fires, the rendering process checks out connections from this same pool. If the broadcast is synchronous (the default behavior when ActionCable uses the async adapter), it holds a single connection and renders all subscribers sequentially. A hundred renders at 3ms each ties up a connection for 300ms. That is 300ms during which one fewer connection is available for handling web requests.
If you are using the Redis adapter for ActionCable with threaded rendering, the situation inverts: each render may attempt to check out its own connection concurrently. One hundred concurrent connection requests against a pool of 10 means 90 renders queue up waiting for a connection. If your checkout_timeout is 5 seconds (the Rails default), you will not see errors. You will see latency — renders that should complete in 3ms waiting 50-200ms for a connection. The broadcast still completes, but slowly. The web requests sharing that pool also slow down.
The insidious part: this latency does not appear in your APM tool's request traces. The broadcast fires in an after_create_commit callback, not in the controller action. The request that created the message shows a clean 50ms response time. The 300ms of broadcast rendering happens after the response has been sent. It is invisible to request-level monitoring and visible only in database connection wait times, which most teams do not monitor until something breaks.
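The connection math is simple enough to model. A rough sketch (the helpers are mine; a real pool also contends with concurrent web requests):

```ruby
# Synchronous broadcast: one connection held for the whole render loop.
def connection_hold_ms(renders:, ms_per_render:)
  renders * ms_per_render
end

# Concurrent renders: how many must queue waiting for a pool connection.
def renders_queued(concurrent_renders:, pool_size:)
  [concurrent_renders - pool_size, 0].max
end

connection_hold_ms(renders: 100, ms_per_render: 3)     # => 300
renders_queued(concurrent_renders: 100, pool_size: 10) # => 90
```

Neither number shows up in a request trace, which is precisely why this goes unnoticed.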
> "I have observed, in production systems, pages generating over 400 database round trips for what appeared to be a simple list view."
>
> — from *You Don't Need Redis*, Chapter 3: The ORM Tax
Mitigation strategies: from simple to structural
There are several approaches to reducing broadcast query multiplication, each with different trade-offs. I have organized them by implementation complexity, and I shall be direct about where each one falls short.
| Strategy | Complexity | Query reduction | Trade-off |
|---|---|---|---|
| Preload associations before broadcast | Low | 30-60% | Only helps with N+1s inside the partial, not the per-subscriber multiplication |
| Render once, broadcast HTML | Medium | 90-95% | All subscribers see identical content — no per-user customization |
| Debounced broadcasts | Medium | 70-90% | Adds latency (typically 50-200ms). Messages arrive in bursts, not real-time. |
| Turbo 8 page morphing | Medium | 50-80% | Requires Turbo 8+. Full page morph can be less surgical than targeted appends. |
| Background job rendering | Medium | 0% (but shifts load) | Same total queries, but spread over time. Prevents request timeouts. |
| Russian doll caching on partials | Medium | 40-70% | Cache invalidation complexity. First render still hits the database. |
| Gold Lapel local caching | None (proxy) | 90-99% | Requires Gold Lapel proxy between app and database. Cache invalidation handled automatically on writes. |
Render once, broadcast HTML
The most impactful application-level fix. Render the partial a single time, then broadcast the pre-rendered HTML string to all subscribers.
```ruby
class Message < ApplicationRecord
  belongs_to :room
  belongs_to :user

  after_create_commit :broadcast_to_room

  private

  def broadcast_to_room
    # Preload everything the partial needs
    message = Message
      .includes(:room, :reactions, user: :team)
      .find(id)

    # Render the partial ONCE
    html = ApplicationController.render(
      partial: "messages/message",
      locals: { message: message }
    )

    # Broadcast pre-rendered HTML to all subscribers
    Turbo::StreamsChannel.broadcast_append_to(
      room,
      target: "messages",
      html: html
    )
  end
end

# Before: 100 subscribers = 600 queries
# After:  100 subscribers = 6 queries (one render)
#
# The trade-off: every subscriber sees the same HTML.
# No per-user permissions in the partial.
# For chat messages, this is almost always fine.
```

This drops query count from 6 × N to a flat 6, regardless of subscriber count. One render, one set of queries, N WebSocket deliveries. The database cost becomes constant. Whether you have 10 subscribers or 10,000, PostgreSQL processes the same 6 queries.
The trade-off is that every subscriber receives identical HTML — no per-user customization in the partial. For chat messages and activity feeds, this is almost always acceptable. The message content, author name, timestamp, and reaction count are the same for everyone.
But if your partial includes per-user elements — an edit button for the author, a delete button for moderators, a "mark as read" toggle — the render-once pattern strips those out or shows them to everyone. This is not a minor concern. Showing an admin-only delete button to every user is a security issue, not just a UX one.
The hybrid approach: render-once with lazy per-user elements
For partials that need both shared content and per-user elements, there is a middle path.
```ruby
class Message < ApplicationRecord
  after_create_commit :broadcast_to_room

  private

  def broadcast_to_room
    message = Message
      .includes(:room, :reactions, user: :team)
      .find(id)

    # Render the "public" part once — the content everyone sees
    shared_html = ApplicationController.render(
      partial: "messages/message_body",
      locals: { message: message }
    )

    # Broadcast the shared HTML to all subscribers
    Turbo::StreamsChannel.broadcast_append_to(
      room,
      target: "messages",
      html: shared_html
    )

    # Per-user elements (edit button, delete button, moderation tools)
    # are loaded client-side via a Turbo Frame that checks permissions:
    #
    #   <turbo-frame id="message_actions_<%= message.id %>"
    #                src="/messages/<%= message.id %>/actions"
    #                loading="lazy">
    #   </turbo-frame>
    #
    # The frame src hits a controller that checks current_user permissions
    # and returns the appropriate action buttons — or nothing.
    # One broadcast. One render. Per-user actions load on demand.
  end
end
```

Render the content that is identical for all subscribers once. Broadcast it. Then use a lazy-loaded Turbo Frame for the per-user elements. Each client requests its own action buttons through a standard HTTP request, which hits the controller, checks current_user, and returns the appropriate controls.
The database cost: 6 queries for the shared render, plus 1 query per subscriber for the permissions check. That is 106 queries for 100 subscribers instead of 600. Not as clean as pure render-once, but a substantial reduction — and the per-user elements are correctly scoped.
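The controller behind that lazy frame might look like the sketch below. The names (MessageActionsController, the /messages/:id/actions route, can_moderate?) are mine, purely illustrative of the shape:

```ruby
# app/controllers/message_actions_controller.rb (hypothetical)
# Responds to GET /messages/:id/actions from the lazy Turbo Frame.
class MessageActionsController < ApplicationController
  def show
    message = Message.find(params[:id])

    # One indexed lookup per subscriber is the only per-user query.
    # The partial renders whichever buttons this viewer is entitled to.
    render partial: "messages/actions", locals: {
      message: message,
      can_edit: message.user_id == current_user.id,
      can_delete: current_user.can_moderate?(message.room)
    }
  end
end
```

Because the permission check happens in a normal request with a real current_user, there is no risk of leaking admin controls through a shared broadcast.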
Debounced broadcasts
In high-velocity channels where messages arrive in rapid bursts — active chat rooms, CI notification feeds, trading floors — debouncing collapses multiple broadcasts into one.
```ruby
class Message < ApplicationRecord
  belongs_to :room
  belongs_to :user

  after_create_commit :schedule_broadcast

  private

  def schedule_broadcast
    # Debounce: wait 100ms, then broadcast all new messages at once
    BroadcastMessagesJob.set(wait: 0.1.seconds).perform_later(room_id)
  end
end

class BroadcastMessagesJob < ApplicationJob
  # Serialize per room: SolidQueue's limits_concurrency runs at most
  # one of these jobs per room at a time, so a burst of 5 messages
  # collapses into far fewer batch broadcasts
  self.queue_adapter = :solid_queue
  limits_concurrency to: 1, key: ->(room_id) { "broadcast_room_#{room_id}" }

  def perform(room_id)
    room = Room.find(room_id)
    recent = room.messages
      .includes(:reactions, user: :team)
      .where("created_at > ?", 1.second.ago)
      .order(:created_at)

    html = ApplicationController.render(
      partial: "messages/message_batch",
      locals: { messages: recent }
    )

    Turbo::StreamsChannel.broadcast_append_to(
      room,
      target: "messages",
      html: html
    )
  end
end

# 5 messages in rapid succession:
# Without debouncing: 5 broadcasts × 100 viewers × 6 queries = 3,000 queries
# With debouncing:    1 broadcast × 1 render × 8 queries = 8 queries
```

Five messages in 100ms become one batch broadcast instead of five separate renders. Combined with render-once, this takes you from 3,000 queries (5 messages, 100 viewers, 6 queries each) down to roughly 8.
The trade-off is latency. Messages no longer appear the instant they are created — they arrive in batches, 100-200ms after the last message in a burst. For most chat applications, this delay is imperceptible. For real-time trading applications or live auction systems, it may not be acceptable. You are trading immediacy for efficiency, which is usually the right trade, but it is a trade nonetheless.
I should note that debouncing also affects the user experience of rapid-fire conversations. Instead of seeing messages appear one by one in real time, subscribers see a batch of messages materialize at once. This can feel less "live" than the default behavior. Whether this matters depends entirely on the product. For asynchronous collaboration tools, it is irrelevant. For a product that competes on real-time feel, it is worth testing with users.
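The batching effect can be modeled in a few lines. This sketch (my own, and a simplification of the job semantics above) assumes a flush is scheduled a fixed window after the first message of a batch, and anything arriving before the flush rides along:

```ruby
# How many batch broadcasts a debounce window produces for a burst of
# message arrival times (in seconds).
def batch_count(arrival_times, window:)
  batches = 0
  flush_at = nil
  arrival_times.sort.each do |t|
    if flush_at.nil? || t > flush_at
      batches += 1          # first message of a new batch schedules a flush
      flush_at = t + window # later messages before flush_at ride along
    end
  end
  batches
end

batch_count([0.00, 0.02, 0.05, 0.08, 0.09], window: 0.1) # => 1
batch_count([0.0, 0.5, 1.0], window: 0.1)                # => 3
```

Five messages inside one window cost one render; the same five spread out cost five, which is why debouncing pays off only in bursty channels.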
Russian doll caching
Rails fragment caching can intercept the per-subscriber rendering cost by caching the rendered HTML output of each partial.
```erb
<%# Russian doll caching wraps partial fragments in cache blocks.
  # If the cache key matches, Rails serves cached HTML without
  # executing the partial — and without firing any queries. %>

<%# app/views/messages/_message.html.erb %>
<% cache message do %>
  <div id="<%= dom_id(message) %>" class="message">
    <div class="message-header">
      <% cache [message, message.user] do %>
        <span class="author"><%= message.user.display_name %></span>
        <span class="badge"><%= message.user.role_in(message.room) %></span>
      <% end %>
    </div>
    <div class="message-body">
      <%= message.formatted_body %>
    </div>
    <div class="message-meta">
      <span class="reactions"><%= message.reactions.count %> reactions</span>
    </div>
  </div>
<% end %>

<%# First render for a new message: cache miss, all queries fire.
  # Subsequent renders (remaining 99 subscribers): cache hit.
  # Zero queries. Rails serves the cached HTML fragment.
  #
  # Reduction: from 600 queries to ~6 (first render only).
  #
  # The catch: cache invalidation.
  # message.reactions.count changes every time someone reacts.
  # If the cache key includes updated_at, a reaction invalidates
  # the cache for ALL fragments that include the message.
  # And the next broadcast renders all 100 from scratch again.
  #
  # Russian doll caching works best for content that changes
  # infrequently. For live, reactive data — reactions, read counts,
  # typing indicators — the cache churn can negate the benefit. %>
```

When it works, Russian doll caching is remarkably effective. The first subscriber's render fires all 6 queries and stores the result in Rails.cache. The remaining 99 subscriber renders hit the cache and fire zero queries. Total cost: 6 queries instead of 600.
When it does not work — and this is the honest part — it fails in a way that is difficult to diagnose. If any data in the partial changes frequently (reaction counts, read receipts, "last seen" timestamps), the cache invalidates on every change, and the next broadcast re-renders from scratch. A partial with message.reactions.count that updates every time someone adds an emoji will churn the cache so rapidly that caching provides no benefit. You have added complexity without reducing load.
Fragment caching also introduces a class of bugs that are uniquely frustrating: stale content served from cache. A user updates their display name. The cached partial still shows the old name until the cache key expires or is explicitly invalidated. The correct cache key design prevents this, but the correct cache key design for a partial with 6 associations and a count aggregate is not trivial to get right.
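One way to limit the churn, sketched below under the assumption that the volatile data can be rendered separately: keep the reaction count outside the cached fragment, so a reaction does not invalidate the message body.

```erb
<%# Cache only the stable content. %>
<% cache message do %>
  <div class="message-body">
    <%= message.formatted_body %>
  </div>
<% end %>

<%# Not cached: changes with every reaction, but cheap to render alone. %>
<span class="reactions"><%= message.reactions.count %> reactions</span>
```

The cached fragment now survives reactions, at the cost of one COUNT query per render. It narrows the problem rather than solving it, which is the general character of fragment caching here.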
Turbo 8 page morphing
Turbo 8 introduced an alternative model — the Hotwire team documents it in their page refreshes handbook, and it is worth reading if you are considering this path. It sidesteps partial rendering entirely.
```ruby
# Turbo 8 introduces page morphing as an alternative to
# granular Turbo Stream actions (append, prepend, replace).
#
# Instead of rendering a partial per subscriber, the server
# broadcasts a "refresh" signal. Each client re-requests the
# page and Turbo morphs the DOM to match.

# In your model — no partial rendering at all:
class Message < ApplicationRecord
  broadcasts_refreshes
end

# In your layout:
#   <head>
#     <%= turbo_refreshes_with method: :morph, scroll: :preserve %>
#   </head>

# What happens when a message is created:
# 1. Server broadcasts: { action: "refresh" } (no HTML, no queries)
# 2. Each client fetches GET /rooms/12 (standard page load)
# 3. Turbo morphs the existing DOM to match the new response

# Trade-offs:
# + No partial rendering on broadcast — zero extra queries at broadcast time
# + Each client request hits normal Rails caching (fragment, HTTP, etc.)
# - Each client makes a full HTTP request (adds load to web servers)
# - Morph can cause visual flicker on complex pages
# - Requires Turbo 8+ (released late 2023)
```

Instead of rendering a partial and pushing HTML, the server sends a lightweight "refresh" signal. Each client re-requests the page through normal HTTP, and Turbo morphs the DOM to match the new response. No partial rendering on broadcast means zero additional queries at broadcast time. Zero.
The load shifts from broadcast-time database queries to HTTP request-time page renders. Those page renders benefit from standard Rails caching — fragment caching, HTTP caching, and Russian doll caching all apply. It is a fundamentally different performance profile: instead of 600 identical database queries in a burst, you get 100 HTTP requests spread over a few hundred milliseconds, each served from cache.
The trade-offs are real, and I shall enumerate them because they matter for production deployments:
Web server load. Each client makes a full HTTP request. One hundred subscribers means 100 GET requests to your web server within a few hundred milliseconds. If your web server is provisioned for normal traffic patterns, a broadcast to a large channel creates a temporary spike. This is usually manageable — HTTP requests are cheaper than database queries — but it is a load pattern you should monitor.
Visual flicker. Morph replaces DOM nodes that have changed. On complex pages with animations, transitions, or ephemeral UI state (open dropdowns, text selections, scroll positions), the morph can cause visible flicker. Turbo 8's scroll: :preserve helps with scroll position, but it cannot preserve all client-side state. Test with your actual pages, not just with the Turbo demo chat app.
Version requirement. Morph requires Turbo 8 or later, which was released in late 2023. If your application is on Turbo 7 or earlier, upgrading is non-trivial — Turbo 8 changed several behaviors around form submissions and navigation. This is not a "change one line" migration.
Correctness. Because each client fetches the full page and morphs its DOM, per-user permissions work naturally. The admin sees the admin view. The regular user sees the regular view. No render-once compromises required. This is the cleanest solution from a correctness standpoint.
Background job rendering
Moving broadcast rendering to a background job (Sidekiq, GoodJob, SolidQueue) does not reduce the total number of queries. All 600 queries still execute. What it does is remove them from the request cycle. The message creation returns immediately. The broadcast renders asynchronously. Your web server connections are freed.
This is a pragmatic mitigation when you cannot change the rendering strategy but need to prevent broadcast rendering from blocking web requests. It is not a solution to the multiplication problem — it is a deferral. The database still does the work. It just does it on the background job's schedule rather than synchronously in the callback.
I mention it because it is often the first thing teams reach for, and it is important to understand what it does and does not accomplish. It prevents request timeouts. It does not reduce database load.
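Concretely, turbo-rails already ships this deferral: each broadcast helper has a _later variant that enqueues the render through ActiveJob instead of running it inline. A sketch, assuming the same Message model and a "messages" target:

```ruby
# app/models/message.rb
class Message < ApplicationRecord
  belongs_to :room

  # broadcast_append_later_to enqueues the render as a background job.
  # Message.create! returns immediately; the 100 per-subscriber renders
  # (and their 600 queries) happen on the job worker's schedule instead.
  after_create_commit do
    broadcast_append_later_to room, target: "messages"
  end
end
```

The one-character difference between broadcast_append_to and broadcast_append_later_to is the difference between blocking the request cycle and not; the total database work is identical either way.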
Finding broadcast multiplication in production
The pg_stat_statements extension is the fastest way to identify whether broadcast multiplication is affecting your database.
-- Find broadcast-generated query patterns in production:
SELECT query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
WHERE query LIKE '%FROM "users"%"id" = $1%' -- Rails quotes identifiers
OR query LIKE 'SELECT COUNT%"reactions"%'
OR query LIKE '%FROM "rooms"%"id" = $1%'
ORDER BY calls DESC
LIMIT 20;
-- What you are looking for:
-- Queries with extremely high call counts relative to unique parameter values.
-- If SELECT * FROM users WHERE id = $1 has 50,000 calls/hour
-- but only 200 unique user IDs, something is rendering
-- the same user record 250 times per user per hour.
-- That is broadcast multiplication.

The signature is distinctive: queries with very high call counts but very low cardinality of parameter values. If SELECT * FROM users WHERE id = $1 is called 50,000 times per hour but your application only has 200 active users, something is fetching the same users hundreds of times per hour. In a standard N+1 scenario, each query would have a different parameter. In broadcast multiplication, the parameters cluster — the same IDs appear over and over because every subscriber's render loads the same message author.
Cross-reference with your ActionCable subscriber counts. If your busiest room has 80 subscribers and your highest-frequency query is called 80x more than expected, the correlation is not coincidental.
-- Compare call counts during business hours vs. off-hours.
-- Broadcast multiplication correlates with WebSocket subscriber count,
-- which correlates with active users, which peaks during work hours.
-- If your query call counts spike 10x from 2 AM to 10 AM alongside
-- your WebSocket connection count, you have found the correlation.
-- Also check: do call counts spike on specific events?
-- A company all-hands, a product launch, a Slack-style "here" announcement
-- that puts 500 people in one channel at the same time.
-- The query:
SELECT query,
calls,
total_exec_time,
mean_exec_time,
stddev_exec_time,
rows
FROM pg_stat_statements
WHERE calls > 10000
AND mean_exec_time < 1.0 -- fast individually
AND total_exec_time > 5000 -- expensive in aggregate
ORDER BY calls DESC
LIMIT 30;
-- The pattern you are looking for:
-- Extremely fast queries (< 1ms) with extremely high call counts.
-- Each one is cheap. The sum is not.

The temporal correlation is the strongest diagnostic signal. Broadcast multiplication is directly proportional to concurrent WebSocket connections, which is directly proportional to active users, which peaks during business hours. If your query call counts follow the same curve as your ActionCable subscriber counts, you are looking at broadcast multiplication.
A second signal: burst patterns. Normal application traffic produces steady query rates. Broadcast multiplication produces spikes — a sudden burst of identical queries when a message is sent to a popular channel, followed by silence. If pg_stat_statements shows a query with a very high calls count but also a very high stddev_exec_time relative to mean_exec_time, the execution times are not consistent. Some executions are fast (buffer cache warm), some are slower (buffer cache cold at the start of a burst). That variance is the fingerprint of bursty, broadcast-driven traffic.
An honest word about when this does not matter
I should be forthcoming about the scenarios where broadcast multiplication is not a problem worth solving, because overstating the case would be a disservice to you and an embarrassment to me.
Small subscriber counts. If your application's busiest room has 5 subscribers, the multiplication produces 30 additional queries per message. PostgreSQL will not notice. Your monitoring will not notice. Your users will not notice. The total execution time is under 5ms. Optimizing this is engineering theater — effort that produces no measurable improvement. If your subscriber counts are in the single digits, you have better things to work on.
Low message frequency. A channel that receives one message per minute with 100 subscribers generates 600 queries per minute. That is 10 queries per second. For any PostgreSQL instance provisioned for a production workload, 10 queries per second is negligible. The multiplication matters when message frequency and subscriber count are both high — active chat in a large channel, rapid-fire notifications in a busy team, automated alerts flooding a monitoring room.
Partials with no database access. If your broadcast partial is purely static HTML — a typing indicator, a presence status dot, a "user is online" badge — there are no queries to multiply. The broadcast still renders per subscriber, but each render is a template evaluation with no database cost. The concern in this article applies specifically to partials that touch ActiveRecord associations.
The threshold where broadcast multiplication becomes a genuine problem is roughly: subscriber count above 30, queries per partial above 4, and message frequency above a few per minute. Below that, the database handles it without distress. Above it, the cost scales faster than most teams expect.
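That threshold is easy to check against your own numbers. A back-of-the-envelope helper (the function name and formula are mine, not from any library):

```ruby
# Broadcast-driven query rate for one channel:
# each message costs (subscribers * queries_per_partial) SELECTs plus 1 INSERT.
def broadcast_queries_per_second(subscribers:, queries_per_partial:, messages_per_minute:)
  queries_per_message = subscribers * queries_per_partial + 1
  queries_per_message * messages_per_minute / 60.0
end

# The quiet case from above: 100 subscribers, 6 queries, 1 message/minute.
broadcast_queries_per_second(subscribers: 100, queries_per_partial: 6, messages_per_minute: 1)
# => ~10 queries/second; negligible

# The same room during an active discussion: 10 messages/minute.
broadcast_queries_per_second(subscribers: 100, queries_per_partial: 6, messages_per_minute: 10)
# => ~100 queries/second, from a single channel
```

Plug in your busiest room's numbers; if the result is in the hundreds, the multiplication is worth addressing.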
What Gold Lapel does with 100 identical queries
The application-level mitigations above are all sound engineering. Render-once eliminates the problem at the source. Debouncing reduces the frequency. Turbo 8 morphing restructures the architecture entirely. Russian doll caching intercepts repeated renders. Each requires code changes, testing, and trade-off decisions. Each is the right approach in certain circumstances.
Gold Lapel approaches the problem from the other side of the wire.
-- Without Gold Lapel: 100 broadcasts, 100 identical queries
-- PostgreSQL executes each one independently.
-- Query 1: SELECT * FROM users WHERE id = 42; -- 0.3ms (disk/buffer)
-- Query 2: SELECT * FROM users WHERE id = 42; -- 0.3ms (same work)
-- Query 3: SELECT * FROM users WHERE id = 42; -- 0.3ms (same work)
-- ...
-- Query 100: SELECT * FROM users WHERE id = 42; -- 0.3ms (same work)
-- Total: 30ms of PostgreSQL CPU for identical results.
-- With Gold Lapel: local caching intercepts after the first execution.
-- Query 1: SELECT * FROM users WHERE id = 42; -- 0.3ms (hits PostgreSQL)
-- Query 2: SELECT * FROM users WHERE id = 42; -- 0.02ms (LRU cache hit)
-- Query 3: SELECT * FROM users WHERE id = 42; -- 0.02ms (cache hit)
-- ...
-- Query 100: SELECT * FROM users WHERE id = 42; -- 0.02ms (cache hit)
-- Total: 2.3ms. Same 100 queries. 99 served from cache.
--
-- Cache invalidation? Automatic. When an INSERT or UPDATE
-- touches the users table, Gold Lapel evicts affected entries.
-- The next SELECT executes against PostgreSQL. Fresh data. No stale reads.

Gold Lapel is a PostgreSQL proxy that sits between your Rails application and the database. Its local cache operates at the wire protocol level — when the same query with the same parameters arrives, Gold Lapel returns the cached result without touching PostgreSQL. The application is unaware. Rails sends 601 queries. PostgreSQL receives 7.
For broadcast multiplication, this is nearly perfect. The pattern is 100 renders of the same partial, each loading the same user, counting the same reactions, fetching the same room name. The first render's queries hit PostgreSQL. The remaining 99 are served from Gold Lapel's local cache in microseconds. No parse. No plan. No execute. Just a cache lookup and a wire protocol response.
The math: 601 queries reach Gold Lapel. 7 are unique (the INSERT plus 6 distinct SELECTs). 594 are cache hits. PostgreSQL processes 7 queries instead of 601. Your database CPU drops by 99%. Your connection pool pressure drops by 99%. The broadcast that was consuming 300ms of connection time now consumes 3ms.
Cache invalidation is automatic. When the next INSERT or UPDATE modifies the users, reactions, or rooms table, Gold Lapel evicts the affected cache entries. The next query executes against PostgreSQL and populates a fresh cache entry. No stale data. No manual invalidation logic. No cache keys to manage. No touch: true chains to maintain. The proxy observes the write traffic and invalidates accordingly.
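The invalidation model is easy to illustrate in miniature. The sketch below is my own simplification of the idea, not Gold Lapel's implementation: a result cache keyed by (SQL, params), with per-table eviction on writes.

```ruby
# Illustrative only: a query-result cache with table-based invalidation.
class QueryResultCache
  def initialize
    @results = {}                                  # [sql, params] => rows
    @keys_by_table = Hash.new { |h, t| h[t] = [] } # table name => cache keys
  end

  # Returns the cached rows, or executes the block and caches its result.
  def fetch(sql, params, tables:)
    key = [sql, params]
    return @results[key] if @results.key?(key)     # hit: database untouched
    rows = yield                                   # miss: run against PostgreSQL
    tables.each { |t| @keys_by_table[t] << key }
    @results[key] = rows
  end

  # Called when a write (INSERT/UPDATE/DELETE) touches a table.
  def invalidate(table)
    keys = @keys_by_table.delete(table) || []
    keys.each { |key| @results.delete(key) }
  end
end

cache = QueryResultCache.new
executions = 0
run = -> { executions += 1; [{ id: 42, name: "Reese" }] }

# 100 subscriber renders, each asking for the same author:
100.times { cache.fetch('SELECT * FROM users WHERE id = $1', [42], tables: ["users"], &run) }
executions # => 1: the other 99 were cache hits

cache.invalidate("users") # a write touched the users table
cache.fetch('SELECT * FROM users WHERE id = $1', [42], tables: ["users"], &run)
executions # => 2: fresh data after the eviction
```

The real proxy observes wire-protocol write traffic rather than taking a tables: hint, but the hit/miss/evict lifecycle is the same.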
And because Gold Lapel's auto-indexing ensures those queries are already running on optimal indexes, the first execution — the one that actually hits PostgreSQL — is fast too. The combination means broadcast multiplication goes from a scaling crisis to a rounding error.
No query changes. No architectural restructuring. No Turbo version upgrade. No render-once refactoring. No debouncing jobs. Add gem "goldlapel-rails" to your Gemfile and the 601 queries become 7.
I should note what this does not do: it does not reduce the number of partial renders on the Rails side. Your application still renders 100 partials, still instantiates 100 sets of ActiveRecord objects, still builds 100 HTML strings. The CPU cost on the Rails side is unchanged. What changes is the database cost — the queries that back those renders are served from proxy-level cache instead of hitting PostgreSQL. For most applications, the database is the bottleneck, not the Rails renderer. But if your profiling shows that the rendering itself (ERB compilation, string building, WebSocket serialization) is the bottleneck, you need an application-level solution like render-once.
Combining strategies: a practical recommendation
If you have read this far, you may be wondering which strategy to adopt. Allow me to offer a practical recommendation based on the production systems I have observed.
Start with render-once. For every broadcast partial that renders identically for all subscribers — chat messages, activity items, notifications — switch to the render-once pattern. This is the most impactful change. One code modification per model, and the database cost becomes constant regardless of subscriber count. If the partial has per-user elements, use the hybrid approach with lazy Turbo Frames.
Add debouncing for high-velocity channels. If any of your rooms receive bursts of messages (more than 2-3 per second), debouncing reduces the number of broadcasts. Combined with render-once, the reduction is dramatic. Five messages to 100 subscribers goes from 3,000 queries to 8.
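A debouncer's effect is easy to model. The class below is an illustrative simulation (the names and window are mine; a production version would schedule one broadcast job per window): it lets the first broadcast in a window through and coalesces the rest, which is how five rapid messages collapse into one render pass.

```ruby
# Coalesce broadcasts: at most one per room per window.
class BroadcastDebouncer
  def initialize(window_seconds)
    @window = window_seconds
    @last_fired_at = {}
  end

  # true  => broadcast now
  # false => a broadcast fired recently; this one is coalesced
  def fire?(room_id, now)
    last = @last_fired_at[room_id]
    if last.nil? || now - last >= @window
      @last_fired_at[room_id] = now
      true
    else
      false
    end
  end
end

debouncer = BroadcastDebouncer.new(1.0)
# Five messages arriving 100ms apart in one room:
fired = (0...5).count { |i| debouncer.fire?(42, i * 0.1) }
fired # => 1: one render pass instead of five
```

Combined with render-once, that single coalesced broadcast renders one partial, which is where the dramatic reduction comes from.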
Consider Turbo 8 morphing for your next major upgrade. If you are starting a new project or planning a Turbo version upgrade, broadcasts_refreshes is the cleanest long-term architecture. It eliminates partial rendering entirely, handles per-user permissions naturally, and shifts the caching to HTTP-level mechanisms that Rails already excels at. But do not undertake a Turbo 8 migration solely for this benefit — the migration cost is real.
Use a proxy-level cache as a safety net. Application-level fixes address the broadcasts you know about and refactor. A proxy-level cache catches everything — including the broadcasts you have not refactored yet, the new features that introduce new broadcasts, and the edge cases that only appear under production load. The two approaches are complementary, not competing.
The order matters. Render-once is the fix. Debouncing is the optimization. Morphing is the architecture. Proxy caching is the safety net. Each layer reduces what the next layer needs to handle.
I have taken the liberty of preparing a guide on a related theme. The Rails counter_cache contention guide addresses another pattern where a single write triggers surprisingly many database operations — and where the escape route, as with broadcasts, lives below the application code: in that case, materialized views.