Turbo Stream Broadcasts Are a PostgreSQL Query Multiplier: How One Write Becomes 100 Queries
You called Message.create! once. PostgreSQL received 601 queries. Shall I walk you through the arithmetic?
Good evening. Your broadcast has a multiplier problem.
Turbo Streams are genuinely elegant. A user posts a message, and every subscriber's page updates in real time. No JavaScript to write. No manual DOM manipulation. No WebSocket plumbing to maintain. A few lines of Ruby and the chat room works. The demos are convincing. The experience in development is effortless. I have no quarrel with the abstraction itself.
The elegance, however, conceals a cost that scales in a direction you are not watching.
When after_create_commit fires a Turbo Stream broadcast — the Hotwire team's handbook documents the mechanism — the server renders the partial template once per subscriber. Not once for all subscribers. Once per subscriber. Each render is independent. Each render instantiates its own ActiveRecord objects. Each render fires its own queries against PostgreSQL. There is no shared state between renders, no memoization across the subscriber list, no awareness that the previous render — completed 4 milliseconds ago — asked exactly the same questions and received exactly the same answers.
One message. One hundred viewers. Six queries per partial render. That is 601 queries from a single INSERT.
This is not a bug. This is how the system is designed. It is a reasonable design for a set of constraints that most applications do not actually have. The question is whether you have accounted for it — and in my experience, teams discover this arithmetic in production monitoring, not in code review.
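The arithmetic deserves to be explicit. A back-of-the-envelope sketch (the helper function is mine, purely illustrative):

```ruby
# Back-of-the-envelope cost of one broadcasted message:
# the original INSERT plus one full partial render per subscriber.
def broadcast_query_count(subscribers:, queries_per_render:)
  1 + subscribers * queries_per_render
end

broadcast_query_count(subscribers: 1,   queries_per_render: 6)  # => 7
broadcast_query_count(subscribers: 100, queries_per_render: 6)  # => 601
```

The subscriber count is the term teams forget to plug in, because in development it is almost always 1.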
What follows is a thorough look at precisely where the queries come from, why the framework renders this way, what it costs your database, and what you can do about it. I have several recommendations, each with honest trade-offs. I shall not pretend any of them is perfect.
The anatomy of a broadcast callback
Start with the simplest possible Turbo Stream broadcast. A chat message model with a single callback.
```ruby
class Message < ApplicationRecord
  belongs_to :room
  belongs_to :user

  after_create_commit -> {
    broadcast_append_to room,
      partial: "messages/message",
      locals: { message: self }
  }
end
```

When a message is created, after_create_commit renders messages/_message.html.erb and pushes the HTML to every subscriber of the room's stream. Clean. Readable. The kind of code that gets approved in review without comment. It is, in fact, the exact pattern from the Turbo Streams documentation — which makes it the exact pattern most teams adopt.
Now look at what that partial actually contains.
```erb
<%# app/views/messages/_message.html.erb %>
<div id="<%= dom_id(message) %>" class="message">
  <div class="message-header">
    <span class="author"><%= message.user.display_name %></span>
    <span class="timestamp"><%= l(message.created_at, format: :short) %></span>
    <span class="badge"><%= message.user.role_in(message.room) %></span>
  </div>
  <div class="message-body">
    <%= message.formatted_body %>
  </div>
  <div class="message-meta">
    <span class="reactions"><%= message.reactions.count %> reactions</span>
    <span class="room"><%= message.room.name %></span>
  </div>
</div>
```

Each line that touches an association is a potential database query. message.user.display_name loads the user. message.user.role_in(message.room) queries the memberships table. message.reactions.count runs a COUNT. message.room.name loads the room. None of these associations are preloaded by the broadcast callback — the default broadcast_append_to passes self as the message, which is a bare ActiveRecord instance with no eager-loaded relationships.
For a single render — one subscriber receiving one message — here is what PostgreSQL actually processes:
```sql
-- One message is created. One subscriber receives the broadcast.
-- Here is what PostgreSQL sees for that SINGLE render:

-- 1. The original INSERT (your code)
INSERT INTO messages (room_id, user_id, body, created_at)
VALUES (12, 42, 'Hello everyone', NOW());

-- Broadcast fires. Partial renders. Queries begin:

-- 2. message.user (association load)
SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;

-- 3. message.user.display_name (if not preloaded)
SELECT "users"."first_name", "users"."last_name"
FROM "users" WHERE "users"."id" = 42;

-- 4. message.user.role_in(message.room)
SELECT "memberships".* FROM "memberships"
WHERE "memberships"."user_id" = 42
  AND "memberships"."room_id" = 12 LIMIT 1;

-- 5. message.formatted_body (say it checks mentions)
SELECT "users"."id", "users"."username" FROM "users"
WHERE "users"."username" IN ('alice', 'bob');

-- 6. message.reactions.count
SELECT COUNT(*) FROM "reactions"
WHERE "reactions"."message_id" = 1847;

-- 7. message.room.name
SELECT "rooms"."name" FROM "rooms" WHERE "rooms"."id" = 12;

-- That is 6 queries to render ONE partial for ONE subscriber.
```

Six queries to render one partial for one person. Individually, each is fast. Sub-millisecond with proper indexes. The user lookup hits a primary key index. The membership check uses a composite index. The reaction count is a simple aggregate. Each query plans and executes in microseconds.
The problem is not the individual query. The problem is the multiplier.
The multiplication table nobody checks
Turbo Streams renders the partial separately for each subscriber. There is no "render once, distribute many" step in the default implementation. The Broadcastable concern — you can read the source yourself in the turbo-rails repository — iterates through subscribers and renders independently for each one.
The arithmetic is unforgiving.
| Concurrent viewers | Partial renders | Queries per render | Total queries | Scenario |
|---|---|---|---|---|
| 1 | 1 | 6 | 7 | Just the INSERT + 6 partial queries |
| 10 | 10 | 6 | 61 | Busy Slack channel |
| 50 | 50 | 6 | 301 | Team standup channel |
| 100 | 100 | 6 | 601 | Company-wide announcement |
| 500 | 500 | 6 | 3,001 | Large org broadcast |
A company-wide announcement channel with 500 subscribers. A single message generates 3,001 queries. If someone pastes a message and follows up with a correction — two messages — that is 6,002 queries in under a second.
I should note that these are not theoretical numbers. They are the direct consequence of the framework's documented behavior. The table is just multiplication. The uncomfortable part is that nobody does the multiplication before deploying to production.
The numbers above assume a modest partial with 6 queries. Production partials are frequently richer.
```erb
<%# A more realistic partial — a project update notification %>
<div class="activity-item">
  <div class="actor">
    <%= image_tag message.user.avatar_url, class: "avatar" %>
    <strong><%= message.user.display_name %></strong>
    <span class="role"><%= message.user.team.name %></span>
  </div>
  <div class="content">
    <p><%= message.formatted_body %></p>
    <% if message.attachments.any? %>
      <div class="attachments">
        <% message.attachments.each do |attachment| %>
          <%= render partial: "attachments/thumbnail", locals: { attachment: attachment } %>
        <% end %>
      </div>
    <% end %>
  </div>
  <div class="context">
    <span class="project"><%= message.room.project.name %></span>
    <span class="channel"><%= message.room.name %></span>
    <span class="timestamp"><%= time_ago_in_words(message.created_at) %> ago</span>
    <span class="read-count"><%= message.read_receipts.count %> read</span>
  </div>
</div>

<%# Queries per render of THIS partial:
  # 1. message.user (load user)
  # 2. message.user.avatar_url (may hit ActiveStorage)
  # 3. message.user.display_name
  # 4. message.user.team (load team)
  # 5. message.attachments (load attachments)
  # 6. Each attachment.thumbnail (N+1 within the partial)
  # 7. message.room.project (load project)
  # 8. message.room (load room)
  # 9. message.read_receipts.count (COUNT query)
  #
  # That is 9+ queries per render.
  # With 100 subscribers: 900+ queries from one message.
%>
```

Nine queries per render. With 100 subscribers, that is 901 queries from one message. With attachments triggering N+1 queries inside the partial, the number climbs further. I have seen production partials that fire 15 or more queries per render — avatar URLs through ActiveStorage, nested team hierarchies, permission checks, unread counts, mention parsing. Each feature adds a query. Each query multiplies across the subscriber list.
The cruelest part: every query is identical
This is the detail that separates broadcast multiplication from ordinary high-traffic query load. It is also what makes it such a distinctive waste.
When 100 subscribers are viewing the same room and the same message is broadcast, all 100 partial renders execute the same queries with the same parameters. The same user is loaded 100 times. The same reaction count is computed 100 times. The same room name is fetched 100 times.
```sql
-- Here is the truly painful part.
-- All 100 renders execute the SAME queries with the SAME parameters.
-- PostgreSQL's query log shows:

SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;
SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;
SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;
-- ... 97 more identical copies

SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;
SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;
SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;
-- ... 97 more identical copies

-- 600 queries. All identical. All returning the same rows.
-- PostgreSQL dutifully executes each one from scratch.
```

PostgreSQL has no concept of "I just answered this 12 microseconds ago." Each query arrives on its own connection (or the same connection sequentially), gets parsed, planned, and executed independently. The buffer cache helps — the data pages are warm after the first execution, so subsequent queries avoid disk I/O — but the planning and execution overhead still accumulates. Each query consumes CPU for parsing, plan generation, executor startup, tuple retrieval, and result serialization. The I/O is cached. The CPU work is not.
Six hundred queries. All returning the same rows. All consuming CPU cycles, buffer pin locks, and connection time for work that has already been done. If you ran EXPLAIN (ANALYZE, BUFFERS) on any of them, you would see Buffers: shared hit across the board — everything served from shared memory, no disk reads. The irony is that PostgreSQL's buffer cache is working perfectly. It is just being asked the same question six hundred times when one answer would suffice.
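The signature is easy to demonstrate with plain Ruby (the helper is mine, not tied to any gem): tally a query log and keep only statements that repeat verbatim.

```ruby
# Tally a query log and keep only statements that repeat word for word.
# Broadcast multiplication shows the same statement, same parameters,
# repeated once per subscriber.
def duplicate_queries(query_log)
  query_log.tally.select { |_statement, count| count > 1 }
end

log = Array.new(100) { 'SELECT "users".* FROM "users" WHERE "users"."id" = 42 LIMIT 1;' } +
      Array.new(100) { 'SELECT COUNT(*) FROM "reactions" WHERE "reactions"."message_id" = 1847;' }

duplicate_queries(log).values # => [100, 100]
```

In an ordinary N+1, the tally spreads across many distinct statements; here it piles onto two.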
How this differs from the N+1 problem
If you have encountered the N+1 query problem — the topic has its own dedicated guide — broadcast multiplication may look familiar. Both involve an unexpectedly large number of queries. Both stem from framework defaults that prioritize correctness over performance. But they are structurally different problems with different solutions.
| Dimension | N+1 problem | Broadcast multiplication |
|---|---|---|
| Pattern | N different queries (varying parameters) | N identical queries (same parameters) |
| Trigger | Iterating through a collection in application code | Turbo Stream rendering partial per subscriber |
| Detection | Many similar queries with different WHERE values | Many identical queries with the same WHERE values |
| Scaling factor | Grows with data size (row count) | Grows with audience size (subscriber count) |
| ORM fix | Eager loading (includes, select_related, joinedload) | Render-once pattern, debouncing, morphing |
| Proxy fix | Query batching (collapse N into 1 IN clause) | Local caching (serve identical results from memory) |
The critical distinction: N+1 queries scale with data size. Broadcast multiplication scales with audience size. An N+1 on a page with 200 orders generates 201 queries regardless of how many people view it. A broadcast with 200 subscribers generates 200 renders regardless of how many items are in each one.
And they compound. An N+1 inside a broadcast partial multiplies the inner problem by the outer one.
```ruby
# The broadcast multiplication problem COMPOUNDS with N+1 queries
# inside the partial. They are two different problems that multiply.

# Consider: message.attachments.each in the partial.
# If a message has 3 attachments, each render fires:
# 1. SELECT * FROM attachments WHERE message_id = 1847 (load collection)
# 2. SELECT * FROM active_storage_blobs WHERE id = 91 (attachment 1)
# 3. SELECT * FROM active_storage_blobs WHERE id = 92 (attachment 2)
# 4. SELECT * FROM active_storage_blobs WHERE id = 93 (attachment 3)
#
# That is 4 queries for attachments alone, per render.
# Add to the 6 base queries: 10 queries per render.
# With 100 subscribers: 1,000 queries from one message.
#
# The N+1 inside the partial is multiplied by the broadcast.
# Fix the N+1 (use includes(attachments: :blob))
# and you drop from 10 to 7 queries per render.
# But 7 × 100 is still 700 queries.
#
# The broadcast multiplication is the OUTER multiplier.
# The N+1 is the INNER multiplier.
# You need to address both.
```

Fix the N+1, and you reduce queries per render. Fix the broadcast multiplication, and you reduce the number of renders. For maximum effect, you need to address both. The N+1 is the inner multiplier. The broadcast is the outer multiplier. Reducing either one helps. Reducing both is transformative.
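In arithmetic terms (an illustrative helper, not framework API):

```ruby
# Inner multiplier (queries per render) times outer multiplier (subscribers).
def compounded_queries(subscribers:, base_queries:, n_plus_one_queries:)
  subscribers * (base_queries + n_plus_one_queries)
end

# Unfixed N+1 (4 attachment queries per render):
compounded_queries(subscribers: 100, base_queries: 6, n_plus_one_queries: 4) # => 1000

# N+1 fixed via eager loading (1 attachment query per render):
compounded_queries(subscribers: 100, base_queries: 6, n_plus_one_queries: 1) # => 700
```

Eager loading shrinks the inner factor; only a rendering-strategy change shrinks the outer one.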
Why it renders per subscriber, not once
A reasonable question: why does Turbo not render the partial once and send the same HTML to everyone?
```ruby
# Turbo Streams broadcasts render the partial once per subscriber.
# The ActionCable server maintains a list of subscribers for each stream.

# When broadcast_append_to fires:
# 1. ActionCable resolves subscriber list for "room_12"
# 2. For EACH subscriber:
#    a. Renders messages/_message.html.erb
#    b. Wraps HTML in <turbo-stream action="append">
#    c. Sends over WebSocket
#
# Each render is independent. Each render loads its own
# ActiveRecord objects. Each render fires its own queries.
#
# There is no shared cache between renders.
# There is no "render once, send many."
#
# This is by design — different subscribers might see different
# content based on permissions. But in practice, most partials
# render identically for every viewer.
```

The design rationale is sound. In applications with per-user permissions, the partial might render differently for each subscriber. An admin sees an edit button. A regular user does not. A moderator sees a flag link. A user who has been muted sees a different set of controls. The only way to guarantee correct output per user is to render per user.
This is a defensive design — it assumes the worst case (per-user customization) to prevent the worst outcome (leaking content to unauthorized users). It is the same reasoning that leads ORMs to default to lazy loading: correctness first, performance second. And like lazy loading, it is the correct default for the general case and the wrong behavior for the common case.
In practice, the vast majority of broadcast partials render identically for all subscribers. Chat messages, activity feeds, notification lists, status updates, typing indicators — the HTML is the same for everyone. You are paying the per-subscriber rendering cost for per-subscriber customization you are not using.
I want to be precise about this: the framework is not wrong to make this choice. If you do have per-user content in broadcast partials, per-subscriber rendering is the correct behavior. The problem is that the framework applies this expensive safety measure uniformly, and provides no mechanism to opt out. There is no broadcast_append_to room, render: :once option. The developer must build the opt-out themselves.
The connection pool pressure nobody sees
The query count is the visible cost. The connection pool pressure is the hidden one.
```sql
-- Broadcast multiplication does not just consume CPU.
-- It consumes connections.

-- A typical Rails app with Puma runs 5 threads per worker,
-- 2-4 workers. That is 10-20 threads competing for
-- a connection pool of (usually) 5-10 connections.

-- When a broadcast fires for 100 subscribers:
-- - If rendering is synchronous (default): the broadcast
--   holds a connection for all 100 renders sequentially.
--   Duration: 100 renders × ~3ms each = 300ms of connection time.
--
-- - If rendering is async (ActionCable async adapter):
--   each render may check out its own connection.
--   100 concurrent connection checkouts against a pool of 10.
--   90 renders wait for a connection. Timeouts begin.

-- In both cases, the connection pool is under pressure that
-- is invisible to request-level monitoring. The broadcast
-- happens in a callback, not in a controller action.
-- Your APM tool shows the request completing in 50ms.
-- The 300ms of broadcast rendering does not appear.
```

A typical Rails application running Puma maintains a connection pool sized to match its thread count — 5 to 20 connections, depending on configuration. When a broadcast fires, the rendering process checks out connections from this same pool. If the broadcast is synchronous (the default behavior when ActionCable uses the async adapter), it holds a single connection and renders all subscribers sequentially. A hundred renders at 3ms each ties up a connection for 300ms. That is 300ms during which one fewer connection is available for handling web requests.
If you are using the Redis adapter for ActionCable with threaded rendering, the situation inverts: each render may attempt to check out its own connection concurrently. One hundred concurrent connection requests against a pool of 10 means 90 renders queue up waiting for a connection. If your checkout_timeout is 5 seconds (the Rails default), you will not see errors. You will see latency — renders that should complete in 3ms waiting 50-200ms for a connection. The broadcast still completes, but slowly. The web requests sharing that pool also slow down.
The insidious part: this latency does not appear in your APM tool's request traces. The broadcast fires in an after_create_commit callback, not in the controller action. The request that created the message shows a clean 50ms response time. The 300ms of broadcast rendering happens after the response has been sent. It is invisible to request-level monitoring and visible only in database connection wait times, which most teams do not monitor until something breaks.
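The connection math is simple enough to model. A rough sketch (the helpers are mine; a real pool also contends with concurrent web requests):

```ruby
# Synchronous broadcast: one connection held for the whole render loop.
def connection_hold_ms(renders:, ms_per_render:)
  renders * ms_per_render
end

# Concurrent renders: how many must queue waiting for a pool connection.
def renders_queued(concurrent_renders:, pool_size:)
  [concurrent_renders - pool_size, 0].max
end

connection_hold_ms(renders: 100, ms_per_render: 3)     # => 300
renders_queued(concurrent_renders: 100, pool_size: 10) # => 90
```

Neither number shows up in a request trace, which is precisely why this goes unnoticed.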
> "I have observed, in production systems, pages generating over 400 database round trips for what appeared to be a simple list view."
>
> — from *You Don't Need Redis*, Chapter 3: The ORM Tax
Mitigation strategies: from simple to structural
There are several approaches to reducing broadcast query multiplication, each with different trade-offs. I have organized them by implementation complexity, and I shall be direct about where each one falls short.
| Strategy | Complexity | Query reduction | Trade-off |
|---|---|---|---|
| Preload associations before broadcast | Low | 30-60% | Only helps with N+1s inside the partial, not the per-subscriber multiplication |
| Render once, broadcast HTML | Medium | 90-95% | All subscribers see identical content — no per-user customization |
| Debounced broadcasts | Medium | 70-90% | Adds latency (typically 50-200ms). Messages arrive in bursts, not real-time. |
| Turbo 8 page morphing | Medium | 50-80% | Requires Turbo 8+. Full page morph can be less surgical than targeted appends. |
| Background job rendering | Medium | 0% (but shifts load) | Same total queries, but spread over time. Prevents request timeouts. |
| Russian doll caching on partials | Medium | 40-70% | Cache invalidation complexity. First render still hits the database. |
| Gold Lapel local caching | None (proxy) | 90-99% | Requires Gold Lapel proxy between app and database. Cache invalidation handled automatically on writes. |
Render once, broadcast HTML
The most impactful application-level fix. Render the partial a single time, then broadcast the pre-rendered HTML string to all subscribers.
```ruby
class Message < ApplicationRecord
  belongs_to :room
  belongs_to :user

  after_create_commit :broadcast_to_room

  private

  def broadcast_to_room
    # Preload everything the partial needs
    message = Message
      .includes(:room, :reactions, user: :team)
      .find(id)

    # Render the partial ONCE
    html = ApplicationController.render(
      partial: "messages/message",
      locals: { message: message }
    )

    # Broadcast pre-rendered HTML to all subscribers
    Turbo::StreamsChannel.broadcast_append_to(
      room,
      target: "messages",
      html: html
    )
  end
end

# Before: 100 subscribers = 600 queries
# After:  100 subscribers = 6 queries (one render)
#
# The trade-off: every subscriber sees the same HTML.
# No per-user permissions in the partial.
# For chat messages, this is almost always fine.
```

This drops query count from 6 × N to a flat 6, regardless of subscriber count. One render, one set of queries, N WebSocket deliveries. The database cost becomes constant. Whether you have 10 subscribers or 10,000, PostgreSQL processes the same 6 queries.
The trade-off is that every subscriber receives identical HTML — no per-user customization in the partial. For chat messages and activity feeds, this is almost always acceptable. The message content, author name, timestamp, and reaction count are the same for everyone.
But if your partial includes per-user elements — an edit button for the author, a delete button for moderators, a "mark as read" toggle — the render-once pattern strips those out or shows them to everyone. This is not a minor concern. Showing an admin-only delete button to every user is a security issue, not just a UX one.
The hybrid approach: render-once with lazy per-user elements
For partials that need both shared content and per-user elements, there is a middle path.
```ruby
class Message < ApplicationRecord
  after_create_commit :broadcast_to_room

  private

  def broadcast_to_room
    message = Message
      .includes(:room, :reactions, user: :team)
      .find(id)

    # Render the "public" part once — the content everyone sees
    shared_html = ApplicationController.render(
      partial: "messages/message_body",
      locals: { message: message }
    )

    # Broadcast the shared HTML to all subscribers
    Turbo::StreamsChannel.broadcast_append_to(
      room,
      target: "messages",
      html: shared_html
    )

    # Per-user elements (edit button, delete button, moderation tools)
    # are loaded client-side via a Turbo Frame that checks permissions:
    #
    #   <turbo-frame id="message_actions_<%= message.id %>"
    #                src="/messages/<%= message.id %>/actions"
    #                loading="lazy">
    #   </turbo-frame>
    #
    # The frame src hits a controller that checks current_user permissions
    # and returns the appropriate action buttons — or nothing.
    # One broadcast. One render. Per-user actions load on demand.
  end
end
```

Render the content that is identical for all subscribers once. Broadcast it. Then use a lazy-loaded Turbo Frame for the per-user elements. Each client requests its own action buttons through a standard HTTP request, which hits the controller, checks current_user, and returns the appropriate controls.
The database cost: 6 queries for the shared render, plus 1 query per subscriber for the permissions check. That is 106 queries for 100 subscribers instead of 600. Not as clean as pure render-once, but a substantial reduction — and the per-user elements are correctly scoped.
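The controller behind that lazy frame might look like the sketch below. The names (MessageActionsController, the /messages/:id/actions route, can_moderate?) are mine, purely illustrative of the shape:

```ruby
# app/controllers/message_actions_controller.rb (hypothetical)
# Responds to GET /messages/:id/actions from the lazy Turbo Frame.
class MessageActionsController < ApplicationController
  def show
    message = Message.find(params[:id])

    # One indexed lookup per subscriber is the only per-user query.
    # The partial renders whichever buttons this viewer is entitled to.
    render partial: "messages/actions", locals: {
      message: message,
      can_edit: message.user_id == current_user.id,
      can_delete: current_user.can_moderate?(message.room)
    }
  end
end
```

Because the permission check happens in a normal request with a real current_user, there is no risk of leaking admin controls through a shared broadcast.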
Debounced broadcasts
In high-velocity channels where messages arrive in rapid bursts — active chat rooms, CI notification feeds, trading floors — debouncing collapses multiple broadcasts into one.
```ruby
class Message < ApplicationRecord
  belongs_to :room
  belongs_to :user

  after_create_commit :schedule_broadcast

  private

  def schedule_broadcast
    # Debounce: wait 100ms, then broadcast all new messages at once
    BroadcastMessagesJob.set(wait: 0.1.seconds).perform_later(room_id)
  end
end

class BroadcastMessagesJob < ApplicationJob
  # Serialize per room: SolidQueue's limits_concurrency runs at most
  # one of these jobs per room at a time, so a burst of 5 messages
  # collapses into far fewer batch broadcasts
  self.queue_adapter = :solid_queue
  limits_concurrency to: 1, key: ->(room_id) { "broadcast_room_#{room_id}" }

  def perform(room_id)
    room = Room.find(room_id)
    recent = room.messages
      .includes(:reactions, user: :team)
      .where("created_at > ?", 1.second.ago)
      .order(:created_at)

    html = ApplicationController.render(
      partial: "messages/message_batch",
      locals: { messages: recent }
    )

    Turbo::StreamsChannel.broadcast_append_to(
      room,
      target: "messages",
      html: html
    )
  end
end

# 5 messages in rapid succession:
# Without debouncing: 5 broadcasts × 100 viewers × 6 queries = 3,000 queries
# With debouncing:    1 broadcast × 1 render × 8 queries = 8 queries
```

Five messages in 100ms become one batch broadcast instead of five separate renders. Combined with render-once, this takes you from 3,000 queries (5 messages, 100 viewers, 6 queries each) down to roughly 8.
The trade-off is latency. Messages no longer appear the instant they are created — they arrive in batches, 100-200ms after the last message in a burst. For most chat applications, this delay is imperceptible. For real-time trading applications or live auction systems, it may not be acceptable. You are trading immediacy for efficiency, which is usually the right trade, but it is a trade nonetheless.
I should note that debouncing also affects the user experience of rapid-fire conversations. Instead of seeing messages appear one by one in real time, subscribers see a batch of messages materialize at once. This can feel less "live" than the default behavior. Whether this matters depends entirely on the product. For asynchronous collaboration tools, it is irrelevant. For a product that competes on real-time feel, it is worth testing with users.
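The batching effect can be modeled in a few lines. This sketch (my own, and a simplification of the job semantics above) assumes a flush is scheduled a fixed window after the first message of a batch, and anything arriving before the flush rides along:

```ruby
# How many batch broadcasts a debounce window produces for a burst of
# message arrival times (in seconds).
def batch_count(arrival_times, window:)
  batches = 0
  flush_at = nil
  arrival_times.sort.each do |t|
    if flush_at.nil? || t > flush_at
      batches += 1          # first message of a new batch schedules a flush
      flush_at = t + window # later messages before flush_at ride along
    end
  end
  batches
end

batch_count([0.00, 0.02, 0.05, 0.08, 0.09], window: 0.1) # => 1
batch_count([0.0, 0.5, 1.0], window: 0.1)                # => 3
```

Five messages inside one window cost one render; the same five spread out cost five, which is why debouncing pays off only in bursty channels.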
Russian doll caching
Rails fragment caching can intercept the per-subscriber rendering cost by caching the rendered HTML output of each partial.
```erb
<%# Russian doll caching wraps partial fragments in cache blocks.
  # If the cache key matches, Rails serves cached HTML without
  # executing the partial — and without firing any queries. %>

<%# app/views/messages/_message.html.erb %>
<% cache message do %>
  <div id="<%= dom_id(message) %>" class="message">
    <div class="message-header">
      <% cache [message, message.user] do %>
        <span class="author"><%= message.user.display_name %></span>
        <span class="badge"><%= message.user.role_in(message.room) %></span>
      <% end %>
    </div>
    <div class="message-body">
      <%= message.formatted_body %>
    </div>
    <div class="message-meta">
      <span class="reactions"><%= message.reactions.count %> reactions</span>
    </div>
  </div>
<% end %>

<%# First render for a new message: cache miss, all queries fire.
  # Subsequent renders (remaining 99 subscribers): cache hit.
  # Zero queries. Rails serves the cached HTML fragment.
  #
  # Reduction: from 600 queries to ~6 (first render only).
  #
  # The catch: cache invalidation.
  # message.reactions.count changes every time someone reacts.
  # If the cache key includes updated_at, a reaction invalidates
  # the cache for ALL fragments that include the message.
  # And the next broadcast renders all 100 from scratch again.
  #
  # Russian doll caching works best for content that changes
  # infrequently. For live, reactive data — reactions, read counts,
  # typing indicators — the cache churn can negate the benefit. %>
```

When it works, Russian doll caching is remarkably effective. The first subscriber's render fires all 6 queries and stores the result in Rails.cache. The remaining 99 subscriber renders hit the cache and fire zero queries. Total cost: 6 queries instead of 600.
When it does not work — and this is the honest part — it fails in a way that is difficult to diagnose. If any data in the partial changes frequently (reaction counts, read receipts, "last seen" timestamps), the cache invalidates on every change, and the next broadcast re-renders from scratch. A partial with message.reactions.count that updates every time someone adds an emoji will churn the cache so rapidly that caching provides no benefit. You have added complexity without reducing load.
Fragment caching also introduces a class of bugs that are uniquely frustrating: stale content served from cache. A user updates their display name. The cached partial still shows the old name until the cache key expires or is explicitly invalidated. The correct cache key design prevents this, but the correct cache key design for a partial with 6 associations and a count aggregate is not trivial to get right.
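One way to limit the churn, sketched below under the assumption that the volatile data can be rendered separately: keep the reaction count outside the cached fragment, so a reaction does not invalidate the message body.

```erb
<%# Cache only the stable content. %>
<% cache message do %>
  <div class="message-body">
    <%= message.formatted_body %>
  </div>
<% end %>

<%# Not cached: changes with every reaction, but cheap to render alone. %>
<span class="reactions"><%= message.reactions.count %> reactions</span>
```

The cached fragment now survives reactions, at the cost of one COUNT query per render. It narrows the problem rather than solving it, which is the general character of fragment caching here.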
Turbo 8 page morphing
Turbo 8 introduced an alternative model — the Hotwire team documents it in their page refreshes handbook, and it is worth reading if you are considering this path. It sidesteps partial rendering entirely.
```ruby
# Turbo 8 introduces page morphing as an alternative to
# granular Turbo Stream actions (append, prepend, replace).
#
# Instead of rendering a partial per subscriber, the server
# broadcasts a "refresh" signal. Each client re-requests the
# page and Turbo morphs the DOM to match.

# In your model — no partial rendering at all:
class Message < ApplicationRecord
  broadcasts_refreshes
end

# In your layout:
#   <head>
#     <%= turbo_refreshes_with method: :morph, scroll: :preserve %>
#   </head>

# What happens when a message is created:
# 1. Server broadcasts: { action: "refresh" } (no HTML, no queries)
# 2. Each client fetches GET /rooms/12 (standard page load)
# 3. Turbo morphs the existing DOM to match the new response

# Trade-offs:
# + No partial rendering on broadcast — zero extra queries at broadcast time
# + Each client request hits normal Rails caching (fragment, HTTP, etc.)
# - Each client makes a full HTTP request (adds load to web servers)
# - Morph can cause visual flicker on complex pages
# - Requires Turbo 8+ (released late 2023)
```

Instead of rendering a partial and pushing HTML, the server sends a lightweight "refresh" signal. Each client re-requests the page through normal HTTP, and Turbo morphs the DOM to match the new response. No partial rendering on broadcast means zero additional queries at broadcast time. Zero.
The load shifts from broadcast-time database queries to HTTP request-time page renders. Those page renders benefit from standard Rails caching — fragment caching, HTTP caching, and Russian doll caching all apply. It is a fundamentally different performance profile: instead of 600 identical database queries in a burst, you get 100 HTTP requests spread over a few hundred milliseconds, each served from cache.
The trade-offs are real, and I shall enumerate them because they matter for production deployments:
Web server load. Each client makes a full HTTP request. One hundred subscribers means 100 GET requests to your web server within a few hundred milliseconds. If your web server is provisioned for normal traffic patterns, a broadcast to a large channel creates a temporary spike. This is usually manageable — HTTP requests are cheaper than database queries — but it is a load pattern you should monitor.
Visual flicker. Morph replaces DOM nodes that have changed. On complex pages with animations, transitions, or ephemeral UI state (open dropdowns, text selections, scroll positions), the morph can cause visible flicker. Turbo 8's scroll: :preserve helps with scroll position, but it cannot preserve all client-side state. Test with your actual pages, not just with the Turbo demo chat app.
Version requirement. Morph requires Turbo 8 or later, which was released in late 2023. If your application is on Turbo 7 or earlier, upgrading is non-trivial — Turbo 8 changed several behaviors around form submissions and navigation. This is not a "change one line" migration.
Correctness. Because each client fetches the full page and morphs its DOM, per-user permissions work naturally. The admin sees the admin view. The regular user sees the regular view. No render-once compromises required. This is the cleanest solution from a correctness standpoint.
Background job rendering
Moving broadcast rendering to a background job (Sidekiq, GoodJob, SolidQueue) does not reduce the total number of queries. All 600 queries still execute. What it does is remove them from the request cycle. The message creation returns immediately. The broadcast renders asynchronously. Your web server connections are freed.
This is a pragmatic mitigation when you cannot change the rendering strategy but need to prevent broadcast rendering from blocking web requests. It is not a solution to the multiplication problem — it is a deferral. The database still does the work. It just does it on the background job's schedule rather than synchronously in the callback.
I mention it because it is often the first thing teams reach for, and it is important to understand what it does and does not accomplish. It prevents request timeouts. It does not reduce database load.
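Concretely, turbo-rails already ships this deferral: each broadcast helper has a _later variant that enqueues the render through ActiveJob instead of running it inline. A sketch, assuming the same Message model and a "messages" target:

```ruby
# app/models/message.rb
class Message < ApplicationRecord
  belongs_to :room

  # broadcast_append_later_to enqueues the render as a background job.
  # Message.create! returns immediately; the 100 per-subscriber renders
  # (and their 600 queries) happen on the job worker's schedule instead.
  after_create_commit do
    broadcast_append_later_to room, target: "messages"
  end
end
```

The one-character difference between broadcast_append_to and broadcast_append_later_to is the difference between blocking the request cycle and not; the total database work is identical either way.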
Finding broadcast multiplication in production
The pg_stat_statements extension is the fastest way to identify whether broadcast multiplication is affecting your database.
-- Find broadcast-generated query patterns in production:
SELECT query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
WHERE query LIKE '%FROM "users"%"id" = $1%' -- Rails quotes identifiers
OR query LIKE 'SELECT COUNT%"reactions"%'
OR query LIKE '%FROM "rooms"%"id" = $1%'
ORDER BY calls DESC
LIMIT 20;
-- What you are looking for:
-- Queries with extremely high call counts relative to unique parameter values.
-- If SELECT * FROM users WHERE id = $1 has 50,000 calls/hour
-- but only 200 unique user IDs, something is rendering
-- the same user record 250 times per user per hour.
-- That is broadcast multiplication.

The signature is distinctive: queries with very high call counts but very low cardinality of parameter values. If SELECT * FROM users WHERE id = $1 is called 50,000 times per hour but your application only has 200 active users, something is fetching the same users hundreds of times per hour. In a standard N+1 scenario, each query would have a different parameter. In broadcast multiplication, the parameters cluster — the same IDs appear over and over because every subscriber's render loads the same message author.
Cross-reference with your ActionCable subscriber counts. If your busiest room has 80 subscribers and your highest-frequency query is called 80x more than expected, the correlation is not coincidental.
-- Compare call counts during business hours vs. off-hours.
-- Broadcast multiplication correlates with WebSocket subscriber count,
-- which correlates with active users, which peaks during work hours.
-- If your query call counts spike 10x from 2 AM to 10 AM alongside
-- your WebSocket connection count, you have found the correlation.
-- Also check: do call counts spike on specific events?
-- A company all-hands, a product launch, a Slack-style "here" announcement
-- that puts 500 people in one channel at the same time.
-- The query:
SELECT query,
calls,
total_exec_time,
mean_exec_time,
stddev_exec_time,
rows
FROM pg_stat_statements
WHERE calls > 10000
AND mean_exec_time < 1.0 -- fast individually
AND total_exec_time > 5000 -- expensive in aggregate
ORDER BY calls DESC
LIMIT 30;
-- The pattern you are looking for:
-- Extremely fast queries (< 1ms) with extremely high call counts.
-- Each one is cheap. The sum is not.

The temporal correlation is the strongest diagnostic signal. Broadcast multiplication is directly proportional to concurrent WebSocket connections, which is directly proportional to active users, which peaks during business hours. If your query call counts follow the same curve as your ActionCable subscriber counts, you are looking at broadcast multiplication.
A second signal: burst patterns. Normal application traffic produces steady query rates. Broadcast multiplication produces spikes — a sudden burst of identical queries when a message is sent to a popular channel, followed by silence. If pg_stat_statements shows a query with a very high calls count but also a very high stddev_exec_time relative to mean_exec_time, the execution times are not consistent. Some executions are fast (buffer cache warm), some are slower (buffer cache cold at the start of a burst). That variance is the fingerprint of bursty, broadcast-driven traffic.
An honest word about when this does not matter
I should be forthcoming about the scenarios where broadcast multiplication is not a problem worth solving, because overstating the case would be a disservice to you and an embarrassment to me.
Small subscriber counts. If your application's busiest room has 5 subscribers, the multiplication produces 30 additional queries per message. PostgreSQL will not notice. Your monitoring will not notice. Your users will not notice. The total execution time is under 5ms. Optimizing this is engineering theater — effort that produces no measurable improvement. If your subscriber counts are in the single digits, you have better things to work on.
Low message frequency. A channel that receives one message per minute with 100 subscribers generates 600 queries per minute. That is 10 queries per second. For any PostgreSQL instance provisioned for a production workload, 10 queries per second is negligible. The multiplication matters when message frequency and subscriber count are both high — active chat in a large channel, rapid-fire notifications in a busy team, automated alerts flooding a monitoring room.
Partials with no database access. If your broadcast partial is purely static HTML — a typing indicator, a presence status dot, a "user is online" badge — there are no queries to multiply. The broadcast still renders per subscriber, but each render is a template evaluation with no database cost. The concern in this article applies specifically to partials that touch ActiveRecord associations.
The threshold where broadcast multiplication becomes a genuine problem is roughly: subscriber count above 30, queries per partial above 4, and message frequency above a few per minute. Below that, the database handles it without distress. Above it, the cost scales faster than most teams expect.
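That threshold is easy to check against your own numbers. A back-of-the-envelope helper (the function name and formula are mine, not from any library):

```ruby
# Broadcast-driven query rate for one channel:
# each message costs (subscribers * queries_per_partial) SELECTs plus 1 INSERT.
def broadcast_queries_per_second(subscribers:, queries_per_partial:, messages_per_minute:)
  queries_per_message = subscribers * queries_per_partial + 1
  queries_per_message * messages_per_minute / 60.0
end

# The quiet case from above: 100 subscribers, 6 queries, 1 message/minute.
broadcast_queries_per_second(subscribers: 100, queries_per_partial: 6, messages_per_minute: 1)
# => ~10 queries/second; negligible

# The same room during an active discussion: 10 messages/minute.
broadcast_queries_per_second(subscribers: 100, queries_per_partial: 6, messages_per_minute: 10)
# => ~100 queries/second, from a single channel
```

Plug in your busiest room's numbers; if the result is in the hundreds, the multiplication is worth addressing.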
What Gold Lapel does with 100 identical queries
The application-level mitigations above are all sound engineering. Render-once eliminates the problem at the source. Debouncing reduces the frequency. Turbo 8 morphing restructures the architecture entirely. Russian doll caching intercepts repeated renders. Each requires code changes, testing, and trade-off decisions. Each is the right approach in certain circumstances.
Gold Lapel approaches the problem from the other side of the wire.
-- Without Gold Lapel: 100 broadcasts, 100 identical queries
-- PostgreSQL executes each one independently.
-- Query 1: SELECT * FROM users WHERE id = 42; -- 0.3ms (disk/buffer)
-- Query 2: SELECT * FROM users WHERE id = 42; -- 0.3ms (same work)
-- Query 3: SELECT * FROM users WHERE id = 42; -- 0.3ms (same work)
-- ...
-- Query 100: SELECT * FROM users WHERE id = 42; -- 0.3ms (same work)
-- Total: 30ms of PostgreSQL CPU for identical results.
-- With Gold Lapel: local caching intercepts after the first execution.
-- Query 1: SELECT * FROM users WHERE id = 42; -- 0.3ms (hits PostgreSQL)
-- Query 2: SELECT * FROM users WHERE id = 42; -- 0.02ms (LRU cache hit)
-- Query 3: SELECT * FROM users WHERE id = 42; -- 0.02ms (cache hit)
-- ...
-- Query 100: SELECT * FROM users WHERE id = 42; -- 0.02ms (cache hit)
-- Total: 2.3ms. Same 100 queries. 99 served from cache.
--
-- Cache invalidation? Automatic. When an INSERT or UPDATE
-- touches the users table, Gold Lapel evicts affected entries.
-- The next SELECT executes against PostgreSQL. Fresh data. No stale reads.

Gold Lapel is a PostgreSQL proxy that sits between your Rails application and the database. Its local cache operates at the wire protocol level — when the same query with the same parameters arrives, Gold Lapel returns the cached result without touching PostgreSQL. The application is unaware. Rails sends 601 queries. PostgreSQL receives 7.
For broadcast multiplication, this is nearly perfect. The pattern is 100 renders of the same partial, each loading the same user, counting the same reactions, fetching the same room name. The first render's queries hit PostgreSQL. The remaining 99 are served from Gold Lapel's local cache in microseconds. No parse. No plan. No execute. Just a cache lookup and a wire protocol response.
The math: 601 queries reach Gold Lapel. 7 are unique (the INSERT plus 6 distinct SELECTs). 594 are cache hits. PostgreSQL processes 7 queries instead of 601. Your database CPU drops by 99%. Your connection pool pressure drops by 99%. The broadcast that was consuming 300ms of connection time now consumes 3ms.
Cache invalidation is automatic. When the next INSERT or UPDATE modifies the users, reactions, or rooms table, Gold Lapel evicts the affected cache entries. The next query executes against PostgreSQL and populates a fresh cache entry. No stale data. No manual invalidation logic. No cache keys to manage. No touch: true chains to maintain. The proxy observes the write traffic and invalidates accordingly.
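The invalidation model is easy to illustrate in miniature. The sketch below is my own simplification of the idea, not Gold Lapel's implementation: a result cache keyed by (SQL, params), with per-table eviction on writes.

```ruby
# Illustrative only: a query-result cache with table-based invalidation.
class QueryResultCache
  def initialize
    @results = {}                                  # [sql, params] => rows
    @keys_by_table = Hash.new { |h, t| h[t] = [] } # table name => cache keys
  end

  # Returns the cached rows, or executes the block and caches its result.
  def fetch(sql, params, tables:)
    key = [sql, params]
    return @results[key] if @results.key?(key)     # hit: database untouched
    rows = yield                                   # miss: run against PostgreSQL
    tables.each { |t| @keys_by_table[t] << key }
    @results[key] = rows
  end

  # Called when a write (INSERT/UPDATE/DELETE) touches a table.
  def invalidate(table)
    keys = @keys_by_table.delete(table) || []
    keys.each { |key| @results.delete(key) }
  end
end

cache = QueryResultCache.new
executions = 0
run = -> { executions += 1; [{ id: 42, name: "Reese" }] }

# 100 subscriber renders, each asking for the same author:
100.times { cache.fetch('SELECT * FROM users WHERE id = $1', [42], tables: ["users"], &run) }
executions # => 1: the other 99 were cache hits

cache.invalidate("users") # a write touched the users table
cache.fetch('SELECT * FROM users WHERE id = $1', [42], tables: ["users"], &run)
executions # => 2: fresh data after the eviction
```

The real proxy observes wire-protocol write traffic rather than taking a tables: hint, but the hit/miss/evict lifecycle is the same.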
And because Gold Lapel's auto-indexing ensures those queries are already running on optimal indexes, the first execution — the one that actually hits PostgreSQL — is fast too. The combination means broadcast multiplication goes from a scaling crisis to a rounding error.
No query changes. No architectural restructuring. No Turbo version upgrade. No render-once refactoring. No debouncing jobs. Add gem "goldlapel-rails" to your Gemfile and the 601 queries become 7.
I should note what this does not do: it does not reduce the number of partial renders on the Rails side. Your application still renders 100 partials, still instantiates 100 sets of ActiveRecord objects, still builds 100 HTML strings. The CPU cost on the Rails side is unchanged. What changes is the database cost — the queries that back those renders are served from proxy-level cache instead of hitting PostgreSQL. For most applications, the database is the bottleneck, not the Rails renderer. But if your profiling shows that the rendering itself (ERB compilation, string building, WebSocket serialization) is the bottleneck, you need an application-level solution like render-once.
Combining strategies: a practical recommendation
If you have read this far, you may be wondering which strategy to adopt. Allow me to offer a practical recommendation based on the production systems I have observed.
Start with render-once. For every broadcast partial that renders identically for all subscribers — chat messages, activity items, notifications — switch to the render-once pattern. This is the most impactful change. One code modification per model, and the database cost becomes constant regardless of subscriber count. If the partial has per-user elements, use the hybrid approach with lazy Turbo Frames.
Add debouncing for high-velocity channels. If any of your rooms receive bursts of messages (more than 2-3 per second), debouncing reduces the number of broadcasts. Combined with render-once, the reduction is dramatic. Five messages to 100 subscribers goes from 3,000 queries to 8.
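A debouncer's effect is easy to model. The class below is an illustrative simulation (the names and window are mine; a production version would schedule one broadcast job per window): it lets the first broadcast in a window through and coalesces the rest, which is how five rapid messages collapse into one render pass.

```ruby
# Coalesce broadcasts: at most one per room per window.
class BroadcastDebouncer
  def initialize(window_seconds)
    @window = window_seconds
    @last_fired_at = {}
  end

  # true  => broadcast now
  # false => a broadcast fired recently; this one is coalesced
  def fire?(room_id, now)
    last = @last_fired_at[room_id]
    if last.nil? || now - last >= @window
      @last_fired_at[room_id] = now
      true
    else
      false
    end
  end
end

debouncer = BroadcastDebouncer.new(1.0)
# Five messages arriving 100ms apart in one room:
fired = (0...5).count { |i| debouncer.fire?(42, i * 0.1) }
fired # => 1: one render pass instead of five
```

Combined with render-once, that single coalesced broadcast renders one partial, which is where the dramatic reduction comes from.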
Consider Turbo 8 morphing for your next major upgrade. If you are starting a new project or planning a Turbo version upgrade, broadcasts_refreshes is the cleanest long-term architecture. It eliminates partial rendering entirely, handles per-user permissions naturally, and shifts the caching to HTTP-level mechanisms that Rails already excels at. But do not undertake a Turbo 8 migration solely for this benefit — the migration cost is real.
Use a proxy-level cache as a safety net. Application-level fixes address the broadcasts you know about and refactor. A proxy-level cache catches everything — including the broadcasts you have not refactored yet, the new features that introduce new broadcasts, and the edge cases that only appear under production load. The two approaches are complementary, not competing.
The order matters. Render-once is the fix. Debouncing is the optimization. Morphing is the architecture. Proxy caching is the safety net. Each layer reduces what the next layer needs to handle.
I have taken the liberty of preparing a guide on a related theme. The Rails counter_cache contention guide addresses another pattern where a single write triggers surprisingly many database operations — and where the escape route, as with broadcasts, lives below the application code: in that case, materialized views.