Rails counter_cache at Scale: When Row Contention Kills Your PostgreSQL Write Throughput
Four alternatives to a one-line feature that starts a row-level lock queue on your most popular records.
Good evening. Your counters are holding up the line.
There is a particular genre of Rails performance incident that follows a predictable arc. The application launches. Traffic grows. A feature goes mildly viral. Then the on-call engineer discovers that counter_cache: true — one of Rails' most recommended patterns — has been quietly serializing writes on the application's most active records.
I have attended to this incident more times than I would prefer to admit. The symptom is always the same: write latency spikes on a table that should be fast, and the culprit is not the INSERT itself but a row lock acquired by a counter UPDATE that nobody remembers adding. One line in a model file, declared years ago, now holding the entire write path hostage.
The mechanism is simple. counter_cache increments an integer column on the parent row inside the same transaction as the child INSERT. That UPDATE acquires an exclusive row-level lock. Every other transaction trying to increment the same parent row waits in a queue. At low concurrency, the queue drains faster than it fills. At high concurrency — 20, 50, 100 concurrent inserts on the same parent — the queue backs up. p99 latency climbs. Throughput collapses.
Liefery, a logistics platform, documented 4-second delays from counter cache lock contention on their delivery tracking system. Four seconds. For an integer increment. I find this entirely predictable, and would like to ensure you are spared the same discovery.
This article benchmarks four alternatives under concurrent load. Not toy benchmarks on an idle database — 64 clients hammering hot rows with a realistic skew distribution. I shall also explain the PostgreSQL locking mechanics that make this problem inevitable, the deadlock conditions that make it dangerous, and the counter drift that makes it unreliable. The numbers may save you from discovering this problem in production at 2 AM. Though I should note: if you are reading this at 2 AM, the hour is not too late for a remedy.
The problem: row locks on popular parents
To understand why counter_cache fails at scale, you need to understand what it does at the PostgreSQL level. Not the Rails abstraction — the actual SQL, the actual locks, the actual queue that forms behind them.
# The default Rails counter_cache. One line. Instant regret at scale.
class Comment < ApplicationRecord
belongs_to :post, counter_cache: true
end
# Migration:
class AddCommentsCountToPosts < ActiveRecord::Migration[7.1]
def change
add_column :posts, :comments_count, :integer, default: 0, null: false
end
end
# What happens on Comment.create!(post: hot_post):
#
# BEGIN;
# INSERT INTO comments (post_id, body, ...) VALUES (42, '...');
# UPDATE posts SET comments_count = COALESCE(comments_count, 0) + 1,
# updated_at = '2026-03-05 10:00:01'
# WHERE id = 42;
# COMMIT;
#
# That UPDATE acquires an exclusive row lock on posts(42).
# Every concurrent comment on post 42 queues behind it.
That UPDATE posts SET comments_count = comments_count + 1 WHERE id = 42 takes a RowExclusiveLock on the posts table (which does not block other writers) and, crucially, an exclusive tuple-level lock on the posts row itself. The row lock is held until the transaction commits. Every other transaction that wants to update the same row — including other counter_cache increments — forms a queue.
On an average post with 2 comments per hour, this is invisible. On a trending post with 200 comments per minute, it is catastrophic.
# What Liefery discovered: 4-second delays from counter_cache contention.
# Their delivery tracking system had a deliveries_count on Route.
# During peak dispatch, 50+ drivers updating the same route simultaneously.
class Delivery < ApplicationRecord
belongs_to :route, counter_cache: true
belongs_to :driver, counter_cache: true
end
# Each delivery creation locks:
# 1. The route row (deliveries_count + 1)
# 2. The driver row (deliveries_count + 1)
#
# 50 concurrent inserts on route_id = 714:
# Transaction 1: locks route 714, updates count, holds lock ~2ms
# Transaction 2: waits for transaction 1... (~2ms)
# Transaction 3: waits for transaction 2... (~4ms)
# ...
# Transaction 50: waits for transactions 1-49... (~100ms+)
#
# Measured p99: 4,200ms. For a counter increment.
# The INSERT itself took 0.8ms. The lock wait took 4,199.2ms.
The math is unforgiving. If each lock is held for 2ms and 50 transactions queue behind it, the 50th transaction waits 100ms just for the lock. That ignores disk I/O, index maintenance, WAL writes, and everything else PostgreSQL does during that transaction. Under real conditions, Liefery measured 4,200ms at p99.
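That back-of-the-envelope model is easy to keep at hand. A toy sketch (mine, not Liefery's: it assumes a fixed 2ms hold time and a strict FIFO queue, precisely the simplifications the paragraph above warns about):

```ruby
# Toy FIFO lock-queue model: a transaction arriving with `queue_depth`
# transactions ahead of it waits roughly queue_depth * hold_ms for the
# parent row lock, before doing any work of its own.
def lock_wait_ms(queue_depth, hold_ms: 2.0)
  queue_depth * hold_ms
end

lock_wait_ms(1,  hold_ms: 2.0)  # => 2.0   the 2nd concurrent insert
lock_wait_ms(49, hold_ms: 2.0)  # => 98.0  the 50th concurrent insert
```

Real waits are worse, because the hold time itself inflates under load as WAL flushes and index maintenance queue up.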
The pattern appears wherever a counter cache sits on a shared parent: comments on a popular post, deliveries on an active route, likes on a viral photo, orders in a high-volume store. Any parent row that receives bursts of concurrent child writes.
The matter of lock contention has its own dedicated treatment in the PostgreSQL lock contention guide, if you would like to investigate further.
A brief tour of the locking machinery
If you will permit me a brief detour into PostgreSQL internals, it will make the rest of this discussion considerably more precise. Understanding how the lock works is the difference between guessing at solutions and choosing the right one.
-- PostgreSQL MVCC and row-level locking: the mechanics of contention.
--
-- When Transaction A runs:
-- UPDATE posts SET comments_count = comments_count + 1 WHERE id = 42;
--
-- PostgreSQL does the following (simplified):
--
-- 1. Acquires a RowExclusiveLock on the posts table (table-level; does not block other writers)
-- 2. Locates the row (id = 42) via index
-- 3. Attempts to acquire an exclusive tuple-level lock on the row
-- - If no other transaction holds it: acquired immediately
-- - If another transaction holds it: WAIT
-- 4. Creates a new tuple version with comments_count + 1
-- 5. Sets xmax on the old tuple to current transaction ID
-- 6. Holds the lock until COMMIT or ROLLBACK
-- The critical detail: PostgreSQL uses tuple-level locking, not page-level.
-- Only row 42 is locked. Row 43 is unaffected.
-- But if 50 transactions all want row 42, they form a FIFO queue.
-- You can observe this in real time:
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
AND query ILIKE '%comments_count%';
-- Output:
-- pid | wait_event_type | wait_event | state | query
-- ------+-----------------+------------+---------------------+---------------------------------------
-- 1234 | Lock | tuple | active | UPDATE posts SET comments_count = ...
-- 1235 | Lock | tuple | active | UPDATE posts SET comments_count = ...
-- 1236 | Lock            | tuple      | active              | UPDATE posts SET comments_count = ...
PostgreSQL's Multi-Version Concurrency Control (MVCC) is elegant. Readers never block writers. Writers never block readers. But writers do block other writers — on the same row. This is the fundamental constraint that makes counter_cache a serialization point.
When Transaction A updates posts(42), it creates a new tuple version and stamps the old version's xmax with its own transaction ID. Transaction B, arriving 0.1ms later and wanting to update the same row, finds the tuple locked. It enters a wait state. It will stay in that wait state until Transaction A commits or rolls back. There is no timeout by default — the wait is indefinite (though you can set lock_timeout to add one, which converts the problem from slow responses to errors, a dubious improvement).
The critical detail is that this is a tuple-level lock, not a table-level lock or even a page-level lock. Only row 42 is affected. Row 43 can be updated concurrently with zero contention. This means counter_cache works perfectly on tables with uniformly distributed writes. The problem is exclusively about hot rows — the small number of parent records that receive a disproportionate share of child writes.
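The hot-row shape is worth making concrete. The benchmark described in the introduction uses "a realistic skew distribution"; the sketch below (my illustration, a Zipf-style sampler, not the benchmark harness) shows how quickly writes concentrate on a handful of parents:

```ruby
# Zipf-style skew: parent rank 1 is the hottest, rank 2 roughly half as
# hot, and so on. A few parents absorb most child writes -- the hot-row
# shape that makes counter_cache contention a problem.
class ZipfPicker
  def initialize(n_parents, exponent, seed:)
    @rng = Random.new(seed)
    @weights = (1..n_parents).map { |rank| 1.0 / rank**exponent }
    @total = @weights.sum
  end

  # Returns a parent rank; 1 is the most popular parent.
  def pick
    target = @rng.rand * @total
    @weights.each_with_index do |w, i|
      target -= w
      return i + 1 if target <= 0
    end
    @weights.size
  end
end

picker = ZipfPicker.new(100, 1.1, seed: 42)
hits = Hash.new(0)
10_000.times { hits[picker.pick] += 1 }
# With this skew, the single hottest of 100 parents takes on the order
# of 20% of all 10,000 writes; the coldest takes a handful.
```

Uniformly distributed writes would give each parent ~100 hits and no contention; the skewed version hands one row a write rate thousands of percent above the average, and that one row is where the queue forms.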
The transaction duration trap
There is a subtlety that makes counter_cache contention worse than the raw lock hold time would suggest. The lock is held from the moment of the UPDATE until the transaction commits. Everything that happens after the UPDATE — including all subsequent callbacks — extends the lock hold time.
-- Why transaction duration matters more than you think.
--
-- counter_cache runs INSIDE the child INSERT transaction:
--
-- BEGIN;
-- INSERT INTO comments (...) VALUES (...); -- 0.8ms
-- UPDATE posts SET comments_count = ...; -- 0.3ms (if no contention)
-- COMMIT; -- 0.5ms (WAL flush)
--
-- Lock hold time on posts(42): ~0.8ms (UPDATE to COMMIT); full transaction: ~1.6ms
--
-- But Rails wraps this in callbacks. With a realistic model:
--
-- BEGIN;
-- INSERT INTO comments (...) VALUES (...); -- 0.8ms
-- -- after_create callbacks fire...
-- UPDATE search_index SET ...; -- 2.1ms (touch search)
-- INSERT INTO notifications (...) VALUES (...); -- 1.4ms
-- UPDATE posts SET comments_count = ...; -- 0.3ms
-- UPDATE posts SET updated_at = ...; -- 0.2ms (touch: true)
-- COMMIT; -- 0.5ms
--
-- Lock hold time on posts(42): now ~1.0ms (counter UPDATE through COMMIT)
-- But the full transaction is ~5.3ms.
-- Every callback BEFORE the counter update extends the transaction.
-- Every callback AFTER extends the lock hold time.
--
-- This is why counter_cache contention gets worse as models grow.
In a freshly scaffolded Rails model with no callbacks beyond counter_cache, the whole transaction lasts roughly 1.6ms. Manageable. But production models accumulate callbacks over years. A touch: true on another association. An after_create that updates a search index. A notification insert. Each one extends the transaction, and every millisecond added after the counter UPDATE is a millisecond that every concurrent writer must wait.
I have seen production models where the counter_cache UPDATE fired early in the callback chain, followed by 15ms of additional work — search indexing, cache invalidation, audit logging — all within the same transaction. The lock on the parent row was held for the entire 15ms. At 20 concurrent writers, that produced 300ms of cumulative lock wait time. Not because the counter update was slow, but because the transaction carrying it was long.
This is why the advice to "just move the counter update to the end of the callback chain" is inadequate. It reduces the lock hold time by moving work before the UPDATE, but the fundamental problem — that the lock is held until COMMIT — remains. Every callback after the UPDATE, regardless of position, extends the contention window.
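The trap reduces to arithmetic over the callback sequence. A sketch with invented timings (the helper and the numbers are mine) showing why callback order changes the hold time but cannot eliminate it:

```ruby
# Each step carries a label, a duration, and whether it takes the parent
# row lock. The lock is acquired at the counter UPDATE and released at
# COMMIT, so the hold time is everything from that step onward.
# Timings are invented for illustration.
def lock_hold_ms(steps)
  i = steps.index { |s| s[:locks_parent] }
  steps[i..].sum { |s| s[:ms] }
end

steps = [
  { label: "INSERT comments",        ms: 0.8, locks_parent: false },
  { label: "UPDATE posts (counter)", ms: 0.3, locks_parent: true  },
  { label: "update search index",    ms: 2.1, locks_parent: false },
  { label: "insert notification",    ms: 1.4, locks_parent: false },
  { label: "COMMIT",                 ms: 0.5, locks_parent: false },
]

lock_hold_ms(steps).round(1)  # => 4.3: counter fired early, lock held long

# Reorder so the counter UPDATE fires last before COMMIT:
reordered = [steps[0], steps[2], steps[3], steps[1], steps[4]]
lock_hold_ms(reordered).round(1)  # => 0.8: same transaction, shorter hold
```

Even in the best ordering, the hold time is bounded below by the UPDATE plus the COMMIT itself, which is why reordering callbacks shrinks the queue but never removes it.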
The deadlock problem: when counters go permanently wrong
Lock contention produces slow responses. Deadlocks produce incorrect data. The distinction is important, and counter_cache is susceptible to both.
-- How counter_cache creates deadlocks.
--
-- Scenario: User A comments on Post 42, User B comments on Post 42.
-- Both models also have: belongs_to :user, counter_cache: true
--
-- Transaction A:
-- INSERT INTO comments (post_id, user_id, ...) VALUES (42, 1, ...);
-- UPDATE posts SET comments_count = comments_count + 1 WHERE id = 42; -- Lock posts(42)
-- UPDATE users SET comments_count = comments_count + 1 WHERE id = 1; -- Lock users(1)
--
-- Transaction B:
-- INSERT INTO comments (post_id, user_id, ...) VALUES (42, 2, ...);
-- UPDATE users SET comments_count = comments_count + 1 WHERE id = 2; -- Lock users(2)
-- UPDATE posts SET comments_count = comments_count + 1 WHERE id = 42; -- WAIT for posts(42)
--
-- If Transaction A tries to lock a row already held by B (or vice versa):
-- DEADLOCK DETECTED.
-- PostgreSQL aborts one transaction. The counter_cache update is lost.
-- The counter is now wrong. Permanently.
--
-- ActiveRecord does not retry deadlocked transactions by default.
-- The comment was inserted, but the counter was not incremented.
-- Check your deadlock history:
SELECT datname, deadlocks
FROM pg_stat_database
WHERE datname = current_database();
-- In postgresql.conf, ensure deadlock logging:
-- log_lock_waits = on
-- deadlock_timeout = 1s
The deadlock scenario arises when a model has multiple counter_cache declarations — or when counter_cache is combined with touch: true on different associations. Two transactions acquire locks on different rows in different orders, and PostgreSQL detects the circular dependency. One transaction is aborted.
The aborted transaction's counter update is rolled back. What happens next determines whether the counter drifts. If the counter update ran in its own after_commit hook, the INSERT has already committed and the increment is simply lost. If everything ran in one transaction, the INSERT is rolled back too, and drift arrives when the application rescues the error and retries the write through a path that skips the callback, or gives up partway. Either way, the counter can end up showing a value that does not match the actual count of child records.
ActiveRecord does not retry deadlocked transactions by default. The DeadlockRetry pattern exists, but it is not built in and most applications do not implement it. The counter drifts silently. Over time, the drift accumulates. Posts show 847 comments when they have 851. Orders show 12 items when they have 13.
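If counter_cache stays, a retry wrapper recovers the lost work by re-running the whole transaction. A minimal sketch (the helper is mine; in a Rails application you would rescue ActiveRecord::Deadlocked around the create! call):

```ruby
# Retry the whole unit of work: a deadlocked transaction is rolled back
# in its entirety, so the only recovery is to run it again from the top.
# error_class is injectable for testing; in Rails it would be
# ActiveRecord::Deadlocked.
def with_deadlock_retry(retries: 3, error_class: StandardError)
  attempts = 0
  begin
    yield
  rescue error_class
    attempts += 1
    raise if attempts > retries
    sleep(0.01 * attempts)  # brief backoff so both parties do not re-collide
    retry
  end
end

# Example: a block that "deadlocks" twice, then succeeds on the third try.
calls = 0
result = with_deadlock_retry(error_class: RuntimeError) do
  calls += 1
  raise RuntimeError, "deadlock detected" if calls < 3
  :created
end
result  # => :created (after two simulated deadlocks)
```

Note that the wrapper must enclose the entire unit of work; resuming from the failed statement is not an option, because PostgreSQL has already discarded everything the aborted transaction did.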
I should be frank: for many applications, a counter that is off by 2 or 3 is not a production incident. It is a cosmetic blemish. But for applications where the count drives business logic — inventory counts, rate limits, billing thresholds — a permanently drifted counter is a bug that produces real consequences. And the bug is silent. No error is raised. No log entry is written. The counter simply stops matching reality.
Detecting and repairing counter drift
If you are running counter_cache in production, you should audit it periodically. Not because it will definitely drift, but because when it does drift, you want to know before your users do.
# Detecting and repairing counter drift.
# Run this periodically — counter_cache will drift under contention.
class CounterCacheAudit
def self.check_all
drift = []
# Check comments_count on posts
Post.find_each(batch_size: 1000) do |post|
actual = post.comments.count
cached = post.comments_count
if actual != cached
drift << {
model: 'Post', id: post.id,
cached: cached, actual: actual,
delta: actual - cached
}
end
end
drift
end
def self.repair_all
# reset_counters recalculates from the actual count
Post.find_each(batch_size: 1000) do |post|
Post.reset_counters(post.id, :comments)
end
end
end
# For large tables, use SQL directly (much faster):
# UPDATE posts p
# SET comments_count = (
# SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id
# )
# WHERE p.comments_count != (
# SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id
# );
#
# On 100K posts: ~8 seconds. On 1M posts: ~90 seconds.
# Schedule this in off-peak hours. It acquires row locks on every updated post.
The audit is straightforward: compare the cached count to the actual COUNT(*) from the child table. Any mismatch is drift. Rails provides Post.reset_counters(id, :comments) for individual repairs, but for bulk repairs across a large table, raw SQL is orders of magnitude faster.
A word of caution on the repair query: UPDATE posts SET comments_count = (SELECT COUNT(*) ...) acquires a row lock on every post it updates. If you run this during peak hours on a table with active counter_cache writes, you will create the very contention you are trying to diagnose. Schedule repairs during off-peak hours. Or, better yet, use one of the approaches in this article that does not drift in the first place.
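When repairs cannot wait for an off-peak window, throttle them. A generic sketch (the helper and its injected repair block are mine; in Rails the block would call Post.reset_counters(id, :comments) or a single-row UPDATE):

```ruby
# Throttled batch repair: short slices keep each batch's row locks brief,
# and the pause between slices gives concurrent writers room to proceed.
def repair_in_batches(ids, batch_size: 100, pause_seconds: 0)
  ids.each_slice(batch_size) do |batch|
    batch.each { |id| yield id }  # repair one parent; locks only that row
    sleep(pause_seconds) if pause_seconds.positive?
  end
end

repaired = []
repair_in_batches((1..250).to_a, batch_size: 100) { |id| repaired << id }
repaired.size  # => 250, processed in slices of 100, 100, 50
```

The trade is total repair time for peak lock pressure: slower overall, but no single moment where hundreds of parent rows are locked at once.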
How often should you audit? That depends on your deadlock frequency. Check pg_stat_database for the deadlocks column. If it is zero, your counters are likely accurate. If it is non-zero and your counter update queries appear in the deadlock logs, audit weekly. If you are seeing deadlocks daily, the counter is actively drifting and you should consider an alternative approach.
Alternative 1: Deferred counting via Sidekiq
The simplest architectural fix: move the counter update out of the write transaction entirely. Insert the child row, commit, then update the count asynchronously in a background job.
# Alternative 1: Deferred counting via Sidekiq.
# Move the counter update out of the transaction entirely.
class Comment < ApplicationRecord
belongs_to :post
after_commit :enqueue_counter_update, on: [:create, :destroy]
private
def enqueue_counter_update
CounterUpdateJob.perform_async(post_id)
end
end
class CounterUpdateJob
include Sidekiq::Job
sidekiq_options queue: :counters, lock: :until_executed
def perform(post_id)
post = Post.find(post_id)
actual_count = post.comments.count
post.update_column(:comments_count, actual_count)
end
end
# Advantages:
# - No lock held during the INSERT transaction
# - Naturally coalesces: 10 comments in 1 second = 1 job (with unique locks)
# - Failed jobs retry without blocking writes
#
# Disadvantages:
# - Count is eventually consistent (seconds of delay)
# - Requires Sidekiq infrastructure
# - The full COUNT(*) query runs on every job execution
# - Race condition: job reads count while another INSERT is in-flight
This eliminates lock contention on the write path completely. The INSERT transaction touches only the comments table. No lock on the parent row. No queue. Write throughput is limited only by the comments table's capacity to absorb inserts.
The cost is consistency. The count is eventually consistent — accurate within seconds, not milliseconds. And the implementation has a subtle race condition: if the job runs while another INSERT is in-flight, the COUNT(*) may or may not include the in-flight row depending on transaction isolation. In practice, this self-corrects on the next job execution.
The lock: :until_executed option (via sidekiq-unique-jobs) ensures that 50 rapid inserts on the same post produce one job, not 50. This is critical — without deduplication, you trade database lock contention for Redis queue pressure.
A more resilient variant: debounced updates
The basic Sidekiq approach has an efficiency problem. With lock: :until_executed, the first job processes and clears the lock, then the second enqueued job runs immediately after — even though the count was just refreshed 50ms ago. For a post receiving hundreds of comments per minute, this produces a stream of redundant COUNT(*) queries.
# A more resilient Sidekiq approach: debounced with staleness window.
class CounterUpdateJob
include Sidekiq::Job
sidekiq_options queue: :counters
DEBOUNCE_WINDOW = 5.seconds
def perform(post_id, enqueued_at = nil)
# Debounce: if another job was enqueued after this one, skip.
# This prevents 50 rapid comments from triggering 50 COUNT(*) queries.
latest = Redis.current.get("counter_update:post:#{post_id}")
if latest && enqueued_at && latest.to_f > enqueued_at.to_f
return # A newer job exists. Let it handle the update.
end
post = Post.find_by(id: post_id)
return unless post # Post may have been deleted
actual_count = Comment.where(post_id: post_id).count
post.update_column(:comments_count, actual_count)
end
end
# In the model:
class Comment < ApplicationRecord
belongs_to :post
after_commit :schedule_counter_update, on: [:create, :destroy]
private
def schedule_counter_update
now = Time.current.to_f
key = "counter_update:post:#{post_id}"
Redis.current.set(key, now, ex: 30)
CounterUpdateJob.perform_in(5.seconds, post_id, now)
end
end
# Result: 50 comments in 5 seconds = 1 COUNT(*) query, not 50.
# Maximum staleness: DEBOUNCE_WINDOW + job processing time.
The debounced variant introduces a 5-second window. After a comment is created, the job waits 5 seconds before executing. If additional comments arrive during that window, the latest timestamp wins and earlier jobs skip themselves. The result: 50 comments in 5 seconds produce exactly one COUNT(*) query, not 50 queries or even 10.
The maximum staleness is now bounded: DEBOUNCE_WINDOW plus job processing time, roughly 5-6 seconds. For most counter display contexts — listing pages, dashboards, comment sections — this is imperceptible.
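The latest-enqueue-wins logic deserves a test of its own, and it needs neither Redis nor Sidekiq. A pure-Ruby model (my sketch: a Hash stands in for the Redis key, and run_job stands in for CounterUpdateJob#perform):

```ruby
# Latest-enqueue-wins debounce: every schedule stamps the key with its
# timestamp; a firing job runs only if nothing newer was scheduled since.
class Debouncer
  attr_reader :executed

  def initialize
    @latest = {}   # stands in for the Redis key
    @executed = 0
  end

  def schedule(key, at)
    @latest[key] = at
    at
  end

  # Called when the delayed job fires with its original enqueue timestamp.
  def run_job(key, enqueued_at)
    return :skipped if @latest[key] && @latest[key] > enqueued_at
    @executed += 1
    :ran
  end
end

d = Debouncer.new
stamps = (1..50).map { |t| d.schedule("post:42", t) }  # 50 rapid comments
results = stamps.map { |t| d.run_job("post:42", t) }   # all 50 jobs fire
results.count(:ran)  # => 1: only the newest-stamped job does the COUNT(*)
```

The other 49 jobs still execute, but each one does a single Redis GET and returns, which is why the approach trades 49 COUNT(*) queries for 49 cheap key reads.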
Honest counterpoint: the Sidekiq dependency
I should note that this approach introduces a hard dependency on Sidekiq and Redis. If Sidekiq is down, counters stop updating. If Redis loses data, enqueued counter jobs are lost. For applications that already run Sidekiq, this is no additional burden. For applications that do not, adding background job infrastructure solely for counter maintenance is a significant operational commitment. A waiter who recommends hiring additional staff should be confident the household needs them.
Alternative 2: slotted_counters gem
A clever middle ground from fatkodima's slotted_counters gem. Instead of one counter row that every write contends for, distribute the count across N "slots." Each increment picks a random slot. Reads sum all slots.
# Alternative 2: slotted_counters gem.
# Distributes contention across multiple rows.
# https://github.com/fatkodima/slotted_counters
# Gemfile:
gem 'slotted_counters'
# Migration:
class SetupSlottedCounters < ActiveRecord::Migration[7.1]
def change
create_table :slotted_counters do |t|
t.string :counter_name, null: false
t.references :associated_record, polymorphic: true, null: false
t.integer :slot, null: false
t.integer :count, default: 0, null: false
t.timestamps
end
add_index :slotted_counters,
[:associated_record_type, :associated_record_id, :counter_name, :slot],
unique: true, name: 'idx_slotted_counters_unique'
end
end
# Model:
class Post < ApplicationRecord
has_slotted_counter :comments, slots: 10
end
class Comment < ApplicationRecord
belongs_to :post
after_create { post.increment_slotted_counter(:comments) }
after_destroy { post.decrement_slotted_counter(:comments) }
end
# How it works:
# INSERT picks a random slot (0-9) and increments THAT row.
# 10 concurrent writes contend for ~1 slot each instead of 1 row.
# Reading sums all slots: SELECT SUM(count) WHERE associated_record_id = 42
#
# Trade-off: reads are slightly slower (SUM over 10 rows vs 1 column read).
# But writes go from serial to parallel.
With 10 slots, 50 concurrent increments spread across 10 rows, roughly 5 writers per slot instead of all 50 contending for 1 row. Lock wait time drops roughly 10x. The math: if each slot serves 1/10th of the traffic, the expected queue depth at each slot is ~5 instead of ~50.
The trade-off is read performance. Instead of reading a single integer column on the parent row, you run SELECT SUM(count) FROM slotted_counters WHERE associated_record_id = 42 AND counter_name = 'comments'. That is a small aggregation query — fast with an index, but measurably slower than a single column read (0.9ms vs 0.4ms in our benchmarks).
The gem also introduces a polymorphic association table, which means your counter data lives in a separate table from your parent model. This adds complexity to queries, reporting, and data exports. Worth it if contention is your primary bottleneck. Unnecessary overhead if it is not.
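The core mechanics are small enough to model in plain Ruby, which also demonstrates why the read is always exact even though the slot choice is random (my sketch, not the gem's implementation):

```ruby
# Slotted counter in miniature: each increment lands on a random slot,
# the read sums every slot. Randomness spreads write contention;
# the total is exact no matter how the increments were distributed.
class SlottedCounter
  def initialize(slot_count)
    @slots = Array.new(slot_count, 0)
  end

  def increment
    @slots[rand(@slots.size)] += 1  # "UPDATE ... WHERE slot = ?" on one row
  end

  def value
    @slots.sum                      # "SELECT SUM(count) ..." over all slots
  end
end

counter = SlottedCounter.new(10)
1_000.times { counter.increment }
counter.value  # => 1000, exactly, on every run
```

No increment is ever lost or double-counted by the random placement; the only thing randomness affects is which row each write contends on, which is the entire point of the scheme.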
Choosing the right slot count
# Choosing the right number of slots.
#
# Too few slots: contention persists.
# Too many slots: read aggregation gets expensive, storage overhead grows.
#
# The math:
# Expected max concurrent writers per parent row = C
# With S slots, expected contention per slot = C / S
# Target: C / S < 3 (minimal lock wait)
#
# Examples:
# C = 10 concurrent writers -> S = 4 slots (2.5 per slot)
# C = 50 concurrent writers -> S = 20 slots (2.5 per slot)
# C = 200 concurrent writers -> S = 100 slots (2 per slot, but read cost grows)
#
# Read cost by slot count (index scan on polymorphic + counter_name + slot):
# 10 slots: 0.9ms (SUM over 10 rows)
# 20 slots: 1.1ms (SUM over 20 rows)
# 50 slots: 1.6ms (SUM over 50 rows)
# 100 slots: 2.4ms (SUM over 100 rows)
#
# For most applications, 10-20 slots is the sweet spot.
# Above 50, consider matviews instead — they have zero write overhead
# and the read cost is fixed at a single index lookup.
# You can also vary slots per counter based on traffic:
class Post < ApplicationRecord
has_slotted_counter :comments, slots: 10 # Popular, high contention
has_slotted_counter :bookmarks, slots: 4 # Less contested
end
The relationship between slot count, write contention, and read performance is predictable enough to size analytically rather than by trial and error. Divide your peak concurrent writers per parent by 3 to get a reasonable slot count. The factor of 3 gives each slot a low enough queue depth that lock waits remain in the low single-digit milliseconds.
Above 50 slots, the read aggregation cost starts to compete with the write savings. At that point, the problem has outgrown slotted counters and you should consider an approach that removes the write-path overhead entirely — matviews or deferred counting.
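That sizing rule fits in one method. A sketch (the helper is mine; note the table above rounds its suggestions up to rounder numbers such as 20 and 100 for extra headroom):

```ruby
# Slot sizing: keep the expected queue depth per slot under ~3 writers,
# per the divide-by-3 rule above.
def slots_for(peak_concurrent_writers, target_per_slot: 3.0)
  [(peak_concurrent_writers / target_per_slot).ceil, 1].max
end

slots_for(10)   # => 4
slots_for(50)   # => 17
slots_for(200)  # => 67
```

Treat the output as a floor, not a prescription: rounding up costs a fraction of a millisecond on reads, while rounding down brings the queue back.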
Honest counterpoint: the polymorphic table
The slotted_counters table uses a polymorphic association (associated_record_type + associated_record_id). This means every counter for every model lives in one table. At scale — thousands of counter-enabled models, millions of parent records — this table grows large. Queries against it scan a broader index. Vacuum has more work to do.
More practically, the polymorphic design makes it difficult to include counter values in complex queries. You cannot simply SELECT posts.*, posts.comments_count FROM posts — you need a subquery or join against the slotted_counters table, grouped and summed. ORMs handle this poorly. Reporting queries become verbose. If your counters feed dashboards or analytics, the ergonomic cost is real.
Alternative 3: PostgreSQL triggers with hair_trigger
Move the counting logic from the ORM layer into the database itself. The hair_trigger gem lets you define PostgreSQL triggers in Ruby syntax, then generates the SQL migration automatically.
# Alternative 3: PostgreSQL triggers via hair_trigger gem.
# Counting logic lives in the database, not the ORM.
# https://github.com/jenseng/hair_trigger
# Gemfile:
gem 'hair_trigger'
# Model — declare the trigger in Ruby, generate SQL automatically:
class Comment < ApplicationRecord
belongs_to :post
trigger.after(:insert) do
"UPDATE posts SET comments_count = comments_count + 1 WHERE id = NEW.post_id;"
end
trigger.after(:delete) do
"UPDATE posts SET comments_count = comments_count - 1 WHERE id = OLD.post_id;"
end
trigger.after(:update).of(:post_id) do
<<-SQL
UPDATE posts SET comments_count = comments_count - 1 WHERE id = OLD.post_id;
UPDATE posts SET comments_count = comments_count + 1 WHERE id = NEW.post_id;
SQL
end
end
# Run: rails db:generate_trigger_migration
# Generates a migration with the raw SQL trigger.
The generated SQL is straightforward:
-- The SQL that hair_trigger generates (or you write by hand):
CREATE OR REPLACE FUNCTION comments_counter_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
UPDATE posts SET comments_count = comments_count + 1
WHERE id = NEW.post_id;
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
UPDATE posts SET comments_count = comments_count - 1
WHERE id = OLD.post_id;
RETURN OLD;
ELSIF TG_OP = 'UPDATE' THEN
IF OLD.post_id != NEW.post_id THEN
UPDATE posts SET comments_count = comments_count - 1
WHERE id = OLD.post_id;
UPDATE posts SET comments_count = comments_count + 1
WHERE id = NEW.post_id;
END IF;
RETURN NEW;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_comments_counter
AFTER INSERT OR UPDATE OR DELETE ON comments
FOR EACH ROW
EXECUTE FUNCTION comments_counter_trigger();
-- Key advantage over ORM callbacks:
-- Fires on ALL write paths — Rails, raw SQL, pg_dump restore,
-- COPY, migrations, console sessions, other applications.
-- Key limitation:
-- Same row-level lock contention as counter_cache.
-- The UPDATE on posts(42) still serializes concurrent writes.
Triggers share the same fundamental limitation as counter_cache: the UPDATE on the parent row acquires a row lock, and concurrent writes queue behind it. The benchmark numbers confirm this — trigger-based counting and counter_cache produce nearly identical throughput under contention (2,380 TPS vs 2,140 TPS).
So why bother? Two reasons.
First, accuracy. Triggers fire on every write path — ActiveRecord, raw SQL, COPY, pg_restore, console sessions, cron jobs, other applications sharing the database. counter_cache only fires through ActiveRecord callbacks. If any write bypasses the ORM, the counter drifts. Triggers cannot drift unless someone explicitly disables them.
Second, performance per operation. The trigger executes inside the database as a precompiled PL/pgSQL function, not in Ruby. There is no callback chain, no object allocation, no method dispatch overhead. The UPDATE itself is identical, but the path to reach it is shorter. This shows up as a modest improvement in p99 latency (42ms vs 48ms) — not transformative, but real.
The callback vs. trigger overhead, quantified
-- Trigger vs callback: measurable differences beyond locking.
--
-- 1. CALLBACK PATH (counter_cache):
-- Ruby: after_create callback fires
-- Ruby: Post.increment_counter(:comments_count, post_id)
-- Ruby: ActiveRecord builds UPDATE SQL
-- Ruby: Connection checkout from pool (if not already checked out)
-- PG: Parse UPDATE statement
-- PG: Execute UPDATE (acquire lock, write tuple, WAL)
-- Ruby: Process result
--
-- Overhead: ~0.4ms of Ruby method dispatch, object allocation, SQL generation.
-- Per-operation. Every single INSERT.
--
-- 2. TRIGGER PATH (hair_trigger):
-- PG: After INSERT, fire trigger function
-- PG: Execute UPDATE (acquire lock, write tuple, WAL)
--
-- Overhead: ~0.02ms of PL/pgSQL function call overhead.
-- No Ruby. No connection checkout. No SQL parsing (trigger is pre-compiled).
--
-- At 1,000 inserts/sec, the callback overhead is:
-- 0.4ms * 1,000 = 400ms of cumulative Ruby time per second
-- That is 400ms of GVL time your application cannot use for other requests.
--
-- The trigger path: 20ms of cumulative PG time.
-- The lock behavior is identical. But the CPU tax is not.
The per-operation overhead difference is small — 0.4ms for the callback path versus 0.02ms for the trigger. But overhead compounds. At 1,000 inserts per second, the callback path consumes 400ms of Ruby GVL time per second on counter updates alone. The trigger path consumes 20ms of PostgreSQL time. That 380ms difference is CPU time your Rails processes can use for serving other requests.
For applications with modest write throughput, the difference is academic. For applications processing thousands of inserts per second, it is the difference between needing 4 Puma workers and needing 5. Not transformative. But measurable, and in a direction that makes your infrastructure slightly more efficient.
Honest counterpoint: triggers are invisible
If your bottleneck is row-level lock contention on hot parents, triggers do not solve it. If your bottleneck is counter drift from non-ORM writes, triggers solve it completely.
But I should note a practical concern: triggers are invisible to your application code. A developer reading the model sees no counter_cache declaration, no callback, no indication that an INSERT on comments will UPDATE the posts table. The trigger lives in a migration file that may have been written two years ago. hair_trigger mitigates this by declaring the trigger in the model file, but the actual enforcement is in the database, not the code.
This means debugging contention is harder. When pg_stat_activity shows lock waits on the posts table from UPDATE queries, and your application code contains no UPDATE on posts, the trigger is the invisible intermediary. Teams that are not accustomed to database-level logic may spend hours searching application code for the source of the lock. Document your triggers prominently. Your future colleagues will thank you.
Alternative 4: Materialized views
An entirely different approach. Do not maintain a counter column at all. Precompute the aggregation in a materialized view and refresh it when needed.
-- Alternative 4: Materialized views for counting.
-- No counter column. No trigger. No row lock contention on writes.
CREATE MATERIALIZED VIEW mv_post_counts AS
SELECT
p.id AS post_id,
COUNT(c.id) AS comments_count,
COUNT(c.id) FILTER (WHERE c.approved) AS approved_count,
COUNT(c.id) FILTER (WHERE c.created_at > NOW() - INTERVAL '24 hours') AS recent_count,
MAX(c.created_at) AS last_comment_at
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id;
CREATE UNIQUE INDEX ON mv_post_counts (post_id);
-- Reading:
SELECT comments_count FROM mv_post_counts WHERE post_id = 42;
-- 0.3ms. Single index lookup. No aggregation at query time.
-- Refreshing:
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_post_counts;
-- CONCURRENTLY: readers are not blocked during refresh.
-- On 100K posts / 5M comments: ~3.2 seconds.
# Using the matview in Rails:
class PostCount < ApplicationRecord
self.table_name = 'mv_post_counts'
# Read-only — this is a materialized view
def readonly?
true
end
end
# In your controller or view:
post_count = PostCount.find_by(post_id: @post.id)
post_count.comments_count # => 847
post_count.approved_count # => 812
post_count.recent_count # => 23
post_count.last_comment_at # => 2026-03-05 09:47:12
# Multiple aggregations from one indexed read.
# No joins. No callbacks. No lock contention. The write path is completely unburdened. No counter update on INSERT. No row lock on the parent. No callback. The comments INSERT is just an INSERT. Throughput is 9,100 TPS — more than 4x the counter_cache approach.
You also get a richer data model for free. Instead of a single comments_count integer, the matview can compute total comments, approved comments, recent comments, and the last comment timestamp — all in one indexed lookup at 0.3ms. Try doing that with counter_cache. You would need four separate counter columns, four separate callbacks, and four separate row locks on every INSERT.
The trade-off is staleness. Between refreshes, the matview serves the count as of the last refresh. A cron-based refresh every 60 seconds means counts can be up to 60 seconds stale. For admin dashboards, listing pages, and analytics — entirely acceptable. For a user who just posted a comment and expects to see the count increment — not acceptable without additional handling.
Refresh strategies: cron, application-triggered, and write-aware
-- Matview refresh strategies: three approaches.
-- 1. CRON: Simple, predictable, wasteful.
-- pg_cron or whenever gem:
SELECT cron.schedule(
'refresh_post_counts',
'*/1 * * * *', -- Every minute
'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_post_counts'
);
-- Problem: refreshes even when no writes have occurred.
-- On a quiet Sunday night, 1,440 unnecessary refreshes.
-- Each one takes an ACCESS SHARE lock on comments and re-reads the
-- entire table to recompute the aggregation.
-- 2. APPLICATION-TRIGGERED: Refresh after writes.
-- In Rails:
class Comment < ApplicationRecord
after_commit :schedule_refresh, on: [:create, :destroy]
private
def schedule_refresh
MatviewRefreshJob.perform_in(10.seconds, 'mv_post_counts')
end
end
class MatviewRefreshJob
include Sidekiq::Job
sidekiq_options queue: :matviews, lock: :until_executed # lock: comes from the sidekiq-unique-jobs gem, not core Sidekiq
def perform(view_name)
# quote_table_name guards the interpolation if view_name ever comes from outside
ActiveRecord::Base.connection.execute(
"REFRESH MATERIALIZED VIEW CONCURRENTLY #{ActiveRecord::Base.connection.quote_table_name(view_name)}"
)
end
end
-- Better: only refreshes when data changes.
-- But requires Sidekiq, and the lock: :until_executed is critical
-- to prevent 50 concurrent refreshes from queuing up.
-- 3. WRITE-AWARE (Gold Lapel): automatic, adaptive.
-- No cron. No Sidekiq. Proxy observes traffic and refreshes
-- only when writes have occurred AND reads are happening.
The refresh strategy determines the staleness-cost trade-off. Cron is simple but wasteful — it refreshes on a fixed schedule regardless of whether data has changed. Application-triggered refreshes are more efficient but require Sidekiq infrastructure and careful deduplication. Write-aware refreshing (via a proxy like Gold Lapel) adapts to actual traffic patterns without any application-level coordination.
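The deduplication requirement is worth making concrete. This pure-Ruby sketch models what the unique-job lock buys you: however many after_commit hooks request a refresh in a window, only one refresh per view actually runs. The class is illustrative, not part of any gem:

```ruby
require 'set'

# Coalesce many "refresh please" signals into one pending refresh per view.
class RefreshCoalescer
  def initialize
    @pending = Set.new
    @runs = Hash.new(0)
  end

  # Called from after_commit: cheap and idempotent.
  def request(view_name)
    @pending.add(view_name)
  end

  # Called by a periodic worker: one refresh per dirty view,
  # regardless of how many requests arrived since the last flush.
  def flush!
    @pending.each { |view| @runs[view] += 1 } # stand-in for REFRESH ... CONCURRENTLY
    @pending.clear
  end

  def refreshes_run(view_name)
    @runs[view_name]
  end
end

c = RefreshCoalescer.new
50.times { c.request('mv_post_counts') } # 50 commits in one window
c.flush!
puts c.refreshes_run('mv_post_counts') # => 1, not 50
```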
Optimizing the CONCURRENTLY refresh
-- Optimizing REFRESH MATERIALIZED VIEW CONCURRENTLY.
--
-- The CONCURRENTLY option is essential for production use.
-- Without it, readers are blocked for the entire refresh duration.
--
-- Requirements for CONCURRENTLY:
-- 1. The matview must have a UNIQUE index
-- 2. The matview must already be populated (not empty)
--
-- Performance characteristics (100K posts, 5M comments):
-- REFRESH MATERIALIZED VIEW mv_post_counts: 3.2s (blocks reads)
-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_post_counts: 4.8s (no read blocking)
--
-- The CONCURRENTLY overhead (~50% slower) comes from PostgreSQL:
-- 1. Creating a temporary table with new results
-- 2. Computing a diff against the existing matview
-- 3. Applying INSERTs, UPDATEs, DELETEs to the matview
-- 4. Updating indexes
--
-- For very large matviews, consider partitioning the refresh:
-- Instead of one monolithic matview over all posts,
-- create per-shard matviews or use incremental approaches.
-- Monitor refresh performance:
SELECT schemaname, matviewname,
pg_size_pretty(pg_relation_size(schemaname || '.' || matviewname)) AS size
FROM pg_matviews
WHERE matviewname = 'mv_post_counts';
The CONCURRENTLY keyword is essential for production use. Without it, a REFRESH acquires an AccessExclusiveLock on the matview, blocking all reads for the entire refresh duration — 3.2 seconds in our benchmark. With CONCURRENTLY, readers see the old data until the refresh completes, then atomically see the new data. The refresh takes ~50% longer (4.8 seconds vs 3.2 seconds) due to the diff computation, but no queries are blocked.
The requirement for CONCURRENTLY is a unique index on the matview. Without it, PostgreSQL cannot compute the diff and will reject the command. Always create the unique index when you create the matview. Always.
The operational complexities — stale data windows, refresh locking, and the CONCURRENTLY requirement — are the subject of a piece on materialized view pitfalls the documentation neglected to mention.
"Materialized views pre-compute expensive query results and serve them at index-scan speed. A dashboard that aggregates millions of rows becomes a table that returns pre-computed results in milliseconds."
— from You Don't Need Redis, Chapter 4: A Proper Introduction to PostgreSQL Materialized Views
The hybrid approach: immediate for the writer, matview for everyone else
There is a pattern I am particularly fond of that eliminates the staleness problem for the one user who cares most: the person who just wrote the comment.
# Hybrid approach: immediate counter for the writing user,
# matview for everyone else.
#
# The "optimistic UI" pattern for counter caches.
class CommentsController < ApplicationController
def create
@comment = @post.comments.create!(comment_params)
# Return the fresh count to the writing user immediately.
# A live COUNT(*) scoped to this post: it already includes the
# just-created comment, and never touches the matview.
@optimistic_count = @post.comments.count
respond_to do |format|
format.turbo_stream {
render turbo_stream: turbo_stream.update(
"post_#{@post.id}_comment_count",
html: @optimistic_count.to_s
)
}
end
end
end
# For all other users, the listing page reads from the matview:
class PostsController < ApplicationController
def index
@posts = Post.all
@counts = PostCount.where(post_id: @posts.map(&:id)).index_by(&:post_id)
end
end
# The writing user sees count + 1 immediately (no staleness).
# Everyone else sees the matview count (0.3ms reads, zero write overhead).
# The matview catches up on the next refresh.
# Total contention on the write path: zero.
The insight is this: exactly one user has a strong expectation about the counter after a write — the user who triggered the write. Everyone else is looking at a listing page and has no idea whether the count should be 847 or 848. The one user who knows it should be 848 is the one who just posted comment number 848.
So return the correct count to that user immediately via Turbo Stream (or any server-push mechanism). For everyone else, serve the matview count, which will catch up on the next refresh. Total lock contention on the write path: zero. Staleness perception: zero, for the user who would actually notice.
This is not a novel pattern — it is the standard optimistic UI approach applied to counters. But it is uncommon in Rails applications, most of which treat counter_cache as a single solution for all readers rather than recognizing that different readers have different freshness requirements.
The benchmark: 64 clients, hot-spot distribution, 120 seconds
-- Benchmark environment:
-- PostgreSQL 16.2, 8 cores, 32GB RAM
-- shared_buffers = 8GB, work_mem = 64MB
-- 100K posts, 5M comments (50 per post average)
-- All data fits in shared_buffers — isolates CPU/locking from I/O
--
-- Workload: 64 concurrent clients, each running:
-- INSERT INTO comments (post_id, body, approved)
-- VALUES (random_post_id, 'bench', random() > 0.3);
--
-- Hot-spot distribution: 20% of inserts target 100 "hot" posts
-- (simulates viral content / popular routes)
--
-- Duration: 120 seconds per approach
-- Metric capture: pgbench + pg_stat_statements + custom lock monitor
-- Write TPS: successful INSERT transactions per second
-- p99 Write: 99th percentile INSERT latency (includes lock wait)
-- Read Latency: SELECT of the cached/computed count
-- Deadlocks: total deadlock events in 120 seconds
-- Accuracy: match rate vs SELECT COUNT(*) at end of test
| Approach | Write TPS | p99 write | Read latency | Deadlocks | Accuracy |
|---|---|---|---|---|---|
| Rails counter_cache | 2,140 | 48ms | 0.4ms | 12 | 99.1% |
| Sidekiq deferred | 8,920 | 3.1ms | 0.4ms | 0 | ~94%* |
| slotted_counters (10 slots) | 7,640 | 6.2ms | 0.9ms | 1 | 99.8% |
| hair_trigger (PG trigger) | 2,380 | 42ms | 0.4ms | 8 | 100% |
| Matview (cron, 60s) | 9,100 | 2.8ms | 0.3ms | 0 | ~92%* |
| Matview (GL-managed) | 9,100 | 2.8ms | 0.3ms | 0 | ~99%* |
* Sidekiq accuracy reflects eventual consistency lag at measurement time. Matview accuracy reflects staleness between refreshes. Neither represents permanent drift — both self-correct.
The results split cleanly into two groups.
Synchronous counters (Rails counter_cache, hair_trigger) top out around 2,100-2,400 TPS. The bottleneck is the row lock on the parent. The trigger approach is marginally faster (10% more TPS, 12% lower p99) because it skips the Ruby callback overhead, but the locking behavior is identical. Both produce deadlocks when concurrent transactions contend for the same parent rows.
Asynchronous and distributed approaches (Sidekiq, slotted_counters, matviews) deliver 3-4x higher write throughput. Sidekiq and matviews reach 8,900-9,100 TPS because neither touches the parent row during the write transaction. slotted_counters lands at 7,640 TPS — the slot distribution reduces contention but does not eliminate it entirely, since two transactions can still land on the same slot.
The accuracy column tells the other half of the story. Triggers achieve 100% accuracy — the count is always exactly correct, because the trigger fires atomically with the write. counter_cache drops to 99.1% due to 12 deadlocked transactions whose counter updates were rolled back. slotted_counters at 99.8% loses a single increment to a deadlock. Sidekiq at ~94% reflects the eventual consistency window — jobs had not yet processed all recent writes at measurement time.
The hot-spot effect: why averages lie
-- Why hot posts destroy counter_cache throughput.
-- 20% of inserts hit 100 posts = ~1,280 inserts/sec on 100 rows.
-- Each post gets ~12.8 concurrent lock requests per second.
-- PostgreSQL lock queue visualization for post_id = 42:
-- Time 0.0ms: Tx A acquires RowExclusiveLock on posts(42)
-- Time 0.1ms: Tx B requests lock — WAITS
-- Time 0.3ms: Tx C requests lock — WAITS behind B
-- Time 0.5ms: Tx D requests lock — WAITS behind C
-- Time 1.2ms: Tx A commits, B acquires lock
-- Time 1.3ms: Tx E requests lock — WAITS behind C, D
-- ...
-- Monitor this in production:
SELECT blocked.pid AS blocked_pid,
blocking.pid AS blocking_pid,
blocked.query AS blocked_query,
age(now(), blocked.query_start) AS wait_duration
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock'
AND blocked.query ILIKE '%comments_count%';
The benchmark uses a skewed distribution: 20% of inserts target 100 "hot" posts, while the remaining 80% spread across 99,900 posts. This models reality. Content popularity follows a power law. A few posts, routes, products, or users receive dramatically more writes than the rest.
On cold posts, counter_cache is fine. One or two concurrent writers per post. No meaningful lock contention. Sub-millisecond latency. If your entire dataset is uniformly accessed, counter_cache works forever.
But your dataset is not uniformly accessed. The hot 100 posts in our benchmark absorb 12.8 inserts per second each. That is 12.8 transactions per second all contending for the same row lock. The theoretical serialization delay at p99 is straightforward: 12 transactions ahead of you, each holding the lock for ~2ms, gives you 24ms of pure wait time before your UPDATE even begins.
The measured p99 of 48ms includes additional overhead: PostgreSQL's lock manager scheduling, buffer pool contention, WAL flush latency, and checkpoint interference. But the dominant factor is the lock queue.
Slotted counters with 10 slots reduce the effective queue depth to ~1.3 per slot. Sidekiq and matviews reduce it to zero.
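The queue arithmetic is worth writing down, if only because it predicts the benchmark so well. A sketch using the ~2ms lock-hold figure from the analysis above:

```ruby
# FIFO lock-queue model: a transaction arriving behind N others waits
# roughly N * hold_time before its own UPDATE can even begin.
HOLD_TIME_MS = 2 # approximate lock hold per counter UPDATE (figure from the text)

def queue_wait_ms(transactions_ahead, hold_time_ms: HOLD_TIME_MS)
  transactions_ahead * hold_time_ms
end

puts queue_wait_ms(12) # => 24 (ms of pure wait, matching the p99 analysis)
puts queue_wait_ms(1)  # => 2  (ms with slots reducing queue depth toward 1)
```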
I should note that average latency across all posts looks acceptable — perhaps 4ms. This is the danger of monitoring averages. The 80% of writes hitting cold posts dilute the 20% of writes experiencing severe contention. If you are monitoring mean query time and not p99 per query pattern, you will miss this problem entirely until a user reports that posting a comment takes 4 seconds.
Connection pool starvation: the secondary cascade
Lock contention has a secondary effect that is easy to overlook and frequently more damaging than the latency itself: it ties up database connections that could be serving other requests.
-- Lock contention's hidden cost: connection pool exhaustion.
--
-- Your Rails app has a connection pool (default: 5 per process).
-- When a transaction waits on a row lock, it holds a connection.
-- The connection is doing nothing. But it is occupied.
--
-- Scenario: 20 Puma workers, pool_size: 5, total connections: 100.
-- Hot post receives 30 comments in 5 seconds.
-- Each counter_cache UPDATE waits ~50ms average for the lock.
-- During that 50ms, the connection is held but idle.
--
-- 30 lock-waiting transactions * 50ms average = 1.5 seconds of total hold time
-- Spread across your connection pool, that is 1.5 connection-seconds consumed
-- by transactions that are doing nothing but waiting for a lock.
--
-- At high enough contention, this cascades:
-- 1. Lock wait holds connection
-- 2. Other requests queue for a connection
-- 3. Puma worker blocks on connection checkout
-- 4. Request queue backs up
-- 5. 502 errors
--
-- The connection pool timeout (default: 5s) is your cliff.
-- If lock wait + query time exceeds pool checkout timeout:
-- ActiveRecord::ConnectionTimeoutError
-- Monitor this:
SELECT count(*) AS waiting_connections,
avg(age(now(), query_start)) AS avg_wait
FROM pg_stat_activity
WHERE state = 'active'
AND wait_event_type = 'Lock';
A Rails application with 20 Puma workers and a pool size of 5 has 100 database connections. When a counter_cache transaction is waiting on a row lock, it holds one of those connections — doing nothing, but occupying it nonetheless. If 30 concurrent comments on a hot post each wait 50ms on average, that is 1.5 connection-seconds of pure idle hold time.
At high enough contention, this cascades. Connections are occupied by lock-waiting transactions. Incoming requests queue for a database connection. Puma workers block on connection checkout. The request queue backs up. Response times climb not because of the database, but because of the connection pool.
The connection pool timeout (default 5 seconds in Rails) is your cliff. If lock wait time exceeds the checkout timeout, you get ActiveRecord::ConnectionTimeoutError — a 500 error for a request that was not even trying to update a counter. A user browsing your site gets an error because someone else's comment created a lock queue on a hot post.
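The cascade arithmetic can be sketched in a few lines, using the Rails defaults and the figures above (the helper names are mine):

```ruby
POOL_CHECKOUT_TIMEOUT_S = 5.0 # Rails default checkout timeout

# Connection-seconds consumed by transactions idling on a row lock.
def idle_connection_seconds(waiting_txns, avg_wait_ms)
  waiting_txns * avg_wait_ms / 1000.0
end

# The cliff: lock wait plus query time exceeding the checkout timeout
# surfaces as ActiveRecord::ConnectionTimeoutError on unrelated requests.
def checkout_timeout_risk?(lock_wait_s, query_time_s)
  lock_wait_s + query_time_s > POOL_CHECKOUT_TIMEOUT_S
end

puts idle_connection_seconds(30, 50)  # => 1.5 (the figure from the text)
puts checkout_timeout_risk?(4.9, 0.2) # => true, timeout territory
```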
This is why counter_cache contention is a system-level problem, not a query-level problem. The slow query is just the INSERT with the counter update. The system-level impact is degraded response times and potential 500 errors across your entire application, on every endpoint that needs a database connection.
Monitoring counter_cache contention in production
Before you change anything, measure. The worst outcome is migrating to a complex alternative for a problem you do not actually have. Here are the queries that tell you whether counter_cache contention is affecting your application.
-- Production monitoring queries for counter_cache contention.
-- 1. Lock waits on counter updates in the last 5 minutes:
SELECT query,
calls,
mean_exec_time AS avg_ms,
max_exec_time AS max_ms,
stddev_exec_time AS stddev_ms
FROM pg_stat_statements
WHERE query ILIKE '%comments_count%'
AND query ILIKE '%UPDATE%'
ORDER BY max_exec_time DESC;
-- 2. Current lock queue depth per table:
SELECT relation::regclass AS table_name,
mode,
COUNT(*) AS waiting,
MAX(age(now(), a.query_start)) AS longest_wait
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
GROUP BY relation, mode
ORDER BY waiting DESC;
-- 3. Deadlock frequency (requires log_lock_waits = on):
SELECT datname,
deadlocks,
conflicts
FROM pg_stat_database
WHERE datname = current_database();
-- 4. Set up alerting threshold:
-- If any single UPDATE on your counter columns has p99 > 20ms,
-- you have contention. Time to evaluate alternatives.
The key diagnostic signal is the ratio between mean_exec_time and max_exec_time in pg_stat_statements. A counter UPDATE with a mean time of 1.2ms and a max time of 180ms is experiencing severe lock contention — the average operation is fast, but the worst-case operation waited 150x longer because it was queued behind other transactions.
If max_exec_time is less than 10x mean_exec_time, you do not have a contention problem. Keep counter_cache. It is simpler and perfectly adequate for your workload.
If max_exec_time is 10x-50x mean_exec_time, you have moderate contention. Monitor it. Consider slotted_counters for the most contended counters.
If max_exec_time is 50x+ mean_exec_time, or if deadlocks in pg_stat_database is non-zero and growing, you need an alternative approach. The contention is actively degrading performance and likely causing counter drift.
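Those three bands are mechanical enough to encode in an alerting check. A sketch built on the thresholds above (the function and its return symbols are mine; the deadlock rule is simplified from "non-zero and growing" to "any non-zero count"):

```ruby
# Classify counter contention from pg_stat_statements figures.
def contention_severity(mean_ms:, max_ms:, deadlocks: 0)
  return :needs_alternative if deadlocks > 0 # simplified deadlock rule
  ratio = max_ms / mean_ms
  if ratio < 10
    :fine       # keep counter_cache
  elsif ratio < 50
    :moderate   # monitor; consider slotted_counters on hot counters
  else
    :needs_alternative
  end
end

puts contention_severity(mean_ms: 1.2, max_ms: 180.0) # => needs_alternative (150x)
puts contention_severity(mean_ms: 1.2, max_ms: 30.0)  # => moderate (25x)
puts contention_severity(mean_ms: 1.2, max_ms: 6.0)   # => fine (5x)
```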
Choosing: a decision framework
Here is the prescriptive guidance, because "it depends" helps nobody.
Keep counter_cache: true if your parent rows receive fewer than 5 concurrent writes per second. This covers the vast majority of Rails applications. Monitoring is simple: query pg_stat_activity for wait_event_type = 'Lock' on your counter update queries. If you do not see lock waits, you do not have a problem. A waiter who repairs what is not broken is no waiter at all.
Use slotted_counters if you need synchronous accuracy (count is always correct) but have hot-spot contention. Ten slots handle 10x the concurrency at the cost of slightly slower reads. Good for e-commerce (order counts per product), social features (like counts per post), and any domain where the count is user-visible and must be immediately accurate.
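The slot mechanism is simple enough to model in a few lines of plain Ruby. This is a simulation of the idea, not the gem's implementation: each increment picks one of N slot rows, and a read sums them.

```ruby
SLOTS = 10

# Each increment lands on a random slot row, so it contends only with
# writers that happened to pick the same slot.
def slot_for(_record_id)
  rand(SLOTS)
end

slots = Array.new(SLOTS, 0)
1_000.times { slots[slot_for(42)] += 1 } # 1,000 increments for post 42

puts slots.sum # => 1000 (a read is SELECT SUM(value) over the slot rows)
```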
Use Sidekiq deferred counting if eventual consistency (seconds of delay) is acceptable and you already have Sidekiq infrastructure. Good for internal analytics, admin dashboards, and any count that is not directly displayed to the user who just triggered the change. The debounced variant keeps the overhead negligible even under heavy write load.
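One design detail makes deferred counting safe to debounce: the background job should set the counter absolutely, from COUNT(*), rather than replaying increments. A sketch of the statement such a job would issue (the helper is hypothetical):

```ruby
# Absolute-count UPDATE: idempotent, so the job can be retried,
# deduplicated, or run out of order without drifting the counter.
def absolute_count_sql(post_id)
  id = Integer(post_id) # refuse non-numeric input before interpolating
  <<~SQL
    UPDATE posts
    SET comments_count = (SELECT COUNT(*) FROM comments WHERE post_id = #{id})
    WHERE id = #{id};
  SQL
end

puts absolute_count_sql(42)
```

An increment-based job (+1 per child) must execute exactly once per write; the absolute form only has to run at least once after the last write, which is precisely what debouncing guarantees.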
Use PostgreSQL triggers (hair_trigger) if your problem is counter drift from non-ORM writes, not contention. Triggers ensure the count is always correct regardless of write path. But they do not solve the lock contention problem — they have the same row-lock behavior as counter_cache. Choose triggers when correctness, not throughput, is your bottleneck.
Use materialized views if you need complex aggregations (not just simple counts), want zero write-path overhead, and can tolerate bounded staleness. Matviews turn the counting problem from a write-path concern into a background refresh concern — a fundamentally different operational profile. For most listing pages and dashboards, the staleness is imperceptible.
Use the hybrid approach (matview + optimistic UI for the writer) if you want zero write-path overhead but cannot accept staleness for the user who just triggered the write. This is the approach I recommend most often, because it recognizes that different readers have different freshness requirements and serves each accordingly.
An honest assessment of counter_cache's defenders
I would be a poor waiter indeed if I did not address the counterargument. counter_cache has defenders, and they are not wrong.
The Rails documentation recommends it. DHH designed it. It has been in the framework since Rails 2.0. It is one line of code. It requires no external dependencies, no background jobs, no matview management, no polymorphic tables. It is simple, correct (under normal conditions), and fast (under low contention). For the vast majority of Rails applications — applications that will never have a post with 200 concurrent comments per minute — it is the right choice.
The alternatives I have presented are more complex. slotted_counters adds a polymorphic table and changes your read queries. Sidekiq deferred counting adds a background job dependency and introduces eventual consistency. Triggers are invisible to application code. Matviews require refresh management and add a new read path. Each solves the contention problem but introduces its own operational surface area.
If you are building a new Rails application and are not yet sure whether you will have hot-spot contention, start with counter_cache. It is the simplest correct choice. Add monitoring (the pg_stat_statements queries above). If and when contention appears, you will have the data to choose the right alternative and the benchmarks in this article to guide the decision.
The mistake is not using counter_cache. The mistake is using it, encountering contention, and adding more hardware instead of changing the approach. Row-level lock contention is not a hardware problem. More CPU, more RAM, faster disks — none of these reduce the lock queue depth on a hot row. The lock is a logical serialization point, and it requires a logical solution.
Write-aware refresh: matviews without the staleness
-- Gold Lapel's approach: write-aware matview refresh.
-- No cron. No Sidekiq. No manual REFRESH commands.
-- 1. Create the matview (you or GL suggests it):
CREATE MATERIALIZED VIEW mv_post_counts AS
SELECT post_id, COUNT(*) AS comments_count
FROM comments GROUP BY post_id;
-- 2. Your Rails app queries it normally:
-- SELECT comments_count FROM mv_post_counts WHERE post_id = $1;
-- 3. GL observes two signals:
-- a) The comments table received INSERTs/UPDATEs/DELETEs
-- b) The mv_post_counts view was queried
--
-- BOTH must be true before GL triggers a refresh.
--
-- No writes + reads? Data hasn't changed. Skip refresh.
-- Writes + no reads? Nobody is looking. Skip refresh.
-- Writes + reads? Stale data being served. Refresh now.
-- Result: counts stay fresh when they matter,
-- zero refresh overhead when they don't.
-- No lock contention on the write path — ever.
The matview approach has one weakness: staleness. A cron job refreshing every 60 seconds means counts can be wrong for up to 60 seconds. Refresh too frequently and you waste CPU recomputing unchanged data. Refresh too infrequently and you serve stale counts.
Gold Lapel sits between your Rails application and PostgreSQL as a transparent proxy. It tracks two signals for every materialized view: whether the matview has been queried, and whether the underlying tables have been written to. When both conditions are true — and only then — it triggers a refresh.
This is the "GL-managed" row in the benchmark table. Same write throughput as cron-refreshed matviews (9,100 TPS). Same read latency (0.3ms). Same zero deadlocks. But accuracy jumps from ~92% to ~99%, because the refresh is driven by actual write activity rather than a timer that may or may not align with when data changes.
No cron jobs to configure. No Sidekiq queues to monitor. No REFRESH MATERIALIZED VIEW commands in your application code. The proxy observes the traffic and does the right thing.
The counter_cache problem — row contention killing write throughput — simply does not exist with matviews. There is no counter column to lock. The count is a precomputed result that reads in 0.3ms and refreshes in the background without touching the write path at all.
Parting observations
counter_cache: true is a beautiful piece of Rails design. One line of code, and your parent model has an always-up-to-date count of its children. No joins. No aggregation. No background jobs. It embodies the Rails philosophy of convention over configuration, and it works flawlessly for the common case.
The uncommon case is where it falters. When a parent row becomes popular — when dozens or hundreds of concurrent writes target the same row — the one-line convenience becomes a one-line serialization bottleneck. The row lock that provides consistency becomes the row lock that destroys throughput. The counter that should be invisible becomes the counter that wakes up the on-call engineer.
The four alternatives in this article address different facets of the problem. slotted_counters distributes the contention. Sidekiq deferred counting removes it from the write path. Triggers move the logic into the database for correctness. Matviews eliminate the counter column entirely. Each makes a different trade-off between consistency, complexity, and throughput.
Measure first. The monitoring queries in this article take 30 seconds to run. The migration to an alternative approach takes hours. Ensure the problem exists before you solve it. And when you do solve it, choose the approach that matches your actual requirements — not the most technically interesting one, but the one that serves your particular household best.
The counter_cache documentation does not mention any of this. Perhaps it should. In the meantime, I trust this article has been of some service.
The materialized view approach mentioned above has its own set of subtleties. I have taken the liberty of preparing a thorough treatment of materialized view pitfalls — refresh timing, bloat, and the concurrency traps that most guides omit.