
The Counter Cache Showdown: Four Approaches to Counting, Benchmarked on PostgreSQL

One deceptively simple question — "how many comments does this post have?" — and four fundamentally different philosophies about who should answer it.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 26 min read
We had commissioned a bar chart. It deadlocked during rendering.

Good evening. I understand you have a counting problem.

Not a mathematical counting problem. A performance one. You have a table with millions of rows and another table that needs to know how many of those rows belong to each parent record. SELECT COUNT(*) FROM comments WHERE post_id = 42 is correct, takes 12ms, and runs 40,000 times per day. That is 8 minutes of cumulative CPU time spent answering the same question with decreasing patience.

The standard advice is to cache the count. Store an integer on the parent row, increment on insert, decrement on delete. Rails has a one-line solution for this. It works. Until it doesn't.

Here is the part that surprised me: I could not find a single resource that compared all four major approaches with actual PostgreSQL benchmarks under concurrent load. Plenty of blog posts explain each approach individually. None of them put the approaches in a room together and measured what happens when 32 clients are writing simultaneously.

So that is what we are going to do. Four approaches. Same schema. Same workload. Same PostgreSQL instance. pgbench does not care about your feelings, and neither do deadlock statistics.

First, a word about raw COUNT(*)

Before we discuss caching counts, we should establish what we are caching from. The naive approach — count every time — deserves measurement, not dismissal.

-- For context: what happens without any counter cache at all.
-- Raw COUNT(*) on the comments table.

EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*)
FROM comments WHERE post_id = 42;

--                              QUERY PLAN
-- -------------------------------------------------------------------
-- Aggregate (cost=284.00..284.01 rows=1 width=8)
--            (actual time=11.847..11.848 rows=1 loops=1)
--   ->  Index Only Scan using idx_comments_post_id on comments
--       (cost=0.43..271.50 rows=5000 width=0)
--       (actual time=0.031..10.214 rows=4872 loops=1)
--         Index Cond: (post_id = 42)
--         Heap Fetches: 127
--         Buffers: shared hit=298
-- Planning Time: 0.087 ms
-- Execution Time: 11.883 ms

-- 11.9ms for one count. On a warm cache. With a perfect index.
-- Now multiply by 40,000 calls per day.
-- That is 476,000ms — nearly 8 minutes of CPU time per day
-- answering the same question about the same data.

11.9ms on a warm cache with a perfect index. That number deserves a moment of respectful silence.

It is not bad in isolation. A single 12ms query is fast by most standards. The problem is repetition. When your listing page shows 25 posts, each with a comment count, you are spending 300ms just on counting. When that listing page is hit 1,600 times per day, you are spending 8 minutes of cumulative CPU time on counting alone. When you add approved-comment counts, reply counts, and unique-commenter counts to that page, you are now counting in four dimensions per post, per page load, per visitor.

This is the economic argument for counter caches. Not that COUNT(*) is slow — it is remarkably fast for what it does — but that asking the same question thousands of times when the answer changes rarely is a poor allocation of resources. A well-run household does not send staff to count the silverware before every meal. One counts it, records the number, and updates it when the inventory changes.

I should note, with the frankness this topic deserves, that for many applications raw COUNT(*) is entirely adequate. If your table has tens of thousands of rows and your traffic is modest, the counter cache adds complexity with negligible performance benefit. The techniques in this article matter when you have millions of rows, thousands of concurrent users, or both. For a blog with 200 posts and 10,000 comments, please do not introduce trigger functions on my account.

Approach 1: Rails built-in counter_cache

The simplest option. Add counter_cache: true to the belongs_to association and Rails handles the incrementing and decrementing automatically through ActiveRecord callbacks.

# Rails built-in counter_cache — the gentleman's first attempt.
# In the Comment model:
class Comment < ApplicationRecord
  belongs_to :post, counter_cache: true
end

# Requires a comments_count column on posts:
class AddCommentsCountToPosts < ActiveRecord::Migration[7.1]
  def change
    add_column :posts, :comments_count, :integer, default: 0, null: false

    # Backfill existing counts
    Post.find_each do |post|
      Post.reset_counters(post.id, :comments)
    end
  end
end

# Usage — no query needed:
post.comments_count  # => 47 (reads the cached column)

The appeal is obvious: one line of configuration, no SQL to write, no external dependencies. Rails wraps the counter update in the same transaction as the insert or delete, so the count stays consistent with the data.

Allow me to show you what Rails actually generates under the hood, because the implementation details matter for understanding the performance characteristics.

# What Rails actually does when you create a comment.
# This is the SQL ActiveRecord generates behind the scenes:

# Step 1: Insert the comment
BEGIN;
INSERT INTO comments (post_id, body, created_at, updated_at)
VALUES (42, 'Great post!', NOW(), NOW())
RETURNING id;

# Step 2: Increment the counter (same transaction)
UPDATE posts
SET comments_count = COALESCE(comments_count, 0) + 1
WHERE id = 42;
COMMIT;

# On destroy:
BEGIN;
DELETE FROM comments WHERE id = 1847;
UPDATE posts
SET comments_count = COALESCE(comments_count, 0) - 1
WHERE id = 42;
COMMIT;

# Notice: updated_at is NOT touched by default.
# If your cache keys are tied to updated_at, cached fragments keep
# showing the old count. Add touch: true to the association to opt in
# to invalidation on every comment write.

Two things to notice. First, the UPDATE posts statement acquires an exclusive row lock on the parent post, so every concurrent comment insert on the same post serializes at this point. For most posts this is immaterial — a lock held for 0.3ms on a post that receives one comment per hour is invisible. For a viral post receiving dozens of comments per minute, it becomes measurable. We shall quantify this shortly.

Second, notice that by default the counter UPDATE does not touch the parent's updated_at. If your application uses updated_at for HTTP cache headers, fragment cache keys, or Turbo broadcast triggers, those caches keep serving the stale count until something else modifies the post. Adding touch: true to the association fixes that, at the cost of invalidating the post's cache on every comment insert. Either way, this is a side effect that many teams discover in production rather than during development.

The limitations of built-in counter_cache

The limitations are well-defined, and it is worth being thorough about them because they determine when you graduate to a more capable approach.

Single counts only. You get one counter per association. No conditional counts (approved comments only, published comments only). No filtered counts. If you need approved_comments_count and total_comments_count as separate columns, built-in counter_cache cannot express this.

Single-level associations only. You can count comments per post. You cannot count comments per author across all their posts. Multi-level counting requires traversing the association chain, and counter_cache only looks one level deep.

No bulk operation support. Comment.delete_all bypasses callbacks entirely and leaves your counts wrong until you reconcile manually. The same applies to update_all, insert_all, and any SQL that does not route through ActiveRecord's create/destroy lifecycle. This is not a bug — it is a documented limitation — but it is the single most common source of counter drift in production Rails applications.

Backfill pain. The migration that adds the counter column must also backfill existing counts. On a table with millions of rows, Post.find_each { |p| Post.reset_counters(p.id, :comments) } takes minutes to hours and acquires a brief lock on each row. This is safe but slow, and you need to plan for it.
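Where the per-row reset_counters loop is too slow, a single set-based statement can do the backfill in one pass. A sketch, assuming the posts/comments schema used throughout this article:

```sql
-- One grouped pass over comments instead of one round trip per post.
-- Posts with zero comments keep their default of 0. The IS DISTINCT
-- FROM guard avoids rewriting rows that are already correct.
UPDATE posts p
SET comments_count = sub.actual
FROM (
  SELECT post_id, COUNT(*) AS actual
  FROM comments
  GROUP BY post_id
) sub
WHERE p.id = sub.post_id
  AND p.comments_count IS DISTINCT FROM sub.actual;
```

On millions of rows this still locks each updated post row briefly, but it finishes in one statement rather than one transaction per post.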

For simple, single-level counts on models that are always created and destroyed through ActiveRecord, built-in counter_cache is entirely adequate. It is the right default for small to medium applications where counting is a convenience, not a critical feature. For anything more nuanced, you need the next approach.

Approach 2: counter_culture gem

The counter_culture gem is what you graduate to when built-in counter_cache runs out of features. Conditional counts, multi-level associations, custom column names, and — critically — a batch reconciliation method that fixes drift without locking every row individually.

# counter_culture gem — more capable, same basic approach.
# Handles conditional counts, multiple levels, custom columns.
class Comment < ApplicationRecord
  belongs_to :post

  counter_culture :post,
    column_name: proc { |c| c.approved? ? 'approved_comments_count' : nil },
    column_names: {
      ['comments.approved = ?', true] => 'approved_comments_count'
    }
end

# Supports multi-level:
class Comment < ApplicationRecord
  belongs_to :post
  counter_culture :post
  counter_culture [:post, :author],
    column_name: 'comments_on_posts_count'
end

# Bulk reconciliation (fixes drift):
Comment.counter_culture_fix_counts

The gem still operates through ActiveRecord callbacks, which means it shares the fundamental limitation: anything that bypasses the ORM bypasses the counter. Direct SQL, bulk imports, database-level operations — all of these leave counters wrong.

But counter_culture acknowledges this reality with counter_culture_fix_counts, a method that compares cached counts against actual counts and corrects the discrepancies. Run it nightly and your counters are never more than a day out of sync, even when bulk operations drift them.
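If you want to see the drift before fixing it, the comparison that a reconciliation pass performs can be approximated with a read-only audit query — a sketch against this article's schema:

```sql
-- Posts whose cached count disagrees with reality.
SELECT p.id,
       p.comments_count AS cached,
       COUNT(c.id)      AS actual
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.comments_count
HAVING p.comments_count <> COUNT(c.id);
```

Running this before and after bulk operations tells you how much drift each write path actually introduces.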

Multi-level counters: where capability meets contention

The multi-level counting feature is where counter_culture genuinely differentiates itself — and where the operational complexity begins to accrue interest.

# The multi-level counter: where things get operationally interesting.
# A comment belongs_to a post, which belongs_to an author.
# counter_culture can maintain counts at every level:

class Comment < ApplicationRecord
  belongs_to :post

  # Level 1: comments_count on posts
  counter_culture :post

  # Level 2: total_comments_count on authors (through posts)
  counter_culture [:post, :author],
    column_name: 'total_comments_count'

  # Level 3: conditional count — approved comments per author
  counter_culture [:post, :author],
    column_name: 'approved_comments_count',
    column_names: {
      ['comments.approved = ?', true] => 'approved_comments_count'
    }
end

# On a single Comment.create!, this generates:
#   1. INSERT INTO comments
#   2. UPDATE posts SET comments_count = comments_count + 1
#   3. UPDATE authors SET total_comments_count = total_comments_count + 1
#   4. UPDATE authors SET approved_comments_count = approved_comments_count + 1
#
# Four writes in one transaction. Three row locks acquired.
# If two comments land on the same post by the same author
# at the same instant — you are now in deadlock territory.

Four writes in one transaction. Three separate row locks acquired. The write amplification is considerable: for every comment created, your database performs one INSERT and three UPDATEs, each contending for its respective row lock.

The added complexity shows up in the deadlock numbers. Multi-level counters update multiple parent rows per child insert. When two transactions insert comments on the same post by the same author, they contend for both the posts row lock and the authors row lock. If they acquire them in different order — deadlock. PostgreSQL detects it, kills one transaction, and the counter update for the killed transaction is lost. The counter is now wrong by one. Silently.

I should be fair: for most applications, multi-level counter deadlocks are rare. They require two comments on the same post by the same author at nearly the same instant. But "rare" and "never" are different words with different production consequences, and the applications that need multi-level counters are precisely the high-traffic applications where "rare" events happen regularly.
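When PostgreSQL kills a deadlock victim, the application can retry the whole transaction rather than silently lose the count. A minimal, framework-agnostic sketch — DeadlockError here stands in for whatever your ORM raises (ActiveRecord::Deadlocked in Rails), and the retry must wrap the entire transaction, not just the counter update:

```ruby
# Stand-in for the ORM's deadlock exception.
class DeadlockError < StandardError; end

# Re-run the block a bounded number of times if it is chosen
# as the deadlock victim; re-raise once the budget is spent.
def with_deadlock_retry(max_retries: 3)
  attempts = 0
  begin
    yield
  rescue DeadlockError
    attempts += 1
    retry if attempts <= max_retries
    raise
  end
end

# Example: the block fails twice, then succeeds on the third attempt.
calls = 0
result = with_deadlock_retry do
  calls += 1
  raise DeadlockError if calls < 3
  :committed
end
```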

Approach 3: PostgreSQL triggers

Move the counting logic out of the ORM and into the database. A trigger fires on every INSERT, UPDATE, and DELETE — regardless of whether the change came from Rails, a migration script, a COPY command, or a developer with a psql session and questionable judgement. This is the approach Evil Martians documented in their thorough exploration of production counting strategies.

-- PostgreSQL trigger approach (per Evil Martians' pattern).
-- The counting logic lives in the database, not the ORM.

CREATE OR REPLACE FUNCTION update_comments_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE posts
    SET comments_count = comments_count + 1
    WHERE id = NEW.post_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE posts
    SET comments_count = comments_count - 1
    WHERE id = OLD.post_id;
  ELSIF TG_OP = 'UPDATE' AND OLD.post_id != NEW.post_id THEN
    UPDATE posts
    SET comments_count = comments_count - 1
    WHERE id = OLD.post_id;
    UPDATE posts
    SET comments_count = comments_count + 1
    WHERE id = NEW.post_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_comments_count
AFTER INSERT OR UPDATE OR DELETE ON comments
FOR EACH ROW
EXECUTE FUNCTION update_comments_count();

The trigger approach has a significant advantage that shows up clearly in benchmarks: it never misses. Every write path that touches the comments table fires the trigger. No callbacks to bypass. No ORM to circumvent. The counter is maintained by the database engine itself, in the same transaction as the write.

Conditional counting in triggers

Conditional counting is also straightforward in a trigger, though the logic gets more intricate when you need to handle UPDATE operations that change both the foreign key and the condition column simultaneously.

-- Conditional counting in a trigger — approved comments only.
-- No gem required. No ORM involvement.

CREATE OR REPLACE FUNCTION update_approved_comments_count()
RETURNS TRIGGER AS $$
DECLARE
  old_approved boolean := false;
  new_approved boolean := false;
BEGIN
  IF TG_OP IN ('UPDATE', 'DELETE') THEN
    old_approved := OLD.approved;
  END IF;
  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    new_approved := NEW.approved;
  END IF;

  -- The decrement branch can only fire on UPDATE/DELETE (OLD is set);
  -- the increment branch only on INSERT/UPDATE (NEW is set).
  IF old_approved AND NOT new_approved THEN
    UPDATE posts SET approved_comments_count = approved_comments_count - 1
    WHERE id = OLD.post_id;
  ELSIF new_approved AND NOT old_approved THEN
    UPDATE posts SET approved_comments_count = approved_comments_count + 1
    WHERE id = NEW.post_id;
  END IF;

  -- Handle post_id change while approved
  IF TG_OP = 'UPDATE' AND OLD.post_id != NEW.post_id
     AND old_approved AND new_approved THEN
    UPDATE posts SET approved_comments_count = approved_comments_count - 1
    WHERE id = OLD.post_id;
    UPDATE posts SET approved_comments_count = approved_comments_count + 1
    WHERE id = NEW.post_id;
  END IF;

  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

This function handles every state transition: a comment becoming approved, a comment becoming unapproved, an approved comment moving to a different post, and all the deletion and insertion variants. It is thorough in a way that ORM callbacks rarely are, because the trigger author is forced to think about every possible mutation rather than only the happy path.

Transactional safety: the trigger's defining advantage

If I may draw particular attention to the aspect that separates triggers from every other approach — it is transactional atomicity across all write paths.

-- Why triggers are transactionally bulletproof.
-- Compare these three scenarios:

-- Scenario 1: Application crashes mid-transaction
BEGIN;
INSERT INTO comments (post_id, body) VALUES (42, 'Hello');
-- Trigger fires: UPDATE posts SET comments_count = comments_count + 1 WHERE id = 42
-- Application crashes here. Connection drops.
-- PostgreSQL: ROLLBACK. Both the INSERT and the UPDATE are undone.
-- Counter stays consistent. No drift.

-- Scenario 2: Bulk delete via SQL (bypasses ORM callbacks entirely)
DELETE FROM comments WHERE post_id = 42 AND approved = false;
-- Trigger fires once for EACH deleted row. Counter decrements correctly.
-- Rails counter_cache: counter is now wrong. It has no idea this happened.

-- Scenario 3: COPY for bulk import
COPY comments (post_id, body, approved) FROM '/data/import.csv' CSV;
-- Triggers fire for each imported row. Counters maintained.
-- ORM callbacks: completely bypassed. Counters wrong.

-- The only scenario where triggers fail:
ALTER TABLE comments DISABLE TRIGGER trg_comments_count;
COPY comments FROM '/data/huge_import.csv' CSV;
ALTER TABLE comments ENABLE TRIGGER trg_comments_count;
-- You disabled the trigger for performance. Fair.
-- But now you owe a reconciliation pass.

This is not a theoretical benefit. In production databases, data enters through many doors: the application ORM, admin scripts, data migration jobs, bulk import pipelines, manual fixes by on-call engineers at 3am. ORM callbacks guard only one of those doors. Triggers guard the table itself.

The trade-offs, honestly stated

The trade-off is visibility. Trigger logic is invisible to your application code. A developer reading the Rails model sees no mention of counter caching — the behavior is defined in a SQL migration that may have been written months ago by someone who has since left the company. Debugging requires looking in two places: the application and the database. For teams comfortable with PostgreSQL, this is a non-issue. For teams that treat the database as a black box, it is a significant friction point.

The other trade-off is the same row-lock contention that affects all synchronous counter approaches. The trigger UPDATEs the parent row in the same transaction as the child INSERT. High-concurrency inserts on the same parent row serialize at the UPDATE step. The trigger is faster than the ORM callback — no Ruby overhead, no ActiveRecord object instantiation — but the fundamental bottleneck is the row lock, and no amount of implementation efficiency can eliminate that.

Performance tuning for high-concurrency triggers

For applications where the hot-post problem is real — where a single parent row receives dozens of child inserts per second — there is a technique worth knowing.

-- Performance tuning for high-concurrency trigger counters.
-- The key insight: the trigger UPDATE on the parent row
-- acquires a row-level lock. Under high concurrency, this serializes.

-- Technique 1: Deferred trigger execution
-- Process the counter update at COMMIT time, not INSERT time.
-- Doesn't reduce contention, but can reduce lock hold time.
DROP TRIGGER IF EXISTS trg_comments_count ON comments;
CREATE CONSTRAINT TRIGGER trg_comments_count
AFTER INSERT OR UPDATE OR DELETE ON comments
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE FUNCTION update_comments_count();
-- (A plain AFTER trigger fires within the statement and holds the
-- parent-row lock for the rest of the transaction. A deferred
-- constraint trigger fires at COMMIT, so the lock is acquired at
-- the last possible moment.)

-- Technique 2: Batch counter updates with a queue table
CREATE TABLE counter_updates (
  id bigserial PRIMARY KEY,
  table_name text NOT NULL,
  record_id integer NOT NULL,
  column_name text NOT NULL,
  delta integer NOT NULL,
  created_at timestamptz DEFAULT now()
);

-- Trigger writes to queue instead of updating parent directly:
CREATE OR REPLACE FUNCTION queue_counter_update()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO counter_updates (table_name, record_id, column_name, delta)
    VALUES ('posts', NEW.post_id, 'comments_count', 1);
  ELSIF TG_OP = 'DELETE' THEN
    INSERT INTO counter_updates (table_name, record_id, column_name, delta)
    VALUES ('posts', OLD.post_id, 'comments_count', -1);
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Periodic flush: apply queued deltas in batch
-- This turns row-level contention into a periodic batch operation.
-- Trade-off: counters are eventually consistent, not immediately consistent.

The queue-table approach trades immediate consistency for dramatically reduced contention. Instead of every INSERT contending for the parent row lock, INSERTs write to an append-only queue (which never contends — every INSERT gets its own row). A periodic batch job sums the deltas and applies them. The counter is eventually consistent rather than immediately consistent, but the write path is contention-free.
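The periodic flush the comments describe might look like the following — a sketch that claims the queued rows and applies the summed deltas in one statement, using the counter_updates table defined above:

```sql
-- Drain the queue and apply net deltas in a single pass.
-- DELETE ... RETURNING inside a CTE claims the rows atomically,
-- so a concurrent flush cannot apply the same delta twice.
WITH claimed AS (
  DELETE FROM counter_updates
  WHERE table_name = 'posts' AND column_name = 'comments_count'
  RETURNING record_id, delta
),
summed AS (
  SELECT record_id, SUM(delta) AS total
  FROM claimed
  GROUP BY record_id
)
UPDATE posts p
SET comments_count = p.comments_count + s.total
FROM summed s
WHERE p.id = s.record_id;
```

Run it from cron or pg_cron every few seconds and each hot post receives one UPDATE per flush interval instead of one per comment.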

This is, in a sense, arriving at the same philosophy as materialized views through a different route: decouple the write path from the count maintenance, accept bounded staleness, and eliminate contention. Which brings us to the fourth approach.

Approach 4: materialized views

An entirely different philosophy. Instead of maintaining a counter column on the parent table through increments and decrements, precompute the entire aggregation as a materialized view and refresh it periodically.

-- Materialized view approach — precomputed aggregation.
-- No counter column on the parent table. No trigger maintenance.

CREATE MATERIALIZED VIEW mv_post_comment_counts AS
SELECT
  p.id AS post_id,
  p.title,
  COUNT(c.id) AS total_comments,
  COUNT(c.id) FILTER (WHERE c.approved) AS approved_comments,
  COUNT(c.id) FILTER (WHERE c.created_at > NOW() - INTERVAL '7 days') AS recent_comments,
  MAX(c.created_at) AS last_comment_at
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.title;

CREATE UNIQUE INDEX ON mv_post_comment_counts (post_id);

-- Refresh (blocks reads without CONCURRENTLY):
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_post_comment_counts;

-- Query:
SELECT * FROM mv_post_comment_counts WHERE post_id = 42;
-- 0.3ms. No joins. No counting at query time.

The advantages are compelling. No counter column to maintain. No trigger logic. No deadlocks — the matview refresh is a single bulk operation that does not contend with individual writes. And you get arbitrarily complex aggregations for free: total comments, approved comments, comments from the last 7 days, the timestamp of the most recent comment — all in one view, all at 0.3ms read latency.

Complex aggregations: where matviews earn their keep

The simple count is the entry point, but the real power of matviews becomes apparent when you need aggregations that would be impractical to maintain as counter columns.

-- The real power of matviews: aggregations that would be
-- absurdly expensive to maintain as counter columns.

CREATE MATERIALIZED VIEW mv_post_engagement AS
SELECT
  p.id AS post_id,
  p.title,
  p.author_id,
  COUNT(DISTINCT c.id) AS total_comments,
  COUNT(DISTINCT c.id) FILTER (WHERE c.approved) AS approved_comments,
  COUNT(DISTINCT c.id) FILTER (WHERE c.created_at > NOW() - INTERVAL '24 hours') AS comments_today,
  COUNT(DISTINCT c.user_id) AS unique_commenters,
  COUNT(DISTINCT c.user_id) FILTER (WHERE c.created_at > NOW() - INTERVAL '7 days') AS active_commenters,
  AVG(LENGTH(c.body))::integer AS avg_comment_length,
  MAX(c.created_at) AS last_activity,
  CASE
    WHEN COUNT(c.id) = 0 THEN 'dormant'
    WHEN MAX(c.created_at) < NOW() - INTERVAL '30 days' THEN 'stale'
    WHEN COUNT(c.id) FILTER (WHERE c.created_at > NOW() - INTERVAL '24 hours') > 10 THEN 'hot'
    ELSE 'active'
  END AS engagement_status
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.title, p.author_id;

-- Maintaining 'unique_commenters' or 'engagement_status'
-- as counter columns would require significantly more complex triggers.

Try maintaining unique_commenters as a counter column. You would need to track which users have commented on which posts, increment when a new user's first comment lands, and — the truly nightmarish part — decrement when a user's last comment is deleted. The trigger logic for COUNT(DISTINCT user_id) as a maintained counter is not merely complex; it requires a lookup against the entire comment set for that post on every delete to determine whether the count should decrement. You have replaced a 0.3ms matview read with a trigger that re-scans the post's comments on every delete.
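To make the point concrete, here is roughly what the DELETE path of such a trigger would have to do — a hypothetical plpgsql fragment, assuming the user_id column on comments that the matview references:

```sql
-- Inside a hypothetical DELETE trigger: decrement only if this was
-- the user's last remaining comment on the post. This EXISTS probe
-- runs on every single delete.
IF NOT EXISTS (
  SELECT 1 FROM comments
  WHERE post_id = OLD.post_id
    AND user_id = OLD.user_id
    AND id <> OLD.id
) THEN
  UPDATE posts
  SET unique_commenters = unique_commenters - 1
  WHERE id = OLD.post_id;
END IF;
```

And that still leaves the symmetric INSERT path, plus every UPDATE that moves a comment between posts or users.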

The engagement_status column — dormant, stale, hot, active — is simply impossible to maintain incrementally. It depends on time-windowed aggregations that change with the clock, not with writes. A post that was "hot" yesterday becomes "active" today and "stale" next month without any writes occurring. Counter columns cannot represent this. Matviews recompute it on every refresh.

The staleness question

The disadvantage is equally clear: staleness. A scheduled matview refresh (say, every 5 minutes) means your counts can be up to 5 minutes behind reality. For many applications — dashboards, admin panels, listing pages — this is entirely acceptable. For applications where the user just posted a comment and expects to see the count increment immediately, it is not.

The staleness problem has a second dimension: wasted work. A cron-based refresh runs on schedule whether the underlying data changed or not. If nobody posted a comment in the last 5 minutes, the refresh recomputes the same result. If nobody is looking at the counts, the refresh still runs. You are paying CPU for freshness that nobody consumes.
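One way to avoid the wasted work is to let writes mark the view dirty, and have the scheduled job refresh only when the flag is set. A sketch, with a hypothetical matview_state bookkeeping table:

```sql
-- Hypothetical bookkeeping table: one row per matview.
CREATE TABLE matview_state (
  view_name text PRIMARY KEY,
  dirty boolean NOT NULL DEFAULT false
);

CREATE OR REPLACE FUNCTION mark_counts_dirty()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE matview_state SET dirty = true
  WHERE view_name = 'mv_post_comment_counts';
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Statement-level: fires once per INSERT/UPDATE/DELETE statement,
-- not once per row, so bulk writes cost one flag update.
CREATE TRIGGER trg_mark_counts_dirty
AFTER INSERT OR UPDATE OR DELETE ON comments
FOR EACH STATEMENT
EXECUTE FUNCTION mark_counts_dirty();
```

The refresh job then checks and clears the flag before deciding whether to run the refresh at all; quiet periods cost nothing.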

The CONCURRENTLY trade-off

The choice between REFRESH MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW CONCURRENTLY is a decision that deserves more attention than it typically receives.

-- The CONCURRENTLY refresh trade-off.

-- Without CONCURRENTLY:
REFRESH MATERIALIZED VIEW mv_post_comment_counts;
-- Acquires ACCESS EXCLUSIVE lock on the matview.
-- All reads block until refresh completes.
-- Fast refresh (rewrites entire matview in one pass).
-- Duration on 5M comments: ~3.2 seconds.

-- With CONCURRENTLY:
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_post_comment_counts;
-- Acquires EXCLUSIVE lock briefly at start and end.
-- Reads continue during refresh (they see old data).
-- Slower: builds new version, then diffs against old, then swaps.
-- Duration on 5M comments: ~8.7 seconds.
-- REQUIRES a unique index on the matview.

-- The arithmetic:
-- Non-concurrent: 3.2s of total read blocking, every refresh.
-- Concurrent: 8.7s of CPU work, but reads only block for ~50ms.
-- At 100 reads/second, non-concurrent blocks 320 reads per refresh.
-- Concurrent blocks ~5 reads per refresh.
-- The choice is obvious for any production workload.

For any matview that serves read traffic — which is to say, for any matview worth having — CONCURRENTLY is the only production-viable option. The non-concurrent refresh blocks all reads for the duration of the recomputation, which at 3.2 seconds on our benchmark dataset would produce visible user-facing errors if the refresh happens during traffic hours. The concurrent variant trades CPU time for availability, which is the correct trade in virtually every scenario. (For the full operational picture, including disk bloat and monitoring strategies, see Materialized View Pitfalls the Documentation Neglected to Mention.)

This is where the distinction between scheduled refresh and write-aware refresh becomes important. And where things get interesting.

The benchmark setup

Benchmarks without methodology are anecdotes. Here is the setup.

-- pgbench test schema (simplified for clarity).
-- 100,000 posts, 5M comments. 32 concurrent clients.

CREATE TABLE posts (
  id serial PRIMARY KEY,
  title text NOT NULL,
  comments_count integer DEFAULT 0 NOT NULL,
  approved_comments_count integer DEFAULT 0 NOT NULL
);

CREATE TABLE comments (
  id serial PRIMARY KEY,
  post_id integer NOT NULL REFERENCES posts(id),
  body text NOT NULL,
  approved boolean DEFAULT false NOT NULL,
  created_at timestamptz DEFAULT now() NOT NULL
);

CREATE INDEX idx_comments_post_id ON comments (post_id);
CREATE INDEX idx_comments_approved ON comments (post_id)
  WHERE approved = true;

-- Seed: 100K posts, 50 comments each = 5M comments.
-- pgbench custom script: mixed read/write workload.
-- Each transaction: 1 INSERT + 1 count read.

\set post_id random(1, 100000)

BEGIN;
INSERT INTO comments (post_id, body, approved)
VALUES (:post_id, 'Benchmark comment', random() > 0.3);

-- Read the count (method varies by approach):
-- Built-in/trigger: SELECT comments_count FROM posts WHERE id = :post_id;
-- Matview:          SELECT total_comments FROM mv_post_comment_counts WHERE post_id = :post_id;
-- Raw COUNT:        SELECT COUNT(*) FROM comments WHERE post_id = :post_id;
COMMIT;

-- Run: pgbench -c 32 -j 8 -T 120 -f bench.sql goldlapel_bench

The workload: 32 concurrent clients, each running a mixed transaction (one INSERT into comments, one count read) for 120 seconds. PostgreSQL 16 on an 8-core machine with 32GB RAM, shared_buffers = 8GB, work_mem = 64MB. All data fits in memory — this isolates CPU and locking behavior from I/O.

For each approach, the INSERT triggers the counting mechanism (callback, trigger, or nothing for matviews) and the read fetches the cached count. The matview approaches use a separate refresh — every 60 seconds for scheduled, write-triggered for proxy-managed.

Accuracy is measured by comparing the cached count against SELECT COUNT(*) FROM comments WHERE post_id = ? at the end of the test. A cached count that differs from the real count is marked inaccurate.

I have included raw COUNT(*) as a baseline — the approach you are trying to improve upon. It is useful for calibrating expectations: any counter cache approach that does not substantially outperform raw counting on reads is not earning its complexity.

The results

Approach                       Avg latency   p99 latency   Accuracy   Deadlock rate   Complexity
Raw COUNT(*)                   11.9ms        42.3ms        100%       0%              None
Built-in counter_cache         1.2ms         8.4ms         99.2%      0.34%           Low
counter_culture gem            1.4ms         11.2ms        99.1%      0.41%           Medium
PostgreSQL trigger             1.1ms         6.8ms         100%       0.28%           Medium
Matview (scheduled refresh)    0.3ms         1.1ms         ~95%*      0%              High
Matview (proxy-managed)        0.3ms         1.2ms         ~99%*      0%              Low

* Matview accuracy reflects staleness at point-in-time measurement, not permanent drift. After refresh, accuracy returns to 100%. Scheduled refresh interval: 60 seconds. Proxy-managed refresh triggered by write activity.

Several things stand out.

Raw COUNT(*) is the baseline. 11.9ms average, 42.3ms at p99. Perfectly accurate, zero deadlocks, zero complexity. This is what you are paying to avoid. Every other approach in this table is a trade — accuracy, staleness, or operational complexity in exchange for faster reads.

The synchronous approaches cluster together. Built-in, counter_culture, and triggers all have similar average latencies (1.1-1.4ms) because they all do the same fundamental work — UPDATE a row in the parent table within the same transaction. The trigger approach edges ahead slightly because it avoids the Ruby-level callback overhead. The difference is marginal and would not, on its own, justify the migration to triggers.

p99 latency tells a more honest story. counter_culture's p99 of 11.2ms — nearly double the trigger's 6.8ms — reflects the cost of multi-level counter updates and the occasional retry after a deadlock. The built-in counter_cache falls in between: simpler than counter_culture but still paying the Ruby callback tax. At p99, the trigger approach is nearly 40% faster than counter_culture. That gap widens under higher concurrency.

The matview approaches occupy different territory entirely. 0.3ms average reads with zero deadlocks. The read latency is 4x faster than the fastest synchronous approach and 40x faster than raw counting. Zero deadlocks, because the refresh operation does not compete with individual writes for row-level locks. The operational difference between a 0.3ms read and a 1.1ms read is negligible for a single query, but when you are fetching counts for 25 posts on a listing page, the difference is 7.5ms vs 27.5ms — and that compounds with traffic.

The accuracy gap between scheduled and proxy-managed matview refresh is the key differentiator. Scheduled refresh at 60-second intervals means counts are wrong for up to 60 seconds after a write. Proxy-managed refresh narrows that window substantially by triggering refresh in response to actual writes rather than a fixed schedule.

"Materialized views pre-compute expensive query results and serve them at index-scan speed. A dashboard that aggregates millions of rows becomes a table that returns pre-computed results in milliseconds."

— from You Don't Need Redis, Chapter 4: A Proper Introduction to PostgreSQL Materialized Views

The hot-post problem: what averages conceal

The benchmark numbers above are averages across 100,000 posts. They are honest but incomplete, because they conceal the distribution problem that makes counter caches operationally painful in production.

-- The hot-post problem: when counting goes wrong.
-- Not all posts are created equal.

-- Distribution of comment inserts over 1 hour on a popular blog:
-- Post 1 (viral):     4,200 comments/hour  (72 per minute)
-- Post 2 (trending):    850 comments/hour  (14 per minute)
-- Post 3 (popular):     340 comments/hour  (6 per minute)
-- Posts 4-99,999:         1-5 comments/hour

-- For Post 1, every counter approach is fighting for the same row lock:
-- 72 executions per minute of
--   UPDATE posts SET comments_count = comments_count + 1 WHERE id = 1;
-- Each holds the lock for ~0.5ms. At 32 concurrent clients, you're
-- queuing 2-3 transactions deep on the same row lock during peak bursts.

-- Measured p99 latency for Post 1 specifically:
-- Built-in counter_cache:  34.2ms  (vs 8.4ms average across all posts)
-- counter_culture:         48.7ms  (vs 11.2ms average)
-- PostgreSQL trigger:      28.1ms  (vs 6.8ms average)
-- Matview:                  0.3ms  (no per-row contention)

-- The benchmark averages hide this. A post with 50 comments
-- sees negligible contention. A post with 50 comments per second
-- sees serious contention. Your p99 is determined by your hottest row.

Your p99 latency is not determined by your average post. It is determined by your hottest post. And hot posts are, by definition, the ones your users care about most — the ones with the most traffic, the most engagement, the most visibility. The counter cache approach that struggles under contention struggles precisely where it matters most.

This is the distribution problem that averages hide. A counter cache that performs beautifully across 99,999 posts and terribly on the one viral post is performing terribly in the eyes of the users who are actually looking at it. The matview approach has no hot-post problem because it has no per-row contention. The refresh recomputes all counts in bulk, indifferent to which posts are hot and which are dormant.

I should note the counterpoint: the hot-post problem only manifests at significant scale. If your busiest post receives 10 comments per minute, the serialization delay is negligible. The contention becomes measurable at roughly 30-50 concurrent inserts per second on the same parent row, which corresponds to applications with thousands of concurrent users and viral content dynamics. If your application does not have these characteristics — and most do not — the synchronous approaches serve you well.

Why do counter caches deadlock?

-- Why counter columns deadlock under concurrent load.
-- Two transactions, same post, opposite lock order:

-- Transaction A:                    -- Transaction B:
BEGIN;                               BEGIN;
INSERT INTO comments                 INSERT INTO comments
  (post_id, body)                      (post_id, body)
  VALUES (42, 'First');                VALUES (42, 'Second');
-- Both now try to UPDATE posts SET comments_count = comments_count + 1
-- WHERE id = 42. One succeeds, one waits.
-- With row-level locking, this serializes. Usually fine.

-- But with counter_culture multi-level counters, two code paths can
-- acquire the same locks in OPPOSITE order (say, an insert path vs.
-- a destroy path touching the same counters):
-- A locks posts(42), then wants authors(7)
-- B locks authors(7), then wants posts(42)
-- Each waits on the other. Deadlock. PostgreSQL kills one transaction.

The deadlock rate of 0.28-0.41% sounds trivial until you do the arithmetic. At 10,000 comment inserts per hour — not unusual for a popular application — that is 28-41 deadlocked transactions per hour. Each deadlock kills a transaction, surfaces an error to the user or the background job, and requires a retry. At scale, this becomes operational noise that teams learn to ignore, which is precisely when it becomes dangerous.
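The per-hour arithmetic is a one-liner. A trivial sketch, with the low and high deadlock rates from the benchmark table:

```ruby
# Expected deadlock-killed transactions per hour for a given
# insert rate and a measured per-transaction deadlock probability.
def deadlocks_per_hour(inserts_per_hour, deadlock_rate)
  (inserts_per_hour * deadlock_rate).round
end

deadlocks_per_hour(10_000, 0.0028)  # => 28
deadlocks_per_hour(10_000, 0.0041)  # => 41
```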

How PostgreSQL handles deadlocks

-- How PostgreSQL detects and resolves deadlocks.
-- This is not instant — and the delay matters.

-- PostgreSQL checks for deadlocks every deadlock_timeout interval.
-- Default: 1 second. That means a deadlocked transaction waits
-- for UP TO 1 second before PostgreSQL notices and kills it.

SHOW deadlock_timeout;
-- "1s"

-- When a deadlock is detected, PostgreSQL:
-- 1. Picks the transaction that has done the least work (heuristic)
-- 2. Kills it with: ERROR: deadlock detected
-- 3. The surviving transaction proceeds
-- 4. The killed transaction must be retried by the application

-- In Rails, this surfaces as:
-- ActiveRecord::Deadlocked: PG::TRDeadlockDetected:
--   ERROR: deadlock detected
--   DETAIL: Process 14832 waits for ShareLock on transaction 9847;
--           blocked by process 14836.
--   Process 14836 waits for ShareLock on transaction 9845;
--           blocked by process 14832.

-- counter_culture does NOT automatically retry.
-- Your application must handle this. Most don't.
-- The counter is now wrong by 1. Silently.

The default deadlock_timeout of 1 second means that a deadlocked transaction may wait for a full second before PostgreSQL detects and resolves the situation. This is by design — deadlock detection requires analyzing the entire lock graph, and doing so on every lock wait would be prohibitively expensive. But it means that a deadlocked counter update adds 1 second of latency to the affected request before the error surfaces.

The more insidious consequence is silent drift. When PostgreSQL kills a deadlocked transaction, the counter update in that transaction is rolled back. But the comment insert that triggered it may be retried by the application — and if the retry succeeds (because the contention has passed), the comment exists but the counter was never incremented. counter_culture does not track or retry failed counter updates. The counter is now wrong by one, permanently, until the next reconciliation pass.
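Since counter_culture leaves retry to the caller, applications typically wrap the write themselves. A minimal sketch of one way to do it — `DeadlockError` here is a stand-in for `ActiveRecord::Deadlocked`, which is what a Rails app would actually rescue, and the helper name is mine, not a library API:

```ruby
# Application-level deadlock retry with exponential backoff.
# DeadlockError stands in for ActiveRecord::Deadlocked.
class DeadlockError < StandardError; end

def with_deadlock_retry(max_attempts: 3, base_delay: 0.05)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue DeadlockError
    raise if attempts >= max_attempts
    # Back off so the retry doesn't immediately re-collide with
    # the transaction that won the lock.
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Hypothetical usage: wrap the WHOLE write, so the comment insert
# and the counter update succeed or retry together.
# with_deadlock_retry { post.comments.create!(body: "First") }
```

Retrying the entire transaction, rather than just the comment insert, is what prevents the wrong-by-one drift described above.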

The trigger approach has the lowest deadlock rate because it updates a single parent row per child insert — no multi-level locking. counter_culture has the highest because multi-level counters touch multiple parent tables, increasing the surface area for lock ordering conflicts.

Matviews sidestep the issue entirely. The refresh operation is a bulk read-and-replace that holds locks on the matview, not on the source tables. Individual writes to comments never contend with the matview refresh for row-level locks.

The drift problem: every counter cache's eventual confession

# The dirty secret of counter caches: they drift.
# Every approach except triggers has edge cases where the count
# goes wrong. Bulk deletes, failed transactions, race conditions.

# Rails built-in — manual reconciliation:
Post.find_each do |post|
  Post.reset_counters(post.id, :comments)
end
# Duration on 100K posts: ~8 minutes. Locks each row briefly.

# counter_culture — batch reconciliation:
Comment.counter_culture_fix_counts
# Smarter: only updates rows where the count is actually wrong.
# Duration on 100K posts: ~45 seconds.

# Trigger approach — check for drift:
SELECT p.id, p.comments_count,
       (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS actual
FROM posts p
WHERE p.comments_count != (
  SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id
);
# If this returns rows, something bypassed the trigger.
# (Bulk COPY, direct SQL, trigger was disabled.)

Every counter cache approach except matviews can drift. The built-in counter_cache drifts when you use delete_all, update_all, or any bulk operation that bypasses callbacks. counter_culture drifts for the same reasons plus the occasional deadlock-killed transaction where the counter update was lost. Even triggers can drift if someone disables the trigger for a bulk load and forgets to reconcile afterward.

Drift is insidious because it is silent. The count is wrong, but nothing raises an alarm. The user sees "47 comments" but there are actually 49. Nobody notices until someone does, and then trust in the count erodes permanently. I have seen production applications where counter columns were 15-20% off from reality because reconciliation was never implemented and bulk operations accumulated drift over months.

Reconciliation strategies

If you use a synchronous counter cache — built-in, counter_culture, or triggers — reconciliation is not optional. It is a mandatory operational practice, like backups. You do not decide whether to reconcile; you decide how frequently.

# Reconciliation scheduling strategies.
# Each has different operational characteristics.

# Strategy 1: Nightly cron (most common)
# Run at 3am when traffic is lowest.
# Simple, predictable, but counts can be wrong for up to 24 hours.
0 3 * * * cd /app && rails runner "Comment.counter_culture_fix_counts"

# Strategy 2: Continuous background reconciliation
# A Sidekiq job that reconciles a batch of posts every minute.
class CounterReconciliationJob < ApplicationJob
  def perform
    # Check 1000 random posts per run. Note: find_each ignores
    # order, so iterate the bounded scope with plain each instead.
    Post.order(Arel.sql("RANDOM()")).limit(1000).each do |post|
      # Capture the cached value first — update_column also rewrites
      # the in-memory attribute, which would corrupt the log line.
      cached = post.comments_count
      actual = post.comments.count
      if cached != actual
        post.update_column(:comments_count, actual)
        Rails.logger.warn "Counter drift: post #{post.id} " \
          "cached=#{cached} actual=#{actual}"
      end
    end
  end
end

# Strategy 3: Reconcile on read (lazy correction)
# When displaying a count, verify it. If wrong, fix it.
# Adds latency to every read. Not recommended for hot paths.
# But useful as a safety net for critical counts.
def verified_comments_count
  cached = comments_count
  actual = comments.count
  if cached != actual
    update_column(:comments_count, actual)
    actual
  else
    cached
  end
end

The nightly cron approach is the most common and the least satisfying. Counts can be wrong for up to 24 hours. For many applications, this is fine — the counter is a convenience, not a contractual obligation. For applications where the count drives business logic (payment calculations, usage limits, SLA reporting), 24 hours of drift is unacceptable.

The continuous reconciliation approach — checking a random sample of rows every minute — provides faster drift correction at the cost of ongoing background load. The key insight is that you do not need to check every row on every pass; you need to eventually check every row, and random sampling achieves this probabilistically.
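That "eventually check every row" claim can be quantified. Assuming each pass is an independent uniform sample (a simplification), the chance a given row is still unchecked after k passes is (1 − n/N)^k:

```ruby
# Probability a specific row has NOT been checked after `passes`
# reconciliation runs, each sampling `sample` of `total` rows.
# Simplifying assumption: passes are independent uniform draws.
def unchecked_fraction(total:, sample:, passes:)
  (1.0 - sample.to_f / total)**passes
end

# 100_000 posts, 1_000 checked per minute:
unchecked_fraction(total: 100_000, sample: 1_000, passes: 60)
# ~0.55 — after one hour, roughly half the rows are still unchecked
unchecked_fraction(total: 100_000, sample: 1_000, passes: 480)
# ~0.008 — after eight hours, nearly everything has been verified
```

Coverage converges exponentially, so most drift is caught within hours rather than the 24-hour worst case of the nightly cron.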

The reconcile-on-read approach is the most paranoid and the most expensive. It adds a COUNT query to every read, which eliminates the performance benefit of having a counter cache in the first place. Use it only for counts where accuracy is more important than performance — billing counts, compliance counts, anything with financial or legal implications.

Matviews: staleness instead of drift

Matviews cannot drift. They recompute from source data every refresh. The count is either current (fresh refresh) or stale (pending refresh), but it is never wrong. This is a meaningful distinction: staleness is bounded and self-correcting, while drift is unbounded and requires manual intervention.

A matview that was refreshed 30 seconds ago shows counts from 30 seconds ago. A counter column that drifted 30 days ago shows counts that have been wrong for 30 days and will remain wrong until someone notices and reconciles. The matview's staleness resolves automatically on the next refresh. The counter column's drift persists indefinitely.

This distinction — staleness vs drift — is the single most important insight in the counter cache conversation, and it is the one most commonly overlooked. Teams spend weeks debugging counter drift and implementing reconciliation scripts when they could have used a matview with bounded staleness and no drift at all.

So which approach should you use?

It depends — and I dislike that answer as much as you do, so here is something more prescriptive.

# Decision matrix — which approach for which situation.
# (Honest version. No hedge-everything-with-"it-depends.")

# Use built-in counter_cache when:
#   - Single association, no conditions
#   - All writes go through ActiveRecord
#   - You don't bulk-delete
#   - Accuracy within 99% is acceptable
#   - You want zero additional dependencies

# Use counter_culture when:
#   - You need conditional counts (approved, published, etc.)
#   - You need multi-level counts (author -> posts -> comments)
#   - You're willing to run nightly reconciliation
#   - All writes go through ActiveRecord (same limitation)

# Use PostgreSQL triggers when:
#   - Multiple applications or scripts write to the same database
#   - You use bulk operations (COPY, delete_all, raw SQL)
#   - You need 100% accuracy without reconciliation
#   - Your team is comfortable reading and maintaining PL/pgSQL

# Use materialized views when:
#   - You need complex aggregations (not just simple counts)
#   - Bounded staleness (seconds, not minutes) is acceptable
#   - You want zero write-path overhead
#   - You need engagement metrics, not just counts
#   - You're tired of debugging counter drift at 2am

Use Rails built-in counter_cache when you have a simple belongs_to association, you never bulk-delete, and you want zero external dependencies. It is the right default for small to medium applications where counting is a convenience, not a critical feature. Tens of thousands of Rails applications use it successfully and will never outgrow it.

Use counter_culture when you need conditional counts, multi-level associations, or automatic reconciliation. Accept that you are trading simplicity for capability and budget for the occasional deadlock under heavy concurrent writes. If you choose counter_culture, schedule reconciliation from day one — do not wait for drift to become visible.

Use PostgreSQL triggers when you need absolute accuracy across all write paths — ORM, raw SQL, bulk imports, migrations. This is the right choice when the database is the source of truth and multiple applications or scripts write to the same tables. The Evil Martians pattern is battle-tested and the performance characteristics are excellent. The visibility trade-off is real, but it is manageable with proper documentation and the discipline to treat migrations as first-class code.

Use materialized views when you need complex aggregations (not just simple counts), when you want zero write-path overhead, and when your application can tolerate bounded staleness. Matviews shine for dashboards, admin panels, and listing pages where "accurate as of 30 seconds ago" is indistinguishable from "accurate right now."

The honest counterpoint: when counter caches are overkill

I should be forthcoming about the cases where none of these approaches are necessary, because recommending complexity where simplicity suffices would be a disservice.

If your largest table has fewer than 100,000 rows, raw COUNT(*) with a proper index completes in under 2ms. The counter cache saves you 1.5ms per query. At 100 queries per day, you are saving 150ms total — per day. The operational cost of maintaining a counter cache (reconciliation, debugging drift, handling deadlocks, migrating the column) vastly exceeds the performance benefit. Use COUNT(*). It is the right answer.

If your counts are only displayed on admin pages with low traffic, raw counting is likely fine regardless of table size. The 12ms query that is unacceptable at 40,000 calls per day is perfectly reasonable at 400 calls per day.

If your counts do not need to be real-time — if "as of this morning" is acceptable — a nightly batch job that writes counts to a stats table is simpler than any of the four approaches discussed here and has zero runtime overhead.

The counter cache is a tool for a specific problem: high-frequency reads of frequently-changing counts on large tables. If your problem does not match that description, the tool is not for you.

Write-aware refresh: the matview approach without the stale data

-- Gold Lapel's approach to matview-based counting.
-- No cron job. No manual REFRESH. No stale-data guessing game.

-- 1. You write the matview (or GL suggests one):
CREATE MATERIALIZED VIEW mv_post_comment_counts AS
SELECT post_id, COUNT(*) AS total_comments,
       COUNT(*) FILTER (WHERE approved) AS approved_comments
FROM comments GROUP BY post_id;

-- 2. Your app queries it:
SELECT total_comments FROM mv_post_comment_counts
WHERE post_id = 42;

-- 3. GL observes: this matview is queried AND comments table
--    has been written to. Time to refresh.
-- Only refreshes when BOTH conditions are true:
--   a) The matview has been read since last refresh
--   b) The underlying table has been written to since last refresh
-- No reads + writes? No refresh. (Why recompute what nobody asks for?)
-- Reads + no writes? No refresh. (Data hasn't changed.)

The matview approach's weakness is the staleness problem. The proxy-managed variant addresses this by making refresh reactive rather than scheduled.

Gold Lapel sits between your Rails application and PostgreSQL. It observes query traffic in both directions. When it detects that a matview has been read AND its source table has been written to, it triggers a refresh. Two conditions, both required:

  • The matview has been queried since the last refresh. If nobody is reading the counts, there is no reason to refresh them.
  • The source table has been written to since the last refresh. If no comments were added, the counts have not changed.

This eliminates the two failure modes of scheduled refresh: wasted CPU (refreshing when nothing changed) and excessive staleness (not refreshing when data changed between scheduled intervals). The result is fresher counts with less total refresh work — no cron jobs, no dependency orchestration, no reconciliation scripts.
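The two conditions compose into a simple gate. The sketch below is illustrative only — `RefreshGate` and its method names are hypothetical, not Gold Lapel's actual API:

```ruby
# Hypothetical sketch of the two-condition refresh gate.
class RefreshGate
  def initialize
    @read_since_refresh = false
    @written_since_refresh = false
  end

  def note_read
    @read_since_refresh = true
  end

  def note_write
    @written_since_refresh = true
  end

  # Refresh only when the matview has been read AND its source
  # table has been written to since the last refresh.
  def refresh_due?
    @read_since_refresh && @written_since_refresh
  end

  def refreshed!
    @read_since_refresh = false
    @written_since_refresh = false
  end
end
```

Writes with no readers never trigger work; reads of unchanged data never trigger work.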

The benchmark numbers tell the story. Same 0.3ms read latency as a scheduled matview. Same 0% deadlock rate. But accuracy jumps from ~95% (60-second scheduled) to ~99% (write-triggered), because the refresh happens in response to the writes that invalidated the data, not on an arbitrary schedule that may or may not align with write activity.

The remaining ~1% inaccuracy reflects the latency between the write and the completed refresh — the refresh is triggered reactively but still takes time to execute. For the 5M-comment benchmark dataset, a REFRESH MATERIALIZED VIEW CONCURRENTLY takes ~8.7 seconds, during which new writes can occur. This is an inherent property of bulk recomputation. For applications that require absolute real-time accuracy — "the user posted a comment and the count must reflect it on the next page load" — the trigger approach remains the correct choice. For applications where "accurate within a few seconds" is sufficient, the proxy-managed matview provides that accuracy with dramatically less operational overhead.

A brief inventory of what we have learned

Four approaches. Same question. Different philosophies about who maintains the answer and when.

The ORM-based approaches (built-in, counter_culture) are the easiest to implement and the hardest to keep accurate. They drift silently, deadlock occasionally, and require reconciliation as an ongoing operational practice. They are the right choice when simplicity is more valuable than precision and when all writes flow through the ORM.

The trigger approach is the most accurate synchronous option — 100% accuracy under all write paths, lowest deadlock rate, lowest p99 latency. The cost is visibility: the counting logic lives in the database, not the application. For teams that embrace PostgreSQL as a capable computing platform rather than a passive data store, triggers are the gold standard for counter caches.

The matview approach abandons the counter paradigm entirely in favor of periodic recomputation. No per-write overhead. No deadlocks. No drift. The trade-off is bounded staleness, which a proxy-managed refresh can narrow to seconds. For complex aggregations — unique counts, time-windowed metrics, engagement scoring — matviews are not merely the best option; they are the only practical option.

If you have arrived here with a COUNT(*) query that runs 40,000 times per day, you now have four routes to the same destination: fewer CPU cycles spent answering a question whose answer changes rarely. The choice between them is determined not by which is "best" — a question I find rather unhelpful — but by which trade-offs your application can most comfortably absorb.

Counting is simple. Counting fast, accurately, and without deadlocks under concurrent load on a table with millions of rows — that is a matter requiring proper consideration. I trust you now have the numbers to make an informed decision.

The materialized view approach in this benchmark deserves a longer conversation — particularly the refresh strategy and its pitfalls. I should mention that I have written at some length on the problems the documentation neglected to mention, which may spare you a midnight page or two.