Polymorphic Associations in Rails + PostgreSQL: Fixing Slow Queries Without the Full Refactor
Your commentable_type column is a varchar storing four distinct values across four million rows. There are faster arrangements.
Good afternoon. Someone has told you to refactor your polymorphic associations.
They are not wrong, exactly. Polymorphic associations trade schema integrity for developer convenience, and the full accounting of that trade — which I have addressed in a separate guide on polymorphic association performance — is genuinely unfavorable at scale. No foreign key constraints. No JOINs. A string column doing the work of a proper relational reference. The relational model has opinions about this arrangement, and none of them are favorable.
But you are not here for the full accounting. You are here because your polymorphic queries are slow, your EXPLAIN output is distressing, and the suggestion to "just refactor to join tables" does not acknowledge the reality of your Tuesday afternoon.
Allow me to be specific about what I mean by "the reality of your Tuesday afternoon." You have ActionText storing rich text in a polymorphic table you do not control. You have pg_search writing multisearch documents with searchable_type and searchable_id. You have Active Storage attachments using record_type and record_id on every uploaded file. You have Devise's Confirmable and Lockable modules touching polymorphic-like patterns in authentication tables. You have 47 models across 3 Rails engines, and the suggestion to decompose them into explicit per-type associations would take a quarter, not a sprint.
GitLab documented a formal ban on new polymorphic associations in their codebase. They were right to do so. They also have 200 engineers and a multi-year database infrastructure roadmap. You may not. The gap between "this pattern is architecturally wrong" and "I can fix this by Friday" is precisely the territory this article covers.
This is about what you can do right now, without changing your schema, without modifying your models, and without a migration that touches every row. Index strategies, query rewrites, planner considerations, and proxy-level optimizations that make polymorphic associations perform as if they had been properly designed — even though they were not.
# The pattern you know well:
class Comment < ApplicationRecord
belongs_to :commentable, polymorphic: true
end
class Post < ApplicationRecord
has_many :comments, as: :commentable
end
class Image < ApplicationRecord
has_many :comments, as: :commentable
end

-- What Rails generates:
CREATE TABLE comments (
id bigserial PRIMARY KEY,
body text NOT NULL,
commentable_type varchar NOT NULL, -- 'Post', 'Image', 'Video'
commentable_id bigint NOT NULL,
created_at timestamp NOT NULL,
updated_at timestamp NOT NULL
);
-- And the default index:
CREATE INDEX index_comments_on_commentable
ON comments (commentable_type, commentable_id);

Two columns. One stores a Ruby class name as a string. The other stores a foreign key that PostgreSQL cannot verify, because the target table depends on the string. This is the arrangement we are working with. I do not approve of it. But I shall make it perform admirably nonetheless.
The string-typed column problem
The commentable_type column stores Ruby class names as variable-length strings. 'Post'. 'Image'. 'ActionText::RichText'. On a table with four million rows, this column consumes storage wildly disproportionate to its information content.
-- Check the actual storage cost of commentable_type:
SELECT avg(pg_column_size(commentable_type)) AS avg_bytes,
count(DISTINCT commentable_type) AS distinct_values,
pg_size_pretty(
sum(pg_column_size(commentable_type))
) AS total_type_column_size
FROM comments;
-- avg_bytes | distinct_values | total_type_column_size
-- -----------+-----------------+------------------------
-- 18.4 | 4 | 73 MB
--
-- 73MB of storage for a column with 4 distinct values.
-- Every index that includes commentable_type carries this weight.
-- GitLab documented this exact problem in their database guidelines:
-- storing Ruby class names as varchar in a column queried millions
-- of times per day.

73MB of storage for a column with four distinct values. That is 18.4 bytes per row, on average, to encode a choice between four options. An integer would do this in 4 bytes. A PostgreSQL enum in 4 bytes. A boolean pair in 2 bytes. The varchar column is using roughly 4.6 times the storage necessary for the information it carries.
But the storage cost of the column itself is secondary to its effect on indexes. Every composite index that includes commentable_type carries the full varchar for every row. The default index on (commentable_type, commentable_id) stores 18.4 bytes of type string plus 8 bytes of bigint ID per entry. That is 26.4 bytes per index entry where 12 bytes (integer type + bigint ID) would suffice. On four million rows, the difference is roughly 57MB of index bloat — pages the buffer cache must hold, pages the operating system's page cache must manage, pages that compete for memory with your actual query data.
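The arithmetic is worth checking directly. A quick sketch using the figures above (payload bytes only; real B-tree entries also carry item headers and alignment padding, so absolute on-disk sizes run higher):

```ruby
rows = 4_000_000
avg_type_bytes = 18.4   # measured avg pg_column_size(commentable_type)
bigint_id = 8           # commentable_id
int_type = 4            # what an integer type tag would cost instead

varchar_entry = avg_type_bytes + bigint_id  # ~26.4 bytes per entry today
integer_entry = int_type + bigint_id        # 12 bytes with an int tag

bloat_mb = rows * (varchar_entry - integer_entry) / 1_000_000.0
puts format("%.1f vs %d bytes per entry; ~%.1f MB of avoidable payload",
            varchar_entry, integer_entry, bloat_mb)
# ~57.6 MB of avoidable payload in every index that carries the type column
```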
-- Break it down per distinct value:
SELECT commentable_type,
count(*) AS row_count,
pg_column_size(commentable_type) AS bytes_per_value,
pg_size_pretty(count(*) * pg_column_size(commentable_type)) AS total
FROM comments
GROUP BY commentable_type
ORDER BY count(*) DESC;
-- commentable_type | row_count | bytes_per_value | total
-- ------------------------+-----------+-----------------+-------
-- Post | 2,400,000 | 5 | 11 MB
-- Image | 1,000,000 | 6 | 5.7 MB
-- Video | 600,000 | 6 | 3.4 MB
-- ActionText::RichText | 40,000 | 21 | 820 kB
--
-- The varchar overhead is 1 byte for strings <= 126 bytes,
-- plus the string length itself. 'Post' = 5 bytes per row.
-- 'ActionText::RichText' = 21 bytes per row.
-- Multiply by every index that includes this column.

The distribution matters. 'Post' is 5 bytes. 'ActionText::RichText' is 21 bytes. If your polymorphic table stores rows from gems with verbose class names — and many Rails gems use namespaced class names — the per-entry overhead in the index grows accordingly. I have reviewed production tables where the average commentable_type exceeded 30 bytes because the application included both ActionText and Active Storage rows in a shared polymorphic pattern.
GitLab documented this exact problem in their database development guidelines. Their polymorphic tables stored Ruby class names like MergeRequest, Issue, and Vulnerability — strings queried millions of times per day, replicated across every index that includes the type column.
Rails does not offer a built-in mechanism to change this. The polymorphic: true declaration assumes a string type column. Changing the column to an integer requires intercepting ActiveRecord's type resolution — possible with a custom polymorphic_name implementation in Rails 6.1+, but rarely done in practice because it breaks compatibility with gems that assume string types. The store_base_sti_class configuration (added in Rails 6.1) controls whether the base class or the subclass name is stored, but does not change the column type itself.
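For completeness, the hook looks roughly like this. This is a sketch, not a recommendation: the short token is an invented example, the ApplicationRecord stub exists only so the snippet runs outside Rails, and, as noted above, shortening the stored name breaks any gem that expects the full class name in the type column.

```ruby
# Stand-in so this sketch runs outside Rails; in an application this is
# your ActiveRecord::Base subclass.
class ApplicationRecord; end

class Post < ApplicationRecord
  # Rails 6.1+ consults polymorphic_name when writing the *_type column.
  # A short, stable token shrinks every index entry that includes the
  # type column -- but the stored strings are data: once written, this
  # mapping can never change.
  def self.polymorphic_name
    "P"
  end
end

Post.polymorphic_name # => "P"
```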
I should be candid: for most applications with fewer than a million polymorphic rows, the varchar overhead is noticeable but not urgent. The 73MB figure becomes meaningful when the table reaches the millions, when the indexes no longer fit in the buffer cache, and when every cache miss translates to disk I/O. If your polymorphic table has 50,000 rows, the varchar cost is measured in kilobytes, not megabytes. Direct your attention to the query patterns first; the storage overhead will wait.
Why the default composite index serves fewer queries than you think
Rails generates a composite index on (commentable_type, commentable_id). This index serves exactly one query shape efficiently: lookups that filter by both type and ID. Any query that omits the type column — or leads with the ID — cannot use this index.
-- The default composite index: (commentable_type, commentable_id)
-- Works for this:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42;
Index Scan using index_comments_on_commentable on comments
(cost=0.56..8.58 rows=1 width=112)
(actual time=0.028..0.033 rows=8 loops=1)
Index Cond: ((commentable_type = 'Post') AND (commentable_id = 42))
Buffers: shared hit=4
-- But the column order matters. Type is first.
-- This query CANNOT use the index efficiently:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_id = 42;
Seq Scan on comments (cost=0.00..48712.00 rows=24 width=112)
(actual time=0.041..187.293 rows=8 loops=1)
Filter: (commentable_id = 42)
Rows Removed by Filter: 3999992
Buffers: shared hit=23712
-- 187ms sequential scan. The index exists. PostgreSQL cannot use it.

The column order in a composite index determines which queries it serves. A B-tree on (type, id) is sorted first by type, then by id within each type. Think of it as a phone book organized first by city, then by surname. If you know both the city and the surname, you find the entry immediately. If you know only the surname, you would need to check every city's section — which is precisely what PostgreSQL declines to do, choosing a sequential scan instead.
187 milliseconds. 23,712 buffer hits. The index exists. PostgreSQL knows it exists. It simply cannot use it for a query that does not provide the leading column.
This is the first optimization opportunity: if your application ever queries by commentable_id without specifying the type — finding all associations for a given record across types, which is common in activity feeds and admin dashboards — the default index is entirely useless for that query.
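The phone-book behavior is easy to demonstrate outside the database. Tuples sorted by (type, id) support a binary search only when the leading element is known; given the id alone, nothing short of a full scan works. (A pure-Ruby illustration of the ordering argument, not of PostgreSQL's B-tree internals.)

```ruby
# Sorted the way the (commentable_type, commentable_id) index is sorted:
entries = [["Image", 7], ["Image", 42], ["Post", 3], ["Post", 42], ["Video", 42]]

# Both leading columns known: a binary search lands on the entry.
hit = entries.bsearch { |e| ["Post", 42] <=> e } # => ["Post", 42]

# id alone: matching rows are scattered across every type section, so the
# only option over this ordering is to scan everything -- exactly why
# PostgreSQL falls back to a sequential scan.
by_id = entries.select { |_type, id| id == 42 }
# => [["Image", 42], ["Post", 42], ["Video", 42]]
```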
-- Default Rails index:
CREATE INDEX index_comments_on_commentable
ON comments (commentable_type, commentable_id);
-- Reordered: id first, type second
CREATE INDEX index_comments_on_commentable_reversed
ON comments (commentable_id, commentable_type);
-- Why? If you ever query by commentable_id alone
-- (finding all associations for ID 42 across types),
-- the reordered index covers it. The original does not.
-- Keep both if you query both ways. Drop the original
-- only if you ALWAYS filter by type.

The reordered index on (commentable_id, commentable_type) serves the opposite query shape: lookups by ID, optionally filtered by type. It does not replace the original — it complements it. If your application queries both ways, you may need both indexes. If it queries exclusively by type-and-ID (which is the common case for .includes(:commentable)), the original suffices. Examine your query patterns in pg_stat_statements before deciding.
I should note: adding a second composite index on the same pair of columns (in reversed order) doubles the write overhead for that pair. Every INSERT and UPDATE must maintain both indexes. For write-heavy polymorphic tables — activity logs, event streams, audit trails — this cost is non-trivial. Partial indexes, which I shall address shortly, offer a more surgical alternative.
The planner's statistical blind spot
PostgreSQL's query planner makes cost estimates based on table statistics stored in pg_statistic. For a polymorphic table, these statistics are calculated across all rows, regardless of type. This creates a structural problem: the distribution of commentable_id values is meaningfully different for each type, but the planner sees only the blended average.
-- The planner's challenge with polymorphic tables:
-- commentable_id has wildly different distributions per type.
SELECT commentable_type,
count(DISTINCT commentable_id) AS distinct_ids,
count(*) AS total_rows,
round(count(*)::numeric / count(DISTINCT commentable_id), 1)
AS avg_comments_per_parent
FROM comments
GROUP BY commentable_type;
-- commentable_type | distinct_ids | total_rows | avg_comments_per_parent
-- ------------------+--------------+------------+-------------------------
-- Post | 200,000 | 2,400,000 | 12.0
-- Image | 800,000 | 1,000,000 | 1.3
-- Video | 100,000 | 600,000 | 6.0
-- Posts average 12 comments. Images average 1.3.
-- But pg_statistic stores ONE n_distinct for commentable_id
-- across ALL types. The planner sees a blended average
-- and estimates poorly for every individual type.
-- Check what the planner thinks:
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'comments'
AND attname = 'commentable_id';
-- tablename | attname | n_distinct
-- -----------+-----------------+------------
-- comments | commentable_id | -0.275
--
-- The planner estimates 27.5% distinct values across all types.
-- For Image rows (80% distinct), this is a severe underestimate.
-- For Post rows (8.3% distinct), it is a mild overestimate.

Posts average 12 comments per parent record. Images average 1.3. The planner, working from blended statistics, estimates something in between for every query — overestimating row counts for Image lookups and underestimating for Video lookups. These misestimates propagate through join planning, sort decisions, and memory allocation.
The practical consequence: PostgreSQL may choose a hash join where a nested loop would be faster, or allocate too much (or too little) work memory for a sort. On small tables, the misestimates are harmless — the planner's choices are close enough. On large tables with heavily skewed type distributions, the wrong join strategy can mean the difference between 2ms and 200ms.
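The size of the misestimate can be read straight off the figures above. A sketch (assuming, as in the sample data, little commentable_id overlap between types, so the blended distinct count is roughly the sum of the per-type counts):

```ruby
types = {
  "Post"  => { rows: 2_400_000, distinct_ids: 200_000 },
  "Image" => { rows: 1_000_000, distinct_ids: 800_000 },
  "Video" => { rows:   600_000, distinct_ids: 100_000 },
}

total_rows   = types.values.sum { |t| t[:rows] }          # 4,000,000
blended      = types.values.sum { |t| t[:distinct_ids] }  # 1,100,000 -> n_distinct -0.275
planner_rows = total_rows.fdiv(blended)                   # ~3.6 rows per id lookup

types.each do |name, t|
  actual = t[:rows].fdiv(t[:distinct_ids])
  puts format("%-5s actual %.1f rows/lookup vs planner estimate %.1f",
              name, actual, planner_rows)
end
# Post: actual 12.0 vs estimate ~3.6 -- more than a 3x underestimate
```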
Partial indexes address this indirectly. When PostgreSQL uses a partial index filtered to WHERE commentable_type = 'Post', the index's own statistics reflect only Post rows. The planner sees the correct n_distinct for commentable_id within that type, leading to more accurate row estimates for downstream operations. This is one of the less-discussed benefits of partial indexes on polymorphic tables — they do not merely reduce index size, they improve the planner's judgment.
You can also improve statistics directly by increasing the statistics target for the commentable_id column:
ALTER TABLE comments ALTER COLUMN commentable_id SET STATISTICS 1000;
The default is 100. Increasing it to 1000 tells ANALYZE to sample more rows, producing a more detailed histogram and most-common-values list that better represent the per-type distribution. The new target takes effect at the next ANALYZE: run ANALYZE comments after the change rather than waiting for autovacuum. This is not a substitute for partial indexes, but it helps the planner make better decisions even without them.
EagerLoadPolymorphicError and the raw SQL escape hatch
ActiveRecord raises EagerLoadPolymorphicError when you call .joins(:commentable) on a polymorphic association. This is correct behavior — a SQL JOIN requires a single target table, and a polymorphic association has several. Rails refuses to guess which table you mean, which is the responsible thing to do. The irresponsible thing was allowing the polymorphic association in the first place, but that decision was made long ago.
The workarounds are uniformly unpleasant.
# The query Rails WANTS to write:
Comment.joins(:commentable).where(posts: { published: true })
# => ActiveRecord::EagerLoadPolymorphicError:
# Cannot eagerly load the polymorphic association :commentable
# What you are forced to write instead:
post_comments = Comment
.where(commentable_type: 'Post')
.where(commentable_id: Post.where(published: true).select(:id))
image_comments = Comment
.where(commentable_type: 'Image')
.where(commentable_id: Image.where(published: true).select(:id))
# Or the raw SQL escape hatch:
Comment.find_by_sql(<<~SQL)
SELECT c.*
FROM comments c
LEFT JOIN posts p
ON c.commentable_type = 'Post' AND c.commentable_id = p.id
LEFT JOIN images i
ON c.commentable_type = 'Image' AND c.commentable_id = i.id
WHERE COALESCE(p.published, i.published) = true
ORDER BY c.created_at DESC
LIMIT 50
SQL

The first workaround — separate queries per type — is what .includes(:commentable) already does internally. It fires one query per type: SELECT * FROM posts WHERE id IN (...), then SELECT * FROM images WHERE id IN (...), and so on. This works but multiplies round trips. For a page displaying comments from five different type tables, you pay five additional queries beyond the initial comment fetch. Each query is individually fast, but the accumulated latency — network round trips, connection acquisition, query parsing — adds up.
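Conceptually, the preloader's first step is a group-by on the type column; each group then becomes one SELECT ... WHERE id IN (...) round trip. A pure-Ruby sketch of that grouping (illustrative only, not ActiveRecord's actual implementation):

```ruby
Comment = Struct.new(:id, :commentable_type, :commentable_id)

comments = [
  Comment.new(1, "Post", 42), Comment.new(2, "Image", 7),
  Comment.new(3, "Post", 42), Comment.new(4, "Video", 99),
]

# One batch per distinct type; N types => N extra queries.
batches = comments.group_by(&:commentable_type)
                  .transform_values { |cs| cs.map(&:commentable_id).uniq }
batches # => {"Post"=>[42], "Image"=>[7], "Video"=>[99]}
```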
The second workaround — raw SQL with LEFT JOINs to every type table — produces a single query. One round trip. But examine the plan:
-- The raw SQL approach works, but examine the plan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.*
FROM comments c
LEFT JOIN posts p
ON c.commentable_type = 'Post' AND c.commentable_id = p.id
LEFT JOIN images i
ON c.commentable_type = 'Image' AND c.commentable_id = i.id
WHERE COALESCE(p.published, i.published) = true
ORDER BY c.created_at DESC
LIMIT 50;
Sort (cost=94218.47..94343.47 rows=50000 width=112)
Sort Key: c.created_at DESC
Sort Method: top-N heapsort Memory: 89kB
-> Hash Left Join (cost=12841.00..78142.00 rows=50000 width=112)
Hash Cond: ((c.commentable_id = i.id)
AND (c.commentable_type = 'Image'))
Filter: (COALESCE(p.published, i.published) = true)
Rows Removed by Filter: 384216
-> Hash Left Join (cost=8412.00..52841.00 rows=2400000 width=120)
Hash Cond: ((c.commentable_id = p.id)
AND (c.commentable_type = 'Post'))
-> Seq Scan on comments c (rows=2400000)
-> Hash (rows=200000)
-> Seq Scan on posts p (rows=200000)
-> Hash (rows=150000)
-> Seq Scan on images i (rows=150000)
Buffers: shared hit=41284
Planning Time: 1.842 ms
Execution Time: 412.891 ms
-- 412ms. Three sequential scans. Two hash joins.
-- And this is the WARM cache result.

412 milliseconds. Three sequential scans. Two hash joins. The planner builds a hash table for the entire posts table and another for the entire images table, then probes them against every row in comments. The COALESCE filter runs after the joins, discarding 384,216 rows that were joined only to be thrown away.
The LEFT JOIN approach scales linearly with the number of types: add a fifth type and you add another sequential scan and another hash join. At ten types, this query becomes impractical. I have observed this exact pattern in production Rails applications with activity feeds spanning 8-12 commentable types, where the LEFT JOIN query exceeded 2 seconds on warm cache.
I should acknowledge that for small tables — under 100,000 rows, with 2-3 types — the LEFT JOIN approach is perfectly adequate. 412 milliseconds becomes 4 milliseconds when the table fits comfortably in shared buffers and the hash tables are small. The techniques in this article are for the point where "perfectly adequate" stops being adequate. You will know when you arrive.
Six optimization strategies that do not require a schema refactor
The following strategies can be applied independently or in combination. None requires changing your models, your associations, or your schema. Several require only a migration that adds an index. One requires a query rewrite. All of them work with the polymorphic table exactly as it is.
| Strategy | Effort | Query change | Performance gain | Best when |
|---|---|---|---|---|
| Reorder composite index columns | One migration | None | Covers queries filtering by ID alone | You query by commentable_id without type |
| Partial indexes per type | One migration per type | None | Smaller indexes, sort elimination, skip type comparison | Most queries filter by type (the common case) |
| Expression index on type column | One migration | Must use same expression in queries | 30-40% smaller index from integer vs varchar | Raw SQL or proxy-rewritten queries |
| UNION ALL instead of LEFT JOIN | Query rewrite | Significant — split into per-type branches | 100-200x on cross-type queries with JOINs | Activity feeds, dashboards, cross-type reports |
| Covering partial indexes | One migration per type | None | Index-only scans (no heap access) | High-frequency queries returning few columns |
| Full schema refactor | Multi-phase migration | Entire data layer | FK constraints, real JOINs, clean planner stats | Greenfield or major version — not a Friday afternoon fix |
The full schema refactor is included for completeness. It is the right long-term answer for many applications — decomposing the polymorphic table into per-type join tables with proper foreign keys, real JOINs, and clean planner statistics. It is also a multi-phase migration that touches every row in the table, every model that references the association, and every query that filters by type. It is not the answer for this afternoon. The other five are.
I recommend reading the strategies in order. Each builds on the concepts introduced by the previous one, and the combination of partial indexes with UNION ALL (strategies 2 and 4) produces the most dramatic improvement for the most common polymorphic query patterns.
Partial indexes: one per type, each surgically targeted
This is the highest-impact, lowest-effort optimization for polymorphic associations. Instead of one large composite index spanning all types, create a partial index for each type value. Each index contains only the rows matching that type, includes the columns your queries actually need, and skips the type comparison entirely at query time.
-- Partial indexes filtered by type value.
-- One small, targeted index per type.
CREATE INDEX idx_comments_post ON comments (commentable_id, created_at)
WHERE commentable_type = 'Post';
CREATE INDEX idx_comments_image ON comments (commentable_id, created_at)
WHERE commentable_type = 'Image';
CREATE INDEX idx_comments_video ON comments (commentable_id, created_at)
WHERE commentable_type = 'Video';
-- Each index contains ONLY rows of that type.
-- On a 4M row table (2.4M Post, 1M Image, 600K Video):
-- Full composite index: 128MB (all 4M rows)
-- idx_comments_post: 62MB (2.4M rows)
-- idx_comments_image: 26MB (1M rows)
-- idx_comments_video: 16MB (600K rows)
-- Combined partial indexes: 104MB (same 4M rows)
-- 19% smaller total. But the real gain is query speed:

-- Before: full composite index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42
ORDER BY created_at DESC;
Index Scan using index_comments_on_commentable on comments
(cost=0.56..8.58 rows=8 width=112)
(actual time=0.031..0.039 rows=8 loops=1)
Index Cond: ((commentable_type = 'Post') AND (commentable_id = 42))
Sort: created_at DESC -- <-- extra sort step
Buffers: shared hit=4
-- After: partial index with created_at included
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42
ORDER BY created_at DESC;
Index Scan Backward using idx_comments_post on comments
(cost=0.42..4.12 rows=8 width=112)
(actual time=0.008..0.013 rows=8 loops=1)
Index Cond: (commentable_id = 42)
Buffers: shared hit=3
-- No sort step. Smaller index scan. 3 buffer hits vs 4.
-- The partial index already filtered out non-Post rows,
-- so PostgreSQL skips the commentable_type comparison entirely.

Three specific improvements in the partial index plan:
- No type comparison. The WHERE commentable_type = 'Post' predicate matches the index's filter clause. PostgreSQL does not need to evaluate it row-by-row — the index already contains only Post rows. This is not merely an optimization; it is a categorical elimination of work.
- Sort elimination. The partial index includes created_at as the second column. Queries with ORDER BY created_at DESC use an index scan backward instead of a separate sort step. For activity feeds and comment lists — which almost universally sort by creation time — this eliminates the most expensive operation after the scan itself.
- Smaller index. Each partial index contains a fraction of the rows. Fewer pages to read, fewer buffer hits, faster scans. The Post index is 62MB instead of 128MB. That is 66MB of buffer cache freed for other work.
For tables where one type dominates — 60% Post, 25% Image, 10% Video, 5% ActionText — the partial index for Post alone eliminates 40% of the index's row count compared to the full composite, while serving the majority of queries. On highly skewed distributions, the savings are even more dramatic. I reviewed a production table where 92% of rows were of one type; the partial index for that type was actually smaller than the full composite index despite covering nearly all the rows, because it omitted the 18-byte type string from every entry.
The migration is straightforward, and I recommend building the indexes concurrently to avoid locking the table:
# The migration — one partial index per type:
class AddPartialIndexesOnComments < ActiveRecord::Migration[7.1]
disable_ddl_transaction! # Required for CONCURRENTLY
def change
add_index :comments, [:commentable_id, :created_at],
where: "commentable_type = 'Post'",
name: 'idx_comments_post',
algorithm: :concurrently
add_index :comments, [:commentable_id, :created_at],
where: "commentable_type = 'Image'",
name: 'idx_comments_image',
algorithm: :concurrently
add_index :comments, [:commentable_id, :created_at],
where: "commentable_type = 'Video'",
name: 'idx_comments_video',
algorithm: :concurrently
end
end
# CONCURRENTLY means no table lock. Existing queries
# continue uninterrupted. The index builds in the background.
# On a 4M row table, expect 30-90 seconds per index
# depending on your hardware and write load.

One detail that catches teams off guard: partial indexes do not automatically replace the original composite index in the planner's consideration set. After adding partial indexes, monitor pg_stat_user_indexes to confirm they are being used. If idx_scan remains at zero, the query's WHERE clause may not match the index's filter condition exactly. The match must be literal — WHERE commentable_type = 'Post' matches, but WHERE commentable_type IN ('Post') may not, depending on the PostgreSQL version.
Expression indexes: compressing the type column in the index
If partial indexes are the surgical approach, expression indexes are the compression approach. Instead of splitting into separate indexes per type, you create one index that maps the varchar type column to an integer expression — storing 4 bytes per entry instead of 5 to 21.
-- Problem: commentable_type stores Ruby class names as varchar.
-- 'Post', 'Photo', 'Image', 'Video', 'ActionText::RichText'
-- These strings repeat millions of times. The index is bloated.
-- Solution: Expression index that normalizes the type
CREATE INDEX idx_comments_type_id ON comments (
(CASE commentable_type
WHEN 'Post' THEN 1
WHEN 'Image' THEN 2
WHEN 'Video' THEN 3
WHEN 'ActionText::RichText' THEN 4
ELSE 0
END),
commentable_id
);
-- The index stores integers instead of variable-length strings.
-- On a 4M row table:
-- Original index (commentable_type, commentable_id): 128MB
-- Expression index with integer mapping: 86MB
-- 33% smaller. Same query performance.
-- But you must use the same expression in your queries:
-- WHERE (CASE commentable_type WHEN 'Post' THEN 1 ... END) = 1
-- Not practical with ActiveRecord. Useful with raw SQL or a proxy.

33% smaller. Same query performance — but only if your queries use the same CASE expression in their WHERE clause. This is the fundamental limitation: PostgreSQL can only use an expression index when the query contains the exact same expression.
# Making the expression index work with ActiveRecord:
# Define a scope that uses the same CASE expression.
class Comment < ApplicationRecord
COMMENTABLE_TYPE_MAP = {
'Post' => 1,
'Image' => 2,
'Video' => 3,
'ActionText::RichText' => 4
}.freeze
scope :by_type_expr, ->(type_name) {
type_int = COMMENTABLE_TYPE_MAP[type_name]
where(
Arel.sql(<<~SQL)
(CASE commentable_type
WHEN 'Post' THEN 1
WHEN 'Image' THEN 2
WHEN 'Video' THEN 3
WHEN 'ActionText::RichText' THEN 4
ELSE 0
END) = #{type_int}
SQL
)
}
end
# Usage:
Comment.by_type_expr('Post').where(commentable_id: 42)
# This hits the expression index. But it is fragile:
# - The CASE expression must match the index exactly
# - Adding a new type requires updating both the index
# and the scope
# - Other developers will not understand why this scope
# exists instead of .where(commentable_type: 'Post')
#
# For most teams, partial indexes are the better choice.

I should be direct about the trade-offs. Expression indexes on polymorphic type columns are clever, and "clever" is a word I use with caution in database optimization. They require that every developer on the team understands why the scope uses a CASE expression instead of a simple WHERE clause. They require updating both the index and the scope when new types are added. And they offer a 33% index size reduction that partial indexes can often match or exceed through row filtering alone.
For teams using raw SQL or a query proxy that can rewrite expressions transparently, expression indexes are a clean win. For teams working primarily through ActiveRecord, partial indexes are nearly always the better choice — they require no query changes, no custom scopes, and no coordination between index definition and query construction. I recommend expression indexes only when you have exhausted the benefits of partial indexes and still need to reduce index size further.
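If you do adopt an expression index, one mitigation for the drift problem is to generate the CASE expression from the Ruby map and use that single generator in both the migration and the scope. A sketch (the method name and map are illustrative):

```ruby
TYPE_MAP = {
  "Post" => 1, "Image" => 2, "Video" => 3, "ActionText::RichText" => 4,
}.freeze

# Single source of truth: interpolate this generated expression into both
# the CREATE INDEX migration and the ActiveRecord scope, so the two can
# never disagree. Type names come from code, not user input.
def type_case_expression(map)
  whens = map.map { |name, int| "WHEN '#{name}' THEN #{int}" }.join(" ")
  "(CASE commentable_type #{whens} ELSE 0 END)"
end

puts type_case_expression(TYPE_MAP)
```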
UNION ALL: the cross-type query that actually scales
The raw SQL LEFT JOIN approach scans every type table regardless of whether it has matching rows. UNION ALL inverts this: query each type separately, then merge the results. Each branch uses its own partial index. The query planner optimizes each branch independently.
-- The UNION ALL pattern for cross-type queries:
-- Instead of one query with LEFT JOINs to every type table,
-- query each type separately and combine.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id, c.body, c.commentable_type, c.commentable_id,
c.created_at, p.title AS parent_title
FROM comments c
JOIN posts p ON c.commentable_id = p.id
WHERE c.commentable_type = 'Post'
AND p.published = true
UNION ALL
SELECT c.id, c.body, c.commentable_type, c.commentable_id,
c.created_at, i.title AS parent_title
FROM comments c
JOIN images i ON c.commentable_id = i.id
WHERE c.commentable_type = 'Image'
AND i.published = true
ORDER BY created_at DESC
LIMIT 50;
-- With partial indexes on each type:
Limit (cost=1.28..142.84 rows=50 width=148)
-> Merge Append (cost=1.28..6842.16 rows=2414 width=148)
Sort Key: c.created_at DESC
-> Nested Loop (cost=0.85..4218.41 rows=1847 width=148)
-> Index Scan Backward using idx_comments_post
on comments c (rows=2400000)
Index Cond: (commentable_type = 'Post')
-> Index Scan using posts_pkey on posts p (rows=1)
Filter: published
-> Nested Loop (cost=0.85..2612.48 rows=567 width=148)
-> Index Scan Backward using idx_comments_image
on comments c_1 (rows=1000000)
-> Index Scan using images_pkey on images i (rows=1)
Filter: published
Buffers: shared hit=214
Planning Time: 0.891 ms
Execution Time: 2.147 ms
-- 2.1ms vs 412ms for the LEFT JOIN approach. 197x faster.
-- Each branch uses its own partial index. Merge Append
-- combines the pre-sorted results efficiently.

2.1 milliseconds versus 412 milliseconds for the LEFT JOIN approach. 197 times faster. The improvement comes from three sources:
- Each branch uses its partial index. No sequential scans. No hash joins. Nested loop joins with index lookups — the most efficient join strategy when each probe returns a small number of rows.
- Merge Append combines pre-sorted results. Because each partial index includes created_at, each branch returns results already in order. The Merge Append step merges the sorted streams without re-sorting. This is analogous to the merge step in merge sort — each input is pre-sorted, so the merge is O(n) rather than O(n log n).
- Early termination with LIMIT. The outer LIMIT 50 propagates into the Merge Append. PostgreSQL stops reading from each branch once it has enough rows to satisfy the limit. For a feed displaying the 50 most recent comments, PostgreSQL may read only 30 Post entries and 20 Image entries before stopping — even though the indexes contain millions of rows.
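The interaction between Merge Append and LIMIT is the part worth internalizing, and it can be sketched in a few lines of plain Ruby: two streams already sorted descending (as the partial indexes deliver them), merged lazily, stopping at the limit. (An illustration only; PostgreSQL does this inside the executor.)

```ruby
# created_at values as each partial index would return them: sorted DESC.
posts  = [9, 7, 4, 2]
images = [8, 3, 1]

def merge_desc(a, b, limit)
  out = []
  i = j = 0
  while out.size < limit && (i < a.size || j < b.size)
    # Take the larger head; Merge Append performs the same comparison
    # across its pre-sorted inputs.
    if j >= b.size || (i < a.size && a[i] >= b[j])
      out << a[i]; i += 1
    else
      out << b[j]; j += 1
    end
  end
  out # most rows were never read: early termination
end

merge_desc(posts, images, 3) # => [9, 8, 7]
```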
The UNION ALL pattern is more verbose than a single query. It requires one branch per type. In ActiveRecord, this means either raw SQL or a helper that generates the UNION:
# A reusable UNION ALL helper for ActiveRecord:
module PolymorphicUnion
  def self.comments_feed(types_with_tables, limit: 50)
    branches = types_with_tables.map do |type, table|
      # Quote the type value defensively. Table names come from
      # the hard-coded list at the call site, never from user input.
      quoted_type = Comment.connection.quote(type)
      <<~SQL
        SELECT c.id, c.body, c.commentable_type,
               c.commentable_id, c.created_at,
               parent.title AS parent_title
        FROM comments c
        JOIN #{table} parent ON c.commentable_id = parent.id
        WHERE c.commentable_type = #{quoted_type}
          AND parent.published = true
      SQL
    end
    sql = branches.join(" UNION ALL ") +
          " ORDER BY created_at DESC LIMIT #{Integer(limit)}"
    Comment.find_by_sql(sql)
  end
end
# Usage (type value, parent table):
PolymorphicUnion.comments_feed([
  ['Post', 'posts'],
  ['Image', 'images'],
  ['Video', 'videos'],
])
# Each branch hits its partial index. The planner
# optimizes each branch independently. Merge Append
# combines the sorted results without re-sorting.

Several gems provide syntactic sugar for this pattern — notably active_record_union and active_record_extended. But the raw SQL helper above has one advantage: it is transparent. Every developer can read the generated SQL, understand the execution plan, and debug performance issues without understanding a gem's DSL.
A word on when UNION ALL is not the answer. If your cross-type query does not include JOINs to the parent tables — if it is simply SELECT * FROM comments ORDER BY created_at DESC LIMIT 50 — then a single index on (created_at DESC) serves it directly. UNION ALL is specifically valuable when each branch needs to JOIN to a different table, which is the scenario the LEFT JOIN approach handles so poorly.
"The ORM did not fail. It did exactly what was asked. It was simply asked poorly."
— from You Don't Need Redis, Chapter 3: The ORM Tax
Gems you cannot refactor: ActionText, Active Storage, pg_search, and the rest
The advice to "just refactor your polymorphic associations" assumes you own the schema. For a meaningful percentage of Rails polymorphic tables, you do not. Allow me to take inventory.
# ActionText uses polymorphic associations internally:
class ActionText::RichText < ApplicationRecord
belongs_to :record, polymorphic: true
end
# Every model with has_rich_text creates a polymorphic row:
class Article < ApplicationRecord
has_rich_text :content
has_rich_text :summary
end
# The action_text_rich_texts table:
# record_type = 'Article', record_id = 1, name = 'content'
# record_type = 'Article', record_id = 1, name = 'summary'
# record_type = 'User', record_id = 5, name = 'bio'
# You cannot refactor this. ActionText owns the schema.
# But you can index it:
#
# CREATE INDEX idx_rich_texts_article ON action_text_rich_texts
# (record_id, name)
# WHERE record_type = 'Article';

# Active Storage — the polymorphic table you forgot about:
class ActiveStorage::Attachment < ApplicationRecord
belongs_to :record, polymorphic: true
belongs_to :blob, class_name: 'ActiveStorage::Blob'
end
# Schema:
# CREATE TABLE active_storage_attachments (
# id bigserial PRIMARY KEY,
# name varchar NOT NULL,
# record_type varchar NOT NULL,
# record_id bigint NOT NULL,
# blob_id bigint NOT NULL,
# created_at timestamp NOT NULL
# );
#
# Default index: (record_type, record_id, name, blob_id)
#
# Every has_one_attached and has_many_attached adds rows here.
# A model with 3 attachments means 3 rows per record.
# A User with avatar, resume, and cover_letter:
# record_type = 'User', record_id = 42, name = 'avatar'
# record_type = 'User', record_id = 42, name = 'resume'
# record_type = 'User', record_id = 42, name = 'cover_letter'
# Targeted partial index:
# CREATE INDEX idx_attachments_user ON active_storage_attachments
# (record_id, name)
# WHERE record_type = 'User';

# pg_search polymorphic setup:
class PgSearch::Document < ApplicationRecord
belongs_to :searchable, polymorphic: true
# searchable_type varchar, searchable_id bigint
end
# Devise's Confirmable, Lockable, and Recoverable modules do
# token lookups on plain varchar columns (not polymorphic,
# but the same targeted-index thinking applies). Third-party
# gems bake polymorphic associations into YOUR database.
# You cannot change the gem's schema.
# You CAN add targeted indexes:
# For pg_search multisearch:
# CREATE INDEX idx_pg_search_product ON pg_search_documents
# (searchable_id)
# WHERE searchable_type = 'Product';

ActionText's action_text_rich_texts table uses record_type and record_id as polymorphic columns. Every model with has_rich_text adds rows to this table. Active Storage's active_storage_attachments table uses the same pattern — record_type and record_id — for every file attached to any model. You cannot change either schema. They are owned by the framework.
The inventory extends further than most developers realize. The pg_search gem's multisearch documents use searchable_type and searchable_id. The acts_as_taggable_on gem uses polymorphic taggings. The audited gem stores audit trails with polymorphic references. The noticed gem for notifications uses polymorphic associations for both the recipient and the event. Virtually any gem that provides cross-model functionality reaches for belongs_to :something, polymorphic: true as its coupling mechanism.
Millions of Rails applications carry these polymorphic tables whether the developer chose the pattern or not. The tables accumulate rows as the application grows. The default indexes — typically a composite on (record_type, record_id) — serve the basic lookups, but offer nothing for the sort patterns, the cross-type queries, or the covering scan opportunities that production traffic demands.
For these tables, partial indexes by type are not just an optimization — they are the only optimization available. You cannot refactor what you do not own. You can index it.
I encourage you to audit your schema for polymorphic tables you may have forgotten about. Run this query:
SELECT table_name, column_name
FROM information_schema.columns
WHERE column_name LIKE '%\_type'  -- escape the underscore, which is a LIKE wildcard
  AND data_type = 'character varying'
ORDER BY table_name;
Every row in that result is a candidate for partial indexing. Some you created deliberately. Others arrived with a gem install.
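As a starting point, the audit output can be fed through a small generator that emits candidate DDL for review. This is a sketch under my own assumptions: the helper name is invented, the per-type values would come from a SELECT DISTINCT on each table, and indexing only the companion id column is the simplest default, so review each statement before running anything.

```ruby
# Sketch: generate candidate partial-index DDL from audit output.
# `table` and `type_column` mirror one row of the
# information_schema query; `types` holds the distinct values
# observed in that column.
def candidate_partial_indexes(table, type_column, types)
  id_column = type_column.sub(/_type\z/, "_id")
  types.map do |type|
    # Sanitize the type value into an index-name suffix.
    suffix = type.gsub(/\W/, "_").downcase
    <<~SQL.strip
      CREATE INDEX CONCURRENTLY idx_#{table}_#{suffix}
        ON #{table} (#{id_column})
        WHERE #{type_column} = '#{type}';
    SQL
  end
end

candidate_partial_indexes("comments", "commentable_type", %w[Post Image])
# Returns one CREATE INDEX CONCURRENTLY statement per type,
# each filtered by its commentable_type value.
```

CONCURRENTLY is deliberate: it avoids locking the table during the build, at the cost of not being runnable inside a transaction.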
Covering indexes: eliminating heap access entirely
PostgreSQL 11 introduced the INCLUDE clause for covering indexes. A covering index stores additional columns in the index leaf pages without including them in the index's sort order. When a query can be answered entirely from the index — all selected columns are either indexed or included — PostgreSQL performs an index-only scan with zero heap fetches.
-- A covering partial index avoids heap access entirely:
CREATE INDEX idx_comments_post_covering ON comments
(commentable_id, created_at)
INCLUDE (body)
WHERE commentable_type = 'Post';
-- Query that benefits from index-only scan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT commentable_id, created_at, body
FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42
ORDER BY created_at DESC;
Index Only Scan Backward using idx_comments_post_covering on comments
(cost=0.42..3.48 rows=8 width=80)
(actual time=0.006..0.009 rows=8 loops=1)
Heap Fetches: 0
Buffers: shared hit=2
-- Heap Fetches: 0. PostgreSQL answered the query
-- entirely from the index. No table access at all.

Heap Fetches: 0. PostgreSQL answered the query entirely from the index. No table page access at all. Two buffer hits. This is the fastest possible read path — the data never leaves the index structure.
Covering partial indexes combine two powerful techniques: the partial index filters to only the relevant type's rows, and the INCLUDE clause adds columns that eliminate the need to visit the heap. For high-frequency queries that return a consistent set of columns — comment bodies in a feed, attachment URLs in a gallery, search document titles in an autocomplete — covering partial indexes are the fastest possible path.
The trade-off is index size. Including body (a text column) in the index makes the index substantially larger — potentially much larger than the base index, depending on the average column width. A text body averaging 200 bytes per row would increase the 62MB Post partial index to roughly 500MB. This is worthwhile for columns queried thousands of times per second. It is wasteful for columns queried occasionally or for large text blobs that push the index out of the buffer cache.
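The back-of-envelope above is simple enough to encode. A rough model, ignoring per-tuple index overhead and page fill factor:

```ruby
# Rough covering-index size: base partial index plus the
# INCLUDEd column's payload. Per-tuple overhead and fill
# factor are ignored, so treat the result as a floor.
def covering_index_estimate_mb(base_mb:, rows:, avg_included_bytes:)
  base_mb + (rows * avg_included_bytes) / 1_000_000
end

covering_index_estimate_mb(base_mb: 62, rows: 2_400_000, avg_included_bytes: 200)
# => 542, consistent with the "roughly 500MB" figure above
```

Run the same arithmetic before adding any INCLUDE column; if the projected index no longer fits comfortably in shared_buffers alongside everything else, the covering index may cost more than it saves.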
My recommendation: start with partial indexes (no INCLUDE). Measure. If pg_stat_user_indexes shows a specific partial index with high idx_scan counts and pg_statio_user_indexes shows significant idx_blks_read (disk reads), consider adding INCLUDE for the columns that would eliminate heap fetches on that hot path. The covering index is the final optimization, not the first.
Orphaned rows: the silent cost of missing foreign keys
Polymorphic associations cannot have foreign key constraints. PostgreSQL's foreign keys require a single target table, and the entire premise of polymorphic associations is that the target depends on a string column's value. The consequence is that deleted parent records leave their polymorphic children behind — orphaned rows that reference nothing, serve no purpose, and participate in every index scan, every VACUUM cycle, and every sequential scan on the table.
-- The hidden cost: orphaned rows with no FK to catch them.
-- Polymorphic associations cannot have foreign keys.
-- Deleted parents leave orphaned children behind.
SELECT 'Post' AS commentable_type,
       count(*) AS orphaned_rows
FROM comments c
WHERE commentable_type = 'Post'
  AND NOT EXISTS (
    SELECT 1 FROM posts p WHERE p.id = c.commentable_id
  )
UNION ALL
SELECT 'Image', count(*)
FROM comments c
WHERE commentable_type = 'Image'
  AND NOT EXISTS (
    SELECT 1 FROM images i WHERE i.id = c.commentable_id
  )
UNION ALL
SELECT 'Video', count(*)
FROM comments c
WHERE commentable_type = 'Video'
  AND NOT EXISTS (
    SELECT 1 FROM videos v WHERE v.id = c.commentable_id
  );
-- commentable_type | orphaned_rows
-- ------------------+---------------
-- Post | 3,847
-- Image | 912
-- Video | 156
--
-- 4,915 orphaned rows. No error was ever raised.
-- These rows are scanned, indexed, and VACUUMed
-- for no reason. They are dead weight in every query.

4,915 orphaned rows in this example. No error was ever raised. No constraint was violated. The application deleted posts, images, and videos through dependent: :destroy callbacks in Rails — but those callbacks only fire when the deletion goes through the model. Direct SQL deletes, bulk deletions, database-level cascades, and interrupted transactions all bypass the callback and leave orphans behind.
The performance impact of orphaned rows depends on their proportion. 4,915 orphans in a 4-million-row table is noise — 0.12%. But orphans accumulate. Without periodic cleanup, a table that loses 100 parent records per day accumulates 36,500 orphaned children per year (assuming an average of one child per parent). Over several years, on a high-write table, orphans can reach the low single-digit percentages of total rows.
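The accumulation arithmetic, as a trivially checkable sketch:

```ruby
# Orphan accumulation model from the paragraph above:
# parents deleted per day, times average polymorphic
# children per parent, times 365 days.
def orphans_per_year(parents_deleted_per_day:, avg_children: 1)
  parents_deleted_per_day * avg_children * 365
end

orphans_per_year(parents_deleted_per_day: 100)  # => 36_500
```

Plug in your own deletion rate and the real child-per-parent ratio (a post may carry dozens of comments) and the figure grows quickly.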
Each orphaned row is indexed, VACUUMed, analyzed, and scanned exactly like a valid row. The partial indexes you create will include orphans that match their type filter. VACUUM will maintain visibility maps for pages containing orphans. Autovacuum will include orphan-heavy pages in its dead tuple cleanup. The orphans earn their keep in no useful capacity.
I recommend a periodic cleanup job — weekly or monthly, depending on your deletion volume — that removes orphaned polymorphic rows. The UNION ALL pattern from the orphan detection query above can be adapted into a DELETE statement. Run it during low-traffic periods, in batches, with a LIMIT to avoid long-running transactions.
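Such a job might look like the following sketch. The SQL shape, a DELETE driven by an id subquery with a LIMIT, is one common batching pattern; `purge_orphans` and `exec_delete` are hypothetical names of my own, and the interpolated type and table must come from a hard-coded allowlist, never from user input.

```ruby
# Hypothetical batched cleanup for one polymorphic type.
# `exec_delete` is an injected callable returning the number of
# rows deleted (e.g. wrapping ActiveRecord's connection.delete),
# so the batching logic can be exercised without a database.
def purge_orphans(type:, parent_table:, batch_size: 10_000, exec_delete:)
  sql = <<~SQL
    DELETE FROM comments
    WHERE id IN (
      SELECT c.id
      FROM comments c
      WHERE c.commentable_type = '#{type}'
        AND NOT EXISTS (
          SELECT 1 FROM #{parent_table} p
          WHERE p.id = c.commentable_id
        )
      LIMIT #{batch_size}
    )
  SQL
  total = 0
  loop do
    deleted = exec_delete.call(sql)
    total += deleted
    break if deleted < batch_size  # last, short batch: done
  end
  total
end
```

Keeping each DELETE bounded by the LIMIT avoids long-running transactions; in production you would also pause between batches and schedule the job off-peak, as recommended above.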
Maintenance considerations for partial indexes
Partial indexes are not "set and forget" infrastructure. They participate in PostgreSQL's maintenance machinery — VACUUM, ANALYZE, and reindexing — and their behavior differs from full indexes in ways that warrant your attention.
-- Partial indexes require VACUUM attention.
-- Check scan counts and sizes per partial index, plus the
-- table's dead tuple count:
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       pg_stat_get_dead_tuples(relid) AS table_dead_tuples
FROM pg_stat_user_indexes
WHERE relname = 'comments'
  AND indexrelname LIKE 'idx_comments_%'
ORDER BY idx_scan DESC;
-- indexrelname        | idx_scan | index_size | table_dead_tuples
-- --------------------+----------+------------+-------------------
-- idx_comments_post   |  847,221 | 62 MB      | 16,466
-- idx_comments_image  |  214,847 | 26 MB      | 16,466
-- idx_comments_video  |   41,218 | 16 MB      | 16,466
-- Dead tuples are tracked per table, not per index, so the
-- count repeats on every row. 16,466 dead tuples in comments,
-- accumulated from UPDATE and DELETE operations. VACUUM cleans
-- them. If autovacuum is tuned conservatively, partial indexes
-- on high-write tables accumulate dead entries faster than
-- they are cleaned.
-- Targeted VACUUM on just the comments table:
VACUUM (VERBOSE) comments;
-- For index-only scans, the visibility map must be current.
-- Stale visibility maps cause Heap Fetches > 0, which
-- defeats the purpose of covering indexes.

VACUUM and dead tuples. Every UPDATE or DELETE on a row matching a partial index's filter creates a dead tuple in that index. If your application updates comments frequently — editing body text, toggling visibility flags, updating timestamps via touch: true — the partial indexes accumulate dead tuples proportional to the update rate for their type. Autovacuum handles this, but its default thresholds are based on the entire table's row count, not the partial index's. A table with 4 million rows and an autovacuum threshold of 20% will not trigger until 800,000 dead tuples accumulate — even if all of them are in the Post partial index.
If your polymorphic table is write-heavy, consider lowering the autovacuum thresholds for that table specifically:
ALTER TABLE comments SET (
  autovacuum_vacuum_scale_factor  = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);
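The trigger math behind that recommendation: autovacuum vacuums a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (defaults 50 and 0.2). A quick sketch:

```ruby
# Dead-tuple count at which autovacuum kicks in for a table:
#   autovacuum_vacuum_threshold
#     + autovacuum_vacuum_scale_factor * reltuples
# Defaults are 50 and 0.2.
def autovacuum_trigger(reltuples, threshold: 50, scale_factor: 0.2)
  (threshold + scale_factor * reltuples).to_i
end

autovacuum_trigger(4_000_000)                      # => 800_050 with defaults
autovacuum_trigger(4_000_000, scale_factor: 0.05)  # => 200_050 after the ALTER TABLE
```

Lowering the scale factor to 0.05 makes the 4-million-row comments table vacuum roughly four times as often, which keeps the partial indexes and the visibility map much closer to current.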
Visibility maps and index-only scans. Covering partial indexes (with INCLUDE) depend on the visibility map being current. If VACUUM has not run recently, pages with recently-modified rows will not be marked as all-visible, and PostgreSQL will fall back to heap fetches for those pages. Monitor Heap Fetches in your EXPLAIN output. If it climbs above zero on a covering index that should produce index-only scans, VACUUM is behind.
Index bloat over time. Partial indexes are not immune to index bloat. On tables with heavy UPDATE/DELETE workloads, the B-tree pages can become sparse as dead entries are reclaimed but pages are not consolidated. Periodically check index bloat using the pgstattuple extension or the Bloat Estimation Query from the PostgreSQL wiki. If a partial index's actual size significantly exceeds its expected size, REINDEX INDEX CONCURRENTLY rebuilds it without downtime.
The honest counterpoint: when these techniques are not enough
I have spent this article demonstrating that polymorphic associations can perform well with the right indexes and query patterns. This is true. It is also incomplete. Allow me to be forthcoming about where these techniques reach their limits, because a waiter who overstates his case is no waiter at all.
Foreign key integrity cannot be indexed into existence. Partial indexes make queries fast. They do not prevent orphaned rows, enforce referential integrity, or guarantee that commentable_id = 42 with commentable_type = 'Post' actually corresponds to an existing post. If data integrity is your primary concern — and for financial, medical, or regulatory applications, it should be — no amount of indexing substitutes for proper foreign keys. The schema refactor remains the only path to constraint-enforced integrity.
Cross-type aggregations remain expensive. "How many total comments exist across all types?" is a full table scan regardless of indexing strategy. "What is the average comment count per parent, across all parent types?" requires the UNION ALL pattern with per-type GROUP BY operations merged afterward. These queries work, but they scale with the total table size, not with any single type's partition. If your application is dashboard-heavy with frequent cross-type aggregations, the polymorphic table will always be slower than per-type tables that can be aggregated independently.
The planner still sees one table. Partial indexes improve the planner's statistics for indexed operations, but the base table's statistics remain blended. Complex queries that involve multiple polymorphic tables — joining comments to notifications to audit logs, all polymorphic — can produce planning errors that compound multiplicatively. I have observed queries where the planner's row estimate was off by three orders of magnitude because two polymorphic tables with skewed distributions were joined together.
Partial indexes do not reduce table size. The table itself — the heap — contains all rows regardless of indexing strategy. If your polymorphic table has grown to tens of gigabytes and the heap is the bottleneck (sequential scans for analytics, full-table VACUUM times, backup sizes), partial indexes help queries but do not help storage. Table partitioning by type would address this, but partitioning a polymorphic table in Rails requires significant application changes.
When should you pursue the full refactor? When the polymorphic table exceeds 50 million rows and is still growing. When foreign key violations are causing data corruption that costs more to detect and repair than the refactor itself. When the UNION ALL pattern requires 10+ branches and is becoming unmaintainable. When the team is already planning a major version or a greenfield rewrite. In those cases, the techniques in this article are the bridge — they buy time and performance while the proper solution is designed and deployed.
What Gold Lapel does with polymorphic query patterns
Gold Lapel sits between your Rails application and PostgreSQL as a transparent proxy. It observes every query — including the ones ActiveRecord generates for polymorphic eager loading, ActionText rich text lookups, Active Storage attachment fetches, and pg_search multisearch queries. Add gem "goldlapel-rails" to your Gemfile, bundle, and it auto-patches ActiveRecord — no model changes, no migrations.
-- Gold Lapel observes your polymorphic query patterns:
-- Pattern detected: 94% of queries on comments table include
-- WHERE commentable_type = 'Post'
-- Recommendation: partial index filtered by type
-- Pattern detected: commentable_type has 4 distinct values
-- stored as varchar, averaging 18.4 bytes per row
-- Opportunity: expression index maps to 4-byte integer
-- Pattern detected: ORDER BY created_at DESC appears in 78%
-- of commentable_type = 'Post' queries
-- Recommendation: include created_at in partial index
-- to eliminate sort step
-- Created automatically:
CREATE INDEX gl_auto_comments_post_created
ON comments (commentable_id, created_at DESC)
WHERE commentable_type = 'Post';
CREATE INDEX gl_auto_comments_image_created
ON comments (commentable_id, created_at DESC)
WHERE commentable_type = 'Image';
-- No migration. No deploy. No code change.

The proxy detects three specific patterns in polymorphic query traffic:
- Repeated type-filtered lookups. When 94% of queries on a polymorphic table include WHERE commentable_type = 'Post', Gold Lapel creates a partial index filtered by that type value. One index per observed type, each containing only the rows that type's queries need.
- Sort patterns on type-filtered queries. When ORDER BY created_at DESC appears consistently in Post-type queries, the partial index includes created_at DESC to eliminate the sort step. The proxy detects the sort column from query analysis, not from schema inspection — it responds to how your application actually queries the table.
- Expression index opportunities. When the type column stores long class names (ActionText::RichText is 20 bytes) and the proxy can rewrite queries to use integer mappings transparently, it creates expression indexes that reduce index size by 30-40%. The query rewriting happens at the proxy level — your application sends the original ActiveRecord query, and the proxy translates the WHERE clause to match the expression index.
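The integer-mapping idea can be sketched as follows. The CASE expression and the code assignments here are hypothetical, for illustration only; Gold Lapel's actual mapping scheme is internal to the proxy.

```ruby
# Hypothetical type-to-code mapping of the kind an expression
# index relies on, e.g.:
#   CREATE INDEX ... ON comments ((
#     CASE commentable_type
#       WHEN 'Post' THEN 1 WHEN 'Image' THEN 2
#       WHEN 'Video' THEN 3 END
#   ), commentable_id);
TYPE_CODES = {
  "Post" => 1, "Image" => 2, "Video" => 3,
  "ActionText::RichText" => 4,
}.freeze

# Rough per-entry saving: a short varchar (payload plus a 1-byte
# varlena header) replaced by a 4-byte integer in the index key.
def bytes_saved_per_row(type)
  (type.bytesize + 1) - 4
end

bytes_saved_per_row("ActionText::RichText")  # => 17
```

Seventeen bytes per entry sounds small until it is multiplied across millions of index entries; that is where the 30-40% size reduction comes from on tables whose type column stores long namespaced class names.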
This matters most for the gems you cannot refactor. ActionText's action_text_rich_texts table, Active Storage's active_storage_attachments table, pg_search's pg_search_documents table — these carry polymorphic columns by design. You cannot change their schema. You cannot contribute partial index migrations to the gem's codebase and expect them to be accepted, because the gem authors cannot know which types your application stores. Gold Lapel indexes them based on your actual query traffic, automatically, without touching a line of application code.
The full Rails integration works with any Rails version that uses PostgreSQL. One gem, no code changes. The polymorphic queries that were costing you hundreds of milliseconds start hitting partial indexes that did not exist five minutes ago. The expression indexes that would require coordinated scope changes across your codebase are created and utilized transparently.
I would be a poor waiter indeed if I suggested that a tool could replace understanding. The techniques in this article work with or without Gold Lapel. The proxy automates what you now know how to do manually. Whether you prefer to manage your own indexes or allow the proxy to manage them on your behalf, the underlying strategies are the same — partial indexes, expression indexes, and query patterns that let the planner do its best work.
A matter of proper maintenance
Polymorphic associations are not going away. The pattern is deeply embedded in Rails conventions, in gem ecosystems, and in millions of production databases. The advice to avoid them in new code is sound. The advice to refactor them in existing code is aspirational. The advice to make them perform well, right now, with the schema you have — that is practical, and that is what I have endeavored to provide.
If you take one technique from this article, let it be partial indexes. One migration per type. No query changes. No model changes. Smaller indexes, sort elimination, better planner statistics, and the ability to create covering indexes later when the hot paths demand them. It is, if I may say so, the most elegant optimization available for the least elegant schema pattern in Rails.
The household metaphor is apt here. Polymorphic associations are the inherited furniture — not what you would have chosen, but serviceable with proper care. Partial indexes are the polish. UNION ALL is the rearrangement that makes the room work despite the furniture's limitations. And the full refactor is the renovation that happens when the budget and the calendar align.
Until then, the existing furniture shall be impeccably maintained.
If you'll permit me one further thought — the index strategies discussed here are only a subset of what PostgreSQL offers. I have written a comprehensive guide to PostgreSQL index types that covers partial indexes, expression indexes, and the GIN indexes that serve polymorphic lookups particularly well.