Polymorphic Associations Are Undermining Your PostgreSQL Performance. Here Are the Escape Routes.
Your comments table has a column called "commentable_type." Allow me to explain what it is costing you.
Good morning. You have a polymorphic association.
It seemed like such a reasonable idea at the time. Comments can belong to posts, photos, and videos. Rather than creating three separate comment tables — each with identical columns — you created one table with two special columns: commentable_type and commentable_id. ActiveRecord made it easy. One belongs_to :commentable, polymorphic: true and you were done.
That was eighteen months and four million rows ago. Now you are here because something is wrong with your query performance, and the EXPLAIN output is not making you feel better.
I have seen this pattern many times. In Rails applications, in Django projects using GenericForeignKey, in Laravel codebases with morphTo. The syntax differs. The underlying problem does not. Polymorphic associations are one of the most popular ORM features and one of the most quietly destructive PostgreSQL anti-patterns. They trade schema integrity for developer convenience — and the bill comes due at scale.
This is not a pattern that fails immediately. It works beautifully for months, sometimes years. The problems accumulate silently: orphaned rows, degrading query plans, index bloat, cardinality misestimates. By the time you notice, the table has four million rows and the EXPLAIN output reads like a medical diagnosis.
Allow me to walk you through exactly what polymorphic associations cost, why the cost compounds, and — most importantly — the escape routes available to you. We shall examine four distinct migration patterns, each with honest trade-offs, along with indexing strategies for those who must live with the pattern as-is.
# app/models/comment.rb
class Comment < ApplicationRecord
belongs_to :commentable, polymorphic: true
end
class Post < ApplicationRecord
has_many :comments, as: :commentable
end
class Photo < ApplicationRecord
has_many :comments, as: :commentable
end
class Video < ApplicationRecord
has_many :comments, as: :commentable
end
-- The resulting table:
CREATE TABLE comments (
id bigserial PRIMARY KEY,
body text NOT NULL,
commentable_type varchar NOT NULL, -- 'Post', 'Photo', 'Video'
commentable_id bigint NOT NULL, -- id in that table
created_at timestamp NOT NULL
);
-- Rails adds this index by default:
CREATE INDEX idx_comments_commentable
ON comments (commentable_type, commentable_id);
-- Notice what is missing: no foreign key constraint.
-- PostgreSQL cannot enforce a FK that points to three different tables.
This is not a Rails problem. It is a relational modeling problem.
I should be clear about something before we proceed: this article uses Rails and ActiveRecord in its examples because Rails popularized the pattern and made it most accessible. But the problem is not specific to Rails. Any framework that stores a type discriminator and an ID pointing at multiple tables produces the same structural limitations in PostgreSQL.
# Django's equivalent: GenericForeignKey
from django.contrib.contenttypes.fields import (
GenericForeignKey, GenericRelation
)
from django.contrib.contenttypes.models import ContentType
class Comment(models.Model):
content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
object_id = models.PositiveIntegerField()
commentable = GenericForeignKey('content_type', 'object_id')
body = models.TextField()
class Post(models.Model):
comments = GenericRelation(Comment)
# The resulting table is structurally identical:
# content_type_id (FK to django_content_types) + object_id
# Same problem: object_id cannot have a FK constraint
# to the referenced table.
# Django at least uses an integer FK to content_types
# instead of storing class names as strings.
# But the fundamental issue — one ID column pointing
# at multiple tables — remains.
// Laravel's morphTo/morphMany:
class Comment extends Model {
public function commentable() {
return $this->morphTo();
}
}
class Post extends Model {
public function comments() {
return $this->morphMany(Comment::class, 'commentable');
}
}
// Resulting columns: commentable_type, commentable_id
// Same pattern. Same limitations.
// Laravel stores the fully-qualified class name:
// 'App\Models\Post', 'App\Models\Photo'
// These strings are even longer than Rails' class names,
// making the index bloat worse.
Django's GenericForeignKey uses an integer FK to the content_types table rather than storing class names as strings, which avoids some of the index bloat that Rails and Laravel suffer. But the core issue — object_id pointing at rows in multiple tables without a foreign key constraint — is identical.
Laravel's implementation is arguably the worst from a storage perspective. It stores fully-qualified PHP class names like App\Models\Post in the type column — 16 bytes or more per row, repeated millions of times, included in every index. But the class name length is a secondary concern. The real problem is structural, and it affects every framework equally.
The relational model was designed around the principle that a foreign key references exactly one table. Polymorphic associations violate this principle by design. Understanding this as a modeling problem rather than a framework bug is essential, because it explains why no framework can fully solve it — and why the solutions are schema-level, not code-level.
What exactly are you giving up?
Four things, each progressively more expensive as your data grows.
1. Foreign key constraints — gone
A foreign key constraint tells PostgreSQL: "this column must reference a valid row in that table." It prevents orphaned records, enables ON DELETE CASCADE, and gives the query planner useful cardinality information. Foreign keys are not merely a data integrity convenience — they are the mechanism by which PostgreSQL guarantees that your references are valid. Without them, you are trusting your application code to never make a mistake. I have observed application code. It makes mistakes.
Polymorphic associations make foreign key constraints impossible. The commentable_id column might reference posts.id, photos.id, or videos.id — PostgreSQL cannot enforce a constraint that points to three different tables simultaneously.
-- You might try this. It will not do what you need.
ALTER TABLE comments
ADD CONSTRAINT fk_commentable
FOREIGN KEY (commentable_id) REFERENCES posts(id);
-- The constraint applies to EVERY row, not just
-- commentable_type = 'Post'. Photo and Video comments are
-- checked against posts(id) too, so the ALTER fails
-- validation (or, worse, passes on coincidental id overlap).
-- You would need one FK per type, but commentable_id
-- can point to ANY of them.
-- The result: orphaned rows accumulate silently.
-- A deleted Post leaves its comments behind with
-- commentable_type = 'Post', commentable_id = 42,
-- pointing at nothing.
Without FK constraints, referential integrity becomes your application's responsibility. Every DELETE must remember to clean up associated comments. Every bug in that cleanup logic creates orphaned rows that silently accumulate, bloating your table and corrupting your aggregations. Every race condition between a delete and a concurrent insert creates a window where orphaned records can slip through.
The dependent: :destroy callback in Rails handles this in the common case — but callbacks can be skipped (delete_all, bulk operations, raw SQL), and when they are skipped, orphans are born. A foreign key constraint with ON DELETE CASCADE runs inside PostgreSQL itself. It cannot be skipped. It cannot be forgotten. It does not depend on your application being well-behaved.
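The callback-skipping failure mode is easy to reproduce in miniature. Here is a plain-Ruby sketch (no Rails; all names are hypothetical): `destroy` runs the cleanup callback the way dependent: :destroy does, `delete` skips it the way delete_all and raw SQL do, and the orphan survives.

```ruby
# Simulated tables: COMMENTS holds [commentable_type, commentable_id]
# rows; PostStore simulates the posts table.
COMMENTS = []

class PostStore
  def initialize
    @posts = { 1 => "a post", 2 => "another post" }
  end

  # destroy: deletes the row AND runs the cleanup callback,
  # like ActiveRecord's dependent: :destroy.
  def destroy(id)
    @posts.delete(id)
    COMMENTS.reject! { |c| c == ["Post", id] }
  end

  # delete: deletes the row and skips callbacks, like
  # delete_all, bulk operations, or raw SQL.
  def delete(id)
    @posts.delete(id)
  end

  def exists?(id)
    @posts.key?(id)
  end
end

def orphaned_comments(store)
  COMMENTS.select { |type, id| type == "Post" && !store.exists?(id) }
end

store = PostStore.new
COMMENTS.concat([["Post", 1], ["Post", 1], ["Post", 2]])

store.destroy(1)  # callback runs: comments for post 1 are removed
store.delete(2)   # callback skipped: the comment for post 2 survives

# orphaned_comments(store) now holds [["Post", 2]], a row pointing
# at nothing. No error was raised at any point.
```

A database-level ON DELETE CASCADE has no equivalent of the `delete` path: there is no way to reach the table that bypasses it.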
-- How many orphaned comments do you have right now?
-- This query checks for commentable_id values that
-- no longer exist in their referenced table:
SELECT commentable_type, count(*) AS orphaned_count
FROM comments c
WHERE commentable_type = 'Post'
AND NOT EXISTS (
SELECT 1 FROM posts p WHERE p.id = c.commentable_id
)
GROUP BY commentable_type
UNION ALL
SELECT commentable_type, count(*)
FROM comments c
WHERE commentable_type = 'Photo'
AND NOT EXISTS (
SELECT 1 FROM photos p WHERE p.id = c.commentable_id
)
GROUP BY commentable_type
UNION ALL
SELECT commentable_type, count(*)
FROM comments c
WHERE commentable_type = 'Video'
AND NOT EXISTS (
SELECT 1 FROM videos v WHERE v.id = c.commentable_id
)
GROUP BY commentable_type;
-- On a production table I reviewed recently:
-- commentable_type | orphaned_count
-- ------------------+----------------
-- Post | 3,847
-- Photo | 912
-- Video | 156
--
-- 4,915 orphaned rows. No error was ever raised.
If you run that query against your production database and the count is zero, you have either been extraordinarily disciplined or extraordinarily lucky. In my experience, the former is rare and the latter does not persist.
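Hand-writing one UNION ALL branch per type gets tedious as types multiply. A small Ruby sketch that generates the orphan-count query above from a type-to-table mapping (the `TYPE_TABLES` hash is an assumption about your schema; adjust it to yours):

```ruby
# Map each commentable_type value to the table it references.
TYPE_TABLES = { "Post" => "posts", "Photo" => "photos", "Video" => "videos" }

# Build one NOT EXISTS branch per type and join them with UNION ALL.
def orphan_check_sql(type_tables)
  branches = type_tables.map do |type, table|
    <<~SQL.strip
      SELECT commentable_type, count(*) AS orphaned_count
      FROM comments c
      WHERE commentable_type = '#{type}'
        AND NOT EXISTS (
          SELECT 1 FROM #{table} t WHERE t.id = c.commentable_id
        )
      GROUP BY commentable_type
    SQL
  end
  branches.join("\nUNION ALL\n") + ";"
end

puts orphan_check_sql(TYPE_TABLES)
```

Add a new type, add one hash entry, and the audit query stays in sync.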
2. JOINs — blocked
SQL JOINs require a target table. A polymorphic association does not have one — it has several. Rails acknowledges this directly by raising EagerLoadPolymorphicError when you attempt .joins(:commentable).
# Rails eager loading with polymorphic associations:
Comment.includes(:commentable)
# This works — but sends MULTIPLE queries:
# SELECT * FROM comments;
# SELECT * FROM posts WHERE id IN (1, 2, 3, ...);
# SELECT * FROM photos WHERE id IN (4, 5, ...);
# SELECT * FROM videos WHERE id IN (6, ...);
# What you CANNOT do:
Comment.joins(:commentable)
# => ActiveRecord::EagerLoadPolymorphicError:
# Cannot eagerly load the polymorphic association :commentable
# JOINs require a single target table.
# Polymorphic associations point to many tables.
# Rails raises an error rather than guess.
Without JOINs, every eager load becomes multiple queries — one per type. This is the N+1 problem baked into the schema itself. It is not a bug in your application code. It is not a missing .includes() call. It is a structural limitation of the data model that cannot be resolved by better application code.
The impact is most visible in activity feeds and dashboards — any page that displays comments from mixed types. Rails must issue a separate query for each type to resolve the associated records. Three types, three queries. Six types, six queries. Each query is individually fast, but the round trips accumulate. On a page that displays 50 comments belonging to 4 types, Rails issues at minimum 5 queries (one for comments, one per type) regardless of how well you optimize each individual query.
Some developers work around this with raw SQL LEFT JOINs to every type table, using COALESCE to merge results. This works but produces plans that scale linearly with the number of types — each additional type adds another sequential scan and another hash join. I have seen these queries exceed 400ms in production with as few as five types. The Rails-specific optimization guide covers the UNION ALL alternative in detail.
3. Index efficiency — compromised
The default composite index on (commentable_type, commentable_id) works for type-specific lookups. But any query that does not filter by type — recent comments across all types, global full-text search, aggregation by date — cannot use this index efficiently. The type column is the leading column; without it, the index is only partially useful.
This is a consequence of how B-tree indexes work. A composite index on (A, B) is sorted first by A, then by B within each A value. A query that filters only by B must scan across all A groups — PostgreSQL typically declines to do this, choosing a sequential scan instead. Your carefully created index sits unused while PostgreSQL reads every row in the table.
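A toy model makes the ordering argument concrete. This sketch (plain Ruby, invented data) sorts index entries by (type, id), as a composite B-tree does: the composite lookup can binary-search, while an id-only lookup has no usable order and must touch every entry.

```ruby
# A miniature composite "index": entries sorted by type first,
# then by id within each type, like a B-tree on (A, B).
INDEX = [
  ["Photo", 3], ["Photo", 9],
  ["Post", 1], ["Post", 4], ["Post", 9],
  ["Video", 2]
].sort

def lookup_by_type_and_id(index, type, id)
  # Binary search works: the sort order matches the predicate.
  index.bsearch { |entry| [type, id] <=> entry }
end

def lookup_by_id_only(index, id)
  # ids are NOT globally sorted (9 appears under both Photo and
  # Post), so every entry must be examined: a full scan.
  index.select { |_type, entry_id| entry_id == id }
end
```

PostgreSQL reaches the same conclusion and chooses a sequential scan of the table rather than a useless traversal of the index.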
4. Query planner confusion — the hidden cost
This fourth cost is less discussed but frequently significant. The query planner estimates cardinality — how many rows a query will return — to choose between sequential scans, index scans, nested loops, hash joins, and merge joins. Polymorphic tables produce systematically inaccurate cardinality estimates because the planner treats commentable_type and commentable_id as independent columns.
-- The planner's cardinality estimates on polymorphic tables
-- are frequently inaccurate. Here is why:
-- Check the statistics:
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'comments'
AND attname = 'commentable_type';
-- most_common_vals | most_common_freqs
-- -------------------+-------------------
-- {Post,Photo,Video} | {0.6,0.3,0.1}
-- PostgreSQL knows the distribution. But the composite
-- statistics across (commentable_type, commentable_id)
-- are estimated independently by default.
-- When you query WHERE commentable_type = 'Post'
-- AND commentable_id = 42, the planner multiplies:
-- P(type='Post') * P(id=42) = 0.6 * (1/n_distinct)
-- This assumes independence. In practice, commentable_id
-- values are NOT uniformly distributed across types.
-- Post IDs come from the posts table (200K rows).
-- Photo IDs come from the photos table (150K rows).
-- The planner does not know this.
-- Result: cardinality misestimates that cascade into
-- wrong join strategies, wrong index choices,
-- wrong memory allocations for sorts and hash tables.
The planner assumes that the probability of commentable_type = 'Post' AND commentable_id = 42 equals P(type='Post') * P(id=42). But these columns are not independent — the set of valid IDs depends entirely on the type. Post IDs come from the posts table. Photo IDs come from the photos table. The ID spaces may overlap, may have gaps, may have vastly different cardinalities. The planner does not know any of this.
The result: wrong row count estimates that cascade into wrong join strategies, wrong index choices, and wrong memory allocations. A query the planner estimates will return 1 row actually returns 12. A query estimated at 50,000 rows returns 847. These misestimates are the invisible tax on every query that touches the polymorphic table.
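The arithmetic behind the misestimate, using hypothetical figures consistent with the example table (1M rows, 60% Posts; the n_distinct value is invented for illustration):

```ruby
TOTAL_ROWS  = 1_000_000
P_TYPE_POST = 0.6      # from pg_stats most_common_freqs
N_DISTINCT  = 450_000  # distinct commentable_id values across ALL types

# Without extended statistics, the planner multiplies marginal
# selectivities as if the columns were independent:
naive_estimate = TOTAL_ROWS * P_TYPE_POST * (1.0 / N_DISTINCT)
# roughly 1.3 rows estimated

# In reality, Post ids are drawn only from the posts table, and a
# popular post can hold many comments. Suppose post 42 has:
actual_rows = 12

# The planner plans for ~1 row and receives 12. On a standalone
# lookup this is harmless; inside a larger join it can tip the
# planner into a nested loop that should have been a hash join.
```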
-- PostgreSQL 10+ extended statistics can help:
CREATE STATISTICS comments_type_id_stats
(dependencies, ndistinct, mcv)
ON commentable_type, commentable_id FROM comments;
-- Then refresh:
ANALYZE comments;
-- Now the planner understands the correlation between
-- type and id. It knows that commentable_id values
-- for type='Post' are drawn from a different pool
-- than those for type='Photo'.
-- Before extended statistics:
-- Estimated rows: 1 Actual rows: 12
-- After extended statistics:
-- Estimated rows: 11 Actual rows: 12
-- Close enough. The planner chooses the right index.
PostgreSQL's extended statistics (available since version 10) can help by teaching the planner about correlations between columns. Creating extended statistics on (commentable_type, commentable_id) tells the planner to consider their joint distribution rather than treating them independently. This does not fix the structural problems, but it gives the planner better estimates for the queries you do run.
What does this cost in practice?
Numbers clarify things. Here is a comments table with 1 million rows — 600K for posts, 300K for photos, 100K for videos — on PostgreSQL 16.
-- Query: find all comments for Post id 42
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42;
-- With the default composite index on (commentable_type, commentable_id):
Index Scan using idx_comments_commentable on comments
(cost=0.56..8.58 rows=1 width=97)
(actual time=0.031..0.035 rows=12 loops=1)
Index Cond: ((commentable_type = 'Post') AND (commentable_id = 42))
Buffers: shared hit=4
-- Acceptable for a single lookup. Now scale it:
-- 200 posts, each needing their comments:
-- 200 x Index Scan = 200 queries, ~800 buffer hits.
A single type-specific lookup is fine. The composite index handles it. Four buffer hits, sub-millisecond response. If this is your only query pattern — look up comments for a specific post, a specific photo — polymorphic associations perform adequately. The index was designed for exactly this access pattern.
But now consider the eager-loading pattern Rails uses when you call Comment.includes(:commentable) on a collection of 200 comments belonging to mixed types. Rails issues three separate queries — one per type — each with an IN clause containing the relevant IDs. Three round trips, three index scans, three result sets stitched together in Ruby. Each individual query is fast. The combined cost — network round trips, Ruby object allocation, result merging — adds up to tens of milliseconds on a good day and hundreds on a busy one.
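What includes(:commentable) does internally can be sketched in a few lines of plain Ruby: group the loaded comments by type, then emit one IN-clause query per type. The data and the table mapping here are hypothetical.

```ruby
# A batch of loaded comments (type + id pairs), as Rails sees them.
comments = [
  { type: "Post",  id: 1 }, { type: "Post",  id: 4 },
  { type: "Photo", id: 2 }, { type: "Video", id: 7 },
  { type: "Post",  id: 1 }  # duplicates collapse into one IN list
]

TYPE_TABLES = { "Post" => "posts", "Photo" => "photos", "Video" => "videos" }

# One query per type present in the batch: the schema-level N+1.
def eager_load_queries(comments, type_tables)
  comments.group_by { |c| c[:type] }.map do |type, group|
    ids = group.map { |c| c[:id] }.uniq.sort
    "SELECT * FROM #{type_tables.fetch(type)} WHERE id IN (#{ids.join(', ')})"
  end
end

queries = eager_load_queries(comments, TYPE_TABLES)
# Three types in the batch, three queries. Every new commentable
# type adds another round trip to every mixed-type page.
```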
Now consider a query that crosses types entirely:
-- Query: find recent comments across ALL types, sorted
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.*, commentable_type, commentable_id
FROM comments c
WHERE created_at > '2025-01-01'
ORDER BY created_at DESC
LIMIT 50;
-- The composite index on (commentable_type, commentable_id) is useless here.
-- PostgreSQL has no index on created_at:
Sort (cost=28471.32..28596.32 rows=50000 width=97)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 71kB
-> Seq Scan on comments (cost=0.00..21370.00 rows=50000 width=97)
(actual time=0.021..98.432 rows=47823 loops=1)
Filter: (created_at > '2025-01-01')
Rows Removed by Filter: 952177
Buffers: shared hit=8370
-- Sequential scan over 1M rows. 98ms. And this is a warm cache.
98 milliseconds on a warm cache, with 952,177 rows read and discarded to return 47,823. On a cold cache or a table with 10 million rows, this is measured in seconds. The composite index on (commentable_type, commentable_id) cannot help because the query does not filter by type.
This query is not exotic. It is the "recent activity" feed. It is the admin dashboard showing latest comments. It is the moderation queue. Any feature that displays comments regardless of what they are attached to hits this wall.
In fairness: when polymorphic associations are the right choice
I should be forthcoming about the boundaries of this argument, because presenting polymorphic associations as universally wrong would be a disservice to you and an embarrassment to me.
Polymorphic associations are a reasonable choice when:
- You have two or three types and no plans for more. The complexity of the alternative patterns — join tables, separate tables, partitioning — is not free. For a small number of types with modest data volumes, the simplicity of polymorphic: true earns its keep.
- Your queries almost always filter by type. If you never need cross-type queries — no activity feeds, no global search, no aggregate reports — the default composite index serves every query you actually run. The limitations exist but do not affect you.
- The table will remain small. Below 100,000 rows, PostgreSQL handles sequential scans quickly enough that index optimization is academic. The break-even point where polymorphic overhead becomes measurable depends on your hardware and query frequency, but most tables under 100K rows perform well regardless of schema design.
- You are prototyping. The cost of polymorphic associations is paid at scale. If you are validating a product idea and will redesign the data model before launch, the speed of polymorphic: true is a legitimate advantage. Ship it, learn what your actual query patterns are, then design the schema around those patterns.
The advice in this article is for the other case: you have shipped, your table has grown, your queries are slowing down, and the bill is coming due. For that case, the escape routes below are ordered from least disruptive to most thorough.
Do I have to migrate away from polymorphic associations?
Not necessarily. The severity depends on your scale, query patterns, and tolerance for the constraints above. For a small application with two or three types and modest data volumes, polymorphic associations are perfectly adequate. ActiveRecord makes them convenient, and convenience has value — real, measurable value in development velocity.
But if any of the following apply, it is time to consider alternatives:
- Your polymorphic table exceeds 1 million rows
- You need cross-type queries (recent activity feeds, global search)
- You have orphaned rows from inconsistent deletions
- Eager loading performance is degrading as types multiply
- You need to JOIN through the polymorphic association for reporting
- Your EXPLAIN output shows cardinality estimates that are wildly wrong
- You are adding new types and each one adds another eager-load query
GitLab reached this conclusion years ago and documented a formal policy against polymorphic associations in their codebase. Their migration to join tables is one of the patterns below. They did not arrive at this position casually — they arrived at it after years of operating polymorphic tables at scale and measuring the cost.
Four escape routes
Each pattern resolves the core problems — FK constraints, JOINs, index efficiency — with different trade-offs. The right choice depends on whether your types share structure, whether you need global queries across types, and how much migration risk you can absorb.
Pattern 1: Explicit per-type tables
The most straightforward migration. Replace the polymorphic table with separate tables, one per type. Each gets its own foreign key, its own indexes, its own constraints. The data model becomes boring. Boring, in infrastructure, is the highest compliment available.
# Pattern 1: Explicit per-type associations
# Replace polymorphic with dedicated foreign keys.
class PostComment < ApplicationRecord
belongs_to :post
end
class PhotoComment < ApplicationRecord
belongs_to :photo
end
class VideoComment < ApplicationRecord
belongs_to :video
end
# Or, if comments share structure, use STI on the comment side:
class Comment < ApplicationRecord
# No polymorphic columns
end
class PostComment < Comment
belongs_to :post
end
-- Explicit associations: real foreign keys, real JOINs
CREATE TABLE post_comments (
id bigserial PRIMARY KEY,
post_id bigint NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
body text NOT NULL,
created_at timestamp NOT NULL
);
CREATE TABLE photo_comments (
id bigserial PRIMARY KEY,
photo_id bigint NOT NULL REFERENCES photos(id) ON DELETE CASCADE,
body text NOT NULL,
created_at timestamp NOT NULL
);
-- Foreign key constraints: enforced.
-- JOINs: straightforward.
-- Query planner: one table, one index, no type discrimination.
This works well when the types are genuinely different — when post comments and photo comments will eventually diverge in structure, permissions, or lifecycle. It is the pattern the Rails Guides describe as the alternative.
The query planner loves this pattern. One table, one index, one set of statistics. No type discrimination, no cardinality confusion, no composite index with a string leading column. Every query plan is straightforward. Every estimate is accurate.
The cost: if you need "all recent comments regardless of type," you must UNION ALL across tables. With two or three types, this is manageable. With twelve, it is not. This is the fundamental trade-off — perfect per-type performance at the expense of cross-type query convenience.
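Under Pattern 1 that cross-type query is mechanical enough to generate rather than hand-write. A sketch, assuming per-type tables named post_comments, photo_comments, and video_comments with identical columns:

```ruby
# Hypothetical mapping from a display label to its per-type table.
COMMENT_TABLES = {
  "Post"  => "post_comments",
  "Photo" => "photo_comments",
  "Video" => "video_comments"
}

# Build the "recent comments across all types" query: one branch
# per table, merged with UNION ALL, sorted and limited at the top.
def recent_comments_sql(tables, limit: 50)
  branches = tables.map do |type, table|
    "SELECT body, created_at, '#{type}' AS source_type FROM #{table}"
  end
  branches.join("\nUNION ALL\n") +
    "\nORDER BY created_at DESC\nLIMIT #{limit};"
end

sql = recent_comments_sql(COMMENT_TABLES)
# With three types this is tolerable. With twelve, the statement
# and its plan grow linearly: that is the trade-off Pattern 1 makes.
```

If each table has an index on created_at DESC, PostgreSQL can satisfy this with a Merge Append over pre-sorted branches rather than a full sort.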
Pattern 2: Join tables (GitLab's approach)
Keep the comments in one table. Create a small join table per type to link comments to their parent. Each join table carries its own FK constraints.
-- Pattern 2: Separate join tables (GitLab's approach)
-- Keep a single comments table, but use join tables per type.
CREATE TABLE comments (
id bigserial PRIMARY KEY,
body text NOT NULL,
created_at timestamp NOT NULL
);
CREATE TABLE post_comments (
post_id bigint NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
comment_id bigint NOT NULL REFERENCES comments(id) ON DELETE CASCADE,
PRIMARY KEY (post_id, comment_id)
);
CREATE TABLE photo_comments (
photo_id bigint NOT NULL REFERENCES photos(id) ON DELETE CASCADE,
comment_id bigint NOT NULL REFERENCES comments(id) ON DELETE CASCADE,
PRIMARY KEY (photo_id, comment_id)
);
-- One comments table (shared structure, easy to query globally).
-- Join tables provide FK constraints per type.
-- JOINs work naturally through the join table.
-- Fetch all comments for a post, with a proper JOIN:
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.*
FROM comments c
JOIN post_comments pc ON pc.comment_id = c.id
WHERE pc.post_id = 42
ORDER BY c.created_at DESC;
Nested Loop (cost=0.56..12.84 rows=12 width=64)
-> Index Scan using post_comments_pkey on post_comments pc
(cost=0.29..4.41 rows=12 width=8)
Index Cond: (post_id = 42)
-> Index Scan using comments_pkey on comments c
(cost=0.42..0.70 rows=1 width=64)
Index Cond: (id = pc.comment_id)
Buffers: shared hit=5
-- Clean nested loop. FK-enforced. No type column needed.
This is the pattern GitLab adopted across their codebase. It preserves the ability to query all comments globally (they are in one table) while adding FK constraints through the join tables. JOINs work naturally through the bridge table.
-- The join table pattern's advantage: global queries still work.
-- "Show the 50 most recent comments regardless of type"
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.*, 'Post' AS source_type, pc.post_id AS source_id
FROM comments c
JOIN post_comments pc ON pc.comment_id = c.id
UNION ALL
SELECT c.*, 'Photo' AS source_type, phc.photo_id AS source_id
FROM comments c
JOIN photo_comments phc ON phc.comment_id = c.id
ORDER BY created_at DESC
LIMIT 50;
-- Each branch uses the join table's primary key index.
-- Merge Append combines the pre-sorted results.
-- No sequential scan on the comments table.
-- Alternatively, if you do not need the source:
SELECT * FROM comments ORDER BY created_at DESC LIMIT 50;
-- This is a simple index scan on (created_at DESC).
-- No type column in the way. No polymorphic filter.
-- Just a clean, fast query on a single table.
The join table pattern is my recommendation for most applications migrating away from polymorphic associations. It preserves the single-table advantage for global queries — SELECT * FROM comments ORDER BY created_at DESC LIMIT 50 is a simple index scan with no type filtering, no UNION ALL, no complexity. But it adds per-type FK constraints through the bridge tables, so deletions cascade properly and orphaned rows become structurally impossible.
The cost: an extra JOIN per query (through the bridge table), and more tables to manage. In practice, the bridge tables are small — just two integer columns with a composite primary key — and the JOIN cost is negligible. The nested loop plan above shows 5 buffer hits. The polymorphic equivalent shows 4. One extra buffer hit is not a concern worth losing sleep over.
Pattern 3: PostgreSQL table inheritance
-- Pattern 3: PostgreSQL table inheritance
CREATE TABLE comments (
id bigserial PRIMARY KEY,
body text NOT NULL,
created_at timestamp NOT NULL
);
CREATE TABLE post_comments (
post_id bigint NOT NULL REFERENCES posts(id) ON DELETE CASCADE
) INHERITS (comments);
CREATE TABLE photo_comments (
photo_id bigint NOT NULL REFERENCES photos(id) ON DELETE CASCADE
) INHERITS (comments);
-- Querying the parent table scans all children:
SELECT * FROM comments; -- returns post_comments + photo_comments
SELECT * FROM ONLY comments; -- returns only direct comments rows
SELECT * FROM post_comments; -- returns only post comments
-- Caveats (significant):
-- - Unique constraints do NOT span children
-- - Primary keys are NOT inherited
-- - Indexes must be created on EACH child table separately
-- - Most ORMs have poor or no support for this
PostgreSQL's table inheritance lets child tables inherit columns from a parent. Querying the parent table automatically includes rows from all children. Each child can add its own columns and FK constraints.
This is the most PostgreSQL-native solution and the least ORM-friendly. ActiveRecord does not support table inheritance. Sequelize does not. Django does not. Most ORMs treat it as an edge case or ignore it entirely. If you are writing raw SQL or using a thin query builder, inheritance is elegant. If you are committed to an ORM, this pattern adds friction that may not be worth the benefit.
The caveats are real and significant: unique constraints and indexes do not span child tables automatically, primary keys are not inherited, and you must create indexes on each child separately. A SELECT * FROM comments WHERE id = 42 may match rows in multiple children with the same ID — because each child's bigserial sequence is independent unless you share a sequence explicitly.
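The duplicate-ID caveat can be demonstrated with a toy sequence model in plain Ruby (illustrative, not PostgreSQL internals): each child table owns its own counter, so the same id is handed out twice.

```ruby
# A minimal stand-in for a bigserial sequence.
class Sequence
  def initialize
    @n = 0
  end

  def next_val
    @n += 1
  end
end

# Under INHERITS, each child table gets its own sequence by default.
post_comments_seq  = Sequence.new
photo_comments_seq = Sequence.new

post_comment_id  = post_comments_seq.next_val   # first post comment
photo_comment_id = photo_comments_seq.next_val  # first photo comment
# Both children handed out the same id. A query against the parent
# table (SELECT * FROM comments WHERE id = ...) sees both rows.

# Sharing one sequence (or using UUIDs) restores global uniqueness:
shared_seq = Sequence.new
a = shared_seq.next_val
b = shared_seq.next_val
```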
I include this pattern for completeness and because it is genuinely the right choice for PostgreSQL-native applications that do not use an ORM. But for the majority of web applications using ActiveRecord, Django ORM, or Eloquent, Patterns 1, 2, or 4 are more practical.
Pattern 4: Declarative partitioning
PostgreSQL 10 introduced declarative partitioning, which provides the query-routing benefits of table inheritance with better planner integration and stronger constraint enforcement. You keep the original table structure — including the type column — but PostgreSQL physically separates the rows into per-type partitions.
-- Pattern 4: Declarative partitioning (PostgreSQL 10+)
-- Similar to inheritance but with planner-level optimization.
CREATE TABLE comments (
id bigserial,
body text NOT NULL,
commentable_type varchar NOT NULL,
commentable_id bigint NOT NULL,
created_at timestamp NOT NULL,
PRIMARY KEY (id, commentable_type) -- partition key must be in PK
) PARTITION BY LIST (commentable_type);
CREATE TABLE comments_post PARTITION OF comments
FOR VALUES IN ('Post');
CREATE TABLE comments_photo PARTITION OF comments
FOR VALUES IN ('Photo');
CREATE TABLE comments_video PARTITION OF comments
FOR VALUES IN ('Video');
-- Now add per-partition FK constraints:
ALTER TABLE comments_post
ADD CONSTRAINT fk_post
FOREIGN KEY (commentable_id) REFERENCES posts(id) ON DELETE CASCADE;
-- Per-partition indexes:
CREATE INDEX idx_comments_post_id ON comments_post (commentable_id);
CREATE INDEX idx_comments_photo_id ON comments_photo (commentable_id);
-- Partition pruning: when you query WHERE commentable_type = 'Post',
-- PostgreSQL only scans the comments_post partition.
-- No sequential scan of the full table. Automatic.
-- Partition pruning in action:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42;
Append (cost=0.29..8.31 rows=1 width=97)
-> Index Scan using idx_comments_post_id on comments_post
(cost=0.29..8.31 rows=1 width=97)
(actual time=0.011..0.016 rows=12 loops=1)
Index Cond: (commentable_id = 42)
Buffers: shared hit=3
Planning Time: 0.284 ms
-- PostgreSQL pruned the photo and video partitions entirely.
-- Only the post partition was scanned. The planner did this
-- at planning time, not execution time.
-- Cross-type queries scan all partitions but in parallel:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
ORDER BY created_at DESC LIMIT 50;
Limit (cost=0.86..3.47 rows=50 width=97)
-> Merge Append (cost=0.86..52184.47 rows=1000000 width=97)
Sort Key: created_at DESC
-> Index Scan Backward using idx_comments_post_created
on comments_post (rows=600000)
-> Index Scan Backward using idx_comments_photo_created
on comments_photo (rows=300000)
-> Index Scan Backward using idx_comments_video_created
on comments_video (rows=100000)
Buffers: shared hit=6
Declarative partitioning is the most interesting escape route because it preserves the original table's interface. Application code continues to INSERT INTO comments and SELECT FROM comments without knowing partitions exist. PostgreSQL routes rows to the correct partition on insert and prunes irrelevant partitions on select. Your ORM does not need to change.
The trade-offs are specific and measurable. The partition key (commentable_type) must be part of the primary key, which changes your PK from (id) to (id, commentable_type). Any foreign keys pointing to the comments table must include the partition key. Unique indexes must include the partition key. These constraints can require changes to referencing tables — a cost that must be evaluated against the benefits.
Per-partition FK constraints are the headline feature for polymorphic tables. You can add FOREIGN KEY (commentable_id) REFERENCES posts(id) on the comments_post partition specifically. PostgreSQL enforces referential integrity for Post comments without affecting Photo or Video partitions. This is the only pattern that provides FK constraints while keeping a single logical table.
"Most application developers interact with PostgreSQL through an ORM that exposes perhaps 10-15% of what PostgreSQL actually offers. The remaining 85% includes features that eliminate the need for much of the infrastructure complexity that modern applications carry."
— from You Don't Need Redis, Chapter 1: Good Evening. We Have a Problem.
How do the patterns compare?
| Strategy | FK constraints | JOINs | Query complexity | Schema cost | Best for |
|---|---|---|---|---|---|
| Polymorphic (status quo) | No | No (EagerLoadPolymorphicError) | Multiple queries per type | Minimal | Prototyping, few types, low scale |
| Explicit per-type tables | Yes | Yes | Single JOIN | One table per type | Divergent schemas, independent lifecycles |
| Join tables (GitLab pattern) | Yes | Yes | Single JOIN through bridge | One join table per type | Shared structure, many types, global queries |
| Table inheritance | Yes (per child) | Yes | Automatic with parent query | Child tables inherit parent | PostgreSQL-native apps, no ORM |
| Declarative partitioning | Yes (per partition) | Yes | Automatic partition pruning | Partition per type value | High-volume tables, per-type FK needed |
For most applications at scale, the join table pattern (Pattern 2) offers the best balance: global queries still work, FK constraints are enforced, and JOINs are available. Explicit per-type tables (Pattern 1) are preferable when the types have genuinely different lifecycles. Declarative partitioning (Pattern 4) is the strongest choice when you need FK constraints but want to preserve the single-table interface for application code.
Table inheritance (Pattern 3) is the right answer for a narrow set of applications — PostgreSQL-native codebases that do not use an ORM. For everyone else, its ORM incompatibility creates more friction than it resolves.
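For reference, the join-table shape summarized in the comparison above reduces to something like this sketch (names are illustrative, following the article's examples):

```sql
-- Join-table (bridge) pattern: comments keeps the shared columns,
-- one narrow bridge table per parent type carries the FK pair.
CREATE TABLE comments (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE post_comments (
    comment_id bigint NOT NULL UNIQUE REFERENCES comments(id),
    post_id    bigint NOT NULL REFERENCES posts(id)
);
-- photo_comments and video_comments follow the same shape.
-- Global queries still hit comments directly; per-type queries
-- JOIN through the bridge and get real FK enforcement.
```

The UNIQUE on comment_id ensures a comment belongs to at most one post; dropping it would allow a comment to be shared across parents, which is occasionally what you want.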
How do you actually migrate?
Migrating a live polymorphic association is a five-phase operation. The critical principle: never switch reads and writes simultaneously. Dual-write first, backfill, switch reads, then retire the old structure. Impatience at this stage produces data loss. Patience produces a clean migration with zero downtime.
# Migrating polymorphic to explicit associations in Rails:
# Phase 1: Add the new table alongside the old one
class CreatePostComments < ActiveRecord::Migration[7.1]
def change
create_table :post_comments do |t|
t.references :post, null: false, foreign_key: true
t.text :body, null: false
t.timestamps
end
end
end
# Phase 2: Dual-write (write to both old and new)
class Comment < ApplicationRecord
after_create :sync_to_typed_table
private
def sync_to_typed_table
case commentable_type
when 'Post'
PostComment.create!(post_id: commentable_id, body: body)
end
end
end
# Phase 3: Backfill existing data. Scope it to rows created before
# dual-write went live (DUAL_WRITE_DEPLOYED_AT: the timestamp at
# which you deployed Phase 2) so dual-written rows are not copied twice.
Comment.where(commentable_type: 'Post')
       .where('created_at < ?', DUAL_WRITE_DEPLOYED_AT)
       .find_each do |c|
  PostComment.create!(post_id: c.commentable_id, body: c.body)
end
# Phase 4: Switch reads to the new table
# Phase 5: Stop dual-writing, drop the old columns
Allow me to expand on each phase, because the devil resides in the transitions between them.
Phase 1 — Create the new structure. Add the new table alongside the old one. Do not drop anything. Do not modify the existing table. The new table simply exists, empty, waiting. This migration is zero-risk and can be deployed at any time.
Phase 2 — Dual-write. Every new comment is written to both the old polymorphic table and the new typed table. Use an after_create callback, a database trigger, or application-level logic. The old table remains the source of truth for reads. The new table is accumulating data for the cutover.
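If you prefer the database-trigger route for dual-writing, a sketch looks like this (function and trigger names are illustrative; the table shapes follow the Phase 1 migration):

```sql
-- Dual-write via trigger: mirror Post comments into post_comments
-- on insert, keeping the sync out of application code entirely.
CREATE OR REPLACE FUNCTION sync_post_comment() RETURNS trigger AS $$
BEGIN
    IF NEW.commentable_type = 'Post' THEN
        INSERT INTO post_comments (post_id, body, created_at, updated_at)
        VALUES (NEW.commentable_id, NEW.body, NEW.created_at, NEW.created_at);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_dual_write
    AFTER INSERT ON comments
    FOR EACH ROW EXECUTE FUNCTION sync_post_comment();
```

The trigger fires inside the same transaction as the original insert, so the two tables cannot drift on a partial failure — an advantage the after_create callback does not share.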
Phase 3 — Backfill. Copy existing data from the old table to the new table in batches. Use find_each (which processes records in batches of 1,000 by default) rather than loading everything into memory. For tables with tens of millions of rows, consider a background job that processes batches with a delay between them to avoid saturating the database.
Do not attempt to backfill in a single transaction — it will hold locks for too long and generate excessive WAL. Batched inserts with periodic commits allow autovacuum to work and keep your replication lag manageable.
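The pacing logic reduces to something like this plain-Ruby sketch (no Rails required; batch_size and delay are illustrative knobs, and the real per-batch work would be an insert rather than a count):

```ruby
# Process rows in fixed-size batches with an optional pause between
# batches -- the same shape as find_each plus a throttled job.
def backfill_in_batches(rows, batch_size: 1_000, delay: 0)
  copied = 0
  rows.each_slice(batch_size) do |batch|
    # In the real migration each batch becomes one bulk insert
    # committed on its own; here we only count the rows.
    copied += batch.size
    sleep(delay) if delay > 0 # throttle so autovacuum can keep up
  end
  copied
end
```

Because each batch commits independently, WAL generation stays bounded and replicas never fall far behind.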
Phase 4 — Verify and switch reads. This is where correctness is confirmed. Run both old and new queries in parallel, compare results, and only cut over when they match.
-- Phase 4 verification: compare old and new before cutover.
-- Run both queries and confirm identical results.
-- Old path (polymorphic):
SELECT body, created_at FROM comments
WHERE commentable_type = 'Post' AND commentable_id = 42
ORDER BY created_at DESC;
-- New path (explicit per-type table, as created in Phase 1):
SELECT body, created_at FROM post_comments
WHERE post_id = 42
ORDER BY created_at DESC;
-- Automated verification for the full table:
SELECT count(*) AS old_count FROM comments
WHERE commentable_type = 'Post';
SELECT count(*) AS new_count FROM post_comments;
-- These counts MUST match after backfill completes.
-- If they differ, the backfill missed rows or the
-- dual-write has a race condition. Do not proceed. Phase 5 — Retire the old structure. Stop dual-writing. Drop the old columns or table. This phase can wait weeks — there is no cost to leaving the old structure in place temporarily, and the safety margin is worth it. I have seen teams drop the old table the same day as the cutover and regret it within hours. There is no prize for speed here.
If you are keeping polymorphic associations, index them properly
Not every polymorphic association needs to be migrated. If the scale is modest and the pattern works for your access patterns, the right move is better indexes rather than a schema migration. A well-indexed polymorphic table can perform within 10-20% of a properly normalized alternative for type-specific queries. The gap widens for cross-type queries, but if your application does not run those, the gap does not matter.
The default composite index on (commentable_type, commentable_id) is a start, but it serves only one query shape: lookup by type and ID. For other patterns — recent comments of a specific type, all comments for a user filtered by type — partial indexes are dramatically more effective.
-- Partial index filtered by type:
CREATE INDEX idx_comments_post_created
ON comments (commentable_id, created_at)
WHERE commentable_type = 'Post';
CREATE INDEX idx_comments_photo_created
ON comments (commentable_id, created_at)
WHERE commentable_type = 'Photo';
-- Now the same query, filtered by type:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM comments
WHERE commentable_type = 'Post'
AND commentable_id = 42
ORDER BY created_at DESC;
Index Scan Backward using idx_comments_post_created on comments
(cost=0.42..4.12 rows=12 width=97)
(actual time=0.009..0.014 rows=12 loops=1)
Index Cond: (commentable_id = 42)
Buffers: shared hit=3
-- Smaller index (only Post rows), and created_at is included
-- for sort elimination. 3 buffer hits instead of 4.
Partial indexes filtered by commentable_type are smaller, faster to scan, and cheaper to maintain than the full composite index. Each partial index contains only the rows for its type — a natural fit for the polymorphic pattern where queries almost always filter by type.
Three specific improvements in the partial index plan:
- No type comparison at query time. The WHERE commentable_type = 'Post' predicate matches the index's filter clause. PostgreSQL does not evaluate it row-by-row — the index already contains only Post rows.
- Sort elimination. The partial index includes created_at as the second column. Queries with ORDER BY created_at DESC use an index scan backward instead of a separate sort step.
- Smaller index footprint. Each partial index is a fraction of the size of the full composite. On a table with 1M rows (600K Post, 300K Photo, 100K Video), the Post partial index is 60% the size of the full composite. Fewer pages, fewer buffer hits, faster maintenance.
This is precisely the kind of optimization Gold Lapel creates automatically. The proxy observes your query traffic — sees repeated queries like WHERE commentable_type = 'Post' AND commentable_id = $1 — and creates partial indexes filtered by each type value it encounters. It also detects the multi-query eager-loading pattern that polymorphic associations force: the rapid succession of SELECT * FROM posts WHERE id IN (...) followed by SELECT * FROM photos WHERE id IN (...), each of which benefits from its own targeted index.
For polymorphic tables with expression-based queries — lower(commentable_type), JSONB extractions on metadata columns, date truncations on created_at — the proxy creates expression indexes as well. The type column becomes a partitioning dimension for index strategy, not just a filter value.
Whether you migrate to join tables or keep the polymorphic pattern, the query traffic reveals which indexes actually matter. Schema decisions and index decisions are separate concerns — and the indexes should reflect what your application actually does, not what the schema suggests it might do.
A brief word about the alternatives that are not alternatives
In the course of researching polymorphic association alternatives, you will encounter several suggestions that I feel obligated to address before you invest time in them.
JSONB with a type field. Storing the associated record as a JSONB document with a type key does not solve the foreign key problem — JSONB values cannot participate in FK constraints. It trades the string column for a JSONB key lookup (slower) and loses the composite index entirely. This is a lateral move at best and a regression at worst.
Multiple nullable FK columns. Adding post_id, photo_id, and video_id columns to the comments table, with a CHECK constraint ensuring exactly one is non-null. This works for small numbers of types but creates increasingly sparse rows as types multiply. A table with 12 nullable FK columns — 11 of which are always NULL — is paying a storage and maintenance cost for columns that contribute nothing. It also forces every query to check which column is populated, replacing commentable_type with COALESCE(post_id, photo_id, video_id, ...).
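For completeness, the nullable-FK variant looks roughly like this (column names assumed; num_nonnulls is built into PostgreSQL 9.6 and later):

```sql
-- Exactly one parent column may be non-null per row.
ALTER TABLE comments
    ADD COLUMN post_id  bigint REFERENCES posts(id),
    ADD COLUMN photo_id bigint REFERENCES photos(id),
    ADD COLUMN video_id bigint REFERENCES videos(id),
    ADD CONSTRAINT comments_one_parent
        CHECK (num_nonnulls(post_id, photo_id, video_id) = 1);
-- Every new type means another column, another index, and a wider
-- CHECK constraint: the sparseness cost described above.
```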
Materialized views. Creating a materialized view that pre-joins polymorphic associations is tempting but inherently stale — the view must be refreshed, the refresh locks the table (unless you use CONCURRENTLY, which requires a unique index), and the data is always as old as the last refresh. For real-time comment displays, this is insufficient. For nightly reports, it may be acceptable.
None of these are terrible ideas. They are simply not better than the patterns described above. The join table pattern gives you FK constraints, JOINs, and global queries without creating sparse rows, JSONB lookups, or stale materialized views. Prefer it.
The uncomfortable question: should you have used polymorphic associations at all?
I am asked this frequently, and my answer is more nuanced than you might expect from the preceding 3,000 words of criticism.
Yes. Probably. At the time, it was likely the right decision.
The polymorphic pattern exists because it solves a real problem — shared behavior across multiple parent types — with minimal schema complexity. One table, two columns, done. The alternatives all require more tables, more joins, more code, more migration steps. When you are building a product and validating an idea, that simplicity has genuine value. The team that ships a working comment system in an afternoon using polymorphic: true is further ahead than the team that spends a week designing a join table architecture for a product that may not survive its first month.
The mistake is not using polymorphic associations. The mistake is not recognizing when they have outgrown their welcome. The pattern that served you well at 10,000 rows becomes a burden at 1 million and a crisis at 10 million. The escape routes exist because growth is the normal case. A successful product outgrows its initial schema. That is not a failure of planning — it is a sign that the product worked.
The best database design is the one that matches your current scale and query patterns, with awareness of where the ceiling is. If polymorphic associations are serving you well today and your table is growing slowly, keep them. Add partial indexes. Use extended statistics. Enjoy the simplicity. But know where the escape routes are, and when you see the signs — orphaned rows accumulating, cross-type queries degrading, eager loading adding round trips — do not wait for a crisis to plan the migration.
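The extended statistics mentioned above are a one-statement addition (available since PostgreSQL 10; the statistics object name is illustrative):

```sql
-- Teach the planner that commentable_type and commentable_id are
-- correlated, improving row estimates for the combined filter.
CREATE STATISTICS comments_type_id_stats (dependencies, ndistinct)
    ON commentable_type, commentable_id FROM comments;
ANALYZE comments;  -- the statistics are populated on the next ANALYZE
```

This directly addresses the cardinality misestimates described earlier: without it, PostgreSQL multiplies the selectivities of the two columns as if they were independent.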
A well-run household anticipates its needs. It does not wait for the pipes to burst before calling the plumber.
While the matter is fresh in your mind, I have written at some length on the broader question of ORM-generated SQL versus hand-written queries — the performance characteristics of each, and why polymorphic eager loading is one of the patterns where the gap is widest. The investigation may inform your next schema decision.