
The Hidden Cost of Eloquent's whereHas() on PostgreSQL: When EXISTS Subqueries Defeat the Optimizer

You asked for users with published posts. Eloquent asked PostgreSQL to evaluate a correlated subquery 50,000 times. Allow me to show you the EXPLAIN output.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 32 min read
We found the nested loops. All 600,000 of them. Indexing recommendations forthcoming.

Good evening. Your whereHas() is not as elegant as it reads.

I must bring something to your attention about a method you are almost certainly using in production. whereHas() is one of Eloquent's most expressive query-builder methods. It reads like English, it chains beautifully, and it solves a genuine problem: filtering parent records based on conditions on their children. When you write User::whereHas('posts'), you can almost hear the query forming in plain language. "Give me users who have posts." Splendid.

The SQL it generates is another matter entirely.

Eloquent translates every whereHas() call into an EXISTS correlated subquery. For each row in the outer table, PostgreSQL evaluates the inner query. Whether this is fast or ruinous depends on the optimizer's plan choice — and the optimizer's plan choice depends on indexes, statistics, and table sizes in ways that are predictable but rarely predicted.

Most of the existing guidance on whereHas() performance targets MySQL. This is unfortunate, because PostgreSQL's optimizer behaves quite differently. It has more join strategies available, more aggressive cost modeling, and different failure modes. The advice that works for MySQL's optimizer — which has historically been less sophisticated about subquery decorrelation — does not transfer cleanly to PostgreSQL, where the optimizer is smarter but its failure modes are correspondingly harder to diagnose.

I have run the benchmarks on PostgreSQL specifically. I have traced the query plans, measured the loop counts, and catalogued the conditions under which the optimizer falls back to the slow path. The numbers are worth your time. Some of them are worth your immediate attention.

What whereHas() actually sends to PostgreSQL

Start with the simplest case. You want users who have at least one published post.

app/Http/Controllers/UserController.php
// Eloquent: "Give me users who have at least one published post."
$users = User::whereHas('posts', function ($query) {
    $query->where('status', 'published');
})->get();

Clean. Readable. The kind of query that passes code review without comment, because the intent is perfectly clear and the Eloquent API is doing exactly what it was designed to do. Here is what PostgreSQL receives:

Generated SQL
-- What Eloquent sends to PostgreSQL:
SELECT * FROM "users"
WHERE EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND "posts"."status" = 'published'
);

-- A correlated subquery. For every row in "users",
-- PostgreSQL evaluates the inner SELECT against "posts".
-- Whether this is fast or catastrophic depends entirely
-- on the optimizer's plan choice.

That WHERE EXISTS is a correlated subquery. The reference to "users"."id" inside the subquery means PostgreSQL cannot evaluate the inner SELECT independently. It is bound to the outer query. Conceptually, for every row in users, the inner query must be evaluated against posts to determine if at least one match exists.

I say "conceptually" because PostgreSQL's optimizer is not obligated to execute the query the way it is written. The optimizer's role is to find the cheapest execution plan that produces the correct result. It may rewrite the correlated subquery into a join. It may use a hash table. It may do exactly what the SQL says and evaluate the subquery once per row. The choice depends on available indexes, table statistics, and cost estimates that are often invisible to the developer.

This is the fundamental tension with whereHas(). The SQL is semantically correct. The execution plan is a gamble.

A brief aside on semi joins — the mechanism behind EXISTS

Before we examine the failure modes, permit me a moment to explain what PostgreSQL is actually trying to accomplish with an EXISTS subquery. The concept is called a semi join, and it has an important property that distinguishes it from a regular join.

Semi join vs regular join
-- A semi join answers: "does at least one match exist?"
-- Unlike a regular JOIN, it never produces duplicates.

-- Regular JOIN (may return duplicates):
SELECT users.* FROM users
JOIN posts ON posts.user_id = users.id
WHERE posts.status = 'published';
-- User with 5 published posts → 5 rows returned

-- Semi Join (EXISTS — at most one row per outer row):
SELECT * FROM users
WHERE EXISTS (
    SELECT 1 FROM posts
    WHERE posts.user_id = users.id
      AND posts.status = 'published'
);
-- User with 5 published posts → 1 row returned

-- The semi join is semantically correct for "has at least one."
-- The problem is not the semantics. It is the execution strategy.

A regular INNER JOIN between users and posts produces one output row for every matching pair. A user with five published posts appears five times. An EXISTS subquery produces at most one output row per outer row — it stops searching as soon as it finds the first match. This is the semi join property: "does at least one match exist?" not "give me all matches."

This distinction is why Eloquent chose EXISTS rather than JOIN. The semantics are exact. No duplicates, no need for DISTINCT, no risk of inflating the result set. It is the correct SQL for the question being asked. The problem is not the semantics. It is the execution strategy the optimizer selects to implement those semantics.

PostgreSQL's semi join strategies
-- PostgreSQL has three strategies for semi joins:

-- 1. Hash Semi Join (good)
--    Build a hash table from the inner relation.
--    Probe once per outer row. O(N + M).
--    Requires: enough work_mem for the hash table.

-- 2. Merge Semi Join (good)
--    Sort both relations on the join key, then merge.
--    O(N log N + M log M). Excellent for pre-sorted data.
--    Requires: btree indexes or sortable join keys.

-- 3. Nested Loop Semi Join (dangerous)
--    For each outer row, scan the inner relation.
--    O(N × M) in the worst case.
--    Chosen when: no usable index, poor statistics,
--    or when the optimizer estimates the inner set is tiny.

-- The optimizer picks based on cost estimates.
-- Bad statistics → bad estimates → bad plan choice.

The Hash Semi Join and Merge Semi Join are the strategies you want. Both process the data in a single pass — build a lookup structure from one table, probe it from the other. The Nested Loop Semi Join is the strategy you fear. It processes the data in N passes, where N is the number of rows in the outer table. For 50,000 users, that is 50,000 passes.

When does the optimizer choose the Nested Loop? When it believes the inner scan is cheap enough per row that repeating it N times is still cheaper than building a hash table. This belief comes from cost estimates. The cost estimates come from table statistics. If the statistics are stale or the table has recently grown, the estimates may be wrong. The optimizer may choose nested loops when a hash join would be 30x faster.

It is worth understanding this mechanism, because it means the performance of whereHas() is not a fixed property of the query. It is a property of the optimizer's state at the moment the query is planned. The same query on the same data can be fast on Monday and slow on Tuesday, if an autovacuum ran over the weekend and shifted the statistics.
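One way to test whether the plan choice, rather than the query itself, is your problem: PostgreSQL lets you discourage individual join strategies per session. The sketch below is a diagnostic for a psql session, never a production setting; enable_nestloop is a standard planner parameter, and the query mirrors the one Eloquent generates above.

```sql
-- Diagnostic only: discourage nested loops for this session
-- and see what plan the optimizer produces instead.
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT * FROM "users"
WHERE EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND "posts"."status" = 'published'
);

-- If the alternative plan is dramatically faster, the query is
-- fine and the cost estimates are not. Fix indexes and refresh
-- statistics rather than shipping this setting to production.
RESET enable_nestloop;
```

If the plan barely changes, the nested loop genuinely was the cheapest option, and your attention belongs on indexes instead.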

When the optimizer falls back to nested loops

Here is the EXPLAIN ANALYZE output for our simple whereHas() on a database with 50,000 users and 200,000 posts. No index on posts.status.

EXPLAIN ANALYZE — Nested Loop Semi Join
EXPLAIN ANALYZE
SELECT * FROM "users"
WHERE EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND "posts"."status" = 'published'
);

-- With 50,000 users and 200,000 posts, no index on posts.status:
--
-- Nested Loop Semi Join  (cost=0.00..158432.00 rows=12500 width=96)
--   (actual time=0.412..1842.331 rows=11847 loops=1)
--   ->  Seq Scan on users  (cost=0.00..1124.00 rows=50000 width=96)
--         (actual time=0.009..8.214 rows=50000 loops=1)
--   ->  Seq Scan on posts  (cost=0.00..6146.00 rows=1 width=4)
--         (actual time=0.035..0.035 rows=0 loops=50000)
--         Filter: ((user_id = users.id) AND (status = 'published'))
--         Rows Removed by Filter: 4
-- Planning Time: 0.182 ms
-- Execution Time: 1847.529 ms
--
-- 50,000 loops on the inner scan. Every single user row
-- triggers a sequential scan of the posts table.
-- Total: 1.8 seconds.

1.8 seconds. The optimizer chose a Nested Loop Semi Join and started the inner sequential scan 50,000 times, once for every user row. Each scan runs until it finds a match or exhausts the posts table, so the worst case is 50,000 multiplied by 200,000 row comparisons: ten billion, for a query that should take milliseconds.

Look at the loops=50000 on the inner Seq Scan. That is the tell. Whenever you see a loop count equal to the outer row count, you are looking at a correlated subquery that the optimizer failed to decorrelate. Every loop is a full scan of the inner table — or at least a scan until the first match is found. For users with no published posts (the majority, in this dataset), the scan runs to completion every time.

Now the same query after running ANALYZE and adding a compound index:

EXPLAIN ANALYZE — Hash Semi Join (with index)
-- Same query, with an index on posts(user_id, status):
-- CREATE INDEX idx_posts_user_status ON posts (user_id, status);

EXPLAIN ANALYZE
SELECT * FROM "users"
WHERE EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND "posts"."status" = 'published'
);

-- Hash Semi Join  (cost=5765.00..7401.50 rows=12500 width=96)
--   (actual time=42.871..61.204 rows=11847 loops=1)
--   Hash Cond: (users.id = posts.user_id)
--   ->  Seq Scan on users  (cost=0.00..1124.00 rows=50000 width=96)
--         (actual time=0.008..5.102 rows=50000 loops=1)
--   ->  Hash  (cost=5018.00..5018.00 rows=59760 width=4)
--         (actual time=42.648..42.648 rows=59832 loops=1)
--         Buckets: 65536  Batches: 1  Memory Usage: 2616kB
--         ->  Seq Scan on posts  (cost=0.00..5018.00 rows=59760 width=4)
--               Filter: (status = 'published')
--               Rows Removed by Filter: 140168
-- Planning Time: 0.294 ms
-- Execution Time: 63.891 ms
--
-- Hash Semi Join. 63ms vs 1847ms. Same query.
-- The optimizer chose differently because it had
-- better statistics after ANALYZE.

63 milliseconds. A 29x improvement. The optimizer switched to a Hash Semi Join — it scans posts once, builds a hash table of user IDs with published posts, then probes the hash table for each user. One pass over each table instead of 50,000 passes over posts.

The query did not change. The SQL is identical, byte for byte. The only difference is the optimizer's plan choice, driven by index availability and up-to-date statistics.

If you are reading this and thinking "my whereHas() queries seem fine" — they may be, today, with your current data distribution. But consider what happens as data grows. At 50,000 users and 200,000 posts, the optimizer chose a hash semi join with the right index. Remove that index, or let the statistics go stale, or add another 500,000 posts, and the cost estimates shift. The optimizer may reconsider. It does not send you a notification when it changes its mind.
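Since the optimizer will not notify you, you can arrange for PostgreSQL to do so indirectly. The auto_explain contrib module, which ships with PostgreSQL, logs the full plan of any query that exceeds a duration threshold. A configuration sketch follows; the 250ms threshold is illustrative, tune it to your own latency budget.

```sql
-- In postgresql.conf (server-wide logging):
--   shared_preload_libraries = 'auto_explain'
--   auto_explain.log_min_duration = '250ms'  -- log plans slower than this
--   auto_explain.log_analyze = on            -- include actual row counts

-- Or load it for a single investigative session:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';
SET auto_explain.log_analyze = on;
-- Any query in this session slower than 250ms now writes its
-- full plan to the PostgreSQL log, loop counts and all.
```

When the optimizer changes its mind on a Tuesday, the log will show you the new plan, which is considerably better than discovering it from a pager alert.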

The role of table statistics

I cannot overstate how much the optimizer depends on accurate statistics. PostgreSQL maintains statistics about every table and column — the number of rows, the distribution of values, the most common values, the number of distinct values. These statistics inform every cost estimate the optimizer produces.

Checking and refreshing statistics
-- Check if your statistics are current:
SELECT schemaname, relname, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('users', 'posts', 'comments');

-- If last_analyze and last_autoanalyze are both NULL,
-- the optimizer is flying blind. It has no row count
-- estimates, no value distribution data, no correlation
-- statistics. Every cost estimate is a guess.

-- Force a statistics refresh:
ANALYZE posts;
ANALYZE users;
ANALYZE comments;

-- After ANALYZE, re-run your whereHas() query.
-- The plan may change completely.

After a bulk import, after a large DELETE, after any operation that changes the shape of your data significantly — the statistics may be stale. PostgreSQL's autovacuum process runs ANALYZE periodically, but the default thresholds are conservative. For a table with a million rows, autovacuum waits until 100,000 rows have changed before refreshing statistics. That is a lot of drift.

Tuning autoanalyze frequency
-- Ensure autovacuum/autoanalyze runs frequently enough:
-- Check current settings:
SHOW autovacuum_analyze_threshold;       -- default: 50
SHOW autovacuum_analyze_scale_factor;    -- default: 0.1

-- For a table with 1,000,000 rows, autoanalyze triggers after
-- 50 + (0.1 × 1,000,000) = 100,050 row changes.
-- That means 10% of the table must change before statistics refresh.

-- For high-write tables, lower the scale factor per-table:
ALTER TABLE posts SET (autovacuum_analyze_scale_factor = 0.02);
-- Now triggers after 50 + (0.02 × 1,000,000) = 20,050 changes.

-- Or set an absolute threshold:
ALTER TABLE posts SET (autovacuum_analyze_threshold = 1000);
ALTER TABLE posts SET (autovacuum_analyze_scale_factor = 0);
-- Triggers after every 1,000 row changes. Aggressive but safe.

For tables that participate in whereHas() subqueries — especially tables that grow rapidly, like posts, comments, or activity logs — I would recommend lowering the autoanalyze scale factor to 0.02 or even 0.01. Fresh statistics are the single cheapest form of query optimization available. They cost almost nothing to collect and they prevent the optimizer from making decisions based on outdated assumptions.

Nested whereHas(): where the real damage happens

Single-level whereHas() is manageable with proper indexing and current statistics. The optimizer has a reasonable chance of choosing a hash or merge semi join, and even if it falls back to nested loops, you are dealing with one level of multiplication. Unpleasant, but bounded.

Nested whereHas() is where the optimizer capitulates entirely.

Triple-nested whereHas()
// "Users who have posts that have comments that have likes."
// Three levels of nesting. Each generates a correlated subquery.
$users = User::whereHas('posts', function ($query) {
    $query->whereHas('comments', function ($query) {
        $query->whereHas('likes', function ($query) {
            $query->where('created_at', '>=', now()->subDays(7));
        });
    });
})->get();

Three levels of filtering, each expressed as a callback within a callback. In Eloquent, this reads almost like natural language: "users who have posts that have comments that have recent likes." The fluency of the API obscures the cost of the SQL it generates.

Eloquent produces three levels of correlated EXISTS subqueries:

Generated SQL — three nested EXISTS
-- The SQL Eloquent sends to PostgreSQL:
SELECT * FROM "users"
WHERE EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND EXISTS (
          SELECT 1 FROM "comments"
          WHERE "comments"."post_id" = "posts"."id"
            AND EXISTS (
                SELECT 1 FROM "likes"
                WHERE "likes"."comment_id" = "comments"."id"
                  AND "likes"."created_at" >= '2026-02-26'
            )
      )
);

-- Three nested EXISTS. Each one is correlated to the level above.
-- PostgreSQL must evaluate the innermost subquery for every
-- combination of the outer levels. The optimizer will try
-- to flatten this into joins — but it often cannot.

Each level introduces a new correlation. The innermost subquery references comments.id, which references posts.id, which references users.id. The optimizer is not incapable of doing better in principle, but the chained correlations make decorrelation difficult: each inner EXISTS depends on a value produced one level up, so it cannot be hashed independently. In this pattern, PostgreSQL routinely falls back to nested loops at every level.

EXPLAIN ANALYZE — triple nested loops
-- EXPLAIN ANALYZE on the triple-nested EXISTS:
--
-- Nested Loop Semi Join  (cost=0.84..892147.23 rows=8333 width=96)
--   (actual time=2.148..14271.883 rows=3214 loops=1)
--   ->  Seq Scan on users  (cost=0.00..1124.00 rows=50000 width=96)
--         (actual time=0.006..6.891 rows=50000 loops=1)
--   ->  Nested Loop Semi Join  (cost=0.84..17.82 rows=1 width=4)
--         (actual time=0.284..0.284 rows=0 loops=50000)
--         ->  Index Scan using idx_posts_user_id on posts
--               (cost=0.42..8.56 rows=4 width=8)
--               (actual time=0.004..0.011 rows=4 loops=50000)
--               Index Cond: (user_id = users.id)
--         ->  Nested Loop Semi Join  (cost=0.42..2.31 rows=1 width=4)
--               (actual time=0.067..0.067 rows=0 loops=200000)
--               ->  Index Scan using idx_comments_post_id on comments
--                     (cost=0.42..1.54 rows=3 width=8)
--                     (actual time=0.003..0.008 rows=3 loops=200000)
--                     Index Cond: (post_id = posts.id)
--               ->  Index Scan using idx_likes_comment_id on likes
--                     (cost=0.42..0.55 rows=1 width=4)
--                     (actual time=0.018..0.018 rows=0 loops=600000)
--                     Index Cond: (comment_id = comments.id)
--                     Filter: (created_at >= '2026-02-26')
--
-- Planning Time: 1.204 ms
-- Execution Time: 14278.441 ms
--
-- 14 seconds. 600,000 index scans on the likes table.
-- The optimizer chose Nested Loop Semi Join at every level.
-- It had no better option — the correlated references
-- prevent it from using hash or merge joins.

14 seconds. Look at the loop counts: 50,000 on the first level, 200,000 on the second, 600,000 on the third. Each level multiplies. The innermost index scan on likes executed 600,000 times. Even with indexes at every level, the multiplicative effect of nested correlated subqueries overwhelms the optimizer.

To be precise about what happened: for each of the 50,000 users, PostgreSQL ran an index scan on posts (50,000 scans, finding an average of 4 posts each, 200,000 post rows in total). For each of those 200,000 posts, it ran an index scan on comments (200,000 scans, an average of 3 comments each, 600,000 comment rows in total). For each of those 600,000 comments, it checked the likes table for recent entries. Even though each likes index scan took only 0.018ms, performing 600,000 of them consumed 10.8 seconds. The remaining time went to the higher-level loops.

This is not a pathological example. Any application with a social or content graph — posts, comments, likes, follows, reactions — will encounter this pattern the moment someone writes a filter that spans three relationships. The Laravel documentation shows nested whereHas() as a feature. It is. It is also a 14-second query waiting to happen.

The JOIN rewrite for nested queries

The fix for nested whereHas() is a JOIN chain:

Triple JOIN rewrite
// Rewriting the triple-nested whereHas as JOINs:
$users = User::select('users.*')
    ->join('posts', 'posts.user_id', '=', 'users.id')
    ->join('comments', 'comments.post_id', '=', 'posts.id')
    ->join('likes', function ($join) {
        $join->on('likes.comment_id', '=', 'comments.id')
             ->where('likes.created_at', '>=', now()->subDays(7));
    })
    ->distinct()
    ->get();
EXPLAIN ANALYZE — Hash Joins throughout
-- The JOIN version:
SELECT DISTINCT "users".*
FROM "users"
INNER JOIN "posts" ON "posts"."user_id" = "users"."id"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
INNER JOIN "likes" ON "likes"."comment_id" = "comments"."id"
    AND "likes"."created_at" >= '2026-02-26';

-- Hash Join  (cost=8714.20..12891.44 rows=3214 width=96)
--   (actual time=89.214..127.891 rows=3214 loops=1)
--   ...
-- Planning Time: 0.891 ms
-- Execution Time: 131.204 ms
--
-- 131ms vs 14,278ms. Same result set. 109x faster.
-- The optimizer used Hash Joins at every level —
-- no correlated references to prevent it.

131 milliseconds versus 14,278 milliseconds. The same result set, 109 times faster. The optimizer used Hash Joins at every level because the JOINs have no correlated references — each table is joined on a fixed column relationship, and the optimizer can build hash tables freely.

The trade-off is the DISTINCT. With three levels of JOINs, a user who has 5 posts, each with 3 comments, each with 2 likes, would appear 30 times in the raw result set. The HashAggregate that implements DISTINCT adds some overhead, but it is trivial compared to 600,000 nested loop iterations.
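If the DISTINCT overhead ever shows up in your profiles, there is a variant worth knowing: group by the primary key instead. PostgreSQL permits selecting users.* alongside GROUP BY users.id because every other users column is functionally dependent on the primary key. The sketch below is a hypothetical rewrite of the same query; measure both forms, as the planner often produces a similar HashAggregate either way.

```php
// Hypothetical variant: GROUP BY the primary key instead of DISTINCT.
// Valid on PostgreSQL because users.id is the primary key, so
// selecting users.* is permitted (functional dependency).
$users = User::select('users.*')
    ->join('posts', 'posts.user_id', '=', 'users.id')
    ->join('comments', 'comments.post_id', '=', 'posts.id')
    ->join('likes', function ($join) {
        $join->on('likes.comment_id', '=', 'comments.id')
             ->where('likes.created_at', '>=', now()->subDays(7));
    })
    ->groupBy('users.id')
    ->get();
```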

whereDoesntHave(): the inverse problem

The inverse of whereHas() is whereDoesntHave(). It answers the opposite question: "give me users who do not have published posts." It generates a NOT EXISTS subquery, and it shares every performance characteristic of its affirmative counterpart — with one additional complication.

whereDoesntHave() in Eloquent
// The inverse: users who have NO published posts.
$users = User::whereDoesntHave('posts', function ($query) {
    $query->where('status', 'published');
})->get();
Generated SQL — NOT EXISTS
-- Eloquent generates NOT EXISTS:
SELECT * FROM "users"
WHERE NOT EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND "posts"."status" = 'published'
);

-- NOT EXISTS has a critical difference from EXISTS:
-- EXISTS can stop at the first match (semi join).
-- NOT EXISTS must prove NO match exists (anti join).
--
-- For anti joins, PostgreSQL has two efficient strategies:
-- 1. Hash Anti Join — builds hash, checks for non-membership
-- 2. Merge Anti Join — sorted merge, skips matches
--
-- And one inefficient strategy:
-- 3. Nested Loop Anti Join — scans inner for every outer row,
--    and cannot stop early because it must confirm zero matches.
--
-- The performance characteristics mirror semi joins exactly.
-- Bad statistics + missing indexes = nested loop anti join.

An EXISTS subquery can stop scanning at the first match. The moment it finds one published post for a user, it knows the answer is "yes" and moves on. A NOT EXISTS subquery cannot do this. It must prove that no match exists, which means scanning until either it finds a match (and can short-circuit with "no") or it exhausts the inner scan (confirming "yes, none exist"). For users with many non-matching rows to examine, this additional scanning adds up.

NOT EXISTS — with and without indexes
-- NOT EXISTS without proper indexes:
-- Nested Loop Anti Join  (cost=0.00..158432.00 rows=37500 width=96)
--   (actual time=0.891..2214.772 rows=38153 loops=1)
--   ->  Seq Scan on users  (cost=0.00..1124.00 rows=50000 width=96)
--   ->  Seq Scan on posts  (cost=0.00..6146.00 rows=1 width=4)
--         (actual time=0.043..0.043 rows=1 loops=50000)
--         Filter: ((user_id = users.id) AND (status = 'published'))
-- Execution Time: 2219.441 ms

-- With proper indexes:
-- Hash Anti Join  (cost=5765.00..7401.50 rows=37500 width=96)
--   (actual time=44.102..72.891 rows=38153 loops=1)
-- Execution Time: 76.204 ms

-- 2.2 seconds → 76ms. Same pattern as EXISTS.

The pattern is the same as EXISTS. Without proper indexes, the optimizer falls back to a Nested Loop Anti Join and scans the inner table once per outer row. With indexes and current statistics, it switches to a Hash Anti Join and processes both tables in a single pass. The fix is identical: compound indexes on the correlation and filter columns, and fresh statistics.

One additional note on whereDoesntHave(). Because anti joins return the non-matching rows (typically the majority), the result sets tend to be larger than semi joins. This means more memory for the hash table, more I/O for returning the results, and more time for any subsequent processing. If you are paginating the results — which you should be — this is manageable. If you are calling ->get() on 38,000 rows without pagination, I must respectfully ask why.
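For completeness, the paginated form. paginate() adds LIMIT and OFFSET to the anti-join query and issues a separate count query, so only one page of models is hydrated per request:

```php
// Paginate instead of ->get(): PostgreSQL still runs the
// NOT EXISTS anti join, but only one page of users is fetched
// and hydrated; a separate COUNT(*) query sizes the paginator.
$users = User::whereDoesntHave('posts', function ($query) {
        $query->where('status', 'published');
    })
    ->orderBy('id')   // deterministic ordering keeps pages stable
    ->paginate(50);
```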

Polymorphic whereHas(): the string comparison penalty

Polymorphic relationships add another dimension of cost. Laravel's morphMany and morphTo store the related model's class name as a string in the database. When you call whereHas() on a polymorphic relationship, Eloquent adds a string equality check on the type discriminator column.

Polymorphic whereHas()
// Polymorphic relationships make it worse.
// Laravel's polymorphic whereHas adds a type discriminator.

// Model setup:
class Comment extends Model {
    public function commentable() {
        return $this->morphTo();
    }
}

class Post extends Model {
    public function comments() {
        return $this->morphMany(Comment::class, 'commentable');
    }
}

// The query:
$posts = Post::whereHas('comments', function ($query) {
    $query->where('approved', true);
})->get();
Generated SQL — note the string comparison
-- Eloquent generates:
SELECT * FROM "posts"
WHERE EXISTS (
    SELECT 1 FROM "comments"
    WHERE "comments"."commentable_id" = "posts"."id"
      AND "comments"."commentable_type" = 'App\Models\Post'
      AND "comments"."approved" = true
);

-- The commentable_type column is a string comparison.
-- A single-column index on (commentable_id) can still be used,
-- but the type check becomes a filter applied to every row the
-- index returns. To cover both conditions you need a compound index:
--
--   CREATE INDEX idx_comments_morphable
--     ON comments (commentable_type, commentable_id);
--
-- Without it, every row in posts triggers a sequential scan
-- of comments filtered by two columns. With a polymorphic
-- table containing millions of rows across dozens of types,
-- this is devastating.

Three problems compound here.

First, the string comparison itself. The commentable_type column stores fully qualified class names like 'App\Models\Post'. String comparisons are more expensive per comparison than integer comparisons. They also produce larger index entries, which means fewer entries per index page, which means more I/O for the same number of lookups. The class name strings typically run 15 to 40 bytes, where a bigint foreign key is 8.

Second, the index column order. Most Laravel migrations create an index on (commentable_id, commentable_type), either because the developer specified it in that order or because they relied on Laravel's default morphs migration helper. Column order matters. An index led by commentable_id scatters each type's entries across the whole index, and it cannot serve queries that filter by type without an id; leading with commentable_type keeps each type's entries clustered together.

Third, the selectivity problem. In a polymorphic table with 2 million rows across 8 types, each type accounts for roughly 250,000 rows. The type discriminator is low-selectivity — it divides the table into 8 buckets, not 250,000 buckets. The commentable_id is high-selectivity — it narrows to a handful of rows. The optimal index puts the high-selectivity column second, after the type has narrowed the scan to the relevant subset.
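You can measure these selectivities on your own data before choosing an index. The table and column names below match this article's schema; substitute your own. pg_stats is PostgreSQL's built-in per-column statistics view.

```sql
-- How skewed is the type discriminator? Few groups with many
-- rows each means low selectivity.
SELECT commentable_type, count(*) AS rows_per_type
FROM comments
GROUP BY commentable_type
ORDER BY rows_per_type DESC;

-- The planner's own estimate of distinct values per column:
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'comments'
  AND attname IN ('commentable_type', 'commentable_id');
```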

Index column order for polymorphic queries
-- Index column order matters enormously for polymorphic queries.

-- Wrong order (what Laravel's default migration creates):
CREATE INDEX idx_comments_morph_wrong
  ON comments (commentable_id, commentable_type);
-- PostgreSQL can still use this index for the lookup, but each
-- type's entries are interleaved across the index, and queries
-- that filter by commentable_type without an id cannot use it.
-- If commentable_id = 42 has rows for Post, Video, and Image,
-- those entries sit adjacent regardless of which type you want.

-- Correct order (type first, then ID):
CREATE INDEX idx_comments_morph_correct
  ON comments (commentable_type, commentable_id);
-- PostgreSQL seeks directly to type='App\Models\Post'
-- then narrows by commentable_id within that type.
-- With 8 polymorphic types, this reads ~1/8th the data.

-- Even better: a partial index for your most common type:
CREATE INDEX idx_comments_morph_posts
  ON comments (commentable_id)
  WHERE commentable_type = 'App\Models\Post';
-- Smallest possible index. Only contains rows for Post.
-- The optimizer matches the WHERE clause automatically.

The difference between the wrong index order and the right index order on a polymorphic table with 2 million rows across 8 types is typically 40-80x. The difference between the right compound index and a partial index filtered to your most common type is another 2-4x. These are not subtle improvements. They are the difference between a page load and a timeout.

withWhereHas(): helpful but not a fix

Laravel's withWhereHas(), introduced late in the Laravel 9 series and widely associated with Laravel 10, combines the filtering of whereHas() with the eager loading of with(). It solves a genuine problem: the duplicated closure that developers write when they want to filter by a relationship condition and also eager-load the matching relationship.

withWhereHas() vs the old pattern
// Newer Laravel versions provide withWhereHas():
// It eager-loads AND filters in a single query.

// Old pattern — two separate concerns:
$users = User::whereHas('posts', function ($query) {
    $query->where('status', 'published');
})->with(['posts' => function ($query) {
    $query->where('status', 'published');
}])->get();
// Problem: the closure is duplicated. The EXISTS subquery
// runs once for filtering, then a separate query runs
// for eager loading. Two queries, same filter, twice the work.

// New pattern — withWhereHas:
$users = User::withWhereHas('posts', function ($query) {
    $query->where('status', 'published');
})->get();
// Combines the filter and the eager load.
// But — and this is important — the EXISTS subquery
// is still there for the filtering step.
// withWhereHas does NOT change the SQL for the parent query.
// It eliminates the duplicated closure, not the correlated subquery.

I should be forthcoming about what withWhereHas() does not change. The parent query — the one that filters users based on the relationship condition — still uses an EXISTS correlated subquery. The SQL for the filtering step is identical to plain whereHas(). What withWhereHas() eliminates is the duplicated closure — you write the condition once instead of twice. This is a developer experience improvement, not a query performance improvement.

I mention this because I have seen blog posts and conference talks that present withWhereHas() as a performance optimization. It is not. If your whereHas() is generating a Nested Loop Semi Join, switching to withWhereHas() will generate the same Nested Loop Semi Join. It eliminates one redundant eager-loading query, which may save a few milliseconds. It does not address the correlated subquery, which may be costing you seconds.
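You do not have to take this on faith; the query builder will show you its SQL without executing it. toSql() is a standard Eloquent builder method, and a quick sketch confirms the EXISTS clause is still present:

```php
// Inspect the parent query withWhereHas() actually builds.
// toSql() returns the SQL string (with ? placeholders) unexecuted.
$sql = User::withWhereHas('posts', function ($query) {
    $query->where('status', 'published');
})->toSql();

// Expect a correlated EXISTS subquery in the WHERE clause,
// the same shape that plain whereHas() produces.
dump($sql);
```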

The alternatives: JOIN, whereIn, subquery, raw CTE

Four approaches that avoid correlated subqueries. Each has trade-offs. I will present them in order of my personal preference, which is to say, in order of how well they balance performance, readability, and compatibility with Eloquent's features.

Option 1: whereIn with a subquery (my recommendation)

whereIn() with subquery — single query, no correlation
// You can also do this in a single query using a subquery:
$users = User::whereIn('id', function ($query) {
    $query->select('user_id')
          ->from('posts')
          ->where('status', 'published')
          ->distinct();
})->get();
Generated SQL — non-correlated
-- Eloquent generates:
SELECT * FROM "users"
WHERE "id" IN (
    SELECT DISTINCT "user_id" FROM "posts"
    WHERE "status" = 'published'
);

-- This is NOT a correlated subquery. The inner SELECT
-- does not reference the outer query. PostgreSQL evaluates
-- the inner query once, produces a set of IDs, then
-- uses a Hash Semi Join or Bitmap Index Scan on the outer.
--
-- Hash Semi Join  (cost=5018.00..5267.60 rows=11947 width=4)
--   (actual time=31.204..38.891 rows=11847 loops=1)
-- Execution Time: 42.108 ms
--
-- 42ms. One query. No correlation. No nested loops.

42 milliseconds. One query. No correlation. The inner SELECT is independent of the outer query — PostgreSQL evaluates it once, produces a set of IDs, and uses a Hash Semi Join to match them against the users table. This is the approach I reach for first in most situations.

Why this over a JOIN? Two reasons. First, no duplicates — the DISTINCT is inside the subquery on the foreign key column, not on the entire result set. This means PostgreSQL deduplicates a list of integers rather than a list of full user rows. Second, Eloquent returns properly hydrated models with all relationships intact. No select('users.*') needed, no risk of column name conflicts.

Option 2: JOIN with DISTINCT

JOIN alternative in Eloquent
// Instead of whereHas(), use a JOIN:
$users = User::select('users.*')
    ->join('posts', 'posts.user_id', '=', 'users.id')
    ->where('posts.status', 'published')
    ->distinct()
    ->get();
EXPLAIN ANALYZE — Hash Join
-- The JOIN version:
EXPLAIN ANALYZE
SELECT DISTINCT "users".*
FROM "users"
INNER JOIN "posts" ON "posts"."user_id" = "users"."id"
WHERE "posts"."status" = 'published';

-- HashAggregate  (cost=7148.30..7267.77 rows=11947 width=96)
--   (actual time=68.214..74.529 rows=11847 loops=1)
--   Group Key: users.id
--   ->  Hash Join  (cost=1373.50..7028.62 rows=59840 width=96)
--         (actual time=14.221..52.884 rows=59832 loops=1)
--         Hash Cond: (posts.user_id = users.id)
--         ->  Seq Scan on posts  (cost=0.00..5018.00 rows=59840 width=4)
--               Filter: (status = 'published')
--               Rows Removed by Filter: 140168
--         ->  Hash  (cost=1124.00..1124.00 rows=50000 width=96)
--               (actual time=13.891..13.891 rows=50000 loops=1)
--   Planning Time: 0.312 ms
--   Execution Time: 77.108 ms
--
-- Hash Join. 77ms. No correlated subquery.
-- The optimizer builds hash tables once, probes once.
-- No nested loops. No per-row evaluation.

77 milliseconds. The optimizer uses a Hash Join — one pass over each table, no per-row subquery evaluation. The DISTINCT is necessary because a user with multiple published posts would appear multiple times in the result set. PostgreSQL handles this with a HashAggregate, which adds some overhead but is predictable.

The trade-off: you lose Eloquent's scope isolation. The JOIN exposes the posts table in the query, which means column name conflicts are possible if both tables have columns with the same name — id, created_at, updated_at, status are common offenders. The select('users.*') mitigates this, but it requires awareness and discipline. I have seen production bugs caused by a missing select() on a JOIN-based query where both tables had a name column.

Option 3: Two-query whereIn

whereIn() — two separate queries
// Or split into two queries: pluck the IDs, then whereIn:
$publishedUserIds = Post::where('status', 'published')
    ->distinct()
    ->pluck('user_id');

$users = User::whereIn('id', $publishedUserIds)->get();
Generated SQL — non-correlated
-- Two queries, but neither is correlated:
-- Query 1:
SELECT DISTINCT "user_id" FROM "posts" WHERE "status" = 'published';

-- Query 2:
SELECT * FROM "users" WHERE "id" IN (1, 4, 7, 11, ...);

-- PostgreSQL handles both efficiently.
-- No nested loops. No per-row subquery evaluation.
-- Total: ~45ms for both queries combined.

45 milliseconds total. Two queries, but neither is correlated. The first collects the user IDs, the second fetches the users. PostgreSQL handles both efficiently — no nested loops, no per-row evaluation.

The trade-off: if the ID list is large (10,000+ IDs), the WHERE IN clause becomes unwieldy. PostgreSQL converts large IN lists to a hash, which it handles reasonably well up to about 100,000 entries. Beyond that, the planning overhead of parsing and hashing the literal list can add 50-200ms to the planning time. For most applications, where the filtered result set is in the hundreds or low thousands, this approach is the pragmatic choice.
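One mitigation worth knowing when the list is genuinely large: Laravel's whereIntegerInRaw() inlines the integers into the SQL instead of binding each value as a separate parameter, which removes the per-binding overhead. It performs no escaping, so it is safe only for values you know are integers; Eloquent itself uses it internally for integer key lookups.

```php
// For very large integer ID lists, skip per-value bindings.
// Safe here because pluck('user_id') returns our own integer keys.
$publishedUserIds = Post::where('status', 'published')
    ->distinct()
    ->pluck('user_id');

$users = User::whereIntegerInRaw('id', $publishedUserIds->all())->get();
```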

This approach also requires two round trips to the database, which adds network latency. On a local connection, this is negligible. On a managed database with 2-5ms network latency per round trip, the additional trip adds 2-5ms. Measure whether that matters for your use case.

Option 4: Raw query with CTE

CTE approach
// When all else fails: a raw query with a CTE
$users = DB::select("
    WITH published_authors AS (
        SELECT DISTINCT user_id
        FROM posts
        WHERE status = 'published'
    )
    SELECT u.*
    FROM users u
    INNER JOIN published_authors pa ON pa.user_id = u.id
");

// The CTE materializes the published user IDs once.
// PostgreSQL 12+ may inline this — use MATERIALIZED
// to force materialization if needed:
//
//   WITH published_authors AS MATERIALIZED (...)

The CTE materializes the set of published author IDs once, then joins against it. In PostgreSQL 12+, the optimizer may choose to inline the CTE — which is usually beneficial, as it lets the optimizer see through the CTE boundary and choose the best overall plan. If you want to guarantee materialization (to prevent the optimizer from pulling the subquery into the main query and potentially re-correlating it), use the MATERIALIZED keyword.

This approach gives you full control over the query plan but sacrifices Eloquent's model hydration. You get raw stdClass objects back instead of Eloquent models. No accessors, no mutators, no relationship loading. For API responses that only need a few columns, this is fine. For views that rely on model methods, you will need to hydrate manually or accept the limitation.
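If you need models rather than stdClass objects, Eloquent's hydrate() can wrap the raw rows. A sketch: accessors and casts come back, but relationships still have to be loaded explicitly, and the profile relation below is hypothetical.

```php
// Wrap the raw rows back into Eloquent User models.
$rows = DB::select("
    WITH published_authors AS (
        SELECT DISTINCT user_id
        FROM posts
        WHERE status = 'published'
    )
    SELECT u.* FROM users u
    INNER JOIN published_authors pa ON pa.user_id = u.id
");

$users = User::hydrate($rows);  // Collection of User models
$users->load('profile');        // relationships must be loaded explicitly
```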

What about withCount?

withCount() alternative
// withCount gives you the count without WHERE EXISTS:
$users = User::withCount(['posts' => function ($query) {
    $query->where('status', 'published');
}])->having('posts_count', '>', 0)->get();
Generated SQL
-- Eloquent generates a correlated subquery with HAVING:
SELECT "users".*,
  (SELECT COUNT(*) FROM "posts"
   WHERE "posts"."user_id" = "users"."id"
     AND "posts"."status" = 'published') AS "posts_count"
FROM "users"
HAVING "posts_count" > 0;

-- Note: this SQL is invalid in PostgreSQL. HAVING cannot
-- reference the posts_count output alias, and a HAVING clause
-- turns this into an aggregate query, in which selecting
-- users.* without a GROUP BY is rejected. MySQL permits the
-- alias reference, which is why the builder's output runs
-- there. On PostgreSQL, move the condition into WHERE
-- (repeating the subquery expression) or add GROUP BY "users"."id".

-- Still a correlated subquery — but PostgreSQL sometimes
-- optimizes this differently than EXISTS because the
-- aggregate changes the cost model.
-- Not a reliable fix. The optimizer may still choose
-- nested loops.

This still generates a correlated subquery. The optimizer may handle the aggregate differently than an EXISTS — the cost model changes when a COUNT is involved, because the optimizer knows the entire inner scan must complete rather than stopping at the first match — but it is not a reliable fix. In my benchmarks, withCount with a HAVING clause performed within 15% of whereHas() on the same dataset. The correlated subquery is still there, and the optimizer still faces the same plan choice.
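For reference, here is the count filter written in a form PostgreSQL accepts directly, with the condition in WHERE instead of HAVING. A hand-written sketch; it is still a correlated subquery, so every caveat above applies.

```sql
-- PostgreSQL-valid form of the count filter.
SELECT "users".*,
  (SELECT COUNT(*) FROM "posts"
   WHERE "posts"."user_id" = "users"."id"
     AND "posts"."status" = 'published') AS "posts_count"
FROM "users"
WHERE (SELECT COUNT(*) FROM "posts"
       WHERE "posts"."user_id" = "users"."id"
         AND "posts"."status" = 'published') > 0;
```

If existence is all you need, WHERE EXISTS is cheaper than COUNT(*) > 0, because EXISTS can stop at the first matching row.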

"The ORM is not the enemy. But it is not always the ally it presents itself as, either. Between your application code and your database sits a translation layer, and that layer has opinions about how to access your data."

— from You Don't Need Redis, Chapter 3: The ORM Tax

The comparison table

All benchmarks against PostgreSQL 16, 50,000 users, 200,000 posts. Times are median of 10 runs after cache warmup.

Approach | SQL pattern | Planner choice | Time | Risk
whereHas() (no index) | EXISTS correlated subquery | Nested Loop Semi Join | 1,847 ms | Sequential scan per outer row
whereHas() (with index) | EXISTS correlated subquery | Hash Semi Join | 64 ms | Depends on stats / table size
Nested whereHas() (3 levels) | Triple nested EXISTS | Nested Loop Semi Join (x3) | 14,278 ms | Multiplicative loop counts
JOIN + DISTINCT | INNER JOIN with HashAggregate | Hash Join | 77 ms | Duplicates require DISTINCT
whereIn() (two queries) | Non-correlated subquery | Seq Scan + Index Scan | 45 ms | Large IN lists (10k+ IDs)
whereIn() (single subquery) | Non-correlated IN subquery | Hash Semi Join | 42 ms | Minimal
Nested whereHas → JOINs | Triple INNER JOIN | Hash Join (x3) | 131 ms | Duplicates require DISTINCT

The pattern is clear. Correlated subqueries are safe when the optimizer chooses a hash or merge semi join. They are catastrophic when it falls back to nested loops. The optimizer's choice is not under your control — it depends on statistics, table sizes, and available indexes. The non-correlated approaches (whereIn with subquery, JOINs) avoid the gamble entirely.

The single-query whereIn subquery deserves particular attention. At 42ms, it is the fastest approach, it returns properly hydrated Eloquent models, it requires no DISTINCT on the outer query, and it has the fewest trade-offs. If you remember only one thing from this comparison, make it this: whereIn with a subquery is almost always better than whereHas on PostgreSQL.

A clean migration pattern for large codebases

If your codebase has fifty whereHas() calls scattered across controllers, scopes, global scopes, Nova resources, and Filament panels, you are not going to rewrite them all in an afternoon. Nor should you — not all of them are problems. Some are on small tables where the nested loop completes in under a millisecond. Others run on paginated result sets where the outer table is already filtered to 20 rows. The goal is to identify the expensive ones and fix those.

Encapsulating the fix in a scope
// A clean pattern: encapsulate the fix in a scope.

// In your User model:
class User extends Model
{
    // The slow way (for reference):
    public function scopeHasPublishedPosts($query)
    {
        return $query->whereHas('posts', function ($q) {
            $q->where('status', 'published');
        });
    }

    // The fast way:
    public function scopeHasPublishedPostsFast($query)
    {
        return $query->whereIn('id', function ($q) {
            $q->select('user_id')
              ->from('posts')
              ->where('status', 'published')
              ->distinct();
        });
    }
}

// Usage stays identical:
$users = User::hasPublishedPostsFast()->paginate(20);

The scope pattern has several advantages. The fix is centralized in the model. The calling code does not change. You can A/B test by switching between scopes. And you can migrate incrementally — fix the worst offender this week, the next one next week, and so on.

For a more generalized approach, a trait that wraps the whereIn subquery pattern can replace whereHas() across multiple models:

A reusable trait for fast relationship queries
// Migration strategy for a large codebase:
// Don't rewrite everything at once. Triage by impact.

// Step 1: Identify the worst offenders
// Run the pg_stat_statements query from the
// "Finding expensive EXISTS queries" section below.
// Sort by total_exec_time to find cumulative cost.

// Step 2: Create a scope that wraps the fast version
// app/Models/Concerns/HasFastRelationshipQueries.php
trait HasFastRelationshipQueries
{
    public function scopeWhereRelated($query, $relation, $callback = null)
    {
        // Note: works for hasMany/hasOne relations; other
        // relationship types need bespoke scopes.
        $related = $this->$relation()->getRelated();
        $foreignKey = $this->$relation()->getForeignKeyName();
        $localKey = $this->$relation()->getLocalKeyName();

        return $query->whereIn($localKey, function ($q) use (
            $related, $foreignKey, $callback
        ) {
            $q->select($foreignKey)
              ->from($related->getTable());

            if ($callback) {
                $callback($q);
            }

            $q->distinct();
        });
    }
}

// Step 3: Use it as a drop-in replacement
// Before:
// User::whereHas('posts', fn($q) => $q->where('status', 'published'))->get();
// After:
// User::whereRelated('posts', fn($q) => $q->where('status', 'published'))->get();

I should note a limitation of this approach. The trait above handles simple hasMany and hasOne relationships. It does not handle belongsTo relationships (where the key direction is inverted and the relation object exposes different key accessors), polymorphic relationships (which need the type discriminator), many-to-many relationships (which join through a pivot table), or has-many-through relationships (which join through an intermediate table). For those, you will need relationship-specific scopes. The pattern is the same — replace the correlated EXISTS with a non-correlated subquery — but the details differ for each relationship type.
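For the polymorphic case specifically, the same non-correlated shape works once the type discriminator is pinned. A hypothetical sketch assuming a Post model with a morphMany comments relation stored in commentable_type/commentable_id, and an invented approved status:

```php
// In the Post model. The discriminator must be pinned inside
// the subquery, otherwise IDs from other commentable types leak in.
public function scopeHasApprovedComments($query)
{
    return $query->whereIn('id', function ($q) {
        $q->select('commentable_id')
          ->from('comments')
          ->where('commentable_type', self::class)  // or the morph-map alias
          ->where('status', 'approved')
          ->distinct();
    });
}
```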

The indexes that keep whereHas() from collapsing

If rewriting every whereHas() to a whereIn subquery is impractical — and in a large Laravel codebase with global scopes, third-party packages, and Nova resources that call whereHas() internally, it sometimes is — the next best defense is indexing. The right indexes give the optimizer the information it needs to choose hash or merge semi joins instead of nested loops.

Index recommendations for whereHas()
-- Indexes that make whereHas() survivable:

-- 1. Covering index for the correlated subquery
CREATE INDEX idx_posts_user_status
  ON posts (user_id, status);

-- 2. Polymorphic relationship index (compound, type first)
CREATE INDEX idx_comments_morphable
  ON comments (commentable_type, commentable_id);

-- 3. Partial index — only published posts exist in the index
CREATE INDEX idx_posts_published_user
  ON posts (user_id)
  WHERE status = 'published';

-- 4. For nested whereHas, indexes at every join level
CREATE INDEX idx_comments_post_id ON comments (post_id);
CREATE INDEX idx_likes_comment_created
  ON likes (comment_id, created_at);

Four principles:

1. Index the correlation column first. The user_id in WHERE posts.user_id = users.id is the correlation. Without an index on this column, the inner scan is always sequential. With an index, the optimizer can use an index scan or, better, switch to a hash semi join. This single index is the difference between 1,847ms and 64ms in our benchmark.

2. Add the filter column second. A compound index on (user_id, status) covers both the correlation and the filter in a single index lookup. This is the difference between "scan all posts for this user, then filter by status" and "scan only published posts for this user." For columns with low selectivity (like a status column with 3-4 distinct values), the compound index eliminates a significant amount of unnecessary scanning.

3. Use partial indexes for common filters. If 90% of your whereHas() calls filter for status = 'published', a partial index on (user_id) WHERE status = 'published' is smaller, faster, and more targeted than a full compound index. It contains only the rows that match the filter, which means fewer pages to read, fewer cache entries to maintain, and faster lookups. Partial indexes are one of PostgreSQL's most powerful features, and they are criminally underused in Laravel applications.

4. Index every level of nested whereHas(). A missing index at any level forces nested loops at that level, and the multiplicative effect propagates upward. If you have three levels of nesting, you need indexes at all three levels. Missing one negates the benefit of the other two — the optimizer cannot use a hash semi join at a level where it lacks an index, and the nested loops at that level multiply through all subsequent levels.
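Two quick checks tie principles 1 through 3 back to reality: confirm the planner actually picks the partial index for the common filter, and compare the on-disk sizes. Both pg_relation_size() and pg_size_pretty() are built into PostgreSQL; the plan and sizes depend on your data, so treat the comments below as what to look for, not expected output.

```sql
-- Does the planner use the partial index for the common filter?
EXPLAIN
SELECT user_id FROM posts WHERE status = 'published';
-- Look for: Index Only Scan using idx_posts_published_user

-- How much smaller is the partial index than the compound one?
SELECT pg_size_pretty(pg_relation_size('idx_posts_user_status'))    AS compound_index,
       pg_size_pretty(pg_relation_size('idx_posts_published_user')) AS partial_index;
```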

Finding expensive EXISTS queries in production

You may have dozens of whereHas() calls scattered across controllers, scopes, and query builders. Not all of them are problems — many will be on small tables or well-indexed columns. The question is which ones are costing you time right now.

Hunting EXISTS queries in pg_stat_statements
-- Find expensive EXISTS subqueries in production:
SELECT query,
       calls,
       mean_exec_time::numeric(10,2) AS avg_ms,
       total_exec_time::numeric(10,2) AS total_ms
FROM pg_stat_statements
WHERE query LIKE '%EXISTS%'
  AND mean_exec_time > 10
ORDER BY total_exec_time DESC
LIMIT 20;

-- The "WHERE EXISTS (SELECT 1 FROM" pattern is
-- the unmistakable signature of Eloquent's whereHas().
-- High mean_exec_time means the optimizer chose nested loops.

The SELECT 1 FROM inside an EXISTS is Eloquent's fingerprint. Sort by total_exec_time to find the ones consuming the most cumulative database time — these are the queries that, even if each individual execution is moderate, are called so frequently that they dominate your database workload. Sort by mean_exec_time to find the ones with the worst per-call performance — these are the queries where the optimizer chose nested loops, and each execution costs seconds rather than milliseconds. Both lists are worth reviewing.

A mean_exec_time above 50ms on a whereHas() query almost always indicates nested loops. Run EXPLAIN ANALYZE on the query to confirm. If you see "Nested Loop Semi Join" in the plan, the optimizer could not find a better strategy — either because an index is missing or because the subquery is too deeply correlated to flatten.
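The confirmation step looks like this. A sketch of the EXPLAIN ANALYZE run and the slow-path signature to look for; the exact costs, row counts, and loop counts will be your own.

```sql
-- Confirm a suspect whereHas() query flagged by pg_stat_statements:
EXPLAIN ANALYZE
SELECT * FROM "users"
WHERE EXISTS (
    SELECT 1 FROM "posts"
    WHERE "posts"."user_id" = "users"."id"
      AND "posts"."status" = 'published'
);

-- The slow-path signature:
--   Nested Loop Semi Join
--     ->  Seq Scan on users
--     ->  Seq Scan on posts ... loops=50000   <- per-outer-row evaluation
```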

A mean_exec_time under 5ms does not necessarily mean the query is fine. It may mean the table is small enough that nested loops are tolerable. But tables grow. A query that completes in 3ms on 10,000 rows will not complete in 3ms on 500,000 rows. Track the trend, not just the snapshot.

The N+1 problem gets all the attention, but correlated subqueries from whereHas() can be equally expensive. An N+1 fires N separate queries; a bad whereHas() fires one query that internally loops N times. The database work is similar. The difference is that N+1 shows up as many fast queries in your logs, while a bad whereHas() shows up as one slow query. One is easier to spot. Neither is easier to fix.

The honest counterpoint: when whereHas() is perfectly fine

I have spent considerable time explaining the failure modes of whereHas(). It would be a disservice to leave you with the impression that it should never be used. That would be poor counsel, and a waiter who overstates his case is no waiter at all.

whereHas() is perfectly appropriate in several situations:

Small tables. If your outer table has fewer than 1,000 rows, the nested loop — even in the worst case — completes in single-digit milliseconds. The optimizer's plan choice is irrelevant at this scale. The overhead of rewriting to a whereIn subquery is not worth the cognitive cost.

Well-indexed, stable tables. If the correlation column has a proper index, the table statistics are current, and the data distribution does not change dramatically between autoanalyze runs, the optimizer will consistently choose a hash or merge semi join. The query is fast and stable. If you have monitoring that confirms this — and you should — there is no reason to rewrite it.

Pre-filtered outer queries. If the outer query is already filtered to a small result set — for example, User::where('team_id', 5)->whereHas('posts', ...) where team 5 has 12 members — the nested loop runs 12 times, not 50,000 times. The cost is trivial regardless of the plan choice.

One level of nesting. Single-level whereHas() with proper indexing is a well-understood pattern that PostgreSQL handles efficiently in the vast majority of cases. The danger zone is two or more levels of nesting, where the multiplicative loop count defeats the optimizer. Single-level whereHas() on a well-indexed relationship is fine.

Readability matters. Code that is easy to understand, maintain, and review has real value. whereHas() is genuinely more readable than a raw whereIn subquery. If the query runs ten times a day on a small table, the readability benefit outweighs the performance cost. Do not sacrifice code clarity for a 3ms improvement on a query that runs during a nightly batch job.

The framework for deciding is straightforward. Check pg_stat_statements. If the whereHas() query is not in your top 50 by total execution time, it is not your problem today. Fix the ones that are, and leave the rest alone.

What Gold Lapel does with correlated subqueries

Gold Lapel sits between your Laravel application and PostgreSQL as a transparent proxy. It observes every query — including the EXISTS subqueries that Eloquent generates from your whereHas() calls.

Three things happen automatically.

First, Gold Lapel detects the correlated subquery pattern. When it sees WHERE EXISTS (SELECT 1 FROM posts WHERE posts.user_id = users.id AND ...), it identifies the correlation column (user_id) and the filter columns (status). If those columns lack appropriate indexes, Gold Lapel creates them — compound indexes with the correlation column leading, exactly as the optimizer needs to switch from nested loops to hash semi joins. For polymorphic relationships, it identifies the type discriminator and creates the index with the correct column order — type first, then ID.

Second, it tracks plan regressions. A whereHas() query that ran as a Hash Semi Join last week may flip to Nested Loop Semi Join this week if the data distribution changes, if a large import shifted the statistics, or if autovacuum has not run recently enough to refresh the cost estimates. Gold Lapel monitors execution times and flags regressions before they cascade into user-visible latency. The query did not change. The plan did. That distinction matters, and it is one that application-level monitoring tools routinely miss.

Third, it identifies materialization opportunities. If the same whereHas() pattern executes hundreds of times per minute — same tables, same filter, different parameter values — Gold Lapel can materialize the frequently-joined subset. Instead of evaluating the correlated subquery on every request, the materialized result is refreshed periodically and served directly. This is particularly effective for polymorphic whereHas() queries where the type discriminator narrows the working set to a predictable subset.

Your Eloquent code stays unchanged. Your whereHas() calls keep their expressive syntax. The difference is that the queries they generate receive the indexing, monitoring, and optimization they have always deserved — automatically, without a DBA in the loop, and without rewriting a single scope.

Run composer require goldlapel/goldlapel-laravel and the service provider is auto-discovered. The correlated subqueries will be attended to.


The indexes mentioned above — the ones that keep whereHas() from collapsing — deserve their own proper treatment. I have written a complete Laravel PostgreSQL optimization guide that covers index strategy, connection pooling, and the query patterns where Eloquent and PostgreSQL work in harmony rather than at cross purposes.