Why Django's ORM Duplicates Your Subqueries: Fixing Slow Annotate Chains with CTEs
Your three .annotate() calls generate three identical table scans. PostgreSQL is too polite to complain, but the execution plan is not.
Good evening. I notice you have been chaining .annotate() calls.
Django's ORM is, by most accounts, a well-mannered piece of software. It composes queries from Python expressions. It handles joins, aggregations, and subqueries with a fluent API that reads almost like English. It has earned its reputation as one of the most productive ORMs in any language, and I do not say that lightly — I have attended to applications written in every major framework, and Django's queryset API remains among the most thoughtfully designed.
It also has a nine-year-old open ticket about generating catastrophically redundant SQL when you chain .annotate() calls with Subquery expressions.
The Django project's own ticket tracker documents the problem plainly. Ticket #28919, opened in September 2017, describes a straightforward problem: when you annotate a queryset with multiple subqueries that reference the same table, Django emits each subquery independently. No sharing. No common table expressions. No factoring out repeated work. The same table gets scanned once per annotation, even when the subqueries could trivially share a single pass.
On small datasets, this is imperceptible. Your local development database with 200 tickets will never reveal the problem. On production datasets with millions of rows, it is the difference between a 400ms page load and a 5-second one. I have seen dashboard pages that load in under a second on staging and take eleven seconds in production, and the root cause is always the same: the ORM faithfully translated three Python expressions into three independent subqueries, and PostgreSQL faithfully executed all three.
I would like to show you exactly what happens, why it happens, and four ways to fix it — each appropriate for a different situation. I would also like to be candid about where each approach falls short, because a recommendation without honest boundaries is not a recommendation. It is a sales pitch.
The innocent queryset that starts the trouble
Allow me to begin with the queryset that looks perfectly reasonable — because it is. The problem is not that you wrote bad code. The problem is that the ORM's code generation has a blind spot that no amount of queryset elegance can work around.
from django.db.models import Count, Subquery, OuterRef, Avg
from django.db.models.functions import Coalesce

# A Django view that answers: "For each department, show the number
# of employees, the average salary, and the count of open tickets."
departments = (
    Department.objects
    .annotate(
        employee_count=Count('employees'),
    )
    .annotate(
        avg_salary=Avg('employees__salary'),
    )
    .annotate(
        open_tickets=Coalesce(
            Subquery(
                Ticket.objects.filter(
                    department=OuterRef('pk'),
                    status='open',
                ).values('department')
                .annotate(cnt=Count('pk'))
                .values('cnt')
            ),
            0,
        ),
    )
    .order_by('-employee_count')
)
# Three .annotate() calls. Readable. Composable. Exactly how
# the Django docs suggest you build complex querysets.
# And exactly where the SQL goes sideways.

Three .annotate() calls, each adding a computed field to the queryset. This is exactly how the Django documentation suggests you build complex querysets — composable, readable, each annotation self-contained. The pattern is so idiomatic that it appears in virtually every Django tutorial that covers aggregation.

The SQL Django generates for this particular queryset is actually reasonable. The Count and Avg on the employees relation collapse into a single JOIN with GROUP BY. The Subquery for tickets becomes a correlated scalar subquery in the SELECT list, evaluated once per group. For a single subquery annotation, this is an acceptable plan.

-- What Django's ORM actually generates:
SELECT
    "departments"."id",
    "departments"."name",
    COUNT("employees"."id") AS "employee_count",
    AVG("employees"."salary") AS "avg_salary",
    COALESCE(
        (SELECT COUNT(U0."id") FROM "tickets" U0
         WHERE U0."department_id" = "departments"."id"
         AND U0."status" = 'open'
         GROUP BY U0."department_id"),
        0
    ) AS "open_tickets"
FROM "departments"
LEFT OUTER JOIN "employees"
    ON ("departments"."id" = "employees"."department_id")
GROUP BY "departments"."id", "departments"."name"
ORDER BY "employee_count" DESC;
-- Notice: the correlated subquery runs once per group.
-- That is expected. The real problem appears when you chain
-- multiple Subquery annotations that reference the same table.

The trouble begins when you add a second subquery annotation. And a third. Each one referencing the same base table.
What the ORM generates when you chain multiple Subquery annotations
Consider a more demanding dashboard query. You have departments, employees, and tickets. You want to show each department with its open ticket count, closed ticket count, and average resolution time. Three metrics. Three subqueries. One table.
# The pattern that triggers duplication.
# Three annotations referencing the same base table:
open_tickets_sq = Subquery(
    Ticket.objects.filter(
        department=OuterRef('pk'),
        status='open',
    ).values('department')
    .annotate(cnt=Count('*'))
    .values('cnt')
)
closed_tickets_sq = Subquery(
    Ticket.objects.filter(
        department=OuterRef('pk'),
        status='closed',
    ).values('department')
    .annotate(cnt=Count('*'))
    .values('cnt')
)
avg_resolution_sq = Subquery(
    Ticket.objects.filter(
        department=OuterRef('pk'),
        status='closed',
    ).values('department')
    .annotate(avg_hours=Avg('resolution_hours'))
    .values('avg_hours')
)
departments = (
    Department.objects
    .annotate(open_tickets=open_tickets_sq)
    .annotate(closed_tickets=closed_tickets_sq)
    .annotate(avg_resolution=avg_resolution_sq)
    .order_by('-open_tickets')
)

Each Subquery is a self-contained query that references the outer queryset via OuterRef. Django compiles each one independently into a correlated subquery. The compiler does not look at the other annotations. It does not notice that two of the three subqueries filter on status = 'closed' against the same table. It simply translates each Python expression into SQL and embeds it in the SELECT list.
The resulting SQL:
-- The SQL Django generates. Read it carefully.
SELECT
"departments"."id",
"departments"."name",
(SELECT COUNT(*) FROM "tickets" U0
WHERE U0."department_id" = "departments"."id"
AND U0."status" = 'open'
GROUP BY U0."department_id") AS "open_tickets",
(SELECT COUNT(*) FROM "tickets" U0
WHERE U0."department_id" = "departments"."id"
AND U0."status" = 'closed'
GROUP BY U0."department_id") AS "closed_tickets",
(SELECT AVG(U0."resolution_hours") FROM "tickets" U0
WHERE U0."department_id" = "departments"."id"
AND U0."status" = 'closed'
GROUP BY U0."department_id") AS "avg_resolution"
FROM "departments"
ORDER BY "open_tickets" DESC;
-- Three correlated subqueries. Each one scans the tickets table
-- independently. The closed_tickets and avg_resolution subqueries
-- share the same WHERE clause but are executed separately.
--
-- For 500 departments and 2.1 million tickets, that is three
-- separate scans over each department's tickets, 1,500 subquery
-- executions in total.

Three correlated subqueries. Each one performs its own scan of the tickets table, filtered by department_id. The second and third subqueries share identical WHERE clauses (status = 'closed') but cannot share any work between them. They scan the same rows, apply the same filter, and produce results that could trivially have been computed in a single pass.
This is not a bug in the usual sense. Django's query compiler treats each .annotate() call as an independent expression to be embedded in the SELECT list. It does not analyze whether two annotations reference the same base table with the same filter conditions. It does not attempt to consolidate them. It simply emits what you asked for, literally.
The SQL is correct. It returns the right results. It is also doing three times the work necessary. And "correct but tripled in cost" is, if you will forgive the observation, a rather expensive form of correctness.
Reading the execution plan: where the time actually goes
Numbers are more convincing than adjectives. Here is the EXPLAIN ANALYZE output for the three-subquery version, run against 500 departments and 2.1 million tickets:
-- EXPLAIN ANALYZE on 500 departments, 2.1M tickets
-- PostgreSQL 16, 8GB shared_buffers
Sort (cost=892741.23..892742.48 rows=500 width=52)
(actual time=4871.223..4871.301 rows=500 loops=1)
Sort Key: (SubPlan 1) DESC
Sort Method: quicksort Memory: 65kB
-> Seq Scan on departments
(cost=0.00..892718.72 rows=500 width=52)
(actual time=9.812..4869.445 rows=500 loops=1)
SubPlan 1
-> GroupAggregate (cost=0.43..594.82 rows=1 width=12)
(actual time=3.214..3.215 rows=1 loops=500)
-> Index Scan using idx_tickets_dept_status on tickets
(actual time=0.041..2.887 rows=4200 loops=500)
Index Cond: (department_id = departments.id)
Filter: (status = 'open')
SubPlan 2
-> GroupAggregate (cost=0.43..594.82 rows=1 width=12)
(actual time=2.944..2.945 rows=1 loops=500)
-> Index Scan using idx_tickets_dept_status on tickets
(actual time=0.038..2.612 rows=3800 loops=500)
Index Cond: (department_id = departments.id)
Filter: (status = 'closed')
SubPlan 3
-> GroupAggregate (cost=0.43..612.19 rows=1 width=12)
(actual time=3.102..3.103 rows=1 loops=500)
-> Index Scan using idx_tickets_dept_status on tickets
(actual time=0.039..2.741 rows=3800 loops=500)
Index Cond: (department_id = departments.id)
Filter: (status = 'closed')
Planning Time: 1.842 ms
Execution Time: 4871.892 ms
-- Three SubPlans. Each executes 500 times (once per department).
-- SubPlan 2 and SubPlan 3 scan the same rows (status = 'closed')
-- but cannot share the work. Total: ~4.9 seconds.

Three SubPlans, each executing 500 times. SubPlan 2 and SubPlan 3 scan exactly the same rows — tickets where status = 'closed' for a given department — but PostgreSQL cannot merge them. The planner sees three independent scalar subqueries and executes them independently. It has no mechanism to recognize that two subqueries differ only in their aggregate function and could share a single scan.
I want to draw your attention to the loops=500 annotation on each SubPlan. That number means each subquery executes 500 times — once for each department row. With three subqueries, that is 1,500 individual executions of grouped aggregation against the tickets table. Each execution scans an average of 4,000 rows. The total work: approximately 6 million row reads, of which 4 million are redundant.
Total execution time: 4,872ms.
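The arithmetic above is worth checking explicitly. Using the round figures from the plan (3 SubPlans, 500 loops, roughly 4,000 rows per execution, a 2.1M-row table), a few lines of Python reproduce the estimate:

```python
# Back-of-the-envelope check using the figures from the plan above.
subplans = 3          # SubPlan 1, 2, 3
loops = 500           # one execution per department row
rows_per_exec = 4000  # average rows read per subquery execution

executions = subplans * loops            # 1,500 grouped aggregations
total_reads = executions * rows_per_exec
single_pass = 2_100_000                  # one full scan of tickets

print(executions)                 # 1500
print(total_reads)                # 6000000
print(total_reads - single_pass)  # 3900000, the redundant work
```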
Now the CTE version, which scans the tickets table once and computes all three aggregates in a single pass:
-- The same query, written with a CTE. One scan of tickets.
WITH ticket_stats AS (
SELECT
department_id,
COUNT(*) FILTER (WHERE status = 'open') AS open_tickets,
COUNT(*) FILTER (WHERE status = 'closed') AS closed_tickets,
AVG(resolution_hours) FILTER (WHERE status = 'closed')
AS avg_resolution
FROM tickets
GROUP BY department_id
)
SELECT
d.id,
d.name,
COALESCE(ts.open_tickets, 0) AS open_tickets,
COALESCE(ts.closed_tickets, 0) AS closed_tickets,
ts.avg_resolution
FROM departments d
LEFT JOIN ticket_stats ts ON ts.department_id = d.id
ORDER BY open_tickets DESC;

The FILTER (WHERE ...) clause is standard SQL (SQL:2003), well supported by PostgreSQL; it applies a condition during aggregation without requiring separate subqueries. Combined with a CTE, this produces a plan that scans the tickets table exactly once:
-- EXPLAIN ANALYZE on the CTE version. Same data.
Sort (cost=48923.41..48924.66 rows=500 width=52)
(actual time=412.118..412.194 rows=500 loops=1)
Sort Key: (COALESCE(ts.open_tickets, 0)) DESC
Sort Method: quicksort Memory: 65kB
-> Hash Left Join (cost=48801.23..48900.90 rows=500 width=52)
(actual time=411.204..411.892 rows=500 loops=1)
Hash Cond: (d.id = ts.department_id)
-> Seq Scan on departments d
(cost=0.00..11.00 rows=500 width=20)
(actual time=0.012..0.098 rows=500 loops=1)
-> Hash (cost=48794.98..48794.98 rows=500 width=36)
(actual time=411.142..411.143 rows=498 loops=1)
-> CTE Scan on ticket_stats ts
(cost=48782.48..48792.48 rows=500 width=36)
(actual time=410.892..411.024 rows=498 loops=1)
CTE ticket_stats
-> HashAggregate
(cost=48782.48..48787.48 rows=500 width=36)
(actual time=410.887..410.952 rows=498 loops=1)
-> Seq Scan on tickets
(cost=0.00..38482.00 rows=2100000 width=20)
(actual time=0.018..142.301 rows=2100000 loops=1)
Planning Time: 0.892 ms
Execution Time: 412.441 ms
-- One scan of the tickets table. One aggregation pass.
-- 412ms vs 4,872ms. That is 11.8x faster.

One sequential scan of the tickets table. One HashAggregate. One hash join back to departments. Total execution time: 412ms.
That is 11.8x faster. The ratio gets worse as the data grows. With 5 million tickets, the subquery version exceeds 12 seconds. The CTE version stays under 900ms. The gap is not linear — it is multiplicative, because each additional subquery annotation adds another full pass over the table.
| Approach | Ticket scans | 500 depts / 2.1M tickets | 500 depts / 5M tickets |
|---|---|---|---|
| 3 correlated subqueries | 1,500 | 4,872 ms | 12,340 ms |
| Conditional aggregation | 1 | 487 ms | 1,020 ms |
| CTE + JOIN | 1 | 412 ms | 871 ms |
| LATERAL JOIN | 500 | 1,842 ms | 4,310 ms |
| CTE speedup vs subqueries | 1,500x fewer scans | 11.8x | 14.2x |
I have included the LATERAL JOIN approach in this table for completeness. It deserves its own discussion, which follows later.
Why Django cannot fix this easily
You might wonder why Django has not simply added CTE support in the nine years since ticket #28919 was opened. The answer is architectural, not political. And in fairness to the Django team, it is a genuinely difficult problem.
# Django ticket #28919 — opened September 2017
# Title: "Add support for Common Table Expressions (CTEs)"
# Status: Open (as of 2026)
#
# Key quote from the discussion:
# "The ORM generates repeated correlated subqueries where a
# single CTE would be dramatically more efficient."
#
# Nine years. Still open. The ORM team acknowledges the limitation
# but considers CTE support a significant architectural change.
#
# The ticket has 60+ comments, dozens of related PRs,
# and no merge date in sight.

Django's query compiler builds SQL by walking a tree of expressions. Each .annotate() call adds a node to that tree. The compiler then serializes the tree into a SQL string, emitting each annotation as an inline expression in the SELECT list. There is no intermediate step where the compiler analyzes the full set of annotations for redundancy or factoring opportunities.
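To make that blind spot concrete, here is a deliberately simplified toy model of inline emission. This is nothing like Django's actual compiler internals — compile_select and its inputs are invented for illustration. The point is only that when each annotation is serialized on its own, a subquery shared by two annotations is simply emitted twice:

```python
# Toy model of inline annotation emission. Hypothetical names;
# not Django's real compiler.
def compile_select(table, annotations):
    # Each annotation is serialized independently into the SELECT
    # list. No pass looks across annotations for shared work.
    items = ['"id"'] + [f"({expr}) AS {alias}"
                        for alias, expr in annotations.items()]
    return f"SELECT {', '.join(items)} FROM {table}"

closed = "SELECT COUNT(*) FROM tickets WHERE status = 'closed'"
sql = compile_select("departments", {
    "closed_tickets": closed,
    "avg_resolution": closed.replace("COUNT(*)", "AVG(resolution_hours)"),
})
# The identical WHERE clause appears twice in the output,
# once per annotation:
print(sql.count("status = 'closed'"))  # 2
```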
Adding CTE support would require the compiler to:
- Detect when multiple annotations reference the same base table with compatible filters
- Factor the shared work into a CTE definition
- Rewrite the remaining annotations to reference the CTE instead of the base table
- Emit the CTE in the correct position (before the main SELECT)
- Handle edge cases: different GROUP BY clauses, different filter conditions, recursive CTEs, CTEs that reference other CTEs
This is not a patch. It is a rearchitecture of the query compiler's code generation phase. The Django team is right to be cautious — a wrong CTE rewrite that changes query semantics would be far worse than a slow but correct query. A waiter who serves the wrong dish quickly has not improved upon a waiter who serves the right dish slowly.
There is also the question of database compatibility. Django supports PostgreSQL, MySQL, SQLite, and Oracle. CTEs are supported in MySQL 8.0+, PostgreSQL 8.4+, SQLite 3.8.3+, and Oracle 9i R2+. But the syntax, optimization behavior, and materialization rules differ across engines. PostgreSQL 12 introduced MATERIALIZED and NOT MATERIALIZED hints. MySQL has no equivalent. SQLite gained the same hints only in version 3.35. An ORM that generates CTEs must either generate the lowest-common-denominator syntax or maintain per-backend logic for CTE optimization — precisely the kind of complexity the Django team is reluctant to take on.
In the meantime, the problem persists. Every Django application with a moderately complex dashboard is likely generating redundant subqueries. Most teams never look at the generated SQL, so they never notice. The application works. The page loads. It just loads five seconds slower than it needs to.
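Noticing the problem takes one line of inspection: str(queryset.query) gives you the approximate SQL Django will emit. A crude heuristic on top of that — count_inline_subqueries below is a sketch of mine, not a Django utility — can flag querysets that embed many inline subqueries:

```python
import re

def count_inline_subqueries(sql: str) -> int:
    """Rough heuristic: count '(SELECT' occurrences in generated SQL.

    Run against str(queryset.query); more than one or two inline
    subqueries on the same table is worth a look at EXPLAIN.
    """
    return len(re.findall(r"\(\s*SELECT\b", sql, flags=re.IGNORECASE))

# Sample SQL in the shape Django generates for chained annotations:
sql = """
SELECT d.id,
  (SELECT COUNT(*) FROM tickets U0 WHERE U0.status = 'open'),
  (SELECT COUNT(*) FROM tickets U0 WHERE U0.status = 'closed'),
  (SELECT AVG(U0.resolution_hours) FROM tickets U0)
FROM departments d
"""
print(count_inline_subqueries(sql))  # 3
```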
"The ORM is not the enemy. But it is not always the ally it presents itself as, either. Between your application code and your database sits a translation layer, and that layer has opinions about how to access your data."
— from You Don't Need Redis, Chapter 3: The ORM Tax
Fix 1: Conditional aggregation (no library needed)
Before reaching for CTEs, consider whether your query can be restructured using Django's built-in conditional aggregation. Since Django 2.0, Count and Avg accept a filter argument that maps directly to PostgreSQL's FILTER (WHERE ...) clause:
# Django 2.0+ supports conditional aggregation via filter=
# This can reduce subquery duplication in SOME cases.
from django.db.models import Q, Count, Avg
departments = (
    Department.objects
    .annotate(
        open_tickets=Count(
            'tickets',
            filter=Q(tickets__status='open'),
        ),
        closed_tickets=Count(
            'tickets',
            filter=Q(tickets__status='closed'),
        ),
        avg_resolution=Avg(
            'tickets__resolution_hours',
            filter=Q(tickets__status='closed'),
        ),
    )
    .order_by('-open_tickets')
)
# This generates a single JOIN with FILTER clauses:
#
# SELECT d.*,
#   COUNT("tickets"."id") FILTER (WHERE "tickets"."status" = 'open'),
#   COUNT("tickets"."id") FILTER (WHERE "tickets"."status" = 'closed'),
#   AVG("tickets"."resolution_hours") FILTER (WHERE ...)
# FROM departments d
# LEFT JOIN tickets ON ...
# GROUP BY d.id
#
# Better than three subqueries. But the JOIN can still cause
# row multiplication if you have multiple relationships.

This eliminates the subqueries entirely. Instead of three correlated subqueries, Django generates a single JOIN with FILTER clauses:
-- SQL generated by conditional aggregation. Compare to the three-subquery version.
SELECT
"departments"."id",
"departments"."name",
COUNT("tickets"."id") FILTER (
WHERE "tickets"."status" = 'open'
) AS "open_tickets",
COUNT("tickets"."id") FILTER (
WHERE "tickets"."status" = 'closed'
) AS "closed_tickets",
AVG("tickets"."resolution_hours") FILTER (
WHERE "tickets"."status" = 'closed'
) AS "avg_resolution"
FROM "departments"
LEFT OUTER JOIN "tickets"
ON ("departments"."id" = "tickets"."department_id")
GROUP BY "departments"."id", "departments"."name"
ORDER BY "open_tickets" DESC;
-- One JOIN. One GROUP BY. Three FILTER clauses evaluated in a single
-- aggregation pass. No correlated subqueries. No repeated scans.
--
-- Execution time on 500 depts / 2.1M tickets: 487ms.
-- Not as fast as the CTE (412ms), because the JOIN produces a wider
-- intermediate result set. But 10x faster than three subqueries.

PostgreSQL handles FILTER expressions efficiently — they are evaluated during the aggregation pass, not as separate subqueries. One scan of the tickets table, one GROUP BY, three conditional counts computed simultaneously. On our benchmark dataset, this executes in 487ms — not quite as fast as the CTE approach (412ms) because the JOIN produces a wider intermediate result set, but ten times faster than the three-subquery version.
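The single-pass behavior is easy to picture outside SQL. A plain-Python sketch (toy data; the field names are invented for illustration) computes all three conditional aggregates in one loop over the tickets, which is essentially what one GROUP BY with three FILTER clauses does:

```python
from collections import defaultdict

def aggregate_tickets(tickets):
    # One pass over the rows, like one scan of the table.
    stats = defaultdict(lambda: {"open": 0, "closed": 0,
                                 "hours": 0.0, "n": 0})
    for t in tickets:
        s = stats[t["department_id"]]
        if t["status"] == "open":
            s["open"] += 1
        elif t["status"] == "closed":
            s["closed"] += 1
            s["hours"] += t["resolution_hours"]
            s["n"] += 1
    return {
        dept: {
            "open_tickets": s["open"],
            "closed_tickets": s["closed"],
            "avg_resolution": s["hours"] / s["n"] if s["n"] else None,
        }
        for dept, s in stats.items()
    }

rows = [
    {"department_id": 1, "status": "open", "resolution_hours": None},
    {"department_id": 1, "status": "closed", "resolution_hours": 4.0},
    {"department_id": 1, "status": "closed", "resolution_hours": 6.0},
]
print(aggregate_tickets(rows)[1])
# {'open_tickets': 1, 'closed_tickets': 2, 'avg_resolution': 5.0}
```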
There are limitations, and I want to be forthright about them.
The first limitation is row multiplication. If your annotations involve different base tables (not just different filters on the same table), conditional aggregation breaks down:
# The row multiplication problem with conditional aggregation.
# When you annotate across MULTIPLE related tables:
departments = (
    Department.objects
    .annotate(
        open_tickets=Count(
            'tickets',
            filter=Q(tickets__status='open'),
        ),
        total_invoices=Count(
            'invoices',  # a second related table
        ),
    )
)
# Django generates:
#
# SELECT d.*,
#   COUNT("tickets"."id") FILTER (WHERE ...),
#   COUNT("invoices"."id")
# FROM departments d
# LEFT JOIN tickets ON ...
# LEFT JOIN invoices ON ...
# GROUP BY d.id
#
# If department 1 has 100 tickets and 50 invoices, the JOIN
# produces 100 x 50 = 5,000 intermediate rows. COUNT(tickets)
# returns 5,000, not 100. COUNT(invoices) returns 5,000, not 50.
#
# The fix: use Count('tickets', distinct=True) — but DISTINCT
# adds its own cost, especially on large intermediate result sets.
# Or use Subquery for the second relationship. Or use a CTE.
#
# This is the fundamental limitation of single-query aggregation:
# multiple JOINs create a Cartesian product between the joined tables.

When you JOIN tickets and invoices in the same query, a department with 100 tickets and 50 invoices produces 5,000 intermediate rows. The COUNT for each table is inflated by the size of the other. The fix — Count('tickets', distinct=True) — adds overhead and does not work for Avg or Sum without additional gymnastics.
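The inflation is pure arithmetic, and you can demonstrate it without a database (the toy rows below are invented only for this sketch):

```python
# One department with 100 tickets and 50 invoices.
tickets = [f"ticket-{i}" for i in range(100)]
invoices = [f"invoice-{j}" for j in range(50)]

# What the double LEFT JOIN effectively builds: the cross product
# of the two related tables for that department.
joined = [(t, v) for t in tickets for v in invoices]

print(len(joined))                  # 5000 intermediate rows
print(sum(1 for _ in joined))       # naive COUNT(tickets): 5000
print(len({t for t, _ in joined}))  # COUNT(DISTINCT tickets): 100
print(len({v for _, v in joined}))  # COUNT(DISTINCT invoices): 50
```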
The second limitation is expressiveness. Conditional aggregation can only compute aggregates for which Django ships a function: Count, Sum, Avg, Min, Max, StdDev, plus the django.contrib.postgres aggregates such as ArrayAgg and StringAgg. If you need percentiles or any other PostgreSQL-specific aggregate function, you are back to Subquery, a custom Aggregate subclass, or raw SQL.
For the common case of "multiple counts/averages on the same related table with different filters," conditional aggregation is the simplest fix. No libraries. No raw SQL. Pure ORM. And for that common case, it is what I would recommend first.
Fix 2: django-cte for CTE queries within the ORM
When conditional aggregation is not sufficient — when you need to join CTE results, use window functions over CTE output, or factor out genuinely complex shared subqueries — the Dimagi team has built a library worth your attention. django-cte provides CTE support that integrates with Django's queryset API.
# django-cte: CTE support inside the Django ORM
# pip install django-cte
from django.db.models import Avg, Count, Q
from django_cte import With
# Build the CTE as a queryset
ticket_stats_cte = With(
    Ticket.objects
    .values('department_id')
    .annotate(
        open_tickets=Count('id', filter=Q(status='open')),
        closed_tickets=Count('id', filter=Q(status='closed')),
        avg_resolution=Avg(
            'resolution_hours',
            filter=Q(status='closed'),
        ),
    )
)
# Use the CTE in the main query
departments = (
    ticket_stats_cte
    .join(Department, id=ticket_stats_cte.col.department_id)
    .with_cte(ticket_stats_cte)
    .annotate(
        open_tickets=ticket_stats_cte.col.open_tickets,
        closed_tickets=ticket_stats_cte.col.closed_tickets,
        avg_resolution=ticket_stats_cte.col.avg_resolution,
    )
    .order_by('-open_tickets')
)
# This generates the CTE SQL shown earlier.
# One scan. One aggregation. 12x faster.

The library works by hooking into Django's SQL compiler to inject the WITH clause before the main SELECT. You build the CTE as a regular queryset, then join it to the main query using the library's .join() and .with_cte() methods.
The generated SQL is the CTE version shown earlier — one scan, one aggregation, 12x faster.
A few notes on django-cte:
- It supports recursive CTEs, multiple CTEs in a single query, and CTEs that reference other CTEs
- It works with PostgreSQL, SQLite, and MySQL 8+ (though the performance benefits are most dramatic on PostgreSQL)
- It is maintained by Dimagi, who use it in production on CommCare — a large-scale Django application serving millions of users
- It does require learning a different queryset composition pattern — the .join()/.with_cte() API is not identical to standard Django queryset chaining
The recursive CTE support deserves particular mention, because it opens up an entire category of queries that Django's ORM simply cannot express natively:
# django-cte also supports recursive CTEs.
# Example: organizational hierarchy (who reports to whom)
from django.db.models import Value
from django_cte import With

def get_org_tree(root_dept_id):
    cte = With.recursive(
        lambda cte: (
            Department.objects
            .filter(id=root_dept_id)
            .values('id', 'name', 'parent_id', depth=Value(0))
            .union(
                cte.join(
                    Department,
                    parent_id=cte.col.id,
                ).values(
                    'id', 'name', 'parent_id',
                    depth=cte.col.depth + 1,
                ),
                all=True,
            )
        )
    )
    return (
        cte.join(Department, id=cte.col.id)
        .with_cte(cte)
        .annotate(depth=cte.col.depth)
        .order_by('depth', 'name')
    )
# Recursive CTEs are one of SQL's most powerful features,
# and Django's ORM has no native equivalent. This alone
# justifies adding django-cte to your dependencies.

Recursive CTEs are one of SQL's most powerful features — hierarchical data, graph traversal, bill-of-materials queries — and Django's ORM has no native equivalent. If your application has tree-structured data and you are currently solving it with multiple queries or a third-party tree library like django-mptt or django-treebeard, a recursive CTE may be the more direct solution.
The honest counterpoint on django-cte
I should be candid about the trade-offs. django-cte is a relatively small library with a focused maintainer team. It has roughly 600 stars on GitHub as of this writing — well-regarded in its niche, but not widely adopted. If Dimagi's priorities change, the library could fall behind Django's release cycle. This has happened to other Django libraries, and it is a real risk for any third-party dependency that hooks into ORM internals.
The library also cannot generate MATERIALIZED or NOT MATERIALIZED hints, which means you are relying on PostgreSQL's default materialization behavior. For most use cases this is fine. For performance-critical CTEs where you need explicit control over materialization, you may still need raw SQL.
Finally, the .join() syntax is unfamiliar to most Django developers. Code review will be slower. New team members will need to learn the pattern. These are not reasons to avoid the library — they are reasons to document your usage well and include good inline comments.
Fix 3: Raw SQL when the ORM is the bottleneck
Sometimes the most direct solution is the correct one.
from django.db import connection

def get_department_stats():
    sql = """
        WITH ticket_stats AS (
            SELECT
                department_id,
                COUNT(*) FILTER (WHERE status = 'open') AS open_tickets,
                COUNT(*) FILTER (WHERE status = 'closed') AS closed_tickets,
                AVG(resolution_hours) FILTER (WHERE status = 'closed')
                    AS avg_resolution
            FROM tickets
            GROUP BY department_id
        )
        SELECT
            d.id,
            d.name,
            COALESCE(ts.open_tickets, 0) AS open_tickets,
            COALESCE(ts.closed_tickets, 0) AS closed_tickets,
            ts.avg_resolution
        FROM departments d
        LEFT JOIN ticket_stats ts ON ts.department_id = d.id
        ORDER BY open_tickets DESC
    """
    with connection.cursor() as cursor:
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
# Pros: full SQL control, CTEs, window functions, FILTER clauses.
# Cons: no queryset chaining, no model instances, no lazy evaluation.
# Use when: the performance gap justifies the trade-off.

Raw SQL gives you complete control over query structure. CTEs, window functions, FILTER clauses, LATERAL joins, MATERIALIZED hints — everything PostgreSQL supports, without waiting for the ORM to add support.
The trade-off is real: you lose queryset composability, lazy evaluation, and model instance hydration. You get back dictionaries instead of Django model objects. For read-only dashboard queries — which is precisely where duplicated subqueries tend to appear — this trade-off is usually acceptable.
A pragmatic approach: use the ORM for CRUD operations where its abstractions save time, and drop to raw SQL for complex reporting queries where its abstractions cost performance. There is no rule that says an application must use one approach exclusively. I find the insistence on ORM purity — "we must use the queryset API for everything" — to be a form of misplaced consistency. Consistency serves clarity. When the ORM generates SQL that is 12x slower than necessary, the clearest thing you can do is write the SQL yourself.
For teams that use raw SQL regularly, I recommend a thin wrapper pattern:
from django.db import connection

def raw_query(sql, params=None):
    """Execute raw SQL and return a list of dicts."""
    with connection.cursor() as cursor:
        cursor.execute(sql, params or [])
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]

def department_stats_query(min_tickets=0, status_filter=None):
    """Parameterized CTE query — safe, composable, fast."""
    conditions = ["1=1"]
    params = []
    if status_filter:
        conditions.append("t.status = %s")
        params.append(status_filter)
    where = " AND ".join(conditions)
    sql = f"""
        WITH ticket_stats AS (
            SELECT
                department_id,
                COUNT(*) FILTER (WHERE status = 'open') AS open_tickets,
                COUNT(*) FILTER (WHERE status = 'closed') AS closed_tickets,
                AVG(resolution_hours) FILTER (WHERE status = 'closed')
                    AS avg_resolution
            FROM tickets t
            WHERE {where}
            GROUP BY department_id
        )
        SELECT
            d.id, d.name,
            COALESCE(ts.open_tickets, 0) AS open_tickets,
            COALESCE(ts.closed_tickets, 0) AS closed_tickets,
            ts.avg_resolution
        FROM departments d
        LEFT JOIN ticket_stats ts ON ts.department_id = d.id
        WHERE COALESCE(ts.open_tickets, 0) + COALESCE(ts.closed_tickets, 0) >= %s
        ORDER BY open_tickets DESC
    """
    params.append(min_tickets)
    return raw_query(sql, params)
# Still parameterized. Still injection-safe. Still composable
# within Python — just not composable as a Django queryset.
# For dashboard views, this is often the pragmatic choice.

This preserves parameterized queries (safe from SQL injection), provides composability through Python function arguments, and keeps the CTE structure that makes the query fast. You lose queryset chaining, but for dashboard and reporting views, you rarely need it.
When raw SQL is the wrong choice
I would be remiss not to mention the cases where raw SQL creates more problems than it solves. If you need to paginate the results using Django's built-in pagination, raw SQL requires manual LIMIT/OFFSET handling. If you need to compose the query with Django's permission system or filter backends, raw SQL bypasses those entirely. If you use Django REST Framework and need to return model instances for serialization, raw SQL returns dictionaries that your serializer cannot process without a custom adapter.
The general principle: use raw SQL for queries that produce final output (dashboard numbers, CSV exports, report tables). Use the ORM for queries that feed into Django's middleware stack (pagination, permissions, serialization, admin integration).
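For the pagination case specifically, the manual work is small. Here is one sketch; paginate_sql is a hypothetical helper of mine, not a Django API, and it assumes psycopg-style %s placeholders:

```python
def paginate_sql(sql, params, page, page_size):
    """Append LIMIT/OFFSET as bound parameters. Pages are 1-indexed."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (
        f"{sql} LIMIT %s OFFSET %s",
        list(params) + [page_size, (page - 1) * page_size],
    )

sql, params = paginate_sql("SELECT id FROM departments", [], 3, 50)
print(params)  # [50, 100]
```

Pass the result straight to raw_query (or any cursor.execute) and the database does the paging, just without Django's Paginator object around it.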
Fix 4: LATERAL JOIN — the middle ground
There is a fourth approach that merits discussion, particularly because it solves a specific problem that neither conditional aggregation nor CTEs handle well: per-row computation that needs access to multiple columns.
-- LATERAL JOIN: an alternative to CTEs for row-by-row computation.
SELECT
d.id,
d.name,
stats.open_tickets,
stats.closed_tickets,
stats.avg_resolution
FROM departments d
LEFT JOIN LATERAL (
SELECT
COUNT(*) FILTER (WHERE status = 'open') AS open_tickets,
COUNT(*) FILTER (WHERE status = 'closed') AS closed_tickets,
AVG(resolution_hours) FILTER (WHERE status = 'closed')
AS avg_resolution
FROM tickets t
WHERE t.department_id = d.id
) stats ON true
ORDER BY stats.open_tickets DESC;
-- LATERAL runs the subquery once per department row, but unlike
-- correlated subqueries in the SELECT list, it computes ALL
-- aggregations in a single pass per department.
--
-- Performance on 500 depts / 2.1M tickets: 1,842ms.
-- Faster than three correlated subqueries (4,872ms).
-- Slower than the CTE (412ms), because it still iterates per row.
--
-- LATERAL shines when you need per-row computation with access
-- to multiple columns from the lateral subquery. For pure
-- aggregation, the CTE is superior.

A LATERAL join runs the subquery once per row in the outer query, but unlike correlated subqueries in the SELECT list, it computes all aggregations in a single pass per row. With three correlated subqueries, each department triggers three separate scans of the tickets table. With a LATERAL join, each department triggers one scan.
On our benchmark: 1,842ms. Faster than three correlated subqueries (4,872ms), slower than the CTE (412ms). The LATERAL approach still iterates per department, while the CTE aggregates the entire tickets table in one pass regardless of the number of departments.
Where LATERAL shines is when the subquery needs to reference columns from the outer query beyond a simple equality filter. If your per-department computation needs to access the department's creation date, region, or other attributes to customize the aggregation, LATERAL provides that access naturally. A CTE, being defined before the main query, cannot reference the outer query's columns.
Django's ORM does not support LATERAL joins. This is raw SQL territory. But if your use case demands it, it is a powerful tool to have in your vocabulary.
CTE materialization: a detail that matters more than it should
If you adopt the CTE approach — whether via django-cte or raw SQL — there is a PostgreSQL planning detail that deserves your attention.
-- PostgreSQL 12+ gives you control over CTE materialization.
-- By default, non-recursive CTEs that are referenced once
-- may be inlined (un-materialized) by the planner.
-- Force materialization (pre-compute and store):
WITH ticket_stats AS MATERIALIZED (
SELECT department_id,
COUNT(*) FILTER (WHERE status = 'open') AS open_tickets
FROM tickets
GROUP BY department_id
)
SELECT d.*, ts.open_tickets
FROM departments d
LEFT JOIN ticket_stats ts ON ts.department_id = d.id;
-- Force inlining (no materialization):
WITH ticket_stats AS NOT MATERIALIZED (
SELECT department_id,
COUNT(*) FILTER (WHERE status = 'open') AS open_tickets
FROM tickets
GROUP BY department_id
)
SELECT d.*, ts.open_tickets
FROM departments d
LEFT JOIN ticket_stats ts ON ts.department_id = d.id;
-- For our use case — multiple aggregations over the same table —
-- MATERIALIZED is almost always correct. The CTE result is small
-- (one row per department) and referenced once in the main query.
-- Materialization prevents the planner from inlining the CTE
-- back into a correlated subquery, which would undo the optimization.
--
-- PostgreSQL's default behavior (since 12) is usually correct here,
-- but explicit MATERIALIZED removes any ambiguity.

Prior to PostgreSQL 12, all CTEs were materialized — the CTE result was computed once, stored in a temporary worktable, and referenced from there. This was predictable and generally desirable for our use case. Starting with PostgreSQL 12, the planner may choose to inline a non-recursive CTE that is referenced only once, effectively converting it back into a subquery.
For our aggregation CTEs, inlining is almost never what we want. The whole point of the CTE is to compute the aggregation once and join the result. If the planner inlines the CTE, we are back to a correlated subquery — exactly the problem we were solving.
In practice, PostgreSQL's heuristics are usually correct for this pattern. The planner recognizes that a CTE with a GROUP BY and aggregation functions is expensive to recompute and keeps it materialized. But "usually correct" is not "always correct," and I have observed cases where a statistics update or an ANALYZE run changed the planner's cost estimates enough to trigger inlining.
If you want certainty, use AS MATERIALIZED. If you are using django-cte and cannot add the hint, monitor your query plans after major data changes. A plan regression on a dashboard query is particularly insidious because it manifests as gradual slowdown rather than sudden failure.
Why Django cannot fix this easily (the deeper reason)
I touched on the architectural constraints earlier, but there is a subtlety worth exploring — because it explains not just why this specific fix is difficult, but why ORM-generated SQL consistently lags behind hand-written SQL for analytical queries.
Django's query compiler operates on a per-expression basis. When you write .annotate(x=SomeExpression()), the compiler resolves SomeExpression into a SQL fragment and embeds it at the correct position in the SELECT list. It does not maintain a global view of all expressions in the query. It cannot, because queryset evaluation is lazy — the full set of annotations is not known until the queryset is actually evaluated.
Consider what would be needed for automatic CTE extraction:
- The compiler would need to defer SQL generation until all annotations are known
- It would need to analyze the full set of annotations for common subexpressions — the problem compiler theory calls common subexpression elimination, which is cheap for syntactically identical fragments but intractable in general once semantic equivalence of queries must be decided
- It would need to determine whether factoring a subexpression into a CTE actually improves performance — which depends on the data distribution, index availability, and PostgreSQL version
- It would need to generate the CTE syntax correctly across all supported database backends
This is, in essence, asking the ORM to include a query optimizer. PostgreSQL already has one. The mismatch is that Django generates SQL before PostgreSQL's optimizer sees it, and Django's code generator does not have access to the statistics, indexes, or cost models that PostgreSQL's optimizer uses to make planning decisions.
This is not a criticism of Django. It is a structural observation about the limits of ORMs in general. An ORM that generates perfect SQL for every query would need to be as complex as the database engine itself. No ORM has achieved this. None is likely to. The practical answer is the one this article explores: know where the ORM's blind spots are, and apply the appropriate fix at the appropriate layer.
Finding duplicated subqueries in your existing queries
Before you can fix the problem, you need to find it. Most Django applications have dozens of views with chained .annotate() calls, and not all of them generate redundant subqueries. The question is which ones are costing you time and which ones are harmless.
In development: Django's query logging
# settings.py — log all SQL queries in development
LOGGING = {
'version': 1,
'handlers': {
'console': {
'class': 'logging.StreamHandler',
},
},
'loggers': {
'django.db.backends': {
'level': 'DEBUG',
'handlers': ['console'],
},
},
}
# Every SQL query Django executes will now appear in the console.
# In development, this is indispensable. In production, it would
# generate enough log volume to constitute a performance problem
# of its own. Use pg_stat_statements in production instead.

With SQL logging enabled, every query Django executes appears in your console. Look for queries where the same table name appears in multiple nested SELECT statements within a single outer query. If you see FROM "tickets" appearing three times in one query, you have found a candidate.
The Django Debug Toolbar provides a more visual approach — it shows the query count, individual execution times, and the EXPLAIN output for each query. For development-time diagnosis, it is invaluable.
In production: pg_stat_statements
-- Find queries with repeated correlated subqueries in your workload.
-- Run this against pg_stat_statements. Two notes: "SubPlan" appears
-- only in EXPLAIN output, never in the normalized query text that
-- pg_stat_statements stores, so we count nested SELECT keywords
-- instead; and mean_exec_time is the PostgreSQL 13+ column name
-- (use mean_time on 12 and earlier).
SELECT
    queryid,
    query,
    calls,
    mean_exec_time,
    (length(query) - length(replace(query, 'SELECT', '')))
        / length('SELECT') AS select_count
FROM pg_stat_statements
WHERE query ILIKE '%SELECT%SELECT%SELECT%'
  AND mean_exec_time > 100
ORDER BY mean_exec_time * calls DESC
LIMIT 20;
-- Queries with 3+ nested SELECTs and high mean_exec_time
-- are prime CTE rewrite candidates.

The pg_stat_statements extension aggregates query statistics across all calls. The query above identifies queries with multiple nested SELECTs and high mean execution time — exactly the pattern that duplicated subqueries produce. Sort by mean_exec_time * calls to find the queries with the highest total impact on your database.
The pattern to watch for: any queryset with two or more .annotate(Subquery(...)) calls where the subqueries reference the same model. That is almost certainly generating redundant table scans. A single Subquery annotation is usually fine — the redundancy only appears when multiple subqueries target the same table and could share a scan.
Testing for subquery regression
Finding the problem once is good. Ensuring it never returns is better.
from django.test.utils import CaptureQueriesContext
from django.db import connection
class TestDepartmentDashboard:
def test_annotate_query_count(self):
"""Verify the dashboard query does not generate redundant subqueries."""
# Create test data: 20 departments, ~100 tickets each
create_departments(count=20)
create_tickets(per_department=100)
with CaptureQueriesContext(connection) as ctx:
list(get_department_stats()) # Force evaluation
# With CTE or conditional aggregation: 1 query.
# With three subqueries: still 1 query (but much slower).
# The assertion catches regressions where someone
# adds a .annotate() that triggers a separate query.
assert len(ctx.captured_queries) == 1, (
f"Expected 1 query, got {len(ctx.captured_queries)}. "
f"Queries: {[q['sql'][:100] for q in ctx.captured_queries]}"
)
def test_annotate_no_repeated_table_scans(self):
"""Verify the generated SQL does not contain repeated subqueries."""
qs = get_department_queryset()
sql = str(qs.query)
# Count how many times the tickets table appears in subqueries
subquery_count = sql.count('FROM "tickets"')
assert subquery_count <= 1, (
f"Query contains {subquery_count} references to tickets table. "
f"Expected 1 (CTE or JOIN). Got redundant subqueries."
        )

The first test asserts the query count. The second test goes further — it inspects the generated SQL string for repeated table references. If someone adds a new .annotate(Subquery(...)) that references the tickets table, the test catches it before the code reaches production.
A note on testing philosophy: I prefer to test the generated SQL rather than just the query count. A query count of 1 tells you there is one query. It does not tell you whether that one query contains three redundant subqueries. The SQL inspection test catches the specific failure mode we are discussing here — redundant table scans within a single query.
An honest assessment: when this problem does not matter
I have spent considerable time explaining a performance problem and its fixes. I should be equally clear about when the problem does not justify the fix.
If your tickets table has 10,000 rows and your departments table has 50 rows, the three-subquery version runs in 15ms. The CTE version runs in 8ms. The conditional aggregation version runs in 10ms. You have saved 7ms. That saving will never be noticed by any user under any circumstances.
The crossover point — where the redundancy starts to cause perceptible delay — depends on the size of the scanned table, the number of outer rows, and the complexity of the filter conditions. As a rough guide: if the scanned table has fewer than 100,000 rows and the outer query returns fewer than 200 rows, the subquery approach is unlikely to cause problems. The overhead is there, but it is measured in single-digit milliseconds.
Where it matters is exactly where it sounds like it would matter: large tables, many outer rows, or both. A tickets table with 2 million rows and 500 departments. An orders table with 10 million rows and 5,000 customers. A logs table with 50 million rows and any number of aggregation groups. These are the datasets where 12x matters. These are also, not coincidentally, the datasets that exist in production but not in development.
The insidious nature of this problem is that it is invisible on development data. Every developer who has ever said "it's fast on my machine" and been wrong about production was bitten by exactly this class of issue: a query pattern whose cost scales with data volume in a way that small datasets cannot reveal.
A note on PostgreSQL's planner: why it cannot save you here
A reasonable question: if PostgreSQL's query optimizer is so sophisticated, why can it not detect that two correlated subqueries scan the same table with the same filter and merge them?
The answer is that PostgreSQL's planner optimizes within each subquery independently. A correlated scalar subquery in the SELECT list is treated as a black box by the outer query's planner. The planner does not compare subqueries against each other for common subexpressions. It does not attempt to factor shared work into a common table expression. Each subquery is planned and executed as a self-contained unit.
This is a deliberate design choice. Cross-subquery optimization would require the planner to reason about the interactions between multiple subqueries — a combinatorial problem that could make planning time itself a bottleneck. The PostgreSQL team has chosen plan stability and predictable planning time over speculative cross-subquery optimization.
There have been proposals in the PostgreSQL community to add subquery consolidation to the planner. None have been accepted. The consensus appears to be that this optimization is better performed at the SQL generation layer — by the application, the ORM, or a proxy — rather than by the database engine.
This is, I should note, a reasonable position. The application knows the intent behind the query. The database only sees the SQL. An application that generates three redundant subqueries could have intended all three to be independent (perhaps for isolation semantics). The database should not second-guess the application's SQL without explicit instruction.
Choosing the right fix for your situation
Four fixes, each with trade-offs. Allow me to suggest a decision framework.
Use conditional aggregation when all your annotations reference the same related table and you are computing standard aggregates (Count, Sum, Avg, Min, Max). This is the simplest fix, requires no libraries, and produces clean SQL. It is the right choice for probably 70% of the cases where this problem appears.
Use django-cte when you need CTE syntax that stays within the ORM's queryset API — for recursive CTEs, for CTEs that are referenced multiple times in the same query, or when your team prefers the composability of queryset chaining over raw SQL strings. Accept the dependency trade-off consciously.
Use raw SQL when the query is complex enough that expressing it through any ORM layer adds confusion rather than clarity. Dashboard queries with multiple CTEs, window functions, FILTER clauses, and LATERAL joins are often clearer as SQL than as Python. The key discipline is parameterization — never interpolate user input into raw SQL strings.
Use a LATERAL JOIN when you need per-row computation that references columns from the outer query and cannot be expressed as a CTE. This is the least common case but the one where LATERAL is genuinely the best tool.
And if you are unsure: start with conditional aggregation. If it produces correct results and acceptable performance, stop there. The simplest solution that works is the correct solution. Additional complexity earns its place only when simplicity falls short.
Where a query-aware proxy handles this automatically
The four fixes above require you to find the problem, diagnose it, choose the right approach, and rewrite the query. For a handful of views, that is manageable — even satisfying. For a large Django application with hundreds of querysets composed across views, serializers, managers, and admin classes — many of them generated dynamically by Django's admin, Django REST Framework, or third-party libraries you do not control — manual rewriting does not scale.
Gold Lapel operates at the PostgreSQL wire protocol level, between Django and the database. It sees the SQL that Django generates — after all the ORM compilation, after all the annotation chaining, after all the middleware processing — and analyzes it for structural inefficiencies before it reaches PostgreSQL.
When Gold Lapel detects a query with multiple correlated subqueries that reference the same table with overlapping filter conditions, it can rewrite the query to use a CTE. The rewritten query is semantically identical — same results, same column names, same row order. The execution time drops by the same 10-12x factor shown in the benchmarks above.
This happens transparently. No django-cte installation. No raw SQL. No changes to your querysets. The ORM continues generating the SQL it always has. The proxy fixes it on the way to PostgreSQL.
What a proxy cannot do
I would be a poor waiter indeed if I suggested the proxy approach without acknowledging its limits. A proxy rewrites SQL at the wire level. It cannot change the number of queries your application sends — if you have an N+1 problem, the proxy sees N+1 individual queries, not one query that should be rewritten. CTE consolidation addresses a different failure mode: one query that does too much redundant work internally.
The proxy also adds a small amount of latency — typically 1-3ms per query — for the analysis and potential rewriting. For a query that drops from 4,872ms to 412ms, that overhead is negligible. For a query that already runs in 5ms, the proxy is not the right tool. Not every query needs optimization. Not every optimization needs automation.
For the ticket that has been open since 2017, there is now a fix that does not require waiting for the ticket to close. For the queries where manual rewriting is practical, the manual approaches are preferable — simpler, no proxy dependency, full developer control. For the queries you do not know about yet, in the Django admin views and third-party serializers you did not write, the proxy catches what you cannot.
The view from the manor: what this teaches us about ORMs
This article is, on the surface, about a specific performance issue in Django's ORM. But the underlying lesson is broader, and I think worth stating plainly.
An ORM is a translation layer. It translates Python (or Ruby, or Java) into SQL. Like all translations, it preserves meaning while losing nuance. The meaning of your queryset — "give me departments with ticket counts" — is preserved perfectly. The nuance — "and please compute all the counts in a single pass" — is lost, because that nuance exists at the SQL level, not the Python level.
This is not a failure of Django specifically. Every ORM faces the same fundamental tension: the application language's abstractions do not map cleanly to the database language's optimization opportunities. A queryset chains annotations sequentially. SQL can compute them in parallel. The queryset has no way to express "these three annotations should share a scan" because the concept of a shared scan does not exist in the queryset API. It exists only in SQL.
The practical consequence: treat the ORM as a tool with known blind spots, not as a complete abstraction over SQL. There is a thorough guide to reading EXPLAIN ANALYZE output on the site. It may prove useful. Learn to identify the SQL patterns that the ORM generates. Learn when to let the ORM handle query generation and when to write the SQL yourself.
The database was not slow. It was being asked poorly. And the entity doing the asking, in this case, was not you — it was an ORM that did its best with the instructions it was given.
The broader question of ORM-generated SQL versus hand-written queries — and where exactly the abstraction costs more than it saves — is one I have taken up at some length. The full account, I suspect, would interest you, particularly the benchmarks showing where Django's ORM produces identical plans and where it does not.