
Django Custom Managers for Optimal PostgreSQL: Encapsulating the Queries Your Database Deserves

Your QuerySets are generating SQL. The question is whether that SQL would survive a performance review.

The Waiter of Gold Lapel · Updated Mar 20, 2026 · Published Mar 5, 2026 · 32 min read
The illustration was annotated with correlated subqueries and has not returned. Our apologies.

Good evening. I see you have written a Django view.

A fine view, I am sure. Filters the right records, renders the correct template, passes context with appropriate variable names. Clean. Idiomatic. The sort of code that passes review without comment.

And behind it, Django's ORM is sending 1,501 queries to PostgreSQL for a single page load.

I do not say this to alarm you. I say it because the view looks correct, and that is precisely what makes it dangerous. A view that crashes is fixed within the hour. A view that quietly generates fifteen hundred queries on every request survives for months, scaling linearly with data volume, until someone runs django-debug-toolbar during a load test and discovers what I am about to show you.

This is not Django's fault. The ORM does exactly what it is told. The problem is that most Django code tells it nothing about how to fetch related data, so it falls back to lazy loading — one query per relationship access, per row, per request. The classic N+1 pattern, reproduced faithfully across thousands of Django applications.

The fix is not to scatter select_related() and prefetch_related() calls across every view. That works, but it does not scale — not because the database cannot handle it, but because human beings cannot remember to include the same four method calls in every view, serializer, and management command that touches the same model. One forgotten call, one new developer, one rushed deadline, and you are back to 1,501 queries.

The fix is to encapsulate your data access patterns inside custom managers and QuerySet methods — reusable, composable, and impossible to forget.

Allow me to show you what I mean.

# A view that looks reasonable.
# A view that will ruin your afternoon.

from datetime import timedelta

from django.db.models import Sum
from django.shortcuts import render
from django.utils import timezone

def order_dashboard(request):
    orders = Order.objects.filter(
        status='active',
        created_at__gte=timezone.now() - timedelta(days=30)
    )

    rows = []
    for order in orders:
        # N+1: one query per order to fetch the customer
        customer_name = order.customer.name

        # N+1 again: one query per order to count items
        item_count = order.items.count()

        # And again: one aggregation query per order
        total = order.items.aggregate(
            total=Sum('price')
        )['total']

        rows.append((customer_name, item_count, total))

    return render(request, 'orders/dashboard.html', {'rows': rows})

That view generates this traffic:

-- What PostgreSQL receives from the view above,
-- assuming 500 active orders:

SELECT * FROM orders WHERE status = 'active'
  AND created_at >= '2026-02-03';
-- 1 query. Then, for EACH of the 500 rows:

SELECT * FROM customers WHERE id = 42;
SELECT * FROM customers WHERE id = 17;
-- ... 498 more customer lookups

SELECT COUNT(*) FROM order_items WHERE order_id = 1;
SELECT COUNT(*) FROM order_items WHERE order_id = 2;
-- ... 498 more counts

SELECT SUM(price) FROM order_items WHERE order_id = 1;
SELECT SUM(price) FROM order_items WHERE order_id = 2;
-- ... 498 more aggregations

-- Total: 1 + 500 + 500 + 500 = 1,501 queries
-- For one page load.

1,501 queries. Three round trips per order. And this is a modest example — 500 orders with two relationships. A real dashboard with five or six foreign keys and a couple of aggregations can easily produce 3,000+ queries per page load. I have seen 11,000. The developer responsible was a capable engineer. The code passed review. Nobody noticed until response times crossed two seconds and a monitoring alert finally fired.

Each individual query was fast — a fraction of a millisecond. The catastrophe was not in any single query but in the sheer volume. One thousand five hundred round trips between Django and PostgreSQL, each carrying the overhead of network latency, connection protocol, query parsing, and result serialization. Death by a thousand paper cuts, except there were fifteen hundred of them.
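The arithmetic of death by round trips can be sketched in a few lines. The per-query timings below are illustrative assumptions for this sketch, not measurements from the view above:

```python
# Back-of-the-envelope model of per-query overhead. Assumed figures:
# ~0.3 ms of round-trip cost (network, protocol, parse, serialize)
# and ~0.05 ms of actual execution per query.
QUERIES = 1 + 500 + 500 + 500       # orders, then customers, counts, sums
OVERHEAD_MS = 0.3
EXECUTION_MS = 0.05

total_ms = QUERIES * (OVERHEAD_MS + EXECUTION_MS)
overhead_share = QUERIES * OVERHEAD_MS / total_ms

# prints: 1501 queries, ~525 ms total, 86% of it pure overhead
print(QUERIES, round(total_ms), round(overhead_share, 2))
```

Even with these generous assumptions, the overhead of the round trips dwarfs the execution time. Halving the execution speed of every query would barely move the total; halving the query count would.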

Custom managers: encoding your access patterns once

A custom manager is a class that defines how your model's QuerySets behave. Instead of remembering to call select_related('customer') in every view that touches orders, you define it once on the model and chain it like any other QuerySet method.

from datetime import timedelta

from django.db import models
from django.utils import timezone

class OrderQuerySet(models.QuerySet):
    def active(self):
        return self.filter(status='active')

    def recent(self, days=30):
        cutoff = timezone.now() - timedelta(days=days)
        return self.filter(created_at__gte=cutoff)

    def with_customer(self):
        """JOIN customers in a single query."""
        return self.select_related('customer')

    def with_items(self):
        """Prefetch order items in one additional query."""
        return self.prefetch_related('items')


class OrderManager(models.Manager):
    def get_queryset(self):
        return OrderQuerySet(self.model, using=self._db)

    def active(self):
        return self.get_queryset().active()

    def recent(self, days=30):
        return self.get_queryset().recent(days=days)


class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True)

    objects = OrderManager()

The critical insight: custom QuerySet methods return QuerySets. They are lazy. They compose. You chain them together, and Django builds a single SQL statement from the entire chain. Nothing executes until you iterate, slice, or explicitly evaluate.

# Before: 1,501 queries
orders = Order.objects.filter(
    status='active',
    created_at__gte=cutoff
)

# After: 2 queries
orders = (
    Order.objects
    .active()
    .recent(days=30)
    .with_customer()
    .with_items()
)

# Reads like a sentence. Executes like a plan.
# Query 1: SELECT orders.*, customers.* FROM orders
#          JOIN customers ON customers.id = orders.customer_id
#          WHERE orders.status = 'active' AND ...
# Query 2: SELECT * FROM order_items WHERE order_id IN (1, 2, 3, ...)

Two queries instead of 1,501. The select_related('customer') call folds the customer into a JOIN on the first query. The prefetch_related('items') produces a second query with an IN clause. The filters merge into the WHERE clause of the first query. Django assembles all of this from the chain — you never write SQL.

More importantly, every developer on your team now calls Order.objects.active().with_customer() instead of reinventing the eager loading in each view. The pattern is encoded. The knowledge is shared. The 1,501-query disaster becomes structurally impossible.

I should note a subtlety that trips up even experienced Django developers: the relationship between the Manager class and the QuerySet class. The Manager is the entry point — it lives on the model as objects. The QuerySet is the chain — it accumulates filters, annotations, and orderings. You need both because the Manager controls what get_queryset() returns (the base QuerySet for every chain), while the QuerySet methods define the vocabulary of your chains.

If you find the two-class pattern verbose, Django offers a shortcut that I rather approve of:

from datetime import timedelta

from django.db import models
from django.db.models import Count, Sum
from django.utils import timezone

class OrderQuerySet(models.QuerySet):
    def active(self):
        return self.filter(status='active')

    def recent(self, days=30):
        cutoff = timezone.now() - timedelta(days=days)
        return self.filter(created_at__gte=cutoff)

    def with_customer(self):
        return self.select_related('customer')

    def with_items(self):
        return self.prefetch_related('items')

    def with_stats(self):
        return self.annotate(
            item_count=Count('items'),
            total_value=Sum('items__price'),
        )

# One line. Manager and QuerySet in one declaration.

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True)

    objects = OrderQuerySet.as_manager()

# Order.objects.active().with_customer().with_stats()
# All QuerySet methods available directly on the manager.
# No boilerplate Manager class. No method forwarding.

One line. The as_manager() class method builds a Manager subclass that forwards every QuerySet method automatically, and returns an instance of it. No boilerplate. No method forwarding. The QuerySet is the API, and the Manager is just the door through which you access it.

I prefer this pattern for most models. The explicit two-class approach earns its keep only when the Manager's get_queryset() needs custom behavior — applying default filters, restricting visibility, or adding auto-eager loading. For everything else, as_manager() is the right amount of ceremony.

Lazy annotation chains: .with_stats() that compute in PostgreSQL

Eager loading handles relationships. But what about computed values — counts, sums, averages? The naive approach computes these in Python, per row, with separate queries. The manager approach pushes them into PostgreSQL as annotations.

This distinction matters more than most developers realize. When you compute a sum in Python, you are fetching every child row across the network, deserializing it into a Python object, and adding it to an accumulator in a language that processes numbers orders of magnitude slower than the database engine sitting a few milliseconds away. PostgreSQL has a query optimizer, aggregation code written in tight C loops, and (since version 11) optional JIT compilation of expressions, all of which exist for precisely this purpose. Let it do its job.

from django.db.models import Avg, Count, Sum

class OrderQuerySet(models.QuerySet):
    # ... active(), recent(), with_customer() as above ...

    def with_stats(self):
        """Annotate each order with item count and total,
        computed in the database — not in Python."""
        return self.annotate(
            item_count=Count('items'),
            total_value=Sum('items__price'),
        )

    def with_avg_item_price(self):
        """Average price per item, per order."""
        return self.annotate(
            avg_item_price=Avg('items__price'),
        )

    def expensive(self, threshold=500):
        """Orders whose total exceeds a threshold.
        Requires with_stats() in the chain."""
        return self.filter(total_value__gte=threshold)

The generated SQL:

-- Order.objects.active().with_stats()
-- generates a single query:

SELECT
  orders.*,
  COUNT(order_items.id) AS item_count,
  SUM(order_items.price) AS total_value
FROM orders
LEFT OUTER JOIN order_items
  ON order_items.order_id = orders.id
WHERE orders.status = 'active'
GROUP BY orders.id;

-- One query. All stats computed by PostgreSQL.
-- No Python loops. No per-row aggregation.

One query. PostgreSQL handles the COUNT and SUM with a LEFT OUTER JOIN and GROUP BY. No Python loops. No per-row aggregation queries. The database does what databases are built to do.

And because these are QuerySet methods, they chain with everything else:

# Lazy chaining — annotations only execute when evaluated
dashboard_orders = (
    Order.objects
    .active()
    .recent(days=30)
    .with_customer()
    .with_stats()
    .expensive(threshold=1000)
    .order_by('-total_value')
)

# Still just a QuerySet. No SQL has been sent.
# The query fires when you iterate, slice, or call list().

# The generated SQL is a single statement that:
# 1. Filters active orders from last 30 days
# 2. JOINs customers
# 3. Annotates item_count and total_value via LEFT JOIN
# 4. Filters to orders above $1,000
# 5. Sorts by total descending

That entire chain — filters, eager loading, annotations, filtering on annotations, ordering — compiles to a single SQL statement. Django's ORM is doing real work here. The manager pattern simply makes it accessible without requiring every developer to know the annotation API by heart.

The double-counting trap: a matter requiring some candour

I would be doing you a disservice if I did not address this directly, because it is the most common bug in Django annotation code, and it is deeply unintuitive.

Annotating Count and Sum on the same QuerySet that involves multiple LEFT JOINs can produce incorrect results. If an order has 3 items and 2 payments, a naive double-join produces 6 rows per order, and Count('items') returns 6 instead of 3.

# The bug that ruins every demo.

from django.db.models import Count

orders = (
    Order.objects
    .annotate(
        item_count=Count('items'),        # LEFT JOIN order_items
        payment_count=Count('payments'),   # LEFT JOIN payments
    )
)

# An order with 3 items and 2 payments:
# LEFT JOIN produces 3 x 2 = 6 rows
# item_count = 6 (wrong — should be 3)
# payment_count = 6 (wrong — should be 2)

# The fix: Count with distinct=True
orders = (
    Order.objects
    .annotate(
        item_count=Count('items', distinct=True),
        payment_count=Count('payments', distinct=True),
    )
)
# item_count = 3 (correct)
# payment_count = 2 (correct)

# Or, for Sum aggregations where distinct doesn't apply,
# move one aggregation into a Subquery.

The distinct=True parameter solves the counting problem by deduplicating before aggregation. But it does not help with Sum — you cannot sum distinct values when the values themselves might legitimately repeat (two items priced at $29.99 are not duplicates; they are two items).

For Sum across multiple joins, the correct solution is a Subquery annotation — which isolates each aggregation from the others. This is not a Django limitation. It is a fundamental property of SQL joins: when you join table A to table B and table C simultaneously, you get |B| x |C| rows per row in A. Any aggregate computed on that result set reflects the cartesian product, not the individual relationship.
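The multiplication is easy to see in plain Python. This is a toy model of the double LEFT JOIN, not Django code:

```python
# One order with 3 items and 2 payments. Joining both child tables
# at once emits every (item, payment) pair.
items = ['item_1', 'item_2', 'item_3']
payments = ['payment_1', 'payment_2']

joined_rows = [(i, p) for i in items for p in payments]

naive_item_count = len([i for i, _ in joined_rows])     # what Count('items') sees
distinct_item_count = len({i for i, _ in joined_rows})  # Count('items', distinct=True)

print(len(joined_rows), naive_item_count, distinct_item_count)  # 6 6 3
```

Six joined rows, so a naive count reports 6 items where there are 3. Deduplicating before counting recovers the truth, which is exactly what distinct=True asks PostgreSQL to do.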

A well-designed manager prevents this trap. You solve it once, inside the method, document why distinct=True is there, and nobody encounters the bug again. This is exactly the kind of knowledge that belongs in a QuerySet method rather than in a code review comment that nobody reads twice.

Subquery annotations: when JOINs cannot be trusted

The double-counting problem drives the most important architectural decision in your manager design: when to use a JOIN annotation versus a correlated Subquery.

Allow me to be direct. Correlated subqueries have a reputation for being slow, and the reputation is earned. PostgreSQL evaluates the inner query for every row in the outer query. With 10,000 orders, that is 10,000 executions of each subquery. Without an index on the correlation column, each execution performs a sequential scan of the inner table. The arithmetic is unforgiving.

But reputation is not the same as universal truth. There are scenarios where subqueries are not merely acceptable — they are the only correct choice.

from django.db.models import DecimalField, OuterRef, Subquery, Sum
from django.db.models.functions import Coalesce

class OrderQuerySet(models.QuerySet):
    def with_latest_payment(self):
        """Annotate each order with its most recent payment date.
        Uses a correlated subquery — expensive, but sometimes
        the only option for non-aggregatable data."""
        latest = (
            Payment.objects
            .filter(order=OuterRef('pk'))
            .order_by('-paid_at')
            .values('paid_at')[:1]
        )
        return self.annotate(
            latest_payment=Subquery(latest)
        )

    def with_refund_total(self):
        """Total refunded amount per order.
        Subquery avoids the double-counting trap when
        JOINing multiple child tables simultaneously."""
        refund_sum = (
            Refund.objects
            .filter(order=OuterRef('pk'))
            .values('order')
            .annotate(total=Sum('amount'))
            .values('total')
        )
        return self.annotate(
            refund_total=Coalesce(
                Subquery(refund_sum),
                0,
                output_field=DecimalField(),
            )
        )

The SQL:

-- Order.objects.with_latest_payment().with_refund_total()

SELECT
  orders.*,
  (
    SELECT payments.paid_at
    FROM payments
    WHERE payments.order_id = orders.id
    ORDER BY payments.paid_at DESC
    LIMIT 1
  ) AS latest_payment,
  (
    SELECT COALESCE(SUM(refunds.amount), 0)
    FROM refunds
    WHERE refunds.order_id = orders.id
  ) AS refund_total
FROM orders;

-- Two correlated subqueries. PostgreSQL executes each
-- for every row in the outer query. With 10,000 orders
-- and no index on payments.order_id or refunds.order_id,
-- this is a sequential scan disaster.

Two scenarios where subqueries are the correct choice:

  1. Non-aggregatable data. The latest payment date is not a SUM or COUNT — it is a specific row selected by ordering. You cannot express ORDER BY ... LIMIT 1 as a GROUP BY aggregate (not without a window function, and Django exposes DISTINCT ON only through QuerySet.distinct(*fields), which cannot produce this kind of per-row annotation).
  2. Multiple child tables. When you need aggregates from both order_items and refunds, joining both produces a cartesian product. An order with 5 items and 3 refunds generates 15 rows, and every aggregate inflates. Subqueries compute each aggregate independently, avoiding the multiplication.

The performance cost of correlated subqueries is manageable if the inner query uses an index. This point is worth a code block of its own:

-- These indexes make correlated subqueries viable.
-- Without them, each subquery execution is a sequential scan.

-- For with_latest_payment():
-- Covers the filter (order_id) and the ORDER BY (paid_at DESC)
CREATE INDEX idx_payments_order_paid
  ON payments (order_id, paid_at DESC);

-- For with_refund_total():
-- Covers the filter and the aggregation source
CREATE INDEX idx_refunds_order_amount
  ON refunds (order_id, amount);

-- With these indexes, each subquery execution is an index-only
-- scan: ~0.01ms per row instead of ~2ms per row.
-- At 10,000 orders: 100ms vs 20,000ms.

With a covering index, the subquery for each row is an index-only scan — microseconds, not milliseconds. Without it, you are running a table scan per row, which is where EXPLAIN ANALYZE starts showing numbers that make you reconsider your career choices.

I should be honest about the boundary here. If your outer query returns 100,000 rows and each has two correlated subqueries, you are asking PostgreSQL to execute 200,000 index lookups. Even at 0.01ms each, that is 2 seconds of pure subquery execution. At that scale, consider materializing the aggregation — a database view, a periodic refresh, or a denormalized column updated by a trigger. The subquery pattern excels for datasets in the low tens of thousands. Beyond that, it is worth measuring rather than assuming.
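The boundary is worth quantifying, again with assumed rather than measured per-lookup costs:

```python
# Cost model for correlated subqueries at scale. The 10 microseconds
# per index-only lookup is an assumption, consistent with the figures above.
rows = 100_000
subqueries_per_row = 2
index_lookup_us = 10            # ~0.01 ms per index-only scan

total_lookups = rows * subqueries_per_row
total_seconds = total_lookups * index_lookup_us / 1_000_000
print(total_lookups, total_seconds)   # 200000 2.0
```

Two seconds of pure subquery execution before the outer scan has done any work. That is the point at which a materialized view or a denormalized column stops being premature optimization.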

Case/When conditional aggregation: the subquery eliminator

Many developers reach for subqueries when they need per-category breakdowns. "Count shipped items" becomes a subquery. "Count pending items" becomes another subquery. "Count returned items" becomes a third. Three correlated subqueries where one conditional aggregation would do.

This is the pattern I find myself recommending most often, because it replaces the most expensive approach with the least expensive one.

from django.db.models import Case, Count, DecimalField, Q, Sum, Value, When

class OrderQuerySet(models.QuerySet):
    def with_status_breakdown(self):
        """Count items by fulfillment status using
        conditional aggregation. One query, no subqueries."""
        return self.annotate(
            shipped_count=Count(
                'items',
                filter=Q(items__status='shipped')
            ),
            pending_count=Count(
                'items',
                filter=Q(items__status='pending')
            ),
            returned_count=Count(
                'items',
                filter=Q(items__status='returned')
            ),
        )

    def with_revenue_by_category(self):
        """Sum revenue per product category, per order.
        Case/When avoids N separate subqueries."""
        return self.annotate(
            electronics_revenue=Sum(
                Case(
                    When(
                        items__product__category='electronics',
                        then='items__price'
                    ),
                    default=Value(0),
                    output_field=DecimalField(),
                )
            ),
            clothing_revenue=Sum(
                Case(
                    When(
                        items__product__category='clothing',
                        then='items__price'
                    ),
                    default=Value(0),
                    output_field=DecimalField(),
                )
            ),
        )

The SQL Django generates:

-- Order.objects.with_status_breakdown()

SELECT
  orders.*,
  COUNT(order_items.id) FILTER (
    WHERE order_items.status = 'shipped'
  ) AS shipped_count,
  COUNT(order_items.id) FILTER (
    WHERE order_items.status = 'pending'
  ) AS pending_count,
  COUNT(order_items.id) FILTER (
    WHERE order_items.status = 'returned'
  ) AS returned_count
FROM orders
LEFT OUTER JOIN order_items
  ON order_items.order_id = orders.id
GROUP BY orders.id;

-- Django translates filtered Count() into PostgreSQL
-- FILTER (WHERE ...) clauses. One pass over the data.
-- No subqueries. No correlated anything.
-- This is the pattern.

Django translates Count('items', filter=Q(...)) into PostgreSQL's FILTER (WHERE ...) clause — a single pass over the joined data that splits the count by condition. This is not a Django trick. It is standard SQL that PostgreSQL executes efficiently because there is one LEFT JOIN, one GROUP BY, and one scan of the order_items table.

Compare this to three correlated subqueries, each scanning order_items independently. With 100,000 orders, the conditional aggregation version runs one table scan. The subquery version runs 300,000 index lookups (or worse, 300,000 sequential scans). The performance difference is not 3x — it is orders of magnitude, because a single sequential scan of order_items is faster than 300,000 index-seek round trips.
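What FILTER (WHERE ...) buys you is clearer when modeled in plain Python: one scan of the rows, several conditional accumulators. A toy model, not Django code:

```python
# The rows below stand in for a joined order_items result set.
rows = [
    {'status': 'shipped'},
    {'status': 'pending'},
    {'status': 'shipped'},
    {'status': 'returned'},
    {'status': 'shipped'},
]

counts = {'shipped': 0, 'pending': 0, 'returned': 0}
for row in rows:                 # one pass: the FILTER approach
    counts[row['status']] += 1

# The subquery approach re-scans the data once per condition:
per_condition = {
    status: sum(1 for row in rows if row['status'] == status)
    for status in counts
}

print(counts, counts == per_condition)  # same answer, 1 scan instead of 3
```

Both approaches produce identical counts. The difference is how many times the data is read, and that difference is the entire performance story.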

The with_revenue_by_category method shows the same pattern with Sum and Case/When. Any time you need a pivot-style breakdown — counts by status, sums by category, averages by region — conditional aggregation is the answer. Wrap it in a QuerySet method, give it a clear name, and your views never need to know the difference between a FILTER clause and a subquery.

A candid note on FILTER (WHERE ...): although it entered the SQL standard in SQL:2003, support is uneven — MySQL does not implement it, and SQLite only gained it in version 3.30. Django is clever enough to rewrite the expression as CASE WHEN ... THEN 1 ELSE NULL END for databases that lack FILTER, so your code remains portable. The two forms perform comparably; the FILTER version is simply clearer in EXPLAIN output. If you are reading this article, you are running PostgreSQL, and you get FILTER either way.

Func expressions: reaching into PostgreSQL-specific features

Django's ORM is database-agnostic by default. That is a feature — until you are running PostgreSQL and want to use array_length(), similarity(), or date_part(). The Func expression bridges the gap, letting you call any PostgreSQL function from inside a QuerySet method.

from django.db import models
from django.db.models import FloatField, Func, Value
from django.db.models.functions import Extract
from django.utils import timezone

class PgArrayLength(Func):
    """Wraps PostgreSQL's array_length() function."""
    function = 'array_length'
    template = '%(function)s(%(expressions)s, 1)'

class PgSimilarity(Func):
    """Wraps pg_trgm's similarity() for fuzzy matching."""
    function = 'similarity'

class ProductQuerySet(models.QuerySet):
    def with_tag_count(self):
        """Count tags stored in a PostgreSQL array column.
        Faster than COUNT on a join table."""
        return self.annotate(
            tag_count=PgArrayLength('tags')
        )

    def similar_to(self, term, threshold=0.3):
        """Fuzzy search using pg_trgm trigram similarity.
        Requires: CREATE EXTENSION pg_trgm;
        Requires: GIN or GiST index on the name column."""
        return (
            self
            .annotate(
                name_similarity=PgSimilarity(
                    'name', Value(term),
                    output_field=FloatField()
                )
            )
            .filter(name_similarity__gte=threshold)
            .order_by('-name_similarity')
        )

    def with_age_days(self):
        """Days since creation using PostgreSQL date arithmetic."""
        age = models.ExpressionWrapper(
            timezone.now() - models.F('created_at'),
            output_field=models.DurationField(),
        )
        return self.annotate(age_days=Extract(age, 'day'))

The SQL for the fuzzy search:

-- Product.objects.similar_to('postgrse', threshold=0.3)

SELECT
  products.*,
  similarity(products.name, 'postgrse') AS name_similarity
FROM products
WHERE similarity(products.name, 'postgrse') >= 0.3
ORDER BY name_similarity DESC;

-- PostgreSQL pg_trgm handles the typo gracefully.
-- "postgrse" matches "PostgreSQL" at ~0.45 similarity.
-- This requires a GIN index with gin_trgm_ops:
-- CREATE INDEX ON products USING GIN (name gin_trgm_ops);

The similarity() function comes from PostgreSQL's pg_trgm extension. It handles typos, partial matches, and phonetic similarity with zero application-side logic. But it requires a GIN index with gin_trgm_ops to avoid sequential scans on large tables.
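To demystify the score, here is a simplified Python approximation of the trigram algorithm pg_trgm uses. Real pg_trgm pads per word and normalizes non-alphanumerics, so treat the exact numbers as illustrative:

```python
def trigrams(s):
    # pg_trgm-style padding: two blanks in front, one behind
    padded = '  ' + s.lower() + ' '
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    # shared trigrams divided by total distinct trigrams
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(round(similarity('postgrse', 'postgresql'), 2))  # ~0.43 in this model
print(similarity('postgres', 'postgres'))              # 1.0
```

The typo "postgrse" still shares six of its nine trigrams with "postgresql", which is why trigram matching degrades gracefully where exact matching fails outright.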

Custom Func expressions let your managers use PostgreSQL as PostgreSQL — not as a generic SQL database that happens to be running. Array operations, trigram searches, full-text search with ts_rank(), geometric functions, JSON path queries — all of these become composable QuerySet methods that chain with your existing filters and annotations.

I want to acknowledge the tradeoff directly, because pretending it does not exist would be an embarrassment. Portability. A manager that uses array_length() will not work on MySQL or SQLite. For most production applications, this is not a concern — you chose PostgreSQL for a reason, and your managers should take advantage of that choice. But if your test suite runs against SQLite (a practice I regard with some scepticism but that remains common), your PostgreSQL-specific QuerySet methods will fail in tests.

The solution is not to avoid PostgreSQL features. It is to run your tests against PostgreSQL. A test suite that validates behaviour against a different database engine than your production database is a test suite that validates the wrong thing. I am aware this is a strong opinion. I hold it with full confidence.
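Pointing the test suite at PostgreSQL is a settings change, not a project. A sketch, with illustrative module name, database names, and credentials:

```python
# settings/test.py — hypothetical test settings module.
# Django creates a throwaway test_<NAME> database for each run;
# extensions like pg_trgm must be installed there, e.g. via a
# migration using django.contrib.postgres.operations.CreateExtension.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'app',          # tests actually run against test_app
        'USER': 'app',
        'PASSWORD': 'app',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```

Run the suite with DJANGO_SETTINGS_MODULE pointed at this module, and your PostgreSQL-specific QuerySet methods are exercised against the engine that will actually run them.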

"PostgreSQL is not a database. It is an ecosystem that most teams use as a database. The features you are not using — materialized views, partial indexes, expression indexes, trigram search — are not advanced. They are simply undiscovered."

— from You Don't Need Redis, Chapter 19: The Case for Simplicity

Auto-eager managers and Prefetch with custom querysets

Some models have relationships that are accessed in every context — every view, every serializer, every management command. For these models, it makes sense to eager-load by default, directly in the manager's get_queryset().

from django.db.models import Prefetch

class AutoEagerOrderManager(models.Manager):
    """Manager that always applies select_related and
    prefetch_related. Every query through this manager
    gets eager loading by default."""

    def get_queryset(self):
        return (
            super()
            .get_queryset()
            .select_related('customer', 'shipping_address')
            .prefetch_related(
                Prefetch(
                    'items',
                    queryset=OrderItem.objects.select_related('product')
                )
            )
        )

# Usage: Order.objects.filter(status='active')
# automatically includes customer, shipping_address,
# and items with their products.
#
# The Prefetch object controls the inner query:
# it eager-loads product on each OrderItem,
# preventing a second layer of N+1.

Every query through Order.objects now includes eager loading. No view needs to remember. No code review needs to catch the missing select_related(). The manager handles it.

For more targeted scenarios, Prefetch with a custom queryset lets you filter, sort, or limit the prefetched data:

from django.db.models import Prefetch

class OrderQuerySet(models.QuerySet):
    def with_recent_items(self):
        """Prefetch only items from the last 7 days,
        with their products pre-loaded."""
        recent_items = (
            OrderItem.objects
            .filter(created_at__gte=timezone.now() - timedelta(days=7))
            .select_related('product')
            .order_by('-created_at')
        )
        return self.prefetch_related(
            Prefetch('items', queryset=recent_items, to_attr='recent_items')
        )

# order.recent_items  — a plain list, not a QuerySet
# order.items.all()   — still available, still lazy
#
# to_attr stores the prefetched result as a list attribute
# instead of replacing the relationship manager.
# This means you can have multiple Prefetch objects
# on the same relationship with different filters.

The to_attr parameter is worth examining. Without it, prefetch_related('items') replaces the items manager cache — subsequent calls to order.items.all() return the prefetched data instead of hitting the database. With to_attr='recent_items', the prefetched results are stored as a plain Python list on a new attribute, leaving the original relationship manager untouched.

This is particularly useful when you need multiple filtered views of the same relationship. One Prefetch for recent items, another for high-value items, each stored on a different attribute. All resolved in the initial query batch.

One architectural note that I feel strongly about: if you define an auto-eager manager as the default, keep a second plain manager available for contexts where you do not want the overhead. Migrations, data imports, and management commands rarely need eager loading, and the extra JOINs can slow down bulk operations substantially.

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True)

    # Default manager — always applies eager loading
    objects = AutoEagerOrderManager()

    # Raw manager — for admin, migrations, data imports
    raw_objects = models.Manager()

    class Meta:
        # Which manager Django uses for its own internals
        # (admin, dumpdata, loaddata).
        default_manager_name = 'objects'

The default_manager_name setting in Meta tells Django which manager to use for its own internal operations: the admin interface, dumpdata, and loaddata. Related lookups from other models go through the base manager instead, which stays a plain Manager unless you set Meta.base_manager_name; Django deliberately avoids filtered or decorated managers there, so an auto-eager default will not silently add JOINs to every relationship traversal. Keep raw_objects available for the rare cases where you need a clean, undecorated QuerySet in your own code.

Controlling column width: only() and defer()

Most discussions of Django query optimization focus on the number of queries. And rightly so — reducing 1,501 queries to 3 is the largest single improvement you will ever make. But once the query count is correct, the width of each query begins to matter.

If your Order model has a notes TextField, an internal_comments TextField, and a raw_payload JSONField, every SELECT * transfers those columns across the wire — even if your list view only displays the order status and creation date. On a table where raw_payload averages 4KB per row and you are loading 500 rows, that is 2MB of data you did not need and did not use.

class OrderQuerySet(models.QuerySet):
    def lightweight(self):
        """Exclude heavy text columns from the SELECT.
        Useful for list views that don't need full content."""
        return self.defer('notes', 'internal_comments', 'raw_payload')

    def summary_only(self):
        """Select only the columns needed for a list view.
        Inverse of defer — explicit inclusion. 'customer' is
        listed so this composes with select_related('customer');
        omitting a select_related field from only() raises
        FieldError."""
        return self.only('id', 'status', 'created_at', 'customer')

# Usage in a list view:
orders = (
    Order.objects
    .active()
    .recent(days=30)
    .summary_only()
    .with_customer()
)

# Generated SQL:
# SELECT orders.id, orders.status, orders.created_at,
#        orders.customer_id, customers.*
# FROM orders
# INNER JOIN customers ON ...
# WHERE orders.status = 'active' AND ...
#
# No notes, no internal_comments, no raw_payload.
# On a table where raw_payload averages 4KB per row
# and you're loading 500 rows, that's 2MB of data
# you didn't need and didn't transfer.

A word of caution. Accessing a deferred field triggers an additional database query to fetch that specific column. If you defer notes and then access order.notes in a template, Django silently fires a SELECT notes FROM orders WHERE id = ... query for each row. This is its own N+1 problem, and it is harder to detect because it does not appear in the original QuerySet chain.

The rule is simple: defer columns you will not access. If there is any doubt, do not defer. The cost of transferring an unused column is measured in bandwidth. The cost of accidentally triggering N deferred-field queries is measured in round trips. The latter is always worse.

When the ORM reaches its limit: .raw() and its consequences

I have spent this article demonstrating the considerable range of Django's ORM — annotations, subqueries, conditional aggregation, custom functions. It covers a great deal of ground. But I would be dishonest if I did not acknowledge the boundaries.

There are PostgreSQL features that Django's ORM cannot express, or can express only through contortions. Window functions are partially covered: Django ships Rank, RowNumber, PercentRank and friends via the Window expression, but ordering a window over a correlated subquery quickly becomes unworkable. Recursive CTEs. LATERAL joins. GROUPING SETS. DISTINCT ON with ordering that differs from the distinct columns. When you genuinely need these, .raw() is the escape hatch.

class OrderQuerySet(models.QuerySet):
    def with_percentile_rank(self):
        """Percentile rank by order value. The window's
        ORDER BY is a correlated subquery, which is painful
        to express through the ORM's Window API."""
        return self.raw('''
            SELECT
              orders.*,
              PERCENT_RANK() OVER (
                ORDER BY (
                  SELECT COALESCE(SUM(price), 0)
                  FROM order_items
                  WHERE order_id = orders.id
                )
              ) AS value_percentile
            FROM orders
            WHERE status = 'active'
        ''')

    # WARNING: .raw() returns a RawQuerySet.
    # It does NOT support chaining (.filter, .annotate).
    # It does NOT support prefetch_related.
    # It IS a dead end in your QuerySet chain.
    #
    # Use .raw() only when the ORM genuinely cannot
    # express the query. Then accept the limitations.
    # Do not try to chain after it.

The warnings in that code block are not decoration. .raw() returns a RawQuerySet, which is a fundamentally different object from a QuerySet. It does not support .filter(). It does not support .annotate(). It does not support prefetch_related(). It is a dead end in your chain.

This means a QuerySet method that uses .raw() breaks the composability contract that makes custom managers useful. You can call it, but you cannot chain after it. For this reason, I suggest reserving .raw() for terminal methods — methods that return final results, not intermediate QuerySets.

There is an alternative worth considering: django.db.models.expressions.RawSQL. Unlike .raw(), a RawSQL expression can be used inside .annotate() and .filter(), preserving chainability. The tradeoff is that you are embedding raw SQL fragments inside a QuerySet chain, which is precisely as fragile as it sounds. But for a single column expression that the ORM cannot generate — a window function, a lateral subquery — it is a pragmatic compromise.

The honest assessment: if more than 10% of your QuerySet methods use .raw() or RawSQL, you are either doing very advanced PostgreSQL work (in which case, well done) or fighting the ORM instead of working with it (in which case, consider whether Django is the right tool for this particular model). Most applications need none. A few need one or two. That is the right ratio.

Testing your managers: assertNumQueries is non-negotiable

A custom manager without tests is a promise without evidence. You have encoded an access pattern that reduces 1,501 queries to 3. How do you know it stays at 3 after the next developer adds a new field, changes a relationship, or upgrades Django?

Django provides assertNumQueries — a context manager that counts the number of database queries executed within its block and fails the test if the count does not match. This is your most important tool.

from decimal import Decimal

from django.test import TestCase

class OrderManagerTest(TestCase):
    def setUp(self):
        # Create test data: 10 orders, each with 3 items.
        # Note: created_at uses auto_now_add, so it is stamped
        # automatically and cannot be set through create().
        self.customer = Customer.objects.create(name='Test Customer')
        self.product = Product.objects.create(name='Test Product')
        for _ in range(10):
            order = Order.objects.create(
                customer=self.customer,
                status='active',
            )
            for _ in range(3):
                OrderItem.objects.create(
                    order=order,
                    product=self.product,
                    price=Decimal('29.99'),
                )

    def test_with_customer_uses_one_query(self):
        """select_related produces a JOIN, not N+1."""
        with self.assertNumQueries(1):
            orders = list(
                Order.objects.active().with_customer()
            )
            # Access the related customer — no additional query
            for order in orders:
                _ = order.customer.name

    def test_with_items_uses_two_queries(self):
        """prefetch_related uses one query for orders,
        one for items. Total: 2, regardless of row count."""
        with self.assertNumQueries(2):
            orders = list(
                Order.objects.active().with_items()
            )
            for order in orders:
                _ = list(order.items.all())

    def test_with_stats_uses_one_query(self):
        """Annotation computes in the database, not Python."""
        with self.assertNumQueries(1):
            orders = list(
                Order.objects.active().with_stats()
            )
            for order in orders:
                self.assertEqual(order.item_count, 3)
                self.assertEqual(
                    order.total_value,
                    Decimal('89.97')
                )

These tests are not testing business logic. They are testing database access patterns. The assertion is not "does this return the right data" — it is "does this return the right data with the expected number of queries." Both matter. The first prevents bugs. The second prevents performance regressions.

I cannot overstate how valuable this is. A developer adds a new field to the Order model with a foreign key. The auto-eager manager does not include select_related for the new field. Without a query-count test, nobody notices until production dashboards slow down. With the test, the CI pipeline fails immediately: expected 1 query, got 501.

For quick debugging during development — not for CI, not for production, but for your terminal while you are building the manager — Django's connection query log is informative:

# In development, quick and dirty query counting.
# Not for production. Not for CI. For your terminal.

from django.db import connection, reset_queries
from django.conf import settings

settings.DEBUG = True  # Required for query logging
reset_queries()

# Run your manager chain
orders = list(
    Order.objects
    .active()
    .recent(days=30)
    .with_customer()
    .with_stats()
)

print(f"Queries executed: {len(connection.queries)}")
for q in connection.queries:
    print(f"  [{q['time']}s] {q['sql'][:120]}")

# Output:
# Queries executed: 1
#   [0.004s] SELECT orders.*, COUNT(order_items.id) AS ...
#
# Compare to the naive view:
# Queries executed: 1,501
#   [0.001s] SELECT * FROM orders WHERE ...
#   [0.000s] SELECT * FROM customers WHERE id = 42
#   ... (1,499 more lines you don't want to read)

The difference between reading "Queries executed: 1" and "Queries executed: 1,501" in your terminal is the difference between a manager that works and a manager that merely exists.

The honest counterpoints

I have made a sustained argument for custom managers. It is a good argument, and I stand behind it. But a waiter who overstates his case is no waiter at all. Here are the boundaries.

Managers add a layer of indirection

When a developer reads Order.objects.active().with_stats(), they see intent. When they need to debug the generated SQL, they must navigate to the QuerySet class, find the method, understand what annotation it applies, and mentally compose the SQL from the chain. This is more indirection than Order.objects.filter(status='active').annotate(item_count=Count('items')), where the SQL is visible inline.

The tradeoff is worth it for patterns that repeat across multiple views. It is not worth it for one-off queries used in a single view. Not every filter combination deserves a named method. If a QuerySet method is called from exactly one place, it is premature abstraction — move the logic inline and save the manager for patterns that genuinely repeat.

Complex chains produce complex SQL

A chain of six QuerySet methods compiles to a single SQL statement that may involve multiple JOINs, subqueries, annotations, and filter conditions. That SQL can be difficult to read, difficult to optimize, and difficult to correlate back to the Python code that generated it. When EXPLAIN ANALYZE shows a slow plan, tracing it to the responsible QuerySet method requires familiarity with how Django translates each ORM call to SQL.

This is inherent to the abstraction. The solution is not to avoid complex chains — it is to run print(queryset.query) during development and verify the generated SQL against your expectations. A QuerySet method that generates SQL you did not expect is a method that needs revision, regardless of whether it produces correct results.

The ORM is not always the right tool

Custom managers optimize queries that the ORM can express. For reporting queries with complex window functions, recursive hierarchies, or multi-table pivot aggregations, the ORM's expressiveness has limits. A well-written raw SQL query, reviewed and tested, is better than an ORM chain that contorts itself into generating the right SQL through a sequence of hacks.

I do not regard this as a failure of the ORM. I regard it as a boundary. The ORM excels at CRUD operations, relationship traversal, and standard aggregations. Beyond that boundary, raw SQL is not a compromise — it is the appropriate tool.

Auto-eager loading can over-fetch

A manager that always applies select_related('customer', 'shipping_address') is helpful for views that display order details. It is wasteful for management commands that only need order IDs for a bulk status update. The JOINs execute regardless, transferring customer and address data that the command discards.

This is why the dual-manager pattern — objects for the custom manager, raw_objects for a plain manager — exists. Use it. The overhead of maintaining two managers is trivial compared to the overhead of unnecessary JOINs in batch operations.

The pattern reference

Every pattern in this article maps to a specific SQL strategy with specific tradeoffs. Here they are, side by side.

Pattern | Use case | SQL strategy | Queries | Watch out
select_related in manager | FK / one-to-one lookups | JOIN | 1 | Wide result set with many relations
prefetch_related in manager | Reverse FK / M2M collections | SELECT ... WHERE fk IN (...) | 1 + N relations | Large IN lists with thousands of PKs
Prefetch with custom queryset | Filtered / sorted child data | SELECT ... WHERE fk IN (...) AND ... | 1 + N relations | to_attr returns list, not QuerySet
Count/Sum/Avg annotation | Per-row aggregates | LEFT JOIN + GROUP BY | 1 | Double-counting with multiple JOINs
Subquery annotation | Non-aggregatable data, avoiding double-count | Correlated subquery | 1 (but expensive) | Sequential scan without FK index
Case/When conditional aggregation | Pivot-style breakdowns | FILTER (WHERE ...) | 1 | None significant
Func expression | PostgreSQL-specific functions | Function call in SELECT | 1 | Requires extension or function to exist
only() / defer() | Reducing SELECT column width | SELECT col1, col2 ... | 1 | Deferred field access triggers extra query

The decision tree is straightforward. For relationships you always need: select_related (FK/one-to-one) or prefetch_related (reverse FK/M2M). For computed values from a single child table: annotation with Count/Sum/Avg. For computed values from multiple child tables: subquery annotations to avoid the cartesian product. For pivot-style breakdowns: Case/When conditional aggregation. For PostgreSQL-specific operations: Func expressions. For reducing transfer size: only() and defer().

Wrap each one in a QuerySet method. Name it clearly. Chain them. Your views become a single line of intent, and the SQL takes care of itself.

A note on naming

The names of your QuerySet methods are part of your codebase's vocabulary. They are how developers think about data access. Poor names create confusion; good names create understanding.

A few conventions I have found reliable:

  • with_* for methods that add data via eager loading or annotation: with_customer(), with_stats(), with_latest_payment(). The prefix signals that the method enriches each row without changing which rows are returned.
  • Adjective or noun for filters: active(), recent(), expensive(). These read naturally in chains: Order.objects.active().recent().expensive() — active, recent, expensive orders.
  • for_* for context-specific QuerySets: for_dashboard(), for_export(), for_api(). These are composite methods that combine several with_* calls and filters into a single call for a specific use case.

The goal is that a chain reads like a description of what you want: Order.objects.active().recent(days=7).with_customer().with_stats(). Active orders from the last seven days, with customer data and computed statistics. If the chain reads like a sentence, the naming is correct.

What Gold Lapel does with the SQL your managers generate

A well-structured manager generates clean, intentional SQL. But even clean SQL needs the right database infrastructure to run efficiently. A Subquery annotation that correlates on payments.order_id is elegant — until order_id is not indexed and PostgreSQL resorts to a sequential scan for every row.

This is where a common misunderstanding arises, in both directions. PostgreSQL does not create an index for a foreign key constraint. The constraint ensures that every order_id in payments points to a valid orders row; the index that would make lookups by order_id fast is a separate object, and PostgreSQL never creates it on its own. Django papers over this for the schemas it manages — a ForeignKey defaults to db_index=True, so migrations do create the index — but tables created outside migrations, legacy schemas, and fields declared with db_index=False carry the constraint without the index.

Gold Lapel sits between your Django application and PostgreSQL, observing every query your managers produce. When it sees a correlated subquery filtering on payments.order_id thousands of times per hour, it checks for an index. If there is none, it creates one — CREATE INDEX CONCURRENTLY, no downtime, no migration file, no deploy.

The same applies to the IN clauses generated by prefetch_related. Every SELECT * FROM order_items WHERE order_id IN (...) depends on a B-tree index on order_id. PostgreSQL will not create that index for the constraint itself, and if the schema is not managed by Django's migrations, nothing else will. Gold Lapel fills that gap for every foreign key your prefetch patterns touch.

For managers that use conditional aggregation or heavy annotation chains — the .with_stats().with_status_breakdown() combinations that produce complex GROUP BY queries — Gold Lapel identifies repeated aggregation patterns and can materialize them as views. The first request computes the aggregation. Subsequent requests read from the materialized result until the underlying data changes.

Your custom managers define what to ask PostgreSQL. Gold Lapel makes sure PostgreSQL is prepared to answer efficiently. The managers encode the access pattern. The proxy ensures the infrastructure matches.

I should note that Gold Lapel is not required for any of the techniques in this article. Every pattern described here works with a standard Django-to-PostgreSQL connection. Gold Lapel adds the infrastructure layer — automatic indexing, query plan analysis, materialization — that makes the difference between "this manager generates good SQL" and "this manager generates good SQL that runs on a well-prepared database." The distinction matters at scale. It is not essential at five hundred orders. It is noticeable at fifty thousand.

The shape of a well-managed model

If you have followed along — and I appreciate your patience, as this has been a thorough tour — you now have the vocabulary for a complete manager design. Allow me to describe what the finished architecture looks like in practice.

The model has a single custom manager set as objects, created via QuerySet.as_manager(). The QuerySet class defines methods in three categories: filters (active(), recent(), expensive()), enrichments (with_customer(), with_stats(), with_latest_payment()), and composites (for_dashboard(), for_api()). A plain models.Manager() is available as raw_objects for bulk operations. The test suite includes assertNumQueries tests for every enrichment method. And the naming is consistent: with_* enriches, adjectives filter, for_* composes.

The views that use this model are short. A dashboard view is a single QuerySet chain and a template. An API endpoint is a single QuerySet chain and a serializer. The SQL is generated by the chain, optimized by PostgreSQL, and validated by the test suite. No developer needs to remember which relationships to eager-load. No code review needs to catch a missing annotation. The manager knows. The manager remembers.

In infrastructure, boring is the highest compliment available. A manager that produces the same three queries on every request, regardless of who wrote the view, is boring in exactly the right way. It is the kind of boring that lets you sleep through the night while your dashboard serves requests at 40ms instead of 4,000ms.

I trust your QuerySets are in order. If they are not, the patterns above will see to it. And if PostgreSQL needs some attention on its end — well. That is what staff are for.

If you'll permit me — the Subquery annotations and Case/When patterns above generate SQL that benefits enormously from proper indexing. I have written a guide to every PostgreSQL index type that explains when B-tree, GIN, GiST, and partial indexes apply, so the SQL your managers generate meets the indexes it deserves.