PostgreSQL Table Partitioning in Django: A Production Guide with django-postgres-extra
Django's ORM has no opinion on partitioning. PostgreSQL has very strong opinions. Allow me to mediate.
Good evening. Your table has outgrown itself.
There is a moment in every growing Django application's life when a certain table crosses a threshold. Not a dramatic failure. Not a crash. Just a gradual thickening of response times, a lengthening of vacuum cycles, a growing unease when someone mentions running migrations on "the events table."
I have seen this table before. It has 800 million rows. It occupies 340 GB on disk. Sequential scans take 45 seconds. Autovacuum runs for 6 hours and still cannot keep pace with the bloat. Index maintenance dominates your I/O budget. Your REINDEX operations require a maintenance window that your product team no longer wishes to grant you. And yet — and I say this with genuine sympathy — 92% of your queries only touch the last 30 days of data.
Ninety-two percent. The remaining 8% are analytics queries that run overnight and could comfortably read from a separate store. Your application reads 28 GB of relevant data and ignores 312 GB of perfectly archived history every single day. It does this because PostgreSQL has no way to know that the history is irrelevant. To PostgreSQL, the table is the table. All 340 GB of it.
This is the problem that table partitioning solves. Not by making queries faster in some abstract sense, but by making PostgreSQL stop reading 340 GB when it only needs 28 GB. By giving autovacuum twelve manageable tables instead of one unwieldy monolith. By letting you drop an entire month of data in 12 milliseconds instead of running a DELETE that generates 41 GB of WAL and creates 58 million dead tuples.
Django's ORM has no built-in support for partitioning. It does not know that partitioned tables exist. It has never heard of them. The django-postgres-extra library fills that gap — not perfectly, but well enough to run partitioned tables in production with full ORM compatibility. This guide covers how to set it up, what to watch out for, where it genuinely hurts, and when the complexity is actually worth the trouble.
If you will permit me, I should like to be thorough. There are a number of sharp edges in this territory, and I would prefer you encounter them here rather than in production at three in the morning.
When does partitioning actually pay off?
Partitioning is not free. It adds schema complexity, requires the partition key in every unique constraint, complicates migrations, can make some queries slower if the planner cannot prune partitions, and introduces a new operational concern: ensuring future partitions exist before data arrives. Before reaching for it, be honest about whether your table needs it.
I have prepared a reference.
| Table size | Row count | Verdict | Reasoning |
|---|---|---|---|
| < 10 GB | < 10M | Do not partition | Indexes handle this range efficiently. Partitioning adds complexity for no measurable benefit. |
| 10-50 GB | 10M-100M | Unlikely | Proper indexing and vacuum tuning solve most problems at this scale. Consider partitioning only if you need instant data retention (DROP vs DELETE). |
| 50-100 GB | 100M-500M | Consider it | If queries naturally filter by a time or category column and VACUUM is struggling, partitioning helps. If not, improve indexes first. |
| 100 GB - 1 TB | 500M-5B | Strongly recommended | VACUUM struggles. Index maintenance dominates I/O. Partition pruning dramatically reduces scan times. Maintenance becomes manageable. |
| > 1 TB | > 5B | Essential | Without partitioning, autovacuum falls behind, bloat accumulates unchecked, and routine operations become multi-hour affairs. |
The 100 GB threshold is not arbitrary. Below it, PostgreSQL's planner, indexes, and autovacuum handle things capably. A well-indexed 50 GB table with properly tuned autovacuum is fast, maintainable, and operationally simple. Above 100 GB, three problems converge: autovacuum takes too long and bloat accumulates, index scans traverse deeper B-trees with more levels, and maintenance operations (REINDEX, VACUUM FULL, pg_repack) become multi-hour affairs that compete with production traffic for I/O.
There is also a query pattern requirement, and this is the part that most articles understate. Partitioning helps when your queries naturally filter on the partition key. If your events table is partitioned by created_at but most queries filter by user_id without a date range, the planner cannot prune partitions and must open, plan, and scan all of them. That is worse than a single table with a good composite index on (user_id, created_at).
The ideal candidate for partitioning has three characteristics:
- Size: the table exceeds 100 GB or is growing toward it at a predictable rate.
- Access pattern: queries consistently filter on a time column, a region column, or a tenant ID — a column with natural partitioning semantics.
- Lifecycle: old data has different access patterns than new data. Recent data receives writes and frequent reads. Historical data is read-only, queried infrequently, and may be subject to retention policies.
If your table meets all three criteria, partitioning is almost certainly worth the operational investment. If it meets only one or two, proceed with caution and consider whether improved indexing, autovacuum tuning, or archival to a separate table would solve the problem with less complexity.
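For teams that like their heuristics executable, the thresholds above collapse into a few lines. This is a toy sketch: the gigabyte cutoffs come straight from the table, the verdict strings are mine, and the result remains a judgment call, not a decree.

```python
def partitioning_verdict(size_gb: float) -> str:
    """Map table size to the verdicts in the reference table above."""
    if size_gb < 10:
        return "do not partition"
    if size_gb < 50:
        return "unlikely"
    if size_gb < 100:
        return "consider it"
    if size_gb < 1024:  # up to 1 TB
        return "strongly recommended"
    return "essential"

print(partitioning_verdict(340))  # prints: strongly recommended
```

The 340 GB table from our opening scenario lands squarely in "strongly recommended", which is rather the point of the scenario.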
Before we proceed: honest counterpoints
I should be forthcoming about the costs of partitioning, because pretending they do not exist would be a disservice to you and an embarrassment to me. Every strong claim deserves an honest boundary.
| Common claim | The reality |
|---|---|
| Partitioning makes everything faster | Only queries that include the partition key in their WHERE clause benefit. Queries without the partition key scan ALL partitions, which can be slower than a single table with a good index due to planning overhead. |
| More partitions are better | Each partition adds planning overhead. 60 monthly partitions spanning 5 years can add 5-15ms of planning time to every query. Daily partitions on a 3-year table mean 1,095 partitions — the planner may spend more time planning than executing. |
| Partitioning replaces good indexing | Partitioning is a complement to indexing, not a substitute. A partitioned table with poor indexes is still slow. A well-indexed monolithic table is often fast enough. Partitioning shines when the table is too large for indexes and maintenance to scale. |
| You should partition proactively | Partitioning an existing table requires a full data migration with downtime. Adding partitioning to a table that does not need it creates operational complexity for no benefit. Partition when the table reaches the threshold, not before. |
I mention these not to discourage you but to calibrate your expectations. Partitioning is a powerful tool when applied to the right problem. Applied to the wrong problem, it is a source of ongoing operational friction that delivers no measurable benefit. The Waiter's role is to ensure you encounter the right problem before reaching for the tool.
"Partitioning, read replicas, connection pooling, materialized views, proper indexing — there is a long and rewarding list of techniques to exhaust before sharding enters the conversation. Most teams will never reach the end of that list."
— from You Don't Need Redis, Chapter 16: Everything to Try Before You Shard
Setting up django-postgres-extra for partitioning
The library provides three things: a model base class (PostgresPartitionedModel), migration operations for creating partitions, and a partitioning manager for automated partition creation. Here is how the setup comes together.
```bash
# Install django-postgres-extra
pip install django-postgres-extra
```

```python
# settings.py — swap the database backend
DATABASES = {
    "default": {
        "ENGINE": "psqlextra.backend",  # not django.db.backends.postgresql
        "NAME": "myapp",
        "HOST": "localhost",
        "PORT": "5432",
    }
}

INSTALLED_APPS = [
    "psqlextra",
    # ... your apps
]
```

The custom database backend is essential. It extends Django's default PostgreSQL backend with partition-aware DDL generation. Without it, makemigrations produces standard CREATE TABLE statements that omit the PARTITION BY clause entirely. The backend is otherwise identical to Django's — queries, transactions, and connection handling are unchanged.
With the backend configured, define your first partitioned model.
```python
# models.py — a range-partitioned model with django-postgres-extra.
# The table will be partitioned by created_at (monthly ranges).
from django.db import models

from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod


class Event(PostgresPartitionedModel):
    class PartitioningMeta:
        method = PostgresPartitioningMethod.RANGE
        key = ["created_at"]

    id = models.BigAutoField(primary_key=True)
    created_at = models.DateTimeField()
    user_id = models.IntegerField()
    event_type = models.CharField(max_length=64)
    payload = models.JSONField(default=dict)

    class Meta:
        # Indexes are created on each partition automatically.
        indexes = [
            models.Index(fields=["user_id", "created_at"]),
            models.Index(fields=["event_type"]),
        ]
```

The PartitioningMeta inner class tells django-postgres-extra to append PARTITION BY RANGE (created_at) to the CREATE TABLE statement. The model otherwise behaves like any Django model — queries, filters, aggregations, and the admin all work normally. You will not notice the difference until you run the migration.
Three partitioning methods are available, matching PostgreSQL's native support:
- Range — partition by value ranges. Best for time-series data. Monthly or weekly partitions are typical. This is what you want 80% of the time.
- List — partition by exact values. Best for categorical data like region codes or tenant IDs with a known, finite set of values.
- Hash — partition by hash of a column value. Best for even distribution when there is no natural range or list grouping. Multi-tenant SaaS applications are the classic case.
Range partitioning accounts for roughly 80% of real-world usage. If your table grows over time and queries filter by time, range partitioning is almost certainly the right choice. I shall focus on it throughout this guide, with dedicated sections on hash and list partitioning further on.
Now. About that migration. When you run python manage.py migrate, something unfortunate happens.
```sql
-- What PostgreSQL receives from the migration above:
CREATE TABLE event (
    id BIGSERIAL NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL,
    user_id INTEGER NOT NULL,
    event_type VARCHAR(64) NOT NULL,
    payload JSONB NOT NULL DEFAULT '{}',
    PRIMARY KEY (id)
) PARTITION BY RANGE (created_at);

-- And here is where it fails:
-- ERROR: unique constraint on partitioned table must include
--        all partitioning columns
-- DETAIL: PRIMARY KEY constraint on table "event" lacks column "created_at"
```

And here we arrive at the partitioning gotcha that has derailed more first attempts than any other.
The primary key constraint that stops everyone cold
This is where most Django developers' first attempt at partitioning ends. You define your model, run makemigrations, run migrate, and PostgreSQL returns an error that deserves a clear explanation.
```python
# The partition key gotcha: it MUST be part of the primary key.
#
# PostgreSQL enforces this: the partition key must appear in
# every unique constraint, including the primary key.
# The reason is mechanical — uniqueness is enforced per-partition,
# not across the parent table. Without the partition key in the PK,
# two partitions could each contain id = 42.

# This WILL NOT work:
class Event(PostgresPartitionedModel):
    class PartitioningMeta:
        method = PostgresPartitioningMethod.RANGE
        key = ["created_at"]

    id = models.BigAutoField(primary_key=True)  # PK lacks created_at
    created_at = models.DateTimeField()
    # ...

# PostgreSQL error:
# ERROR: unique constraint on partitioned table must include
#        all partitioning columns
# DETAIL: PRIMARY KEY constraint on table "event" lacks column "created_at"
#
# The fix requires a composite primary key.
# But Django's ORM was built around single-column PKs...
```

This is the fundamental tension. PostgreSQL requires that every unique constraint — including the primary key — contains all partition key columns. The reason is mechanical, and worth understanding fully because it explains a great deal about how partitioning works internally.
In a partitioned table, each partition is a physically separate table. When you insert a row, PostgreSQL routes it to the correct partition based on the partition key value. When you enforce uniqueness, PostgreSQL checks the constraint within the destination partition only. It does not check other partitions — doing so would require a cross-partition scan on every insert, which would negate the performance benefits of partitioning entirely.
If the primary key is just id, two different partitions could each contain a row with id = 42. The January partition gets (id=42, created_at='2026-01-15') and the February partition gets (id=42, created_at='2026-02-15'). PostgreSQL checks uniqueness within each partition and finds no conflict in either one. You now have two rows with the same primary key. This is, to put it mildly, not what anyone wants.
The solution is to include the partition key in the primary key: PRIMARY KEY (id, created_at). Now the uniqueness check is meaningful within each partition, because each partition only contains rows for its date range. Two rows with the same id but different created_at values are, by definition, in different partitions and have different composite primary keys.
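The mechanics are easy to see in miniature. Below is a toy simulation in plain Python, with no database involved; the dict-of-dicts and the route function are illustrative stand-ins for PostgreSQL's tuple routing and per-partition unique indexes, nothing more.

```python
from datetime import date

# Each partition is a physically separate container.
# Uniqueness is checked within ONE container only.
partitions = {"event_2026_01": {}, "event_2026_02": {}}

def route(created_at: date) -> str:
    # Stand-in for PostgreSQL's routing on the partition key.
    return f"event_{created_at.year}_{created_at.month:02d}"

def insert(row_id: int, created_at: date) -> None:
    part = partitions[route(created_at)]
    if row_id in part:  # checked in the destination partition ONLY
        raise ValueError(f"duplicate id {row_id} in {route(created_at)}")
    part[row_id] = created_at

insert(42, date(2026, 1, 15))
insert(42, date(2026, 2, 15))  # no error: different partition, so no conflict seen
```

With an id-only primary key, that second insert is silent corruption: two rows, one "unique" id. With the composite key (id, created_at), it is perfectly legitimate, because the two rows have different composite primary keys by construction.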
Django's ORM, however, was designed around single-column integer primary keys. The AutoField and BigAutoField types expect to be the sole primary key. Foreign keys reference a single column. The admin constructs URLs from a single PK value. The .get(pk=42) pattern assumes a single value. Changing the primary key to a composite breaks assumptions throughout the stack.
Before Django 5.2, the options were grim.
```python
# Before Django 5.2 — the raw SQL workaround.
# This creates the table without a Django-managed PK,
# then adds a composite PK with RunSQL.
from django.db import migrations


class Migration(migrations.Migration):

    operations = [
        # After CreateModel for Event, add:
        migrations.RunSQL(
            sql="""
                ALTER TABLE event DROP CONSTRAINT IF EXISTS event_pkey;
                ALTER TABLE event ADD PRIMARY KEY (id, created_at);
            """,
            reverse_sql="""
                ALTER TABLE event DROP CONSTRAINT IF EXISTS event_pkey;
                ALTER TABLE event ADD PRIMARY KEY (id);
            """,
        ),
    ]

# This works at the PostgreSQL level, but Django's ORM does not
# know about the composite PK. Consequences:
#   - Event.objects.get(pk=42) may return wrong results
#   - .save() on an existing instance may INSERT instead of UPDATE
#   - The admin builds URLs from a single PK value
#   - select_related across FKs may produce incorrect joins
#
# It is manageable. It is not pleasant.
```

The raw SQL approach works at the database level but leaves the ORM confused. Event.objects.get(pk=42) generates WHERE id = 42, which can return rows from multiple partitions if the same auto-increment value was somehow reused (unlikely with a sequence, but the ORM does not know that). The .save() method uses WHERE id = 42 to find the existing row for an UPDATE, which may match the wrong row or no row at all if the ORM does not include created_at.
These are not theoretical concerns. I have seen them in production. They manifest as silent data corruption — updates applied to the wrong row, deletes that miss their target, admin pages that display one record but edit another. The ORM is not broken; it is simply working with incomplete information about the primary key.
Django 5.2's CompositePrimaryKey changes the equation
Django 5.2 introduced CompositePrimaryKey, and it is the single most important feature for partitioned tables in Django's history. It gives you a way to tell the ORM: "this table's primary key spans multiple columns, and you must use all of them."
```python
# Django 5.2+ — CompositePrimaryKey. The official solution.
from django.db import models
from django.db.models import CompositePrimaryKey

from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod


class Event(PostgresPartitionedModel):
    class PartitioningMeta:
        method = PostgresPartitioningMethod.RANGE
        key = ["created_at"]

    pk = CompositePrimaryKey("id", "created_at")
    id = models.BigAutoField()
    created_at = models.DateTimeField()
    user_id = models.IntegerField()
    event_type = models.CharField(max_length=64)
    payload = models.JSONField(default=dict)

# PostgreSQL receives:
#   CREATE TABLE event (
#       id BIGSERIAL NOT NULL,
#       created_at TIMESTAMP WITH TIME ZONE NOT NULL,
#       user_id INTEGER NOT NULL,
#       event_type VARCHAR(64) NOT NULL,
#       payload JSONB NOT NULL DEFAULT '{}',
#       PRIMARY KEY (id, created_at)
#   ) PARTITION BY RANGE (created_at);
#
# The PK includes the partition key. PostgreSQL is satisfied.
# Django understands composite lookups. The ORM is satisfied.
# Everyone is satisfied. A rare convergence.
```

With CompositePrimaryKey("id", "created_at"), Django generates a primary key that includes the partition key. PostgreSQL accepts the partitioned table definition. The ORM understands that lookups require both columns, and .save() and .delete() target rows using both. One caveat to plan around: as of Django 5.2, relational fields such as ForeignKey cannot yet reference a model with a composite primary key, so related tables must store both columns explicitly.
The ergonomic difference between the raw SQL workaround and CompositePrimaryKey is not incremental — it is categorical. With the raw SQL approach, you are fighting the ORM at every turn. With CompositePrimaryKey, the ORM is a willing participant.
```python
# Using composite PKs in Django ORM queries:
from django.db import models

from events.models import Event

# Lookup by composite PK — pass a tuple:
event = Event.objects.get(pk=(42, "2026-02-15 08:30:00+00"))

# Or use individual fields:
event = Event.objects.get(id=42, created_at="2026-02-15 08:30:00+00")

# The .pk attribute returns a tuple:
print(event.pk)  # (42, datetime(2026, 2, 15, 8, 30, tzinfo=UTC))

# Filtering still works as expected:
recent = Event.objects.filter(
    created_at__gte="2026-02-01",
    event_type="purchase",
)

# One limitation: as of Django 5.2, relational fields such as
# ForeignKey cannot reference a model with a composite primary key.
# A detail table must carry both columns explicitly:
class EventDetail(models.Model):
    event_id = models.BigIntegerField()
    event_created_at = models.DateTimeField()
    detail_type = models.CharField(max_length=32)
    value = models.TextField()

# Join manually on (event_id, event_created_at) when needed;
# select_related across a true FK is not available here.
```

I should note one subtlety. The pk attribute on a model instance with a composite key returns a tuple, not a scalar. Code that assumes event.pk is an integer — and stores it in a cache key, passes it as a URL parameter, or uses it in a template — will need updating. This is a one-time migration cost, and it is well worth paying.
The recommendation is straightforward: if you are serious about partitioning in Django, upgrade to 5.2 or later. The difference between "possible but painful" and "properly supported" is the difference between a feature that works and a feature that your team is willing to maintain.
Creating and managing partitions
A partitioned table with no partitions is an empty promise — inserts fail because PostgreSQL has nowhere to put the rows. You need to create partitions before data arrives, and keep creating them as time advances. This is the ongoing operational cost of partitioning, and it is the part most tutorials mention in passing and then forget about.
I shall not forget about it. It is, if anything, the most important section of this guide from a production reliability standpoint.
Two approaches are available: explicit partitions in migrations, and automated partition creation with the partitioning manager.
```python
# Creating partitions in a migration — explicit and version-controlled.
from django.db import migrations

from psqlextra.backend.migrations.operations import (
    PostgresAddRangePartition,
)


class Migration(migrations.Migration):

    dependencies = [("events", "0001_initial")]

    operations = [
        PostgresAddRangePartition(
            model_name="Event",
            name="event_2026_01",
            from_values="2026-01-01",
            to_values="2026-02-01",
        ),
        PostgresAddRangePartition(
            model_name="Event",
            name="event_2026_02",
            from_values="2026-02-01",
            to_values="2026-03-01",
        ),
        PostgresAddRangePartition(
            model_name="Event",
            name="event_2026_03",
            from_values="2026-03-01",
            to_values="2026-04-01",
        ),
    ]
```

The migration approach is explicit and version-controlled. You know exactly which partitions exist by reading the migration files. Rolling back a migration removes the partitions. CI/CD environments get the same partitions as production. This is clean, predictable, and entirely manual.
The downside is that someone must remember to create next month's partition before the month starts. If they forget, inserts fail.
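The bookkeeping a partition-creation job performs is small enough to sketch in a few lines. The helper below is illustrative; the naming convention follows the event_YYYY_MM pattern used in the migration above.

```python
from datetime import date

def month_partitions(start: date, months_ahead: int):
    """Return (name, from_date, to_date) for each monthly partition."""
    out = []
    y, m = start.year, start.month
    for _ in range(months_ahead):
        # Compute the first day of the following month (the upper bound).
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        out.append((f"event_{y}_{m:02d}", date(y, m, 1), date(ny, nm, 1)))
        y, m = ny, nm
    return out

for name, frm, to in month_partitions(date(2026, 1, 1), 3):
    print(name, frm, to)
# prints the bounds matching the migration above:
#   event_2026_01 2026-01-01 2026-02-01
#   event_2026_02 2026-02-01 2026-03-01
#   event_2026_03 2026-03-01 2026-04-01
```

Note that the ranges abut exactly: PostgreSQL range partition bounds are inclusive at FROM and exclusive at TO, so 2026-02-01 belongs to February, never to both.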
```sql
-- What happens when you insert into a date range with no partition:
INSERT INTO event (created_at, user_id, event_type, payload)
VALUES ('2026-04-01 00:00:01', 42, 'purchase', '{}');

-- ERROR: no partition of relation "event" found for row
-- DETAIL: Partition key of the failing row contains
--         (created_at) = (2026-04-01 00:00:01+00).
--
-- This is a production incident. The insert fails. The row is lost.
-- If your partition creation cron misses a beat and April arrives
-- without an April partition, every insert for the new month fails.
-- The 3-month buffer in the partitioning manager exists for this reason.
```

I cannot overstate how unpleasant this is in production. The error is not a warning. The insert fails. The row is lost. If your application does not have retry logic, the data is gone. If your partition creation cron job misses a beat and April arrives without an April partition, every insert with a created_at in April fails until someone creates the partition.
The programmatic approach mitigates this risk.
```python
# Automated partition creation with the partitioning manager.
# Run this from a management command, a Celery beat task, or
# your deployment pipeline.
from psqlextra.partitioning import (
    PostgresPartitioningManager,
    PostgresCurrentTimePartitioningStrategy,
    PostgresTimePartitionSize,
)
from psqlextra.partitioning.config import PostgresPartitioningConfig

from events.models import Event

manager = PostgresPartitioningManager()

config = PostgresPartitioningConfig(
    model=Event,
    strategy=PostgresCurrentTimePartitioningStrategy(
        size=PostgresTimePartitionSize(months=1),
        count=3,       # Create 3 months ahead of current date
        max_age=None,  # Don't auto-drop old partitions
    ),
)

# Plan shows what would be created; apply executes it:
plan = manager.plan(config)
for partition in plan.creations:
    print(f"Will create: {partition.name}")

plan.apply()  # Creates any missing partitions
```

The count=3 parameter means "always have 3 months of partitions ahead of the current date." Run this manager daily or weekly — from a cron job, a Celery beat task, or your deployment pipeline — and it creates partitions as needed. Miss a run? You have a 3-month buffer before anything breaks. Go three months without a successful run? Now you have a problem, but at least you had 90 days of warning.
For production deployments, I recommend wrapping this in a management command.
```python
# management/commands/ensure_partitions.py
# A management command you can run from cron or your CI/CD pipeline.
from django.core.management.base import BaseCommand

from psqlextra.partitioning import (
    PostgresPartitioningManager,
    PostgresCurrentTimePartitioningStrategy,
    PostgresTimePartitionSize,
)
from psqlextra.partitioning.config import PostgresPartitioningConfig

from events.models import Event


class Command(BaseCommand):
    help = "Create missing partitions for partitioned models"

    def handle(self, *args, **options):
        manager = PostgresPartitioningManager()
        configs = [
            PostgresPartitioningConfig(
                model=Event,
                strategy=PostgresCurrentTimePartitioningStrategy(
                    size=PostgresTimePartitionSize(months=1),
                    count=3,
                    max_age=None,
                ),
            ),
            # Add more partitioned models here as needed.
        ]
        for config in configs:
            plan = manager.plan(config)
            if plan.creations:
                for p in plan.creations:
                    self.stdout.write(f"Creating partition: {p.name}")
                plan.apply()
            else:
                self.stdout.write(
                    f"All partitions current for {config.model.__name__}"
                )
```

Run it from your CI/CD pipeline on every deploy, and from a daily cron job as a safety net. Belt and suspenders. The household does not rely on a single mechanism for something this important.
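For the cron half of the arrangement, an illustrative crontab entry follows. The paths and schedule are placeholders for your own deployment, not prescriptions.

```shell
# Run the safety-net partition check daily at 03:00.
# Paths are illustrative: point them at your own virtualenv and project.
0 3 * * * /srv/myapp/venv/bin/python /srv/myapp/manage.py ensure_partitions >> /var/log/myapp/ensure_partitions.log 2>&1
```

Redirecting output to a log file matters here: a silently failing cron job defeats the entire purpose of a safety net, so make sure something alerts when this log goes quiet.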
Verify your partitions exist and are healthy.
```sql
-- Verify your partitions exist and are correctly bounded:
SELECT
    parent.relname AS parent_table,
    child.relname AS partition_name,
    pg_get_expr(child.relpartbound, child.oid) AS partition_bounds,
    pg_size_pretty(pg_relation_size(child.oid)) AS size,
    pg_stat_get_live_tuples(child.oid) AS live_rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'event'
ORDER BY child.relname;

-- parent_table | partition_name | partition_bounds                                 | size   | live_rows
-- -------------+----------------+--------------------------------------------------+--------+----------
-- event        | event_2026_01  | FOR VALUES FROM ('2026-01-01') TO ('2026-02-01') | 847 MB | 12400000
-- event        | event_2026_02  | FOR VALUES FROM ('2026-02-01') TO ('2026-03-01') | 791 MB | 11800000
-- event        | event_2026_03  | FOR VALUES FROM ('2026-03-01') TO ('2026-04-01') | 214 MB |  3100000
```

Check this regularly. A missing partition is a production incident waiting to happen. A partition with zero rows and a date range in the past is wasted schema that can be safely dropped if you do not need the data. A partition whose size is dramatically different from its siblings may indicate uneven data distribution — which is worth understanding even if it is not a problem.
Partition pruning: where the performance actually comes from
Partitioning does not make individual queries faster. It makes the planner smarter about which data to skip entirely. This is called partition pruning, and it is the entire point of the exercise. If you take one thing from this guide, let it be this: partitioning without pruning is overhead without benefit.
Allow me to demonstrate with EXPLAIN ANALYZE.
```sql
-- Partition pruning in action: EXPLAIN ANALYZE on a partitioned table.
-- PostgreSQL only scans partitions matching the WHERE clause.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM event
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01'
  AND user_id = 42;

-- With partitioning + index on (user_id, created_at):
--
-- Append (cost=0.43..52.16 rows=12 width=196)
--        (actual time=0.031..0.044 rows=12 loops=1)
--   Subplans Removed: 2
--   -> Index Scan using event_2026_02_user_id_created_at_idx
--      on event_2026_02 (cost=0.43..52.16 rows=12 width=196)
--      (actual time=0.028..0.038 rows=12 loops=1)
--        Index Cond: ((user_id = 42) AND (created_at >= '2026-02-01')
--                     AND (created_at < '2026-03-01'))
--        Buffers: shared hit=4
-- Planning Time: 0.892 ms
-- Execution Time: 0.068 ms
--
-- "Subplans Removed: 2" — two partitions eliminated before execution.
-- 4 buffer hits. 0.068 ms. The planner did not touch January or March.
```

The key line is Subplans Removed: 2. The planner examined the WHERE clause, determined that only the February partition could contain matching rows, and eliminated January and March from consideration. It did not open them. It did not read their indexes. It did not touch them at all. Zero I/O against the roughly 1 GB of data sitting in the other two partitions.
The result: 4 buffer hits, 0.068 ms. For a table that, without partitioning, would require either a 34-second sequential scan or careful index management on a 340 GB monolith.
Now observe what happens without the partition key in the WHERE clause.
```sql
-- Same query WITHOUT the partition key in WHERE:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM event
WHERE user_id = 42;

-- Append (cost=0.43..156.48 rows=36 width=196)
--        (actual time=0.034..0.127 rows=36 loops=1)
--   -> Index Scan using event_2026_01_user_id_created_at_idx
--      on event_2026_01 (cost=0.43..52.16 rows=12 width=196)
--      (actual time=0.031..0.040 rows=12 loops=1)
--        Index Cond: (user_id = 42)
--        Buffers: shared hit=4
--   -> Index Scan using event_2026_02_user_id_created_at_idx
--      on event_2026_02 (cost=0.43..52.16 rows=12 width=196)
--      (actual time=0.029..0.038 rows=12 loops=1)
--        Index Cond: (user_id = 42)
--        Buffers: shared hit=4
--   -> Index Scan using event_2026_03_user_id_created_at_idx
--      on event_2026_03 (cost=0.43..52.16 rows=12 width=196)
--      (actual time=0.027..0.035 rows=12 loops=1)
--        Index Cond: (user_id = 42)
--        Buffers: shared hit=4
-- Planning Time: 1.204 ms
-- Execution Time: 0.162 ms
```

No partitions pruned. All three scanned. With 3 partitions, the overhead is minimal — 0.162 ms is still fast, and the planning time difference is negligible. But with 60 monthly partitions spanning 5 years, the planner must open and plan across every partition. Planning time alone can reach 5-15 ms, and the executor must merge results from 60 index scans.
For comparison, here is the same query on the unpartitioned monolithic table.
```sql
-- Baseline: the same query WITHOUT partitioning (one monolithic table).
-- 27.3 million rows. No partition pruning available.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM event_monolithic
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01'
  AND user_id = 42;

-- CASE 1: No index on (user_id, created_at) — sequential scan:
--
-- Seq Scan on event_monolithic (cost=0.00..4521890.00 rows=12 width=196)
--                              (actual time=34521.109..34521.122 rows=12 loops=1)
--   Filter: ((created_at >= '2026-02-01') AND (created_at < '2026-03-01')
--            AND (user_id = 42))
--   Rows Removed by Filter: 27299988
--   Buffers: shared hit=1024 read=2894102
-- Planning Time: 0.412 ms
-- Execution Time: 34521.198 ms
--
-- 34.5 seconds. Read 2.9 million pages. Examined 27.3 million rows
-- to find 12 matches. This is what an unindexed monolith feels like.

-- CASE 2: With a proper index on (user_id, created_at):
--
-- Index Scan using event_monolithic_user_created_idx on event_monolithic
--   (cost=0.56..58.32 rows=12 width=196)
--   (actual time=0.041..0.054 rows=12 loops=1)
--   Index Cond: ((user_id = 42) AND (created_at >= '2026-02-01')
--                AND (created_at < '2026-03-01'))
--   Buffers: shared hit=5
-- Planning Time: 0.318 ms
-- Execution Time: 0.082 ms
--
-- The index makes the query fast on any size table.
-- But at the scale of our 340 GB scenario, the index itself becomes
-- the problem — REINDEX takes hours, bloated indexes slow every
-- INSERT, and vacuum cannot reclaim dead index entries efficiently.
```

Two observations from this comparison. First, the indexed unpartitioned query (0.082 ms) is almost as fast as the partitioned query (0.068 ms) for individual lookups. Partitioning does not dramatically improve single-query performance when good indexes exist. Its advantages are elsewhere: vacuum performance, maintenance operations, data retention, and the ability to manage storage per-partition.
Second, the unindexed sequential scan (34.5 seconds) is the scenario partitioning prevents most dramatically. On a partitioned table, even without the optimal index, a query with the partition key only scans one partition — 28 GB instead of 340 GB. The sequential scan drops from 34 seconds to roughly 3 seconds. Not fast, but survivable while you add the missing index.
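The "roughly 3 seconds" figure is simple proportionality, using this guide's illustrative numbers rather than fresh measurements: sequential-scan time scales roughly linearly with bytes read.

```python
# Back-of-envelope: sequential-scan time scales roughly with bytes read.
# These figures are the guide's illustrative numbers, not benchmarks.
full_scan_seconds = 34.5   # full scan of the 340 GB monolith
table_gb = 340
partition_gb = 28          # one month's partition of hot data

estimated = full_scan_seconds * partition_gb / table_gb
print(round(estimated, 1))  # prints 2.8
```

An approximation, of course: per-query overhead and cache effects do not scale linearly. But it is the right order of magnitude, and order of magnitude is what matters when deciding whether a degraded state is survivable.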
Static vs. dynamic partition pruning
Partition pruning happens in two phases, and understanding the distinction matters for Django applications that use parameterized queries.
-- Verify partition pruning is enabled (it is by default since PG 11):
SHOW enable_partition_pruning;
-- enable_partition_pruning
-- ------------------------
-- on
-- Dynamic pruning with parameterized queries (prepared statements):
PREPARE user_events(int, timestamptz, timestamptz) AS
SELECT * FROM event
WHERE user_id = $1 AND created_at >= $2 AND created_at < $3;
EXPLAIN (ANALYZE) EXECUTE user_events(42, '2026-02-01', '2026-03-01');
-- Dynamic pruning eliminates partitions at execution time.
-- The planner sees the parameter placeholders and plans for all
-- partitions, but the executor prunes at runtime once it knows
-- the actual parameter values.
Static pruning happens during query planning when the partition key values are literal constants. WHERE created_at >= '2026-02-01' is a literal — the planner knows at plan time which partitions to exclude.
Dynamic pruning happens during query execution when the values come from parameters, subqueries, or function calls. Django's ORM uses parameterized queries by default (through psycopg2 or psycopg3), which means most Django queries benefit from dynamic pruning. The planner creates a plan that includes all partitions, but the executor eliminates irrelevant partitions at runtime once it knows the actual parameter values.
Dynamic pruning is slightly less efficient than static pruning — the planner does more work — but the difference is typically under 1 ms and is overwhelmingly preferable to no pruning at all.
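The mechanics are easy to model in a few lines. The sketch below is a conceptual illustration in plain Python, not PostgreSQL's implementation: each partition covers a half-open range, and a range predicate keeps only the partitions it overlaps. Whether the bounds arrive as literals (static pruning) or as parameter values at execution time (dynamic pruning), the surviving set of partitions is the same.

```python
# Conceptual model of range partition pruning (illustrative only).
# Each partition covers a half-open interval [lower, upper).
from datetime import datetime

PARTITIONS = {
    "event_2026_01": (datetime(2026, 1, 1), datetime(2026, 2, 1)),
    "event_2026_02": (datetime(2026, 2, 1), datetime(2026, 3, 1)),
    "event_2026_03": (datetime(2026, 3, 1), datetime(2026, 4, 1)),
}

def prune(gte: datetime, lt: datetime) -> list:
    """Return the partitions a [gte, lt) predicate cannot exclude."""
    return [
        name
        for name, (lower, upper) in PARTITIONS.items()
        if lower < lt and upper > gte  # interval overlap test
    ]

# February's predicate survives only in the February partition:
february = prune(datetime(2026, 2, 1), datetime(2026, 3, 1))
```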
Teaching your Django code to include the partition key
This is a discipline issue, not a technical one. Code reviews should flag queries against partitioned tables that omit the partition key.
# Django ORM queries — partition-aware patterns.
from django.utils import timezone
from datetime import timedelta
# GOOD: always include the partition key in filters.
# Pruning works. Only the relevant partition is scanned.
events = Event.objects.filter(
user_id=42,
created_at__gte=timezone.now() - timedelta(days=30),
)
# GOOD: range filter for analytics queries.
monthly_summary = (
Event.objects
.filter(
created_at__gte="2026-02-01",
created_at__lt="2026-03-01",
)
.values("event_type")
.annotate(count=models.Count("id"))
.order_by("-count")
)
# BAD: no partition key. Scans all partitions.
# This works, but defeats the purpose of partitioning.
all_user_events = Event.objects.filter(user_id=42)
# BAD: ordering by id across all partitions.
# Each partition has its own sequence — id ordering is NOT
# global insertion order across partitions.
wrong_order = Event.objects.all().order_by("id")[:100]
# GOOD: order by the partition key instead.
correct_order = Event.objects.all().order_by("-created_at")[:100]
# Aggregations with partition pruning:
from django.db.models import Avg, Count, Max
stats = (
Event.objects
.filter(created_at__gte="2026-02-01", created_at__lt="2026-03-01")
.aggregate(
total=Count("id"),
latest=Max("created_at"),
)
)
# Only event_2026_02 is scanned. The aggregate runs on ~12M rows
# instead of 27M. With proper indexes, this is sub-second. The pattern is straightforward: every queryset against a partitioned table should include a filter on the partition key. For time-partitioned tables, this typically means including created_at__gte and/or created_at__lt. Without these filters, PostgreSQL scans every partition.
For teams that want to enforce this programmatically, a custom manager can issue warnings during development.
# A custom manager that enforces partition key filtering.
# This is optional but prevents accidental full-table scans.
from django.db import models
class PartitionAwareManager(models.Manager):
"""Raises a warning if queries omit the partition key."""
def get_queryset(self):
return PartitionAwareQuerySet(self.model, using=self._db)
class PartitionAwareQuerySet(models.QuerySet):
def _check_partition_key(self):
import warnings
# Walk the where tree to check for created_at conditions.
# This is a development aid, not a production enforcement.
if hasattr(self.query, "where") and not self._has_partition_filter():
warnings.warn(
f"Query on {self.model.__name__} lacks partition key "
f"filter (created_at). This will scan all partitions.",
stacklevel=3,
)
return self
def _has_partition_filter(self):
# Check only the WHERE clause. The full SQL from sql_with_params()
# always mentions created_at in the SELECT list, so searching it
# would make this check pass for every query.
compiler = self.query.get_compiler(using=self.db)
where_sql, _ = self.query.where.as_sql(compiler, compiler.connection)
return "created_at" in where_sql
def _fetch_all(self):
self._check_partition_key()
super()._fetch_all()
class Event(PostgresPartitionedModel):
# ... fields as before ...
objects = PartitionAwareManager()
I should note that this manager is a development aid, not a production enforcement mechanism. It checks for the presence of the partition key in the query at the Python level, which is imperfect — subqueries, annotations, and complex Q objects may include the partition key in ways the simple check does not detect. Use it as a linter-like warning, not as a gate.
Partitioning an existing table (the hard part)
If your table already has 800 million rows, you cannot simply add PARTITION BY RANGE (created_at) to it. PostgreSQL does not support converting an existing table to a partitioned table in place. This is perhaps the single most requested PostgreSQL feature that does not exist. It is tracked in django-postgres-extra issue #221, and there is no shortcut.
The process requires a full data migration. I shall walk through both the standard approach and the logical replication approach, because the right choice depends on your downtime tolerance.
# The hard part: partitioning an existing table.
# PostgreSQL does NOT support ALTER TABLE ... PARTITION BY.
# You cannot retroactively partition a table with data in it.
# The process requires a full data migration:
# Step 1: Create the new partitioned table with a temporary name.
# Step 2: Create all needed partitions.
# Step 3: Copy data in batches (avoid long transactions).
Approach 1: batch copy with a maintenance window
This is the straightforward approach. Create the new partitioned table, copy data in batches, then swap table names during a maintenance window.
-- Batch data migration from monolithic to partitioned table.
-- Run this in a loop, advancing :last_copied_id each iteration.
INSERT INTO event_partitioned (id, created_at, user_id, event_type, payload)
SELECT id, created_at, user_id, event_type, payload
FROM event_old
WHERE id > :last_copied_id
ORDER BY id
LIMIT 100000;
-- Monitor progress:
SELECT
(SELECT count(*) FROM event_partitioned) AS copied,
(SELECT count(*) FROM event_old) AS total,
round(
100.0 * (SELECT count(*) FROM event_partitioned)
/ (SELECT count(*) FROM event_old), 1
) AS pct_complete;
The batch size matters for the same reason it matters during any bulk operation: large transactions generate large amounts of WAL, hold locks longer, and compete with autovacuum. 100,000 rows per batch is a reasonable starting point. On a table with wide rows (JSONB payloads, text columns), reduce to 50,000. On a narrow table with small rows, you can safely increase to 500,000.
Monitor three things during the copy: replication lag (if you have replicas), WAL generation rate, and I/O utilization. If replication lag climbs, pause the copy and let replicas catch up. If I/O utilization hits 90%, add a brief sleep between batches.
On a table with 800 million rows and 340 GB of data, expect the copy to take 4-8 hours depending on hardware, batch size, and concurrent load. Plan accordingly.
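The batching loop itself is usually driven from a script. The sketch below shows one possible shape of that driver, with the database calls injected as callables — run_batch and replica_lag_bytes are hypothetical names you would implement against your own connection library, not part of any package:

```python
import time

BATCH_SIZE = 100_000
MAX_LAG_BYTES = 64 * 1024 * 1024  # pause when replicas fall 64 MB behind

def copy_in_batches(run_batch, replica_lag_bytes, total_rows):
    """Drive the INSERT ... SELECT loop, throttling on replica lag.

    run_batch(last_id, limit) -> (rows_copied, new_last_id)
        executes one INSERT ... WHERE id > last_id ... LIMIT limit.
    replica_lag_bytes() -> current replication lag in bytes.
    """
    last_id, copied = 0, 0
    while copied < total_rows:
        while replica_lag_bytes() > MAX_LAG_BYTES:
            time.sleep(5)  # let replicas catch up before continuing
        rows, last_id = run_batch(last_id, BATCH_SIZE)
        if rows == 0:
            break  # nothing left to copy
        copied += rows
    return copied
```

The same skeleton accommodates the other throttles mentioned above: an I/O-utilization check slots in next to the lag check, and a fixed sleep between batches is a one-line addition.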
-- The table swap: brief exclusive access required.
-- Schedule this during a maintenance window.
BEGIN;
-- Prevent new writes during the swap
LOCK TABLE event_old IN ACCESS EXCLUSIVE MODE;
LOCK TABLE event_partitioned IN ACCESS EXCLUSIVE MODE;
-- Copy any rows written since the last batch
INSERT INTO event_partitioned (id, created_at, user_id, event_type, payload)
SELECT id, created_at, user_id, event_type, payload
FROM event_old
WHERE id > (SELECT max(id) FROM event_partitioned);
-- Rename tables. Sequences do NOT follow table renames; the names
-- below assume the default <table>_id_seq naming pattern.
ALTER TABLE event_old RENAME TO event_archive;
ALTER SEQUENCE event_id_seq RENAME TO event_archive_id_seq;
ALTER TABLE event_partitioned RENAME TO event;
ALTER SEQUENCE event_partitioned_id_seq RENAME TO event_id_seq;
-- Advance the surviving sequence past the highest copied id
-- (setval marks the value as used, so nextval returns max + 1)
SELECT setval('event_id_seq', (SELECT max(id) FROM event));
COMMIT;
-- Verify:
SELECT count(*) FROM event;
SELECT count(*) FROM event_archive;
-- Row counts should match.
-- After confidence period, drop the archive:
-- DROP TABLE event_archive;
The table swap is the moment of truth. The exclusive lock prevents any reads or writes to either table during the swap. On a well-prepared system, this takes under 5 seconds. The lock acquisition itself may take longer if there are long-running queries against the old table — ensure those are killed or completed before taking the lock.
Approach 2: logical replication for near-zero downtime
If your application cannot tolerate a maintenance window longer than a few seconds, logical replication offers an alternative. The new partitioned table is set up as a subscriber to the old table's publication. Changes stream continuously as they happen.
-- Alternative: logical replication for near-zero-downtime migration.
-- This approach streams changes continuously, minimizing the
-- maintenance window to seconds instead of minutes.
-- 1. Create a publication on the old table:
CREATE PUBLICATION event_migration FOR TABLE event_old;
-- 2. Create a subscription on the partitioned table:
CREATE SUBSCRIPTION event_migration_sub
CONNECTION 'host=localhost dbname=myapp'
PUBLICATION event_migration
WITH (copy_data = true); -- initial data copy included
-- 3. Monitor replication lag:
SELECT
slot_name,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
) AS replication_lag
FROM pg_replication_slots
WHERE slot_name = 'event_migration_sub';
-- 4. When lag reaches zero, perform a fast cutover:
-- - Pause writes (brief)
-- - Wait for lag = 0
-- - Rename tables
-- - Resume writes
-- Total downtime: typically under 5 seconds.
-- Caveat: logical replication requires wal_level = logical
-- and sufficient max_replication_slots. Plan ahead.
The logical replication approach reduces the final cutover window to seconds rather than minutes. The tradeoff is complexity: you need wal_level = logical (which may require a server restart if it is not already set), sufficient replication slots, and careful monitoring of replication lag throughout the process. Two further caveats. The subscriber applies changes to a table with the same schema-qualified name as the published one, so this recipe really wants a second database, or the partitioned table created under the original name on the subscriber side. And a CREATE SUBSCRIPTION that connects back to its own database will hang unless you create the replication slot manually and subscribe WITH (create_slot = false).
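The cutover steps in the SQL comments above are worth scripting rather than typing live. This is a sketch of the sequencing only — pause_writes, get_lag_bytes, rename_tables, and resume_writes are hypothetical callables standing in for your own implementations (a feature flag flip, a query against pg_replication_slots, the ALTER TABLE ... RENAME statements):

```python
import time

def cutover(pause_writes, get_lag_bytes, rename_tables, resume_writes,
            timeout_s=30.0, poll_s=0.2):
    """Pause writes, wait for replication lag to hit zero, swap, resume.

    If lag never reaches zero within timeout_s, abort — and resume
    writes either way, so a failed cutover never strands the app.
    """
    pause_writes()
    deadline = time.monotonic() + timeout_s
    try:
        while get_lag_bytes() > 0:
            if time.monotonic() > deadline:
                raise TimeoutError("replication lag did not reach zero")
            time.sleep(poll_s)
        rename_tables()
    finally:
        resume_writes()  # never leave writes paused on failure
```

The finally block is the point of the exercise: the one unacceptable outcome of a failed cutover is an application that stays write-frozen.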
Some teams use pg_partman for the data migration phase, as its partition_data_proc() function handles batched copying with progress tracking. Others combine pg_partman with logical replication for the smoothest cutover.
Foreign keys complicate the swap
If other tables have foreign keys referencing your table, the swap becomes more complex. Foreign key constraints reference a specific table OID, not a table name. When you rename tables, the constraints still point to the old table (now named event_archive). You must drop and recreate them.
In the swap transaction:
- Drop all foreign keys referencing the old table.
- Rename old table to archive, new table to the original name.
- Recreate the foreign keys pointing to the new table.
This is operationally messy but mechanically straightforward. The key is to script it completely before the maintenance window. No improvisation during a production migration.
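Templating that DDL up front is one way to script it completely. A sketch — the constraint tuples here are illustrative, and in practice you would pull them from pg_constraint using pg_get_constraintdef() before the window:

```python
def fk_swap_statements(constraints):
    """Generate the DROP / ADD CONSTRAINT pairs for the table swap.

    constraints: list of (child_table, constraint_name, definition)
    tuples, where definition is pg_get_constraintdef() output.
    Drops run before the renames, adds after, same transaction.
    """
    drops, adds = [], []
    for child, name, definition in constraints:
        drops.append(f"ALTER TABLE {child} DROP CONSTRAINT {name};")
        adds.append(f"ALTER TABLE {child} ADD CONSTRAINT {name} {definition};")
    return drops, adds

drops, adds = fk_swap_statements([
    ("event_detail", "event_detail_event_id_fkey",
     "FOREIGN KEY (event_id) REFERENCES event(id)"),
])
```

Generating both halves from the same source list guarantees that every dropped constraint has a matching recreate statement — the failure mode to avoid is a constraint dropped in the window and forgotten afterward.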
Hash and list partitioning: the other two methods
Range partitioning gets the most attention because time-series data is the most common use case. But PostgreSQL supports two other partitioning methods, and django-postgres-extra exposes both. They solve different problems, and understanding when to use each is worth your time.
Hash partitioning for tenant isolation
# Hash partitioning — for even data distribution.
# Useful for tenant isolation when queries filter by tenant_id
# but not by time range.
from django.db import models
from django.db.models import CompositePrimaryKey
from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod
class TenantData(PostgresPartitionedModel):
class PartitioningMeta:
method = PostgresPartitioningMethod.HASH
key = ["tenant_id"]
pk = CompositePrimaryKey("id", "tenant_id")
id = models.BigAutoField()
tenant_id = models.IntegerField()
data = models.JSONField()
created_at = models.DateTimeField(auto_now_add=True)
# Hash partitions are created with modulus and remainder:
# CREATE TABLE tenant_data_p0 PARTITION OF tenant_data
# FOR VALUES WITH (MODULUS 8, REMAINDER 0);
# CREATE TABLE tenant_data_p1 PARTITION OF tenant_data
# FOR VALUES WITH (MODULUS 8, REMAINDER 1);
# ... through REMAINDER 7.
#
# WHERE tenant_id = 42 scans exactly one partition.
# WHERE tenant_id IS NULL scans all 8.
Hash partitioning distributes rows evenly across a fixed number of partitions based on the hash of the partition key. It is most useful when you want to limit the size of any single partition — for vacuum performance or tablespace management — but have no natural range to partition on.
Multi-tenant SaaS applications are the classic example. Partition by tenant ID so that each tenant's data is physically co-located within a partition. Queries filtered by tenant_id hit exactly one partition. The data distribution is even (assuming a good hash function, which PostgreSQL provides), so each partition is roughly the same size.
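The even-distribution property is easy to demonstrate. PostgreSQL routes rows with its own hash operators (hashint4 and friends), not anything Python exposes, so the sketch below uses md5 purely as a stand-in with comparable uniformity:

```python
import hashlib

MODULUS = 8

def partition_for(tenant_id: int) -> int:
    """Stand-in for hash routing: hash(key) % modulus.

    Illustrative only — PostgreSQL's actual hash functions differ,
    so this will not predict real partition assignments.
    """
    digest = hashlib.md5(str(tenant_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % MODULUS

# 10,000 tenants spread across 8 partitions, roughly 1,250 each:
counts = [0] * MODULUS
for tenant in range(10_000):
    counts[partition_for(tenant)] += 1
```

Routing is deterministic — the same tenant_id always lands in the same partition — which is what makes per-tenant queries prune to a single partition.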
-- Hash partitioning: EXPLAIN ANALYZE with tenant_id filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenant_data WHERE tenant_id = 42;
-- Append (cost=0.43..48.12 rows=15 width=312)
-- (actual time=0.024..0.039 rows=15 loops=1)
-- Subplans Removed: 7
-- -> Index Scan using tenant_data_p3_tenant_id_idx
-- on tenant_data_p3 (cost=0.43..48.12 rows=15 width=312)
-- (actual time=0.021..0.033 rows=15 loops=1)
-- Index Cond: (tenant_id = 42)
-- Buffers: shared hit=3
-- Planning Time: 0.712 ms
-- Execution Time: 0.058 ms
--
-- "Subplans Removed: 7" — 7 of 8 partitions pruned.
-- hash(42) % 8 = 3, so only partition p3 is scanned.
The tradeoffs with hash partitioning are significant and worth stating clearly:
- You cannot drop a hash partition. Unlike range partitions where you can detach and archive an old month, hash partitions are permanent. Each one contains a pseudorandom subset of your data. Dropping one loses that subset.
- You cannot add partitions without rehashing. Changing from 8 to 16 partitions requires creating a new table with 16 partitions, copying all data, and swapping — essentially a full data migration. Choose your partition count carefully at the start.
- No range queries. WHERE tenant_id BETWEEN 100 AND 200 cannot be pruned because hash values do not preserve ordering. All partitions are scanned.
The rule of thumb for hash partition count: choose a power of 2 that results in partitions between 10-50 GB each when the table is fully grown. If you expect 400 GB of data, 8 partitions (50 GB each) is reasonable. If you expect 2 TB, 32 partitions (62 GB each) is a better starting point.
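That sizing rule is mechanical enough to script. A sketch — the 50 GB default target is the rule of thumb above, not a PostgreSQL limit, and the document's own 2 TB example uses the looser ~64 GB end of the range:

```python
def hash_partition_count(expected_gb: float, target_gb: float = 50.0) -> int:
    """Smallest power of 2 keeping each partition at or under target_gb."""
    count = 1
    while expected_gb / count > target_gb:
        count *= 2
    return count

# 400 GB at ~50 GB per partition -> 8 partitions.
# 2 TB at the looser ~64 GB target -> 32 partitions.
small = hash_partition_count(400)
large = hash_partition_count(2048, target_gb=64.0)
```

Since the count cannot be changed later without a full rehash, err toward the larger power of 2 when growth projections are uncertain.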
List partitioning for categorical data
# List partitioning — for a known, finite set of values.
# Region codes, event categories, order statuses.
from django.db import models
from django.db.models import CompositePrimaryKey
from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod
class Order(PostgresPartitionedModel):
class PartitioningMeta:
method = PostgresPartitioningMethod.LIST
key = ["region"]
pk = CompositePrimaryKey("id", "region")
id = models.BigAutoField()
region = models.CharField(max_length=4)
customer_id = models.IntegerField()
total = models.DecimalField(max_digits=10, decimal_places=2)
placed_at = models.DateTimeField(auto_now_add=True)
# Partitions:
# CREATE TABLE order_us PARTITION OF "order" FOR VALUES IN ('US');
# CREATE TABLE order_eu PARTITION OF "order" FOR VALUES IN ('EU');
# CREATE TABLE order_ap PARTITION OF "order" FOR VALUES IN ('AP');
# CREATE TABLE order_default PARTITION OF "order" DEFAULT;
#
# Each region lives in its own physical table.
# Per-region backups, retention policies, and tablespace placement
# become trivial operations.
List partitioning works when the partition key has a small, known set of values. Region codes, order statuses, event categories — anything where you can enumerate all possible values upfront. Queries filtered by the partition key hit exactly one partition. Adding a new region means creating one new partition.
-- The default partition: a necessary evil with list partitioning.
-- Without a default partition, any insert with an unlisted
-- region value fails:
INSERT INTO "order" (region, customer_id, total)
VALUES ('SA', 101, 49.99);
-- ERROR: no partition of relation "order" found for row
-- DETAIL: Partition key of the failing row contains (region) = (SA).
-- With a default partition, it silently catches unrecognized values:
CREATE TABLE order_default PARTITION OF "order" DEFAULT;
-- The danger: the default partition becomes a junk drawer.
-- Monitor its size and split it when patterns emerge:
SELECT pg_size_pretty(pg_relation_size('order_default')) AS size,
pg_stat_get_live_tuples('order_default'::regclass) AS rows;
-- If it is growing, you have a new region to partition out:
-- 1. Detach the default partition (DETACH ... CONCURRENTLY on PG 14+)
-- 2. Create the new partition
-- 3. Move the matching rows from the detached default into it
-- 4. Re-attach the default partition
--
-- PostgreSQL will NOT move the rows for you. Creating or attaching
-- a partition fails if the default still contains rows that belong
-- to the new partition.
The default partition is a necessary evil. Without it, any insert with an unlisted value fails immediately. With it, you have a catch-all that can grow unbounded if new values appear frequently. The discipline is to monitor the default partition's size and split it when patterns emerge — if "SA" (South America) starts appearing regularly, create a dedicated partition for it and move the matching rows out of the default yourself; PostgreSQL refuses to create the new partition while conflicting rows remain in the default.
One more consideration with list partitioning: the number of distinct values determines the number of partitions. If your partition key has 50 possible values, you have 50 partitions. If it has 3, you have 3. The granularity is fixed by the data, not by your sizing preferences. This makes list partitioning inappropriate for columns with high cardinality — use hash partitioning instead.
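Routing under list partitioning is simple enough to model directly — a conceptual sketch (not PostgreSQL's code) showing why the default partition turns into a junk drawer: everything unenumerated falls through to it.

```python
# Conceptual model of list partition routing. Known values map to
# named partitions; anything else lands in the default partition.
LIST_PARTITIONS = {
    "US": "order_us",
    "EU": "order_eu",
    "AP": "order_ap",
}
DEFAULT_PARTITION = "order_default"

def route(region: str) -> str:
    """Return the partition an order row for this region lands in."""
    return LIST_PARTITIONS.get(region, DEFAULT_PARTITION)
```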
Sub-partitioning: when one level is not enough
PostgreSQL supports multi-level partitioning — partitions that are themselves partitioned. This is a powerful feature that I recommend approaching with considerable caution.
-- Sub-partitioning: partitions within partitions.
-- Range by year at the top level, list by region within each year.
CREATE TABLE orders (
id BIGSERIAL,
region VARCHAR(4) NOT NULL,
placed_at TIMESTAMPTZ NOT NULL,
customer_id INT NOT NULL,
total NUMERIC(10,2),
PRIMARY KEY (id, placed_at, region)
) PARTITION BY RANGE (placed_at);
-- Year-level partitions, each sub-partitioned by region:
CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01')
PARTITION BY LIST (region);
CREATE TABLE orders_2026_us PARTITION OF orders_2026
FOR VALUES IN ('US');
CREATE TABLE orders_2026_eu PARTITION OF orders_2026
FOR VALUES IN ('EU');
CREATE TABLE orders_2026_ap PARTITION OF orders_2026
FOR VALUES IN ('AP');
CREATE TABLE orders_2026_default PARTITION OF orders_2026
DEFAULT;
-- Query with both keys prunes to a single leaf partition:
-- WHERE placed_at >= '2026-03-01' AND region = 'EU'
-- scans only orders_2026_eu.
-- Warning: sub-partitioning multiplies operational complexity.
-- 5 years x 4 regions = 20 leaf partitions.
-- Each needs indexes, vacuum monitoring, and partition creation.
-- Only use this when both dimensions have high selectivity
-- AND queries consistently filter on both.
Sub-partitioning makes sense when queries consistently filter on two dimensions and both have high selectivity. A retail platform that always queries by year and region is the textbook case. The query WHERE placed_at >= '2026-03-01' AND region = 'EU' prunes to a single leaf partition out of potentially dozens.
The operational cost, however, scales multiplicatively. Five years of data partitioned by year and 4 regions produces 20 leaf partitions. Each needs indexes. Each is a separate table that autovacuum must consider. Each must be created in advance. The partition creation management command becomes more complex. Monitoring becomes more complex. Every DBA operation that touches "the orders table" now touches 20 physical tables.
My guidance: use sub-partitioning only when single-level partitioning leaves individual partitions larger than 50-100 GB and queries consistently filter on both dimensions. If your monthly partitions are 30 GB each, sub-partitioning by region produces 10 GB partitions — too small to justify the operational overhead.
Autovacuum, bloat, and partition maintenance
One of the underappreciated benefits of partitioning is how it transforms the autovacuum story. This, I would argue, is where partitioning earns its keep most reliably — not in query performance, which depends on pruning, but in maintenance performance, which is guaranteed.
Consider the autovacuum arithmetic on a single 340 GB table. The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum triggers after 20% of the rows are dead — that is 160 million dead rows on an 800-million-row table. By the time autovacuum runs, bloat has already consumed tens of gigabytes. The vacuum itself takes 4-6 hours. During those hours, the table continues accumulating dead tuples from ongoing writes, and the vacuum worker competes with your production queries for I/O.
With monthly partitions, the active partition is perhaps 30 GB with 12 million rows. Autovacuum triggers after 2.4 million dead rows (at the default 20%). The absolute number is dramatically lower. The vacuum completes in minutes, not hours. Bloat stays controlled because the vacuum cycle is fast enough to keep pace with ongoing writes.
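The trigger arithmetic is worth writing down explicitly. PostgreSQL fires autovacuum on a table when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (defaults 50 and 0.2):

```python
def autovacuum_trigger(reltuples: int,
                       scale_factor: float = 0.2,
                       threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum fires (PostgreSQL defaults:
    autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2)."""
    return int(threshold + scale_factor * reltuples)

monolith = autovacuum_trigger(800_000_000)   # ~160 million dead rows
partition = autovacuum_trigger(12_000_000)   # ~2.4 million dead rows
```

The same formula shows what the per-partition tuning later in this section buys you: dropping the scale factor to 0.01 on a 12-million-row active partition fires vacuum after roughly 125,000 dead rows instead of 2.4 million.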
-- Autovacuum treats each partition as an independent table.
-- A 200 GB parent is hard to vacuum. Twelve 16 GB partitions are easy.
-- Check autovacuum activity per partition:
SELECT relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup
/ GREATEST(n_live_tup + n_dead_tup, 1), 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'event_%'
ORDER BY relname;
-- Output:
-- relname | n_live_tup | n_dead_tup | dead_pct | last_autovacuum | last_autoanalyze
-- ---------------+------------+------------+----------+---------------------+--------------------
-- event_2026_01 | 12400000 | 1240 | 0.0 | 2026-03-14 02:15:00 | 2026-03-14 02:15:12
-- event_2026_02 | 11800000 | 8420 | 0.1 | 2026-03-14 03:01:00 | 2026-03-14 03:01:08
-- event_2026_03 | 3100000 | 142000 | 4.4 | 2026-03-14 04:30:00 | 2026-03-14 04:30:05
Old partitions that receive no writes naturally have zero dead tuples. Autovacuum checks them periodically, finds nothing to do, and moves on. Your vacuum budget — the finite number of autovacuum workers and I/O capacity — is spent on the partitions that need it.
-- Tune autovacuum per-partition based on write patterns.
-- Active partition (current month): aggressive thresholds
ALTER TABLE event_2026_03 SET (
autovacuum_vacuum_scale_factor = 0.01, -- 1% dead triggers vacuum
autovacuum_vacuum_threshold = 5000, -- or 5000 dead rows
autovacuum_analyze_scale_factor = 0.005 -- re-analyze at 0.5%
);
-- Historical partitions (read-only): disable autovacuum entirely
ALTER TABLE event_2026_01 SET (
autovacuum_enabled = false
);
-- Why disable on read-only partitions? They have zero dead tuples.
-- Autovacuum checks them periodically and finds nothing to do.
-- Disabling eliminates the check entirely — one fewer table for
-- the autovacuum launcher to consider, freeing workers for the
-- active partitions that need them.
Per-partition tuning is one of the genuine luxuries of a partitioned schema. You can set aggressive thresholds on the active partition (1% dead rows triggers vacuum, because you want that partition clean at all times) and disable autovacuum entirely on read-only historical partitions (because they will never have dead tuples, and checking them is wasted work). One caveat: autovacuum_enabled = false does not exempt a table from anti-wraparound vacuums; PostgreSQL will still force a vacuum when transaction ID age demands it.
This is not possible with a monolithic table. You get one set of autovacuum parameters for 340 GB of mixed-workload data. Set the thresholds aggressively and autovacuum runs too frequently. Set them conservatively and bloat accumulates on the write-heavy portion. There is no configuration that serves both the active and historical data well.
Data retention becomes a partition operation
-- Data retention: DROP vs DELETE.
-- This is where partitioning earns its keep most dramatically.
-- WITHOUT partitioning — deleting old data:
DELETE FROM event WHERE created_at < '2025-01-01';
-- Time: 847,291 ms (14 minutes)
-- WAL generated: 41 GB
-- Dead tuples created: 58,000,000
-- Subsequent VACUUM time: 23 minutes
-- Total wall time: 37 minutes
-- During which: table bloat, increased I/O, autovacuum contention
-- WITH partitioning — dropping an old partition:
ALTER TABLE event DETACH PARTITION event_2024_12;
DROP TABLE event_2024_12;
-- Time: 12 ms
-- WAL generated: 0
-- Dead tuples created: 0
-- Subsequent VACUUM time: 0
-- Total wall time: 12 ms
-- During which: nothing. The catalog entry was removed. That is all.
-- For compliance or audit requirements, detach without dropping:
ALTER TABLE event DETACH PARTITION event_2024_12;
-- The partition still exists as a standalone table.
-- You can query it, archive it, compress it, or move it to
-- cheaper storage. It simply no longer receives new inserts
-- or appears in queries against the parent table.
The numbers speak for themselves, but the operational difference deserves emphasis. The DELETE approach generates 41 GB of WAL (which must be replicated to standby servers, archived to your backup system, and eventually recycled). It creates 58 million dead tuples (which autovacuum must clean up in a subsequent pass). The total wall time — DELETE plus VACUUM — is 37 minutes, during which the table's I/O characteristics are degraded.
The DROP approach removes a catalog entry. Twelve milliseconds. No WAL. No dead tuples. No vacuum. The disk space is reclaimed immediately. If you have replicas, they process the DROP in milliseconds as well.
For tables with compliance or audit requirements, DETACH without DROP preserves the data as a standalone table that can be queried independently, archived to cheaper storage, or compressed. The detached partition no longer appears in queries against the parent table, but the data is not lost.
Index strategy for partitioned tables
Indexes on partitioned tables deserve their own discussion because they behave differently than indexes on regular tables — in ways that are mostly beneficial, but worth understanding.
-- Index strategy for partitioned tables.
-- Each partition gets its own copy of every index.
-- Global indexes DO NOT EXIST in PostgreSQL partitioning.
-- There is no single B-tree spanning all partitions.
-- Each partition has its own index, with its own statistics.
-- This is actually a feature, not a limitation:
-- 1. Index builds are per-partition.
-- CREATE INDEX CONCURRENTLY on a 28 GB partition takes minutes.
-- On a 340 GB monolithic table, it takes hours.
-- 2. Index bloat is per-partition.
-- Reindexing one partition is fast and low-impact.
-- 3. Dead index entries are per-partition.
-- VACUUM processes each partition's indexes independently.
-- Check index sizes across partitions:
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE tablename LIKE 'event_2026%'
ORDER BY tablename, indexname;
-- Output:
-- schemaname | tablename | indexname | index_size
-- -----------+---------------+-------------------------------------+-----------
-- public | event_2026_01 | event_2026_01_pkey | 264 MB
-- public | event_2026_01 | event_2026_01_user_id_created_at_idx| 312 MB
-- public | event_2026_01 | event_2026_01_event_type_idx | 186 MB
-- public | event_2026_02 | event_2026_02_pkey | 251 MB
-- public | event_2026_02 | event_2026_02_user_id_created_at_idx| 296 MB
-- public | event_2026_02 | event_2026_02_event_type_idx | 178 MB
-- public | event_2026_03 | event_2026_03_pkey | 68 MB
-- public | event_2026_03 | event_2026_03_user_id_created_at_idx| 79 MB
-- public | event_2026_03 | event_2026_03_event_type_idx | 47 MB
PostgreSQL does not support global indexes that span all partitions. When you create an index on a partitioned table, PostgreSQL creates an identical index on each partition independently. This is actually a feature, not a limitation, for three reasons.
First, index builds are per-partition. CREATE INDEX CONCURRENTLY on a 28 GB partition takes minutes. On a 340 GB monolithic table, the same operation takes hours — and during those hours, the CONCURRENTLY build must track all changes to the table, which can cause it to take even longer under write-heavy workloads.
Second, index bloat is per-partition. If the index on your active partition becomes bloated (a common occurrence on tables with high update rates), you can reindex just that partition. REINDEX INDEX CONCURRENTLY event_2026_03_user_id_created_at_idx takes minutes and does not touch the other partitions' indexes.
Third, VACUUM processes each partition's indexes independently. On a monolithic table, VACUUM must scan the entire index to remove dead entries — even if the dead entries are concentrated in a small portion of the table. With partitions, VACUUM scans only the partition's index, which is proportionally smaller.
You can also create indexes that exist on only some partitions — a capability that has no equivalent on monolithic tables.
-- Combine partitioning with partial indexes for surgical precision.
-- Example: on the active partition, create a partial index for
-- unprocessed events only.
CREATE INDEX CONCURRENTLY event_2026_03_unprocessed_idx
ON event_2026_03 (created_at, user_id)
WHERE payload->>'processed' IS NULL;
-- This index is tiny compared to a full index, because it only
-- covers unprocessed rows. As rows are processed and updated,
-- they leave the index automatically.
-- The older partitions do not need this index at all —
-- all their events are already processed.
This pattern — a partial index on only the active partition — is remarkably efficient. The index covers a narrow subset of a single partition. It is tiny. It updates in microseconds. And it is exactly the index that your "find unprocessed events" query needs. Older partitions, where all events are processed, do not carry this index at all.
The general principle: start with the same indexes on every partition (defined in the model's Meta.indexes), then add specialized indexes to individual partitions as query patterns emerge. The model-level indexes are your baseline. The per-partition indexes are your optimization layer.
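One wrinkle worth knowing when adding a baseline index to an already-large partitioned table: CREATE INDEX CONCURRENTLY cannot run against the parent directly. The documented PostgreSQL pattern is to create an invalid parent index with ON ONLY, build each partition's index concurrently, then attach the children. Generating that DDL is safer than typing it per partition — a sketch with illustrative names:

```python
def concurrent_index_ddl(parent, partitions, index_name, columns):
    """Generate the CONCURRENTLY build sequence for a partitioned table.

    Produces: one CREATE INDEX ... ON ONLY parent (left invalid until
    all children attach), then per-partition CONCURRENTLY builds and
    ALTER INDEX ... ATTACH PARTITION statements.
    """
    cols = ", ".join(columns)
    stmts = [f"CREATE INDEX {index_name} ON ONLY {parent} ({cols});"]
    for part in partitions:
        child_idx = f"{part}_{'_'.join(columns)}_idx"
        stmts.append(
            f"CREATE INDEX CONCURRENTLY {child_idx} ON {part} ({cols});"
        )
        stmts.append(
            f"ALTER INDEX {index_name} ATTACH PARTITION {child_idx};"
        )
    return stmts
```

Once every partition's index is attached, the parent index flips to valid, and new partitions created afterward inherit it automatically.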
One caution before you reach for yet another per-partition index: a materialized view that pre-computes your most expensive aggregation, refreshed on a schedule, can eliminate more load than any amount of partition tuning. Consider whether the data your queries need might be better served pre-computed than partitioned.
Foreign keys and the Django ORM
Foreign key support with partitioned tables improved significantly in PostgreSQL 12. If you are on 12 or later — and given that PostgreSQL 11 reached end of life in November 2023, I would be mildly alarmed if you were not — foreign keys referencing a partitioned parent table work correctly.
# Foreign keys and partitioned tables (PostgreSQL 12+).
# PostgreSQL 12+ supports FKs referencing partitioned parent tables.
# PostgreSQL 11 does NOT.
from django.db import models
class EventDetail(models.Model):
# This FK points to the partitioned Event table.
# Works on PostgreSQL 12+ automatically.
event = models.ForeignKey(
"Event",
on_delete=models.CASCADE,
)
detail_type = models.CharField(max_length=32)
value = models.TextField()
class Meta:
indexes = [
models.Index(fields=["event", "detail_type"]),
]
# If you are on PostgreSQL 11:
# 1. Upgrade to PostgreSQL 12+ (recommended).
# 2. Or remove the FK and enforce integrity in app code
# (not recommended — data integrity is not optional).
Django's admin, .select_related(), and .prefetch_related() all work normally with partitioned tables. The ORM queries the parent table name, and PostgreSQL routes to the correct partition transparently. There is no special queryset syntax required — partitioning is invisible at the ORM layer, which is exactly how it should be.
# ForeignKey lookups with composite PKs (Django 5.2+).
# select_related and prefetch_related work as expected.

# Fetch event details with their parent events:
details = (
    EventDetail.objects
    .select_related("event")
    .filter(event__created_at__gte="2026-02-01")
)

# Django generates:
# SELECT event_detail.*, event.*
# FROM event_detail
# INNER JOIN event ON (
#     event_detail.event_id = event.id
#     AND event_detail.event_created_at = event.created_at
# )
# WHERE event.created_at >= '2026-02-01'
#
# The JOIN includes both composite PK columns.
# Partition pruning still works on the event table.
With Django 5.2's CompositePrimaryKey, the generated JOIN includes both columns of the composite key. This is correct and efficient — the JOIN benefits from partition pruning on the event table if the query includes a created_at filter.
Two edge cases to be aware of:
- Ordering by id: .order_by('id') on a partitioned table may not return rows in global insertion order. Each partition has its own portion of the sequence, and rows are physically stored per-partition. If you need globally ordered results, order by the partition key (created_at) or a column with a globally monotonic value (a UUID v7, for instance, embeds a timestamp).
- Cascading deletes: on_delete=models.CASCADE works correctly with partitioned tables. When a parent row is deleted, PostgreSQL deletes the child rows from the correct partition. However, if the child table is large, the cascade may be slow because PostgreSQL must scan the child table's foreign key index to find matching rows. This is the same behavior as with non-partitioned tables, but worth noting because partitioned tables tend to be large.
Monitoring and operational queries
A partitioned table requires ongoing monitoring that a monolithic table does not. The household has more rooms to maintain, and a competent waiter keeps an inventory.
-- Essential monitoring queries for partitioned tables.
-- 1. Partition size distribution (are partitions roughly balanced?):
SELECT
child.relname AS partition,
pg_size_pretty(pg_total_relation_size(child.oid)) AS total_size,
pg_size_pretty(pg_relation_size(child.oid)) AS table_size,
pg_size_pretty(
pg_total_relation_size(child.oid) - pg_relation_size(child.oid)
) AS index_size,
pg_stat_get_live_tuples(child.oid) AS rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'event'
ORDER BY child.relname;
-- 2. Detect queries that are NOT pruning (from pg_stat_statements):
-- Look for queries against the parent table with high total rows
-- but no created_at filter.
SELECT query,
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
rows
FROM pg_stat_statements
WHERE query LIKE '%event%'
AND query NOT LIKE '%created_at%'
AND calls > 10
ORDER BY mean_exec_time DESC
LIMIT 10;
-- 3. Upcoming partition gap detection:
-- Find the maximum boundary across all partitions:
SELECT max(
(regexp_match(
pg_get_expr(child.relpartbound, child.oid),
$$TO \('([^']+)'\)$$
))[1]::date
) AS last_partition_end,
current_date AS today,
max(
(regexp_match(
pg_get_expr(child.relpartbound, child.oid),
$$TO \('([^']+)'\)$$
))[1]::date
) - current_date AS days_remaining
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'event';
-- If days_remaining < 30, create partitions immediately.
The third query — detecting upcoming partition gaps — is the most operationally critical. If your most distant partition boundary is less than 30 days away, you are one missed cron run from a production incident. Set up an alert. This is not optional.
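The same gap check translates naturally into Python, for a cron job or a Django management command that can page someone. This is a sketch: parse_upper_bound() parses the text that pg_get_expr() returns for a range partition bound; fetching those strings from pg_inherits is left to the caller.

```python
import re
from datetime import date

# Matches the exclusive upper bound in expressions like:
#   FOR VALUES FROM ('2026-02-01') TO ('2026-03-01')
BOUND_RE = re.compile(r"TO \('([^']+)'\)")


def parse_upper_bound(partbound: str) -> date:
    """Extract the exclusive upper bound from a partition bound expression."""
    match = BOUND_RE.search(partbound)
    if match is None:
        raise ValueError(f"no upper bound in: {partbound!r}")
    return date.fromisoformat(match.group(1))


def days_remaining(bounds: list[str], today: date) -> int:
    """Days between today and the most distant partition boundary."""
    last_end = max(parse_upper_bound(b) for b in bounds)
    return (last_end - today).days
```

Wire the return value into whatever alerting you already run: warn below 30 days, page below 7.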
For teams using monitoring tools like Datadog, Prometheus, or Grafana, expose these metrics:
- Partition count per table: should increase by 1 each month (for monthly partitions) and never decrease unexpectedly.
- Days until partition exhaustion: should always be above 30. Alert at 30, page at 7.
- Partition size variance: if one partition is 3x larger than its siblings, investigate. Uneven partitions suggest data skew or an unexpected change in write volume.
- Queries without partition pruning: track the count of queries from pg_stat_statements that touch the parent table without including the partition key. These are your candidates for optimization.
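The size-variance check in the list above reduces to a few lines of Python once you have the per-partition sizes from the monitoring query. A sketch, using the 3x-the-median rule of thumb; the partition names and byte counts here are illustrative.

```python
from statistics import median


def skewed_partitions(sizes: dict[str, int], factor: float = 3.0) -> list[str]:
    """Return names of partitions larger than `factor` times the
    median partition size — candidates for a data-skew investigation."""
    if not sizes:
        return []
    threshold = factor * median(sizes.values())
    return sorted(name for name, size in sizes.items() if size > threshold)
```

Feed it the table_size column from monitoring query 1 and emit the result as a gauge or a log line, whichever your alerting pipeline prefers.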
How Gold Lapel works with partitioned tables
Gold Lapel sits between your Django application and PostgreSQL as a transparent proxy. Partitioning does not change that relationship — GL works with partitioned tables without configuration changes and without awareness that partitioning exists.
GL's auto-indexing engine monitors query patterns and creates indexes CONCURRENTLY on the specific partitions that need them. When you add a new monthly partition, GL detects new queries hitting it and creates appropriate indexes — often before you would have gotten around to writing the migration yourself. Indexes are created per-partition, which is how PostgreSQL manages them internally regardless of whether GL is involved.
GL's query rewriting can add missing partition key filters to queries that would otherwise scan all partitions. If your application has legacy queries that filter by user_id without a created_at range, GL can append a time boundary based on observed access patterns — converting a full-scan query into a pruned query without changing your Django code.
The practical effect: partitioning handles the storage and maintenance problem. GL handles the query optimization problem. They work on different axes of the same goal — making your database fast without requiring your Django code to know or care about the underlying physical layout. The guest need not concern themselves with how the household is organized. That is, if you will permit me, the waiter's responsibility.
A brief checklist before you begin
If you have read this far and are ready to partition a Django table, here is what I suggest you confirm before writing the first line of code.
- The table exceeds 100 GB or is growing toward it at a predictable rate. If it is under 50 GB, investigate indexing and autovacuum tuning first.
- Your queries consistently filter on the partition key. Run a representative sample of queries from
pg_stat_statementsand confirm that 80%+ include the candidate partition key in their WHERE clause. - You are on Django 5.2+ for CompositePrimaryKey support. If you are on an earlier version and cannot upgrade, proceed with the raw SQL workaround, but accept the ORM limitations.
- You are on PostgreSQL 12+ for foreign key support on partitioned tables. Ideally 14+ for improvements to default partition management and partition pruning.
- You have a plan for partition creation. Automated, tested, monitored. A cron job or deployment hook with a 3-month buffer. An alert when the buffer drops below 30 days.
- You have a plan for the data migration. If the table already has data, plan the batch copy, estimate the duration, schedule the maintenance window, and script the table swap. Test it on a staging environment first. Twice.
- Your team understands the partition key discipline. Every query against the partitioned table must include the partition key. Add it to your code review checklist.
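For the data-migration item on that checklist, the batch copy is easiest to reason about as a sequence of bounded id ranges, each handled in its own short transaction with a pause between batches so replication and autovacuum can keep up. A hypothetical helper that plans those ranges:

```python
from typing import Iterator


def batch_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[tuple[int, int]]:
    """Yield half-open [start, end) id ranges covering min_id..max_id,
    so each INSERT ... SELECT touches at most batch_size rows."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)
        yield (start, end)
        start = end
```

Each yielded pair becomes the bounds of one `INSERT INTO event_partitioned SELECT ... WHERE id >= start AND id < end` statement; the half-open intervals guarantee no row is copied twice.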
Partitioning, done well, is one of the most effective tools in PostgreSQL's arsenal for managing large tables. It does not make your database magically faster — it makes it structurally manageable at a scale where monolithic tables become operationally untenable. The queries that include the partition key become faster. The maintenance operations become faster. The data retention operations become instantaneous. And the autovacuum workers, who have been struggling valiantly against your 340 GB table for months, finally get to do their work in a room they can actually clean.
Should you wish to discuss your specific situation, the household is always accepting visitors.
A brief detour, if you would indulge me. The autovacuum tuning for partitioned tables discussed above is only half the story — I have written a complete autovacuum tuning guide that covers the cost model, per-table settings, and transaction ID wraparound prevention in the depth that partitioned workloads require.