pg_partman Alternatives: Your Table Does Not Need to Be Dismembered
Good evening. You have a large table. Allow me to explain why the correct response is almost never to saw it into pieces.
I'm afraid we need to discuss this reflex
The conversation arrives at my door with remarkable consistency. A query is slow. Someone runs \dt+ orders, observes that the table is 80 GB, and the conclusion is immediate: "We need to partition this table." Two weeks are then invested in implementing pg_partman, migrating data into time-based partitions, updating every ORM model and migration, and resolving the cascade of foreign key issues that partitioning introduces. The slow query, I regret to report, is still slow — because it was never filtering on the partition key in the first place.
This is not a hypothetical. It is the single most common partitioning failure pattern, and it persists because the reasoning skips a step. "The table is large, therefore partitioning will help" has the same logical structure as "the house is large, therefore we should knock down some walls." The full reasoning must be: "The table is large, the queries filter on a column that would serve as the partition key, and the considerable operational cost of partitioning is justified by the access pattern." That second version is true far less often than the first. Considerably less often.
I wish to be precise about my position, because imprecision on this matter has caused a great deal of unnecessary engineering. PostgreSQL partitioning is a powerful feature. pg_partman makes it operationally manageable. But partitioning is a data organisation strategy, not a query optimisation technique. Conflating the two is the infrastructural equivalent of reorganising every room in the household because a single drawer was untidy.
What partitioning actually does — and what it does not
If you will permit me a brief inventory. Partitioning splits a single logical table into multiple physical tables, each holding a subset of the rows defined by a range, list, or hash of a partition key. When a query includes a WHERE clause on that key, the planner can skip partitions that cannot contain matching rows. This is partition pruning, and it is the only performance mechanism partitioning provides. The only one.
Partitioning solves three problems well, and I shall give each its due:
- Data lifecycle management. Dropping a partition is instant. Deleting millions of rows from a regular table generates dead tuples, bloats the table, and triggers expensive VACUUM operations. If you need to retain 90 days of data and discard the rest, partitioning by month makes the discard operation a DROP TABLE that completes in milliseconds. This is partitioning at its finest — disposing of what is no longer needed with the clean efficiency of closing a guest ledger at the end of a season. I respect this use case deeply.
- Partition pruning for time-range queries. If every query on the table includes a filter on the partition key (typically a timestamp), the planner reads only the relevant partitions. A query for "today's events" scans one daily partition instead of the entire table. Genuinely useful for time-series workloads where the access pattern is tightly coupled to the partition key.
- Maintenance at scale. VACUUM, ANALYZE, and index rebuilds operate per partition. On a 500 GB table, a single VACUUM can hold locks for minutes. On fifty 10 GB partitions, each VACUUM is faster and holds locks for less time. For tables in the hundreds of millions of rows, this maintenance improvement matters.
Now. Notice what is absent from this list: "making slow queries fast." Partitioning does not fix a missing index. It does not repair a misguided join strategy. It does not reduce the cost of an aggregation that must read every partition regardless. If your dashboard query computes SUM(total) across all orders for the last year, and you have monthly partitions, PostgreSQL still reads all twelve partitions. The query is not faster. It may, in fact, be slower — because the planner now coordinates twelve index scans instead of one. I find this outcome particularly dispiriting when it follows a two-week migration effort.
The invoice nobody reads before signing
Partitioning is not free. It introduces constraints and operational overhead that compound — quietly, steadily, and with interest — over the life of the schema. I consider it a professional obligation to present this invoice before you commit to the arrangement.
Unique constraints must include the partition key
This is the constraint that surprises teams most, and I confess it gives me no pleasure to deliver the news. PostgreSQL cannot enforce uniqueness across partitions. Every primary key, unique index, and unique constraint on a partitioned table must include all columns of the partition key.
```sql
-- What you want:
ALTER TABLE orders ADD CONSTRAINT orders_pkey PRIMARY KEY (id);

-- What PostgreSQL requires on a partitioned table:
ALTER TABLE orders ADD CONSTRAINT orders_pkey
    PRIMARY KEY (id, created_at);

-- The partition key (created_at) must be part of every unique
-- constraint. This means:
--   1. Foreign keys referencing this table need BOTH columns
--   2. JOINs become: ON orders.id = line_items.order_id
--                   AND orders.created_at = line_items.order_created_at
--   3. Every referencing table needs the partition key column
```

The cascade is immediate and thorough. Every table with a foreign key referencing the partitioned table now needs the partition key column. Every JOIN needs both columns. ORMs that generate joins automatically — which is all of them — require schema-level changes to accommodate the compound key. The pganalyze guide to PostgreSQL partitioning in Django documents the depth of this problem: Django has no native support for composite foreign keys, so every referencing relationship requires manual migration SQL. One decides to partition a single table; one remodels the entire ground floor.
The planner's growing burden
The query planner must consider every partition during planning, even when pruning eliminates most of them at execution time. pganalyze has documented that planning time increases linearly with partition count, reaching noticeable overhead at a few hundred partitions.
```sql
-- Query planning overhead scales with partition count:
--   50 partitions:   planning time ~0.5 ms
--   500 partitions:  planning time ~5 ms
--   5000 partitions: planning time ~50 ms

-- Check your current partition count:
SELECT
    parent.relname AS parent_table,
    count(*) AS partition_count
FROM pg_inherits
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
JOIN pg_class child ON child.oid = pg_inherits.inhrelid
GROUP BY parent.relname
ORDER BY partition_count DESC;
```

Daily partitions for three years means over 1,000 partitions. If your workload is high-concurrency OLTP with short queries, the planning overhead per query can exceed the execution time. The query takes 2 ms to run but 5 ms to plan. You have not made the query faster. You have made it measurably, documentably slower. I have seen this in production. The expression on the engineer's face was not one I would care to see again.
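To make the trade-off concrete, one can sketch the break-even arithmetic in a few lines of Python. The per-partition planning cost below is an assumed figure backed out of the rough numbers above, not a constant of PostgreSQL; real values vary by version and schema.

```python
# Illustrative break-even: when does planning overhead swallow a
# short OLTP query? ~0.01 ms per partition is an assumption taken
# from the rough figures above, not a measured constant.
PLAN_MS_PER_PARTITION = 0.01   # 50 -> ~0.5 ms, 500 -> ~5 ms

def planning_ms(partitions: int) -> float:
    """Estimated planning time for a query against this many partitions."""
    return partitions * PLAN_MS_PER_PARTITION

def break_even_partitions(execution_ms: float) -> int:
    """Partition count at which planning time matches execution time."""
    return round(execution_ms / PLAN_MS_PER_PARTITION)

print(break_even_partitions(2.0))  # 200: a 2 ms query is dominated by
                                   # planning beyond ~200 partitions
print(planning_ms(1000))           # ~10 ms of planning at 1,000 partitions
```

Under these assumed numbers, three years of daily partitions puts every short query well past the break-even point before a single row is read.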
ORMs were not consulted
Django, SQLAlchemy, and ActiveRecord were not designed with partitioned tables in mind. The compound primary key requirement breaks assumptions baked into ORM identity maps, foreign key resolution, and migration generators. Transifex documented their experience moving to partitioned tables with Django, noting that the deletions generated by the ORM did not include the partition key, resulting in queries that scanned every partition. SQLAlchemy's issue tracking native partitioning support has been open since 2020. Six years. One begins to suspect the issue may outlast the framework.
Cross-partition queries lose their advantage
Any query that does not filter on the partition key touches every partition. SELECT * FROM orders WHERE customer_id = 12345 on a table partitioned by created_at scans all partitions. If your workload includes both time-range queries and customer-specific lookups, partitioning helps the former and actively harms the latter. CYBERTEC demonstrated this concretely: partitioning a table and querying on a non-partition column produced worse performance than the unpartitioned original. Worse. Not the same. Worse. One hires a specialist to improve the household and the silver goes missing.
The existing staff you have been overlooking
The most common reason teams consider partitioning is slow queries on a large table. In the considerable majority of cases, the table does not need to be split into pieces. It needs a proper index. This is the infrastructural equivalent of hiring a team of movers to reorganise the pantry when the real problem is that no one labelled the shelves.
Partial indexes: attending only to what matters
A partial index indexes only the rows that match a predicate. If your application overwhelmingly queries recent data — the last 90 days, the last week, the active orders — a partial index gives you a small, fast index on exactly the data you care about, without so much as glancing at the rest of the table.
```sql
-- The table: 200 million orders, but dashboards only query recent ones.
-- Full B-tree index on created_at: 4.2 GB

-- Partial index: only index the recent window. Index predicates must
-- be immutable, so PostgreSQL rejects now() here; use a literal cutoff.
CREATE INDEX idx_orders_recent ON orders (created_at, status)
WHERE created_at > '2026-01-01';

-- Size: ~180 MB instead of 4.2 GB
-- Queries that filter on recent data use this index.
-- Queries on historical data use a sequential scan or BRIN (see below).

-- Query with a literal date, so the planner can prove the filter
-- implies the index predicate:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total, status
FROM orders
WHERE created_at > '2026-01-20'
  AND status = 'pending';

-- QUERY PLAN
-- -----------------------------------------------------------------
-- Index Scan using idx_orders_recent on orders
--   Index Cond: (created_at > ...)
--   Filter: (status = 'pending')
--   Buffers: shared hit=342
-- Planning Time: 0.08 ms
-- Execution Time: 1.2 ms

-- Without the partial index, this was a 4.2 GB index scan
-- touching blocks it did not need. With it: 180 MB, 342 buffers.
```

180 MB instead of 4.2 GB. It fits in shared_buffers. It rebuilds faster during REINDEX. It carries none of the operational complexity of partitioning. And it achieves the same goal: fast queries on the subset of data that matters. One CREATE INDEX statement. No migration project. No ORM surgery.
I should be forthcoming about the limitation, because a butler who omits inconvenient details is no butler at all. The predicate is static. PostgreSQL requires index predicates to use only immutable expressions, so WHERE created_at > now() - interval '90 days' is rejected outright; you must bake a literal cutoff date into the predicate, and that window does not slide forward on its own. To keep it current, a periodic job creates a replacement index with CREATE INDEX CONCURRENTLY and a fresh cutoff, then drops the stale one. It is a maintenance task to track. A modest maintenance task. Not a schema-wide renovation.
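A sketch of that weekly rollover, assuming a hypothetical maintenance helper that generates the DDL. The table, column, and index names follow the example above; the function itself is illustrative and belongs to no library, and the identifiers are assumed to come from your own schema, not user input.

```python
from datetime import date, timedelta

# Hypothetical helper: build the DDL that rolls a partial index's
# literal cutoff forward. Illustrative sketch only.
def rollover_statements(table: str, columns: str, cutoff_column: str,
                        index: str, window_days: int, today: date):
    cutoff = (today - timedelta(days=window_days)).isoformat()
    return [
        # Build the replacement first; CONCURRENTLY avoids blocking
        # writes, but cannot run inside a transaction block.
        f"CREATE INDEX CONCURRENTLY {index}_new ON {table} ({columns}) "
        f"WHERE {cutoff_column} > '{cutoff}';",
        f"DROP INDEX CONCURRENTLY {index};",
        f"ALTER INDEX {index}_new RENAME TO {index};",
    ]

stmts = rollover_statements("orders", "created_at, status", "created_at",
                            "idx_orders_recent", 90, date(2026, 4, 1))
for s in stmts:
    print(s)
```

Whatever runs these statements must do so in autocommit mode, since CREATE INDEX CONCURRENTLY refuses to run inside a transaction block.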
Composite indexes for multi-column filters
When the slow query filters on multiple columns, a composite index that matches the filter pattern is almost always the answer. Not sometimes. Almost always.
```sql
-- Common anti-pattern: partitioning to speed up multi-column filters.
-- Better: a composite index that matches the query pattern.
CREATE INDEX idx_orders_status_date ON orders (status, created_at DESC);

-- This single index handles:
--   WHERE status = 'pending' AND created_at > ...
--   WHERE status = 'shipped' ORDER BY created_at DESC LIMIT 20

-- The planner uses the leading column for equality,
-- then the second column for range filtering or sorting.
-- No partitioning required.
```

A composite index on (status, created_at DESC) handles both equality filters on status and range filters on created_at in a single index scan. The planner does not need to combine two separate indexes. There is no partitioning to configure, no partition key to include in every constraint, and no ORM changes to make. The existing staff — PostgreSQL's index machinery — simply needed clearer instructions.
BRIN indexes: 240 KB where you were planning 1,400 partitions
If I may draw your attention to what I consider one of PostgreSQL's most elegant and most overlooked capabilities. BRIN — Block Range Index — is PostgreSQL's answer to the question that drives most partitioning conversations: "How do I efficiently query time-series data without the overhead of a massive B-tree or the complexity of partitioning?"
A BRIN index stores the minimum and maximum values for each range of physical pages in the table. For append-only, time-ordered data — where new rows always have later timestamps than existing rows — the physical order on disk matches the logical order of the timestamp column. BRIN exploits this correlation to achieve partition-pruning-like behaviour with an index measured in kilobytes. Not gigabytes. Kilobytes.
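If it helps to see the mechanism, the following is a toy model in Python (not PostgreSQL's implementation, merely per-range min/max pruning over an assumed append-only sequence of timestamps) showing why such an index can stay tiny:

```python
# Toy model of BRIN-style pruning. Illustrative only: real BRIN
# summarises ranges of heap pages, not Python lists.

def build_summaries(values, rows_per_range):
    """One (min, max) summary per fixed-size chunk of the table."""
    return [(min(values[i:i + rows_per_range]),
             max(values[i:i + rows_per_range]))
            for i in range(0, len(values), rows_per_range)]

def ranges_to_scan(summaries, lo, hi):
    """Indices of block ranges whose [min, max] overlaps [lo, hi]."""
    return [i for i, (mn, mx) in enumerate(summaries)
            if mx >= lo and mn <= hi]

# Append-only timestamps: physical order matches logical order.
timestamps = list(range(1_000_000))              # stand-in for created_at
summaries = build_summaries(timestamps, 10_000)  # 100 tiny summaries
hits = ranges_to_scan(summaries, 500_000, 509_999)
print(len(summaries), hits)                      # 100 [50]
```

The "index" is just the list of summaries: one hundred pairs for a million rows. That is why BRIN is measured in kilobytes, and yet the pruning step still narrows a range query down to a single block range.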
```sql
-- BRIN index: designed for physically ordered, append-only data
CREATE INDEX idx_events_created_brin ON events
USING brin (created_at)
WITH (pages_per_range = 32);

-- Size comparison on 500 million rows:
--   B-tree on created_at: 11 GB
--   BRIN on created_at:   240 KB
-- Yes, kilobytes.

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM events
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31';

-- QUERY PLAN
-- -----------------------------------------------------------------
-- Aggregate
--   -> Bitmap Heap Scan on events
--        Recheck Cond: (created_at >= ... AND created_at <= ...)
--        Rows Removed by Recheck: 1,204
--        Heap Blocks: lossy=8,432
--        -> Bitmap Index Scan on idx_events_created_brin
--             Index Cond: (created_at >= ... AND created_at <= ...)
--             Buffers: shared hit=3
-- Planning Time: 0.12 ms
-- Execution Time: 48 ms

-- 3 buffer hits to read the index. The entire BRIN fits in cache.
-- BRIN scans some extra blocks (lossy) but the trade-off is
-- 240 KB of index instead of 11 GB.
```

I shall ask you to sit with that number for a moment. 240 KB. On 500 million rows. A B-tree on the same column would be 11 GB. A partitioned table with daily partitions would require managing 1,400+ partitions — each with its own indexes, constraints, and maintenance schedule. The BRIN index is one CREATE INDEX statement. It is the kind of solution that makes one wonder what we have all been doing with our time.
The honest limitation — and I would not omit it. BRIN requires physical correlation between the indexed column and the storage order. For append-only tables (events, logs, metrics, audit trails), this correlation is natural: new rows go at the end, timestamps always increase. For tables with heavy updates or random inserts, the correlation degrades, and BRIN effectiveness diminishes accordingly. You can verify this:
```sql
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

-- correlation close to 1.0 or -1.0: BRIN will work well
-- correlation close to 0.0: BRIN will scan too many blocks
```

For the time-series workloads that most commonly prompt partitioning discussions, the correlation is typically 0.99+. The tool was in the drawer the entire time.
Materialized views: do the maths once, serve the result indefinitely
The second most common driver of partitioning discussions arrives wearing the same disguise. "Our monthly revenue dashboard takes 14 seconds." The instinct is to partition the orders table by month so the dashboard query only scans one or two partitions. But the query is computing SUM, COUNT, and AVG across millions of rows. Even with partition pruning, it still reads millions of rows. The aggregation itself is the bottleneck, not the table scan. Partitioning this table for dashboard performance is like sorting every book in the library to speed up a word count — the problem was never which books to read; it was having to count the words each time.
A materialized view pre-computes the aggregation and stores the result. The dashboard query reads from the materialized view — a table with perhaps 500 rows instead of 200 million.
```sql
-- The query that prompted the partitioning discussion:
-- "monthly revenue by region takes 14 seconds on 200M rows"
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    date_trunc('month', created_at) AS month,
    region,
    count(*) AS order_count,
    sum(total) AS revenue,
    avg(total) AS avg_order
FROM orders
GROUP BY 1, 2;

CREATE UNIQUE INDEX ON monthly_revenue (month, region);

-- Query time: 200M row aggregation -> 2 ms lookup
-- Refresh: REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
```

14 seconds becomes 2 milliseconds. The materialized view is refreshed on a schedule — every hour, every 15 minutes, whatever the staleness tolerance allows. REFRESH MATERIALIZED VIEW CONCURRENTLY does not block reads during the refresh. The work is done once; the result is served as many times as requested. This is, if I may observe, simply good household management.
Materialized views have their own operational concerns — staleness, disk usage, blocking refreshes without CONCURRENTLY, and no automatic invalidation. I would not pretend otherwise. But the complexity is localised: one view definition, one refresh schedule. Partitioning, by contrast, spreads its complexity across every query, every constraint, every migration, and every ORM model that so much as glances at the table. One solution contains its costs. The other distributes them.
When partitioning is precisely the right answer
I should be direct about this, because a butler who argues only one side of the case is not a butler — he is a salesman. None of the alternatives above replace partitioning for its core use cases. There are workloads where partitioning is genuinely, unambiguously the correct tool, and I would think less of myself if I failed to say so clearly.
Data retention policies
If your application needs to retain 90 days of data and discard the rest, partitioning wins. There is no close second. Dropping a partition is instant and produces no dead tuples. Batch-deleting millions of rows from a regular table generates bloat, triggers VACUUM, and holds row-level locks throughout. No index strategy, however clever, replicates the clean efficiency of DROP TABLE events_2025_12. This is partitioning doing what it was born to do — managing the lifecycle of data with the quiet authority of a household archive that knows which ledgers to keep and which to retire.
```sql
-- If your only goal is deleting old data, consider:

-- Option A: batch DELETE with a partial index.
-- (Index predicates must be immutable, so use a literal cutoff
-- rather than now() - interval '1 year'.)
CREATE INDEX idx_orders_old ON orders (created_at)
WHERE created_at < '2025-01-01';

-- DELETE has no LIMIT clause; batch via a ctid subquery.
DELETE FROM orders
WHERE ctid IN (
    SELECT ctid FROM orders
    WHERE created_at < '2025-01-01'
    LIMIT 10000
);
-- Run in a loop until 0 rows affected. Gentle on locks.

-- Option B: pg_partman if you need instant drops
-- DROP TABLE orders_2024_01;  -- instant, no dead tuples
-- This is where partitioning genuinely wins.
```

Billions of rows at true scale
Once a table exceeds roughly one billion rows, the maintenance overhead becomes a problem regardless of query performance. B-tree indexes grow into the tens of gigabytes. VACUUM operations can take hours. REINDEX requires significant disk space for the temporary index. At this scale, partitioning the table into smaller physical units makes every maintenance operation manageable. The partitions are individually VACUUM'd, individually indexed, and individually backed up. This is not a preference. It is a necessity.
Time-series at genuine scale
IoT sensor data, observability metrics, financial tick data — workloads that generate millions of rows per day and need to retain months or years of history. The combination of partition pruning for time-range queries and partition drops for retention makes pg_partman genuinely valuable here. I would recommend it without hesitation. Cloudflare, facing this class of problem with their analytics infrastructure, ultimately adopted TimescaleDB, which is itself a partitioning layer built on top of PostgreSQL: its hypertables automate time-based partitioning and are tuned specifically for time-series ingest and query patterns. When the household genuinely needs specialist staff, one hires specialist staff.
A framework for the decision, since the decision deserves one
If you will allow me to lay this out plainly. Before reaching for pg_partman, consult the following. The first column that matches your situation should guide the approach.
| Problem | Partitioning | Alternative | Start here |
|---|---|---|---|
| Slow queries on a large table | Helps only if queries filter on the partition key | Composite indexes, partial indexes, or materialized views | Index first |
| Expensive aggregations (dashboards, reports) | Partition pruning helps if aggregating a time range | Materialized views pre-compute the result | Matview wins |
| Time-range queries on append-only data | Partition pruning skips irrelevant partitions | BRIN index achieves similar pruning with 240 KB | BRIN first, partition if billions |
| Data retention (drop old data efficiently) | DROP partition is instant, no dead tuples | Batch DELETE works but creates bloat | Partitioning wins |
| Table bloat from heavy UPDATE/DELETE | Smaller partitions make VACUUM faster | pg_repack, autovacuum tuning, fillfactor | Fix vacuum first |
| Billions of rows, true scale | Essential for maintenance and pruning | Indexes alone struggle above ~1B rows | Partition |
The pattern, once you see it, is difficult to unsee. Partitioning wins when the problem is data lifecycle — retention, archival, maintenance at scale. Indexing and materialized views win when the problem is query performance. Most teams reach for partitioning because of query performance. Most teams should have reached for an index. The advice to partition arrives most enthusiastically from those who have never had to un-partition.
Seven questions I would ask before permitting a partition
If, after everything above, you are still considering partitioning — good. That means you may genuinely need it. But allow me to conduct a brief inspection before the work begins.
- Does the slow query filter on the column you plan to partition by? If not, partitioning will not help that query. It may make it worse. I have seen this enough times to be quite certain.
- Have you checked for missing indexes? Run EXPLAIN (ANALYZE, BUFFERS) on the slow query. If you see sequential scans where index scans should be, the diagnosis is straightforward. Add the index. Revisit the partitioning discussion afterwards, if it still seems necessary. It usually does not.
- Is the problem aggregation cost or scan cost? If the query is slow because it aggregates millions of rows, a materialized view eliminates the aggregation entirely. Partitioning merely reduces the number of rows aggregated. These are not the same thing.
- Is your data append-only with a time column? If yes, a BRIN index on the time column gives you partition-pruning-like performance with zero operational overhead. 240 KB. I mention the number again because it bears repeating.
- Do you need data retention? If you need to drop old data on a schedule, partitioning is the right tool. I say this without reservation. Nothing else matches the efficiency of dropping a partition.
- Can your ORM handle compound primary keys? If your application uses Django, Rails, or another ORM with single-column primary key assumptions, partitioning will require significant schema and application changes. Significant. Not "a few tweaks."
- How many partitions will you have in two years? If the answer exceeds a few hundred, query planning overhead becomes a factor. Daily partitions accumulate faster than one tends to imagine.
If questions 1 through 4 point you toward an index or materialized view, start there. Those solutions are reversible — you can drop an index or a materialized view and try something else tomorrow. Converting a regular table to a partitioned table is a migration project. Converting it back is another migration project. The ratchet, I'm afraid, only turns one direction.
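The arithmetic behind question 7 deserves to be written down at least once, because the answer is rarely intuited correctly. A trivial sketch:

```python
# Question 7, made explicit: partition counts after two years,
# by partitioning granularity.
GRANULARITY_PER_YEAR = {"daily": 365, "weekly": 52, "monthly": 12}

def partitions_after(years: int, granularity: str) -> int:
    """Approximate partition count, ignoring leap days."""
    return GRANULARITY_PER_YEAR[granularity] * years

for g in ("daily", "weekly", "monthly"):
    print(f"{g}: {partitions_after(2, g)} partitions after two years")
# daily: 730   -- already past "a few hundred"
# weekly: 104
# monthly: 24
```

Daily granularity crosses the planning-overhead threshold within the first year; weekly or monthly buys years of headroom.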
A word about the service I represent
Gold Lapel operates at the query layer. It observes the queries your application sends to PostgreSQL, identifies performance problems in the actual traffic, and applies the solutions we have been discussing — missing indexes, materialized views for repeated expensive aggregations, query rewrites that the planner missed — automatically, without migration files or committee meetings.
In practice, this addresses the query performance problems that teams mistakenly attempt to solve with partitioning. The dashboard aggregation that takes 14 seconds? Gold Lapel materialises the result set after observing the pattern, and subsequent queries are served from the materialized view in milliseconds. The multi-column filter on a 200-million-row table? Gold Lapel creates the composite index that the workload needs, without anyone writing a migration.
What Gold Lapel does not do is manage data lifecycle. If you need to drop old data efficiently, partition your tables. If you need to archive partitions to cold storage, use pg_partman. Those are storage-layer problems, and they require a storage-layer solution. I would be a poor butler indeed if I claimed otherwise. The two concerns are complementary, not competing — and knowing which is which is rather the point of everything I have been saying.