Chapter 14: Everything MongoDB Can't Do
Good evening. We have, if you'll permit me a moment of reflection, been keeping a ledger together for some thirteen chapters now. Twenty-one methods. Seven languages. Every MongoDB operator paired with its PostgreSQL equivalent, every aggregation stage rewritten as SQL, every index type matched or, on occasion, bettered. The ledger balances. PostgreSQL can, in fact, do what MongoDB does.
That was the whole point of the exercise. Until now.
This evening, I should like to turn the ledger around. For the remainder of the book we will stop asking whether PostgreSQL can match MongoDB, and begin — politely, factually, without relish — to list what MongoDB, by the shape of its design, cannot match back.
A word, before we begin, about the subject of this chapter. MongoDB is excellent software. It was built by serious engineers to solve a genuine problem — developers in 2009 were drowning in ORM impedance mismatches, and MongoDB offered a document model that simply fit the shape of the data. The six capabilities I am about to describe are not failures on MongoDB's part. They are the consequence of choosing to be a document store rather than a database that happens to store documents. The distinction is quiet, but it is the whole argument.
So: one list. Six entries. Stated plainly. The PostgreSQL advantages over MongoDB that matter are not, I'm afraid, the ones you tend to find in marketing comparisons. They are the ones you only notice when a feature you needed in PostgreSQL turns out not to exist in the other system — and you realise it was never going to.
ACID Transactions Across Collections
A single PostgreSQL transaction may touch a JSONB document table, a relational inventory table, and an audit log — all in one statement group, under one commit, under one set of ACID guarantees. Either every change lands, or none of them do. There is no compensation logic to write, no saga to orchestrate, no outbox pattern to implement. The transaction boundary is the database's job, and the database attends to it.
```sql
BEGIN;

INSERT INTO orders (data)
VALUES ('{"customer_id": 42, "items": [{"sku": "ABC-123", "qty": 1}]}'::jsonb);

UPDATE inventory
SET quantity = quantity - 1
WHERE sku = 'ABC-123';

INSERT INTO audit_log (event, actor)
VALUES ('order.placed', 'waiter');

COMMIT;
```

Three tables. Three writes. One atomic unit. If the UPDATE would drop inventory below zero and a CHECK constraint rejects it, the order insert rolls back in sympathy. The audit entry never appears. The client sees a single error rather than a partial world — which is, I should say, rather the point.
PostgreSQL has supported ACID transactions since its earliest releases; MVCC — the concurrency model that makes those transactions cheap to compose — arrived in version 6.5 in June 1999 and has been the default ever since. It is not a feature the project added to compete with anyone. It is the substrate everything else is built upon.
MongoDB's transactions, fairly stated. MongoDB shipped multi-document transactions in 4.0 in June 2018 for replica sets, and extended them to sharded clusters in 4.2 in 2019. They work. Teams use them in production every day. The documented constraints are real but bounded: the default transactionLifetimeLimitSeconds is 60 seconds, individual oplog entries inherit the 16 MB BSON document size cap (large transactions are split across multiple entries from 4.2 onward), and long-running transactions create WiredTiger cache pressure that operators ought to monitor. These are tractable limits for short, focused transactions — which is precisely what MongoDB's own guidance recommends, and sensibly so.
The difference, I should note, is cultural as much as technical. In MongoDB, multi-document transactions are a tool you reach for deliberately. In PostgreSQL, they are how the database works. Every statement runs in a transaction, whether you wrote BEGIN or not. The cost of crossing a collection boundary is identical to the cost of crossing a row boundary within a single collection: zero, because the boundary is not architectural.
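And because every statement is already transactional, composing finer-grained recovery costs exactly one keyword. A sketch, with loyalty_accounts as an illustrative table name, of letting a single step fail without abandoning the whole unit:

```sql
-- Savepoints let one step roll back without discarding the rest of
-- the transaction; loyalty_accounts is an illustrative table name.
BEGIN;

INSERT INTO orders (data)
VALUES ('{"customer_id": 7, "items": []}'::jsonb);

SAVEPOINT before_loyalty;

UPDATE loyalty_accounts
SET points = points + 10
WHERE customer_id = 7;

-- If the step above fails: ROLLBACK TO SAVEPOINT before_loyalty;
-- the order insert survives and commits normally.

COMMIT;
```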
SQL JOINs Between Documents and Relational Tables
A JSONB document table is, at the end of the day, still a PostgreSQL table. It has a row type, it participates in the planner's statistics, it can be indexed, and it can be joined against anything else the planner can see — other document tables, relational tables, views, CTEs, foreign tables exposed by postgres_fdw, even materialized views. There is no special door for documents; they dine in the same room as everyone else.
```sql
SELECT
    o.data->>'order_id' AS order_id,
    c.name AS customer_name,
    SUM((item->>'price')::numeric) AS total
FROM orders o
JOIN customers c
    ON c.id = (o.data->>'customer_id')::int
CROSS JOIN LATERAL jsonb_array_elements(o.data->'items') AS item
WHERE o.data->>'status' = 'pending'
GROUP BY o.data->>'order_id', c.name;
```

Three sources in one query: a document table, a relational table, and a lateral expansion of the document's line items. The planner treats all three as first-class citizens. With a GIN index on orders.data and a B-tree on customers.id, PostgreSQL 16 and 17 can choose hash, merge, or nested-loop strategies depending on cardinality. None of this is exotic. It is a Tuesday SQL query that happens to reach into JSON.
MongoDB's counterpart is $lookup, and it is, by design, scoped. A $lookup stage joins against exactly one foreign collection. An aggregation pipeline can chain several $lookup stages, but each remains a single foreign join, executed in pipeline order rather than planned holistically. Sharded foreign collections were not supported in $lookup until MongoDB 5.1 in late 2021, and while that limitation has since been lifted, the underlying shape of the operator is unchanged: one join at a time, evaluated in sequence, without a cost-based planner deciding which order to execute them in.
For simple lookups, the practical consequence is small. For analytical queries, it grows rather quickly. When you wish to join an orders document against customers, then against regions, then against a fx_rates table to convert currency, the PostgreSQL planner can reorder those joins based on statistics. A MongoDB pipeline will execute them in the order you wrote them. You become the planner — and you are, if I may be candid, working from less information than the database has.
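The four-way join described above might be sketched as follows; regions and fx_rates are hypothetical tables, and the point is that the planner, not the author, chooses the execution order:

```sql
-- One analytical query across a document table and three relational
-- tables; join order is chosen by cost, not by order of writing.
SELECT
    r.name AS region,
    SUM((o.data->>'total')::numeric * f.rate) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = (o.data->>'customer_id')::int
JOIN regions r   ON r.code = c.region_code
JOIN fx_rates f  ON f.currency = r.currency
GROUP BY r.name;
```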
One Database, No Sync Pipeline
When your documents and your relational rows live in the same database, they live in the same backup, the same connection pool, the same transaction, the same point-in-time recovery target. There is no CDC job moving rows from MongoDB into PostgreSQL for analytics. No Debezium topic, no Kafka cluster translating oplog events into SQL inserts, no eventual-consistency window in which the document store and the reporting store disagree by thirty seconds.
The two-system architecture has become so common that most teams, quite understandably, stop noticing it: MongoDB on the left for the application, Kafka in the middle carrying change events, Debezium connectors translating oplog entries into messages, a warehouse or PostgreSQL on the right for analytics. Debezium connectors, I should note, exist for both databases — this is not a MongoDB-only matter — but the problem only arises when your operational and analytical stores are separate systems in the first place. Collapse them into one database, and the middle column quietly disappears.
```sql
SELECT
    date_trunc('day', (data->>'created_at')::timestamptz) AS day,
    COUNT(*) AS orders,
    SUM((data->>'total')::numeric) AS revenue
FROM orders
JOIN regions r ON r.code = data->>'region'
WHERE r.is_active
GROUP BY 1
ORDER BY 1;
```

Operational document data joined against a reference table to produce a daily revenue roll-up. In a split architecture, this query requires a pipeline: extract orders from MongoDB, land them in a warehouse, join them against the regions table, schedule the refresh, monitor the lag, alert on drift. In one PostgreSQL database, it requires one SELECT.
Saved complexity, if I may say so, is not a feature one adds. It is a feature one never has to build.
The PostgreSQL Ecosystem — PostGIS, pgvector, Full-Text Search
Here is where the argument, gently, stops being about SQL and starts being about what a mature relational ecosystem provides when you arrive at the table. A document row in PostgreSQL can participate in a spatial query, a vector similarity search, and a full-text match — all in the same database, in one query, under one transaction. It is, I think, rather a lot on one plate. It is also none the worse for it.
Four pillars hold the thing up.
PostGIS (3.4+ on PostgreSQL 16; 3.5+ on PostgreSQL 17) adds geography and geometry types, spatial indexes via GiST, and several hundred spatial functions. A JSONB document with a lat/lon field may be indexed spatially and queried with ST_Distance, ST_Contains, or ST_Intersects against any other spatial data in the database. PostGIS is not a small project; it is among the most respected pieces of open-source geospatial software in existence, and it happens to live inside your database.
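A sketch of the lat/lon case, assuming the document stores its coordinates under lon and lat field names:

```sql
-- Derive a spatial column from the JSONB coordinates, index it with
-- GiST, and query by distance; the field names are assumptions.
ALTER TABLE documents
    ADD COLUMN location geography(Point, 4326)
    GENERATED ALWAYS AS (
        ST_SetSRID(
            ST_MakePoint((data->>'lon')::float8, (data->>'lat')::float8),
            4326
        )::geography
    ) STORED;

CREATE INDEX documents_location_idx ON documents USING GIST (location);

-- Everything within five kilometres of central London.
SELECT data->>'title'
FROM documents
WHERE ST_DWithin(location, ST_MakePoint(-0.1276, 51.5072)::geography, 5000);
```

Because the column is generated, it can never drift from the document that owns it; updating the JSONB updates the geography in the same write.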
pgvector (0.5.0 introduced HNSW indexes in 2023, alongside the existing IVFFlat support) stores embeddings as a native type. Embeddings live in the same row as the document they describe. No separate vector database. No sync job keeping the vector index in step with the source text. When you update the document, the embedding updates in the same transaction — because it is, as far as the database is concerned, the same row.
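The storage side, sketched with a 1536-dimension embedding as an assumption about the model in use:

```sql
-- A vector column plus an HNSW index (pgvector 0.5.0+); the dimension
-- is an assumption and should match your embedding model.
CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE documents ADD COLUMN embedding vector(1536);

CREATE INDEX documents_embedding_idx ON documents
    USING hnsw (embedding vector_cosine_ops);

-- Ten nearest neighbours to a parameterised query embedding.
SELECT data->>'title'
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
```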
tsvector full-text search ships in core PostgreSQL: stemming, ranking, phrase search, multiple language configurations, GIN indexes on the generated tsvector. It is not Lucene, and the feature surface is smaller. But it runs inside the database, and it joins against everything else.
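For a full-text predicate to become an index scan, the GIN index expression must match the query's to_tsvector() call exactly. A minimal sketch:

```sql
-- An expression index over the document body; any query using the
-- same to_tsvector('english', ...) expression can use it.
CREATE INDEX documents_body_fts_idx ON documents
    USING GIN (to_tsvector('english', data->>'body'));
```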
SQL primitives — window functions, recursive CTEs, LATERAL, table partitioning — give you compositional tools the MongoDB aggregation framework does not offer. WITH RECURSIVE has no direct equivalent; $graphLookup is the closest thing and is strictly weaker, being limited to a single starting collection and a bounded traversal shape.
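A sketch of what WITH RECURSIVE buys on a document table: walking a parent/child hierarchy stored in JSONB. The categories table and its field names are illustrative:

```sql
-- Recursively walk a tree of JSONB documents, building a path string.
WITH RECURSIVE tree AS (
    SELECT id, data->>'name' AS name, data->>'name' AS path
    FROM categories
    WHERE data->>'parent_id' IS NULL            -- roots of the tree
    UNION ALL
    SELECT c.id, c.data->>'name', t.path || ' > ' || (c.data->>'name')
    FROM categories c
    JOIN tree t ON (c.data->>'parent_id')::int = t.id
)
SELECT path FROM tree ORDER BY path;
```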
One query, three extensions, one document column:
```sql
SELECT
    d.data->>'title' AS title,
    ts_rank(to_tsvector('english', d.data->>'body'), query) AS text_score,
    d.embedding <=> $1 AS vector_distance,
    ST_Distance(d.location, ST_MakePoint($2, $3)::geography) AS meters
FROM documents d,
    plainto_tsquery('english', $4) AS query
WHERE to_tsvector('english', d.data->>'body') @@ query
ORDER BY vector_distance
LIMIT 20;
```

Full-text filter, vector ranking, spatial distance. All against a document column. All in one plan.
MongoDB Atlas has genuine answers here. Atlas Vector Search (GA December 2023) is a production-grade vector index. Atlas Search is a Lucene-backed full-text tier of considerable sophistication. Atlas geo indexes cover a solid portion of what PostGIS does. The argument is not that MongoDB has nothing — it has good, well-engineered offerings. The argument is about where they live. Each of those indexes runs on a separate tier, synchronised from the source collection. You cannot write one query that filters by full-text, ranks by vector distance, and sorts by spatial proximity across all three indexes in a single transaction — because the indexes are not in the same system. In PostgreSQL, you can. That is the whole of the difference, and the whole of the difference is rather a lot.
Row-Level Security on Document Tables
CREATE POLICY applies to a JSONB table precisely as it applies to a relational one. Multi-tenant isolation, per-user row visibility, read/write asymmetry — all enforced by the database rather than by application middleware that every new service must re-implement from first principles.
```sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    AS RESTRICTIVE
    USING ((data->>'tenant_id')::uuid = current_setting('app.tenant_id')::uuid);

CREATE POLICY owner_read ON orders
    FOR SELECT
    USING ((data->>'owner_id')::uuid = current_setting('app.user_id')::uuid);
```

A brief note on the AS RESTRICTIVE modifier: by default, multiple policies on the same table combine with OR, which would broaden access rather than narrow it. Marking tenant_isolation as restrictive forces tenant containment on every query regardless of what the other policies say — the behaviour one almost always wants when layering a tenant predicate beneath a per-user one.
Row-level security shipped in PostgreSQL 9.5, in January 2016. A policy is a predicate attached to a table; the planner appends it to every query automatically. A tenant set through SET app.tenant_id at connection time sees only its own rows, and the enforcement lives in the database — not in the ORM, not in a middleware layer, and certainly not in a code reviewer's head, which is the worst possible location for a security boundary.
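In practice the binding happens once per transaction, which suits pooled connections; a sketch, with illustrative UUID values:

```sql
-- SET LOCAL scopes the settings to this transaction; the policies
-- read them back through current_setting(). Values are illustrative.
BEGIN;
SET LOCAL app.tenant_id = '0b6f10c2-93d4-4e58-9a51-000000000001';
SET LOCAL app.user_id   = '0b6f10c2-93d4-4e58-9a51-000000000002';

SELECT data->>'order_id' FROM orders;  -- only this tenant's rows appear
COMMIT;
```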
MongoDB's closest analogue historically lived at the platform layer in Atlas App Services (the platform formerly known as Realm), which let you define read and write filters per collection alongside field-level encryption for specific fields at rest. App Services was deprecated by MongoDB in September 2024 and reached end-of-life on 30 September 2025, so contemporary MongoDB deployments enforce comparable rules in the application layer, via driver-side middleware, or — for some shapes — via Queryable Encryption. These are real security primitives and serve real use cases; I would not wish to understate them. But they sit outside the database engine, and they do not compose with query planning the way a CREATE POLICY predicate does. A missed application-layer rule is a security gap. A missed CREATE POLICY is likewise a security gap — but the policy, once in place, applies to every query against that table forever after, without the application having to remember. Security that depends on remembering is, regrettably, security that occasionally forgets.
One Backup, One Restore
pg_dump captures the world. Tables, indexes, sequences, functions, views, row-level security policies, extensions, permissions — all of it — in one file, produced by one command, restored by one command.
```shell
pg_dump --format=custom --file=app.dump mydb
pg_restore --dbname=mydb_restored app.dump
```

Two lines. The entire application's persistent state, captured and replayable. For PostGIS and pgvector data, the custom format handles the extension types cleanly in current versions; the geometries and the embeddings travel with the rows that own them, as one would rather hope they would. Point-in-time recovery through WAL archiving provides the second half of the story — continuous backup rather than snapshot backup — and it is native to the engine.
MongoDB has mongodump, oplog-based point-in-time recovery, and a genuinely excellent Atlas-managed backup experience. None of this is missing, and it would be ungracious to suggest otherwise. The argument is not that MongoDB cannot be backed up; it is that in a two-database architecture, backup becomes two procedures, two retention policies, two restore tests, and a coordination problem when you need to restore both systems to the same moment in time. One database means one procedure and one moment. It is a quieter way to sleep at night.
The Comparison Table
| Capability | PostgreSQL | MongoDB |
|---|---|---|
| ACID across collections/tables | Native; MVCC since 6.5 (1999) | 4.0 (2018), bounded |
| SQL JOINs across document + relational | Unlimited, planner-optimized | $lookup, one foreign collection per stage |
| Spatial queries on documents | PostGIS in-database | Atlas geo indexes (collection-scoped) |
| Vector search alongside documents | pgvector in-database | Atlas Vector Search (separate index tier) |
| Full-text search on document fields | tsvector in-database | Atlas Search (Lucene, separate tier) |
| Recursive queries | WITH RECURSIVE | $graphLookup (weaker) |
| Row-level security | CREATE POLICY | Application layer (App Services retired Sep 2025) |
| Single backup command | pg_dump | mongodump (per deployment) |
| Cross-system sync required | No | Yes, for warehouse/analytics |
Honest Boundary — What MongoDB Still Does Better
I should be candid now, because a chapter that catalogued only the one side of the ledger would be the work of a salesman rather than a Waiter. There are things MongoDB does better, and the gap in those places is real.
Single-node write throughput. On pure write workloads — high-volume ingest, sustained bulk inserts — MongoDB tends to outperform PostgreSQL on equivalent hardware, typically in the 2.5–4x range for sustained document-ingest workloads, with the exact multiple varying by schema shape, durability settings, and client concurrency. This is not a configuration artefact; it is the consequence of a storage engine and a write path designed specifically around document appends. PostgreSQL's parallel COPY work and libpq pipeline-mode protocol improvements are active areas of ongoing development and will, in time, narrow the gap, but they do not close it today. Chapter 15 takes up this trade-off in detail. If your workload is dominated by raw write rate and your reads can tolerate eventual-consistency read replicas, MongoDB earns its place on your evaluation list, and I would not fault you for putting it there.
Extreme horizontal sharding. When you are operating fifty-plus nodes with multi-terabyte working sets, MongoDB's sharding architecture is purpose-built for precisely the operational model you need: automatic chunk balancing, config servers, mongos routers, a well-documented playbook for adding and draining shards. PostgreSQL has answers — Citus (now maintained by Microsoft as open source under the Azure umbrella), native partitioning, logical replication — and at moderate scale they work well indeed. But at the scale where you have a dedicated team whose full-time occupation is keeping the sharded cluster healthy, MongoDB's operational ergonomics are better. The threshold is a matter of judgement rather than a hard number, but somewhere past fifty nodes the balance tips.
If either of those shapes matches your workload, MongoDB is a reasonable answer and this book is decidedly not trying to talk you out of it. The argument for the other ninety percent of applications — the ones whose operational scale fits comfortably within a single PostgreSQL cluster — is the six capabilities above.
The six capabilities in this chapter are not a feature list to be weighed against another feature list. They are the shape of what happens when your documents live inside a relational database rather than next to one. Transactions span documents and tables because the transaction system was there first. Joins work across document and relational data because the planner does not know the difference and has no reason to care. Spatial and vector and full-text queries compose with document queries because all four live in the same engine. Row-level security applies to JSONB tables because a JSONB table is, after all, still a table. Backups capture everything because everything is in one place. Once documents are inside PostgreSQL, they inherit an ecosystem that MongoDB cannot replicate without becoming a different database entirely — which is not a verdict on MongoDB, but a description of what a relational substrate gives you that a document substrate, by its nature, does not.
The next chapter takes the other side of this argument and gives it equal weight — the trade-offs, the places where the operational cost of "one database" genuinely costs you something, the decisions a team ought to make with clear eyes and a full picture. The PostgreSQL advantages over MongoDB are real. So are the costs. We shall count those honestly too.
If you'll follow me through. I think you'll find the other side of the ledger rather more interesting than you might expect.