You Don't Need MongoDB

Chapter 17: The Migration

The Waiter of Gold Lapel · Published Apr 19, 2026 · 14 min

The previous sixteen chapters have been the argument. This one, if you'll permit me, is the work.

If you are still reading, you have likely decided — or are nearly decided — that a MongoDB workload will be moving to PostgreSQL. I am delighted to hear it. The path is straightforward, and rather shorter than you probably expect: audit what you have, map it to a schema, move the bytes, rewrite the calls, verify the parity, and flip the switch. Six verbs. That is the whole chapter.

A note before I pour the wine. MongoDB is excellent software — genuinely well-engineered and built with care by people who understood their problem deeply. If you need MongoDB, run MongoDB. This chapter is for teams who have decided they do not, and the tone throughout reflects that. We are not leaving because the product is poor. We are leaving because one database is cheaper to run than two, and because Chapter 16 convinced us that the document model survives the move intact.

Chapter 16 is a prerequisite. The schema decisions happen there. What follows is pure execution.

Before You Migrate — The MongoDB Feature Audit

Before you write a line of migration code, allow me to suggest an inventory. In my experience, most teams use five MongoDB operations in earnest: insertOne/insertMany, find, updateOne/updateMany, aggregate, and deleteOne/deleteMany. Everything else is long-tail. The migration that sounded alarming at the planning meeting is, more often than not, a small one — and you will only know that for certain once you have looked.

Produce three lists: collections, query patterns, and indexes. The following mongosh commands generate them:

Audit commands
// Collections and approximate size
db.getCollectionNames().forEach(name => {
  const stats = db.getCollection(name).stats();
  print(name, stats.count, stats.size);
});

// Indexes for a given collection
db.users.getIndexes();

// A real query shape, with the planner's verdict
db.users.find({ tenant_id: "t_123", status: "active" })
  .sort({ created_at: -1 })
  .explain("executionStats");

For each collection, fill in a short template:

  • Name: users
  • Document count: 2.4M
  • Indexes: {email: 1} unique, {tenant_id: 1, created_at: -1}
  • Top three query shapes: by email, by tenant + recency, by _id
  • Special features used: none of $lookup, $graphLookup, change streams, GridFS, TTL

The last line is the one that saves weekends. $lookup translates to SQL joins, which is fine; $graphLookup becomes a recursive CTE, which takes slightly more care but no heroics. Change streams translate to LISTEN/NOTIFY plus logical replication — doable, but worth planning. GridFS is a different conversation entirely; it usually becomes object storage with a PostgreSQL metadata row alongside. If the audit turns up none of these, you are on the easy path, and I should like to congratulate you on the good taste of your prior architects.

A migration planned on assumption is a migration that finds its surprises at two in the morning. A migration planned on an audit finds its surprises on a Tuesday afternoon, which is where surprises belong.

Schema Mapping — From MongoDB Collection to PostgreSQL Table

Step two is a document, not a script. Before any data moves, I should like you to write down where every MongoDB field lands in PostgreSQL.

Chapter 16 gave you the decision framework. For each field, choose one of three destinations: stay in jsonb, get promoted to a typed column, or become a generated column. The typed column is for fields whose shape you have already stabilized — primary keys, tenancy, timestamps. The generated column is for fields you wish to index without asking the application to change how it writes. Everything else stays in jsonb, where it belongs and where it is, frankly, quite happy.

A mapping table for a users collection looks like this:

MongoDB field            | PostgreSQL destination                | Type        | Index intent
_id (ObjectId)           | legacy_id column                      | text        | unique B-tree
(new)                    | _id column                            | uuid        | primary key
created_at               | created_at column                     | timestamptz | B-tree
email                    | generated column from data->>'email'  | text        | unique B-tree
tenant_id                | JSONB path data->>'tenant_id'         | text        | expression B-tree
profile.*, preferences.* | stays in data                         | jsonb       | none

The target DDL:

Target table DDL
CREATE TABLE users (
  _id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  legacy_id   text NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now(),
  data        jsonb NOT NULL,
  email       text GENERATED ALWAYS AS (data->>'email') STORED
);

Two PostgreSQL details worth naming plainly. Generated columns arrived in PostgreSQL 12 as STORED only; PostgreSQL 18 (released September 2025) added VIRTUAL generated columns and made them the default when the keyword is omitted. For a migration where you want the generated value indexable without recomputation on every read, stay explicit and write STORED — it works on every supported release and is the right choice when you plan to index the column. And the column type is jsonb, never json. jsonb is the binary, indexable, deduplicating form; json is a text blob that happens to validate. For queryable document data the answer is always jsonb. I shall not let you choose wrongly here on my watch.

Appendix A reproduces the full field-type-index mapping table. Keep it to hand while you write the DDL.

Exporting from MongoDB with mongoexport and Extended JSON

Now the bytes move. For small-to-medium collections, mongoexport is the friendly starting point — a gracious little tool that asks nothing of you beyond a connection string:

mongoexport
mongoexport \
  --uri="mongodb://user:pass@host:27017/appdb" \
  --collection=users \
  --out=users.json \
  --jsonArray

Without --jsonArray, mongoexport emits newline-delimited JSON, one document per line. With it, you get a single JSON array. NDJSON streams better for large collections; the array form is easier to hand to a script that expects "a JSON document." Pick whichever matches your loader.

For very large collections — tens of gigabytes and up — reach for mongodump. The BSON output is faster to produce and more compact on disk, at the cost of needing a BSON-aware loader on the other side. Most teams never need this; if you do, you already know.
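If an export was taken with --jsonArray but your loader expects the newline-delimited form, the conversion is a few lines of Python. A sketch — it reads the whole array into memory, which is fine for small exports and exactly wrong for the mongodump-sized ones:

```python
import json

def array_to_ndjson(array_text: str) -> str:
    """Convert a --jsonArray export (one big JSON array) into
    newline-delimited JSON, one document per line."""
    return "\n".join(json.dumps(doc) for doc in json.loads(array_text))

exported = '[{"a": 1}, {"b": 2}]'   # stand-in for the file contents
print(array_to_ndjson(exported))
```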

mongoexport emits MongoDB Extended JSON, and there is one thing I must insist you understand about it. BSON has types that JSON does not — ObjectId, date, 64-bit integer, decimal, binary — and Extended JSON encodes them as tagged objects. A document that looks like this in Compass:

In Compass
{ _id: ObjectId("6512a9b4e4b0c3f1a2d4e5f6"),
  created_at: ISODate("2025-09-27T10:30:00Z"),
  balance: NumberDecimal("42.50") }

…arrives in the export file dressed rather differently:

In Extended JSON
{
  "_id":        { "$oid": "6512a9b4e4b0c3f1a2d4e5f6" },
  "created_at": { "$date": "2025-09-27T10:30:00Z" },
  "balance":    { "$numberDecimal": "42.50" }
}

By default, mongoexport uses Extended JSON v2 in relaxed mode, which means ordinary 32-bit integers and finite floats come out as plain JSON numbers. Only the types JSON cannot represent get the $-tagged wrapper. A considerate choice, and also the thing your loader has to normalize before it reaches PostgreSQL. The common list to handle: $oid, $date, $numberLong, $numberDecimal, $binary.

Five wrappers. Learn them once, and the rest of the export file is simply the document you remember.
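A recursive unwrapper for those five can be sketched in a dozen lines. This is a sketch, not a complete Extended JSON parser — real exports can also carry $numberDouble for non-finite floats, $timestamp, and other tagged types that would need cases of their own:

```python
import json

# The five common wrappers, unwrapped recursively. $binary carries a
# nested {"base64": ..., "subType": ...} object; the rest wrap a string.
TAGS = {"$oid", "$date", "$numberLong", "$numberDecimal", "$binary"}

def unwrap(value):
    if isinstance(value, dict):
        if len(value) == 1 and next(iter(value)) in TAGS:
            tag, inner = next(iter(value.items()))
            if tag == "$binary":
                return inner["base64"]   # keep the payload, drop the subtype
            return inner                 # the wrapped scalar, as a string
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value

doc = json.loads('{"_id": {"$oid": "6512a9b4e4b0c3f1a2d4e5f6"},'
                 ' "n": {"$numberLong": "9007199254740993"}}')
print(unwrap(doc))
```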

Loading into PostgreSQL — COPY, Bulk INSERT, and ObjectId to UUID Conversion

On the load side, you have two reasonable approaches. For collections under a few million rows, a batched INSERT from a small script is simple and perfectly sufficient. For anything larger, I would suggest COPY ... FROM STDIN — it is the fastest path into PostgreSQL, and it accepts raw JSON text for jsonb columns directly (confirmed in PostgreSQL 14 and later; the behavior is older than that, but 14 is the oldest fully supported line as of this writing).

The ObjectId question deserves a clear answer, and I would not be much of a waiter if I hedged on it. A MongoDB ObjectId is 12 bytes, 24 hex characters, not a UUID. The first four bytes are a Unix timestamp (second granularity — fine for coarse ordering, useless for millisecond sorts), followed by a random value and a counter. The pragmatic migration answer is to mint fresh UUIDs for the primary key and keep the 24-character hex string as a legacy_id text column with a unique index. Existing external references — webhooks, cached URLs, third-party systems — continue to resolve without incident, and your new ids are proper UUIDs from the first day of operation.

Legacy ID index
CREATE UNIQUE INDEX users_legacy_id_idx ON users(legacy_id);
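The timestamp claim is cheap to verify for yourself. A small sketch that recovers the creation time from the 24-hex-character form (the helper name is mine):

```python
from datetime import datetime, timezone

def objectid_created_at(oid_hex: str) -> datetime:
    """The first 4 bytes (8 hex characters) of an ObjectId are a
    big-endian Unix timestamp -- second granularity only."""
    return datetime.fromtimestamp(int(oid_hex[:8], 16), tz=timezone.utc)

print(objectid_created_at("6512a9b4e4b0c3f1a2d4e5f6"))
```

Second granularity is exactly why the target schema keeps a real created_at column rather than leaning on the id for ordering.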

Here is the shape of a Python loader that streams NDJSON through COPY with on-the-fly Extended JSON normalization:

Python COPY loader
import json, sys, uuid, psycopg

def normalize(doc):
    oid = doc.pop("_id")["$oid"]
    created = doc.pop("created_at")["$date"]
    if "balance" in doc and isinstance(doc["balance"], dict):
        doc["balance"] = doc["balance"]["$numberDecimal"]
    return str(uuid.uuid4()), oid, created, json.dumps(doc)

with psycopg.connect("postgresql://localhost/appdb") as conn:
    with conn.cursor().copy(
        "COPY users (_id, legacy_id, created_at, data) FROM STDIN"
    ) as copy:
        for line in sys.stdin:
            doc = json.loads(line)
            copy.write_row(normalize(doc))

Three transformations per document. The $oid is unwrapped into legacy_id; the $date is passed through as ISO-8601 (PostgreSQL parses it into timestamptz without assistance); and $numberDecimal is unwrapped into its string form — the same pattern covers any $numberLong fields your audit turned up. Everything else — nested objects, arrays, booleans, plain numbers — goes into data untouched.

On a laptop against a local PostgreSQL, this loader moves roughly 50,000–100,000 documents per second for small documents. Production numbers depend on document size and network, but COPY is not going to be your bottleneck. If something in the pipeline is slow, it will be somewhere more interesting.

Index Migration — Mapping MongoDB Indexes to PostgreSQL

With data jsonb, most indexes you port are expression indexes — indexes on a computed path into the document — not column indexes. The mapping looks like this:

MongoDB index                           | PostgreSQL equivalent                          | Notes
Single-field {email: 1}                 | B-tree expression index on (data->>'email')    | Or on a generated column
Compound {tenant_id: 1, created_at: -1} | Composite B-tree on the same expressions       | Column order matches the query
Multikey (on array fields)              | GIN on data with jsonb_path_ops                | Smaller and faster, but no key-exists operators
Text                                    | GIN on to_tsvector('english', data->>'body')   | See Chapter 13
2dsphere                                | PostGIS GiST on a geography generated column   | Requires the PostGIS extension
TTL                                     | Scheduled DELETE via pg_cron, or partition drop| No native TTL
Unique                                  | CREATE UNIQUE INDEX on the expression          | Same semantics

A concrete example for the three most common cases:

Index migration
-- Single-field
CREATE UNIQUE INDEX users_email_idx ON users ((data->>'email'));

-- Compound. Use the typed created_at column from the DDL above:
-- a text-to-timestamptz cast is only STABLE, not IMMUTABLE, so
-- PostgreSQL rejects it inside an index expression.
CREATE INDEX users_tenant_created_idx
  ON users ((data->>'tenant_id'), created_at DESC);

-- Multikey / containment
CREATE INDEX users_data_gin_idx
  ON users USING gin (data jsonb_path_ops);

Two honest notes I should not keep from you. jsonb_path_ops is smaller and faster than the default jsonb_ops opclass, and it supports @>, @?, and @@ — but not the key-exists operators ?, ?|, or ?&. If you need those, stay on jsonb_ops. And PostgreSQL has no native TTL — no amount of searching the documentation will produce what is not there. pg_cron or time-range partitioning is the genuine answer, and either one is operationally cleaner than MongoDB's background TTL thread once you have lived with them for a quarter.

Gold Lapel's auto-indexing can create most of these for you from query telemetry, as described in Chapter 11. For a migration, I would recommend creating the indexes you already know you need by hand — the ones your audit turned up — and letting auto-indexing fill in the rest once reads have moved over. A waiter lays the table he knows will be needed, and lets the guests ask for what else they might like.

Rewriting Application Code — From MongoDB Driver to Gold Lapel

This is the part readers worry about most, and it turns out to be the smallest. Gold Lapel's document methods — doc_find, doc_insert, doc_update, doc_aggregate — mirror the MongoDB driver's filter and pipeline syntax deliberately. The filter grammar is preserved because the cognitive cost of learning a second one was, in our judgement, not worth the aesthetic satisfaction of inventing our own. Familiarity is a kindness.

A realistic before-and-after. In Node, against the MongoDB driver:

MongoDB driver
const users = await db.collection("users")
  .find({ tenant_id: t, status: "active" })
  .sort({ created_at: -1 })
  .limit(50)
  .toArray();

Through Gold Lapel:

Gold Lapel
const users = await gl.doc_find("users",
  { tenant_id: t, status: "active" },
  { sort: { created_at: -1 }, limit: 50 }
);

The filter document is identical. The sort and limit move from fluent chaining into an options object, and that is the only real ergonomic change the migration asks of you. For aggregations, the pipeline passes through unchanged:

Aggregation pipeline
const revenue = await gl.doc_aggregate("users", [
  { $match: { tenant_id: t, status: "active" } },
  { $group: { _id: "$plan", total: { $sum: "$mrr" } } },
  { $sort: { total: -1 } }
]);

The SQL this generates — the CTE chain, the JSONB path expressions, the occasional materialized-view shortcut — is treated fully in Chapter 12. Gold Lapel is available in seven languages (Python, Node.js, Ruby, Java, PHP, Go, and .NET), and the method names and filter syntax are identical across all of them. The full signature reference is Appendix B.

One grammar across seven languages. It is the least glamorous feature in the product, and the one most responsible for migrations that finish on schedule.
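To demystify what the options object becomes, here is an illustrative sketch — emphatically not Gold Lapel's actual code generation, and handling only equality filters — of how a filter plus sort and limit could compile to parameterized SQL over the data column:

```python
def compile_find(table, filt, sort=None, limit=None):
    """Toy translation of a Mongo-style equality filter with sort/limit
    options into parameterized SQL. Illustrative only."""
    where = " AND ".join(f"data->>'{k}' = %s" for k in filt)
    sql = f"SELECT data FROM {table} WHERE {where}"
    params = list(filt.values())
    if sort:
        order = ", ".join(
            f"data->>'{k}' {'DESC' if d == -1 else 'ASC'}" for k, d in sort.items()
        )
        sql += f" ORDER BY {order}"
    if limit:
        sql += f" LIMIT {int(limit)}"
    return sql, params

sql, params = compile_find("users", {"tenant_id": "t_123", "status": "active"},
                           sort={"created_at": -1}, limit=50)
print(sql)
```

The real translation (type-aware casts, @> containment, the CTE machinery) is Chapter 12's subject; the point here is only that the filter document maps mechanically.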

The Feature-to-Method Mapping Table

This is the table to keep open in another tab. I should like to make your life easier here:

MongoDB                 | Gold Lapel                                  | PostgreSQL mechanism                  | Chapter
insertOne / insertMany  | doc_insert / doc_insert_many                | INSERT ... RETURNING                  | 11
find                    | doc_find                                    | SELECT ... WHERE data @> ...          | 10
findOne                 | doc_find_one                                | SELECT ... LIMIT 1                    | 10
updateOne / updateMany  | doc_update_one / doc_update                 | UPDATE ... SET data = jsonb_set(...)  | 11
deleteOne / deleteMany  | doc_delete_one / doc_delete                 | DELETE                                | 11
aggregate               | doc_aggregate                               | CTE chain, optionally materialized    | 12
countDocuments          | doc_count                                   | SELECT count(*)                       | 9
distinct                | doc_distinct                                | SELECT DISTINCT                       | 11
createIndex             | doc_create_index                            | CREATE INDEX (expression or GIN)      | 6
findOneAndUpdate        | doc_find_one_and_update                     | UPDATE ... RETURNING in a CTE         | 11
findOneAndDelete        | doc_find_one_and_delete                     | DELETE ... RETURNING in a CTE         | 11
Change streams          | doc_watch / doc_unwatch                     | row trigger + LISTEN / NOTIFY         | 13
TTL expiry              | doc_create_ttl_index / doc_remove_ttl_index | expression index + sweep trigger     | 13
Capped collections      | doc_create_capped / doc_remove_cap          | row-count trigger + ordering index    | 13
Full-text search        | doc_find with $text                         | tsvector + GIN                        | 5
Transactions            | ambient via using(conn)                     | BEGIN / COMMIT on the connection      | 14

The table covers the MongoDB operations most migrations encounter. The full set of 21 methods lives in Appendix B. Chapter numbers point to where each method is explained in depth, not merely named — if a row raises a question, the chapter contains its answer.

Dual-Write and Validation

Step six is belt and suspenders, and I make no apology for the redundancy — a migration is one of the few occasions where wearing both is not only acceptable but expected. The application writes to both MongoDB and PostgreSQL for a window — days, not hours, for anything you genuinely care about — while reads still come from MongoDB. This arrangement buys you two things: a rollback path that is a configuration flip, and a running parity check that reports itself as you go.

The write path looks like this in pseudocode:

Dual write
async function createUser(input) {
  const user = await mongo.collection("users").insertOne(input);
  try {
    await gl.doc_insert("users", { ...input, legacy_id: user.insertedId.toString() });
  } catch (err) {
    log.warn("gl_write_failed", { id: user.insertedId, err });
  }
  return user;
}

MongoDB remains the source of truth throughout this window. Gold Lapel writes are best-effort; failures are logged, counted, and investigated, but they do not fail the request. You are measuring, not yet committing.

Validation happens in three flavours:

Three validation checks
-- 1. Row counts. Compare with db.users.countDocuments() in mongosh.
SELECT count(*) FROM users;

-- 2. A deterministic hash over a recent window. Order by legacy_id so
--    the same window can be hashed identically on the MongoDB side.
SELECT md5(string_agg(data::text, '' ORDER BY legacy_id))
  FROM users
  WHERE created_at > now() - interval '1 hour';

-- 3. An aggregation parity check.
SELECT data->>'plan' AS plan, count(*)
  FROM users
  WHERE data->>'status' = 'active'
  GROUP BY 1
  ORDER BY 1;

Three cautions on the parity checks, and I would rather deliver them now than have you discover them on a dashboard. countDocuments() is the accurate MongoDB count — the older count() is deprecated, and I would not have you comparing against it. jsonb normalizes whitespace and key order on ingest, so the hash of a data::text column will not match the hash of the original export file byte-for-byte; canonicalize keys with jsonb_object_agg over a sorted key set if an exact diff is what you require. And $numberDecimal round-trips correctly provided you keep it as a string and convert to numeric only when arithmetic is needed — if you let it pass through JavaScript as a Number, precision will leave quietly on values outside the 53-bit range, and the loss is not one you notice until it matters.
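Both the canonicalization and the precision trap are easy to rehearse application-side before you trust a dashboard to them. A sketch — the helper is mine, and it is the application-side analogue of the jsonb_object_agg approach, not a byte-for-byte match to jsonb's own serialization:

```python
import hashlib, json
from decimal import Decimal

def canonical_md5(doc):
    """Serialize with sorted keys and fixed separators, so the digest
    ignores key order and incidental whitespace."""
    canon = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canon.encode()).hexdigest()

a = {"status": "active", "plan": "pro"}
b = {"plan": "pro", "status": "active"}          # same document, reordered
assert canonical_md5(a) == canonical_md5(b)

# $numberDecimal kept as a string survives; routed through a float it does not.
big = "9007199254740993"                          # 2**53 + 1
assert Decimal(big) == Decimal("9007199254740993")
assert Decimal(big) != Decimal(float(big))        # the float lost the last digit
print(canonical_md5(a))
```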

Drift is entirely normal. Clock skew at write time, $date precision (MongoDB stores milliseconds; PostgreSQL timestamptz stores microseconds — so MongoDB → PostgreSQL is lossless, but any round-trip back to MongoDB truncates sub-millisecond digits), and the occasional failed Gold Lapel write will all show up as tiny differences. The job is not zero drift. The job is drift that decreases over time and explains itself when asked.
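The $date precision point in miniature — what a timestamptz value would lose on a round-trip back into BSON (a sketch; the helper name is mine):

```python
from datetime import datetime, timezone

def to_millis(dt: datetime) -> datetime:
    """Truncate microsecond precision to milliseconds, as storing the
    value back into a BSON $date would."""
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)

pg_value = datetime(2026, 4, 18, 10, 30, 0, 123456, tzinfo=timezone.utc)
print(pg_value.microsecond, to_millis(pg_value).microsecond)   # 123456 123000
```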

Cutover and Rollback Safety

Step seven is the flip, and if the prior steps were attended to with care, it is smaller than you expect. One does not rush a proper cutover — but one does not need to linger at it, either, once the table is set.

Point reads at Gold Lapel via a feature flag. Keep writes dual for one more window — long enough to observe p50/p95/p99 on doc_find, the error rate, and row-count drift across the dashboard you built during validation. Then stop writing to MongoDB, but leave the instance running, read-only, for as long as you can bear the invoice. Your rollback is now a configuration change, not a restore from archive.

The cutover itself is one environment variable:

Cutover flag
READ_BACKEND=goldlapel

That is the whole runbook. Everything before it — the migration, the validation, the dual-write window — existed so that the runbook could be one line. A boring flip on a Tuesday afternoon is the correct sound a migration should make. The dramatic ones, as a rule, belong to stories told afterwards rather than experiences had first-hand.

Honest Boundary — When the Migration Is Not a Weekend

I should be forthcoming about the cases where the path in this chapter does not fit, because pretending otherwise would be a disservice to you and an embarrassment to me.

Write-heavy, latency-sensitive workloads. The current performance gap on pure write throughput is 2.5–4x in MongoDB's favour. Parallel COPY and pipeline mode are on the Gold Lapel roadmap — they will close most of the gap when they ship — but they are not here today. If your workload is write-dominated and a 2.5–4x difference is load-bearing to the product, migrate when the write acceleration work lands, or migrate now with clear eyes on the gap. Do not migrate and then discover it.

Extreme horizontal sharding. If you are running fifty or more shard nodes, or your hot working set is in the multi-terabyte range, MongoDB's sharding story is more mature than anything you will assemble from a single PostgreSQL primary plus Citus, plus partitioning, plus read replicas. This is a genuine boundary, not a footnote. PostgreSQL will arrive there in time; it is not there for workloads at that scale today, and I will not pretend that it is.

MongoDB is excellent software, and in both of the situations above it is the correct tool in the correct hand. Operational simplicity is the argument of this book, not product quality. If either boundary is yours, run MongoDB, and come back when the shape of the problem changes. We stand on MongoDB's shoulders in the document-database story; your application may not need to, but the tradition is real, and we acknowledge it with gratitude.

Decommission — One Fewer Service, One Fewer Invoice

Step eight is the payoff, and I confess I enjoy this one.

Tear down the dual-write pipeline. Take a final cold archive:

Final archive
mongodump --archive=final-2026-04-18.archive --gzip

Store it wherever your cold storage lives, confirm it restores into a scratch instance, and stop the MongoDB server. Remove the driver from the application:

Remove MongoDB driver
- "mongodb": "^6.5.0",
  "pg": "^8.11.0",
  "goldlapel": "^1.2.0",

Delete the connection strings from your secrets manager. Delete the MongoDB row from your on-call runbook. Delete the "MongoDB — prod" dashboard. Each of these is a small pleasure in isolation, and taken together they are the operational-cost argument this book has been making since Chapter 1. The quietest dashboard is the one that has been removed.

A Migration Checklist You Can Copy

If I may, here is the card I would slip next to the place setting:

  1. Inventory collections, query patterns, and indexes with mongosh.
  2. Flag use of $lookup, $graphLookup, change streams, GridFS, and TTL.
  3. Write the schema mapping document (field → destination → type → index intent).
  4. Create target tables with jsonb data columns and generated columns for hot query paths.
  5. Export with mongoexport --jsonArray (or mongodump for very large collections).
  6. Write the normalization pass: $oid, $date, $numberLong, $numberDecimal, $binary.
  7. Load through COPY ... FROM STDIN; keep the 24-char _id as legacy_id, mint fresh UUIDs.
  8. Create expression indexes matching the MongoDB indexes from step 1.
  9. Rewrite application calls against Gold Lapel — filters port unchanged.
  10. Enable dual-write; MongoDB remains the source of truth.
  11. Run validation: row counts, windowed hashes, aggregation parity.
  12. Flip reads to Gold Lapel behind a feature flag; watch p50/p95/p99 and error rate.
  13. Stop writing to MongoDB; leave it running read-only for the rollback window.
  14. Take a final mongodump --archive --gzip; stop the instance; remove the driver.

Screenshot it, paste it into the migration ticket, check the boxes in order. A checklist is not glamorous, and it is not meant to be. It is meant to be followed.


Migration is the shortest chapter in this book and the longest one teams remember. Most of what you have just read is process; the interesting technical moves — the schema decisions in Chapter 16, the pipeline mechanics in Chapter 12, the index design in Chapter 11 — were attended to elsewhere. What this chapter offers you is the ordering that makes those decisions deployable without a ruined weekend.

Chapter 18 turns from migrating onto PostgreSQL to scaling on it — the five-rung ladder that begins at a single well-tuned node and climbs, one measured step at a time, through pooling, read replicas, declarative partitioning, Citus, and the Gold Lapel mesh for multi-region cache coherence. Most teams never climb past the second rung; knowing the full ladder is what lets you stay on the rung your workload actually requires rather than the one last week's architecture blog convinced you of. If you'll follow me, I shall put the kettle on.