Appendix C: MongoDB Migration Checklist
The card to slip next to the place setting. Chapter 17 is the narrative; this is the checklist. Print it, paste it into the migration ticket, and tick the boxes in order. A migration that follows this list to the letter is a boring flip on a Tuesday afternoon — which is the correct sound a migration should make.
1. Audit — Before You Move a Byte
- List every MongoDB collection, with row count and approximate size (`db.getCollectionNames().forEach(name => print(name, db[name].stats().count, db[name].stats().size))`).
- Capture the indexes on each collection (`db.<col>.getIndexes()`).
- Capture the top three query shapes per collection, with `.explain("executionStats")` output for each.
- Flag any use of `$lookup`, `$graphLookup`, change streams, GridFS, TTL indexes, capped collections, or transactions.
- Write the per-collection inventory: name, count, indexes, top three query shapes, special features used.
If the audit turns up none of `$lookup`, `$graphLookup`, change streams, GridFS, or TTL, you are on the easy path. Most teams are.
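The shell one-liner above works; for a larger deployment, a short script that emits the whole inventory at once is easier to paste into the ticket. A sketch using `pymongo` (the URI and database name are placeholders; the summarizing function is pure, so it can be checked without a server):

```python
from dataclasses import dataclass

@dataclass
class CollectionAudit:
    name: str
    count: int
    size_bytes: int
    indexes: list

def summarize(name, stats, indexes):
    """Reduce one collection's collStats output and index list to an inventory row."""
    return CollectionAudit(
        name=name,
        count=stats.get("count", 0),
        size_bytes=stats.get("size", 0),
        indexes=[ix["name"] for ix in indexes],
    )

# Against a live server (hypothetical URI and database name):
#
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["appdb"]
#   inventory = [
#       summarize(name, db.command("collStats", name), db[name].list_indexes())
#       for name in db.list_collection_names()
#   ]
```

Query shapes and `$lookup`-style feature flags still come from the audit by hand; the script only automates the counting.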
2. Schema Mapping — The Document Before the Script
- For each collection, decide: pure JSONB, pure columns, or hybrid (Chapter 16's framework).
- For each field in the hybrid case, choose: stays in `data jsonb`, promoted to a typed column, or extracted as a generated column.
- Write the field-to-destination table per collection (the example template is in Chapter 17).
- Decide on `_id` strategy: mint fresh UUIDs, keeping MongoDB's 24-char hex as `legacy_id text` with a unique index where you need to preserve it.
- Confirm the column type is `jsonb`, never `json`. (`jsonb` is binary, indexable, deduplicating; `json` is a text blob.)
- Write the target DDL for every collection. Review with at least one other engineer.
3. Target Schema — Create the Tables
- `CREATE TABLE` for each collection, with `_id uuid PRIMARY KEY DEFAULT gen_random_uuid()`, `data jsonb NOT NULL`, `created_at timestamptz NOT NULL DEFAULT now()`, plus any typed columns and generated columns from step 2.
- Use `STORED` for generated columns you intend to index (works on every supported PostgreSQL release).
- Add `legacy_id text NOT NULL` plus a unique index where you are preserving MongoDB's `_id`.
- Confirm the schema with `\d <table>` in `psql`.
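Put together, the DDL for one collection might look like this sketch (a hypothetical `users` collection; the `email` generated column stands in for whatever fields step 2 promoted):

```sql
CREATE TABLE users (
    _id        uuid        PRIMARY KEY DEFAULT gen_random_uuid(),
    legacy_id  text        NOT NULL,   -- MongoDB's 24-char hex _id
    data       jsonb       NOT NULL,
    email      text        GENERATED ALWAYS AS (data->>'email') STORED,
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX users_legacy_id_key ON users (legacy_id);
```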
4. Data Export — Out of MongoDB
- For collections under a few million rows, use `mongoexport --jsonArray --uri=... --collection=<col> --out=<col>.json`.
- For larger collections, use `mongodump` (BSON output, faster, more compact).
- Note the Extended JSON v2 wrappers your data contains: `$oid`, `$date`, `$numberLong`, `$numberDecimal`, `$binary`. (Relaxed mode — the default — only wraps what JSON cannot represent.)
- Verify the export by counting lines (NDJSON) or array length and comparing to `db.<col>.countDocuments()`.
5. Normalization Pass — Extended JSON to PostgreSQL Types
- Write the normalization function: unwrap `$oid` into `legacy_id` and mint a fresh UUID for `_id`; pass `$date` through as ISO-8601 (PostgreSQL parses it natively into `timestamptz`); unwrap `$numberLong` and `$numberDecimal` into their string forms (cast to `bigint`/`numeric` on load); decode `$binary` base64 into `bytea`.
- Test the normalization on a sample of 100 documents per collection before running the full load.
- Confirm that nested `$date` and `$oid` values inside arrays are also unwrapped — the wrappers can occur at any depth.
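The recursion is the part teams most often get wrong, so here is a minimal sketch of the normalization pass in Python, assuming the Extended JSON v2 wrappers listed in step 4 (adapt the `$date` branch if your export used canonical rather than relaxed mode):

```python
import base64
import uuid

def normalize(value):
    """Unwrap Extended JSON v2 wrappers at any depth."""
    if isinstance(value, list):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        keys = set(value)
        if keys == {"$oid"}:
            return value["$oid"]                       # 24-char hex string
        if keys == {"$date"}:
            d = value["$date"]
            # relaxed mode is an ISO-8601 string; canonical mode nests epoch millis
            return d if isinstance(d, str) else str(d["$numberLong"])
        if keys == {"$numberLong"} or keys == {"$numberDecimal"}:
            return next(iter(value.values()))          # keep the string; cast on load
        if keys == {"$binary"}:
            return base64.b64decode(value["$binary"]["base64"])   # -> bytea
        return {k: normalize(v) for k, v in value.items()}
    return value

def normalize_document(doc):
    """Top-level pass: move the old _id into legacy_id, mint a fresh UUID for _id."""
    out = normalize(doc)
    out["legacy_id"] = out.pop("_id", None)
    out["_id"] = str(uuid.uuid4())
    return out
```

Run it over the 100-document sample per collection before trusting it with the full export.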
6. Data Load — Into PostgreSQL
- Use `COPY ... FROM STDIN` for the load (the fastest path; it accepts raw JSON for `jsonb` columns directly).
- Stream NDJSON through the loader; one `COPY` per collection per worker.
- On a laptop against a local PostgreSQL, expect 50,000–100,000 documents per second for small documents.
- After the load, verify counts: `SELECT count(*) FROM <table>` should equal `db.<col>.countDocuments()` exactly.
- Run `ANALYZE <table>` so the planner has fresh statistics before the next steps.
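The only fiddly part of the `COPY` path is escaping: backslash, tab, newline, and carriage return must be escaped in COPY text format or the load silently corrupts rows. A sketch of the row builder (column names follow the step 3 schema; the `psycopg` usage in the comment is one way to stream it):

```python
import json

def copy_escape(text):
    """Escape backslash, tab, newline, CR for PostgreSQL COPY text format."""
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def to_copy_row(doc):
    """One COPY row: _id, legacy_id, data. jsonb accepts the raw JSON text."""
    data = {k: v for k, v in doc.items() if k not in ("_id", "legacy_id")}
    return "\t".join((doc["_id"], doc["legacy_id"], copy_escape(json.dumps(data)))) + "\n"

# Streaming with psycopg 3 (connection details omitted):
#
#   with conn.cursor() as cur:
#       with cur.copy("COPY users (_id, legacy_id, data) FROM STDIN") as copy:
#           for doc in normalized_docs:
#               copy.write(to_copy_row(doc))
```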
7. Index Migration — Match Every MongoDB Index
For each MongoDB index from step 1, create the PostgreSQL equivalent:
- Single-field: `CREATE INDEX ON <table> ((data->>'field'))` (or on the generated column).
- Single-field unique: `CREATE UNIQUE INDEX ON <table> ((data->>'field'))`.
- Compound: composite expression B-tree, column order matching the most common query shape.
- Multikey: `CREATE INDEX ON <table> USING gin (data jsonb_path_ops)` (smaller, faster) or `jsonb_ops` if you need `?`, `?|`, `?&`.
- Text: `CREATE INDEX ON <table> USING gin (to_tsvector('english', data->>'field'))`.
- 2dsphere: PostGIS GiST on a `geography` generated column (requires the `postgis` extension).
- TTL: skip — use Gold Lapel's `doc_create_ttl_index`, or schedule a `pg_cron` `DELETE`.
- Use `CREATE INDEX CONCURRENTLY` on tables with traffic.
- Verify each index with `\d <table>` and `EXPLAIN` on the query it serves.
Or skip steps where Gold Lapel's auto-indexing will handle the pattern from observed traffic.
8. Application Code — Driver to Gold Lapel
- Replace the MongoDB driver import with `goldlapel`.
- Replace `db.<col>.find({...})` with `gl.doc_find("<col>", {...})`.
- Replace `db.<col>.insertOne`/`insertMany` with `doc_insert`/`doc_insert_many`.
- Replace `db.<col>.updateOne`/`updateMany` with `doc_update_one`/`doc_update`.
- Replace `db.<col>.deleteOne`/`deleteMany` with `doc_delete_one`/`doc_delete`.
- Replace `db.<col>.aggregate(...)` with `gl.doc_aggregate("<col>", [...])`. The pipeline ports unchanged.
- Replace change streams (`db.<col>.watch(...)`) with `gl.doc_watch("<col>", callback)`.
- Replace `findOneAndUpdate`/`findOneAndDelete` with the matching `doc_find_one_and_*`.
- Run the application's full test suite against the Gold Lapel build before any traffic moves.
The filter syntax, update operators, and pipeline stages port unchanged. The diff is small.
9. Dual-Write — Belt and Suspenders
- Configure the application to write to both MongoDB and PostgreSQL for a window measured in days, not hours.
- MongoDB remains the source of truth; reads still come from MongoDB.
- Wrap Gold Lapel writes in a try/catch — log failures, count them, do not fail the request.
- Build a dashboard for: row-count drift, write failure rate, p50/p95/p99 latency on `doc_*` calls.
- Drift is normal (clock skew, `$date` precision differences, occasional failures). What you want is drift that decreases over time and explains itself when asked.
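The write path itself can be one small wrapper: MongoDB first, as the source of truth, then a best-effort Gold Lapel write that is counted rather than raised. A sketch (the `gl.doc_insert` name follows step 8's mapping; the failure counter is what the drift dashboard reads):

```python
import logging

log = logging.getLogger("dualwrite")

class DualWriter:
    """MongoDB first (source of truth); Gold Lapel best-effort: logged, counted, never raised."""

    def __init__(self, mongo_col, gl, col_name):
        self.mongo_col = mongo_col
        self.gl = gl
        self.col_name = col_name
        self.gl_failures = 0          # export this to the drift dashboard

    def insert_one(self, doc):
        result = self.mongo_col.insert_one(doc)    # a failure here DOES fail the request
        try:
            self.gl.doc_insert(self.col_name, doc)
        except Exception:
            self.gl_failures += 1
            log.exception("gold lapel write failed: %s", self.col_name)
        return result
```

The same try/except shape applies to the update and delete calls; only the method names change.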
10. Validation — Three Checks
- Row counts. `SELECT count(*) FROM <table>` versus `db.<col>.countDocuments()` (the deprecated `count()` is not what you want).
- Windowed hash. `SELECT md5(string_agg(data::text, '' ORDER BY _id)) FROM <table> WHERE created_at > now() - interval '1 hour'` versus the equivalent MongoDB aggregation. (Note: `jsonb` normalizes whitespace and key order; canonicalize keys with `jsonb_object_agg` if you need byte-for-byte parity.)
- Aggregation parity. Run the application's three most important aggregations against both databases, compare results, and log any deltas.
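On the MongoDB side there is no built-in `string_agg`-plus-`md5`, so the windowed hash is easiest to compute client-side from a cursor over the same window. A sketch (the canonicalization here, sorted keys and compact separators, is an assumption; match it to whatever `data::text` rendering you settle on for the PostgreSQL side):

```python
import hashlib
import json

def windowed_hash(docs):
    """md5 over canonical JSON of each document, concatenated in _id order."""
    ordered = sorted(docs, key=lambda d: d["_id"])
    payload = "".join(
        json.dumps(d, sort_keys=True, separators=(",", ":")) for d in ordered
    )
    return hashlib.md5(payload.encode()).hexdigest()
```

Because the function sorts by `_id` itself, the two sides can feed it cursors in any order and still agree.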
11. Cutover — One Environment Variable
- Confirm validation has been clean for at least one full business day.
- Confirm latency on Gold Lapel reads is within budget (compare p50/p95/p99 to the MongoDB baseline).
- Flip reads to Gold Lapel via feature flag (`READ_BACKEND=goldlapel`).
- Watch the dashboard for the first hour. Then the first day. Then the first week.
- Keep dual-write active for at least one more window. Your rollback is now a configuration change.
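The "one environment variable" claim is literal. A sketch of the routing function (the backend handles are placeholders; rollback is unsetting the variable):

```python
import os

def read_backend(backends):
    """Route reads by the READ_BACKEND env var; the default stays on MongoDB."""
    return backends[os.environ.get("READ_BACKEND", "mongodb")]
```

Keeping the default on MongoDB means a crashed or missing flag fails toward the old, known-good read path.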
12. Stop Writing to MongoDB — Quietly
- After the cutover window has held, disable writes to MongoDB.
- Leave the MongoDB instance running, read-only, for as long as the invoice is bearable. This is your rollback path.
- Continue monitoring Gold Lapel reads at p50/p95/p99 and error rate.
13. Final Archive — Cold Storage
- Take a final cold archive:
mongodump --archive=final-<date>.archive --gzip. - Store it in your cold storage tier alongside other compliance archives.
- Restore it into a scratch instance to confirm the archive is good. (A backup that has not been restored is a hope, not a backup.)
14. Decommission — One Fewer Service, One Fewer Invoice
- Stop the MongoDB server.
- Remove the MongoDB driver from the application's dependencies.
- Delete MongoDB connection strings from the secrets manager.
- Remove the MongoDB row from the on-call runbook.
- Remove the MongoDB dashboard from the monitoring system.
- Cancel the Atlas subscription (or shut down the self-hosted instance).
- One fewer credential to rotate. One fewer backup to test. One fewer 3 AM call tree.
When to Pause the Checklist
Two situations where Chapter 15's honest boundaries apply:
- Write-dominated workloads. If the audit shows sustained write throughput where the 2.5–4x gap is load-bearing to the product, the migration may be premature. Wait for write acceleration on the Gold Lapel roadmap, or migrate now with eyes open.
- Extreme horizontal sharding. If you are running fifty or more shard nodes with multi-terabyte working sets, MongoDB's native sharding is more mature than anything you will assemble from PostgreSQL plus Citus today. Run MongoDB. The shape of the problem may change later.
MongoDB is excellent software. Operational simplicity is the argument of this book, not product quality. If either boundary is yours, the right tool is in your hand already.