You Don't Need MongoDB

Appendix C: MongoDB Migration Checklist

The Waiter of Gold Lapel · Published Apr 19, 2026 · 5 min

The card to slip next to the place setting. Chapter 17 is the narrative; this is the checklist. Print it, paste it into the migration ticket, and tick the boxes in order. A migration that follows this list to the letter is a boring flip on a Tuesday afternoon — which is the correct sound a migration should make.

1. Audit — Before You Move a Byte

  • List every MongoDB collection, with document count and approximate size (db.getCollectionNames().forEach(name => { const s = db[name].stats(); print(name, s.count, s.size); })).
  • Capture the indexes on each collection (db.<col>.getIndexes()).
  • Capture the top three query shapes per collection, with .explain("executionStats") output for each.
  • Flag any use of $lookup, $graphLookup, change streams, GridFS, TTL indexes, capped collections, or transactions.
  • Write the per-collection inventory: name, count, indexes, top three query shapes, special features used.

If the audit turns up none of $lookup, $graphLookup, change streams, GridFS, or TTL, you are on the easy path. Most teams are.

2. Schema Mapping — The Document Before the Script

  • For each collection, decide: pure JSONB, pure columns, or hybrid (Chapter 16's framework).
  • For each field in the hybrid case, choose: stays in data jsonb, promoted to a typed column, or extracted as a generated column.
  • Write the field-to-destination table per collection (the example template is in Chapter 17).
  • Decide on the _id strategy: mint fresh UUIDs for _id, keeping MongoDB's 24-char hex as legacy_id text with a unique index wherever the old identifier still matters.
  • Confirm the column type is jsonb, never json. (jsonb is binary, indexable, deduplicating; json is a text blob.)
  • Write the target DDL for every collection. Review with at least one other engineer.
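
The _id decision above can be sketched in a few lines. This is a minimal illustration of the mint-fresh-UUID-plus-legacy_id approach, not library code; map_identity is a hypothetical helper name.

```python
import re
import uuid

OBJECT_ID = re.compile(r"[0-9a-f]{24}")

def map_identity(mongo_id: str) -> dict:
    """Mint a fresh UUID for _id; keep the MongoDB ObjectId as legacy_id."""
    if not OBJECT_ID.fullmatch(mongo_id):
        raise ValueError(f"not a 24-char hex ObjectId: {mongo_id!r}")
    return {"_id": str(uuid.uuid4()), "legacy_id": mongo_id}
```

The unique index on legacy_id (step 3) is what lets old URLs and foreign references keep resolving after the flip.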

3. Target Schema — Create the Tables

  • CREATE TABLE for each collection, with _id uuid PRIMARY KEY DEFAULT gen_random_uuid(), data jsonb NOT NULL, created_at timestamptz NOT NULL DEFAULT now(), plus any typed columns and generated columns from step 2.
  • Use STORED for generated columns you intend to index (works on every supported PostgreSQL release).
  • Add legacy_id text NOT NULL plus a unique index where you are preserving MongoDB's _id.
  • Confirm the schema with \d <table> in psql.

4. Data Export — Out of MongoDB

  • For collections under a few million documents, use mongoexport --uri=... --collection=<col> --out=<col>.ndjson (the default output is NDJSON, one document per line; add --jsonArray only if a downstream tool needs a single array).
  • For larger collections, use mongodump (BSON output, faster, more compact).
  • Note the Extended JSON v2 wrappers your data contains: $oid, $date, $numberLong, $numberDecimal, $binary. (Relaxed mode — the default — only wraps what JSON cannot represent.)
  • Verify the export by counting lines (NDJSON) or array length and comparing to db.<col>.countDocuments().

5. Normalization Pass — Extended JSON to PostgreSQL Types

  • Write the normalization function: unwrap $oid into legacy_id and mint a fresh UUID for _id; pass $date through as ISO-8601 (PostgreSQL parses it natively into timestamptz); unwrap $numberLong and $numberDecimal into their string forms (cast to bigint / numeric on load); decode $binary base64 into bytea.
  • Test the normalization on a sample of 100 documents per collection before running the full load.
  • Confirm that nested $date and $oid values inside arrays are also unwrapped — the wrappers can occur at any depth.
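
The normalization rules above reduce to one recursive function. A sketch of the step-5 rules under relaxed Extended JSON v2 assumptions; the caller is expected to handle the document-level _id split from step 2.

```python
import base64

def normalize(value):
    """Recursively unwrap Extended JSON v2 wrappers into plain values
    that PostgreSQL can cast on load. Wrappers can occur at any depth."""
    if isinstance(value, dict):
        if "$oid" in value:
            return value["$oid"]            # caller keeps this as legacy_id
        if "$date" in value:
            d = value["$date"]
            # relaxed mode gives ISO-8601 (timestamptz parses it natively);
            # canonical mode gives {"$numberLong": millis} — recurse for that
            return normalize(d) if isinstance(d, dict) else d
        if "$numberLong" in value:
            return value["$numberLong"]     # string form; cast to bigint on load
        if "$numberDecimal" in value:
            return value["$numberDecimal"]  # string form; cast to numeric on load
        if "$binary" in value:
            return base64.b64decode(value["$binary"]["base64"])  # → bytea
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value
```

Running this over the 100-document sample first catches the wrapper shapes your data actually uses before the full load does.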

6. Data Load — Into PostgreSQL

  • Use COPY ... FROM STDIN for the load (the fastest path; accepts raw JSON for jsonb columns directly).
  • Stream NDJSON through the loader; one COPY per collection per worker.
  • On a laptop against a local PostgreSQL, expect 50,000–100,000 documents per second for small documents.
  • After the load, verify counts: SELECT count(*) FROM <table> should equal db.<col>.countDocuments() exactly.
  • Run ANALYZE <table> so the planner has fresh statistics before the next steps.
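
COPY's text format needs backslashes, tabs, and newlines escaped before the JSON goes down the pipe. A sketch of that framing, assuming the (_id, legacy_id, data) column layout from step 3; copy_escape and copy_row are illustrative names.

```python
import json

def copy_escape(field: str) -> str:
    """Escape one field for COPY ... FROM STDIN in text format."""
    return (field.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def copy_row(new_id: str, legacy_id: str, doc: dict) -> str:
    """One tab-separated COPY row for (_id, legacy_id, data)."""
    payload = json.dumps(doc, separators=(",", ":"))
    return "\t".join(copy_escape(f) for f in (new_id, legacy_id, payload)) + "\n"
```

With psycopg2, rows framed this way might be streamed via cur.copy_expert("COPY orders (_id, legacy_id, data) FROM STDIN", source); the exact driver call is up to your loader.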

7. Index Migration — Match Every MongoDB Index

For each MongoDB index from step 1, create the PostgreSQL equivalent:

  • Single-field: CREATE INDEX ON <table> ((data->>'field')) (or on the generated column).
  • Single-field unique: CREATE UNIQUE INDEX ON <table> ((data->>'field')).
  • Compound: composite expression B-tree, column order matching the most common query shape.
  • Multikey: CREATE INDEX ON <table> USING gin (data jsonb_path_ops) (smaller, faster) or jsonb_ops if you need ?, ?|, ?&.
  • Text: CREATE INDEX ON <table> USING gin (to_tsvector('english', data->>'field')).
  • 2dsphere: PostGIS GiST on a geography generated column (requires the postgis extension).
  • TTL: no native PostgreSQL equivalent — use Gold Lapel's doc_create_ttl_index, or schedule a pg_cron DELETE.
  • Use CREATE INDEX CONCURRENTLY on tables with traffic.
  • Verify each index with \d <table> and EXPLAIN on the query it serves.

Or skip steps where Gold Lapel's auto-indexing will handle the pattern from observed traffic.
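
The single-field and multikey rows of the table above translate mechanically. A sketch, covering only those two cases; translate_index is a hypothetical helper, and real compound indexes still need a human to pick column order.

```python
def translate_index(table: str, key: dict, unique: bool = False,
                    multikey: bool = False) -> str:
    """Map one audited MongoDB index spec onto step 7's B-tree / GIN forms."""
    if multikey:
        # array fields → GIN; jsonb_path_ops supports containment (@>) only
        return f"CREATE INDEX ON {table} USING gin (data jsonb_path_ops);"
    exprs = ", ".join(f"(data->>'{field}')" for field in key)
    head = "CREATE UNIQUE INDEX" if unique else "CREATE INDEX"
    return f"{head} ON {table} ({exprs});"
```

Add CONCURRENTLY by hand on live tables; it cannot run inside a transaction block, so a generator that batches statements should keep it out of the batch.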

8. Application Code — Driver to Gold Lapel

  • Replace the MongoDB driver import with goldlapel.
  • Replace db.<col>.find({...}) with gl.doc_find("<col>", {...}).
  • Replace db.<col>.insertOne / insertMany with doc_insert / doc_insert_many.
  • Replace db.<col>.updateOne / updateMany with doc_update_one / doc_update.
  • Replace db.<col>.deleteOne / deleteMany with doc_delete_one / doc_delete.
  • Replace db.<col>.aggregate(...) with gl.doc_aggregate("<col>", [...]). The pipeline ports unchanged.
  • Replace change streams (db.<col>.watch(...)) with gl.doc_watch("<col>", callback).
  • Replace findOneAndUpdate / findOneAndDelete with the matching doc_find_one_and_*.
  • Run the application's full test suite against the Gold Lapel build before any traffic moves.

The filter syntax, update operators, and pipeline stages port unchanged. The diff is small.

9. Dual-Write — Belt and Suspenders

  • Configure the application to write to both MongoDB and PostgreSQL for a window measured in days, not hours.
  • MongoDB remains the source of truth; reads still come from MongoDB.
  • Wrap Gold Lapel writes in a try/catch — log failures, count them, do not fail the request.
  • Build a dashboard for: row-count drift, write failure rate, p50/p95/p99 latency on doc_* calls.
  • Drift is normal (clock skew, $date precision differences, occasional failures). What you want is drift that decreases over time and has an explanation when you go looking.

10. Validation — Three Checks

  • Row counts. SELECT count(*) FROM <table> versus db.<col>.countDocuments() (the deprecated count() is not what you want).
  • Windowed hash. SELECT md5(string_agg(data::text, '' ORDER BY _id)) FROM <table> WHERE created_at > now() - interval '1 hour' versus the equivalent MongoDB aggregation. (Note: jsonb stores keys sorted and discards whitespace, so canonicalize the MongoDB side the same way before hashing if you need byte-for-byte parity.)
  • Aggregation parity. Run the application's three most important aggregations against both databases, compare results, log any deltas.
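
The windowed-hash check only works if both sides hash the same canonical bytes. A sketch of a canonicalization you could run on the MongoDB side before hashing; this approximates jsonb's sorted-key form, and the exact textual rendering (spacing, number formatting) should be pinned by using the same serializer on both sides rather than trusting defaults.

```python
import hashlib
import json

def canonical(doc: dict) -> str:
    """Serialize with sorted keys and no whitespace, so key order
    differences between the two stores cannot move the hash."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))

def window_hash(docs) -> str:
    """md5 over the concatenation of canonical docs, in _id order."""
    ordered = sorted(docs, key=lambda d: d["_id"])
    blob = "".join(canonical(d) for d in ordered)
    return hashlib.md5(blob.encode()).hexdigest()
```

A hash mismatch then means real content drift, not a serialization quirk, which is the only kind of mismatch worth paging over.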

11. Cutover — One Environment Variable

  • Confirm validation has been clean for at least one full business day.
  • Confirm latency on Gold Lapel reads is within budget (compare p50/p95/p99 to the MongoDB baseline).
  • Flip reads to Gold Lapel via feature flag (READ_BACKEND=goldlapel).
  • Watch the dashboard for the first hour. Then the first day. Then the first week.
  • Keep dual-write active for at least one more window. Your rollback is now a configuration change.

12. Stop Writing to MongoDB — Quietly

  • After the cutover window has held, disable writes to MongoDB.
  • Leave the MongoDB instance running, read-only, for as long as the invoice is bearable. This is your rollback path.
  • Continue monitoring Gold Lapel reads at p50/p95/p99 and error rate.

13. Final Archive — Cold Storage

  • Take a final cold archive: mongodump --archive=final-<date>.archive --gzip.
  • Store it in your cold storage tier alongside other compliance archives.
  • Restore it into a scratch instance to confirm the archive is good. (A backup that has not been restored is a hope, not a backup.)

14. Decommission — One Fewer Service, One Fewer Invoice

  • Stop the MongoDB server.
  • Remove the MongoDB driver from the application's dependencies.
  • Delete MongoDB connection strings from the secrets manager.
  • Remove the MongoDB row from the on-call runbook.
  • Remove the MongoDB dashboard from the monitoring system.
  • Cancel the Atlas subscription (or shut down the self-hosted instance).
  • One fewer credential to rotate. One fewer backup to test. One fewer 3 AM call tree.

When to Pause the Checklist

Two situations where Chapter 15's honest boundaries apply:

  • Write-dominated workloads. If the audit shows sustained write throughput where the 2.5–4x gap is load-bearing to the product, the migration may be premature. Wait for write acceleration on the Gold Lapel roadmap, or migrate now with eyes open.
  • Extreme horizontal sharding. If you are running fifty or more shard nodes with multi-terabyte working sets, MongoDB's native sharding is more mature than anything you will assemble from PostgreSQL plus Citus today. Run MongoDB. The shape of the problem may change later.

MongoDB is excellent software. Operational simplicity is the argument of this book, not product quality. If either boundary is yours, the right tool is in your hand already.