Chapter 11: Writes That Compose
Documents modeled in Chapter 4. Documents queried in Chapter 10. Documents indexed, thoughtfully, in Chapter 6. And now — the moment where attention becomes non-negotiable — you need to change them. This is where the word atomic stops being jargon and becomes the difference between correct code and a silent race condition that surfaces, with exquisite timing, once a month in production.
Writes, if I may say so, are the part of any document store that rewards care. A read that is stale will recover on the next query; no harm done, no guest inconvenienced. A write that interleaves badly with another write produces state that no subsequent read can repair. MongoDB earned its reputation here — and deservedly so — through a compact, expressive update vocabulary: $set, $inc, $push, $pull, $addToSet, findOneAndUpdate, upsert. These operators handle the common cases atomically at the server, which is precisely where atomicity belongs. It is genuinely good design. One tips one's hat.
This chapter delivers that vocabulary in PostgreSQL across seven operations: plain merge, field operators, array operators, upsert, find-and-modify, distinct, and cursor iteration. The thesis, stated once and then demonstrated at length: MongoDB's update operators are a vocabulary, not a technology, and PostgreSQL speaks them fluently through JSONB, ON CONFLICT, and UPDATE … RETURNING.
By the end of the evening, every updateOne, findOneAndUpdate, and bulkWrite you have written will have a direct, atomic, visible-SQL equivalent. Allow me to walk you through the service.
Plain Updates and the JSONB Merge Operator
The 80% case is shallow: you want to set a few fields on an existing document and leave everything else alone. A sensible desire, and PostgreSQL obliges without ceremony.
In MongoDB:
db.users.updateOne({_id: "u1"}, {$set: {status: "active"}})
In PostgreSQL, the || operator on JSONB — a gift from version 9.5 — performs shallow object concatenation. Right-hand keys win. Think of it as object spread for the database:
UPDATE users
SET data = data || '{"status": "active"}'::jsonb
WHERE data->>'_id' = 'u1';
One statement. No application-side merge. The Gold Lapel API call is identical in shape to the Mongo one, which is the intention:
doc_update_one("users", {"_id": "u1"}, {"status": "active"})
One trap deserves naming before we continue, because I should be a poor host if I let you walk into it. The || operator is shallow. A nested object on the right replaces the nested object on the left; it does not merge into it. If data->'address' holds {"city": "Oakland", "zip": "94607"} and you concatenate {"address": {"city": "Berkeley"}}, you lose the zip. For any update that reaches below the top level, use jsonb_set with a path — which, as it happens, is precisely the next section.
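The trap is easy to reproduce without a database. A minimal Python sketch, with plain dicts standing in for JSONB documents; both helper functions here are illustrative models of the SQL semantics, not driver code:

```python
def jsonb_concat(left: dict, right: dict) -> dict:
    """Shallow merge, modeled on JSONB ||: right-hand keys win wholesale."""
    return {**left, **right}

doc = {"_id": "u1", "address": {"city": "Oakland", "zip": "94607"}}

# Shallow concatenation replaces the whole nested object -- the zip is gone.
shallow = jsonb_concat(doc, {"address": {"city": "Berkeley"}})
assert shallow["address"] == {"city": "Berkeley"}  # zip lost

# Peeking one section ahead: a path-aware write touches only the leaf.
def jsonb_set(target: dict, path: list, value) -> dict:
    out = dict(target)
    node = out
    for key in path[:-1]:
        node[key] = dict(node.get(key, {}))  # copy-then-descend, like create_missing
        node = node[key]
    node[path[-1]] = value
    return out

deep = jsonb_set(doc, ["address", "city"], "Berkeley")
assert deep["address"] == {"city": "Berkeley", "zip": "94607"}  # zip survives
```

The shallow version is object spread; the path version is surgery. Knowing which one a statement performs is the whole game.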
Field Operators — $set, $inc, $unset, $mul, $rename
Once an update touches a nested field, you need paths. PostgreSQL provides two primitives, and that is all you will require: jsonb_set for writing at a path, and #- for removing at a path. Every field operator MongoDB ships compiles to a composition of these two. Allow me to introduce them in turn.
$set with nested paths. jsonb_set(target, path, new_value, create_missing) writes new_value at path. The path is a text array. create_missing defaults to true, which means the function will build intermediate keys as needed — matching Mongo's $set behavior on absent paths exactly. The convergence is not coincidence; it is the result of two teams independently arriving at sensible semantics.
-- Mongo: {$set: {"address.city": "Oakland"}}
UPDATE users
SET data = jsonb_set(data, '{address,city}', '"Oakland"'::jsonb, true)
WHERE data->>'_id' = 'u1';
doc_update_one("users", {"_id": "u1"}, {"address.city": "Oakland"})
$inc and atomicity. Here I must be a trifle insistent, because the stakes warrant it. The temptation, in application code, is to read the counter, add one, and write it back. That is a bug. I should note that it is a popular bug — I have seen it in many otherwise excellent codebases — but a bug all the same. Two processes reading the same value will both write value + 1, and you lose an increment. The counter drifts; the analytics dashboard lies; someone, eventually, spends a weekend investigating.
A single UPDATE statement in PostgreSQL runs atomically with respect to other statements under any isolation level. The row is locked for the duration of the update, and the final value is computed from the value at lock time. The database does the careful thing on your behalf, provided you let it.
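The race is mechanical enough to stage deterministically in a few lines of Python, with an in-memory dict standing in for the row — no database and no threads required, just the interleaving spelled out:

```python
counter = {"views": 10}

# Read-modify-write: two workers read before either writes.
a = counter["views"]           # worker A reads 10
b = counter["views"]           # worker B reads 10
counter["views"] = a + 1       # A writes 11
counter["views"] = b + 1       # B writes 11 -- A's increment is lost
assert counter["views"] == 11  # two increments, only one recorded

# Atomic increment: read, compute, and write as one indivisible step,
# which is what a single UPDATE gives you at the row level.
counter["views"] = 10
for _ in range(2):
    counter["views"] += 1      # stand-in for the locked read-compute-write
assert counter["views"] == 12
```

The first half is the popular bug; the second half is what the database does for you when the arithmetic lives inside the statement.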
-- Mongo: {$inc: {views: 1}}
UPDATE posts
SET data = jsonb_set(
data,
'{views}',
(COALESCE((data->>'views')::int, 0) + 1)::text::jsonb
)
WHERE data->>'_id' = 'p42';
The COALESCE attends to the first-view case where the field does not yet exist. ::text::jsonb is the idiomatic cast from a numeric value to a JSONB number. Not elegant; but then, neither is carrying the soup.
doc_update_one("posts", {"_id": "p42"}, {"$inc": {"views": 1}})
$unset. The #- operator removes the element at a path. It is a no-op — not an error — on missing paths, which matches Mongo's behavior. A civil operator, #-. It does not make a fuss when asked to remove something that was never there.
-- Mongo: {$unset: {deprecated_field: ""}}
UPDATE users
SET data = data #- '{deprecated_field}'
WHERE data->>'_id' = 'u1';
doc_update_one("users", {"_id": "u1"}, {"$unset": ["deprecated_field"]})
$mul. The same shape as $inc, different arithmetic:
-- Mongo: {$mul: {price: 1.1}}
UPDATE products
SET data = jsonb_set(
data,
'{price}',
((data->>'price')::numeric * 1.1)::text::jsonb
)
WHERE data->>'_id' = 'sku-17';
doc_update_one("products", {"_id": "sku-17"}, {"$mul": {"price": 1.1}})
$rename. A rename is a removal plus a set, composed in one statement — a single gesture that accomplishes what might otherwise require two.
-- Mongo: {$rename: {old_name: "new_name"}}
UPDATE users
SET data = jsonb_set(data #- '{old_name}', '{new_name}', data->'old_name')
WHERE data->>'_id' = 'u1';
Observe the order inside the function call, if you will. Three references to data appear: the #- strips the old key, the data->'old_name' reads the value to carry across, and the outer jsonb_set writes it at the new path. All three resolve against the same row — the version the database held when the lock was acquired. Because every read in the expression sees the pre-update document, the composition is correct. One caveat earns a guard clause: if old_name is absent, data->'old_name' is SQL NULL, and jsonb_set is strict — the whole expression, and with it the document, collapses to NULL. Add AND data ? 'old_name' to the WHERE clause (or, on PostgreSQL 13 and later, reach for jsonb_set_lax) so the statement simply matches no rows instead.
doc_update_one("users", {"_id": "u1"}, {"$rename": {"old_name": "new_name"}})
All five operators share one guarantee, and it is worth stating plainly: a single UPDATE statement is atomic at the row level under every PostgreSQL isolation level. Concurrent updates to the same row serialize; concurrent updates to different rows run in parallel. The database attends to the arithmetic of concurrency so you do not have to.
Array Operators — $push, $pull, $addToSet
Arrays are where JSONB earns its keep, and where the naïve "JSON in a TEXT column" approach breaks down entirely. A TEXT column forces you to parse, mutate in application memory, and write back — racing every other writer on the same document. JSONB gives you server-side primitives, which is a different thing entirely. I would be remiss if I did not make the distinction clearly.
$push. Append one or more elements to an array:
-- Mongo: {$push: {tags: "new-tag"}}
UPDATE posts
SET data = jsonb_set(data, '{tags}', COALESCE(data->'tags', '[]'::jsonb) || '["new-tag"]'::jsonb)
WHERE data->>'_id' = 'p42';
The COALESCE protects against the first-push case where the array does not yet exist — matching Mongo's sensible behavior of creating the array on demand. The whole of $push is exactly this composition: read the array at the path, concatenate, and write it back in a single statement. Three primitives, one guarantee.
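The read-concatenate-write semantics, restated as a plain-Python sketch with lists standing in for JSONB arrays — an illustration of what the SQL above does, not driver code:

```python
def push(doc: dict, field: str, *values) -> dict:
    """$push semantics: create the array on demand, then append."""
    out = dict(doc)
    out[field] = list(out.get(field, [])) + list(values)  # COALESCE, then ||
    return out

post = {"_id": "p42"}                  # no tags yet
post = push(post, "tags", "new-tag")   # first push creates the array
post = push(post, "tags", "b")         # subsequent pushes append
assert post["tags"] == ["new-tag", "b"]
```

The get-with-default is the COALESCE; the list concatenation is the ||; the reassignment is the jsonb_set.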
$push with $each. Multiple elements in one call — same primitive, the right-hand side of || just holds more than one element:
-- Mongo: {$push: {tags: {$each: ["a", "b", "c"]}}}
UPDATE posts
SET data = jsonb_set(data, '{tags}', COALESCE(data->'tags', '[]'::jsonb) || '["a","b","c"]'::jsonb)
WHERE data->>'_id' = 'p42';
doc_update_one("posts", {"_id": "p42"}, {"$push": {"tags": {"$each": ["a", "b", "c"]}}})
$pull. Remove every element matching a predicate. Simple value equality compiles to a subquery over jsonb_array_elements:
-- Mongo: {$pull: {tags: "old-tag"}}
UPDATE posts
SET data = jsonb_set(
data,
'{tags}',
COALESCE(
(SELECT jsonb_agg(elem)
FROM jsonb_array_elements(data->'tags') elem
WHERE elem <> '"old-tag"'::jsonb),
'[]'::jsonb
)
)
WHERE data->>'_id' = 'p42';
The outer COALESCE handles the case where every element matches the predicate — jsonb_agg returns NULL on an empty set, and we want an empty array, not a vanished field. A small courtesy to the next reader of the row.
doc_update_one("posts", {"_id": "p42"}, {"$pull": {"tags": "old-tag"}})
An honest note, because honesty is the only currency in which I am willing to trade: $pull with a complex object predicate — say, pull every element where score < 5 — compiles to a subquery with a more elaborate WHERE clause, not a single operator. The generated SQL stays readable, but it is composition, not conjuring. The Gold Lapel API emits it; you can inspect it in the query log at your leisure, and I would rather you could than pretend you couldn't.
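The filter-and-re-aggregate shape, complex predicates included, fits in a short Python sketch — a model of the subquery's semantics, not the Gold Lapel compiler:

```python
def pull(doc: dict, field: str, predicate) -> dict:
    """$pull semantics: keep every element the predicate does NOT match."""
    out = dict(doc)
    kept = [e for e in out.get(field, []) if not predicate(e)]
    out[field] = kept  # always a list, never a vanished field (the COALESCE)
    return out

post = {"_id": "p42", "tags": ["old-tag", "keep"], "scores": [3, 7, 4, 9]}

# Simple value equality, as in {$pull: {tags: "old-tag"}}.
post = pull(post, "tags", lambda e: e == "old-tag")
assert post["tags"] == ["keep"]

# A complex predicate, as in {$pull: {scores: {$lt: 5}}} -- same shape,
# just a more elaborate WHERE clause in the generated subquery.
post = pull(post, "scores", lambda e: e < 5)
assert post["scores"] == [7, 9]
```

Both cases are the same operation: unnest, filter, re-aggregate. Only the predicate grows.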
$addToSet. Push only if absent. The semantics match Mongo's exactly: equality is on the JSONB value, not on reference. This is, if I may say so, the right choice.
-- Mongo: {$addToSet: {tags: "unique-tag"}}
UPDATE posts
SET data = jsonb_set(
data,
'{tags}',
CASE
WHEN COALESCE(data->'tags', '[]'::jsonb) @> '["unique-tag"]'::jsonb
THEN data->'tags'
ELSE COALESCE(data->'tags', '[]'::jsonb) || '["unique-tag"]'::jsonb
END
)
WHERE data->>'_id' = 'p42';
The @> containment operator is the key. It poses a question: does the left array contain the right array's elements? If yes, leave the array undisturbed. If no, append. A polite operator, @>. It asks before it acts.
doc_update_one("posts", {"_id": "p42"}, {"$addToSet": {"tags": "unique-tag"}})
Upsert in PostgreSQL — ON CONFLICT DO UPDATE for Documents
The pattern that makes updateOne(filter, update, {upsert: true}) feel native in PostgreSQL is INSERT … ON CONFLICT DO UPDATE, added in version 9.5 and stable across every release since. One statement. No select-then-insert race. No advisory lock. This is the upsert primitive the rest of this section builds upon, and if you take only one operator from the evening, it should perhaps be this one.
In MongoDB:
db.users.updateOne(
{_id: "u1"},
{$set: {status: "active", name: "Ada"}},
{upsert: true}
)
In PostgreSQL, the ON CONFLICT DO UPDATE pattern writes the same intent explicitly. The row is inserted if absent; if present, the existing data column is merged with the EXCLUDED (would-have-been-inserted) data column. EXCLUDED is a pseudo-table that exposes the row you tried to insert — a ghostly companion to the conflict:
INSERT INTO users (_id, data)
VALUES ('u1', '{"_id": "u1", "status": "active", "name": "Ada"}'::jsonb)
ON CONFLICT (_id) DO UPDATE
SET data = users.data || EXCLUDED.data
RETURNING data;
ON CONFLICT requires a unique index or constraint on the conflict target. The schema established in Chapter 4 provisions one on _id for every document collection; that provision is what makes this single-statement upsert legal. Earlier decisions pay forward, as earlier decisions often do.
doc_update_one("users", {"_id": "u1"}, {"status": "active", "name": "Ada"}, upsert=True)
Permit me a cautionary illustration. The shape you want to avoid, in any language, is this one:
# DO NOT DO THIS
row = db.execute("SELECT data FROM users WHERE id = %s", (user_id,))
if row:
    db.execute("UPDATE users SET data = ... WHERE id = %s", (user_id,))
else:
    db.execute("INSERT INTO users (id, data) VALUES (...)", ...)
Two processes running this code against the same missing key will both see None from the SELECT, both attempt the INSERT, and one of them will raise a uniqueness violation — on a good day. On a bad day — a table that never got its unique constraint — the same timing produces two inserts and two rows. ON CONFLICT makes the race impossible by pushing the decision into a single statement that the database serializes for you. This is the PostgreSQL equivalent of MongoDB's upsert, and it is, if I may, tighter than the original.
The atomicity claim is precise, and I will not oversell it. ON CONFLICT is atomic within the one statement. It does not compose magic across multiple statements in a transaction; you still get whatever isolation level you asked for. What it does guarantee is that the insert-or-update decision cannot interleave with another session's decision on the same key. Which is, in practice, what you wanted all along.
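The contrast between the racy shape and the single-decision shape can be staged with a dict. This is an analogy, not a concurrency proof: setdefault plays the role of the one serialized insert-or-absent decision, and the dict merge plays the DO UPDATE:

```python
users = {}

# Racy shape: both "sessions" check for the key before either writes.
seen_a = "u1" in users                    # session A: absent
seen_b = "u1" in users                    # session B: absent
if not seen_a:
    users["u1"] = {"status": "active"}    # A inserts
if not seen_b:
    users["u1"] = {"name": "Ada"}         # B inserts too -- A's write clobbered
assert users["u1"] == {"name": "Ada"}     # one session's data silently lost

# ON CONFLICT shape: insert-if-absent as one step, then merge (|| semantics).
users = {}
for incoming in ({"status": "active"}, {"name": "Ada"}):
    existing = users.setdefault("u1", {})   # the serialized decision
    users["u1"] = {**existing, **incoming}  # DO UPDATE: merge, right wins
assert users["u1"] == {"status": "active", "name": "Ada"}
```

The dict clobbers silently where a unique constraint would raise; the point survives the simplification — separate the check from the write and you invite the interleaving.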
Atomic Find-and-Modify — doc_find_one_and_update
One operation distinguishes a document store from a JSON column with helper functions: select a document matching a predicate, update it, and return either the pre-update or post-update value — all atomically, with no second round-trip and no application-side lock. Mongo calls it findOneAndUpdate, and rightly so; it is a beautifully named operation. PostgreSQL spells it UPDATE … RETURNING inside a CTE. Both deliver atomic-find-and-modify semantics. Different dialects, same guarantee.
In MongoDB:
db.jobs.findOneAndUpdate(
{status: "pending"},
{$set: {status: "claimed"}},
{returnDocument: "after"}
)
In PostgreSQL:
WITH picked AS (
SELECT id FROM jobs
WHERE data->>'status' = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET data = data || '{"status": "claimed"}'::jsonb
FROM picked
WHERE jobs.id = picked.id
RETURNING jobs.data;
Two details warrant your attention. FOR UPDATE inside the CTE locks the row so another session cannot grab the same job — because two workers claiming the same job is precisely the failure mode we have convened to prevent. SKIP LOCKED — available since PostgreSQL 9.5 — means that if another session has already locked the candidate row, this session moves on to the next one instead of blocking. That is the exact shape of a work-stealing queue, hand-delivered. FOR UPDATE SKIP LOCKED is why you don't need Redis for this particular job. Redis is excellent software, and for some queue workloads it remains the right tool; for this one, the manor has the room.
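The claim-next-unlocked behavior, modeled in plain Python — a list and a set of locked ids standing in for rows and row locks; a sketch of the semantics, not the lock manager:

```python
jobs = [
    {"id": 1, "status": "pending"},
    {"id": 2, "status": "pending"},
]
locked = set()  # ids currently locked by some open transaction

def claim_next(jobs, locked):
    """FOR UPDATE SKIP LOCKED in miniature: pass over locked rows, never wait."""
    for job in jobs:
        if job["status"] == "pending" and job["id"] not in locked:
            locked.add(job["id"])  # lock acquired; the UPDATE lands at commit
            return job
    return None                    # nothing claimable right now

a = claim_next(jobs, locked)   # session A locks job 1, transaction still open
b = claim_next(jobs, locked)   # session B skips the locked row, takes job 2
assert (a["id"], b["id"]) == (1, 2)
assert claim_next(jobs, locked) is None  # everything claimable is locked
```

Without SKIP LOCKED, session B would block on job 1 instead of moving on; the single keyword is what turns a table into a queue.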
doc_find_one_and_update(
"jobs",
{"status": "pending"},
{"status": "claimed"},
return_document="after"
)
Mongo's returnDocument: "before" and "after" map to two small variations. For "after", use the query above — RETURNING on the UPDATE gives you the post-update row, as the name promises. For "before", select the pre-update value into the CTE and return it from there:
WITH picked AS (
SELECT id, data AS before_data FROM jobs
WHERE data->>'status' = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED
),
updated AS (
UPDATE jobs
SET data = data || '{"status": "claimed"}'::jsonb
FROM picked
WHERE jobs.id = picked.id
)
SELECT before_data FROM picked; The sibling operation, findOneAndDelete, uses DELETE … RETURNING — the same pattern, one verb different:
WITH picked AS (
SELECT id FROM jobs
WHERE data->>'status' = 'cancelled'
LIMIT 1
FOR UPDATE SKIP LOCKED
)
DELETE FROM jobs
USING picked
WHERE jobs.id = picked.id
RETURNING jobs.data;
doc_find_one_and_delete("jobs", {"status": "cancelled"})
doc_distinct — SELECT DISTINCT with Dot Notation
MongoDB users expect db.users.distinct("status") to exist, and it does. This is the shortest section in the chapter for the excellent reason that it is the shortest operator. I shall not belabor it.
db.users.distinct("status", {active: true})
SELECT DISTINCT data->>'status'
FROM users
WHERE (data->>'active')::boolean = true;
doc_distinct("users", "status", filter={"active": True})
If status is indexed via an expression index or if the filter field benefits from GIN path ops, DISTINCT can use the index. Chapter 10 covers the expression-index and GIN path-ops patterns in some detail; this one-liner inherits everything that chapter established. A debt paid forward. The vocabulary travels; that is, at bottom, the claim of the entire chapter.
Cursor Iteration — doc_find_cursor for Large Result Sets
doc_find materializes the full result set into memory. Useful for pages of search results; disastrous for "every log line from last week." For that second variety of request, you need a server-side cursor — a handle the server holds open while you pull rows in batches. It is the difference between being handed a glass of water and being shown to the spring.
In MongoDB:
db.logs.find({level: "error"}).batchSize(500).forEach(doc => process(doc));
The Gold Lapel API exposes the same shape, because surprise is a discourtesy:
for doc in doc_find_cursor("logs", {"level": "error"}, batch_size=500):
    process(doc)
Beneath the surface, the driver issues the PostgreSQL cursor protocol:
BEGIN;
DECLARE log_cursor CURSOR FOR
SELECT data FROM logs WHERE data->>'level' = 'error';
FETCH 500 FROM log_cursor;
-- ... repeated until exhausted ...
CLOSE log_cursor;
COMMIT;
batch_size maps directly to the FETCH count. Memory on the client stays bounded at one batch; memory on the server stays bounded by the cursor's internal state. Both parties, if I may, keep their composure.
One constraint deserves naming: a default cursor is WITHOUT HOLD, which means it lives only inside the transaction that declared it. The Gold Lapel API wraps iteration in a transaction automatically. If your loop needs to make other database calls, make them through the same connection, or materialize the batch first. The constraint is real; the workarounds are routine.
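The client-side shape of the protocol — bounded memory, one batch in flight — fits in a short generator sketch, with a list standing in for the server-side cursor; fetch_batches is illustrative, not the driver's API:

```python
def fetch_batches(rows, batch_size):
    """Yield rows in FETCH-sized slices; at most one batch is resident at a time."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]  # one FETCH <n> round-trip
        yield from batch                        # hand rows out one at a time

rows = [{"line": i} for i in range(1200)]
seen = sum(1 for _ in fetch_batches(rows, batch_size=500))
assert seen == 1200  # three fetches of 500, 500, and 200
```

The caller iterates documents; the batching is an implementation detail it never sees — which is exactly the courtesy doc_find_cursor extends.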
The Operator Translation Table
And now, the reference card — the page you will return to when the prose has faded and only the mapping remains. The PostgreSQL equivalent of MongoDB's $push, to take one row, is jsonb_set wrapped around a || concatenation at a path; every other row follows the same pattern of composition. Allow me to set it down where you can find it.
| MongoDB | PostgreSQL JSONB | Gold Lapel API |
|---|---|---|
| $set | jsonb_set(data, path, value) | doc_update_one(..., {"field": value}) |
| $unset | data #- path | doc_update_one(..., {"$unset": ["field"]}) |
| $inc | jsonb_set(data, path, numeric + n) | doc_update_one(..., {"$inc": {"field": 1}}) |
| $mul | same shape as $inc, multiplication | doc_update_one(..., {"$mul": {"field": 2}}) |
| $rename | jsonb_set(data #- old, new, data->old) | doc_update_one(..., {"$rename": {...}}) |
| $push | jsonb_set(data, path, (data->path) \|\| value) | doc_update_one(..., {"$push": {...}}) |
| $pull | subquery over jsonb_array_elements, filtered and re-aggregated | doc_update_one(..., {"$pull": {...}}) |
| $addToSet | CASE on @> containment, then jsonb_set | doc_update_one(..., {"$addToSet": {...}}) |
| upsert: true | INSERT … ON CONFLICT (_id) DO UPDATE | doc_update_one(..., upsert=True) |
| findOneAndUpdate | UPDATE … RETURNING in a CTE with FOR UPDATE | doc_find_one_and_update(...) |
| findOneAndDelete | DELETE … RETURNING in a CTE | doc_find_one_and_delete(...) |
| distinct | SELECT DISTINCT data->>'field' | doc_distinct(...) |
Appendix A expands this into the full cross-reference across every MongoDB operator the book covers. Should you ever need a single page to answer the single question, it is there.
Honest Boundary — Bulk Writes and Write Throughput
I should be forthcoming about two matters, because pretending they do not exist would be a disservice to you and an embarrassment to me.
First, bulk writes. MongoDB's bulkWrite with ordered: false is genuinely fast, and has been tuned over more than a decade for exactly this workload. I will not soften that. PostgreSQL's closest equivalents — a multi-row INSERT … ON CONFLICT or a COPY staged into an upsert — are 2.5–4x slower on pure single-row-per-document write throughput in the current benchmarks published on the book's website. The roadmap items that close this gap are parallel COPY and libpq pipeline mode; both are in active work upstream. Coming, not here. MongoDB is excellent software. On single-document write throughput today, it is faster. That will close. It has not closed yet. A waiter who overstates his case is no waiter at all.
Second, the cost of the operators this chapter has taught you. jsonb_set rewrites the whole document on every update — the same MVCC tuple-replacement behavior MongoDB exhibits under the hood, and the same cost model. A 40KB document updated 1000 times per second generates non-trivial churn: new tuple versions, index updates, WAL volume, vacuum pressure. The bill arrives, and the bill must be paid. Chapter 16 covers the mitigation patterns at the schema level — pulling hot fields out into generated columns, TOAST-aware document shaping, the hybrid model where a frequently-rewritten counter no longer forces the whole 40KB document to be re-written on every increment. Know the cost before you ship the operator.
Seven operations, one vocabulary, two languages. Plain merge through ||. Field operators through jsonb_set and #-. Array operators through composition of the same primitives. Upsert through ON CONFLICT DO UPDATE. Atomic find-and-modify through UPDATE … RETURNING in a CTE. Distinct through SELECT DISTINCT with dot notation. Cursor iteration through DECLARE CURSOR.
This chapter is the hinge between Chapter 10's queries and Chapter 12's pipelines. The queries told you where your data is. The writes tell you how it changes. The pipelines — next — will tell you how it flows.
Chapter 12 opens on doc_aggregate — the signature, the compiled-stage SQL one CTE at a time, the seven pipeline stages and their accumulator table, and the benchmark that lands at roughly 84x warm-cache over MongoDB on the same pipeline. The trick is not raw aggregation speed; it is the materialized view Gold Lapel stores behind every pipeline so that every call after the first is an index scan rather than a re-execution. The operators compose. The guarantees hold. The SQL is visible. If you'll follow me to the next room, I believe the aggregation pipeline is ready to receive you.