Chapter 4: JSONB: Storage With Opinions

The Waiter of Gold Lapel · Published Apr 19, 2026 · 11 min

I mentioned in the first chapter that the capability was never missing — only the introduction. This chapter is where the introduction begins in earnest, and I confess a certain eagerness to make it. I have spent two chapters discussing what the second database costs you. I would very much like to start showing you what the first one offers.

Every document database has opinions about how documents should be stored. MongoDB chose BSON — a binary JSON variant with extended types, designed with care and real insight into how applications produce and consume data. I respect that choice. It was the right engineering decision for the product MongoDB set out to build. PostgreSQL chose differently. JSONB is binary JSON with sorted keys, an offset table for direct field access, and the deep index integration that comes from being woven into the database engine itself rather than presented alongside it.

This is the foundation chapter. Everything that follows in this book — queries, indexes, aggregation, the Gold Lapel API — rests on what we cover here. I will keep the theory honest and the examples generous. You have my word that nothing in this chapter exists to impress. It exists to equip.

The Collection Schema

Every document collection in this book uses the same table shape:

Collection schema

CREATE TABLE users (
  _id UUID DEFAULT gen_random_uuid(),
  data JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);

Three columns. Allow me to explain why each one exists and why the shape is what it is.

_id UUID serves the same role as MongoDB's _id, with a structural difference worth understanding. In MongoDB, _id lives inside the document — it is a field within the BSON object. Here, _id lives outside the document as a first-class PostgreSQL column. This means it is B-tree indexed by default (as a primary key), it participates in JOINs without JSONB extraction, and it can serve as a foreign key reference from other tables. Your documents are relational citizens — they have identifiers the rest of the database can reference natively.

gen_random_uuid() generates the identifier automatically. It has been native to PostgreSQL since version 13 — no extension required. If your application needs to supply its own identifiers, it can. If it does not, the database handles it.

data JSONB holds the complete document. Any structure, any depth, any combination of strings, numbers, booleans, nulls, arrays, and nested objects. One column for the entire document, just as MongoDB stores the entire document in one record.

created_at TIMESTAMPTZ provides a timestamp the application does not need to manage. It enables ordering by insertion time, time-based partitioning for growing collections, and TTL expiry for documents that should not live forever — all without requiring the document itself to carry a timestamp field.

When using Gold Lapel, the first doc_insert creates this table and its GIN index automatically. No CREATE TABLE required. But the schema underneath is always this shape, and I teach it here because understanding what lies beneath the tools you use is not optional knowledge. It is the knowledge that matters most when something goes wrong at an hour you would rather be sleeping.

A MongoDB document like this:

MongoDB document

{ "_id": ObjectId("..."), "name": "Alice", "age": 30, "status": "active" }

Becomes this in PostgreSQL:

PostgreSQL row

_id: 'a1b2c3d4-...'
data: {"name": "Alice", "age": 30, "status": "active"}
created_at: '2026-04-14T10:30:00Z'

The document contents are identical. The structural difference — _id and created_at as dedicated columns — is what gives you relational capabilities on document data. If you are arriving from MongoDB, your documents do not change shape. They gain a home that offers them more.

How JSONB Storage Works

Allow me, for a moment, to open the kitchen door. I will not keep you there long — but understanding why JSONB is fast will change how you use it for the rest of this book.

PostgreSQL offers two JSON storage types, and the difference between them is the difference between a filing cabinet and a card catalog.

JSON (the text type) stores documents as raw text. The exact characters you insert — whitespace, key order, duplicate keys — are preserved. Every time you access a field, the database parses the text from the beginning, character by character, until it finds what you asked for. A 50KB document costs 50KB of parsing on every field access. JSON cannot be indexed with GIN. It cannot use the containment operator. It is, in practical terms, a text column that validates curly braces.

JSONB (the binary type) parses the document once, on write. The keys are sorted alphabetically and stored with an offset table at the front of the binary structure. Whitespace is discarded. Key order is normalized. Duplicate keys are resolved (the last value wins). The result is a binary representation that the database can navigate without re-parsing.

When you read a field from a JSONB document, the database performs a binary search on the sorted keys and jumps directly to the value using the offset table. No scanning. No parsing. The cost of reading one field from a 50KB document is effectively the same as reading one field from a 500-byte document. This is not an optimization applied to an existing format. It is a format designed from the start for retrieval.

The practical consequences cascade through everything this book teaches. GIN indexing works because the binary format decomposes cleanly into indexable entries. Containment queries work because the sorted structure enables efficient structural comparison. jsonpath evaluation works because the database can navigate the document tree without reading it sequentially. Every feature you will use in the coming chapters traces back to the decision to parse once and store in binary. Good engineering decisions have this quality — they pay dividends long after the engineer who made them has moved on to other work.

When to use JSON (text) instead of JSONB: when you must preserve exact whitespace or key ordering. When you are storing documents you will never query by field. In my experience, across the range of applications this book addresses, the answer is almost always JSONB.

Now — let me show you how to speak to it.

The Operators

JSONB has eleven operators in total. That number sounds formidable until I tell you that four of them handle the vast majority of daily work. I will introduce those four properly, then show you the full family, and trust you to find the reference table in Appendix D when occasion requires it.

Extraction pulls values out of documents. This is the most basic operation — reaching into a document and retrieving something.

Extraction operators

-- Get a nested object as JSONB (for chaining into deeper levels)
SELECT data->'address' FROM users;
-- Result: {"city": "San Francisco", "state": "CA"}

-- Get a value as TEXT (for comparison, display, and casting)
SELECT data->>'email' FROM users;
-- Result: alice@example.com

-- Get a deeply nested value as TEXT via path array
SELECT data #>> '{address,city}' FROM users;
-- Result: San Francisco

The distinction between -> and ->> is the single most important thing in this section. -> returns JSONB — suitable for chaining into deeper levels or passing to other JSONB operators. ->> returns TEXT — suitable for comparison, display, WHERE clauses, and casting to other types. Every type mismatch bug in JSONB queries traces back to using one when the other was needed. You will make this mistake once. I mention it so that when you do, you will recognize it immediately.

Containment tests whether a document matches a structural pattern:

Containment and key existence

-- Does this user have status "active"?
WHERE data @> '{"status": "active"}'

-- Does this document contain an "email" key?
WHERE data ? 'email'

-- Does this document contain ALL of these keys?
WHERE data ?& array['email', 'name', 'age']

-- Does this document contain ANY of these keys?
WHERE data ?| array['phone', 'mobile', 'fax']

jsonpath handles complex conditions — array filtering, nested comparisons, conditional existence:

jsonpath operators

-- Does any element in the tags array equal "python"?
WHERE data @? '$.tags[*] ? (@ == "python")'

-- Does the jsonpath expression evaluate to true?
WHERE data @@ '$.age > 25'

Your daily toolkit: ->> for extraction, @> for containment, ? for key existence, @? for jsonpath. These four will carry you through most of what this book teaches. The remaining operators — <@ (contained by), ?| (any key), ?& (all keys), @@ (jsonpath match), #> and #>> (path access) — are specialists. They earn their keep when called upon, but they do not demand your attention until then. A good toolkit is not the one with the most tools. It is the one where you always know which tool to reach for.

With the vocabulary established, let me turn to the one operator that deserves a conversation of its own.

The Containment Operator

I give @> its own section because it has earned one. It is the mechanism behind doc_find. It is what GIN indexes accelerate. It is the pattern Gold Lapel's proxy watches for when deciding which indexes your collection needs. If you invest in understanding one operator deeply, invest here. The return on that investment will be felt in every chapter that follows.

Containment asks a simple question: does the document on the left contain the structure on the right?

Containment patterns

-- Single field equality
WHERE data @> '{"status": "active"}'

-- Multiple fields (implicit AND — all must be present and match)
WHERE data @> '{"status": "active", "role": "admin"}'

-- Nested structure, any depth
WHERE data @> '{"address": {"city": "San Francisco"}}'

-- Array containment — does the tags array include "featured"?
WHERE data @> '{"tags": ["featured"]}'

With one GIN index on the column, every one of these queries is indexed. Any field. Any depth. One index. The entire document. No per-field declarations. No createIndex() calls each time a new query pattern appears in your application. You create the index once. It covers queries that have not been written yet. There is something quietly powerful about an index that anticipates rather than reacts.

Two behaviors deserve particular attention because they represent places where PostgreSQL is actually stricter than MongoDB — and in a database, strictness is not a limitation. It is a promise.

Containment is type-aware. data @> '{"age": 30}' matches the number 30 but will not match the string "30". These are different values with different types, and PostgreSQL does not pretend otherwise. MongoDB's query engine performs type coercion in some contexts — {age: "30"} can match a query for {age: 30} depending on the operator — occasionally producing matches the developer did not intend. PostgreSQL compares types exactly. A number is a number. A string is a string. The database does not improvise.

Type-aware containment

-- Matches: the document contains age as the number 30
'{"age": 30}'::jsonb @> '{"age": 30}'     -- true

-- Does NOT match: the document contains age as the string "30"
'{"age": "30"}'::jsonb @> '{"age": 30}'   -- false

Containment distinguishes missing keys from null values. data @> '{"key": null}' matches documents where the key exists with a JSON null value. It does not match documents where the key is absent entirely. A key set to null is a deliberate statement — "this field exists and its value is nothing." A missing key is the absence of a statement entirely. Most databases blur this line. PostgreSQL draws it with precision.

Null vs missing

-- Key exists with null value: matches
'{"key": null}'::jsonb @> '{"key": null}'  -- true

-- Key missing entirely: does not match
'{}'::jsonb @> '{"key": null}'             -- false

What containment cannot do: range comparisons (greater than, less than), regex matching, negation. For those, you need extraction with casting or jsonpath, both of which Chapter 5 covers thoroughly. But equality and structural matching account for the majority of document queries in most applications. Start with containment. Reach for other tools when containment genuinely cannot express the question. You will reach less often than you expect.

Containment is the foundation of how you will speak to your documents. But speaking to them requires one more understanding — the way JSONB handles types. It is, I should warn you, the one area where arriving from MongoDB requires a brief adjustment. Allow me to make that adjustment as painless as possible.

Type Handling

If there is one section in this chapter where I would ask for your patience and your full attention, it is this one. Type handling is where developers arriving from MongoDB encounter their first real friction with JSONB — and where, once the patterns are learned, the friction disappears permanently. Every debugging session I can save you here is one you will not have at 2 AM.

MongoDB's drivers handle type conversion transparently. You pass a number, the driver stores a number, you query with a number, and the comparison works. The driver mediates between your application's types and the database's types without asking you to think about it. This is genuinely convenient, and I mention it without criticism.

In PostgreSQL, the ->> operator returns text regardless of the stored type, and casting is your responsibility. This is the friction point, and it is worth understanding clearly:

Type-aware comparisons

-- WRONG: this is a text comparison, not numeric
WHERE data->>'age' > '25'
-- "9" > "25" is true alphabetically. This query returns users with age 9.

-- RIGHT: cast to numeric, then compare
WHERE (data->>'age')::numeric > 25
-- Now the comparison is numeric. Age 9 is excluded.

-- ALSO RIGHT: containment is type-aware, no casting needed
WHERE data @> '{"age": 30}'
-- Matches the number 30 exactly. GIN-indexed.

The pattern to internalize — and it will serve you through every chapter in this book: containment for equality (type-aware, GIN-indexed, the default choice), extraction with casting for ranges (greater than, less than, between), jsonpath for complex conditions (array filtering, nested logic). Three approaches, each with a clear purpose. The discipline is knowing which to reach for, and the good news is that the choice is almost always obvious from the shape of the question. When someone asks you "is this equal to that," you use containment. When they ask "is this greater than that," you extract and cast. When they ask "does any element in this array satisfy these conditions," you use jsonpath. The question tells you the tool.

Let me walk through the type-specific behaviors that will save you the most time.

Timestamps. JSONB has no timestamp type — timestamps stored in documents are JSON strings. Range queries need explicit casting:

Timestamp queries

-- Filter events after a date
WHERE (data->>'created')::timestamptz > '2024-01-01'

-- Sort by a timestamp field
ORDER BY (data->>'published_at')::timestamptz DESC

If you find yourself casting the same timestamp field in multiple queries, a generated column is a one-line investment that pays back on every query:

Timestamp generated column

ALTER TABLE events ADD COLUMN created TIMESTAMPTZ
  GENERATED ALWAYS AS ((data->>'created')::timestamptz) STORED;
CREATE INDEX ON events (created);

Now the timestamp is a real column — indexed, typed, and queryable without casting. The document stays whole. The field becomes fast. This, if I may editorialize briefly, is one of PostgreSQL's most underappreciated gifts to the document model: the ability to keep your documents intact while giving individual fields the performance characteristics of dedicated columns. You do not have to choose between the flexibility of documents and the speed of relations. You can have both, in the same table, maintained by the same transaction. Chapter 16 explores this pattern in depth.

Booleans. Both containment and extraction work, but they differ in index support:

Boolean queries

-- Uses the GIN index — preferred
WHERE data @> '{"active": true}'

-- Works but does NOT use the GIN index
WHERE data->>'active' = 'true'

Prefer containment for boolean checks. It is type-aware (true the boolean vs "true" the string) and GIN-indexed. The extraction approach works but pays the cost of scanning without index support.

JSON null versus SQL NULL. This is the subtlest type distinction and the one most likely to produce incorrect query results if misunderstood.

Null vs NULL

-- Key is missing entirely
SELECT '{"name": "Alice"}'::jsonb -> 'age';
-- Returns: SQL NULL (the key does not exist)

-- Key exists with a JSON null value
SELECT '{"name": "Alice", "age": null}'::jsonb -> 'age';
-- Returns: null (JSON null — the key exists, the value is null)

-- Testing for key existence
SELECT '{"name": "Alice"}'::jsonb ? 'age';          -- false
SELECT '{"name": "Alice", "age": null}'::jsonb ? 'age';  -- true

The difference matters when your application distinguishes "the user did not provide this field" from "the user explicitly cleared this field." A user who never set their phone number is different from a user who deleted their phone number. Most applications should make that distinction. PostgreSQL makes it straightforward — ? tests for key existence regardless of the value, and containment with null tests for the specific case of a key present with a null value. A database that can distinguish between silence and a deliberate answer of "nothing" is a database that understands your data as well as you do.

The foundation is now beneath your feet. You know the table shape that every collection uses and how it maps to MongoDB's document model. You understand why JSONB is fast — not as a claim but as a mechanism, traced from the sorted key array through the offset table to the binary search that retrieves a field without parsing the document. You have the operator vocabulary, with the four daily tools distinguished from the seven specialists. You understand containment as a query paradigm — the default choice, type-aware, GIN-indexed, covering equality at any depth. And you know where the type system asks for your explicit attention and where it handles things on its own.

I have, I realize, spent this chapter on what amounts to the grammar of a language. Grammar is not exciting. But fluency is, and fluency begins here. The next chapter puts this grammar to work — from basic lookups to complex multi-pattern queries, all in raw SQL. We will write the queries that Gold Lapel will later generate for you, because understanding what a tool produces is the foundation of trusting it.

If you will forgive a professional observation: trust that is earned through understanding is the only kind worth having. In databases, as in most things.