Chapter 3: The Document Store You Already Have

The Waiter of Gold Lapel · Published Apr 19, 2026 · 9 min

The previous chapter presented an itemized bill — six categories of cost, each one real, each one recurring, each one unnecessary if the second database turns out to be optional. This chapter is the reason it turns out to be optional.

I am, I confess, looking forward to this one. Two chapters of discussing what the dual-database architecture costs you is important work, but it is not cheerful work. What follows is considerably more pleasant: an inventory of what you already own and have not yet had occasion to use.

PostgreSQL has not been idle while the document database market grew around it. Ten capabilities — most available for years, several for over a decade — collectively provide a document store inside the database you already run. I suspect you will recognize some of them. Others, I expect, will come as a genuine surprise. A well-appointed establishment has more rooms than most guests realize, and the best ones are often the quietest.

This chapter introduces each capability with enough depth to be convincing and enough brevity to maintain the pace. Parts II and III go deep. For now, allow me to show you around.

JSONB

JSONB is the foundation. Everything else in this chapter is built upon it, and everything this book demonstrates depends on it. If you take nothing else from this tour, take JSONB.

JSONB is PostgreSQL's binary JSON storage format. A document is parsed once on write, stored in a decomposed binary structure, and never re-parsed on read. When you access a field, the database goes directly to the value without scanning the document from the beginning. This is not JSON stored as text with some convenience functions bolted on. It is a purpose-built document storage format that accepts JSON as its input and stores it in a form optimized for retrieval.

The distinction between JSON and JSONB in PostgreSQL is worth stating plainly, because it is the distinction between a workaround and a feature. JSON preserves whitespace and key order but cannot be indexed. Every access re-parses the text from the beginning. JSONB discards whitespace, normalizes key order, stores in binary, and supports GIN indexes and containment queries. For document storage, JSONB is the correct choice in virtually every case. A feature that cannot be indexed is, for any collection of meaningful size, a feature in name only.

A quick taste of what this looks like in practice:

JSONB queries

-- Find all active users
SELECT * FROM users WHERE data @> '{"status": "active"}';

-- Get a user's email
SELECT data->>'email' FROM users WHERE _id = '...';

-- Find users in a specific city
SELECT * FROM users WHERE data->'address'->>'city' = 'San Francisco';

The @> containment operator is the workhorse you will come to know well. It checks whether the left JSONB value contains the right one, and it uses GIN indexes. A single GIN index on a JSONB column makes containment queries fast across the entire document — any field, at any depth, without specifying which fields to index in advance. One index. Every field. That is not a compromise. That is an architectural advantage.

JSONB has been available since PostgreSQL 9.4, released in December 2014. Twelve years. I mention the date not for the sake of historical record, but because twelve years of production use across every industry, on every major cloud provider, through every managed PostgreSQL service, is not a technology you need to evaluate. It is a technology you need to notice.

That is the foundation. Now let me show you what has been built on top of it.

jsonpath

For straightforward field access, JSONB's arrow operators serve well. For complex queries — filtered arrays, conditional logic, deeply nested structures — arrow chains begin to accumulate. jsonpath provides a cleaner syntax for the work that arrow chains struggle with.

jsonpath queries

-- Find products where the tags array contains "python"
SELECT * FROM products WHERE data @? '$.tags[*] ? (@ == "python")';

-- Find orders with items priced over $100
SELECT * FROM orders WHERE data @? '$.items[*] ? (@.price > 100)';

-- Find users where any address is in California
SELECT * FROM users WHERE data @? '$.addresses[*] ? (@.state == "CA")';

If you have used MongoDB's query language to filter inside arrays — {tags: "python"} or {items: {$elemMatch: {price: {$gt: 100}}}} — jsonpath is PostgreSQL's answer to the same question. The syntax is different, but the capability is equivalent, and for deeply nested conditions, jsonpath is often more expressive than MongoDB's filter syntax rather than less.

For simple access, the difference between jsonpath and arrow chains is modest. As queries grow in complexity — nested objects, filtered arrays, multiple conditions combined — jsonpath scales gracefully while arrow chains do not. A query language should grow with the difficulty of the question, not against it.

jsonpath has been available since PostgreSQL 12, released in 2019. Chapter 5 covers its query patterns in depth.

Indexing

The performance question arrives early, and it deserves a direct answer: yes, PostgreSQL can query documents fast. With GIN indexes, it can query them very fast indeed.

A GIN index on a JSONB column comes in two varieties. jsonb_path_ops is smaller and faster, supporting containment queries. jsonb_ops is broader, supporting containment plus key-existence checks. For most document workloads, where you query by field values, jsonb_path_ops is the right default:

GIN index

CREATE INDEX ON users USING GIN (data jsonb_path_ops);

One statement. Every containment query against that table is now indexed, regardless of which field the query targets. No need to anticipate query patterns in advance. No need to call createIndex() each time a new query shape appears. The index covers the entire document. The elegance of this arrangement is easy to underestimate: you index the column once, and the database handles every field within it. In MongoDB, every new query pattern risks a full collection scan until someone remembers to create the index. In PostgreSQL, the GIN index was waiting before the query was written.

For queries that need range comparisons on specific fields, expression indexes provide B-tree performance:

Expression indexes

CREATE INDEX ON users ((data->>'email'));
CREATE INDEX ON users (((data->>'age')::numeric));

And partial indexes go further still — indexing only the documents that match a condition, keeping the index small and the scans fast:

Partial index

CREATE INDEX ON users USING GIN (data jsonb_path_ops) WHERE data->>'status' = 'active';

Chapter 6 covers indexing in full. Gold Lapel creates these indexes automatically by observing query patterns — the developer writes queries, the proxy creates the indexes. But even without automation, a small number of well-chosen indexes makes a JSONB document store performant enough that most teams will wonder what they were worried about.

Performance established. Let me turn to a question I know is on your mind.

Schema Validation

I hear the objection forming: "But MongoDB is schemaless." Allow me to address it directly, and with the respect the question deserves.

MongoDB has had optional schema validation since version 3.6. It is not, in practice, schemaless — it is schema-optional. PostgreSQL offers the same spectrum, from fully open to fully validated, and the developer chooses where each collection sits on that spectrum. The freedom is identical. The enforcement, when you want it, is stronger.

No validation is the default. JSONB accepts any valid JSON document. Event logs, user preferences, dynamic form submissions — store them all without declaring a schema. The freedom MongoDB is known for is available here, identically. If you want to store documents of wildly different shapes in the same table, PostgreSQL will not stop you. The question is not whether you can, but whether you should — and that question, this chapter is not the place to answer. Chapter 7 is.

CHECK constraints add lightweight structure:

CHECK constraints

ALTER TABLE users ADD CONSTRAINT require_email CHECK (data ? 'email');
ALTER TABLE users ADD CONSTRAINT age_is_number CHECK (jsonb_typeof(data->'age') = 'number');

JSON Schema validation brings full enforcement. pg_jsonschema supports JSON Schema drafts 4, 6, 7, 2019-09, and 2020-12:

pg_jsonschema validation

ALTER TABLE users ADD CONSTRAINT schema_check
  CHECK (jsonb_matches_schema('{...schema...}'::jsonb, data));

The spectrum runs from "accept anything" to "validate everything," and the level of enforcement can differ per table within the same database. Event logs stay open. User profiles get key requirements. Financial documents get full JSON Schema validation. The database enforces the contract — not the application, not a middleware layer, not the collective discipline of every developer who writes to the table. A schema enforced by convention is a schema enforced by optimism. A schema enforced by the database is a schema enforced.

Chapter 7 walks the full spectrum. For now, the point is this: the choice between schemaless and strict is not a choice between MongoDB and PostgreSQL. It is a choice between tables, within the same database, using the same tools.

Generated Columns

Generated columns offer something that neither pure document storage nor pure relational modeling provides on its own: the ability to keep a document whole while making specific fields fast.

Generated column

ALTER TABLE users ADD COLUMN email TEXT GENERATED ALWAYS AS (data->>'email') STORED;
CREATE INDEX ON users (email);

The document stays intact in the data column — every field, every nested object, the complete structure. The generated column extracts one field into a real, typed, indexed column that the database maintains automatically. When the document changes, the column updates. No trigger. No application code. No maintenance.

The practical difference is immediate. Without the generated column, querying email requires extraction and casting: WHERE data->>'email' = 'alice@example.com' with an expression index. With the generated column, it is simply WHERE email = 'alice@example.com' — a standard B-tree lookup, faster, cleaner, and available for JOIN conditions and unique constraints.

For teams arriving from MongoDB, this is the migration path that respects the journey. Keep your documents in JSONB. Add generated columns for the fields you query most. Gain relational query performance without restructuring your data model. The architecture does not demand that you choose between documents and relations. It permits you to use both, in the same table, maintained by the same transaction. Chapter 16 covers when to use generated columns, when to extract to permanent columns, and when to stay with pure JSONB.

Now — three operational features that complete the picture.

LISTEN/NOTIFY

LISTEN/NOTIFY is PostgreSQL's built-in pub/sub mechanism, and it is the foundation for a capability many teams assume requires MongoDB: change streams.

A client listens on a channel. A trigger fires on document change. The trigger calls pg_notify. The listening client receives the event in real time — no polling, no external message broker, no additional service.

Change-notify trigger

CREATE TRIGGER user_changes AFTER INSERT OR UPDATE OR DELETE ON users
  FOR EACH ROW EXECUTE FUNCTION notify_document_change();

The notification carries whatever payload the trigger provides — operation type, document ID, the full document if desired. The event shape can match MongoDB's change stream format exactly: {"operationType": "update", "_id": "...", "fullDocument": {...}}. The delivery is real-time. The mechanism is built into the database.

Chapter 13 shows how Gold Lapel builds MongoDB-compatible change streams, TTL-based document expiry, and capped collections on this foundation, using PostgreSQL's trigger system for all three. A database that can notify your application when its own data changes is a database that does not need a companion service to watch it.

Table Partitioning

Document collections grow. Event logs, audit trails, session records, activity feeds — the collections that accumulate fastest are almost always document collections. Table partitioning ensures that growth does not become a performance problem.

PostgreSQL supports declarative partitioning — range, list, and hash. For document collections, time-based range partitioning is the most common and most natural pattern:

Range-partitioned events table

CREATE TABLE events (
  _id UUID DEFAULT gen_random_uuid(),
  data JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q1 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

The query planner prunes irrelevant partitions automatically. A query for March data does not touch January or February. As the collection grows, you add partitions — not larger instances. Old partitions can be archived, detached, or dropped without affecting the active data. The growth is managed, not endured.

Chapter 18 covers scaling from single node to read replicas to Citus for horizontal distribution.

Row-Level Security

Row-level security enforces document access at the database level — not in the application, not in a middleware layer, but in the query planner itself:

Row-level security

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
  USING (data->>'tenant_id' = current_setting('app.tenant_id'));

Every query against that table — from any client, through any connection, including an ad-hoc query from a developer's terminal during an incident — automatically filters to the documents the current user is authorized to see. The policy cannot be bypassed without superuser privileges. It is not a convention. It is a constraint.

MongoDB requires a separate authorization layer for document-level access control — typically implemented in application middleware, where every endpoint must remember to apply the filter, and where a single missed check is a data exposure. In PostgreSQL, the documents and the permissions live in the same database, the same transaction, the same query planner. Access control that shares a transactional boundary with the data it protects is access control you can reason about. Access control that spans two systems is access control you can hope about.

Row-level security has been available since PostgreSQL 9.5, released in 2016. A decade.

The Extensions

The final stop on this tour is perhaps the most consequential, because it represents a category of capability that no standalone document database can replicate: the PostgreSQL extension ecosystem.

JSONB columns are regular PostgreSQL columns. They participate in every extension, every operator, every function the database provides. This is not an integration. It is an inheritance — your document data automatically has access to everything PostgreSQL offers, without configuration, without adapters, without a separate service.

Three extensions in particular deserve mention for what they add to the document store.

pg_trgm provides trigram-based fuzzy matching. Typo-tolerant search on names, addresses, and product titles stored inside JSONB documents — without a separate search service. A user searches for "Sanfrancisco" and finds "San Francisco." The search runs on the same data, in the same query, in the same transaction.

pgvector stores and queries vector embeddings alongside the documents they describe. Similarity search and document storage in the same table. Store a product description as a document, store its embedding as a vector, query both together. No separate vector database. No synchronization. No eventual consistency between the document and its representation. In an industry that is rapidly adding vector search to every application, having it available alongside your documents — without another service — is not a minor convenience. It is an architectural simplification.

PostGIS adds spatial queries to document data. Find documents within a geographic boundary. Filter by location, by distance, by containment — on fields stored inside JSONB. One query. One database. One result set. A delivery application that stores orders as documents with location fields can answer "find all orders within 5km" without a separate geospatial service.

The document store does not sit adjacent to the PostgreSQL ecosystem. It participates in it. Every extension, every capability, every optimization that PostgreSQL offers is available to your documents. A document database that is also a relational database, a spatial database, a vector database, and a full-text search engine is not a compromise. It is a consolidation. And consolidation, as the previous chapter established, is the opposite of a tax.

Ten capabilities. Each replaces or reduces the need for a separate document database. JSONB has been here since 2014. The others arrived steadily in the years that followed, each one quietly expanding what a single PostgreSQL instance could do for teams that store documents.

The question from the first chapter returns, sharper now. The document store was never missing. The tools were here — indexed, transactional, extensible, and available on the database you were already running. What was missing was the developer experience that made reaching for them as natural as reaching for MongoDB. The capability existed. The invitation did not.

Part II teaches that document layer in detail — storage, queries, indexes, validation, aggregation. Part III introduces the API that closes the gap. We have toured the rooms. Now it is time to furnish them properly.