You Don't Need MongoDB

Chapter 5: Querying Documents

The Waiter of Gold Lapel · Published Apr 19, 2026 · 11 min

The previous chapter taught you how PostgreSQL stores documents and gave you the vocabulary of operators. This chapter asks the natural next question: now that the documents are in there, how do you get them back out?

The answer, I am pleased to report, is six ways — each suited to a different shape of question, and all of them composable in a single query. I will walk you through each one with working code and honest guidance on when to reach for which. By the end, you will be writing JSONB queries with the confidence of someone who has done it for years, because the patterns, once seen, are not easily forgotten.

A note before we begin: this is the raw SQL layer. Chapter 10 shows how Gold Lapel generates this SQL from MongoDB-style filter syntax. I teach you the SQL first, because a developer who can read what a tool produces is a developer who does not panic when the tool produces something unexpected. And in my experience, the unexpected arrives most reliably at the moment you can least afford it.

Containment Queries

Containment is where you begin. If the question is "find documents where a field equals a value," containment answers it, and answers it fast.

Containment patterns
-- Single field equality
SELECT * FROM users WHERE data @> '{"status": "active"}';

-- Multiple fields (implicit AND)
SELECT * FROM users WHERE data @> '{"status": "active", "role": "admin"}';

-- Nested field
SELECT * FROM users WHERE data @> '{"address": {"state": "CA"}}';

-- Array containment
SELECT * FROM products WHERE data @> '{"tags": ["featured"]}';

One GIN index on the column covers every one of these queries — any field, any nesting level, without per-field index declarations. The majority of document queries in most applications are equality checks. Containment handles all of them with a single index that was created once and never needs to be revisited.
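
That single index, for reference, is one statement. A sketch, assuming a users table with a data jsonb column as in the examples above (the index name is mine):

GIN index for containment
-- One GIN index serves @> on every field, at every depth
CREATE INDEX users_data_gin ON users USING GIN (data);

-- Alternative: jsonb_path_ops builds a smaller, faster index that
-- supports @>, @?, and @@, but not the ? key-existence operator
-- CREATE INDEX users_data_gin ON users USING GIN (data jsonb_path_ops);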

I dwell on this because the instinct for many developers — and this is true whether you are arriving from MongoDB or from the SQL world — is to reach for the most expressive query tool first. Write the complex query. Use the advanced syntax. Demonstrate fluency. In PostgreSQL document querying, the opposite instinct serves you better. Containment is both the simplest and the fastest pattern. Start here. Reach for more powerful tools when this one genuinely cannot answer the question. The simplest query that produces the correct result is not a shortcut. It is the correct query.

When containment is not enough — range comparisons, negation, regex, partial matching — allow me to show you what comes next.

Comparison Queries

Some questions cannot be phrased as "does the document contain this structure?" The question "find users older than 25" requires a comparison, and comparisons require extraction and casting.

Comparison patterns
-- Numeric range
SELECT * FROM users WHERE (data->>'age')::numeric BETWEEN 25 AND 40;

-- Date range
SELECT * FROM events
WHERE (data->>'created_at')::timestamptz > '2025-01-01';

-- Case-insensitive string match
SELECT * FROM users WHERE lower(data->>'name') LIKE 'alice%';

-- Sorting by a numeric field
SELECT * FROM products ORDER BY (data->>'price')::numeric DESC;

The pattern is consistent: extract with ->>, cast to the type you need, then compare or sort. Once internalized — and it takes one or two queries to internalize — the pattern becomes automatic.

Unlike containment, comparison queries need per-field indexes to be fast:

Expression indexes
CREATE INDEX ON users (((data->>'age')::numeric));

-- text::timestamptz is STABLE, not IMMUTABLE (it reads the session TimeZone),
-- so it cannot appear directly in an index expression. The usual workaround is
-- an IMMUTABLE wrapper function, used in both the index and your queries:
CREATE FUNCTION jsonb_tstz(text) RETURNS timestamptz
  IMMUTABLE LANGUAGE sql AS $$ SELECT $1::timestamptz $$;
CREATE INDEX ON events ((jsonb_tstz(data->>'created_at')));

The trade-off is straightforward: containment gives you one index covering every field. Comparison queries ask for one index per field. For fields you query with range operators regularly, a single line of index creation is a modest investment for the performance it returns.

A real-world pattern that combines both: paginated product listings with price filtering.

Combined pattern
-- Products under $50, sorted by price, paginated
SELECT * FROM products
WHERE data @> '{"category": "electronics"}'           -- containment (GIN)
  AND (data->>'price')::numeric < 50                  -- range (expression index)
ORDER BY (data->>'price')::numeric ASC
LIMIT 20 OFFSET 40;

Containment filters to the category. The expression index handles the price range and sort. This is the kind of query that powers every product listing page on the internet, and it runs efficiently on a JSONB document collection with two indexes.
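
For completeness, the two indexes behind that query would look like this. A sketch; the names are illustrative:

Indexes behind the listing query
CREATE INDEX products_data_gin ON products USING GIN (data);               -- containment
CREATE INDEX products_price_idx ON products (((data->>'price')::numeric)); -- range + sort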

Nested Access

Documents have depth. A user has an address. The address has a city. The city is three levels into the document. Queries need to reach into that depth, and PostgreSQL provides two ways to do it.

Arrow chains read left to right, one level at a time:

Arrow chains
-- Two levels
SELECT data->'address'->>'city' FROM users;

-- Three levels
SELECT data->'payment'->'billing'->>'zip' FROM orders;

-- In a WHERE clause
SELECT * FROM users WHERE data->'address'->>'state' = 'CA';

Path operators use a key array to jump directly to the target:

Path operators
SELECT data #>> '{address,city}' FROM users;
SELECT data #>> '{payment,billing,zip}' FROM orders;

Arrow chains feel natural at two levels. At three or more, they begin to accumulate — each -> adding visual noise without adding information. Path operators stay tidy regardless of depth. Use whichever reads more clearly in context. I will not insist on one over the other — clarity is a personal judgment, and you are the one who will read this code at 2 AM.

One practical note that saves both syntax and performance: for nested equality, containment is often the better path.

Containment vs arrow chain
-- Containment: GIN-indexed, no per-field index needed
WHERE data @> '{"address": {"state": "CA"}}'

-- Arrow chain: needs an expression index on this specific path
WHERE data->'address'->>'state' = 'CA'

If the question is "does this nested field equal this value," containment answers it with the index you already have. Arrow chains earn their place when you need to extract, cast, and compare — "find users in California with a zip code above 94000" requires arrow chains for the zip comparison even if containment handles the state.
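
That combined question, written out as a sketch against the same users table:

Nested equality plus comparison
-- Containment answers the equality; the arrow chain handles the comparison
SELECT * FROM users
WHERE data @> '{"address": {"state": "CA"}}'
  AND (data->'address'->>'zip')::numeric > 94000;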

The deeper lesson: every query pattern in this chapter has a natural domain. Learning the patterns is the first step. Learning their boundaries is the second, and more valuable, step.

jsonpath

There comes a point in every sufficiently complex query where containment is too simple, arrow chains are too verbose, and what you need is a language designed for navigating document structure. jsonpath is that language.

jsonpath patterns
-- Find products where ANY tag equals "python"
SELECT * FROM products WHERE data @? '$.tags[*] ? (@ == "python")';

-- Find orders with ANY line item priced over $100
SELECT * FROM orders WHERE data @? '$.items[*] ? (@.price > 100)';

-- Find users where ANY address is in California
SELECT * FROM users WHERE data @? '$.addresses[*] ? (@.state == "CA")';

If you have written MongoDB queries like {tags: "python"} or {items: {$elemMatch: {price: {$gt: 100}}}}, you will recognize the intent immediately. The syntax differs, but the capability is equivalent — and for deeply nested conditions, jsonpath is often more expressive than MongoDB's filter syntax rather than less.

Compare the first query to its non-jsonpath equivalent:

EXISTS subquery equivalent
SELECT * FROM products
WHERE EXISTS (
  SELECT 1 FROM jsonb_array_elements_text(data->'tags') AS t(value)
  WHERE t.value = 'python'
);

Both return identical results. jsonpath says the same thing in fewer words, and the gap widens as queries grow more complex. A query language that scales with the difficulty of the question rather than against it is a query language worth learning.

jsonpath queries using @? and @@ are GIN-indexable — they participate in the same GIN index that serves containment queries. For simple equality, containment remains the faster and simpler path. jsonpath earns its place when the question involves conditions inside arrays, numeric comparisons on nested elements, or existence checks with filters — the queries that would otherwise require subqueries and lateral joins.
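
A close cousin worth knowing: @@ evaluates a jsonpath predicate to a boolean, which sometimes reads more directly than @? with a filter. A sketch against the orders table above:

jsonpath predicate with @@
-- Same intent as the @? filter form earlier in this section
SELECT * FROM orders WHERE data @@ '$.items[*].price > 100';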

Array Queries

Arrays inside documents are where developers arriving from MongoDB most frequently ask "how do I do this in PostgreSQL?" I understand the concern. MongoDB's array query syntax is one of its most natural features — {tags: "python"} just works. The PostgreSQL equivalents are more explicit, but they are also more powerful once you see the full range of what they can do.

Simple element check — does the array contain this value:

Array element check
SELECT * FROM products WHERE data->'tags' ? 'python';

This is the closest equivalent to MongoDB's {tags: "python"}. The ? operator checks for key or element existence within JSONB, and it is supported by GIN indexes using the default jsonb_ops operator class. One caveat: a GIN index on data alone does not cover data->'tags' ? 'python', because the operator is applied to an extracted value rather than to the indexed column. To index this exact form, create a GIN index on the expression ((data->'tags')), or phrase the query as the containment data @> '{"tags": ["python"]}', which the whole-column index already serves.
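
Two sibling operators extend the same idea to multiple values, covering what MongoDB expresses with $in and $all against an array field. A sketch:

Multiple element checks
-- ANY of these values present (MongoDB $in against an array field)
SELECT * FROM products WHERE data->'tags' ?| array['python', 'rust'];

-- ALL of these values present (MongoDB $all)
SELECT * FROM products WHERE data->'tags' ?& array['python', 'web'];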

Complex element matching — the equivalent of MongoDB's $elemMatch. Find documents where an array element satisfies multiple conditions simultaneously:

$elemMatch equivalent
-- MongoDB: db.orders.find({items: {$elemMatch: {product: "Widget", qty: {$gt: 5}}}})

-- PostgreSQL:
SELECT * FROM orders WHERE EXISTS (
  SELECT 1 FROM jsonb_array_elements(data->'items') AS item
  WHERE item->>'product' = 'Widget'
    AND (item->>'qty')::numeric > 5
);

jsonb_array_elements expands a JSON array into rows. The EXISTS subquery checks whether any row satisfies all conditions. This is structurally what $elemMatch does — find at least one element where every condition holds simultaneously, not just any element that matches any one condition.

The SQL is more verbose than MongoDB's syntax. It is also more explicit about what it is doing, which is a trade-off I am comfortable presenting honestly. MongoDB's syntax is more concise. PostgreSQL's syntax is more transparent. Each choice reflects a design philosophy, and both philosophies have merit. Chapter 10 shows how Gold Lapel gives you MongoDB's conciseness while generating PostgreSQL's transparency underneath — the best of both, if you will permit me the immodesty.

Expanding arrays for reporting — each element becomes its own row:

Array unnesting
SELECT _id,
  item->>'product' AS product,
  (item->>'qty')::numeric AS qty
FROM orders, jsonb_array_elements(data->'items') AS item
WHERE (item->>'qty')::numeric > 5;

A single order with ten line items becomes ten rows — one per item. This is useful for aggregation, JOINs against other tables, and any reporting that needs to work at the element level rather than the document level. In MongoDB, this requires a $unwind stage in the aggregation pipeline. In PostgreSQL, it is a JOIN. Chapter 8 explores this equivalence in depth.
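
Once elements are rows, ordinary SQL aggregation applies to them. A sketch of a per-product rollup over the same orders table:

Element-level aggregation
-- Total quantity per product across all orders
SELECT item->>'product' AS product,
       sum((item->>'qty')::numeric) AS total_qty
FROM orders, jsonb_array_elements(data->'items') AS item
GROUP BY item->>'product'
ORDER BY total_qty DESC;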

Combining Patterns

I have saved the most satisfying part for last. These six patterns are not separate tools that work in isolation. They compose freely in a single WHERE clause, and the query planner combines their indexes intelligently.

A product search with filters, sorting, and full-text:

Combined query
SELECT *,
  ts_rank(to_tsvector('english', data->>'description'), plainto_tsquery('wireless')) AS relevance
FROM products
WHERE data @> '{"category": "electronics", "in_stock": true}'   -- containment
  AND (data->>'price')::numeric BETWEEN 25 AND 100              -- range
  AND data @? '$.features[*] ? (@ == "bluetooth")'              -- jsonpath array
  AND to_tsvector('english', data->>'description') @@ plainto_tsquery('wireless')  -- full-text
ORDER BY relevance DESC, (data->>'price')::numeric ASC
LIMIT 20;

Four patterns in a single query. GIN indexes handle the containment, jsonpath, and full-text conditions, the first two through the index on data and the full-text match through its own GIN index on the to_tsvector expression. An expression index handles the price range. The planner evaluates the most selective condition first and narrows from there. You write the question. The database decides how to answer it quickly.

A user dashboard query combining containment with nested access:

Dashboard query
SELECT _id,
  data->>'name' AS name,
  data->'stats'->>'login_count' AS logins,
  data->'subscription'->>'plan' AS plan
FROM users
WHERE data @> '{"status": "active"}'
  AND (data->'subscription'->>'expires_at')::timestamptz > now()
  AND data->'subscription'->'features' ? 'premium_support'
ORDER BY (data->'stats'->>'login_count')::numeric DESC;

Containment for status, casting for date comparison, key existence for feature check, arrow chains for extraction. Each pattern doing what it does best, in a single statement, against a single table.

The decision framework, stated once and meant to last:

  • Equality on any field, any depth → containment (@>)
  • Range, sorting, inequality → extraction + casting (->> with ::type)
  • Nested access → arrow chains (->, ->>) or path operators (#>, #>>)
  • Array element matching, conditional logic → jsonpath (@?) or EXISTS subquery
  • Text search → to_tsvector + @@
  • Key existence → ? operator

Six tools. Each with a clear purpose. All composable. The power is not in any individual pattern. It is in the fact that they work together, in the same query, served by the same planner, against the same data. A query that combines four patterns is not four queries. It is one question, asked clearly, answered once.


Six query patterns, and you now know all of them. You know when to reach for each, how they compose, and where the indexes that serve them come from. This, I should note, is more than most developers learn about querying MongoDB — not because MongoDB is difficult, but because an API that handles everything transparently does not invite the developer to look beneath it. There is nothing wrong with that approach. Transparency is a form of hospitality. But understanding is a form of preparedness, and the two serve different needs. You have looked beneath, and you are better equipped for it.

Chapter 6 makes these queries fast — GIN indexes, expression indexes, partial indexes, and the auto-indexing that means you may never need to think about any of them. Chapter 10 then brings MongoDB's familiar filter syntax back, generating the SQL you now understand from the {"field": {"$operator": value}} patterns you already know.

The queries are written. Now let us ensure they run at the speed your application requires.