Chapter 10: The Filter Language
Among the twenty-one methods Chapter 9 introduced, doc_find accepts the filter syntax that does most of the daily work — and most of that syntax is the language one already knows from MongoDB. This chapter is the translation table: fifteen operators, one containment shortcut, one notation convention, followed by an honest accounting of what the translation does, and does not, buy you. I would be a poor host if I offered only the pleasant half.
// MongoDB
db.users.find({ status: "active", age: { $gt: 21 } })
// Gold Lapel
users.find({ "status": "active", "age": { "$gt": 21 } }) -- Generated SQL
SELECT data FROM users
WHERE data @> '{"status":"active"}'
AND (data->>'age')::numeric > 21; The JavaScript is identical down to the braces. The difference is the import statement at the top of the file, and the SQL patterns from Chapter 5 running quietly underneath. The application code does not know. It does not need to.
The Fifteen Operators at a Glance
Before we visit each room, the floor plan.
| MongoDB | Gold Lapel | Generated SQL (sketch) | GIN-indexed? |
|---|---|---|---|
{field: value} | {"field": value} | data @> '{"field": value}' | Yes |
$eq | {"$eq": v} | data @> '{"field": v}' | Yes |
$gt | {"$gt": v} | (data->>'field')::T > v | No |
$gte | {"$gte": v} | (data->>'field')::T >= v | No |
$lt | {"$lt": v} | (data->>'field')::T < v | No |
$lte | {"$lte": v} | (data->>'field')::T <= v | No |
$ne | {"$ne": v} | NOT (data @> '{"field": v}') | Partial |
$in | {"$in": [...]} | data->>'field' IN (...) | No |
$nin | {"$nin": [...]} | data->>'field' NOT IN (...) | No |
$exists | {"$exists": true} | data ? 'field' | Yes (jsonb_ops only) |
$regex | {"$regex": "..."} | data->>'field' ~ '...' | No |
$or | {"$or": [...]} | (a OR b OR ...) | Per-branch |
$and | {"$and": [...]} | (a AND b AND ...) | Per-branch |
$not | {"$not": {...}} | NOT (inner) | Per-branch |
$elemMatch | {"$elemMatch": {...}} | EXISTS (SELECT 1 FROM jsonb_array_elements ...) | No |
$text | {"$text": {"$search": "..."}} | to_tsvector(...) @@ plainto_tsquery(...) | Yes (GIN on tsvector) |
The GIN-indexed column has teeth. Seven of the fifteen operators — every comparison, every negation, set membership, regex, and array element matching — bypass the JSONB GIN index entirely. On a million-row table with no expression indexes, those are sequential scans. I would not be much of a host if I let you discover that on a Tuesday afternoon in production. We shall return to it in the Honest Boundary section, where I intend to be quite plain.
Plain Equality and the Containment Operator
How a plain filter becomes containment
The filter compiler walks the top-level object and classifies each key by the shape of its value. A plain literal — string, number, boolean, null, or a nested object with no $-prefixed keys — routes to the containment branch. Everything else routes to an operator branch. A simple sorting, performed without ceremony, before any SQL is written.
Two plain predicates collapse into one containment object:
users.find({ "status": "active", "tier": "gold" }) SELECT data FROM users
WHERE data @> '{"status":"active","tier":"gold"}'; Not two @> clauses ANDed. One. This matters because @> against a GIN-indexed column is an inverted-index lookup — GIN being the Generalized Inverted Index, named with PostgreSQL's customary lack of marketing instinct — and one lookup covering two keys is cheaper than two lookups intersected.
Why this is the fastest path
jsonb_ops builds index entries for every {path, value} pair in every document. A containment query data @> '{"status":"active","tier":"gold"}' extracts the pairs (status, "active") and (tier, "gold") from the query and asks the inverted index for the intersection of matching rows. No per-row function call. No detoasting of the full JSONB payload. No cast. The index does the work, as a properly furnished index ought.
jsonb_path_ops builds one entry per leaf path, hashed. The index is smaller and the probes are faster for @>-only workloads, but the ?, ?|, and ?& key-existence operators are unsupported. Choose it when you know you will never ask "does this key exist" outside a full-value match. The decision is yours; I would only ask that it be made deliberately.
Comparison Operators
Type-aware casting from JSONB
The bug nobody writes about, and which I would prefer you never meet in production: data->>'age' returns text. In text, '10' < '9' is true, because '1' sorts before '9'. If you write
SELECT data FROM users WHERE data->>'age' > '21'; PostgreSQL will happily return rows where age is 3, so long as it begins with the character '3', '4', or any later digit. This fails silently — no error, wrong rows. The most expensive kind of mistake.
Gold Lapel inspects the JSON type of the right-hand value at compile time and emits the cast:
users.find({ "age": { "$gt": 21 } }) SELECT data FROM users
WHERE (data->>'age')::numeric > 21; Numbers cast to numeric, booleans to boolean, ISO-8601 strings recognized as timestamps to timestamptz. Other strings stay as text and sort lexically — which is, after all, what you asked for.
Why comparison operators do not use GIN
GIN indexes {path, value} pairs, not ranges. (data->>'age')::numeric > 21 is an expression evaluated per row; the planner has no index entry keyed on "numeric values above 21" to consult. A perfectly reasonable limitation of an inverted index, but a limitation nonetheless.
If a comparison is hot, add an expression index matching the exact cast chain Gold Lapel emits:
CREATE INDEX users_age_idx ON users (((data->>'age')::numeric)); The double parentheses are required — PostgreSQL's expression-index grammar insists upon them. The cast chain in the index must match the generated SQL character for character; (data->>'age')::integer will not accelerate a query that casts to numeric. Close is not, in this house, the same as exact.
Gold Lapel does not create expression indexes automatically. It reports the candidate expressions in EXPLAIN output and leaves the DDL decision to the operator, because the write-amplification cost of an index belongs to the person who owns the table. A waiter does not rearrange the furniture without being asked.
$eq versus plain equality — when they differ
They are equivalent for every scalar value except one: null. A tidy little exception, of the sort that ruins evenings. Consider two rows:
{ "id": 1, "deleted_at": null }
{ "id": 2 } {"deleted_at": null} emits data @> '{"deleted_at": null}', which matches row 1 but not row 2 — JSONB containment of null requires the key to be present with a null value, not merely absent.
MongoDB's {deleted_at: null} matches both rows — a missing field and an explicit null are treated the same. This is a thoughtful piece of API design on MongoDB's part, and one whose absence surprises people in PostgreSQL on a regular basis. If your application depends on MongoDB's laxer semantics, the correct Gold Lapel form is {"$or": [{"deleted_at": null}, {"deleted_at": {"$exists": false}}]}. A small inconvenience to preserve a behavior you may have built rather a lot upon.
Set Membership: $in and $nin
Generating SQL IN clauses from extracted paths
users.find({ "role": { "$in": ["admin", "owner"] } }) SELECT data FROM users
WHERE data->>'role' IN ('admin', 'owner'); For non-string values, the cast lives inside the extraction — Gold Lapel reads the JSON type of the first list element:
orders.find({ "total": { "$in": [49, 99, 199] } }) SELECT data FROM orders
WHERE (data->>'total')::numeric IN (49, 99, 199); $nin and the NULL trap
NOT IN in SQL has a well-known sharp edge: if any element in the list is NULL, the whole expression collapses to unknown, which filters everything. Gold Lapel rejects null in $nin lists at compile time for this reason. Better to refuse politely at the door than to let the guest discover an empty result set the hard way.
A more subtle MongoDB incompatibility: MongoDB's $nin returns documents where the field is missing entirely. A naive data->>'role' NOT IN ('banned') returns false for a missing field, because ->> on a missing key yields SQL NULL, and NULL NOT IN (...) is unknown. Gold Lapel emits the MongoDB-compatible form:
SELECT data FROM users
WHERE data->>'role' IS NULL
OR data->>'role' NOT IN ('banned', 'suspended'); The extra disjunct costs nothing when the field is indexed via expression index, and it restores MongoDB's result set exactly. Compatibility, where it matters, is worth a few extra characters.
Key Existence: $exists
The jsonb ? operator and top-level keys
users.find({ "email": { "$exists": true } }) SELECT data FROM users WHERE data ? 'email'; ? checks for a top-level key. It is GIN-indexed under jsonb_ops and unsupported under jsonb_path_ops — if you chose the smaller operator class for your GIN index, $exists falls back to a sequential scan. The trade-off you accepted earlier, presenting itself again. These things have a way.
Nested existence with dot notation
? does not descend. {"profile.verified": {"$exists": true}} cannot be rewritten to data ? 'profile.verified'; that asks whether the top-level key is literally "profile.verified", which it almost certainly is not. Gold Lapel emits a path extraction:
SELECT data FROM users
WHERE data #> '{profile,verified}' IS NOT NULL; #> returns JSONB at a path; a missing path yields SQL NULL. This is not GIN-indexed out of the box, but an expression index on the specific path works admirably:
CREATE INDEX users_profile_verified_idx
ON users ((data #> '{profile,verified}')); $exists: false inverts carefully — and I do mean carefully. NOT (data ? 'email') and data->>'email' IS NULL are not equivalent: the second is true when email exists but is explicitly null. Gold Lapel emits the former, matching MongoDB's "the key is absent" semantics. A small distinction, with rather large consequences for a soft-delete query.
Regex Matching: $regex
PostgreSQL ~ and the case-insensitive variant
users.find({ "name": { "$regex": "^alice", "$options": "i" } }) SELECT data FROM users
WHERE data->>'name' ~* '^alice'; Flag mapping, briefly: i selects ~* (case-insensitive), absence of i selects ~. MongoDB's m (multiline) and s (dotall) flags translate to PostgreSQL's embedded-option syntax: Gold Lapel prepends (?n) or (?s) to the pattern body rather than changing operators, since PostgreSQL has no operator variant for them. The x (extended whitespace) flag maps to (?x). Different dialects, the same intent, conveyed without fuss.
Why regex does not hit the GIN index
GIN on JSONB does not support ~. For prefix-anchored patterns (^alice), a B-tree index on the extracted field with the text_pattern_ops opclass accelerates the probe:
CREATE INDEX users_name_pattern_idx
ON users ((data->>'name') text_pattern_ops); For substring or fuzzy patterns, the pg_trgm extension — a trigram GIN index, and a quietly excellent piece of engineering — is the standard tool. Gold Lapel documents this pattern but does not install the extension on your behalf. CREATE EXTENSION is a DBA-scope operation, and I would not presume.
Logical Operators: $or, $and, $not
Recursive filter building
The compiler calls itself on each child filter. A branch that reduces to plain equalities still compiles to @>; a branch with operator values fans out. Every indexed fast path survives nesting — which is the kind of property you want from a recursive translator, and which does not arise by accident.
users.find({
"$or": [
{ "status": "active" },
{ "$and": [
{ "age": { "$gt": 65 } },
{ "grandfathered": true }
] }
]
}) SELECT data FROM users
WHERE (
data @> '{"status":"active"}'
OR (
(data->>'age')::numeric > 65
AND data @> '{"grandfathered":true}'
)
); The first $or branch reduces to a single containment probe. The second fans out into a comparison and a second containment probe, ANDed. The planner can choose to evaluate the GIN-accelerated branch first and short-circuit. A small kindness from a planner that is, by reputation, not always kind.
When @> survives
Write logical filters with plain values as deep as you can. {"$or": [{"kind": "a"}, {"kind": "b"}]} is slower than {"kind": {"$in": ["a", "b"]}}, and both are slower than two separate @> probes unioned — but the rewrite rules here belong to your query author, not the compiler. Gold Lapel emits what you wrote. It does not rewrite OR-of-equalities into $in, because the two forms have subtle differences in null handling, and silently changing the meaning of your query is a discourtesy I am not prepared to commit.
Array Element Matching: $elemMatch
This is the pattern most translation articles get wrong. I shall set the trap before disarming it. Given an array field like
{ "tags": [
{ "name": "urgent", "priority": 3 },
{ "name": "followup", "priority": 7 }
] } and a MongoDB filter asking for "a tag named urgent with priority above 5," the answer is: no such tag exists in this document. The urgent tag has priority 3; the priority-7 tag is named followup. But a naive translation using two top-level containment probes —
-- WRONG: matches the document above
SELECT data FROM orders
WHERE data @> '{"tags":[{"name":"urgent"}]}'
AND data @> '{"tags":[{"priority":6}]}'; — returns it, because the two predicates are satisfied by different array elements. $elemMatch exists in MongoDB precisely to scope multi-condition matches to a single element. A clean piece of API design, and one I am happy to acknowledge.
The EXISTS + jsonb_array_elements pattern
orders.find({
"tags": { "$elemMatch": {
"name": "urgent",
"priority": { "$gt": 5 }
} }
}) SELECT data FROM orders
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(data->'tags') AS elem
WHERE elem @> '{"name":"urgent"}'
AND (elem->>'priority')::numeric > 5
); jsonb_array_elements unnests one array into a set of rows, and it has been in PostgreSQL since 9.3 for exactly this purpose. Each inner row carries one element as a jsonb value, and the inner filter is the normal translation applied to elem instead of data. Plain values inside $elemMatch still consolidate into a single @> on the element; operator values still fan out. The whole filter language is recursive, and the recursion is the point.
Why not jsonb_path_exists?
PostgreSQL 12 introduced SQL/JSONPath (jsonb_path_exists, jsonb_path_query), and the same query can be written as one path expression. A perfectly serviceable tool. Gold Lapel prefers EXISTS + jsonb_array_elements for three reasons: the generated SQL is readable in EXPLAIN; an expression index on the array elements can be defined and used; and the inner predicate composes with the rest of the filter grammar without a second parser. The path-expression form is accepted in raw SQL but not emitted by the compiler. Two grammars for one job is one grammar more than I would care to maintain.
Full-Text Search: $text
Top-level and field-level search
// Whole document
docs.find({ "$text": { "$search": "quarterly report" } })
// Scoped to one field
docs.find({ "notes": { "$text": { "$search": "urgent" } } }) -- Top-level
SELECT data FROM docs
WHERE to_tsvector('english', data::text)
@@ plainto_tsquery('english', 'quarterly report');
-- Field-level
SELECT data FROM docs
WHERE to_tsvector('english', data->>'notes')
@@ plainto_tsquery('english', 'urgent'); to_tsvector on the fly is correct but slow — the linguistic analysis runs once per row, every time, which is exactly as expensive as it sounds. In production, back it with an expression GIN index:
CREATE INDEX docs_notes_fts_idx
ON docs USING GIN (to_tsvector('english', data->>'notes')); The index pre-tokenizes once and is consulted thereafter. Considerably more agreeable.
The $language configuration
docs.find({ "$text": {
"$search": "rapport trimestriel",
"$language": "french"
} }) SELECT data FROM docs
WHERE to_tsvector('french', data::text)
@@ plainto_tsquery('french', 'rapport trimestriel'); MongoDB's language names map to PostgreSQL text-search configurations of the same name for the common cases (english, french, german, spanish, italian, portuguese, dutch, russian). PostgreSQL ships configurations for a dozen or so languages out of the box, and the set overlaps most of MongoDB's but not all of it; if your application uses a language in the gap — Arabic, Korean, Chinese — the mapping is not automatic, and a custom configuration or a third-party extension is the path forward. The default when $language is omitted is english, configurable per collection at connection time.
Dot Notation and Query Expansion
Expanding addr.city to a nested object
The expansion runs before SQL generation. A dotted key is syntactic sugar for a nested object, and the nested object is what goes into @>:
users.find({ "billing.address.country": "FR" }) SELECT data FROM users
WHERE data @> '{"billing":{"address":{"country":"FR"}}}'; One GIN probe, one index hit. The depth of the nesting does not cost you extra operators. A query about something three levels deep and a query about something at the top of the document run on the same indexed path.
Sibling merge semantics in deep nesting
users.find({
"addr.city": "NY",
"addr.zip": "10001"
}) The two dotted keys share the addr. prefix and merge into a single nested object:
SELECT data FROM users
WHERE data @> '{"addr":{"city":"NY","zip":"10001"}}'; Not data @> '{"addr":{"city":"NY"}}' AND data @> '{"addr":{"zip":"10001"}}'. The merge is the reason dot notation preserves the indexed fast path: if every dotted predicate under a prefix merges, the whole prefix is still one containment probe. A pleasing economy.
For the rare case where you mix a dotted key with an explicit nested object at the same path — {"addr": {"city": "NY"}, "addr.zip": "10001"} — Gold Lapel deep-merges the two and emits one containment object. If you write conflicting leaf values at the same path, the behavior is implementation-defined; the correct answer is not to write such a filter, and the compiler will not pretend the ambiguity is meaningful.
Mixed Filters: Plain Values and Operator Values Together
The compiler partitions the top-level object. Plain-valued keys fold into one containment object; operator-valued keys each produce one SQL clause. Everything is ANDed. The bookkeeping is unglamorous, the result is not.
users.find({
"status": "active",
"tier": "gold",
"age": { "$gt": 21 },
"signup": { "$gte": "2024-01-01" }
}) SELECT data FROM users
WHERE data @> '{"status":"active","tier":"gold"}'
AND (data->>'age')::numeric > 21
AND (data->>'signup')::timestamptz >= '2024-01-01'; Two plain predicates, one @>. Two operator predicates, two clauses. Three SQL expressions total, not four.
The practical rule, if I may offer one: every predicate you can write as field: value instead of field: {$eq: value} joins the indexed containment probe. The @> clause with GIN prunes first; the operator clauses filter what survives. On a well-indexed table, that ordering is the difference between 3 ms and 300 ms. I should prefer the former, and so, I expect, should you.
Honest Boundary: What This Translation Does Not Do
I should be forthcoming about the gaps. Pretending they do not exist would be a disservice to you and an embarrassment to me.
Operators that bypass GIN
Seven operators do not use the JSONB GIN index: $gt, $gte, $lt, $lte, $ne, $in, $nin, $regex. Add $elemMatch and nested $exists, and the count is nine of fifteen. On a ten-million-row table without targeted expression indexes, these are sequential scans. The fix is real — expression indexes matching the exact cast chain — but it is DDL you have to write. Benchmark numbers live on the Gold Lapel benchmarks page rather than in this chapter, because they turn over fast and I would rather not date a book with numbers that age in months.
Write performance is not at parity
The filter language is a read path. Gold Lapel's write performance sits 2.5–4x behind MongoDB on bulk ingest under current builds. Parallel COPY and libpq pipeline-mode ingestion are on the roadmap; they are not shipped. If your workload is write-heavy and ingestion latency dominates the cost, this is the trade-off you are accepting, and I would rather you accept it with eyes open.
MongoDB is excellent software — genuinely well-engineered, with a document model and ingestion path designed by people who took the problem seriously. The argument of this book is operational necessity — one fewer database, one backup story, one set of credentials — not quality. A waiter who overstates his case is no waiter at all.
Extreme horizontal sharding
Past fifty nodes and tens of terabytes, PostgreSQL's sharding story is less mature than MongoDB's. Citus, partitioned foreign tables, and logical replication each cover a slice, but there is no single integrated answer at MongoDB Atlas's operational polish. Atlas's sharding is, frankly, something to admire; if your scale is genuinely there, it earns its keep. The filter language compiles the same SQL either way; the infrastructure underneath is the constraint. Chapter 18 revisits this ceiling in detail.
One Import Statement: Migrating from MongoDB
A realistic mixed filter, the kind you actually write in production rather than the kind that fits neatly into a documentation example:
// MongoDB
import { MongoClient } from 'mongodb';
const client = new MongoClient(uri);
const users = client.db('app').collection('users');
const result = await users.find({
"$or": [
{ "status": "active", "tier": "gold" },
{ "tags": { "$elemMatch": { "name": "vip", "since": { "$gte": "2024-01-01" } } } }
],
"profile.region": "EU"
}).toArray(); // Gold Lapel
import * as goldlapel from 'goldlapel';
const gl = await goldlapel.start(dsn);
const result = await gl.docFind("users", {
"$or": [
{ "status": "active", "tier": "gold" },
{ "tags": { "$elemMatch": { "name": "vip", "since": { "$gte": "2024-01-01" } } } }
],
"profile.region": "EU"
}); Same filter, same return shape, same document structure. The import statement changed. The application did not.
Underneath, a single SQL statement:
SELECT data FROM users
WHERE (
data @> '{"status":"active","tier":"gold"}'
OR EXISTS (
SELECT 1 FROM jsonb_array_elements(data->'tags') AS elem
WHERE elem @> '{"name":"vip"}'
AND (elem->>'since')::timestamptz >= '2024-01-01'
)
) AND data @> '{"profile":{"region":"EU"}}'; Three branches, three idioms, one statement. The GIN index carries two of them; the EXISTS subquery handles the third. The planner does the rest, and the application does not need to be told any of this.
The next chapter turns the camera around. Having seen what the filter language compiles to on the read path, we shall look at updates — $set, $push, $inc, and the handful of array modifiers — and at how PostgreSQL's jsonb_set, the || concatenation operator, and ON CONFLICT DO UPDATE carry the same ideas into mutation. The translation table extends. The import statement, you will be relieved to hear, stays the same.
If you'll follow me through.