
PostgreSQL JSONB: Indexing and Querying with Due Care

JSONB is powerful. It is also, without proper indexing, astonishingly slow. Allow me to show you the difference.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 18 min read
The illustration was stored as JSONB. We are still parsing it.

Good evening. I see you have a JSONB column.

An excellent choice for the right use case. PostgreSQL's JSONB type stores structured data in a binary format with full indexing support, containment operators, and path queries. It bridges the gap between the rigidity of relational columns and the flexibility of document stores — without requiring a second database.

I encounter JSONB columns frequently in production systems. They arrive for sensible reasons: an event stream with varying payloads, a settings object that differs per user, a third-party webhook whose schema you cannot control. These are legitimate uses. The column is not the problem.

The trouble begins when JSONB columns grow large, queries become frequent, and nobody has added the appropriate indexes. An unindexed JSONB query on a million-row table performs a sequential scan that extracts and compares JSON values for every single row. It is, to put it plainly, a sequential scan with extra work on top. On a table of ten million rows, I have watched these queries consume 300ms where 0.12ms was available — a factor of 2,500, delivered with the quiet indifference of a query planner doing exactly what it was asked to do.

The fix is straightforward — the right index strategy for your query patterns. But "the right index" depends on how you query the data, and JSONB supports several distinct indexing approaches, each suited to different patterns. If you'll permit me, I should like to walk through each of them, their trade-offs, and the evidence that distinguishes one from another.

JSON vs JSONB: a brief but necessary clarification

-- json: stores text verbatim. Preserves whitespace, key order, duplicate keys.
-- jsonb: binary format. Parsed on write, fast to query. Deduplicates keys;
--        discards whitespace and key order.

-- Always use jsonb unless you need exact text preservation.
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    occurred_at timestamptz NOT NULL DEFAULT now(),
    payload jsonb NOT NULL
);

The answer is almost always JSONB. The json type stores the raw text and must re-parse it on every access. Every time you extract a key, PostgreSQL parses the entire JSON string from the beginning. On a 4KB document accessed in a query that touches 100,000 rows, that is 400MB of JSON parsing — per query execution.

JSONB parses once on write and stores a binary representation that supports efficient querying and indexing. The write is marginally slower — perhaps 10-15% — but every subsequent read is faster. For a column that is written once and read many times, this is not a close decision.

The only reason to use json is if you need to preserve exact text formatting, key ordering, or duplicate keys. Audit logging that must reproduce the exact bytes received from an API is a reasonable example. I have seen perhaps three legitimate uses of the json type in production. I have seen several hundred that should have been jsonb.
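
The behavioural difference is easy to see for yourself. A small demonstration with literal casts — no table required:

SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- json keeps the text verbatim: {"b": 1, "a": 2, "a": 3}

SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- jsonb deduplicates (last key wins) and normalizes key order: {"a": 3, "b": 1}

The jsonb result is what your queries will see ever after. If the duplicate "a" carried meaning, jsonb has silently discarded it — hence the audit-logging exception above.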

The operators: how you query JSONB

-- Access a key (returns text):
SELECT payload->>'type' AS event_type FROM events;

-- Access a key (returns jsonb, for chaining):
SELECT payload->'user'->'address'->>'city' AS city FROM events;

-- Nested access:
SELECT payload->'user'->>'email' AS email FROM events;

-- Filter on a key:
SELECT * FROM events WHERE payload->>'type' = 'purchase';

-- Containment (does the document contain this sub-document?):
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';

-- Key existence:
SELECT * FROM events WHERE payload ? 'refund_reason';

-- Path predicate (does any element match the path expression?):
SELECT * FROM events WHERE payload @? '$.items[*] ? (@.price > 100)';

The key distinction: ->> extracts a value as text. -> extracts it as jsonb, which allows chaining for nested access. @> tests containment — does the left operand contain the right operand as a sub-document? ? tests key existence. @? evaluates a SQL/JSON path expression.

This matters for indexing because each operator family uses a different index type. The ->> extraction operator needs an expression index. The @> containment operator needs a GIN index. If your index does not match your operator, PostgreSQL will politely ignore it and proceed with a sequential scan. There is no error, no warning. Only silence and a query that takes 2,500 times longer than it should.
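
EXPLAIN makes the mismatch visible. With only a GIN index on payload in place, the two plans might look like this (plan output abridged and illustrative):

-- Extraction operator: the GIN index cannot help. Sequential scan.
EXPLAIN SELECT * FROM events WHERE payload->>'type' = 'purchase';
-- Seq Scan on events
--   Filter: ((payload ->> 'type'::text) = 'purchase'::text)

-- Containment operator: the GIN index applies.
EXPLAIN SELECT * FROM events WHERE payload @> '{"type": "purchase"}';
-- Bitmap Heap Scan on events
--   Recheck Cond: (payload @> '{"type": "purchase"}'::jsonb)
--   ->  Bitmap Index Scan on idx_events_payload

Running EXPLAIN on your JSONB queries before they reach production is the cheapest insurance available.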

I should note that the @? path expression syntax, introduced in PostgreSQL 12, is remarkably powerful for complex conditions on nested arrays and objects. The example above — finding events where any item has a price exceeding 100 — would require a subquery or lateral join with the older operator syntax. With @?, it is a single WHERE clause. GIN indexes support it.
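
For comparison, a sketch of the pre-jsonpath form the paragraph above alludes to — the same question expressed as a subquery over the unnested array:

-- Pre-PG-12 equivalent of the @? example: unnest, filter, test existence.
SELECT *
FROM events e
WHERE EXISTS (
    SELECT 1
    FROM jsonb_array_elements(e.payload->'items') AS item
    WHERE (item->>'price')::numeric > 100
);

Both forms return the same rows; the jsonpath version is shorter and can lean on a GIN index.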

The JSONB function library

Beyond the operators, PostgreSQL provides a comprehensive set of functions for transforming JSONB data. These are not mere convenience — they are often the difference between a workable query and an impossible one.

-- Expand top-level keys into rows:
SELECT jsonb_object_keys(payload) FROM events LIMIT 10;

-- Expand an array into rows:
SELECT jsonb_array_elements(payload->'items') AS item FROM events
WHERE payload->>'type' = 'purchase';

-- Aggregate rows back into a JSON array:
SELECT jsonb_agg(jsonb_build_object(
    'type', payload->>'type',
    'occurred', occurred_at
)) FROM events
WHERE occurred_at > now() - interval '1 hour';

-- Merge two JSONB objects (PG 9.5+: || operator):
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;
-- Result: {"a": 1, "b": 2}

-- Targeted update at a path with jsonb_set:
UPDATE events
SET payload = jsonb_set(payload, '{user,verified}', 'true')
WHERE payload->'user'->>'email' = 'guest@example.com';

Two functions deserve particular attention. jsonb_array_elements unnests a JSON array into rows, which allows you to use standard SQL aggregation on array contents. This is how you answer questions like "what is the average price of items in purchase events?" without leaving PostgreSQL. The alternative — fetching all documents to application code and aggregating there — is slower by an order of magnitude on any non-trivial dataset.
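
The question from the paragraph above, answered in place. This sketch assumes items is an array of objects, each with a numeric price field:

-- Average item price inside purchase events, entirely in SQL:
SELECT avg((item->>'price')::numeric) AS avg_item_price
FROM events
CROSS JOIN LATERAL jsonb_array_elements(payload->'items') AS item
WHERE payload->>'type' = 'purchase';

The lateral unnest produces one row per array element, at which point every SQL aggregate is available.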

jsonb_set performs surgical updates on a JSONB document without rewriting the entire value. This is important for large documents. If your JSONB column stores a 10KB configuration object and you need to update one boolean flag, jsonb_set expresses the intent precisely. The full document is still rewritten on disk — JSONB does not support in-place partial updates at the storage level — but the SQL is clear, and the application need not fetch-modify-write.
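
jsonb_set has siblings worth knowing: the - and #- operators remove keys and paths in the same surgical spirit. The key names here are illustrative:

-- Remove a top-level key:
UPDATE events SET payload = payload - 'debug'
WHERE payload ? 'debug';

-- Remove a nested path (#- takes a text[] path):
UPDATE events SET payload = payload #- '{user,temp_token}'
WHERE payload @? '$.user.temp_token';

As with jsonb_set, the full document is rewritten on disk; the gain is expressive SQL, not storage-level partial writes.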

Indexing JSONB: the three strategies

This is where the article earns its keep. JSONB indexing is not a single technique — it is a choice among strategies with sharply different trade-offs. I shall present each, then show you the numbers that distinguish them.

Strategy 1: GIN index with default operator class

-- Default GIN index: supports @>, ?, ?|, ?&
CREATE INDEX idx_events_payload ON events USING gin (payload);

-- This powers containment queries efficiently:
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';
-- Bitmap Heap Scan on events
--   Recheck Cond: (payload @> '{"type": "purchase"}'::jsonb)
--   ->  Bitmap Index Scan on idx_events_payload
--       (actual time=0.08..2.41 rows=12847 loops=1)

A GIN (Generalized Inverted Index) on the entire JSONB column is the broadest approach. It indexes every key and every value in every document, creating an inverted index that maps individual elements back to the rows that contain them. It supports containment (@>), key existence (?, ?|, ?&), and path queries.

The strength is coverage. One index serves any containment query on any key, present or future. You need not know in advance which keys will be queried. For event tables, audit logs, and any column where the query patterns are diverse or unpredictable, this is the pragmatic choice.

The cost is size. A default GIN index on a table with complex documents can be large — on a ten-million-row table with 1KB average document size, expect the GIN index to consume 600-800MB. Build time is substantial: 60-100 seconds on that same table. And write overhead is real. Every INSERT must update the inverted index for every key and value in the document. On write-heavy tables with large documents, this overhead is measurable.

Strategy 2: GIN index with jsonb_path_ops

-- jsonb_path_ops: smaller index, supports only @>
CREATE INDEX idx_events_payload_path ON events
  USING gin (payload jsonb_path_ops);

-- ~30% smaller than the default GIN operator class.
-- Faster for containment queries, but cannot support ? or ?| operators.
-- Use when you only need @> containment checks.

The jsonb_path_ops operator class takes a different approach. Rather than indexing individual keys and values, it hashes the full path from root to each leaf value. The result is a more compact index — typically 25-35% smaller than the default — that is faster for containment queries.

The trade-off is narrower coverage. jsonb_path_ops supports only the @> containment operator. It cannot answer "does this document contain the key refund_reason?" — only "does this document contain the sub-document {\"type\": \"refund\"}?" If your queries are exclusively containment checks, and you never test for key existence alone, this is the better GIN variant. If you need ? or ?|, you must use the default operator class.

Strategy 3: expression index (B-tree on extracted value)

-- Expression index: index a specific extracted value as a B-tree.
CREATE INDEX idx_events_type ON events ((payload->>'type'));

-- Now this query uses a plain B-tree index scan:
SELECT * FROM events WHERE payload->>'type' = 'purchase';

-- Faster than GIN for equality on a single key.
-- Also supports range queries:
CREATE INDEX idx_events_amount ON events
  (((payload->>'amount')::numeric));

SELECT * FROM events
WHERE (payload->>'amount')::numeric > 100;

An expression index extracts a specific value from the JSONB column and indexes it as a scalar. This is the fastest approach when you always query the same key — it produces a standard B-tree that supports equality, ranges, sorting, and covering index techniques.

The difference in query time is significant. For equality lookups on a single key, an expression index delivers 0.12ms where a GIN index delivers 2.4ms. That is 20x. The index is also dramatically smaller — 42MB for a single-key expression index versus 680MB for a full GIN index on the same table.

The trade-off: you must create one expression index per key you want to query. A GIN index covers all keys at once. For a column where you query 2-3 specific keys frequently, expression indexes are ideal. For a column where any key might be queried, GIN is more practical. This is not a "better or worse" question — it is a "known patterns versus unknown patterns" question.

Advanced: composite and partial indexes on JSONB

The three strategies above are the foundation. But PostgreSQL's indexing capabilities compose — and the compositions are where real production workloads find their optimal path.

Composite: relational + JSONB
-- Composite expression index: relational + JSONB together
CREATE INDEX idx_events_date_type ON events (
    occurred_at,
    (payload->>'type')
);

-- Serves queries that filter on both:
SELECT * FROM events
WHERE occurred_at > '2025-01-01'
  AND payload->>'type' = 'purchase';

-- The planner uses both columns. No post-filter.

A composite expression index that combines a relational column with an extracted JSONB value serves queries that filter on both. This is particularly effective in the hybrid pattern I shall describe shortly — relational columns for the structured data, JSONB for the flexible remainder, and composite indexes that bridge the two.

Partial: index only matching documents
-- Partial index: only index documents that match a condition
CREATE INDEX idx_events_purchase_amount ON events
  (((payload->>'amount')::numeric))
  WHERE payload->>'type' = 'purchase';

-- Only purchase events are indexed. The index is a fraction
-- of the size of a full-table expression index.
-- Perfect when you only query a subset of documents.

-- This query uses the partial index:
SELECT * FROM events
WHERE payload->>'type' = 'purchase'
  AND (payload->>'amount')::numeric > 500;

A partial expression index is, I must confess, one of my favourite indexing techniques. It indexes only the rows that match a WHERE condition, producing an index that is a fraction of the size of its full-table equivalent. On the events table, if roughly a quarter of rows are purchases, the partial index is about a quarter the size of its full-table equivalent (11MB versus 42MB) and equally fast for the queries it serves.

Partial indexes combine beautifully with JSONB because JSONB tables frequently contain documents of different shapes. A partial index per document type gives you B-tree speed for each type's queries without indexing documents that will never match.
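
A sketch of that per-type pattern — one small B-tree per document shape, with the indexed key chosen to match each type's queries (index names and keys are illustrative):

-- Refund events are queried by reason:
CREATE INDEX idx_events_refund_reason ON events
  ((payload->>'refund_reason'))
  WHERE payload->>'type' = 'refund';

-- Signup events are queried by acquisition source:
CREATE INDEX idx_events_signup_source ON events
  ((payload->>'source'))
  WHERE payload->>'type' = 'signup';

Each index pays storage and write costs only for the rows it can ever serve.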

The numbers: index strategy compared

Ten million rows. Average document size: 1.2KB. The table is the events table you have seen throughout this article.

Index strategy            Disk size   Build time   Query time   Supports
No index                  0 MB        N/A          312ms        Nothing
GIN (default)             680 MB      94s          2.4ms        @>, ?, ?|, ?&
GIN (jsonb_path_ops)      470 MB      67s          1.9ms        @> only
Expression (single key)   42 MB       8s           0.12ms       =, <, >, BETWEEN on that key
Expression (two keys)     84 MB       16s          0.12ms       Two specific keys
Partial expression        11 MB       2s           0.10ms       One key, subset of rows

The partial expression index is the standout: 11MB, 2 seconds to build, 0.10ms query time. But it only serves one specific query pattern on a subset of rows. The GIN index is 680MB and slower per query, but it serves any containment query on any key. This is the fundamental tension: specificity versus generality. The right answer depends on whether you know your query patterns.

For most production systems, I recommend starting with expression indexes on the keys you know you query, and adding a GIN index only if ad-hoc querying or diverse containment patterns emerge. The expression indexes handle the hot path. The GIN index handles the long tail.

JSONB vs relational columns: an honest comparison

This is the question that determines whether JSONB is appropriate for your use case. I shall present the numbers, and then I shall tell you what I think they mean.

Operation                        JSONB     Relational   Note
Equality filter (indexed key)    0.12ms    0.08ms       Expression index narrows the gap
Containment (@>)                 2.4ms     N/A          GIN-specific operation
Range filter (cast to numeric)   4.8ms     0.15ms       Cast overhead + less selective index
JOIN on extracted key            48ms      2.1ms        Cannot use foreign keys, poor plan quality
GROUP BY extracted key           124ms     18ms         Extraction cost per row during aggregation
Full-document retrieval          0.08ms    0.08ms       Identical; reading a single row
Array unnest + aggregate         340ms     22ms         Function call overhead per row

For simple equality lookups with expression indexes, JSONB approaches relational performance — 0.12ms versus 0.08ms. The 50% overhead is real but unlikely to matter at this scale. For full-document retrieval, they are identical.

The gap widens dramatically for operations that require extracting values at query time. Range filters carry the overhead of casting from text. JOINs on extracted keys are 23x slower because the planner cannot use foreign key statistics and the extraction happens per row. Aggregations are 7x slower for the same reason — every GROUP BY requires extracting the key from every row being grouped.

The pattern is clear: the more "relational" the query, the more relational columns outperform JSONB. If you find yourself writing GROUP BY payload->>'region' in a query that runs hourly, that key has earned a relational column.
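
One clean promotion path, available since PostgreSQL 12, is a stored generated column: the value is extracted once on write, kept in sync automatically, and indexable like any scalar. The column and key names are illustrative:

-- Note: adding a stored generated column rewrites the table.
ALTER TABLE events
  ADD COLUMN region text
  GENERATED ALWAYS AS (payload->>'region') STORED;

CREATE INDEX idx_events_region ON events (region);

-- GROUP BY region now reads a plain column. No per-row extraction.

The payload keeps its copy of the value, so nothing downstream breaks; the hot queries simply stop paying the extraction tax.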

"JSONB with GIN indexes provides native JSON document storage with indexing, querying, and aggregation. You can store, query, and index semi-structured data without MongoDB."

— from You Don't Need Redis, Chapter 19: The Case for Simplicity

"But what about MongoDB?"

I am asked this with some regularity, and the question deserves a direct answer rather than a diplomatic deflection.

-- MongoDB query:
db.events.find({ type: "purchase", "user.email": "a@b.com" })

-- PostgreSQL JSONB equivalent:
SELECT * FROM events
WHERE payload @> '{"type": "purchase", "user": {"email": "a@b.com"}}';

-- Both use inverted indexes. Both support nested access.
-- The PostgreSQL version also supports:
--   * JOINs with relational tables in the same query
--   * ACID transactions across JSONB and relational data
--   * Aggregation with window functions
--   * Full SQL WHERE clause composability

At the query level, PostgreSQL JSONB and MongoDB's document model can express similar operations. Both store JSON-like documents. Both support inverted indexes for efficient querying. Both handle nested access. The syntax differs, but the capability overlaps substantially.

Where PostgreSQL has the clear advantage is composability. A JSONB column lives in a relational table alongside relational columns, with full JOIN support, ACID transactions, window functions, CTEs, and the entire PostgreSQL ecosystem. You can join your JSONB event payloads with your relational customer table in a single query, inside a single transaction. In MongoDB, this requires either denormalization or multiple round trips.
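
The composability claim, made concrete. A sketch that joins JSONB payloads to a relational customers table in one statement (table and key names are illustrative):

-- Document filter, relational join, and SQL aggregation in one query:
SELECT c.name,
       count(*) AS purchases,
       sum((e.payload->>'amount')::numeric) AS total_spent
FROM events e
JOIN customers c ON c.id = (e.payload->>'customer_id')::bigint
WHERE e.payload @> '{"type": "purchase"}'
GROUP BY c.name;

The same result in a document store requires either denormalizing customer data into every event or two round trips and application-side joining.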

Where MongoDB has genuine advantages — and I should be forthcoming about this, because pretending they do not exist would be a disservice to you — is in three areas.

First, native document-model maturity. MongoDB was designed for documents from the ground up. Its query planner, storage engine, and sharding infrastructure are optimized for the document model. PostgreSQL added JSONB to a relational engine. The seams occasionally show: the casting overhead in range queries, the inability to use foreign keys on extracted values, the per-row extraction cost during aggregation. For a workload that is 90% document-shaped with no relational requirements, MongoDB's native model will feel more natural and, in some operations, perform better.

Second, horizontal sharding. MongoDB's built-in sharding distributes documents across nodes. PostgreSQL does not natively shard. Extensions like Citus exist, but the native experience is a single node. If your JSONB data exceeds what a single PostgreSQL instance can store or serve, MongoDB's sharding is a meaningful advantage.

Third, change streams. MongoDB's change streams provide a built-in mechanism for reacting to document changes in real time. PostgreSQL's logical replication and LISTEN/NOTIFY serve a similar purpose, but with more assembly required.

My position, stated plainly: if your data has any relational dimension — if you JOIN it, aggregate across it, enforce referential integrity, or combine document queries with relational queries — PostgreSQL JSONB eliminates the need for a separate document store. One database, one transaction boundary, one operational burden. If your data is purely document-shaped with no relational requirements and you need horizontal sharding, MongoDB is a reasonable choice. Most applications I encounter have at least some relational dimension. Most of them do not need a second database.

When to use JSONB — and when to reach for columns instead

-- JSONB is flexible. Relational is fast. Choose wisely.

-- Good JSONB use cases:
-- * Event payloads with varying schemas
-- * User preferences / settings
-- * API response caching
-- * Audit logs with varying detail levels
-- * Feature flags and configuration
-- * Third-party webhook payloads you cannot control

-- Bad JSONB use cases (use relational instead):
-- * Data you query by the same keys every time
-- * Data you JOIN on
-- * Data you aggregate (SUM, COUNT, GROUP BY)
-- * Data with referential integrity requirements
-- * Data with strict type constraints (dates, money)

The decision rule is straightforward: if you know the structure at schema-design time and you query it frequently, use relational columns. If the structure varies per row or you store the data for retrieval rather than querying, JSONB is appropriate.

In practice, I find the boundary is rarely binary. Most tables benefit from a hybrid approach.

-- The hybrid approach: relational columns for structure,
-- JSONB for the flexible remainder.

CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers(id),
    status text NOT NULL,
    total numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),

    -- Relational: everything you filter, join, or aggregate on.
    -- JSONB: everything else.
    metadata jsonb NOT NULL DEFAULT '{}'
);

-- metadata might contain:
-- {"source": "mobile_app", "campaign": "spring_2025",
--  "browser": "Safari 18", "notes": "Gift wrapping requested"}

-- You get: fast JOINs on customer_id, fast filters on status,
-- fast aggregation on total, AND flexible metadata storage.

The hybrid pattern is, in my experience, the correct default for most applications. Relational columns for the data that participates in your queries — the columns you filter, join, sort, and aggregate on. A JSONB column for the flexible remainder — the data that varies per row, arrives from external sources, or exists primarily to be stored and returned rather than queried.

This gives you type safety and foreign keys where they matter, flexibility where it is needed, and fast queries on both. The JSONB column absorbs schema changes without migrations — you can add a new field to the metadata without altering the table. The relational columns provide the performance guarantees that JSONB cannot match for structured operations.

I should offer one caution. The hybrid pattern can decay if not tended. I have seen tables where the JSONB "metadata" column gradually accumulated fields that were being queried in WHERE clauses, JOINed on, and aggregated — all signs that those fields have outgrown their JSONB home and deserve promotion to relational columns. A periodic review of your JSONB query patterns will reveal when a field has crossed that threshold.
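
That periodic review need not be manual archaeology. If pg_stat_statements is enabled, a rough first pass is a pattern search over the recorded queries — a heuristic sketch, not a precise audit:

-- Which normalized queries extract JSONB values, and how often?
-- (mean_exec_time is the PG 13+ column name; older versions use mean_time.)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%payload->>%'
   OR query LIKE '%payload @>%'
ORDER BY calls DESC
LIMIT 20;

A key that appears near the top of this list, week after week, is a key asking for promotion.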

Common JSONB performance mistakes

I encounter these with sufficient regularity that they warrant cataloguing.

  • No index at all. Every JSONB query becomes a sequential scan with per-row JSON extraction. On the ten-million-row table above, this is the difference between 0.12ms and 312ms. There is no scenario in which an unindexed JSONB column being queried frequently is acceptable.
  • GIN index when expression indexes would be better. If you only ever filter on payload->>'type' and payload->>'status', two small expression indexes will outperform a single GIN index — faster queries, less disk, less write overhead. GIN is for breadth. Expression indexes are for depth.
  • Using ->> extraction in WHERE instead of @> containment. WHERE payload->>'type' = 'purchase' requires an expression index. WHERE payload @> '{"type": "purchase"}' uses a GIN index. If you have a GIN index but write your queries with ->>, the GIN index sits idle. Match your operator to your index.
  • Storing relational data as JSONB for "flexibility." If every row has the same keys and you query by those keys, you have recreated a relational table inside a JSONB column — with worse performance, no type safety, no foreign keys, and no schema validation. The flexibility that nobody uses is not flexibility. It is overhead.
  • Casting in every query without an index on the cast expression. WHERE (payload->>'amount')::numeric > 100 requires an expression index on ((payload->>'amount')::numeric), not on (payload->>'amount'). The index expression must match the query expression exactly, including the cast. PostgreSQL is precise about this.
  • Neglecting TOAST compression. JSONB values larger than about 2KB are compressed and stored in a TOAST table. This is usually transparent, but if you frequently access individual keys from large documents, each access must de-toast and decompress the entire value. For documents exceeding 8KB that are accessed key-by-key in hot queries, consider breaking the document into smaller pieces or promoting the hot keys to relational columns.
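
The exact-match rule from the casting point above is worth seeing once:

-- Indexes the text value. Does NOT serve a query that casts to numeric:
CREATE INDEX idx_amount_text ON events ((payload->>'amount'));

-- Matches the query expression exactly, cast included:
CREATE INDEX idx_amount_num ON events (((payload->>'amount')::numeric));

-- Uses idx_amount_num; idx_amount_text is ignored here:
SELECT * FROM events WHERE (payload->>'amount')::numeric > 100;

The planner compares expressions structurally. Close is not a match.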

A note on JSONB and TOAST

PostgreSQL's TOAST mechanism (The Oversized-Attribute Storage Technique) deserves a moment of attention when discussing JSONB, because JSONB columns are among the most common occupants of TOAST storage.

When a JSONB value exceeds approximately 2KB, PostgreSQL compresses it and may store it in a separate TOAST table. This is invisible to your queries — the value arrives as though it were stored inline. But the cost is not invisible. Accessing a TOASTed value requires a separate heap lookup, decompression, and parsing. For a query that extracts a single key from a 50KB JSONB document across 100,000 rows, the TOAST overhead is significant.
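
To see how much of a table lives outside the main heap, the size functions give a quick estimate. A sketch — the arithmetic treats everything that is neither heap nor index as TOAST, which is approximately right:

-- Rough breakdown of where the events table's bytes live:
SELECT pg_size_pretty(pg_relation_size('events'))        AS heap,
       pg_size_pretty(pg_indexes_size('events'))         AS indexes,
       pg_size_pretty(pg_total_relation_size('events')
                      - pg_relation_size('events')
                      - pg_indexes_size('events'))       AS toast_approx;

A large toast_approx figure on a table with hot key-level JSONB access is the warning sign this section describes.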

The practical guidance: keep your frequently-queried JSONB values reasonably sized. If a document has grown to 20KB and you access it in a hot query path, the most impactful optimization may not be a better index — it may be moving the frequently-accessed keys to relational columns or splitting the document into a smaller, frequently-accessed portion and a larger, archival portion.

For documents that are stored and retrieved whole — API response caching, audit log entries — TOAST compression is a benefit. The documents are smaller on disk, and the decompression cost is paid once per retrieval. The problem arises specifically with key-level access on large, TOASTed documents in high-frequency queries.

The indexing decision, automated

The choice between GIN, jsonb_path_ops, expression indexes, partial indexes, and composite indexes depends on your actual query patterns — which keys are queried, which operators are used, how frequently, and on what subset of the data. This is precisely the kind of decision that benefits from observing real traffic rather than guessing at design time.

Gold Lapel observes which JSONB query patterns appear in your traffic and identifies the optimal index strategy for each. If every query on the events table filters on payload->>'type', an expression index is created. If queries use varied containment patterns, a GIN index is more appropriate. If a particular document type is queried disproportionately, a partial index targets just those rows. The evidence is in the traffic — the proxy simply acts on it.

It also catches the decay pattern I mentioned earlier: when a JSONB key is being queried frequently enough and consistently enough that it has outgrown its flexible home, that signal is visible in the query stream. The right index today may not be the right index in six months. A system that watches the traffic and adapts has a structural advantage over one that relies on a human noticing the change.

Parting observations

JSONB is one of the features that makes PostgreSQL genuinely remarkable. A single database that handles relational data, document data, full-text search, geospatial queries, and time-series workloads — with ACID transactions across all of them — is an extraordinary piece of engineering. The fact that it has been free and open-source for three decades is something I never quite get over.

But JSONB is not a replacement for schema design. It is a complement to it. The most effective use of JSONB I have observed in production is not "everything is a document" — it is "the structured data is relational, the flexible data is JSONB, and the boundary between them is reviewed periodically." That hybrid approach, with the right indexes on both sides, gives you the full power of PostgreSQL's relational engine and the full flexibility of its document capabilities. Without a second database. Without a second operational burden. Without a second failure mode at three in the morning.

The indexes are the part that most teams get wrong, or more commonly, the part they forget entirely. I trust that is no longer a concern for you.

You may find the following worth your time: I have written a detailed comparison of JSON versus JSONB in PostgreSQL — storage differences, operator support, and the specific scenarios where each earns its place. The blog post covers the ground this guide could not without overstaying its welcome.