GIN index

Q: How much space does a GIN index use?

It depends heavily on the number of distinct elements being indexed — and I do not say "it depends" lightly. A GIN index on a JSONB column with many unique keys can be larger than the table itself. A GIN index on a small set of tags will be compact. Use pg_relation_size('index_name') to check. As a rough guide, GIN indexes are typically larger than B-tree indexes on the same data because they store one entry per element rather than one entry per row.

Q: Can I use a GIN index for equality lookups on scalar columns?

No. GIN indexes are designed for composite values — arrays, JSONB, tsvectors, and similar types where a single row contains multiple elements. For equality lookups on scalar columns (integers, text, timestamps), a B-tree index is the correct choice. Each has its domain; neither should be asked to do the other's work.

Q: Does CREATE INDEX CONCURRENTLY work with GIN?

Yes. CREATE INDEX CONCURRENTLY works with all index types including GIN. It avoids locking out writes during the build, which is important because GIN indexes on large tables can take a long time to create.

Q: Why is my GIN index not being used?

The most common reasons: the query uses an operator the index does not support (for example, jsonb_path_ops only supports @>, not ? or ?|), the table is small enough that the planner prefers a sequential scan, or the query is not selective enough. Run EXPLAIN ANALYZE to see what the planner chose. The answer is always in the plan — one simply has to look.

Q: Should I use jsonb_ops or jsonb_path_ops?

Use jsonb_path_ops when your queries only use the @> containment operator. It produces a smaller index and is faster for containment checks. Use jsonb_ops (the default) when you also need key-existence operators like ?, ?|, or ?&. If in doubt, start with the default — it is the default for good reason.

Q: How does VACUUM interact with GIN indexes?

VACUUM serves two roles for GIN indexes. First, it cleans up dead tuple pointers like it does for any index. Second, when fastupdate is enabled, it flushes the pending list — moving entries from the unsorted staging area into the main index structure. If you notice GIN query performance degrading over time on a write-heavy table, check that autovacuum is keeping up with the pending list.

Generalized Inverted Index. The specialist on staff for multi-element data — arrays, JSONB, full-text search. If you need to ask "which rows contain this?", allow me to make an introduction.

The Waiter of Gold Lapel · Updated Mar 30, 2026 Published Mar 21, 2026 · 8 min read

A GIN index is an inverted index: instead of mapping rows to their values, it maps each distinct value (an array element, a JSONB key, a lexeme in a text search vector) to the set of rows that contain it. This inversion makes GIN exceptionally fast for containment queries — "which rows contain this element?" — using operators like @> and @@. It is, if I may say so, a remarkably elegant answer to a question that B-tree was never built to ask.

What a GIN index is

Most index types map a row to its value. A B-tree on an email column stores one entry per row, pointing from the email address to the row's location on disk. This works well for scalar values, but falls apart when a single row contains many values — an array with ten tags, a JSONB document with twenty keys, a text search vector with hundreds of lexemes. One entry per row is no longer sufficient when a single row has a great deal to say.

GIN inverts the relationship. As the GIN index documentation explains, it extracts every individual element from the indexed value and creates a posting list for each element — an ordered list of row pointers (TIDs) where that element appears. The result is a structure that looks like a book index: "postgres" appears on rows 4, 17, 203, 891; "performance" appears on rows 17, 44, 891, 1042. A proper book index, maintained with care.

When a query asks "which rows contain both 'postgres' and 'performance'?", the GIN index intersects the two posting lists and returns rows 17 and 891 without scanning the table. The more elements you search for, the more the posting list intersection narrows the result — which is the opposite of how B-tree performance behaves with compound conditions on array data. The right tool for the right job. I am a firm believer.

When to use GIN

GIN is the right index type when your data has a one-to-many relationship between rows and the values you need to search. If a column stores multiple things and your queries ask about individual things within it, GIN is the member of staff you want on the matter.

Arrays (@>, &&) — find rows where an array column contains a given element or overlaps with a set of elements
JSONB containment (@>, ?) — find rows where a JSONB column contains a specific key-value pair or key
Full-text search (@@) — find rows where a tsvector column matches a tsquery
Trigram matching (pg_trgm) — find rows where a text column matches a LIKE, ILIKE, or similarity pattern, including patterns with leading wildcards

Array queries

SQL

-- Find articles tagged with both 'postgres' and 'performance'
SELECT * FROM articles WHERE tags @> ARRAY['postgres', 'performance'];

-- Find articles sharing any tag with a given set
SELECT * FROM articles WHERE tags && ARRAY['postgres', 'indexing'];

JSONB queries

SQL

-- Find events where metadata contains a specific key-value pair
SELECT * FROM events WHERE data @> '{"type": "signup"}';

-- Check if a key exists
SELECT * FROM events WHERE data ? 'email';

Full-text search

SQL

-- Full-text search using a tsvector column
SELECT * FROM documents
WHERE search_vector @@ to_tsquery('english', 'postgres & replication');

Creating GIN indexes

The basic syntax specifies USING gin in the CREATE INDEX statement. Unlike B-tree, GIN is never the default — you must request it explicitly. One hires a specialist deliberately, not by accident.

SQL

-- GIN index on an array column
CREATE INDEX idx_tags ON articles USING gin (tags);

-- GIN index on a JSONB column (default operator class)
CREATE INDEX idx_metadata ON events USING gin (metadata);

-- GIN index on a tsvector column for full-text search
CREATE INDEX idx_search ON documents USING gin (search_vector);

GIN supports several built-in operator classes that control which operators the index can accelerate and how elements are extracted from the indexed value.

SQL

-- jsonb_ops (default): supports @>, ?, ?|, ?& operators
CREATE INDEX idx_data_ops ON events USING gin (data jsonb_ops);

-- jsonb_path_ops: supports only @>, but smaller and faster for containment
CREATE INDEX idx_data_path ON events USING gin (data jsonb_path_ops);

-- gin_trgm_ops: trigram matching for LIKE, ILIKE, ~, similarity()
-- Requires: CREATE EXTENSION pg_trgm;
CREATE INDEX idx_name_trgm ON customers USING gin (name gin_trgm_ops);

-- array_ops (default for arrays): supports @>, <@, &&, = operators
CREATE INDEX idx_tags ON articles USING gin (tags array_ops);

Choosing the right operator class matters — and I do mean matters. jsonb_path_ops produces an index roughly 30-40% smaller than jsonb_ops because it hashes the full path to each value rather than indexing keys and values separately. The trade-off is that it only supports the @> operator — you lose key-existence checks (?, ?|, ?&). Know your query patterns before choosing. A 35% smaller index is well worth the consideration.

GIN vs GiST

GIN and GiST both support full-text search and some overlapping data types, but they make different trade-offs. Knowing which to reach for is one of those quiet competencies that separates a well-maintained database from one that merely runs.

	GIN	GiST
Read speed	Faster — exact posting list lookups	Slower — lossy, may require recheck
Write speed	Slower — must update posting lists for each element	Faster — single entry per row
Index size	Larger — one entry per element	Smaller — one entry per row (lossy)
Nearest-neighbor	Not supported	Supported (ORDER BY <->)
Range/overlap	Limited	Full support (range types, geometric)
Full-text search	Exact matches, faster reads	Smaller index, lossy matching (recheck required)

The rule of thumb: use GIN when reads dominate and you need precise, fast lookups. Use GiST when you need distance-based ordering, range overlap queries, or when write throughput is the primary concern. For full-text search on read-heavy workloads, GIN is almost always the better choice. They are both excellent at their respective duties — the question is which duties your workload actually requires.

Pending list and fastupdate

Inserting into a GIN index is expensive. A single row with a JSONB document containing 50 keys requires 50 posting list updates. To amortize this cost, GIN uses a pending list — a staging area where new entries are written sequentially without updating the main index structure. It is, I must say, a rather clever bit of engineering: accept the mess temporarily so the work gets done faster.

This behavior is controlled by the fastupdate setting, which is enabled by default. When fastupdate is on, inserts go to the pending list first. The pending entries are merged into the main index by VACUUM, or when the pending list exceeds gin_pending_list_limit (default 4 MB).

SQL

-- Disable fastupdate for workloads where read latency matters more
ALTER INDEX idx_metadata SET (fastupdate = off);

-- Check the current setting
SELECT reloptions FROM pg_class WHERE relname = 'idx_metadata';

-- Force pending list cleanup without disabling fastupdate
VACUUM articles;

The trade-off is straightforward — which is refreshing, since so few trade-offs are. With fastupdate on, writes are faster but reads may be slower because each query must scan the pending list in addition to the main index. With fastupdate off, writes are slower but reads are consistently fast because all data lives in the main index structure.

For read-heavy workloads where query latency consistency matters, disabling fastupdate is worth considering. For write-heavy workloads with periodic batch reads, the default is usually fine — just ensure autovacuum is running frequently enough to keep the pending list short. An untended pending list is how small latency problems become large latency problems.

How Gold Lapel relates

Gold Lapel analyzes query patterns at the proxy level and recommends indexes based on the operators and data types it observes. When it detects containment queries on JSONB columns (@>), array overlap checks (&&), or full-text search operations (@@), it recommends GIN indexes — including the appropriate operator class for the query pattern. The distinction between jsonb_ops and jsonb_path_ops is not one I leave to chance.

Gold Lapel also considers the write profile of the table before recommending GIN. On a table with a high insert rate and infrequent reads, a GIN index may cost more in write overhead than it saves in query time. A recommendation that ignores the cost side of the ledger is not a recommendation — it is a guess. I prefer to account for both.

GIN index

What a GIN index is

When to use GIN

Array queries

JSONB queries

Full-text search

Creating GIN indexes

GIN vs GiST

Pending list and fastupdate

How Gold Lapel relates

Frequently asked questions

How much space does a GIN index use?

Can I use a GIN index for equality lookups on scalar columns?

Does CREATE INDEX CONCURRENTLY work with GIN?

Why is my GIN index not being used?

Should I use jsonb_ops or jsonb_path_ops?

How does VACUUM interact with GIN indexes?

Related content