GIN index
Generalized Inverted Index. The specialist on staff for multi-element data — arrays, JSONB, full-text search. If you need to ask "which rows contain this?", allow me to make an introduction.
A GIN index is an inverted index: instead of mapping rows to their values, it maps each distinct value (an array element, a JSONB key, a lexeme in a text search vector) to the set of rows that contain it. This inversion makes GIN exceptionally fast for containment queries — "which rows contain this element?" — using operators like @> and @@. It is, if I may say so, a remarkably elegant answer to a question that B-tree was never built to ask.
What a GIN index is
Most index types map a row to its value. A B-tree on an email column stores one entry per row, pointing from the email address to the row's location on disk. This works well for scalar values, but falls apart when a single row contains many values — an array with ten tags, a JSONB document with twenty keys, a text search vector with hundreds of lexemes. One entry per row is no longer sufficient when a single row has a great deal to say.
GIN inverts the relationship. It extracts every individual element from the indexed value and creates a posting list for each element — an ordered list of row pointers (TIDs) where that element appears. The result is a structure that looks like a book index: "postgres" appears on rows 4, 17, 203, 891; "performance" appears on rows 17, 44, 891, 1042. A proper book index, maintained with care.
When a query asks "which rows contain both 'postgres' and 'performance'?", the GIN index intersects the two posting lists and returns rows 17 and 891 without scanning the table. The more elements you search for, the more the posting list intersection narrows the result — which is the opposite of how B-tree performance behaves with compound conditions on array data. The right tool for the right job. I am a firm believer.
When to use GIN
GIN is the right index type when your data has a one-to-many relationship between rows and the values you need to search. If a column stores multiple things and your queries ask about individual things within it, GIN is the member of staff you want on the matter.
- Arrays (
@>,&&) — find rows where an array column contains a given element or overlaps with a set of elements - JSONB containment (
@>,?) — find rows where a JSONB column contains a specific key-value pair or key - Full-text search (
@@) — find rows where atsvectorcolumn matches atsquery - Trigram matching (
pg_trgm) — find rows where a text column matches aLIKE,ILIKE, or similarity pattern, including patterns with leading wildcards
Array queries
-- Find articles tagged with both 'postgres' and 'performance'
SELECT * FROM articles WHERE tags @> ARRAY['postgres', 'performance'];
-- Find articles sharing any tag with a given set
SELECT * FROM articles WHERE tags && ARRAY['postgres', 'indexing']; JSONB queries
-- Find events where metadata contains a specific key-value pair
SELECT * FROM events WHERE data @> '{"type": "signup"}';
-- Check if a key exists
SELECT * FROM events WHERE data ? 'email'; Full-text search
-- Full-text search using a tsvector column
SELECT * FROM documents
WHERE search_vector @@ to_tsquery('english', 'postgres & replication'); Creating GIN indexes
The basic syntax specifies USING gin in the CREATE INDEX statement. Unlike B-tree, GIN is never the default — you must request it explicitly. One hires a specialist deliberately, not by accident.
-- GIN index on an array column
CREATE INDEX idx_tags ON articles USING gin (tags);
-- GIN index on a JSONB column (default operator class)
CREATE INDEX idx_metadata ON events USING gin (metadata);
-- GIN index on a tsvector column for full-text search
CREATE INDEX idx_search ON documents USING gin (search_vector); GIN supports several operator classes that control which operators the index can accelerate and how elements are extracted from the indexed value.
-- jsonb_ops (default): supports @>, ?, ?|, ?& operators
CREATE INDEX idx_data_ops ON events USING gin (data jsonb_ops);
-- jsonb_path_ops: supports only @>, but smaller and faster for containment
CREATE INDEX idx_data_path ON events USING gin (data jsonb_path_ops);
-- gin_trgm_ops: trigram matching for LIKE, ILIKE, ~, similarity()
-- Requires: CREATE EXTENSION pg_trgm;
CREATE INDEX idx_name_trgm ON customers USING gin (name gin_trgm_ops);
-- array_ops (default for arrays): supports @>, <@, &&, = operators
CREATE INDEX idx_tags ON articles USING gin (tags array_ops); Choosing the right operator class matters — and I do mean matters. jsonb_path_ops produces an index roughly 30-40% smaller than jsonb_ops because it hashes the full path to each value rather than indexing keys and values separately. The trade-off is that it only supports the @> operator — you lose key-existence checks (?, ?|, ?&). Know your query patterns before choosing. A 35% smaller index is a fine reward for those who do their homework.
GIN vs GiST
GIN and GiST both support full-text search and some overlapping data types, but they make different trade-offs. Knowing which to reach for is one of those quiet competencies that separates a well-maintained database from one that merely runs.
| GIN | GiST | |
|---|---|---|
| Read speed | Faster — exact posting list lookups | Slower — lossy, may require recheck |
| Write speed | Slower — must update posting lists for each element | Faster — single entry per row |
| Index size | Larger — one entry per element | Smaller — one entry per row (lossy) |
| Nearest-neighbor | Not supported | Supported (ORDER BY <->) |
| Range/overlap | Limited | Full support (range types, geometric) |
| Full-text search | Exact matches, faster reads | Supports ranking-aware scans |
The rule of thumb: use GIN when reads dominate and you need precise, fast lookups. Use GiST when you need distance-based ordering, range overlap queries, or when write throughput is the primary concern. For full-text search on read-heavy workloads, GIN is almost always the better choice. They are both excellent at their respective duties — the question is which duties your workload actually requires.
Pending list and fastupdate
Inserting into a GIN index is expensive. A single row with a JSONB document containing 50 keys requires 50 posting list updates. To amortize this cost, GIN uses a pending list — a staging area where new entries are written sequentially without updating the main index structure. It is, I must say, a rather clever bit of engineering: accept the mess temporarily so the work gets done faster.
This behavior is controlled by the fastupdate setting, which is enabled by default. When fastupdate is on, inserts go to the pending list first. The pending entries are merged into the main index by VACUUM, or when the pending list exceeds gin_pending_list_limit (default 4 MB).
-- Disable fastupdate for workloads where read latency matters more
ALTER INDEX idx_metadata SET (fastupdate = off);
-- Check the current setting
SELECT reloptions FROM pg_class WHERE relname = 'idx_metadata';
-- Force pending list cleanup without disabling fastupdate
VACUUM articles; The trade-off is straightforward — which is refreshing, since so few trade-offs are. With fastupdate on, writes are faster but reads may be slower because each query must scan the pending list in addition to the main index. With fastupdate off, writes are slower but reads are consistently fast because all data lives in the main index structure.
For read-heavy workloads where query latency consistency matters, disabling fastupdate is worth considering. For write-heavy workloads with periodic batch reads, the default is usually fine — just ensure autovacuum is running frequently enough to keep the pending list short. An untended pending list is how small latency problems become large latency problems.
How Gold Lapel relates
Gold Lapel analyzes query patterns at the proxy level and recommends indexes based on the operators and data types it observes. When it detects containment queries on JSONB columns (@>), array overlap checks (&&), or full-text search operations (@@), it recommends GIN indexes — including the appropriate operator class for the query pattern. The distinction between jsonb_ops and jsonb_path_ops is not one I leave to chance.
Gold Lapel also considers the write profile of the table before recommending GIN. On a table with a high insert rate and infrequent reads, a GIN index may cost more in write overhead than it saves in query time. A recommendation that ignores the cost side of the ledger is not a recommendation — it is a guess. I prefer to account for both.