
A Proper Introduction to PostgreSQL Index Types

B-tree is not the only option. It is merely the one PostgreSQL selects when you haven't specified a preference.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 22 min read
Six keys, each cut for a different lock. The wise housekeeper knows which opens which door.

Ah. You have been using B-tree for everything.

Do not be embarrassed. Nearly everyone does. When you write CREATE INDEX without specifying a type, PostgreSQL creates a B-tree, and B-tree is an excellent general-purpose index. It handles equality, ranges, sorting, and null checks. For the majority of columns in the majority of applications, it is the correct choice.

But PostgreSQL offers six index types, each designed for a specific category of query. Using B-tree for everything is rather like setting every table with a soup spoon — functional, but you will find yourself struggling when the fish course arrives.

I should like to present the full service. We shall examine each type in the detail it deserves — its internal structure, the operators it supports, the situations where it excels, and the situations where it does not. I shall also be forthcoming about the trade-offs, because an index that accelerates your reads while quietly punishing your writes is not a gift. It is an invoice you have not yet opened.

B-tree: the dependable standard

B-tree indexes store data in a balanced, sorted tree structure. The "B" is commonly read as balanced — every leaf node is the same distance from the root, which guarantees consistent lookup performance regardless of where your value falls in the range. They support equality comparisons, range queries, sorting, and IS NULL checks. If you are unsure which index type to use, B-tree is almost certainly correct.

-- B-tree: the default. Handles =, <, >, <=, >=, BETWEEN, IN, IS NULL.
CREATE INDEX idx_orders_created_at ON orders (created_at);

-- Powers queries like:
SELECT * FROM orders WHERE created_at > '2025-01-01';
SELECT * FROM orders WHERE created_at BETWEEN '2025-01-01' AND '2025-03-01';
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;

B-tree excels because sorted data supports so many operations. A range query walks a contiguous portion of the tree. An ORDER BY reads the tree in sequence, eliminating the need for a separate sort. A LIMIT with ORDER BY can stop after finding the first N entries:

Index Scan Backward — sorted retrieval, no sort step
Index Scan Backward using idx_orders_created_at on orders
  (cost=0.43..1.85 rows=20 width=64)
  (actual time=0.02..0.04 rows=20 loops=1)
  Index Cond: (created_at IS NOT NULL)
  Buffers: shared hit=3

Note Index Scan Backward — PostgreSQL walked the index in reverse order to satisfy ORDER BY ... DESC. No sort node. Three buffer hits. Twenty rows in 0.04ms.

How deep does the tree go?

Remarkably shallow. A B-tree on a table with 10 million rows typically has a depth of 3 or 4. That means locating any single value requires reading 3 or 4 pages — and the upper levels of the tree are almost always cached in shared buffers, so in practice you are often reading a single page from disk. This is why B-tree feels fast even on large tables. The tree's branching factor is high enough that depth grows logarithmically.

Checking your index size and usage
-- B-tree depth is logarithmic. A table with 10 million rows
-- typically produces a B-tree with depth 3 or 4.
-- That means 3-4 page reads to locate any single row.

-- Check the size and usage of your index:
SELECT
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;
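
Should you wish to verify the depth for yourself rather than take my word for it, the pageinspect extension exposes the B-tree metapage. A sketch, assuming the idx_orders_created_at index from above and that you are permitted to install extensions:

Reading the actual tree depth with pageinspect
-- The metapage records the root's level. Leaf pages are level 0,
-- so level = 3 means four page reads from root to leaf.
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT level FROM bt_metap('idx_orders_created_at');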

Multicolumn B-trees

B-tree handles composite indexes natively, and column order matters enormously. The index is sorted by the first column, then by the second within each group of the first, and so on. This means a composite B-tree can serve any query that uses a prefix of its column list — but not one that skips the leading column.

-- B-tree handles multicolumn indexes natively.
-- Column order matters: equality columns first, range column last.
CREATE INDEX idx_orders_status_date ON orders (status, created_at);

-- This single index serves:
SELECT * FROM orders WHERE status = 'shipped';
SELECT * FROM orders WHERE status = 'shipped' AND created_at > '2025-01-01';
SELECT * FROM orders ORDER BY status, created_at;

-- But NOT efficiently:
SELECT * FROM orders WHERE created_at > '2025-01-01';
-- Status is the leading column. Without it, the index is a poor fit.

I have written separately about composite index column ordering, and I commend that piece to your attention. The short version: equality columns first, range column last.

The honest counterpoint

B-tree's generality is also its limitation. For the specific domains where other index types excel — containment queries on arrays, geometric proximity, full-text search — B-tree cannot compete. It indexes scalar values, not composite structures. It finds exact points and ranges on a number line; it cannot answer "which documents contain this word" or "which point is nearest to this location." If B-tree could do everything, PostgreSQL would not ship five other index types.

B-tree also has a size consideration on very large, append-only tables. On a 10GB log table, the B-tree on the timestamp column might be 200MB. That is not alarming on its own, but when BRIN can achieve the same filtering in 48KB, the question becomes whether you need B-tree's precision or whether BRIN's coarser filtering is sufficient.

When to choose B-tree: equality, ranges, sorting, BETWEEN, IN, IS NULL. Which is to say, most of the time.

Hash: the specialist for equality

A hash index supports exactly one operation: equality. It cannot do ranges, sorting, or null checks. In exchange, it is smaller than a B-tree for equality-only columns and marginally faster for lookups.

-- Hash index: equality lookups only. Smaller than B-tree for this case.
CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Powers:
SELECT * FROM sessions WHERE token = 'abc123def456';

-- Does NOT power:
SELECT * FROM sessions WHERE token > 'abc'; -- range query, hash cannot help

Hash indexes were considered unsafe before PostgreSQL 10 — they were not WAL-logged, making them vulnerable to crash recovery failures and invisible to replication. Since version 10, they are fully production-safe. This history has left a lingering reputation problem that the hash index does not entirely deserve.

Hash vs B-tree: size and speed on equality lookups
-- Size comparison on a 5-million-row sessions table with UUID tokens:
-- B-tree index on token:  ~152MB
-- Hash index on token:    ~112MB

-- Lookup benchmark (single equality, averaged over 10,000 queries):
-- B-tree:  0.031ms average
-- Hash:    0.028ms average

-- The difference is real but small. Hash wins on size, marginally on speed.

The honest counterpoint

The savings are real but modest. On a 5-million-row table with UUID tokens, you might save 40MB of disk space and 0.003ms per lookup. For most applications, this is not the difference between a good night and a bad one. The moment you need a single range query on that column — perhaps a debugging query, perhaps an admin screen that filters by token prefix — the hash index cannot help, and you will need a B-tree anyway.

I reach for hash indexes in specific situations: session tokens, API keys, lookup tables keyed by a long string that will never be range-queried. If there is even a possibility of future range queries on the column, B-tree is the safer choice.

When to choose hash: columns where you only ever query with = — session tokens, API keys, UUIDs used purely for lookup. If you ever need a range query on the column, use B-tree instead.

GIN: the index that looks inside values

A GIN (Generalized Inverted Index) is designed for values that contain multiple elements — arrays, JSONB documents, and tsvector for full-text search. Where a B-tree indexes a single scalar value per row, GIN indexes every element within a composite value and maps each element back to the rows that contain it.

Think of it as a book's index at the back: you look up a term and it tells you which pages mention it. GIN does exactly this. Each distinct key within your arrays, JSONB documents, or text vectors gets an entry pointing to every row that contains it.

-- GIN: inverted index for multi-valued columns (arrays, JSONB, tsvector).
CREATE INDEX idx_products_tags ON products USING gin (tags);

-- Powers array containment:
SELECT * FROM products WHERE tags @> ARRAY['organic', 'gluten-free'];

-- GIN for JSONB:
CREATE INDEX idx_events_payload ON events USING gin (payload jsonb_path_ops);

-- Powers:
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';

The write-performance trade-off

GIN indexes are larger and more expensive to maintain than B-trees. Every insert or update to an indexed column requires updating the inverted index for each element in the value. A JSONB document with 20 keys means 20 index entries to maintain. For write-heavy workloads, GIN offers a fastupdate mechanism that is worth understanding:

fastupdate: batching writes to the inverted index
-- GIN's pending list: the write-performance safety valve.
-- When fastupdate is enabled (the default), new entries go to a pending list
-- instead of the main index. This batches expensive inverted-index updates.

CREATE INDEX idx_products_tags ON products USING gin (tags)
  WITH (fastupdate = on, gin_pending_list_limit = 4096);

-- The pending list is merged during VACUUM or when it exceeds the limit.
-- During this merge, reads are slightly slower because PostgreSQL must
-- check the pending list in addition to the main index.

-- To see how much is pending (requires the pgstattuple extension):
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT pending_pages, pending_tuples FROM pgstatginindex('idx_products_tags');

With fastupdate enabled — and it is by default — new entries accumulate in a pending list rather than being inserted directly into the main index. This makes writes significantly faster, at the cost of slightly slower reads while the pending list is being checked. The list is merged during VACUUM or when it exceeds the configured limit. For tables with high write throughput and tolerable read latency, this is the correct default. For tables where read latency is critical and writes are infrequent, you may disable it.
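
Should you decide to disable it, the change is an ALTER INDEX away. A sketch, assuming the idx_products_tags index from above:

Disabling fastupdate and flushing the pending list
-- Disable the pending list; new entries go straight to the main index:
ALTER INDEX idx_products_tags SET (fastupdate = off);

-- Flush whatever has already accumulated into the main index
-- (returns the number of pending-list pages removed):
SELECT gin_clean_pending_list('idx_products_tags');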

GIN with pg_trgm: fast LIKE queries

One of GIN's most useful applications is often overlooked. The pg_trgm extension creates trigram-based indexes that support LIKE, ILIKE, and regular expression queries — including patterns with leading wildcards, which B-tree cannot help with at all.

-- GIN with pg_trgm: fast LIKE and ILIKE queries.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

-- Powers:
SELECT * FROM users WHERE name ILIKE '%johnson%';
-- Without this index, PostgreSQL must sequential-scan the entire table.
-- With it, the trigram index narrows to matching rows directly.

This is genuinely useful. The query WHERE name ILIKE '%johnson%' is a sequential scan without a trigram index. With one, it becomes an index scan. For search-as-you-type features, admin screens with flexible filters, or any pattern matching on text columns, GIN with pg_trgm is the proper tool.

Full-text search with GIN

-- Full-text search with GIN:
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED;

CREATE INDEX idx_articles_search ON articles USING gin (search_vector);

-- Powers:
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

This is PostgreSQL's built-in alternative to Elasticsearch for many use cases. The generated column maintains the search vector automatically, and the GIN index makes @@ queries fast. For applications with moderate search requirements — thousands to tens of thousands of documents, moderate query complexity — this is often sufficient without an external search engine.

The honest counterpoint

I should be straightforward about where GIN falls short. On tables with very high write throughput — hundreds of inserts per second to GIN-indexed columns — the write amplification is substantial. Each row insert may produce dozens of index entry updates. If your workload is 90% writes and 10% reads, GIN is extracting a high toll for modest benefit.

For full-text search at genuine scale — millions of documents, complex relevance scoring, faceted search, real-time indexing — PostgreSQL's built-in search is outmatched by dedicated search engines like Elasticsearch or Meilisearch. It is not a matter of configuration; it is a matter of architecture. PostgreSQL was not designed to be a search engine, and pretending otherwise does no one a service. For moderate scale, PostgreSQL's search is excellent. Know where that line is for your application.

When to choose GIN: array containment (@>), JSONB querying (@>, ?, ?&, @?), full-text search (@@), trigram pattern matching with pg_trgm.

GiST: the index for geometry, ranges, and proximity

A GiST (Generalized Search Tree) index supports data types that do not have a natural linear ordering — points in space, date ranges, network addresses. It answers questions like "which rows overlap with this region?" or "which point is nearest to this location?"

GiST works by organizing data into a hierarchy of bounding regions. Each internal node contains a bounding box that encloses all of its children. To find all points within a search area, the tree prunes any branch whose bounding box does not intersect the search area. This is efficient for spatial and range data, but it means the index can have overlap between sibling nodes — unlike a B-tree, where each value exists in exactly one place.

-- GiST: geometric and range types. PostGIS uses this extensively.
CREATE INDEX idx_stores_location ON stores USING gist (location);

-- Powers spatial queries:
SELECT * FROM stores
WHERE ST_DWithin(location, ST_MakePoint(-73.99, 40.73)::geography, 1000);

-- GiST for range types:
CREATE INDEX idx_reservations_period ON reservations USING gist (period);

-- Powers overlap queries:
SELECT * FROM reservations
WHERE period && tstzrange('2025-03-01', '2025-03-07');

GiST is the backbone of PostGIS, the spatial extension that makes PostgreSQL a capable geographic database. It also powers range type operations — tstzrange, int4range, daterange — which are remarkably useful for scheduling, reservations, and time-series data.

Exclusion constraints: business rules in the database

This is, in my estimation, one of GiST's most underappreciated capabilities. An exclusion constraint uses a GiST index to enforce that no two rows conflict on a specified combination of operators. The canonical example is preventing overlapping reservations:

-- GiST powers exclusion constraints: business rules enforced by the database.
-- "No two reservations for the same room may overlap."

ALTER TABLE reservations
  ADD CONSTRAINT no_overlapping_reservations
  EXCLUDE USING gist (room_id WITH =, period WITH &&);

-- Now this INSERT will fail if it overlaps an existing reservation:
INSERT INTO reservations (room_id, period)
VALUES (42, tstzrange('2025-03-05 14:00', '2025-03-05 16:00'));
-- ERROR: conflicting key value violates exclusion constraint

-- This is not application logic. This is a database guarantee.
-- No race condition. No check-then-insert gap. No distributed lock.

I cannot overstate the value of this. Without an exclusion constraint, preventing overlapping reservations requires application-level locking — check for conflicts, then insert, and hope no one else inserted between your check and your insert. That gap is a race condition. The exclusion constraint eliminates it entirely. The database enforces the rule atomically, regardless of how many concurrent sessions are inserting.

Nearest-neighbor search

GiST supports nearest-neighbor queries via the <-> distance operator. This is not merely a convenience — it is a fundamentally different access pattern that B-tree cannot provide.

-- GiST nearest-neighbor search with the <-> distance operator:
SELECT name, location <-> ST_MakePoint(-73.99, 40.73)::geography AS distance
FROM stores
ORDER BY location <-> ST_MakePoint(-73.99, 40.73)::geography
LIMIT 5;

-- This uses an index-ordered scan. PostgreSQL does NOT compute the distance
-- for every row and sort. It walks the GiST tree in distance order and
-- stops after 5 results. On a table with 1 million stores, this is the
-- difference between scanning 1,000,000 rows and reading approximately 5.

The distinction matters. Without a GiST index, a nearest-neighbor query computes the distance for every row, sorts the entire result set, and returns the top N. With GiST, the index traversal itself is ordered by distance. PostgreSQL reads the tree in proximity order and stops the moment it has enough results. On large tables, this is the difference between scanning millions of rows and reading a handful of pages.

The honest counterpoint

GiST indexes are more expensive to build and maintain than B-trees. The bounding-box hierarchy requires rebalancing as data changes, and the potential overlap between sibling nodes means that some queries may need to explore multiple branches — particularly when the data is clustered in a small region of the search space. GiST also does not support Index Only Scan for most operator classes, meaning the query must always fetch the actual tuple from the heap to verify the result.

For simple equality and range queries on scalar values, GiST is the wrong choice. It is slower and larger than B-tree for that workload. GiST earns its keep specifically when the data type or query pattern requires spatial reasoning.

When to choose GiST: PostGIS spatial queries, range type overlap/containment, nearest-neighbor searches, exclusion constraints. For a practical guide to spatial query patterns and GiST tuning, I have prepared a guide on this topic in PostGIS for Application Developers.

SP-GiST: the partitioning specialist

An SP-GiST (Space-Partitioned Generalized Search Tree) index is the least discussed of PostgreSQL's index types, which is unfortunate because it solves certain problems more elegantly than any alternative.

Where GiST uses overlapping bounding regions, SP-GiST partitions the search space into non-overlapping regions using structures like quadtrees, k-d trees, and radix trees. This means each value belongs to exactly one partition, which eliminates the need to search multiple branches — a property that can make SP-GiST faster than GiST for certain data distributions.

-- SP-GiST: space-partitioned generalized search tree.
-- For data with natural partitioning structure: IP addresses, text, points.
CREATE INDEX idx_routes_ip ON routes USING spgist (client_ip inet_ops);

-- Powers:
SELECT * FROM routes WHERE client_ip << inet '192.168.1.0/24';
-- Finds all IPs within a subnet.

-- SP-GiST for text with prefix matching:
CREATE INDEX idx_urls_path ON urls USING spgist (path text_ops);

-- Powers:
SELECT * FROM urls WHERE path LIKE '/api/v2/%';
-- SP-GiST partitions text by prefix, making prefix searches very efficient.

SP-GiST is particularly effective for three categories of data. First, IP addresses and network ranges — the hierarchical structure of IP addressing maps naturally to a radix tree. Second, text with prefix relationships — paths, URLs, hierarchical identifiers. Third, point data that is relatively uniformly distributed across the coordinate space.

SP-GiST vs GiST: different partitioning strategies
-- SP-GiST vs GiST for point data:
CREATE INDEX idx_points_gist ON locations USING gist (coords);
CREATE INDEX idx_points_spgist ON locations USING spgist (coords);

-- SP-GiST uses quadtree/kd-tree partitioning instead of bounding boxes.
-- For uniformly distributed point data, SP-GiST often produces a smaller
-- index and faster lookups than GiST. For clustered or unevenly
-- distributed data, GiST's bounding-box approach may win.

The honest counterpoint

SP-GiST's non-overlapping partitions are a strength when the data distribution cooperates, but a limitation when it does not. Highly clustered data — millions of points concentrated in a small area, for instance — can produce deeply unbalanced partitions. GiST's bounding-box approach adapts better to skewed distributions. SP-GiST also supports fewer operator classes than GiST and lacks exclusion constraint support. It is a specialist's tool, and specialists, while exceptional within their domain, should not be asked to do everything.

When to choose SP-GiST: IP subnet queries, text prefix matching, uniformly distributed point data. If your data is heavily clustered or you need exclusion constraints, prefer GiST.

BRIN: the remarkably small index for ordered data

A BRIN (Block Range Index) does not index individual rows. It indexes ranges of physical blocks, storing the minimum and maximum values for each range. This makes it extraordinarily small — but only useful when the physical order of rows on disk correlates with the indexed column's values.

-- BRIN: tiny index for naturally ordered data. Fraction of B-tree size.
CREATE INDEX idx_logs_created_at ON logs USING brin (created_at)
  WITH (pages_per_range = 32);

-- Powers range queries on append-only tables:
SELECT * FROM logs WHERE created_at > '2025-03-01';

-- A 10GB logs table might have:
-- B-tree index: ~200MB
-- BRIN index:   ~48KB

The size difference is staggering. A B-tree on a 10GB append-only table might be 200MB. The equivalent BRIN might be 48KB — four thousand times smaller. The trade-off is precision: BRIN can eliminate large swathes of irrelevant blocks, but within a qualifying range, PostgreSQL must read every row and filter.

Is your data ordered enough?

BRIN's effectiveness depends entirely on the correlation between the column's logical values and the physical row order on disk. PostgreSQL tracks this correlation in pg_stats, and you should check it before creating a BRIN index:

Checking physical correlation for BRIN suitability
-- Check whether BRIN is a good fit for your column.
-- correlation = 1.0 means perfectly ordered. -1.0 means reverse order.
-- Values near 0 mean no physical ordering — BRIN will not help.

SELECT
  attname,
  correlation
FROM pg_stats
WHERE tablename = 'logs'
  AND attname = 'created_at';

--  attname     | correlation
-- -------------+-------------
--  created_at  |    0.9987

-- 0.9987 — nearly perfect correlation. BRIN will work brilliantly here.

A correlation near 1.0 or -1.0 means the data is physically well-ordered and BRIN will be effective. A correlation near 0.0 means the physical order is essentially random — every block range will contain the full spread of values, and BRIN will be unable to eliminate any blocks. In that case, BRIN is not merely unhelpful; it is actively misleading, because it gives you a tiny index that provides no actual filtering.

Tuning pages_per_range

The pages_per_range parameter controls the granularity of BRIN's block-range summaries. A smaller value means more precise filtering at the cost of a slightly larger index. A larger value means coarser filtering but a smaller index.

Granularity vs size: tuning pages_per_range
-- pages_per_range controls granularity vs size.
-- Smaller values = more precise filtering, larger index.
-- Larger values = less precise filtering, smaller index.

-- Default: 128 pages per range
CREATE INDEX idx_logs_brin_default ON logs USING brin (created_at);
-- ~12KB index, but eliminates blocks in chunks of 128 pages

-- Tighter: 32 pages per range
CREATE INDEX idx_logs_brin_tight ON logs USING brin (created_at)
  WITH (pages_per_range = 32);
-- ~48KB index, but eliminates blocks in chunks of 32 pages
-- More precise, still absurdly small compared to a 200MB B-tree

Even at its most precise, BRIN remains absurdly small compared to a B-tree. The question is not whether you can afford the index — you can — but whether the precision is sufficient for your query patterns. If your queries typically select a narrow time range from a very large table, a tighter pages_per_range eliminates more blocks and reads fewer unnecessary pages.

The honest counterpoint

BRIN is ideal for timestamp columns on tables where rows are inserted in chronological order and rarely updated — logs, events, audit trails, time-series data. But it falls apart in two scenarios. First, if rows are inserted out of order or frequently updated with new timestamps, the correlation between physical position and column value degrades, and BRIN's summaries become useless. Second, for point queries — looking up a single specific timestamp rather than a range — BRIN may identify the correct block range but still require scanning many pages within it. B-tree would locate the exact row in 3-4 page reads.

BRIN is also not updated eagerly. When new rows are inserted after the index was last summarized, those block ranges have no summary entries and are always included in scan results. Running brin_summarize_new_values() or relying on autovacuum to maintain BRIN summaries is necessary for continued effectiveness.
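
The summarization can also be triggered by hand rather than waiting for autovacuum — a sketch against the idx_logs_created_at index from earlier:

Summarizing new block ranges on demand
-- Summarize any block ranges inserted since the last summarization.
-- Returns the number of new ranges that were summarized.
SELECT brin_summarize_new_values('idx_logs_created_at');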

When to choose BRIN: large, append-only tables with naturally ordered data. Check the correlation in pg_stats first. If the column's values are not correlated with row insertion order, BRIN will not help.

"The right index is not the one with the most columns. It is the one whose column order mirrors the way your queries actually filter and sort."

— from You Don't Need Redis, Chapter 18: The PostgreSQL Performance Decision Framework

Index features that apply to all types

Before we arrive at the comparison table, two index features deserve mention because they multiply the effectiveness of any index type.

Partial indexes

A partial index includes only rows that satisfy a WHERE clause. This reduces both the index size and the maintenance cost, often dramatically.

-- Partial indexes: index only the rows that matter.
-- Any index type can be partial.

CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';

-- If only 2% of orders are pending, this index is ~50x smaller
-- than indexing the entire table. And it's faster too — a smaller
-- index means fewer pages to traverse.

-- GIN partial index for active products only:
CREATE INDEX idx_active_products_tags ON products USING gin (tags)
  WHERE active = true;

If only 2% of your orders are in "pending" status, a partial index on the pending rows is roughly 50 times smaller than a full index. It is faster to scan, faster to update, and uses a fraction of the disk space. Any index type — B-tree, GIN, GiST, BRIN — can be partial.

Expression indexes

An expression index indexes the result of a function or expression rather than a raw column value. This allows the planner to use an index for queries that apply functions in their WHERE clause.

-- Expression indexes: index a computed value.
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- Powers:
SELECT * FROM users WHERE lower(email) = 'guest@example.com';

-- Without the expression index, PostgreSQL cannot use a regular
-- index on email for the lower() expression. It would sequential scan.

Without the expression index, WHERE lower(email) = 'guest@example.com' is a sequential scan — a plain index on email stores the raw values, and the planner cannot match them against the lower(email) expression. The expression index solves this by indexing the computed result directly. There is a thorough guide to all six reasons PostgreSQL ignores your index and how to fix each one: Fix PostgreSQL Sequential Scans.

How do the index types compare?

A reference table, for those occasions when you need a quick answer rather than a thorough explanation.

Type    | Operators              | Best for                                      | Size         | Write cost
--------+------------------------+-----------------------------------------------+--------------+-----------
B-tree  | = < > <= >= BETWEEN IN | General purpose, sorting, range queries       | Medium       | Low
Hash    | =                      | Equality-only lookups (tokens, UUIDs)         | Small        | Low
GIN     | @> @@ ? ?& ?| @?       | Arrays, JSONB, full-text search, trigrams     | Large        | High
GiST    | << >> && @> <@ ~= <->  | Geometry, ranges, nearest-neighbor, exclusion | Medium       | Medium
SP-GiST | << >> @> <@ ~=         | IP addresses, text prefix, partitioned points | Medium-small | Medium
BRIN    | = < > <= >=            | Large append-only tables with natural order   | Tiny         | Very low

Which index type should I use?

Match the index type to your query pattern, not your assumption about which type sounds most advanced.

If your query does this...                                 | Use this
-----------------------------------------------------------+----------------------------------------------------
Equality and range queries on scalar columns               | B-tree (the default)
Equality-only on long text/UUID columns                    | Hash (smaller, marginally faster for pure equality)
Searching inside arrays or JSONB documents                 | GIN
Full-text search with tsvector                             | GIN
LIKE or ILIKE with wildcards                               | GIN with pg_trgm
Geometric or spatial queries (PostGIS)                     | GiST
Range type overlap or containment                          | GiST
Exclusion constraints (no overlapping reservations)        | GiST
IP address subnet queries                                  | SP-GiST
Text prefix matching (paths, hierarchies)                  | SP-GiST
Range queries on a large, append-only table (logs, events) | BRIN
Nearest-neighbor search (ORDER BY <-> distance)            | GiST (or SP-GiST)

When in doubt, start with B-tree. It handles 90% of real-world queries. Add specialized indexes when you have a specific query pattern that B-tree cannot serve or when you need the size advantage of BRIN on a very large ordered table.

A note on over-indexing

Every index accelerates reads and decelerates writes. Each INSERT, UPDATE, or DELETE must update every index on the table. On a table with eight indexes, a single insert performs nine write operations — one for the heap and one per index.

I have seen tables with 15 indexes, half of which were never used by any query. This is not an exaggeration for dramatic effect. It is a Tuesday. Teams add indexes to solve a slow query, the slow query changes or disappears, and the index remains — a monument to a problem that no longer exists, quietly taxing every write.

Find unused indexes that are costing writes for nothing
-- Find indexes that are costing you writes but serving no reads.
SELECT
  schemaname || '.' || relname AS table_name,
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size,
  idx_scan AS scans,
  idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
    SELECT conindid FROM pg_constraint
    WHERE contype IN ('p', 'u')  -- keep primary keys and unique constraints
  )
ORDER BY pg_relation_size(indexrelid) DESC;

The pg_stat_user_indexes view reveals which indexes are actually being scanned. An index with zero idx_scan is costing you write performance and disk space in exchange for nothing. The query above excludes primary keys and unique constraints — those serve a purpose beyond query acceleration.

I should note that index usage statistics reset when the server restarts or when you call pg_stat_reset(). Before dropping an index with zero scans, verify that the counter has had sufficient time to accumulate — at minimum, a full business cycle. An index that is used once a month for a critical report is not unused.
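
If you are uncertain how long the counters have been accumulating, PostgreSQL records when the statistics were last reset:

Checking when usage statistics were last reset
-- stats_reset tells you when the counters last started from zero.
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();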

Gold Lapel tracks which indexes are used and which are not, and it creates new indexes only when the query traffic demonstrates a clear need — a repeated filter pattern without a supporting index. It does not create indexes speculatively. The evidence must be in the traffic.

The index types you did not know you needed

Most teams arrive at a PostgreSQL deployment with B-tree as their only index type, and many never leave. This works, in the same way that a household can function with a single room — everything happens there, and it suffices until it does not.

The team that learns GIN avoids deploying Elasticsearch for a search feature that serves 5,000 documents. The team that learns GiST discovers that overlapping-reservation bugs are not an application problem at all — they are a missing constraint. The team that learns BRIN discovers that their 500GB analytics table can have an index that fits in 200KB.

These are not exotic tools for exotic workloads. They are the standard equipment of a database that most teams are underusing. If you'll permit me one observation before you go: the best index for your query may not be the one you have been using. It may be one you did not know existed until today.

And that, if nothing else, is worth the visit.

Knowing which index to create is half the problem. Knowing which indexes are missing is the other half. I have prepared a guide on detecting missing indexes in PostgreSQL that automates the question this article taught you to answer manually.