Hash index
A specialist who does exactly one thing — equality lookups — and nothing else. Allow me to explain where that singular focus earns its keep.
A hash index uses a hash function to map column values directly to buckets, giving it O(1) lookup time for equality comparisons (=). It cannot support range queries, sorting, or multicolumn indexes. Hash indexes have been WAL-logged and crash-safe since PostgreSQL 10, but remain a niche choice — because B-tree handles equality and everything else. A reformed reputation does not erase a long memory.
What a hash index is
A hash index applies a hash function to the indexed column value and stores the result in a fixed set of buckets. Each bucket contains pointers to the table rows whose hash values map to that bucket. When a query searches for an exact match (WHERE token = 'abc123'), PostgreSQL hashes the search value, jumps directly to the corresponding bucket, and follows the pointers to the matching rows. No tree traversal, no sorted pages. Straight to the answer.
This is fundamentally different from how a B-tree works. A B-tree maintains sorted data and traverses a tree structure — O(log n) operations. A hash index skips the traversal entirely and goes straight to the bucket — O(1) in the average case. The trade-off is that the hash function destroys ordering information entirely. The index knows that two values hash to the same bucket, but it has no concept of "greater than" or "less than." It traded its memory of order for speed at a single task.
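PostgreSQL exposes its internal hash functions at the SQL level, which makes the bucket mapping easy to see for yourself. A minimal sketch — note that `hashtext()` is an internal function with no stability guarantee across versions, and the modulo-4 bucket count is purely illustrative:

```sql
-- hashtext() is PostgreSQL's internal hash function for text values.
-- The modulo shows, conceptually, how a hash code selects a bucket;
-- a real hash index grows its bucket count via linear hashing.
SELECT hashtext('abc123')          AS hash_code,
       abs(hashtext('abc123')) % 4 AS bucket_if_4_buckets;
```

Two values that land in the same bucket are indistinguishable to the index until it follows the row pointers — which is precisely why ordering information is gone.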
Internally, PostgreSQL's hash index implementation uses linear hashing, which grows the number of buckets incrementally as the table grows. Overflow pages handle hash collisions within a bucket. The implementation also maintains a bitmap page to track which buckets have been split during expansion.
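The bucket growth can be observed directly with the `pageinspect` extension, which decodes the hash index's meta page (block 0). A sketch, assuming a hash index named `idx_sessions_token` already exists and you have superuser rights to install the extension:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- The meta page records how far linear hashing has grown the index:
-- maxbucket is the highest bucket number currently in use.
SELECT maxbucket, ovflpoint, firstfree
FROM hash_metapage_info(get_raw_page('idx_sessions_token', 0));
```

Re-running this as the table grows shows `maxbucket` climbing incrementally — the splits happen one bucket at a time rather than doubling the whole structure at once.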
The history
Hash indexes have existed in PostgreSQL for decades, but their reputation was poor for most of that time — and deservedly so.
Before PostgreSQL 10, hash indexes were not WAL-logged. This meant they were not crash-safe: if PostgreSQL crashed, the hash index could be corrupted and had to be rebuilt with REINDEX. They were also not replicated to standby servers, making them useless in any setup with streaming replication. The PostgreSQL documentation itself recommended against using them. When the official documentation tells you to avoid something, one listens.
PostgreSQL 10 changed this. Hash indexes became fully WAL-logged, crash-safe, and replicated. The implementation was also overhauled for better concurrency and performance.
PostgreSQL 11 added support for hash indexes on partitioned tables, further closing the gap with B-tree.
I respect a comeback story. The hash index served its penance, did the work, and emerged rehabilitated. But the years of "do not use hash indexes" advice still echo through blog posts, Stack Overflow answers, and institutional memory. That advice was correct for its time. It is no longer categorically true — but the situations where a hash index is genuinely the better choice remain narrow.
When to use hash indexes
Hash indexes are worth considering when — and I must stress the when — all of the following are true:
- The query pattern is pure equality — only `=` comparisons, never range, sorting, or pattern matching
- The column has high cardinality — many distinct values, like UUIDs, session tokens, or content hashes
- You have measured that B-tree is not meeting your needs on that specific column, or index size is a constraint
Practical examples where the hash index earns its place on staff:
- Session lookup tables — keyed by a random token, always queried with `=`
- Content-addressable storage — keyed by a SHA-256 hash, always queried with `=`
- UUID primary key lookups — where the index size savings over B-tree matter and you never sort by the UUID
```sql
-- Create a hash index on a UUID column
CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Create concurrently (no write lock)
CREATE INDEX CONCURRENTLY idx_lookups_key ON lookups USING hash (lookup_key);
```

Limitations
Hash indexes give up rather a lot in exchange for O(1) equality lookups:
- No range queries — `<`, `>`, `<=`, `>=`, `BETWEEN` cannot use a hash index
- No ORDER BY — the hash function destroys sort order, so PostgreSQL cannot use the index to avoid a sort step
- No multicolumn indexes — hash indexes support only a single column
- No UNIQUE constraint — you cannot create a unique hash index or back a UNIQUE constraint with one
- No index-only scans — the index stores hash codes, not original values, so it always requires a heap lookup
- No prefix or pattern matching — `LIKE 'abc%'` cannot use a hash index
These restrictions mean that if your query pattern ever evolves beyond pure equality — adding an ORDER BY, a date range filter, or a second indexed column — the hash index becomes useless for that query and you will need a B-tree anyway. One does not hire a specialist who cannot adapt when the household's needs change.
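Both failure modes are easy to demonstrate. A sketch against a hypothetical `tokens` table — the table and index names are illustrative, and the exact error text may vary by version:

```sql
-- Hypothetical setup for demonstration
CREATE TABLE tokens (token text);
CREATE INDEX idx_tokens_hash ON tokens USING hash (token);

-- A unique hash index is rejected outright:
--   CREATE UNIQUE INDEX idx_tokens_uniq ON tokens USING hash (token);
--   ERROR:  access method "hash" does not support unique indexes

-- A range predicate cannot use the hash index; the planner falls back
-- to a sequential scan (or another index, if one exists):
EXPLAIN SELECT * FROM tokens WHERE token >= 'abc';
```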
Hash vs B-tree
This is the central question, and I should be direct about the answer: B-tree is the right choice for most workloads, even equality-only ones. Here is where they differ:
| | Hash | B-tree |
|---|---|---|
| Equality (=) | O(1) average | O(log n) |
| Range queries | Not supported | Supported |
| ORDER BY | Not supported | Supported |
| Multicolumn | Not supported | Supported |
| UNIQUE constraint | Not supported | Supported |
| Index-only scan | Not supported | Supported |
| Index size | Often 30-50% smaller | Larger (stores full values) |
| Crash safety | Since PG 10 | Always |
The O(1) vs O(log n) difference sounds dramatic on paper. In practice, a B-tree lookup on a table with a billion rows requires 4-5 page reads, and most of those pages are cached in shared buffers. The actual latency difference for a single lookup is often microseconds — a difference your application will not notice and your users certainly will not. Hash indexes win on size, not speed, and the size advantage only matters when you are constrained on memory or disk and the column values are wide. I would be a poor guide if I let the theoretical complexity class do the selling.
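The 4-5 page reads figure follows directly from B-tree fanout. Assuming roughly 300 keys per page — a plausible but illustrative number; the real fanout depends on key width and page fill — the depth needed for a billion rows is a small logarithm:

```sql
-- Tree depth = log base fanout of row count.
-- 300 keys per page is an assumed, illustrative fanout.
SELECT ceil(log(300::numeric, 1000000000::numeric)) AS btree_levels;
-- → 4
```

Add one read for the heap tuple and you arrive at the handful of page reads quoted above — most of them cached.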
To compare index sizes on your own data:
```sql
-- Compare index sizes between B-tree and hash on the same column
CREATE INDEX idx_events_id_btree ON events USING btree (event_id);
CREATE INDEX idx_events_id_hash ON events USING hash (event_id);

SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'events'
  AND indexname IN ('idx_events_id_btree', 'idx_events_id_hash');
```

To verify a query is using the hash index:
```sql
-- Check which indexes exist on a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'sessions';

-- Confirm the query uses the hash index
EXPLAIN ANALYZE
SELECT * FROM sessions WHERE token = 'abc123-def456';
-- Look for "Index Scan using idx_sessions_token" in the output
```

How Gold Lapel relates
Gold Lapel generally recommends B-tree indexes — they cover the widest range of query patterns with no risk of needing replacement as workloads evolve. The dependable choice is dependable for a reason. However, when Gold Lapel observes a column that is exclusively queried with equality comparisons on high-cardinality data, and where index size is contributing to memory pressure, it may suggest a hash index as an alternative. The recommendation considers the full query profile for that column: if even one query uses a range comparison or ORDER BY, B-tree remains the recommendation. I see no virtue in recommending a specialist when the generalist already has matters in hand.