← PostgreSQL Extensions

pg_trgm

Trigram-based similarity matching and fast LIKE/ILIKE searches — because a sequential scan on every keystroke is no way to run a search box.

Extension · March 21, 2026 · 7 min read

If you have a search box and a growing table, you will meet this extension eventually. pg_trgm breaks text into three-character sequences (trigrams) and uses them for similarity comparison and index-accelerated substring search. It turns expensive LIKE '%term%' queries — which normally force a sequential scan — into fast index lookups.

What pg_trgm does

A trigram is a group of three consecutive characters. The string "cat" produces the trigrams " c", " ca", "cat", "at ". pg_trgm uses these trigrams to measure how similar two strings are — the more trigrams they share, the higher the similarity score (0 to 1).

This is useful for two things: fuzzy matching (finding strings that are similar to a search term, even with typos) and substring search (finding strings that contain a term). The fuzzy matching works through the similarity() function and the % operator. The substring search works by creating GIN or GiST indexes with trigram operator classes. Between the two, you have a remarkably capable search toolkit that never leaves PostgreSQL.

When to use pg_trgm

The most common use cases:

  • Search boxes that need %LIKE% — product search, user search, any text search with a leading wildcard. Without pg_trgm, these force a sequential scan on every query.
  • Typo-tolerant search — "posgres" should match "PostgreSQL". The similarity() function handles this gracefully.
  • "Did you mean?" suggestions — rank candidates by similarity score to suggest corrections for misspelled input.
  • Deduplication — find near-duplicate records by comparing text fields with a similarity threshold.

If your search needs are more linguistic — stemming, ranking by relevance, boolean queries — you want PostgreSQL's built-in full-text search (tsvector/tsquery). Many applications use both: full-text search for primary results, pg_trgm for fuzzy fallback.

Installation and setup

pg_trgm is a contrib module included with PostgreSQL — no external packages or shared library preloading required. One statement and it is at your service.

SQL
-- No shared_preload_libraries needed — just create the extension
CREATE EXTENSION pg_trgm;

Similarity matching

The similarity() function returns a score between 0 (no match) and 1 (identical).

SQL
-- Basic similarity comparison
SELECT similarity('PostgreSQL', 'Postgresql');
-- Returns: 0.727273

-- Find similar product names
SELECT name, similarity(name, 'Posgres') AS sim
FROM products
WHERE similarity(name, 'Posgres') > 0.3
ORDER BY sim DESC;

The % operator is the indexed version — it returns true when similarity exceeds the threshold (default: 0.3).

Word similarity

PostgreSQL 11 added word_similarity(), which checks whether the query appears as a contiguous substring within the target. This is more useful for autocomplete-style matching where the search term is a partial word.

SQL
-- word_similarity: checks if the query appears as a substring
-- More useful for autocomplete-style matching
SELECT word_similarity('SQL', 'PostgreSQL');
-- Returns: 1.0 (SQL is a complete substring match within PostgreSQL)

SELECT word_similarity('post', 'PostgreSQL');
-- Returns: 0.5

-- The <<% and %>> operators use word_similarity
SELECT name FROM products WHERE name %>> 'sql';

Indexing LIKE and ILIKE

This is the most impactful feature of pg_trgm — and, candidly, the reason most people install it. A standard B-tree index cannot accelerate LIKE '%term%' because the leading wildcard prevents index range scans. A trigram GIN index can.

SQL
-- Create a GIN index for trigram-accelerated LIKE/ILIKE
CREATE INDEX idx_products_name_trgm ON products
  USING gin (name gin_trgm_ops);

-- These queries now use the index:
SELECT * FROM products WHERE name LIKE '%widget%';
SELECT * FROM products WHERE name ILIKE '%PostgreSQL%';

-- Without pg_trgm, LIKE with a leading wildcard forces a sequential scan.
-- With the trigram GIN index, PostgreSQL can use the index for any
-- substring match — leading, trailing, or middle.

One index. Leading wildcards, trailing wildcards, middle-of-string matches — all served from that single GIN index. The sequential scan that was quietly getting worse with every new row simply stops being a concern.

GIN vs GiST indexes

pg_trgm provides operator classes for both GIN and GiST indexes. They serve different masters, and knowing which to choose will save you from an awkward re-indexing later.

SQL
-- GIN index: faster reads, slower writes, larger on disk
CREATE INDEX idx_gin ON products USING gin (name gin_trgm_ops);

-- GiST index: slower reads, faster writes, supports distance operator
CREATE INDEX idx_gist ON products USING gist (name gist_trgm_ops);

-- GiST supports the distance operator for ORDER BY similarity:
SELECT name
FROM products
ORDER BY name <-> 'Posgres'
LIMIT 10;

-- GIN does not support <-> ordering.
-- Rule of thumb: use GIN for search, GiST for nearest-neighbor ranking.
PropertyGINGiST
Read speedFasterSlower
Write speedSlowerFaster
Index sizeLargerSmaller
LIKE/ILIKE accelerationYesYes
Distance operator (<->)NoYes
Similarity operator (%)YesYes

Tuning the similarity threshold

The % operator and similarity()-based queries use a configurable threshold to decide what counts as a match.

SQL
-- Check the current similarity threshold (default: 0.3)
SHOW pg_trgm.similarity_threshold;

-- Set a stricter threshold for the current session
SET pg_trgm.similarity_threshold = 0.5;

-- The % operator uses this threshold:
SELECT * FROM products WHERE name % 'Posgres';
-- Returns rows where similarity(name, 'Posgres') >= 0.5

Lower thresholds return more results but include weaker matches. Higher thresholds are more precise but may miss legitimate matches. Start with the default (0.3) and adjust based on your data — the right threshold depends on how forgiving your users expect the search to be, and that is something only your data can tell you.

Cloud availability

ProviderStatus
Amazon RDS / AuroraAvailable — contrib module, create with CREATE EXTENSION
Google Cloud SQLAvailable
Azure Database for PostgreSQLAvailable
SupabaseAvailable
NeonAvailable
Crunchy BridgeAvailable

How Gold Lapel relates

I should mention that Gold Lapel watches for exactly the pattern described above — repeated LIKE or ILIKE queries with leading wildcards hitting unindexed text columns. When it sees this, it recommends the trigram GIN index, and can create it automatically if you have given it permission to do so.

This is, in my experience, one of the most frequently surfaced recommendations. The story is always the same: a search feature ships with a simple WHERE name LIKE '%term%' that performs admirably on a small table, then quietly degrades to multi-second sequential scans as the data grows. Gold Lapel catches this trajectory early — before your users notice, and well before anyone opens a performance ticket at 2 a.m.

Frequently asked questions