Chapter 12: Custom Analyzers and Relevance Tuning
Every kitchen, no matter how well run, has moments when a dish comes back.
The search returns results. But not the right results. Or the right results in the wrong order. Or one document that should appear is stubbornly absent while an irrelevant one sits at position three with an air of unearned confidence. This happens. It happens in every search system — PostgreSQL, Elasticsearch, anything that tokenizes and ranks text. The question is not whether it will happen. The question is whether you have the tools to understand why.
I find debugging search results genuinely interesting. Not because the errors are pleasant — they are not — but because the diagnostic process reveals how the system thinks. And once you can see how it thinks, you can teach it to think differently. That is what this chapter is about: seeing inside the search pipeline, understanding what it did and why, and adjusting when its judgment is not quite right.
In Elasticsearch, you debug with the _analyze API and the _explain API. PostgreSQL has direct equivalents: ts_debug() for the tokenization pipeline, and tsvector/tsquery inspection with ts_rank() for score analysis. This chapter also covers custom text search configurations — because sometimes the built-in language configs are not what your data needs, and the right response is to create one that is.
If you will permit me, I should like to open the hood.
ts_debug(): The Tokenization Pipeline
ts_debug() is the X-ray of full-text search. It shows the complete tokenization pipeline for any text — every decision the system made about every word:
SELECT alias, description, token, dictionary, lexemes
FROM ts_debug('english', 'The quick brown foxes jumped');
Result:
alias | description | token | dictionary | lexemes
-----------+-----------------+---------+--------------+---------
asciiword | Word, all ASCII | The | english_stem | {}
blank | Space symbols | | |
asciiword | Word, all ASCII | quick | english_stem | {quick}
blank | Space symbols | | |
asciiword | Word, all ASCII | brown | english_stem | {brown}
blank | Space symbols | | |
asciiword | Word, all ASCII | foxes | english_stem | {fox}
blank | Space symbols | | |
asciiword | Word, all ASCII | jumped | english_stem | {jump}
Allow me to walk through what happened to each token, because this is where the understanding lives.
“The” → consulted english_stem → recognized as a stop word → lexemes empty {}. Discarded. It will not appear in the tsvector and cannot be matched. This is why searching for “the” returns nothing — the word was removed before it could be indexed. This is correct behavior, but if you did not know it was happening, the empty result set would be baffling.
“foxes” → consulted english_stem → stemmed to “fox.” A search for “fox” or “foxes” will produce the same lexeme and match this document. The stemmer collapsed the plural and the singular into one form. This is usually what you want.
“jumped” → stemmed to “jump.” A search for “jumping,” “jumps,” or “jumped” will match.
The diagnostic question ts_debug() answers is the most common search debugging question there is: “Why doesn’t this document match my query?” Run ts_debug() on the document text and on the query text. Compare lexemes. If they don’t overlap, the stemming or stop words are the cause — and now you can see exactly which token was stemmed to what, and which tokens were discarded entirely. The mystery becomes a diagnosis. The diagnosis becomes a fix.
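Putting both sides into one query makes the comparison mechanical. A minimal, self-contained sketch (no table required), using a hypothetical document string and query string:

```sql
-- Compare the lexemes produced for a document and for a query.
-- If the two sets share no lexeme, the document cannot match.
SELECT to_tsvector('english', 'The fox jumped')   AS doc_lexemes,
       plainto_tsquery('english', 'the jumping')  AS query_lexemes,
       to_tsvector('english', 'The fox jumped')
           @@ plainto_tsquery('english', 'the jumping') AS matches;
-- doc_lexemes:   'fox':2 'jump':3
-- query_lexemes: 'jump'      ("the" was discarded as a stop word)
-- matches:       true
```

If `matches` comes back false, the two lexeme columns show exactly which token failed to line up.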
This is the equivalent of Elasticsearch’s _analyze API. Same purpose, same insight, different syntax.
Gold Lapel’s analyze() and explain_score()
analyze():
goldlapel.analyze(conn, "english", "The quick brown foxes jumped")
Wraps ts_debug(). Use case: “Why doesn’t searching for ‘running’ match a document containing ‘runs’?” Run analyze() on both. See that both stem to “run” — so they should match. If they don’t, the problem is elsewhere: a mismatched text search configuration between indexing and querying, or a tsquery function that is not producing what you expect. The tool eliminates causes systematically. I find that a more productive approach than guessing.
explain_score():
goldlapel.explain_score(conn, "articles", "body", "database performance", article_id=42)
Returns a diagnostic object showing:
- The tsvector of the document
- The tsquery of the search terms
- Whether the @@ match is true or false
- The ts_rank() score
- The ts_rank_cd() score (cover density)
- The ts_headline() snippet with highlighted matches
This is the equivalent of Elasticsearch’s _explain API. It answers not just whether a document matches but why it scored the way it did. “Why is this document ranked #15 instead of #1?” Inspect the tsvector — perhaps the matching terms appear in a low-weight field. Inspect the rank — perhaps the document is very long and the length normalization is penalizing it. Inspect the headline — perhaps only one of three search terms actually appears. The score tells you what happened. The diagnostic tells you why.
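For those who prefer raw SQL, the same diagnostic can be assembled by hand. A sketch, assuming the articles table from the explain_score() example above has an id primary key and a body column:

```sql
-- Everything explain_score() reports, gathered manually for one document.
SELECT to_tsvector('english', body)                 AS doc_vector,
       q                                            AS query,
       to_tsvector('english', body) @@ q            AS matches,
       ts_rank(to_tsvector('english', body), q)     AS rank,
       ts_rank_cd(to_tsvector('english', body), q)  AS rank_cd,
       ts_headline('english', body, q)              AS snippet
FROM articles,
     plainto_tsquery('english', 'database performance') AS q
WHERE id = 42;
```

Reading the columns left to right reproduces the diagnostic narrative: what was indexed, what was asked, whether they intersect, and how each ranking function scored the intersection.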
ts_rank vs. ts_rank_cd: Choosing the Right Ranking
Chapter 4 introduced both ranking functions. Here they are in diagnostic context — because when the ranking feels wrong, this is the first lever to reach for.
SELECT title,
ts_rank(search_vector, q) AS rank,
ts_rank_cd(search_vector, q) AS rank_cd,
ts_rank(search_vector, q, 2) AS rank_norm
FROM articles, plainto_tsquery('english', 'database performance') AS q
WHERE search_vector @@ q
ORDER BY rank DESC LIMIT 5;
Result:
title | rank | rank_cd | rank_norm
------------------------------+-----------+-----------+-----------
Database Systems Overview | 0.0991032 | 0.0607927 | 0.0282843
PostgreSQL Performance Guide | 0.0865362 | 0.1000000 | 0.0452819
Database Performance Tuning | 0.0759909 | 0.1500000 | 0.0759909
Three scores, three different rankings. “Database Systems Overview” ranks first by ts_rank() — the terms appear frequently. But “Database Performance Tuning” ranks first by ts_rank_cd() — the terms appear adjacent, which cover density rewards. And the normalized rank penalizes “Database Systems Overview” for being a long document.
Which ranking is correct? That depends on what “correct” means for your users:
- ts_rank() — term frequency. Good general default. Documents that mention the terms often rank high.
- ts_rank_cd() — cover density. Rewards proximity. Use when adjacent terms signal higher relevance — “database performance” as a phrase is more relevant than “database” in paragraph one and “performance” in paragraph twelve.
- Normalization (parameter 2) — divide by document length. Use when short, focused documents should compete fairly with long comprehensive ones.
I would recommend starting with ts_rank(search_vector, q, 2) and adjusting from there. Most teams never need to adjust further. When they do, the diagnostic query above shows them exactly what each function produces, which makes the decision informed rather than speculative.
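The normalization parameter is a bit mask, so the common options can be compared side by side in one query. A sketch against the same articles query as above:

```sql
-- Common normalization flags for ts_rank(). Flags can be OR-ed together.
SELECT title,
       ts_rank(search_vector, q, 0)  AS raw,        -- no normalization (the default)
       ts_rank(search_vector, q, 1)  AS log_length, -- divide by 1 + log(document length)
       ts_rank(search_vector, q, 2)  AS by_length,  -- divide by document length
       ts_rank(search_vector, q, 32) AS bounded     -- rank / (rank + 1), keeps scores below 1
FROM articles,
     plainto_tsquery('english', 'database performance') AS q
WHERE search_vector @@ q
ORDER BY by_length DESC;
```

Flag 1 is a gentler length penalty than flag 2; flag 32 is useful when scores must be comparable across queries.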
Custom Text Search Configurations
PostgreSQL ships with more than two dozen built-in text search configurations. Sometimes you need one that does not exist yet.
When to create a custom configuration:
- Content that should not be stemmed — product SKUs, part numbers, identifiers
- Domain-specific stop words to filter out noise unique to your data
- Domain-specific dictionaries — medical terms, legal terminology, industry jargon
- A language not in the built-in set
Creating one is straightforward:
CREATE TEXT SEARCH CONFIGURATION product_search (COPY = english);
This creates a new configuration as a copy of English. Then modify it to suit your needs.
Example 1: No-stem configuration for identifiers and codes:
CREATE TEXT SEARCH CONFIGURATION nostem (COPY = simple);
The simple configuration does no stemming and no stop word removal. Compare:
SELECT to_tsvector('english', 'running databases v2.0');
-- 'databas':2 'run':1 'v2.0':3
SELECT to_tsvector('nostem', 'running databases v2.0');
-- 'databases':2 'running':1 'v2.0':3
The English config stems “running” to “run” and “databases” to “databas.” The no-stem config preserves the original words. Use each where appropriate — linguistic stemming for natural language, no stemming for identifiers. A product catalog might use english for descriptions and nostem for SKU fields. The right configuration depends on the column, not the table.
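To make the per-column idea concrete, here is one way it might look. The products table below is hypothetical (names and columns are illustrative): a generated tsvector applies nostem to the SKU and english to the description, each with its own weight.

```sql
-- Hypothetical product catalog: exact codes in sku, natural language in description.
CREATE TABLE products (
    id            bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku           text NOT NULL,
    description   text NOT NULL,
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('nostem', sku), 'A') ||       -- SKUs indexed verbatim
        setweight(to_tsvector('english', description), 'B') -- prose stemmed normally
    ) STORED
);

CREATE INDEX products_search_idx ON products USING gin (search_vector);
```

A search for an exact SKU matches the unstemmed A-weighted lexemes; a natural-language search matches the stemmed B-weighted ones, and ts_rank() favors SKU hits by weight.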
Example 2: Custom stop words, handled by placing a domain dictionary ahead of the stemmer in the chain.
ALTER TEXT SEARCH CONFIGURATION product_search
ALTER MAPPING FOR asciiword WITH my_domain_dict, english_stem; The configuration consults dictionaries in order. The first dictionary that recognizes the token determines the result. By placing your domain dictionary first, you intercept domain-specific noise words before the English stemmer processes them.
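One detail the snippet above leaves implicit: my_domain_dict must exist before the ALTER MAPPING succeeds. A common way to build it is a simple-template dictionary backed by a stop word file. A sketch, assuming a file named product_noise.stop (one noise word per line, e.g. “sku”, “item”, “misc”) has been installed in the server’s tsearch_data directory:

```sql
-- A dictionary that discards domain noise words and passes everything
-- else along to the next dictionary in the chain (english_stem).
CREATE TEXT SEARCH DICTIONARY my_domain_dict (
    TEMPLATE  = simple,
    STOPWORDS = product_noise,  -- reads $SHAREDIR/tsearch_data/product_noise.stop
    Accept    = false           -- unrecognized tokens fall through to english_stem
);
```

Accept = false is the crucial line: without it, the simple template accepts every token it sees, and english_stem would never run.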
Gold Lapel’s create_search_config():
goldlapel.create_search_config(conn, "product_search", "english")
Creates a copy of the named configuration. You then customize via SQL.
ES custom analyzer comparison. In Elasticsearch, custom analyzers are configured in index settings JSON — character filters, tokenizer, token filters in a nested object. In PostgreSQL, custom configurations use CREATE TEXT SEARCH CONFIGURATION and ALTER MAPPING SQL. The concepts are parallel. The syntax differs. The capability is equivalent.
The Debugging Workflow
Four scenarios, step by step. This is where the tools come together into a diagnostic practice.
Scenario 1: “This document should match but doesn’t”
- Run analyze() on the document text and the query text.
- Compare lexemes. Do the stemmed forms overlap?
- If not: stemming or stop words are the cause. Check the text search configuration.
- If they do overlap: check the tsquery function — is plainto_tsquery() producing what you expect?
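That last check is worth doing explicitly, because the tsquery constructors behave differently on the same input:

```sql
-- AND semantics: matches documents containing both lexemes anywhere.
SELECT plainto_tsquery('english', 'quick fox');
-- 'quick' & 'fox'

-- Phrase semantics: the lexemes must be adjacent, in order.
SELECT phraseto_tsquery('english', 'quick fox');
-- 'quick' <-> 'fox'

-- Web-style syntax: quoted phrases and negation.
SELECT websearch_to_tsquery('english', '"quick fox" -lazy');
-- 'quick' <-> 'fox' & !'lazi'
```

A document that matches the plain form can fail the phrase form; if your application switched constructors, that alone explains a vanished result.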
Scenario 2: “This document matches but shouldn’t”
- Run explain_score() on the document.
- Inspect the tsvector — which lexemes are present?
- Look for false positives from stemming — unexpected lexeme overlaps.
- Consider a custom configuration with different stemming behavior.
Scenario 3: “The ranking is wrong”
- Run explain_score() on the correctly-ranked and incorrectly-ranked documents.
- Compare ts_rank() and ts_rank_cd() scores.
- Check normalization — is the document very long (penalized) or very short (inflated)?
- Try different normalization parameters.
- Consider switching between ts_rank() and ts_rank_cd().
Scenario 4: “Search is slow”
- Run EXPLAIN ANALYZE on the search query.
- Check for a GIN index — no GIN index means a sequential scan.
- If the index exists but queries are slow, check whether it needs VACUUM or REINDEX.
- Gold Lapel’s proxy should have auto-created the index — check the dashboard’s index inventory.
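The first two steps can be run directly. A sketch, assuming the articles table and its search_vector column from earlier:

```sql
-- Step 1: confirm the plan. A healthy search plan shows a Bitmap Index Scan
-- on the GIN index; "Seq Scan on articles" means the index is missing or unused.
EXPLAIN ANALYZE
SELECT title
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'database performance');

-- Step 2: if no index exists, create one.
CREATE INDEX IF NOT EXISTS articles_search_idx
    ON articles USING gin (search_vector);
```

Run the EXPLAIN ANALYZE again after creating the index; the plan should switch from a sequential scan to a bitmap scan on articles_search_idx.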
I would suggest keeping this workflow accessible — bookmarked, printed, taped to a monitor. Search debugging follows the same patterns repeatedly, and having the steps at hand saves the time of rediscovering them each time a result looks wrong.
Elasticsearch Debugging Tools vs. PostgreSQL
| ES Tool | PostgreSQL Equivalent | What It Shows |
|---|---|---|
| _analyze API | ts_debug() / analyze() | How text is tokenized, stemmed, filtered |
| _explain API | explain_score() | Why a document scored the way it did |
| Custom analyzer | CREATE TEXT SEARCH CONFIGURATION | Custom tokenization pipeline |
| Index settings | ALTER TEXT SEARCH CONFIGURATION | Modify analyzer behavior |
Honest Boundary
PostgreSQL’s debugging tools cover tsvector/tsquery — lexical search. Debugging pgvector search quality is a different process: compare embeddings directly, check model quality, verify dimension alignment. This chapter focuses on full-text search debugging because that is where the tokenization pipeline applies.
Custom text search configurations are powerful but require understanding the dictionary chain. The examples above cover the most common cases. The PostgreSQL documentation is the authoritative reference for advanced configuration — custom C dictionaries, Ispell dictionaries, Snowball stemmers, thesaurus dictionaries. I have given you the practical foundation. The documentation provides the depth.
There is no equivalent of Elasticsearch’s Quepid or Splainer — dedicated visual relevance tuning tools — for PostgreSQL. The debugging workflow in this chapter is SQL-based. For teams that need a visual relevance tuning interface, this is a genuine gap in the PostgreSQL ecosystem, and I note it because acknowledging a gap is more useful than pretending it does not exist.
Part III is complete. And I should like to take a moment to observe what has been assembled.
Part II built the search stack: lexical search, fuzzy matching, phonetic search, semantic search, autocomplete. Five pillars, each mapped to an Elasticsearch feature, each demonstrated with runnable SQL. That alone was the argument most teams need to hear.
Part III went further than most teams expect. Aggregations — the feature that keeps teams on Elasticsearch — handled by SQL that has been doing this work since before Elasticsearch existed. The percolator — the feature most teams assume requires a dedicated search engine — handled by stored tsquery values and a GIN index, with real-time notifications via a trigger. Custom analyzers and relevance debugging — the tools that make search work correctly — handled by ts_debug(), explain_score(), and custom text search configurations.
Every major Elasticsearch capability for application search now has a PostgreSQL equivalent. In several areas — window functions, JOINs, CTEs, real-time percolator triggers, ACID consistency, row-level security — PostgreSQL provides capabilities that Elasticsearch does not. The parity argument is not partial. It is, I believe, rather thorough.
But an argument, however thorough, is not a deployment. The question is no longer “can PostgreSQL do what Elasticsearch does?” You have seen that it can. The question is now practical. How do the pillars combine into a single search query? How fast is the combined result? How far does it scale? And for the team currently running Elasticsearch — how do you migrate?
Part IV answers each of those questions. It begins with the most powerful pattern in this book: hybrid search. Lexical scores and semantic scores, combined into a single ranked result using Reciprocal Rank Fusion — a technique that produces relevance neither approach achieves alone.
If you will follow me, I should like to show you what the pillars look like when they work together. I have found, across many establishments, that the whole is rather more impressive than the sum of its parts.