Chapter 2: The Thesis
The word "search" is used loosely, and I must confess this troubles me.
In a product meeting, "search" means autocomplete — the user should see suggestions before they finish typing. In an architecture review, "search" means full-text indexing — tokenized, stemmed, ranked by relevance. In a machine learning team, "search" means vector similarity — embeddings, cosine distance, nearest neighbors. In a support ticket, "search" means something else entirely: "I typed words into the box and it didn't find what I was looking for, and I am displeased." These are all called "search." They are not the same thing.
They use different algorithms, different data structures, different index types, and different PostgreSQL mechanisms. Treating them as one problem — as the single word "search" invites you to do — leads to architectural decisions that solve the wrong problem well and the right problem not at all. I have seen this more often than I would like, and the result is always the same: a team running a powerful search engine to handle a problem that a well-indexed column would resolve in milliseconds.
Elasticsearch bundles all of these capabilities behind one service and one API. That bundling is genuinely convenient, and I have no interest in dismissing the value of convenience — convenience is, after all, rather central to my profession. But it is also why most teams never realize how many distinct problems they are conflating. When everything is "search," nothing is diagnosed precisely. When the user says "the search doesn't work," the engineer has no map to locate the failure.
Allow me to draw one.
When Someone Says "The Search Doesn't Work"
Before I lay out the taxonomy, I should explain why it matters — because the types of search that follow are not academic categories. They are diagnoses. And a good diagnosis, if you will permit me the observation, is worth more than a powerful tool applied to the wrong problem.
The user does not think in search types. The user types words into a box and expects results. When the results are wrong, they do not file a ticket saying "your lexical search fails to account for semantic similarity across the embedding space." They say: "The search doesn't work." It is the most common complaint in application development, and it is also the least specific.
What that sentence actually means depends entirely on context — and each meaning points to a different capability, a different PostgreSQL mechanism, and a different chapter of this book:
The user typed "postgre" and got no results. The search works perfectly — it simply has no tolerance for typos. This is a fuzzy matching problem, and I assure you it is among the most common. Chapter 5 addresses it with care.
The user typed "Smith" and could not find "Smyth." The search works perfectly — it matches characters, not sounds. This is a phonetic search problem. Chapter 6 addresses it.
The user typed "comfortable office chair" and did not see "ergonomic desk seating." The search works perfectly — it matches words, not meaning. This is a semantic search problem, and it is the one that most surprises teams who thought full-text search was sufficient. Chapters 7 and 8 address it.
The user typed "database performance" and the most relevant article appeared on page three. The search works perfectly — the ranking simply needs attention. This is a relevance scoring problem, and it is eminently solvable. Chapter 12 addresses it.
The user started typing "post" and nothing appeared in the dropdown. The search works perfectly — it simply was not asked to provide suggestions. This is an autocomplete problem. Chapter 9 addresses it.
The user found five hundred results but no way to filter by category. The search works perfectly — it returned results without structure. This is an aggregation problem. Chapter 10 addresses it.
The user saved a search for "senior engineer AND remote" and never received a notification when matching content appeared. The search works perfectly — but only in one direction. This is a reverse search problem, and it is the one that most teams assume requires Elasticsearch. Chapter 11 will demonstrate otherwise.
The developer cannot figure out why a document that should match a query does not. The search works — but its reasoning is opaque. This is a relevance debugging problem. Chapter 12 provides the tools to open the hood.
Eight complaints. Eight capabilities. Eight different PostgreSQL mechanisms. One word: "search." I find the English language is occasionally insufficient for the demands placed upon it.
The taxonomy that follows is not an exercise in classification for its own sake. It is a diagnostic guide. When someone tells you the search doesn't work, this map tells you where to look — and which chapter of this book to open. I have organized it for your convenience.
The Eight Capabilities
What types of search can PostgreSQL handle natively? I am pleased to report: all of them. Eight distinct capabilities, each with its own mechanism, each corresponding to a specific Elasticsearch feature that teams assume requires a separate service. They do not.
I will present each in the same format: the question it answers, a concrete example, how it works, the PostgreSQL mechanism that powers it, the Elasticsearch feature it replaces, the Gold Lapel method that wraps it, and the chapter that covers it in depth. The consistency is deliberate — I expect you may wish to return to this section as a reference, and references should be predictable in their structure.
1. Lexical Search (Full-Text)
The question: "Find documents containing these words, or their stems."
A user searches for "running shoes." The system finds documents containing "running," "run," "runs," and "shoes," "shoe" — because it understands that these are stems of the same root words. It also ignores "the" and "a" and "is," because those words carry no search value. It ranks results by how frequently and prominently the search terms appear. It does not, however, find "jogging sneakers" — the words are different even if the meaning is the same. That gap is a different kind of search entirely, and we will address it in capability number four.
Full-text search is the foundation. It is what most people mean when they say "search," and it is what most teams add Elasticsearch to provide. PostgreSQL has offered it natively — through tsvector, tsquery, the @@ match operator, ts_rank() for relevance scoring, and GIN indexes for performance — since version 8.3, released in 2008. That is nearly two decades of production use, quietly available, doing precisely the job that most teams assume requires a separate service. Elasticsearch uses BM25 for relevance scoring; PostgreSQL uses ts_rank() and ts_rank_cd(). Different algorithms, same purpose, both thoroughly proven.
Elasticsearch equivalent: The match query. Gold Lapel method: search(). Chapter 4 covers this in the depth it deserves.
2. Fuzzy Search
The question: "Find results even when the query has typos."
What is fuzzy search? It is the capability that forgives the fingers. A user types "Postgre" and expects to find "PostgreSQL." Types "Jonh" and expects to find "John." The spelling is wrong, but the intent is clear to any human reader — and should be equally clear to the search system. A search that demands perfect spelling is a search that has forgotten whom it serves.
PostgreSQL handles this through pg_trgm, which splits strings into trigrams — three-character sequences — and computes the overlap between the query's trigrams and each candidate's trigrams. This is fundamentally different from Elasticsearch's Levenshtein-based fuzzy matching, and in many cases more flexible: trigram similarity handles transpositions, insertions, and deletions naturally, without special configuration. Both approaches are worthy of respect. PostgreSQL's simply requires less fussing.
Elasticsearch equivalent: The fuzzy query. Gold Lapel method: search_fuzzy(). Chapter 5 covers this in depth.
3. Phonetic Search
The question: "Find results that sound like the query, regardless of spelling."
A user searches for "Stefan" and expects to find "Stephen." Searches for "Shmidt" and expects to find "Schmidt." These are not typos — they are legitimate spelling variations that share a pronunciation. Fuzzy matching might catch some of them, but the problem is phonetic, not orthographic. The user's fingers are fine. It is the language itself that offers multiple spellings for the same sound.
PostgreSQL handles this through fuzzystrmatch — Soundex, Metaphone, and Double Metaphone algorithms that convert strings to phonetic codes. Strings that sound alike produce the same code, regardless of how they are spelled. Soundex has been performing this service since 1918, originally for the United States Census Bureau. It is, I confess, one of the more durable algorithms in the history of computing — and I appreciate durability.
Elasticsearch equivalent: The phonetic analysis plugin. Gold Lapel method: search_phonetic(). Chapter 6 covers this in depth.
4. Semantic Search
The question: "Find results that mean something similar, even if they use completely different words."
This is where the difference between lexical and semantic search becomes vivid. A user searches for "comfortable office chair" and expects to find "ergonomic desk seating." The words share no stems. They share no characters. They share no phonetic codes. Lexical search, fuzzy matching, and phonetic search are all powerless here — they match the surface of language. Semantic search matches the meaning beneath it.
Semantic search works by converting text into vector embeddings — numerical representations of meaning produced by a neural network. Texts with similar meanings produce vectors that are close together in high-dimensional space. Finding similar documents becomes a nearest-neighbor query, which is a well-solved database problem when you have the right index.
PostgreSQL handles this through pgvector — the vector data type, the <=> cosine distance operator, and HNSW indexes for fast approximate nearest-neighbor search. HNSW is the same algorithm used by Elasticsearch's dense vector search, by Pinecone, and by Weaviate. The algorithm belongs to no single vendor. PostgreSQL simply provides a very good home for it.
Elasticsearch equivalent: kNN (k-nearest neighbors) search. Gold Lapel method: similar(). Chapters 7 and 8 cover this — Chapter 7 explains what embeddings are in terms the reader will find intuitive, and Chapter 8 shows how to use them in PostgreSQL.
5. Autocomplete
The question: "What might the user be typing?"
A user types "Post" into a search box. Before they finish, suggestions appear: "PostgreSQL," "PostGIS," "Postgres Connection Pooling." This is autocomplete — the most frequently encountered search feature in any application, and often the very first reason teams reach for Elasticsearch's completion suggester. It is, if I may observe, a feature that users notice only when it is absent.
PostgreSQL handles this through prefix matching with ILIKE and a trailing wildcard, ranked by pg_trgm's similarity() function. A GIN trigram index makes the query fast — fast enough that the user sees results before they have finished thinking about what they are typing. The same extension that powers fuzzy search also powers autocomplete — different problem, same underlying tool, one installation.
Elasticsearch equivalent: The completion suggester. Gold Lapel method: suggest(). Chapter 9 covers this in depth.
6. Aggregations
The question: "How many results are in each category? What is the average price? What are the top tags?"
A user searches for "chair" and sees: "Showing 342 results — Furniture (127), Office (89), Outdoor (73), Kids (53)." Those category counts are aggregations — faceted search results. They are how users navigate large result sets, and they are one of the primary reasons teams stay on Elasticsearch after they have adopted it for other purposes. The assumption is that faceted search requires a search engine's aggregation framework.
I understand the assumption. I respectfully disagree with it. Faceted search is GROUP BY and COUNT(*). PostgreSQL has been doing this since before Elasticsearch was a project. And SQL aggregations compose with JOINs, subqueries, window functions, and CTEs in ways that Elasticsearch's aggregation DSL — a nested JSON structure with its own particular syntax — simply cannot match. I do not say this to diminish Elasticsearch's aggregation capabilities, which are substantial. I say it because SQL has been refining this particular craft for rather a long time.
Elasticsearch equivalent: Terms aggregation, metric aggregations. Gold Lapel methods: facets(), aggregate(). Chapter 10 covers this in depth.
7. Reverse Search (Percolator)
The question: "Which of my saved queries match this new document?"
Normal search works in one direction: here is a query, find me matching documents. Reverse search inverts this elegantly: here is a document, find me matching queries.
A user saves the search "senior engineer AND remote AND PostgreSQL" and wants an email when a new job posting matches. A content management system stores fifty topic categories as queries and needs to classify each new article automatically. A support platform needs to route incoming tickets to the right team based on content. These are all the same pattern — stored queries, matched against incoming documents — and it is the capability most people assume requires Elasticsearch.
PostgreSQL handles this by storing tsquery values in a table with a GIN index. When a new document arrives, it is converted to a tsvector and matched against all stored queries. The GIN index makes this efficient even across thousands of stored queries. The solution is, I should note, rather satisfying in its simplicity.
Elasticsearch equivalent: The percolator API. Gold Lapel methods: percolate_add(), percolate(), percolate_delete(). Chapter 11 covers this in depth — and I suspect the reader will find it one of the more surprising chapters in the book.
8. Relevance Debugging
The question: "Why did this document match — or not match? Why is it ranked here and not higher?"
A developer notices that searching for "running" does not match a document containing "ran." Or that a highly relevant document appears on page three instead of page one. Something is wrong with the search — but what? The developer needs to see inside the search engine's decision-making: how the text was tokenized, which dictionary was applied, whether the term was discarded as a stop word, and how the relevance score was computed.
PostgreSQL provides ts_debug(), which opens the entire tokenization pipeline for inspection — token by token, dictionary by dictionary, lexeme by lexeme. It is, if I may say, refreshingly transparent. One does not often encounter a system so willing to explain its reasoning. Elasticsearch provides the _analyze and _explain APIs for similar purposes; PostgreSQL's version is a SQL query you can run in psql, and the output is a table you can read with your own eyes.
Elasticsearch equivalent: The _analyze API and _explain API. Gold Lapel methods: analyze(), explain_score(). Chapter 12 covers this in depth.
The Bundle and the Alternative
Elasticsearch packages all eight capabilities behind one service and one API. That bundling is genuinely convenient — one dependency, one query language, one cluster to manage. For the team that needs all eight capabilities, the convenience of a single service has real value, and I would not presume to dismiss it.
But the convenience comes bundled with the cost. Every one of those eight capabilities requires the entire Elasticsearch infrastructure: the JVM, the cluster topology, the data synchronization pipeline, the monitoring. You do not get to use just the autocomplete without also running the distributed search cluster. You do not get fuzzy matching without the full index management overhead. The capabilities are bundled. The cost is bundled too.
PostgreSQL's approach is the opposite, and I confess I find it more considerate of the team's actual needs: each capability is independent. You can use tsvector without pgvector. You can use pg_trgm without fuzzystrmatch. You install what you need. You pay — in complexity, in extensions, in configuration — only for what you use. And every capability you install shares the same database, the same ACID guarantees, the same backup process, the same monitoring you already have in place.
A team that needs only full-text search and autocomplete installs one contrib extension and writes SQL. A team that needs all eight capabilities installs three extensions and writes SQL. Neither team runs a separate cluster. Neither team maintains a synchronization pipeline. Neither team receives alerts at three in the morning because the search service has opinions about its availability that differ from the database's.
The modularity is not a limitation. It is, I would suggest, a courtesy.
How They Combine
The real power of these capabilities is not any one of them alone — it is their composition.
A search query that uses tsvector for lexical matching, pgvector for semantic similarity, and pg_trgm for fuzzy fallback — with faceted aggregations on the results — runs as a single SQL query against a single database. No orchestration layer routing requests between separate services. No merging results from different systems. One query. One round trip. One source of truth.
The technique that makes this practical is called Reciprocal Rank Fusion — a method for combining ranked results from different search methods into a single coherent ranking. Chapter 13 covers it in detail, including the SQL that implements it. I will not spoil the elegance of it here, but I will say this: it is the chapter where I expect the reader to find the architecture most deeply satisfying.
The Map
For reference — and I expect you may wish to return to this table:
| # | Capability | The Question | PostgreSQL Mechanism | ES Equivalent | GL Method(s) | Chapter |
|---|---|---|---|---|---|---|
| 1 | Lexical search | "Find documents with these words" | tsvector/tsquery, GIN | match query | search() | 4 |
| 2 | Fuzzy search | "Find results despite typos" | pg_trgm similarity, GIN | fuzzy query | search_fuzzy() | 5 |
| 3 | Phonetic search | "Find results that sound like this" | fuzzystrmatch, B-tree | phonetic plugin | search_phonetic() | 6 |
| 4 | Semantic search | "Find results that mean this" | pgvector, HNSW | kNN search | similar() | 7-8 |
| 5 | Autocomplete | "What might the user be typing?" | ILIKE + pg_trgm, GIN | completion suggester | suggest() | 9 |
| 6 | Aggregations | "How many in each category?" | GROUP BY, window functions | terms/metric aggs | facets(), aggregate() | 10 |
| 7 | Reverse search | "Which saved queries match this?" | tsquery in table, GIN | percolator API | percolate_*() | 11 |
| 8 | Relevance debugging | "Why did this rank here?" | ts_debug(), score analysis | _analyze, _explain | analyze(), explain_score() | 12 |
Eight capabilities. Eight PostgreSQL mechanisms. Eight Elasticsearch features replaced. The map is drawn. I trust you will find it useful.
Before we explore each capability individually, we need to prepare the surface on which the search indexes will operate. The tsvector columns, the vector columns, the GIN indexes, the HNSW indexes — they all need a proper home. And the best home, as readers of the first book will recognize, is a table that does not yet exist in your schema: the materialized view.
Chapter 3 will set the table. If you will forgive the expression.