You Don't Need Elasticsearch

Chapter 13: Hybrid Search: Combining the Pillars

The Waiter of Gold Lapel · Published Apr 12, 2026 · 12 min

I have been waiting twelve chapters to show you this.

Each pillar from Part II is, individually, a capable tool. tsvector matches exact words — or more precisely, their stems. It finds documents containing “postgres” and “performance” and “tuning” with the precision of a good index. But it misses a document titled “PostgreSQL Query Optimization Guide” because “optimization” and “tuning” share no stems. The words are different. tsvector does not know they mean the same thing. It was not built to know.

pgvector matches meaning. It finds the optimization guide — the embedding captures the semantic relationship. But it might also return “Database Performance Monitoring Best Practices” — semantically adjacent, conceptually nearby, and not what the user wanted. Meaning is close. Intent is not quite right.

pg_trgm handles typos. “Postges” still finds “PostgreSQL.” But it operates on characters, not meaning — it cannot bridge the gap between “tuning” and “optimization” any more than tsvector can.

Used alone, each has gaps. Used together — and this is the part I have been looking forward to — they cover each other’s weaknesses. This is hybrid search: combining multiple search signals into a single ranked result that is more accurate than any signal alone. The pieces you have built across twelve chapters are about to click together. I trust you will find the result satisfying.

Elasticsearch handles signal blending internally — its relevance scoring combines multiple factors within the scoring engine. PostgreSQL requires composing the signals explicitly via SQL. The result is more transparent, more debuggable, and more controllable — because each signal is a separate query you can inspect, test, and tune independently. Opacity is not a feature. Visibility is.

The Problem Hybrid Search Solves

A concrete example, because the problem is clearer when you can see it.

A user searches for “postgres performance tuning.”

tsvector alone: Finds documents containing all three terms as lexemes. Returns “PostgreSQL Performance Tuning Guide” and “Postgres Performance Best Practices.” Misses “PostgreSQL Query Optimization Guide” — because “optimization” does not stem to “tuning.” The words are different. tsvector cannot see that the meaning is the same.

pgvector alone: Finds documents whose embeddings are semantically close to the query. Returns the optimization guide — the meaning is similar. Also returns “Database Performance Monitoring Best Practices” — semantically adjacent, but about monitoring, not tuning. The semantic signal is broad. It catches more, including results that merely sound related.

Hybrid: The optimization guide appears — pgvector found it. “Database Performance Monitoring” drops — tsvector didn’t confirm the match, so it scores low in the combined ranking. Documents that both signals endorse rank highest. The lexical precision of tsvector and the semantic understanding of pgvector reinforce each other, and neither’s weaknesses survive the combination.

What’s better — keyword search or vector search? Neither alone. Both together. The combination outperforms either approach individually because lexical precision and semantic understanding compensate for each other’s blind spots. This is not a theoretical improvement. It is a measurable one, and it is the reason hybrid search has become the standard approach in modern retrieval systems.

Reciprocal Rank Fusion (RRF)

What is Reciprocal Rank Fusion? It is the technique that makes hybrid search work — the method for combining ranked lists from different search methods into a single ranked result.

Formula: RRF_score(d) = Σ_i 1 / (k + rank_i(d)), where the sum runs over the search methods, k is a constant (typically 60), and rank_i(d) is document d’s rank in method i. A method that did not return the document contributes no term for it.

How it works, intuitively: A document ranked #1 by tsvector and #3 by pgvector gets a higher combined score than a document ranked #50 by tsvector and #1 by pgvector. Both signals must agree for a document to rank high. A document that impresses only one signal is less trustworthy than a document that both signals independently endorse. I find this a reasonable principle — in search, as in most things, corroboration is more persuasive than a single opinion.

Why RRF over simple score addition. This is the question most developers ask first, and the answer matters. Different search methods produce scores on different scales. tsvector’s ts_rank() returns a small float (typically 0.0–0.3). pgvector’s <=> returns cosine distance (0.0–2.0). pg_trgm’s similarity() returns 0.0–1.0. Adding these raw scores together produces meaningless results — a cosine distance of 0.5 and a ts_rank of 0.05 are not comparable quantities. You would not add kilograms and miles and call the sum meaningful.

RRF solves this by using ranks instead of scores. Position 1 means “best result from this method” regardless of what the raw score was. Ranks are commensurable. Scores are not. This is why RRF is the standard approach, and I would suggest trusting the standard before inventing alternatives.

The k constant: k=60 is the default from the original Cormack, Clarke & Buettcher 2009 paper. It controls how much weight higher ranks receive relative to lower ranks. Lower k means stronger preference for top-ranked results. Higher k means more even weighting across the list. Start with 60. Adjust only if testing on your specific data shows measurable improvement. In my experience, most teams never adjust it.
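Before the SQL, the formula is worth seeing as running code. Here is a minimal Python sketch of RRF, purely illustrative (the method names and ranks are hypothetical, not Gold Lapel API):

```python
def rrf_score(ranks, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) across methods.

    `ranks` maps a search method to the document's 1-based rank in
    that method's results; a method that did not return the document
    is simply absent and contributes nothing.
    """
    return sum(1.0 / (k + r) for r in ranks.values())

# Ranked #1 by tsvector and #3 by pgvector: both signals agree.
both_agree = rrf_score({"lexical": 1, "semantic": 3})

# Ranked #50 by tsvector and #1 by pgvector: one strong outlier.
one_outlier = rrf_score({"lexical": 50, "semantic": 1})

assert both_agree > one_outlier  # corroboration wins
```

With k=60, the agreeing document scores 1/61 + 1/63 ≈ 0.0323 against the outlier’s 1/110 + 1/61 ≈ 0.0255. Corroboration wins, exactly as the intuition above predicts.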

Implementation in PostgreSQL

How to implement RRF in SQL. This is the centerpiece of the book, and I intend to walk through every line:

SQL
WITH lexical AS (
    SELECT id,
        ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
    FROM search_products,
        plainto_tsquery('english', 'postgres performance tuning') AS query
    WHERE search_vector @@ query
    ORDER BY rank
    LIMIT 100
),
semantic AS (
    -- query_embedding is the query's embedding vector, bound as a
    -- parameter by the application (e.g. $1::vector)
    SELECT id,
        ROW_NUMBER() OVER (ORDER BY embedding <=> query_embedding) AS rank
    FROM search_products
    ORDER BY embedding <=> query_embedding
    LIMIT 100
)
SELECT p.*,
    COALESCE(1.0 / (60 + l.rank), 0) +
    COALESCE(1.0 / (60 + s.rank), 0) AS rrf_score
FROM search_products p
LEFT JOIN lexical l ON p.id = l.id
LEFT JOIN semantic s ON p.id = s.id
WHERE l.id IS NOT NULL OR s.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 20;

Step 1: The lexical CTE. Full-text search. Finds the top 100 matches by ts_rank() score. Assigns each a rank — 1 for the best match, 100 for the weakest in the candidate set. The WHERE search_vector @@ query filters to only matching documents. This is Chapter 4’s search query, wrapped in a CTE with row numbering. You have seen this query before. It is doing exactly what it did in Chapter 4 — it is simply contributing its results to a larger composition now.

Step 2: The semantic CTE. Vector search. Finds the top 100 nearest neighbors by cosine distance. Assigns each a rank — 1 for the closest embedding, 100 for the furthest in the candidate set. No WHERE clause — vector search ranks everything by distance and takes the closest 100. This is Chapter 8’s query, wrapped in a CTE with row numbering.

Step 3: The main query. LEFT JOINs both result sets onto the base table. For each document, computes the RRF score: 1/(60 + lexical_rank) + 1/(60 + semantic_rank). Documents found by both methods get both terms — highest possible scores. Documents found by only one method get one term plus zero (COALESCE handles the NULL from the missing LEFT JOIN) — lower scores, but still present in the results.

Step 4: The WHERE clause. l.id IS NOT NULL OR s.id IS NOT NULL — only return documents that appeared in at least one search method’s top 100. Documents that neither method found are excluded entirely.

Step 5: The final ORDER BY. rrf_score DESC LIMIT 20 — the final ranked result. The 20 documents that both signals most agree on.

I would ask you to notice something about this query. Each CTE is a search method you already know. The main query is a join and an arithmetic expression. The sophistication is in the composition, not the components. You built the pieces across twelve chapters. The assembly is three CTEs and a formula. That is the architecture.

The LIMIT 100 on each CTE is a practical choice. The candidate set from each method should be large enough that good documents are not excluded, but small enough that the fusion query is fast. 100 is a good default. Increase to 200 or 500 if testing shows relevant results being cut off.

Adding Fuzzy Fallback: The Three-Signal Pipeline

The two-signal pipeline handles most cases. But what if the user types “postges performnce” — a query with typos?

tsvector returns nothing — “postges” doesn’t stem to any known lexeme. pgvector might still find semantic matches if the embedding model is tolerant of typos, but many models are not. Both primary signals fail. The user sees an empty results page.

The fuzzy signal catches what both missed. Add a third CTE:

SQL
fuzzy AS (
    SELECT id,
        ROW_NUMBER() OVER (ORDER BY similarity(name, 'postges performnce') DESC) AS rank
    FROM search_products
    -- the % operator form (name % 'postges performnce'), with
    -- pg_trgm.similarity_threshold set to 0.2, can use the trigram index
    WHERE similarity(name, 'postges performnce') > 0.2
    ORDER BY rank
    LIMIT 100
)

Add a third LEFT JOIN (LEFT JOIN fuzzy f ON p.id = f.id), extend the WHERE clause with OR f.id IS NOT NULL, and add a third term to the RRF score:

SQL
COALESCE(1.0 / (60 + l.rank), 0) +
COALESCE(1.0 / (60 + s.rank), 0) +
COALESCE(1.0 / (60 + f.rank), 0) AS rrf_score

The three-signal pipeline:

  • Signal 1 (tsvector): “Do the words match?” — keyword relevance.
  • Signal 2 (pgvector): “Does the meaning match?” — semantic relevance.
  • Signal 3 (pg_trgm): “Is this close enough despite typos?” — fuzzy relevance.

RRF combines all three. Documents endorsed by multiple signals rank highest. Documents found by only one signal still appear but rank lower. The user who types “postges performnce” gets results. The user who types “postgres performance tuning” gets better results — because more signals agree. The pipeline rewards precision without punishing imprecision. I find that a principle worth building a search system around.
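The fusion step the SQL performs can be modeled in a few lines of Python. This is an illustrative sketch, with hypothetical document ids standing in for real rows, not production code:

```python
def fuse(ranked_lists, k=60, limit=20):
    """Reciprocal Rank Fusion over several best-first ranked lists.

    A document's rank is its 1-based position in a list; documents
    absent from a list contribute no term for that list, mirroring
    COALESCE(..., 0) in the SQL pipeline.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

lexical  = ["tuning-guide", "best-practices"]     # do the words match?
semantic = ["tuning-guide", "optimization-guide",
            "monitoring"]                         # does the meaning match?
fuzzy    = ["tuning-guide"]                       # close enough despite typos?

# "tuning-guide" is endorsed by all three signals, so it ranks first;
# "monitoring", endorsed weakly by one signal, ranks last.
results = fuse([lexical, semantic, fuzzy])
assert results[0] == "tuning-guide"
```

Every document found by at least one signal survives, but only multi-signal agreement reaches the top, which is the whole argument of this chapter in twelve lines.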

When to use two vs three signals. Two signals (lexical + semantic) for most applications. Add the fuzzy signal when your users commonly make typos — e-commerce product search, user-facing search boxes, name lookups. The fuzzy signal costs one additional CTE per query. Add it when the benefit justifies the cost. For many user-facing applications, it does.

Where This Surpasses Elasticsearch

Transparency. Each signal is a separate CTE. You can run the lexical CTE alone and see exactly what tsvector found. Run semantic alone and see exactly what pgvector found. Compare the candidate sets. Understand where they agree and where they disagree. In Elasticsearch, the relevance blending happens inside the scoring engine — isolating one signal from another requires restructuring the query. Here, you comment out a CTE and run the query. The debugging is built into the structure.

SQL composability. Add a WHERE clause for row-level security — users see only results they are authorized to see. Add a JOIN for user preferences. Add a GROUP BY for faceted results — Chapter 10’s aggregation patterns compose directly with hybrid search. Search results, facets, and hybrid ranking in one round trip. In Elasticsearch, each concern requires a separate query clause. In PostgreSQL, they are all SQL.

Controllability. Want to weight lexical results more heavily than semantic? Multiply one RRF term by a constant. Want to exclude the fuzzy signal for authenticated users who rarely make typos? Remove the CTE conditionally. Want to add a fourth signal — popularity, recency, user personalization? Add another CTE and another term in the RRF formula. The pipeline is SQL. It is as flexible as SQL is, which is to say: very.
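As a sketch of that weighting knob, here is the same RRF arithmetic with per-signal multipliers. The weights and signal names are illustrative assumptions, not recommendations from any chapter:

```python
def weighted_rrf(ranks, weights, k=60):
    """RRF with a per-signal multiplier: weight_i / (k + rank_i).

    A missing weight defaults to 1.0; a weight of 0.0 removes the
    signal entirely (the analogue of dropping that CTE's term from
    the SQL expression).
    """
    return sum(weights.get(sig, 1.0) / (k + r) for sig, r in ranks.items())

# Weight lexical agreement twice as heavily as semantic agreement;
# the 2.0 is an illustrative starting point, not a recommendation.
score = weighted_rrf({"lexical": 2, "semantic": 1},
                     weights={"lexical": 2.0})

# Conditionally silence the fuzzy signal for users who rarely typo.
no_fuzzy = weighted_rrf({"fuzzy": 1}, weights={"fuzzy": 0.0})
assert no_fuzzy == 0.0
```

In the SQL, the same effect is a constant multiplied into one COALESCE term of the rrf_score expression.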

The Materialized View: Where It All Lives

The materialized view from Chapter 3 is where the tsvector column and the vector column coexist. Both columns are indexed on the same view — GIN for tsvector, HNSW for pgvector, GIN trigram for pg_trgm. Both are refreshed atomically. Both are queried in the same SQL statement.

The architecture from Chapter 3 was designed for this moment. The materialized view is not just a convenience — it is the surface that makes hybrid search a single query instead of a multi-service orchestration. One view. Three indexes. Three CTEs. One result. No sync pipeline between services. No eventual consistency between search methods. One database, answering one query, with ACID guarantees.

I trust, looking back at Chapter 3, that its purpose is now clear.

Gold Lapel’s Role

The wrapper methods generate the individual search queries. The proxy creates and maintains the indexes. You compose the RRF pipeline using the CTEs shown above.

Future: Gold Lapel may offer a hybrid_search() method that generates the RRF pipeline automatically. For now, the CTE composition is explicit — which has the advantage of being fully visible, fully debuggable, and fully customizable. You control the pipeline. I consider that a feature, not a limitation.

Honest Boundary

RRF is a simple, effective fusion method. It is not the only one. Weighted linear combination, learned ranking models (LambdaMART, LambdaRank), and cross-encoder reranking can produce better results for specific use cases where ranking quality is the primary differentiator. RRF is the practical default — effective, simple to implement, and interpretable. Advanced ranking models are a specialized topic beyond this book’s scope, and I would not have you believe that RRF is the final word on the subject. It is the right starting point.

The CTE approach runs two (or three) searches per query. For very high-throughput search (thousands of queries per second), this is more expensive than a single search. For the vast majority of applications, the latency is still single-digit milliseconds — the LIMIT 100 on each CTE keeps the candidate sets small, and the fusion is a lightweight join operation. If single-digit milliseconds are not fast enough for your application, you have my genuine admiration for the scale of your problem.

Hybrid search quality depends on both the tsvector configuration and the embedding model quality. If one signal is poor — bad stemming config, low-quality embeddings — RRF will be dragged down by the weaker signal. Each signal must be individually competent for the combination to be effective. Chapters 4 and 7 covered the quality of each signal. The foundation matters. It always matters.

The pillars are combined. Twelve chapters of capability — lexical search, fuzzy matching, phonetic search, semantic search, autocomplete, aggregations, reverse search, custom analyzers — assembled into a single search pipeline through three CTEs and a formula that uses ranks instead of scores. The architecture is complete.

I am, if you will permit the observation, rather pleased with how it turned out.

But I am also a realist, and I know that an architecture without evidence is an argument without proof. The pipeline is assembled. How fast is it? How does it compare to Elasticsearch on the same workload, the same data, the same hardware? You deserve numbers, not assurances. And a good establishment does not ask you to take the quality of the kitchen on faith.

Chapter 14 provides four complete architecture patterns. You find your use case, you get your blueprint. I look forward to presenting them.