You Don't Need Elasticsearch

Chapter 11: The Percolator: Reverse Search

The Waiter of Gold Lapel · Published Apr 12, 2026 · 10 min

I have been looking forward to this chapter.

Everything you have seen so far works in one direction: you provide a query, and the database finds matching documents. “Here is what I’m looking for. What do you have?” The user asks. The database answers. This is how search has worked since Chapter 4.

This chapter inverts the relationship entirely. “Here is a document. Which of my stored queries does it match?” The queries came first — saved in a table, indexed, waiting patiently. The document is new. The question is not “what matches this query?” The question is “which queries match this document?”

I realize that may require a moment to sit with. It is, I find, one of those ideas that sounds like a parlor trick until you see the use cases. Then it sounds like the solution to a problem you have been solving the hard way for years.

Saved search alerts. “Email me when a new job posting matches ‘senior engineer AND remote AND PostgreSQL’.” The user saves the query once. Every new posting is tested against all saved queries. Matching users get notified. You have almost certainly used a product that does this. Now you know what’s behind it.

Content classification. “Which of our 50 topic categories does this article match?” Each category has a stored query. A new article arrives. The percolator tells you which categories apply — automatically, at insert time.

Email routing. “Which support team should handle this incoming ticket?” Each team has a stored query pattern. Incoming tickets are matched against all patterns. The highest-scoring match determines routing.

Content moderation. “Does this post match any of our flagged-content patterns?” Stored queries define the patterns. New content is tested against all of them.

Elasticsearch calls this the percolator. Most developers assume it requires a dedicated search engine. I am pleased to report that it does not. PostgreSQL handles it by storing tsquery values in an ordinary table and matching each incoming document against them with the @@ operator. The implementation is, if I may say, considerably simpler than you might expect.

How It Works in Elasticsearch

Elasticsearch’s percolator stores queries as documents in a dedicated percolator index with a special percolator field type. When a new document arrives, it is “percolated” against all stored queries. The API returns the list of matching queries with scores.

The percolator is one of Elasticsearch’s more complex features. It requires a dedicated percolator index or field mapping, understanding of how stored queries interact with the analysis pipeline, and careful management of stored query documents. It is well-engineered, and I acknowledge the design with respect. The question, as with every feature in Part III, is whether you need a separate service to do it.

How It Works in PostgreSQL

Can PostgreSQL do reverse search like Elasticsearch’s percolator? Yes. Allow me to show you. Three steps.

Step 1: Create the stored queries table.

SQL
CREATE TABLE saved_searches (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    label TEXT,
    query tsquery NOT NULL,
    created_at TIMESTAMP DEFAULT now()
);
-- Note: core PostgreSQL has no GIN operator class for tsquery.
-- A GiST index with tsquery_ops is the built-in option; it supports
-- the containment operators <@ and @>, not the @@ match itself.
CREATE INDEX idx_percolate ON saved_searches USING GIST (query tsquery_ops);

The key observation, and it is the kind of design I find genuinely elegant, is that tsquery is a first-class column type. Stored queries live in an ordinary table, and the @@ operator evaluates them against an incoming document exactly as it evaluates a stored document against an incoming query. The same machinery, serving the inverse purpose. One caution is in order: core PostgreSQL has no GIN operator class for tsquery. The closest built-in index is GiST with tsquery_ops, which accelerates containment between queries (<@ and @>) rather than the @@ match itself; the percolation query evaluates @@ against each stored query, which remains fast at realistic scales (see Performance below). PostgreSQL did not add a special "percolator mode." The capability was already there, waiting to be used in the other direction.

Step 2: Store queries.

SQL
INSERT INTO saved_searches (user_id, label, query) VALUES
(42, 'Dream job alert',
    plainto_tsquery('english', 'senior engineer remote PostgreSQL')),
(17, 'Database news',
    to_tsquery('english', 'postgresql | postgres | pgvector')),
(23, 'Hiring updates',
    plainto_tsquery('english', 'hiring database engineer'));

Any tsquery function works: plainto_tsquery() for simple queries, to_tsquery() for boolean logic, websearch_to_tsquery() for user-provided search syntax. The stored query is a tsquery value — the same type you used in Chapter 4 for forward search. No new syntax. No new concepts. The same tools, applied in the other direction.

Step 3: Percolate an incoming document.

SQL
SELECT id, user_id, label, ts_rank(doc, query) AS score
FROM saved_searches,
    to_tsvector('english',
        'PostgreSQL senior database engineer wanted for remote position') AS doc
WHERE doc @@ query
ORDER BY score DESC;

Result:

Result
 id | user_id |      label       |    score
----+---------+------------------+------------
  1 |      42 | Dream job alert  | 0.0991032
  3 |      23 | Hiring updates   | 0.0607927
  2 |      17 | Database news    | 0.0607927
(3 rows)

A document came in. It was converted to a tsvector and matched against all stored tsquery values. The @@ operator did the matching, the same operator from Chapter 4, but with the operands inverted: the tsvector is the incoming document, and the tsquery values are stored in the table.

That is it. Three SQL statements: create the table, store a query, match a document. I trust you were expecting something more complex. The percolator pattern in PostgreSQL is not a complex feature requiring dedicated infrastructure. It is a natural consequence of tsquery being a first-class data type that can be stored and evaluated like any other column. The elegance, if I may say, is in the simplicity.
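To make the inversion concrete, here is a deliberately simplified model in Python. This is not how PostgreSQL implements it; it only shows the shape of the question. Each stored query is reduced to a set of required lexemes (the AND semantics of plainto_tsquery), and stemming and ranking are ignored.

```python
def to_lexemes(text: str) -> set[str]:
    """Crude stand-in for to_tsvector: lowercase, split on whitespace."""
    return set(text.lower().split())

# Stored queries, keyed by id -- a toy version of the saved_searches table.
saved_searches = {
    1: to_lexemes("senior engineer remote postgresql"),
    2: to_lexemes("postgresql"),
    3: to_lexemes("hiring database engineer"),
}

def percolate(document: str) -> list[int]:
    """Reverse search: which stored queries does this document satisfy?"""
    doc = to_lexemes(document)
    return [qid for qid, required in saved_searches.items()
            if required <= doc]  # every required lexeme is present

matches = percolate(
    "postgresql senior database engineer hiring for remote position")
# matches: [1, 2, 3] -- all three stored queries are satisfied
```

The queries sit still; the document moves. That is the entire inversion.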

Gold Lapel’s Percolator Methods

percolate_add():

goldlapel.percolate_add(conn, "saved_searches", "query", "senior engineer AND remote", user_id=42, label="Dream job alert")

Stores the tsquery with metadata. Handles the plainto_tsquery() conversion. You provide a natural-language search string. Gold Lapel converts and stores it.

percolate():

goldlapel.percolate(conn, "saved_searches", "query", document_text)

Matches the document against all stored queries. Returns matching rows with _score. One method call to answer “which saved searches does this document satisfy?”

percolate_delete():

goldlapel.percolate_delete(conn, "saved_searches", query_id)

Removes a stored query. Uses DELETE ... RETURNING for confirmation. A user unsubscribes from an alert. The query is removed. The percolator no longer matches against it.
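Gold Lapel's internals are not shown in this chapter, so here is a plausible sketch of what a percolate() helper could generate under the hood. The function name build_percolate_sql is hypothetical. Table and column names are interpolated as identifiers (real code should quote them, e.g. with psycopg's sql.Identifier); the document text is left as a bind parameter.

```python
def build_percolate_sql(table: str, query_col: str) -> str:
    """Render the reverse-search statement for a stored-queries table.

    Hypothetical helper: the real Gold Lapel implementation may differ.
    """
    return (
        f"SELECT s.*, ts_rank(doc, s.{query_col}) AS _score "
        f"FROM {table} s, to_tsvector('english', %s) AS doc "
        f"WHERE doc @@ s.{query_col} "
        f"ORDER BY _score DESC"
    )

sql = build_percolate_sql("saved_searches", "query")
# A driver call might then look like:
#   cur.execute(sql, (document_text,))
```

One statement, one bind parameter, and the _score column comes back ready-sorted.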

Practical Patterns

Saved Search Notifications — Real-Time

The most common percolator use case. And this, if you will permit me a moment of genuine enthusiasm, is where the pattern becomes something rather special.

SQL
CREATE OR REPLACE FUNCTION notify_matching_searches()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify('search_match', json_build_object(
        'content_id', NEW.id,
        'search_id', s.id,
        'user_id', s.user_id
    )::text)
    FROM saved_searches s
    WHERE to_tsvector('english',
        coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, '')) @@ s.query;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_percolate AFTER INSERT ON articles
FOR EACH ROW EXECUTE FUNCTION notify_matching_searches();

Read what this does: a new article is inserted. The trigger fires. The article is converted to a tsvector and matched against all stored queries. For each match, a notification is sent via pg_notify. A listening application process receives the notification and takes action — sends an email, pushes a mobile notification, updates a feed.

This is real-time. No polling. No cron job checking every five minutes. No external message queue. No separate service coordinating between the database and a notification pipeline. The notification is queued inside the transaction and delivered to every listener the moment the transaction commits. A new article is published; by the time its INSERT commits, every matching saved search has been identified and its notification dispatched.

I find this worth pausing on. The entire pipeline — content storage, reverse search matching, and notification dispatch — happens inside PostgreSQL, in a single transaction, triggered by an ordinary INSERT. No infrastructure was added. No service was deployed. The database did what it does, and one of the things it does is this.
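On the receiving end, an application process listens on the channel and acts on each payload. A minimal listener sketch follows: the payload parsing is pure stdlib, while the connection loop assumes the psycopg 3 driver and the search_match channel from the trigger above; the connection string is illustrative.

```python
import json

def handle_payload(payload: str) -> tuple[int, int, int]:
    """Decode one pg_notify payload into (content_id, search_id, user_id)."""
    data = json.loads(payload)
    return data["content_id"], data["search_id"], data["user_id"]

if __name__ == "__main__":
    import psycopg  # assumption: psycopg 3 is installed
    # autocommit, so notifications are delivered as they arrive
    with psycopg.connect("dbname=app", autocommit=True) as conn:
        conn.execute("LISTEN search_match")
        for note in conn.notifies():
            content_id, search_id, user_id = handle_payload(note.payload)
            # dispatch here: send the email, push the mobile notification
            print(f"article {content_id} matched search {search_id} "
                  f"for user {user_id}")
```

The database does the matching; this process only delivers the news.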

Content Classification

Store one tsquery per category:

SQL
INSERT INTO content_categories (name, query) VALUES
('PostgreSQL',
    to_tsquery('english', 'postgresql | postgres | pgvector | postgis')),
('Machine Learning',
    to_tsquery('english', 'embedding & (vector | neural | model)')),
('DevOps',
    to_tsquery('english', '(kubernetes | docker | terraform) & deploy'));

Percolating a new article against these categories tells you which topics it covers — automatically, at insert time if combined with the trigger pattern above. The article classifies itself by arriving.

Email and Ticket Routing

Each team or queue has a stored query defining its scope. Incoming messages are percolated against all teams. The highest-scoring match determines routing. Same three SQL statements. Same GIN index. Different use case. The percolator is a general-purpose matching engine — saved searches, classification, and routing are all instances of one pattern: “find which stored queries match this new document.”
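The routing decision itself reduces to "take the best-scoring match, with a fallback queue when nothing matches." A small sketch of that final step, where the scores stand in for ts_rank values returned by the percolation query and the team names are illustrative:

```python
def route(matches: list[tuple[str, float]], default: str = "triage") -> str:
    """Pick the team whose stored query scored highest, else the default.

    matches: (team_name, score) pairs from the percolation query.
    """
    if not matches:
        return default  # no stored query matched: fall back to triage
    team, _score = max(matches, key=lambda m: m[1])
    return team

team = route([("billing", 0.12), ("outages", 0.31)])
# team: "outages" -- the highest-scoring stored query wins
```

In practice the ORDER BY _score DESC in the SQL already does this; the fallback is the only logic left for the application.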

Performance

A note on honesty here: core PostgreSQL cannot index the @@ match in this direction. There is no GIN operator class for tsquery, so the percolation query scans the stored queries and evaluates the full tsquery match against the document's tsvector, one row at a time. That sounds expensive; in practice it is not. tsquery values are small, the document's tsvector is computed once, and each evaluation is cheap, so thousands of stored queries typically resolve in milliseconds. To go further, reduce the candidate set before matching, for example with a WHERE clause on a user segment or category column backed by an ordinary btree index.

The approach scales well to thousands of stored queries. For tens of thousands, consider partitioning by category, user group, or query type.
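Partitioning by a coarse key is the same idea as a WHERE clause on an indexed column: evaluate only the stored queries that could plausibly apply. A miniature sketch of that candidate reduction, with illustrative segment and query data:

```python
from collections import defaultdict

# Stored queries as (query_id, required lexemes), grouped by segment --
# the in-memory analogue of an indexed category column on saved_searches.
by_segment: dict[str, list[tuple[int, set[str]]]] = defaultdict(list)
by_segment["jobs"].append((1, {"senior", "engineer", "remote"}))
by_segment["jobs"].append((2, {"hiring", "database"}))
by_segment["news"].append((3, {"postgresql", "release"}))

def percolate_segment(segment: str, doc_lexemes: set[str]) -> list[int]:
    """Evaluate only the stored queries in one segment."""
    return [qid for qid, required in by_segment[segment]
            if required <= doc_lexemes]

hits = percolate_segment("jobs", {"senior", "remote", "engineer", "wanted"})
# hits: [1] -- the "news" queries were never even considered
```

The full match still runs, but only over the segment's queries; the rest of the table is never touched.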

For context on the upper end of scale: the funda.nl engineering blog documents an enterprise system with over one million saved searches processing four million daily matches. They used OpenSearch’s percolator — not because PostgreSQL couldn’t handle the pattern, but because no comprehensive PostgreSQL guide existed at the time. I mention this both to acknowledge the scale at which reverse search operates in production and to note that the guide they were missing is the chapter you are reading now.

Elasticsearch Percolator vs. PostgreSQL Reverse Search

 Aspect                   | Elasticsearch Percolator        | PostgreSQL tsquery
--------------------------+---------------------------------+-------------------------------------
 Stored query type        | Any ES query DSL                | tsquery (lexical)
 Index on queries         | Percolator field type           | GiST (tsquery_ops), containment only
 Query registration       | Dedicated percolator mapping    | INSERT into table
 Matching                 | Percolate API                   | WHERE tsvector @@ tsquery
 Scoring                  | ES relevance scoring            | ts_rank()
 Real-time triggers       | Requires external integration   | LISTEN/NOTIFY + triggers
 Full query DSL support   | Yes (any ES query)              | tsquery only (lexical)
 Infrastructure           | Elasticsearch cluster           | Same database
 Configuration complexity | High (dedicated index, mapping) | Low (table, trigger)

I will note one row in particular: real-time triggers. Elasticsearch's percolator requires external integration to trigger notifications — a separate process must poll for new documents and submit them to the percolate API. PostgreSQL handles it with a trigger that fires at INSERT time; the notification is queued in the same transaction and delivered at commit. I find this difference worth your consideration.

Honest Boundary

PostgreSQL’s percolator is lexical — it matches tsvector against tsquery. It does not percolate vector similarity queries, fuzzy matches, or complex boolean logic beyond what tsquery supports. If you need “alert me when new content is semantically similar to this embedding,” you would need a separate mechanism — a periodic similarity scan or application-level logic. There is no native “vector percolator” in PostgreSQL. I mention this because I would rather you know the boundary than discover it during implementation.

Elasticsearch’s percolator supports the full query DSL — not just text matching. It can percolate geo queries, range queries, nested queries, and compound boolean queries across multiple field types. PostgreSQL’s version is scoped to tsquery. For most percolator use cases — saved text searches, content classification, keyword-based alerts — tsquery is sufficient. For complex multi-field percolation with non-text criteria, Elasticsearch’s percolator is genuinely more capable, and I acknowledge that without reservation.

At very large scale (millions of stored queries), performance characteristics need testing. A plain scan handles thousands of stored queries efficiently; millions may require partitioning, pre-filtering on an indexed column, query categorization, or architectural adjustments.

The percolator completes the inversion. PostgreSQL can now find queries for documents as fluently as it finds documents for queries. Store the query. Index it. Match incoming documents. Three SQL statements for the basic pattern. A trigger for real-time notifications. The applications — alerts, classification, routing, moderation — are as broad as the questions you can express in tsquery. And tsquery, as you have seen since Chapter 4, can express quite a lot.

One capability remains in Part III — and it is the one you will reach for when something goes wrong. When the results are not what you expect. When the ranking is puzzling. When a document that should match stubbornly refuses to appear. Before you can fix a search problem, you must understand it — and understanding it requires seeing inside the search pipeline.

Chapter 12 opens the hood. If you will follow me, I should like to hand you the diagnostic tools. Every good establishment has them. Not every establishment shows you where they are kept.