Chapter 3: The Materialized View as Search Index
Readers of the first book will recognize what follows. For everyone else, allow me to introduce the most useful table in your database — the one that does not yet exist.
A materialized view is a precomputed, denormalized snapshot of a query result. It is a real table on disk, with real columns and real indexes, that refreshes on a schedule you control. It is not a cache. It is not a temporary table. It is a persistent, indexable surface that PostgreSQL maintains on your behalf — quietly, reliably, and without requiring a separate service to manage it.
This chapter establishes the foundation on which every search index in this book will be built. I have kept it deliberately brief — the materialized view is a technique, not a thesis. Its job is to prepare the surface. The craftsmanship begins in Chapter 4. A well-set table, if you will forgive me, is the precondition for everything that follows.
The Problem: Searching Across Multiple Tables
A typical application stores data across normalized tables: products, categories, tags, reviews, descriptions. This is good database design — it reduces duplication, enforces referential integrity, and keeps write operations clean. I would not recommend otherwise.
It is also, I must acknowledge, a problem for search.
A search query that spans all of these tables requires JOINs. JOINs at query time add overhead. JOINs at search time add considerably more — full-text search with tsvector across a five-table JOIN is not what GIN indexes were designed to accelerate. The query planner does its best, and I have great respect for the query planner, but the best available plan for a five-way JOIN with a tsvector match, a similarity ranking, and an ORDER BY on relevance score is still going to be slower than the same operation on a single flat table. One cannot ask the planner to outperform the architecture.
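To make that cost concrete, here is the shape of the query-time flattening, sketched against the schema this chapter uses (products, categories, tags, reviews). The query is illustrative, not a recommendation — it is the thing we are about to avoid:

```sql
-- Five tables joined, text concatenated, and a tsvector computed
-- per candidate row, per query. No GIN index can serve a tsvector
-- that is assembled from joined rows at execution time.
SELECT DISTINCT p.id, p.name
FROM products p
JOIN categories c ON c.id = p.category_id
LEFT JOIN product_tags pt ON pt.product_id = p.id
LEFT JOIN tags t ON t.id = pt.tag_id
LEFT JOIN reviews r ON r.product_id = p.id
WHERE to_tsvector('english',
        p.name || ' ' || coalesce(p.description, '') || ' ' ||
        c.name || ' ' || coalesce(t.name, '') || ' ' || coalesce(r.body, '')
      ) @@ plainto_tsquery('english', 'ergonomic chair');
```

Every execution repeats the joining, the concatenation, and the tokenization. The materialized view moves all of that work to refresh time.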
Elasticsearch solves this problem by denormalizing at index time. You flatten your data — joining products with their categories, tags, and reviews — into a single document before indexing it. A sync pipeline handles the flattening, the transformation, and the loading. It is an elegant approach, and I acknowledge it as such. The cost, as we discussed in Chapter 1, is the pipeline itself — the synchronization daemon, the consistency questions, the infrastructure that must be maintained alongside your database.
PostgreSQL can do the same flattening. The materialized view is the mechanism. And because it lives inside the database, the flattening happens without a sync pipeline, without a separate service, and without introducing a second source of truth that must be kept in agreement with the first.
The Search View Pattern
How do you use a PostgreSQL materialized view as a search index? You create a view that joins your normalized tables, flattens them into one row per searchable entity, and adds the columns and indexes your search requires. The pattern has three steps, and none of them will surprise you.
Step 1: Create the view.
```sql
CREATE MATERIALIZED VIEW search_products AS
-- DISTINCT guards against row multiplication from the two LEFT JOINs;
-- coalesce keeps a NULL description from nulling the entire tsvector.
SELECT
    p.id,
    p.name,
    p.description,
    p.price,
    c.name AS category,
    string_agg(DISTINCT t.name, ', ') AS tags,
    string_agg(DISTINCT r.body, ' ') AS review_text,
    to_tsvector('english',
        p.name || ' ' ||
        coalesce(p.description, '') || ' ' ||
        c.name || ' ' ||
        coalesce(string_agg(DISTINCT t.name, ' '), '') || ' ' ||
        coalesce(string_agg(DISTINCT r.body, ' '), '')
    ) AS search_vector
FROM products p
JOIN categories c ON c.id = p.category_id
LEFT JOIN product_tags pt ON pt.product_id = p.id
LEFT JOIN tags t ON t.id = pt.tag_id
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY p.id, p.name, p.description, p.price, c.name;
```

One query. Five tables flattened into one row per product. The search_vector column contains a precomputed tsvector built from the product name, description, category, tags, and review text — all concatenated and tokenized. The denormalization that Elasticsearch requires a sync pipeline to perform, PostgreSQL performs with a SELECT statement.
Step 2: Create the indexes.
```sql
-- Full-text search (Chapter 4)
CREATE INDEX idx_search_vector ON search_products USING GIN(search_vector);

-- Fuzzy matching and autocomplete (Chapters 5, 9); requires the pg_trgm extension
CREATE INDEX idx_search_name_trgm ON search_products USING GIN(name gin_trgm_ops);

-- Phonetic search (Chapter 6); requires the fuzzystrmatch extension
CREATE INDEX idx_search_name_soundex ON search_products USING btree(soundex(name));
```

When you add a vector column for embeddings (Chapters 7-8), you would also add:

```sql
-- Semantic search (Chapter 8); requires the pgvector extension
CREATE INDEX idx_search_embedding ON search_products USING hnsw(embedding vector_cosine_ops);
```

Four index types. Four search capabilities. All on the same surface. Each index will be covered in its own chapter — for now, note that they coexist comfortably on a single materialized view, which is rather the point.
Step 3: Query it.
```sql
SELECT name, category, price, ts_rank(search_vector, q) AS score
FROM search_products, plainto_tsquery('english', 'ergonomic chair') AS q
WHERE search_vector @@ q
ORDER BY score DESC
LIMIT 20;
```

Three SQL statements to set up. One SQL statement to search. No sync pipeline. No external service. No configuration file. The reader can copy these statements, adapt the column names to their own schema, and have a working search surface in ten minutes. I find this rather satisfying.
The Refresh Question
The materialized view is a snapshot. It reflects the state of the underlying tables at the time it was last refreshed. New products added after the last refresh will not appear in search results until the next refresh. Updated reviews will not be reflected. Deleted products may, regrettably, still appear.
This is the trade-off. And it is worth understanding clearly, because it is a different trade-off from the one Elasticsearch imposes.
```sql
REFRESH MATERIALIZED VIEW CONCURRENTLY search_products;
```

The CONCURRENTLY keyword deserves a moment of appreciation. Without it, the view is locked during the refresh — queries against it will block until the refresh completes. With CONCURRENTLY, the view remains queryable while the new snapshot is being built. Users continue searching. When the refresh completes, the new data is swapped in atomically. No downtime. No interruption. The guests never notice the table being reset.
CONCURRENTLY requires a unique index on the materialized view — in most cases, the primary key of the underlying entity:
```sql
CREATE UNIQUE INDEX idx_search_products_id ON search_products(id);
```

How often should you refresh? That depends on your application's tolerance for staleness:
- Every minute — for applications where search freshness matters. A product added one minute ago will be searchable within two.
- Every five minutes — a reasonable default for most applications. I have found this serves the majority of cases well.
- On a cron schedule — for applications where search data changes infrequently. Documentation sites, catalogs that update daily, reference databases.
- Triggered by application writes — refresh after a batch of writes completes. More complex to implement, but provides the tightest staleness window.
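For the scheduled options, the pg_cron extension keeps the schedule inside the database itself, which fits the spirit of this chapter. A sketch — the job name and the five-minute interval are illustrative, not prescriptive:

```sql
-- Requires the pg_cron extension (CREATE EXTENSION pg_cron).
-- Schedule a concurrent refresh every five minutes.
SELECT cron.schedule(
    'refresh-search-products',    -- job name (illustrative)
    '*/5 * * * *',                -- standard cron syntax
    'REFRESH MATERIALIZED VIEW CONCURRENTLY search_products'
);
```

An OS-level cron job invoking psql works just as well; pg_cron simply keeps the schedule where the view lives.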
There is also pg_ivm — an extension that refreshes materialized views incrementally when the underlying tables change, rather than recomputing the entire snapshot. Book 1 covered this in detail. It is optional but recommended for applications that need near-real-time search freshness without the cost of full refreshes.
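For readers who did not come from Book 1, the pg_ivm interface looks like this — a sketch only, since pg_ivm supports a subset of SQL and a definition with DISTINCT aggregates or certain joins may need to be simplified before it qualifies (the view name below is illustrative):

```sql
-- Requires the pg_ivm extension. create_immv builds an incrementally
-- maintained view that updates as the base tables change, with no
-- explicit REFRESH step.
SELECT create_immv(
    'search_products_ivm',
    'SELECT p.id, p.name, p.description, c.name AS category
       FROM products p JOIN categories c ON c.id = p.category_id'
);
```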
One point bears emphasis: the materialized view is eventually consistent with your base tables, but it is consistent with itself. All search indexes reflect the same snapshot. When the view refreshes, every index updates together, atomically. There is no window where the full-text index reflects one version of a product while the vector index reflects another. This is a meaningfully better form of eventual consistency than a system where individual documents sync independently and the search index may, at any given moment, reflect a partially updated view of the data.
Consistency within a snapshot is not the same as no consistency at all. The distinction matters.
How This Compares to an Elasticsearch Index
The materialized view and the Elasticsearch index solve the same architectural problem: denormalize data for fast search. Both are worthy approaches. They differ in where the denormalization lives and what infrastructure it requires.
| Aspect | Elasticsearch Index | PostgreSQL Materialized View |
|---|---|---|
| Data source | Copied from database via sync pipeline | Created from database via SQL query |
| Freshness | Depends on sync pipeline latency | Depends on refresh schedule |
| Consistency | Individual docs sync independently | All indexes refresh atomically |
| Index types | Inverted index (built-in) | GIN, HNSW, B-tree, GiST (your choice) |
| Query language | Elasticsearch Query DSL | SQL |
| Infrastructure | Separate cluster | Same database |
| Maintenance | Mapping management, reindexing | REFRESH MATERIALIZED VIEW CONCURRENTLY |
| Access control | Separate authorization layer | Row-level security (RLS) inherited |
The materialized view is not a workaround for the absence of Elasticsearch. It is the same architectural pattern — denormalize, index, query — implemented inside the database instead of outside it. Elasticsearch's implementation requires a separate cluster and a synchronization pipeline. PostgreSQL's implementation requires a SQL query and a refresh schedule. Both approaches denormalize. Both approaches index. The question is where you prefer the complexity to live.
When You Don't Need a Materialized View
I would be doing you a disservice if I suggested the materialized view is always the right answer. It is not. Simplicity has its own considerable merits, and I would not recommend additional complexity where simplicity serves.
tsvector generated columns work directly on base tables. For simple cases — one table, one or two columns to search — indexing the base table is perfectly appropriate:
```sql
ALTER TABLE articles
ADD COLUMN search_vector tsvector
-- coalesce: a NULL body must not null out the whole vector
GENERATED ALWAYS AS (
    to_tsvector('english', title || ' ' || coalesce(body, ''))
) STORED;

CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);
```

This is simpler. No refresh needed — the generated column updates automatically when the row changes. No staleness at all. For a blog with a single posts table, or an application where search spans only one table, this is the correct choice.
The materialized view earns its place when:
- Search spans multiple tables. The five-table JOIN problem. Denormalize once, search as many times as you wish.
- You want to combine multiple search types on the same surface. tsvector for full-text, pgvector for semantic, pg_trgm for fuzzy — all indexed on one view.
- You want all search indexes to refresh atomically. One refresh updates everything. No partial states. No windows of inconsistency between index types.
- You want to decouple search performance from write performance. Writes hit the base tables. Reads hit the materialized view. Neither interferes with the other.
For the common case — an application with products, users, documents, and a search bar that needs to span multiple related tables — the materialized view is what I would recommend. For the simple case, the generated column on the base table is sufficient, and sufficient is a perfectly respectable standard.
Use the simpler approach when the simpler approach serves. The best architecture is the one that matches the actual requirements — not the one that anticipates requirements you may never have.
The surface is prepared. The indexes are in place. The foundation, if I may say, is sound.
Chapter 4 will attend to the first and most fundamental capability built upon this surface — full-text search with tsvector. PostgreSQL has offered it since 2008. Most teams have never thought to use it. I intend to make the case that they should.