Chapter 8: Semantic Search (pgvector)
I promised you the SQL. Here it is.
Chapter 7 explained what embeddings are — how text becomes a position in a high-dimensional space, how similar meanings produce nearby positions, and how the "matching meaning" problem reduces to the "finding nearby points" problem. That chapter involved no code. It was, by design, entirely conceptual. I trust it was useful, and I trust you are now ready for something more concrete.
This chapter is concrete. The library from Chapter 7 becomes a table. The coordinate system becomes a column. The search becomes ORDER BY distance LIMIT k. The concept becomes code you can run in psql before you finish your coffee.
Can PostgreSQL be used as a vector database? Yes. pgvector adds a vector data type and distance operators to PostgreSQL, providing the same core capability — vector storage, distance computation, approximate nearest neighbor search via HNSW indexes — that purpose-built vector databases like Pinecone and Weaviate provide. The difference, and it is not a small one: pgvector lives inside your existing PostgreSQL database, alongside your tsvector columns, your trigram indexes, your Soundex indexes, and your application data. No separate service. No sync pipeline. No new infrastructure to monitor at hours you would prefer to be sleeping.
The extension is third-party but available on every major managed provider: Supabase, Neon, RDS, Cloud SQL, Azure Database for PostgreSQL. If you can run PostgreSQL, you can run pgvector.
Installing pgvector
CREATE EXTENSION IF NOT EXISTS vector; Gold Lapel's wrapper creates it lazily on first similar() call. On managed providers, pgvector is typically pre-installed or available with one-click enable. On self-hosted PostgreSQL, install from your package manager (apt install postgresql-16-pgvector on Debian/Ubuntu) or build from source. The pgvector GitHub repository has detailed installation instructions for every platform.
Version note: pgvector 0.8.0, the latest as of early 2026, delivers up to 9x faster query processing compared to earlier versions. If you are running an older version and finding performance insufficient, I would recommend upgrading before investigating other explanations. The improvement is substantial.
Storing Vectors
ALTER TABLE articles ADD COLUMN embedding vector(1536); The dimension in parentheses must match your embedding model's output dimension. If your model produces 768-dimensional vectors, use vector(768). If it produces 1536-dimensional vectors, use vector(1536). The types must agree, and PostgreSQL will tell you if they do not — which I consider a courtesy.
Maximum supported dimensions: 2,000 for full-precision vectors. Up to 4,000 with halfvec (half-precision floating point). Up to 64,000 with binary vectors. For the text search use cases in this book, full-precision vectors at 768-1536 dimensions are standard.
Populating the column. Generate embeddings via your model of choice and write them to the column:
-- After generating an embedding via your model API or local inference
UPDATE articles
SET embedding = '[0.023, -0.041, 0.087, 0.012, ...]'::vector
WHERE id = 42; In practice, this is a background job: a process that reads new or updated rows, generates embeddings (via API call or local model inference), and writes the vectors back. For the materialized view pattern from Chapter 3, the embedding column can be populated during the view creation query or updated by a separate process after each refresh.
Storage reminder from Chapter 7: a 1536-dimensional vector is approximately 6 KB per row. A 768-dimensional vector is approximately 3 KB. On a million-row table, plan for 3-6 GB of vector data plus index storage. These are manageable numbers. They are also numbers worth knowing before they surprise you.
Distance Operators
pgvector provides three distance operators:
| Operator | Distance Type | Use Case | Operator Class |
|---|---|---|---|
<=> | Cosine distance | Text embeddings (most common) | vector_cosine_ops |
<-> | Euclidean (L2) distance | Geometric/spatial data | vector_l2_ops |
<#> | Negative inner product | Maximum inner product search | vector_ip_ops |
For text search, use <=> (cosine distance). This is what Gold Lapel's similar() uses. Chapter 7 explained why cosine is the right metric for text embeddings — it compares direction (meaning), not magnitude (length).
A note that will save you debugging time: These are distances, not similarities. Lower values mean more similar. A cosine distance of 0.0 means identical vectors. A distance of 2.0 means maximally different. This is the opposite of similarity() from pg_trgm (where higher = more similar) and ts_rank() from tsvector (where higher = more relevant). If your results appear to be in the wrong order, check whether you are sorting ascending or descending. Vector search sorts ascending. I mention this because I have seen experienced developers spend an hour debugging what turned out to be ORDER BY distance DESC instead of ORDER BY distance.
The Query Pattern
Semantic search in PostgreSQL:
SELECT id, title, (embedding <=> query_embedding) AS distance
FROM articles
ORDER BY distance
LIMIT 10; This returns the 10 articles whose embeddings are closest to the query embedding — the 10 most semantically similar results.
No WHERE clause for the vector comparison. Vector search is expressed as ORDER BY distance LIMIT k — "give me the k nearest neighbors." This is fundamentally different from tsvector search (which uses WHERE search_vector @@ query) and trigram search (which uses WHERE similarity() > threshold). Full-text search and fuzzy search filter — they exclude non-matching rows. Vector search ranks — it orders all rows by distance and returns the top k. Everything is a candidate. The nearest ones win.
This is the pattern Gold Lapel's proxy detects. When it sees <=> or <-> in an ORDER BY clause, it auto-creates the appropriate HNSW index. It looks for these operators in ORDER BY specifically, not in WHERE — because that is where vector search lives.
Filtered vector search. You can add WHERE clauses for non-vector filtering:
SELECT id, title, (embedding <=> query_embedding) AS distance
FROM articles
WHERE category = 'technology' AND published = true
ORDER BY distance
LIMIT 10; This finds the 10 nearest technology articles that are published. The WHERE clause filters by standard SQL conditions; the ORDER BY ranks by semantic similarity within that filtered set. This is one of PostgreSQL's genuine advantages over purpose-built vector databases — the filter is standard SQL, which means it supports JOINs, subqueries, row-level security policies, and anything else you would normally write in a WHERE clause. The full power of SQL is available alongside vector search. I find this combination of capabilities rather difficult to match.
Exact vs. Approximate Nearest Neighbor
Without an index: PostgreSQL computes the distance from the query vector to every row in the table and returns the closest k. Results are exact — guaranteed to include the true nearest neighbors. But the computation is O(n): on a million-row table, it performs a million distance calculations per query. Thorough, but slow.
With an HNSW or IVFFlat index: Approximate nearest neighbor. The index provides a shortcut — it navigates a graph structure (HNSW) or cluster structure (IVFFlat) to find vectors that are very close to the nearest neighbors without checking every row. Results may not include the absolute closest vector, but they typically achieve 95-99%+ recall — meaning 95-99% of the true top-k results are in the approximate top-k.
The trade-off: exact accuracy vs. query speed. For nearly all applications, approximate nearest neighbor is both fast enough and accurate enough. The user cannot perceive the difference between result #1 being the true nearest neighbor and result #1 being the second-nearest neighbor. Nor should they have to. That is the index's job.
Exact search remains useful for small datasets (under ~100K rows) where the sequential scan completes quickly and you want guaranteed results.
HNSW Indexes
HNSW stands for Hierarchical Navigable Small World. It is the same algorithm used by Elasticsearch's dense vector search, by Pinecone, by Weaviate, and by most modern vector databases. The algorithm belongs to no single vendor. PostgreSQL, via pgvector, provides an excellent implementation of it.
CREATE INDEX idx_embedding ON articles USING hnsw(embedding vector_cosine_ops); How HNSW works. Imagine a series of maps at different zoom levels. The highest level is a coarse map with just a few landmarks spread across the terrain. Each level below adds more detail — more points, more connections between them. To find a point near your query, you start at the coarsest level, navigate to the approximate region, then descend through increasingly detailed levels, refining your position at each step. By the time you reach the bottom level, you have found a very good approximate neighbor — without ever checking every point on the map.
This is dramatically faster than exact search, which is the equivalent of unfolding the most detailed map available and checking every point by hand. The HNSW index replaces exhaustive comparison with intelligent navigation. I appreciate the efficiency.
Tuning Parameters
| Parameter | Default | What It Controls | Trade-off |
|---|---|---|---|
m | 16 | Connections per node per layer | Higher = better recall, more memory, slower builds |
ef_construction | 64 | Candidate list size during index build | Higher = better index quality, slower builds |
CREATE INDEX idx_embedding ON articles
USING hnsw(embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64); Query-time parameter: SET hnsw.ef_search = 40; (default). Controls how many candidates are evaluated during search. Higher = better recall, slower queries.
Practical guidance. The defaults are good for most workloads, and I would recommend starting with them. Increase ef_construction to 128 or 200 for datasets where recall quality matters more than build time. Increase ef_search to 100 or higher if relevance testing shows that expected results are missing from the top-k. For most application search, the defaults work without tuning — which is, in my view, the mark of well-chosen defaults.
Performance. On a 1M-vector dataset at 1536 dimensions, HNSW delivers approximately 40.5 queries per second with 95%+ recall at default settings.
Incremental inserts. HNSW handles new vectors added after the index is built. They are incorporated into the graph automatically — no rebuild needed. This matters for tables that receive ongoing writes, and it is one of the reasons HNSW is the default recommendation.
IVFFlat Indexes
IVFFlat stands for Inverted File with Flat quantization. An older algorithm than HNSW, with different trade-offs that make it the right choice for specific situations.
CREATE INDEX idx_embedding ON articles
USING ivfflat(embedding vector_cosine_ops)
WITH (lists = 100); The lists parameter sets the number of clusters. Rule of thumb: sqrt(number_of_rows) for tables under 1M rows, number_of_rows / 1000 for larger tables.
IVFFlat advantages over HNSW:
- 32x faster index builds. On very large tables where HNSW build time is measured in hours, this difference is significant.
- 2.8x less memory. 257MB vs 729MB for 1M vectors at 1536 dimensions.
IVFFlat disadvantages vs. HNSW:
- Lower query throughput. ~2.6 QPS vs HNSW's ~40.5 QPS on 1M vectors at comparable recall.
- Does not handle incremental inserts well. New vectors added after build are not optimally placed. The index should be rebuilt periodically if the table receives significant writes.
- CRITICAL: Never build IVFFlat on an empty table. I emphasize this because the failure mode is silent. The clustering step requires existing data to define meaningful clusters. An index built on an empty or near-empty table produces meaningless clusters and silently incorrect search results — your queries return results, but the results are wrong. You will not receive an error message. You will receive confident, incorrect answers. Load your data first, then create the index. This is the single most common IVFFlat mistake, and I would spare you the debugging session it produces.
When to use IVFFlat. Very large tables (tens of millions of rows) where HNSW build time is prohibitive. Batch-loaded tables where data doesn't change frequently. Memory-constrained environments where 2.8x less memory matters.
For most applications: use HNSW. It handles inserts, has higher recall, higher throughput, and the defaults work without tuning.
HNSW vs. IVFFlat
| Aspect | HNSW | IVFFlat |
|---|---|---|
| Recall | 95-99%+ (default settings) | Lower (depends on lists and probes) |
| Query throughput | ~40.5 QPS (1M vectors) | ~2.6 QPS (1M vectors) |
| Build speed | Slower (hours for very large tables) | 32x faster |
| Memory | 729MB (1M x 1536-dim) | 257MB (1M x 1536-dim) |
| Incremental inserts | Handled automatically | Requires periodic rebuild |
| Empty table build | Works | Broken — never do this |
| Best for | Most workloads (default choice) | Very large batch-loaded tables |
Start with HNSW. Move to IVFFlat only if build time or memory is a demonstrated constraint for your specific workload. When in doubt, HNSW. I have yet to see a team regret this default.
Putting It All Together
If you prefer to see the complete picture — and I respect that preference considerably — here is a runnable example from table creation through semantic search results:
-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table with a vector column
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT NOT NULL,
embedding vector(4) -- using 4 dimensions for readability
);
-- Insert sample data with embeddings
-- (In production, these would be 768-1536 dimensions from a real model.
-- 4-dimensional vectors are used here to keep the example readable.)
INSERT INTO products (name, description, embedding) VALUES
('Ergonomic Office Chair',
'Adjustable lumbar support with breathable mesh back',
'[0.82, 0.15, 0.91, 0.33]'),
('Standing Desk Converter',
'Sit-stand workstation that raises and lowers smoothly',
'[0.78, 0.22, 0.85, 0.41]'),
('Mechanical Keyboard',
'Cherry MX switches with programmable RGB lighting',
'[0.11, 0.89, 0.23, 0.67]'),
('Laptop Cooling Pad',
'USB-powered fan base for reducing laptop temperature',
'[0.15, 0.76, 0.31, 0.72]'),
('Monitor Arm Mount',
'Adjustable desk clamp for ergonomic screen positioning',
'[0.71, 0.18, 0.88, 0.29]');
-- Create the HNSW index
CREATE INDEX idx_products_embedding ON products
USING hnsw(embedding vector_cosine_ops);
-- Search: find products similar to "comfortable desk seating"
-- (using the embedding for that phrase — simplified to 4 dimensions)
SELECT name, description,
(embedding <=> '[0.80, 0.17, 0.89, 0.35]'::vector) AS distance
FROM products
ORDER BY distance
LIMIT 3; Result:
name | description | distance
-------------------------+----------------------------------------------------+------------------
Ergonomic Office Chair | Adjustable lumbar support with breathable mesh back | 0.0065161497
Monitor Arm Mount | Adjustable desk clamp for ergonomic screen position | 0.0173285841
Standing Desk Converter | Sit-stand workstation that raises and lowers smooth | 0.0258389184
(3 rows) The query vector [0.80, 0.17, 0.89, 0.35] — representing something like "comfortable desk seating" — is closest to the Ergonomic Office Chair, then the Monitor Arm Mount, then the Standing Desk Converter. The Mechanical Keyboard and Laptop Cooling Pad are further away because their embeddings point in a different direction — they are about different things.
No WHERE clause was needed. No keyword matching. No stemming. The query found results by meaning — by proximity in the vector space. The words "comfortable," "desk," and "seating" do not appear anywhere in the results. The meaning matches anyway. I find that worth pausing on.
In production, the embeddings would be 768-1536 dimensions from a real model rather than 4 dimensions chosen for readability. The query pattern is identical — only the vector length changes. The SQL is the same. The magic, if I may call it that, is the same.
Gold Lapel's similar()
goldlapel.similar(conn, "articles", "embedding", query_vector, limit=10) Generates:
SELECT *, (embedding <=> $1::vector) AS _score
FROM articles
ORDER BY _score
LIMIT $2; The proxy auto-creates an HNSW index with vector_cosine_ops when it detects <=> in ORDER BY. Also detects <-> for L2 distance and creates the index with vector_l2_ops. The developer calls one method. The proxy creates the right index. The user gets semantic search results ranked by meaning. The plumbing is invisible, which is how plumbing should be.
pgvector vs. Purpose-Built Vector Databases and Elasticsearch
| Aspect | pgvector (PostgreSQL) | Pinecone / Weaviate | Elasticsearch kNN |
|---|---|---|---|
| Vector storage | vector column type | Native | Dense vector field |
| Index algorithm | HNSW, IVFFlat | HNSW (proprietary variants) | HNSW |
| Filtering | SQL WHERE (full SQL power) | Metadata filters (limited) | Query DSL filters |
| Data colocation | Same database as app data | Separate service | Separate service |
| Consistency | ACID — immediate | Eventually consistent | Eventually consistent |
| Additional capabilities | FTS, fuzzy, phonetic, aggs, SQL | Vector search only | Full ES feature set |
| Infrastructure | Existing PostgreSQL | New service | Existing ES cluster |
Timescale has published benchmarks showing pgvector at 28x lower p95 latency, 16x higher throughput, and 75% lower cost compared to Pinecone s1 at 99% recall. These are Timescale's numbers — I present them because they are well-documented and reproducible, not because I claim them as my own. Verify against their published methodology for your workload.
The key differentiator for this book is not raw performance, which varies by workload and configuration. It is colocation. pgvector lives in the same database as your tsvector columns, your trigram indexes, your Soundex indexes, your application tables, and your row-level security policies. The full search stack — lexical + fuzzy + phonetic + semantic — runs in a single database with ACID consistency. No purpose-built vector database and no Elasticsearch deployment offers this combination. They are fine tools, each of them. But they are separate tools, and separate tools require separate infrastructure, separate synchronization, and separate late-night attention when something disagrees.
One database that does everything is, I would suggest, worth more than two systems that must be taught to agree.
Honest Boundary
pgvector provides approximate nearest neighbor search. HNSW recall is typically 95-99%+ with good tuning, but not 100%. For applications requiring guaranteed exact results, use exact search (no index) on smaller datasets. For most application search, the distinction between approximate and exact is imperceptible to the user.
Vector search quality depends entirely on your embedding model. pgvector does not generate embeddings — it stores and queries them. If your model does not capture the semantic distinctions that matter for your data, the search results will reflect the model's limitations, not pgvector's. Chapter 7 covered model selection. The database can only search what the model understands.
Index build time scales with dataset size and dimensions. Building an HNSW index on 10M rows x 1536 dimensions takes meaningful time — plan for this in deployment, especially if you are refreshing a materialized view with a vector column.
pgvector 0.8.0 represents a major performance improvement — up to 9x faster — over earlier versions. If you are running an older version and finding performance insufficient, upgrade before concluding that pgvector is the bottleneck. It may simply need its latest shoes.
Semantic search closes the gap between what users type and what they mean. The concept from Chapter 7 — text as position, similarity as distance — is now code you can run. Four lines of SQL to store a vector, one line to index it, one line to search it. The library we imagined in Chapter 7 is a table you can query.
The five pillars of search are nearly complete. Four are in place: lexical search found the right words (Chapter 4). Fuzzy matching forgave the wrong spelling (Chapter 5). Phonetic search matched the right sounds (Chapter 6). Semantic search understood the right meaning (Chapters 7-8). Together, they cover every dimension of imprecision between what the user types and what they intend.
One pillar remains — and it is the one the user encounters first. Before they have finished typing. Before they have pressed Enter. Before they have committed to a search at all. The dropdown that appears as their fingers move across the keyboard, suggesting what they might be looking for.
Chapter 9 handles autocomplete. It uses a familiar tool — pg_trgm, from Chapter 5 — applied to a problem the user notices more than any other. If you will follow me, I believe it will feel like reuniting with a capable acquaintance in a new and useful setting.