pgvector
Vector similarity search for PostgreSQL — because the manor already has a perfectly good room for your embeddings.
I confess a particular admiration for pgvector. The industry spent several years insisting that embeddings required an entirely new category of database — and then Andrew Kane wrote an extension that added vector similarity search to PostgreSQL with a single CREATE EXTENSION. PostgreSQL is not a database. It is an ecosystem that most teams use as a database. pgvector is among the finest evidence of this.
pgvector is an open-source PostgreSQL extension that adds vector data types and similarity search operators. It lets you store embedding vectors directly in your database and find nearest neighbors using L2 distance, cosine distance, or inner product — with optional approximate indexing via IVFFlat or HNSW for fast search at scale.
What pgvector does
pgvector introduces the vector data type to PostgreSQL, allowing you to store fixed-length arrays of floating-point numbers as column values. These vectors typically represent embeddings generated by machine learning models — dense numerical representations of text, images, or other data that capture semantic meaning.
Once vectors are stored, pgvector provides distance operators that measure how similar two vectors are. You can use standard SQL ORDER BY with these operators to find the nearest neighbors to a query vector. Without an index, this is an exact (brute-force) scan. With an IVFFlat or HNSW index, pgvector performs approximate nearest neighbor (ANN) search that trades a small amount of recall for significantly faster queries.
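One quick way to see which of the two plans you are getting is EXPLAIN. A sketch, assuming a hypothetical documents table with an embedding column:

```sql
-- Illustrative only: table and column names are assumptions.
EXPLAIN
SELECT id FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
-- With an HNSW or IVFFlat index on the column, the plan shows an
-- Index Scan (approximate search); without one, a Seq Scan (exact,
-- brute-force search over every row).
```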
The key advantage is operational simplicity. Your vectors live in the same database as your relational data. You can join vector search results with regular tables, filter by SQL predicates, wrap searches in transactions, and use your existing backup and replication infrastructure. No separate vector database to deploy, synchronize, or maintain. One household, properly staffed, rather than two households in perpetual need of coordination.
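As an illustration of that point, a vector search can sit inside an ordinary join and an ordinary transaction. A sketch, using a hypothetical schema of documents(id, author_id, embedding) and authors(id, name):

```sql
-- Hypothetical schema; the point is that nearest-neighbor search
-- composes with joins and transactions like any other SQL.
BEGIN;

SELECT a.name, d.id
FROM documents d
JOIN authors a ON a.id = d.author_id
ORDER BY d.embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;

COMMIT;
```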
When to use pgvector
pgvector is a good fit when your vector workload lives alongside relational data and you want to avoid the operational complexity of a separate system. Specific use cases:
- Semantic search — embed documents or product descriptions, then find results by meaning rather than keyword matching
- Retrieval-Augmented Generation (RAG) — store chunked documents as embeddings, retrieve relevant context at query time, and feed it to an LLM
- Recommendation engines — represent users and items as vectors, then find similar items or users via nearest neighbor search
- Image similarity — store image embeddings from models like CLIP and search by visual similarity
- Deduplication and clustering — find near-duplicate records by computing distances between embedding vectors
- Hybrid search — combine vector similarity with traditional SQL filters (e.g., nearest neighbors where category = 'electronics' AND price < 100)
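The hybrid case might look like the following sketch, assuming a hypothetical products table with category, price, and embedding columns:

```sql
-- Nearest neighbors restricted by ordinary SQL predicates.
-- Note: with an ANN index, a post-filter like this can return fewer
-- rows than the LIMIT; pgvector 0.8+ offers iterative index scans
-- to compensate.
SELECT id, name
FROM products
WHERE category = 'electronics' AND price < 100
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```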
Installation and setup
pgvector does not require shared_preload_libraries and does not need a server restart. Installation is a single CREATE EXTENSION statement. The extension is third-party (not a core contrib module), so it must be installed on the server first — on most cloud providers it is pre-installed and ready to enable.
-- Install the extension (no shared_preload_libraries needed)
CREATE EXTENSION vector;
-- Verify it's working
SELECT * FROM pg_extension WHERE extname = 'vector';

Two statements. No restart. I mention this because the dedicated vector databases that pgvector replaces tend to involve rather more ceremony on installation day.
On self-managed PostgreSQL, install the package first. For Debian/Ubuntu, use the PostgreSQL APT repository (apt install postgresql-17-pgvector). For Red Hat/CentOS, use the PostgreSQL Yum repository.
Vector types
pgvector provides several data types for different precision and storage needs:
- vector — single-precision (4 bytes per dimension), up to 16,000 dimensions. The default choice for most embedding models.
- halfvec — half-precision (2 bytes per dimension), up to 16,000 dimensions. Cuts storage and memory in half with minimal quality loss for most embeddings.
- sparsevec — sparse vectors that only store non-zero elements. Efficient for high-dimensional vectors that are mostly zeros.
- bit — binary vectors for Hamming and Jaccard distance calculations.
-- Create a table with a vector column (1536 dimensions for OpenAI embeddings)
CREATE TABLE documents (
id bigserial PRIMARY KEY,
content text,
embedding vector(1536)
);
-- Insert a vector
INSERT INTO documents (content, embedding)
VALUES ('PostgreSQL is a relational database', '[0.1, 0.2, 0.3, ...]');

Distance operators
pgvector uses custom operators for distance calculations. Because PostgreSQL index scans only support ascending order, some operators return negative or inverted values so that ORDER BY ... ASC returns the most similar results first.
| Operator | Distance metric | Use case |
|---|---|---|
| <-> | L2 (Euclidean) distance | General-purpose; works on raw vectors |
| <=> | Cosine distance | Preferred for normalized embeddings (most common) |
| <#> | Negative inner product | When you need dot product similarity |
| <+> | L1 (Manhattan) distance | Alternative distance metric |
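The sign conventions are easiest to see by applying the operators to vector literals directly. A small sketch:

```sql
-- Distances between two small literal vectors.
SELECT '[1,2,3]'::vector <-> '[4,5,6]'::vector;  -- L2 distance: sqrt(27), about 5.196
SELECT '[1,2,3]'::vector <=> '[4,5,6]'::vector;  -- cosine distance, about 0.0254
SELECT '[1,2,3]'::vector <#> '[4,5,6]'::vector;  -- negative inner product: -32
```

Note that the inner product of [1,2,3] and [4,5,6] is 32, but <#> returns -32 — that is the negation mentioned above, so that ORDER BY ... ASC puts the highest-similarity rows first.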
-- L2 (Euclidean) distance — find nearest neighbors
SELECT id, content, embedding <-> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;
-- Cosine distance — preferred for normalized embeddings
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, 0.3, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;
-- Negative inner product — useful when vectors are not normalized
SELECT id, content, (embedding <#> '[0.1, 0.2, 0.3, ...]') * -1 AS inner_product
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;

Standard SQL. Standard ORDER BY. Standard LIMIT. The only thing new is the operator in the middle — the rest is PostgreSQL as you already know it.
Index types
Without an index, pgvector performs exact nearest neighbor search — it computes distances against every row. This guarantees perfect recall but becomes slow as tables grow. pgvector provides two approximate nearest neighbor index types that trade a small amount of recall for dramatically faster queries.
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layered graph structure. It provides the best speed-recall tradeoff for most workloads and is the recommended default. Key properties:
- Can be created on an empty table (no training step required)
- Better recall than IVFFlat at the same query speed
- Slower to build and uses more memory than IVFFlat
- Tunable via m (connections per node) and ef_construction (build-time beam width)
- Query-time recall controlled by hnsw.ef_search
-- Create an HNSW index (can be created on an empty table)
-- m: max connections per layer (default 16)
-- ef_construction: size of the dynamic candidate list (default 64)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Set the search beam width at query time (higher = better recall, slower)
SET hnsw.ef_search = 40;

IVFFlat (Inverted File with Flat Compression)
IVFFlat divides vectors into clusters (lists) and searches only the nearest clusters at query time. Key properties:
- Requires data in the table before building (uses k-means clustering as a training step)
- Faster to build and uses less memory than HNSW
- Lower recall than HNSW at the same query speed
- Tunable via lists (number of clusters)
- Query-time recall controlled by ivfflat.probes
-- Create an IVFFlat index (requires data in the table first)
-- lists: number of clusters — a common starting point is rows / 1000
-- for up to 1M rows, or sqrt(rows) for larger tables
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Set the number of probes at query time (higher = better recall, slower)
SET ivfflat.probes = 10;

Half-precision vectors
For large tables where storage and memory are a concern, halfvec stores each dimension in 2 bytes instead of 4. This halves the index size and improves cache efficiency, with negligible impact on recall for most embedding models. Half the storage for very nearly the same quality — the sort of economy that appeals to any well-run household.
-- Half-precision vectors use 2 bytes per dimension instead of 4
-- Useful for reducing storage and memory when full precision isn't needed
CREATE TABLE documents_half (
id bigserial PRIMARY KEY,
embedding halfvec(1536)
);
-- Index half-precision vectors
CREATE INDEX ON documents_half
USING hnsw (embedding halfvec_cosine_ops);

Cloud availability
| Provider | Status |
|---|---|
| Amazon RDS / Aurora | Available — supported on PostgreSQL 15.2+ instances |
| Google Cloud SQL | Available — enable via database flags |
| Azure Database for PostgreSQL | Available — add to allowlist, then CREATE EXTENSION |
| Supabase | Available — pre-installed, enable with CREATE EXTENSION |
| Neon | Available — pre-installed on PostgreSQL 15+ |
| Crunchy Bridge | Available — pre-installed on all clusters |
How Gold Lapel relates
Allow me a candid observation about vector workloads: they are expensive. A nearest-neighbor search touches the index, computes distances across candidate vectors, and returns results — all of which consume CPU and memory that your PostgreSQL instance would rather spend on other guests. In RAG pipelines and semantic search, these queries arrive on every user request, often with the same or remarkably similar query vectors.
Gold Lapel sits between your application and PostgreSQL as a proxy. It sees every query before it reaches the database — including your vector similarity searches. When the same embedding is searched repeatedly, or when query patterns emerge across your traffic, Gold Lapel identifies them and applies proxy-level optimizations. The database is spared redundant computation. Response times improve. Your application code and pgvector configuration remain untouched.
pgvector handles vector storage and search inside PostgreSQL. Gold Lapel attends to the query traffic in front of it. Different levels of the stack, complementary by nature. I would not suggest one as a substitute for the other — but together, they keep the household running efficiently under load.