
pgvector Performance Tuning: From Default to Optimized

Out of the box, pgvector performs a sequential scan over every vector. This will not do. Allow me to walk you through every tuning parameter that matters.

The Butler of Gold Lapel · March 26, 2026 · 25 min read
The commissioned piece — a navigable small world graph in the style of a star chart — arrived as a single dot with 1,536 lines radiating outward. The artist says it represents a vector. He is not wrong, but neither is he helpful.

Default pgvector Is Not Production pgvector

Good evening. I see you have arrived with pgvector in its factory settings, and I must confess a certain concern. Out of the box, pgvector does not use any index at all — it performs a sequential scan over every vector in the table. This is rather like asking the entire household staff to line up each time you need a single glass of water. Functional, certainly. Sustainable, not at all.

pgvector is the most widely adopted vector search extension for PostgreSQL, and for good reason — it stores, indexes, and queries high-dimensional vectors alongside your relational data using standard SQL. But the difference between default and properly tuned pgvector is often 10–50x in query latency and a meaningful improvement in recall accuracy. The defaults are a starting point for demonstrations, not a destination for production.

This guide covers every tuning parameter that matters: index type selection, build parameters, query-time tradeoffs, batch insert strategies, filtered search patterns, and hardware considerations. If you'll permit me, I would like to walk through each one with practical recommendations at every step.

For a conceptual overview of pgvector (installation, types, distance functions), see the pgvector extension page. For query-level optimization patterns (distance function selection, embedding model considerations, filtering SQL), see the pgvector query optimization guide. This article focuses on index-level tuning and infrastructure configuration.

Why the Defaults Are a Starting Point, Not a Destination

Allow me to be direct about what happens when you first create a table with a vector column and run a similarity search: pgvector performs a sequential scan. It computes the distance between your query vector and every row in the table, sorts the results, and returns the top K. This is exact nearest-neighbor search — perfect recall, but O(n) in the number of vectors.

-- Without an index, this scans every row
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 10;

For a table with 10,000 vectors, a sequential scan completes in a few milliseconds — perfectly manageable. For 100,000 vectors, it takes tens of milliseconds. For 1,000,000 vectors with 1536 dimensions (the size of OpenAI text-embedding-3-small output), sequential scan takes hundreds of milliseconds to seconds per query. At that point, I'm afraid we have a situation that requires attention.
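
For intuition, the sequential scan above is nothing more than brute-force exact search. A minimal pure-Python sketch of the same computation (illustrative only; pgvector's implementation is in C, and the function name here is mine):

```python
import math

def exact_top_k(query, rows, k=10):
    """Brute-force nearest neighbors: one distance computation per row,
    then a sort -- perfect recall, O(n) cost, just like a sequential scan."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scored = sorted((l2(query, vec), row_id) for row_id, vec in rows)
    return [row_id for _, row_id in scored[:k]]

rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [0.1, 0.1]), (4, [5.0, 5.0])]
print(exact_top_k([0.0, 0.0], rows, k=2))  # [1, 3]
```

Every query touches every row, which is precisely why the cost grows linearly with table size.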

Creating an approximate nearest-neighbor (ANN) index changes the query from O(n) to approximately O(log n), trading a small amount of recall accuracy for dramatically lower latency. But the default index parameters are conservative starting points, not production-ready configurations.

HNSW vs IVFFlat — Choosing Your Index Type

pgvector supports two ANN index types: IVFFlat and HNSW. They use fundamentally different algorithms, and the choice between them affects build time, query latency, recall, and memory usage. Allow me to introduce each properly.

IVFFlat

IVFFlat (Inverted File with Flat quantization) partitions vectors into a set of lists, sometimes called Voronoi cells. Each vector is assigned to the list whose centroid it is closest to. At query time, instead of scanning every vector, the algorithm identifies the closest list centroids and searches only those lists.

-- Create an IVFFlat index with 100 lists
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

How it works: A single pass through the data computes centroids using k-means clustering, then assigns each vector to its nearest centroid. This produces a flat structure — no hierarchy, just partitions.

Build time: Fast. A single pass through the data makes IVFFlat builds significantly cheaper than HNSW.

Critical requirement — and I cannot stress this enough: IVFFlat needs representative data in the table before index creation. Building an IVFFlat index on an empty table or a small, non-representative sample produces poor centroid placement. The resulting partitions will not reflect the actual data distribution, and recall will suffer until the index is rebuilt.

Query parameter — ivfflat.probes: Controls how many lists are searched at query time. The default is 1, meaning only the single nearest list is searched. I should note that this produces remarkably low recall — relevant vectors in adjacent lists are missed entirely.

-- Default: searches only 1 list (very low recall)
SET ivfflat.probes = 1;

-- Better: search more lists for higher recall
SET ivfflat.probes = 10;

A reasonable starting point is probes = sqrt(lists). For 100 lists, that means 10 probes. From there, measure recall against ground truth and adjust upward until your recall target is met.
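
That starting point is trivial to compute in application code. A tiny helper encoding the rule of thumb (the function name is mine, not a pgvector API):

```python
import math

def starting_probes(lists):
    """Rule-of-thumb initial value for ivfflat.probes: sqrt(lists).
    Tune upward from here against measured recall."""
    return max(1, round(math.sqrt(lists)))

print(starting_probes(100))   # 10
print(starting_probes(1000))  # 32
```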

Best for: Datasets that do not change frequently, cost-sensitive environments where HNSW memory requirements are prohibitive, and situations where moderate recall (90–95%) is acceptable.

HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layer graph where each node (vector) is connected to its nearest neighbors. The top layers contain a sparse subset of nodes for coarse navigation, while the bottom layer contains all nodes for fine-grained search. A query enters at the top layer and traverses downward, narrowing the search at each level.

-- Create an HNSW index
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

How it works: Vectors are inserted into a navigable small-world graph one at a time. Each insertion connects the new vector to its nearest neighbors at each layer, maintaining the graph's navigability properties. The result is a structure that enables efficient approximate nearest-neighbor search through greedy graph traversal.

Build time: Significantly slower than IVFFlat — 10 to 100 times slower depending on the dataset size and parameters. HNSW builds are also memory-intensive because the graph structure must be held in memory during construction.

Build advantage: HNSW works on empty tables. You can create the index before inserting data, and new vectors are incorporated into the graph as they are inserted. This makes HNSW suitable for applications with continuous data ingestion.

Query parameter — hnsw.ef_search: Controls the size of the dynamic candidate list during search. Higher values explore more of the graph, producing better recall at the cost of higher latency. The default is 40.

-- Default
SET hnsw.ef_search = 40;

-- Higher recall
SET hnsw.ef_search = 100;

Build parameters: Two parameters control graph construction quality:

  • m (default 16): Maximum number of connections per node at each layer. Higher values produce a more connected graph with better recall, at the cost of more memory and slower builds.
  • ef_construction (default 64): Size of the candidate list during index build. Higher values produce a higher-quality graph. Should be at least 2 times m.
-- Production-quality HNSW index
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 128);

Best for: Production workloads that need high recall (>95%), applications with continuous data ingestion, and situations where build time is a one-time or infrequent cost.

When to Use Which

If I may offer a direct recommendation: HNSW is the right choice for most production use cases. It provides a higher recall ceiling, does not require data to be present before index creation, and is less sensitive to parameter misconfiguration. The main cost is build time and memory during construction.

IVFFlat earns its place with large datasets (10M+ vectors) where HNSW build time or memory requirements are prohibitive, or when recall requirements are moderate and the dataset is relatively static.

Neither index is necessary for small datasets. Below approximately 10,000 vectors, sequential scan is fast enough for most applications. The overhead of maintaining an ANN index outweighs the latency benefit. There is no virtue in optimizing what is already fast.

Distance Functions and Their Performance Impact

pgvector supports three distance operators, each backed by a corresponding index operator class. The choice matters for correctness, though I should note the performance difference between them is negligible.

The Three Distance Operators

Operator | Distance Metric           | Operator Class    | Use Case
<->      | L2 (Euclidean)            | vector_l2_ops     | General-purpose, most common
<=>      | Cosine distance           | vector_cosine_ops | Normalized embeddings, angular similarity
<#>      | Inner product (negative)  | vector_ip_ops     | Maximum inner product search

-- L2 distance (Euclidean)
SELECT * FROM items ORDER BY embedding <-> query_vector LIMIT 10;

-- Cosine distance
SELECT * FROM items ORDER BY embedding <=> query_vector LIMIT 10;

-- Inner product (returns negative inner product for ORDER BY ASC)
SELECT * FROM items ORDER BY embedding <#> query_vector LIMIT 10;

Choosing the Right Distance Function

The choice of distance function should be driven by correctness, not performance.

If your embeddings are normalized (most modern embedding models produce normalized output, including OpenAI, Cohere, and Voyage): cosine distance and L2 distance produce identical ranking. In this case, use L2 (<->) because it avoids the normalization step that cosine distance applies internally.

If your embeddings are not normalized: Use cosine distance (<=>) to handle varying vector magnitudes. Without normalization, L2 distance gives disproportionate weight to vectors with larger magnitudes.

Inner product (<#>): Used for specific workloads like recommendation systems that rely on dot-product similarity. Note that pgvector returns the negative inner product to maintain ascending sort order compatibility.

Pre-normalizing Vectors

If your application uses cosine distance, you can normalize vectors at insert time and then use L2 distance for queries. This avoids the per-query normalization computation.

Normalize in SQL
-- Normalize a vector at insert time.
-- l2_normalize() ships with pgvector 0.7.0+; on earlier versions,
-- normalize in application code before inserting.
INSERT INTO items (embedding)
VALUES (l2_normalize('[0.1, 0.2, 0.3]'::vector));

In practice, normalizing in application code before inserting is cleaner — and I would recommend it. Most embedding model client libraries provide normalization functions, or you can divide each vector by its L2 norm.

After pre-normalizing, create the index with vector_l2_ops and query with <->. The results will be equivalent to cosine distance but with slightly less per-query computation. A small optimization, but the kind that compounds.
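
The equivalence is easy to verify for yourself. A pure-Python sketch showing that L2 and cosine distance rank normalized vectors identically (helper names are mine):

```python
import math

def l2_normalize(v):
    """Divide a vector by its L2 norm, as you would before insert."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_dist(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

q = l2_normalize([0.3, 0.4])
vecs = [l2_normalize(v) for v in ([1.0, 0.0], [0.6, 0.8], [0.0, 2.0])]
by_l2 = sorted(range(len(vecs)), key=lambda i: l2_dist(q, vecs[i]))
by_cos = sorted(range(len(vecs)), key=lambda i: cosine_dist(q, vecs[i]))
print(by_l2 == by_cos)  # True: identical ranking on normalized vectors
```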

Index Build Parameters — The Parameters That Earn Their Keep

HNSW Build Parameters

Two parameters control the quality of the HNSW graph at build time:

m (default 16): The maximum number of bi-directional connections per node at each graph layer. Higher values produce a more densely connected graph, which improves recall but increases memory usage and build time.

  • Practical range: 8–64
  • Recommended: 16 for most workloads. Increase to 32–48 for applications requiring very high recall (>99%) or when working with high-dimensional vectors (>1536 dimensions).
  • Effect on index size: Approximately linear — doubling m roughly doubles the graph portion of the index.

ef_construction (default 64): The size of the candidate list used during graph construction. Higher values produce a better-quality graph (more thoroughly optimized neighbor connections) at the cost of slower builds.

  • Practical range: 64–512
  • Rule of thumb: ef_construction should be at least 2 times m. A value below this produces suboptimal neighbor selection.
  • Recommended: 128–256 for production builds.
-- Conservative production configuration
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 128);

-- High-recall production configuration
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 32, ef_construction = 256);

Memory during build: HNSW index construction is memory-intensive. The entire graph structure must fit in memory during the build process. The maintenance_work_mem parameter controls how much memory PostgreSQL allocates for index build operations.

-- Set before building the index
SET maintenance_work_mem = '2GB';

-- Build the index
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 200);

-- Reset after build
RESET maintenance_work_mem;

A rough formula for memory during HNSW build: approximately m * 2 * 4 bytes * num_vectors for the graph structure, plus the vectors themselves. For 1 million vectors with 1536 dimensions, plan for 2–4 GB of maintenance_work_mem.
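
That rough formula, turned into a sizing helper (my naming; treat the output as a ballpark for planning, not an exact requirement):

```python
def hnsw_build_memory_bytes(num_vectors, dims, m=16):
    """Rough HNSW build-memory estimate from the formula in the text:
    graph edges (m * 2 * 4 bytes per vector) plus the float4 vectors."""
    graph = m * 2 * 4 * num_vectors
    vectors = dims * 4 * num_vectors
    return graph + vectors

est = hnsw_build_memory_bytes(1_000_000, 1536)
print(f"{est / 2**30:.1f} GiB")  # 5.8 GiB
```

Note that the graph structure itself is the smaller component here; the vectors dominate at high dimensionality.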

If maintenance_work_mem is too small, the build does not fail outright. pgvector emits a notice that the graph no longer fits in memory and continues building against disk, at a dramatically slower pace. This is one of those situations where a single, easily missed notice is the only warning you receive.

IVFFlat Build Parameters

lists (number of partitions): The primary tuning knob for IVFFlat. This determines how many partitions the k-means clustering produces.

  • Rule of thumb for datasets under 1M vectors: lists = sqrt(num_vectors). For 1 million vectors, that is approximately 1,000 lists.
  • Rule of thumb for datasets over 1M vectors: lists = num_vectors / 1000.
  • Too few lists: Each partition contains many vectors, making per-list scans slow. Query latency degrades toward sequential scan.
  • Too many lists: Relevant vectors are split across many partitions, requiring more probes to achieve the same recall. The index becomes less efficient, not more.
-- For a table with ~1M vectors
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);
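
The two rules of thumb above can be captured in a small helper for your provisioning scripts (illustrative; the function name is mine, not a pgvector API):

```python
import math

def ivfflat_lists(num_vectors):
    """Rule of thumb from the text: sqrt(n) up to 1M vectors,
    n / 1000 beyond that."""
    if num_vectors <= 1_000_000:
        return max(1, round(math.sqrt(num_vectors)))
    return num_vectors // 1000

print(ivfflat_lists(1_000_000))   # 1000
print(ivfflat_lists(10_000_000))  # 10000
```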

Build requirement — worth repeating: The table must contain representative data before building an IVFFlat index. Building on a partially loaded table produces poor centroid placement. If you are bulk loading, insert all data first, then create the index.

Query-Time Parameters — Where You Decide What Matters Most

HNSW ef_search

hnsw.ef_search controls the size of the dynamic candidate list during graph traversal. A larger candidate list means more nodes are examined, which improves recall at the cost of increased latency.

ef_search    | Typical Recall (1M vectors, 1536d) | Approximate Latency
10           | 85–90%                             | <1ms
40 (default) | 93–96%                             | 2–5ms
100          | 97–99%                             | 5–10ms
200          | 99–99.5%                           | 10–20ms
400          | >99.5%                             | 20–40ms

These numbers vary with dataset characteristics, dimensionality, and hardware. They represent typical ranges, not guarantees.

-- Set per session
SET hnsw.ef_search = 100;

-- Or per transaction
BEGIN;
SET LOCAL hnsw.ef_search = 200;
SELECT * FROM items ORDER BY embedding <-> query_vector LIMIT 10;
COMMIT;

Tuning approach: Start at the default of 40. Measure recall against ground truth (exact nearest-neighbor results from a sequential scan). Increase ef_search until your recall target is met. The latency cost scales roughly linearly with ef_search — doubling ef_search approximately doubles query time.

For most production workloads, values between 40 and 200 provide a sensible balance. For applications where recall is critical — RAG retrieval, recommendation systems, anything where a missed result has a measurable cost — values of 200–400 may be necessary to achieve 99%+ recall. The right number is the one your requirements demand, not the one that feels tidy.
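
The tuning loop described above can be sketched generically. Here measure_recall is a stand-in for your own benchmark harness, which runs sample queries at a given ef_search and returns average recall (a sketch under that assumption, not a pgvector API):

```python
def tune_ef_search(measure_recall, target=0.99, start=40, limit=400):
    """Double ef_search until measured recall reaches the target.
    measure_recall(ef) must return average recall in [0, 1] at that setting."""
    ef = start
    while ef <= limit:
        if measure_recall(ef) >= target:
            return ef
        ef *= 2
    return limit  # target unreachable within the cap; revisit index parameters

# Fake benchmark numbers purely for illustration.
fake = {40: 0.95, 80: 0.982, 160: 0.991, 320: 0.997}
print(tune_ef_search(lambda ef: fake[ef], target=0.99))  # 160
```

Doubling is deliberately coarse; once you find the bracket, a finer sweep narrows in on the smallest value that meets the target.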

IVFFlat probes

ivfflat.probes controls how many lists (partitions) are searched at query time. The default of 1 is almost always too low for production use.

-- Default: 1 probe (low recall)
SET ivfflat.probes = 1;

-- Better starting point
SET ivfflat.probes = 10;

-- High recall
SET ivfflat.probes = 30;

Tuning approach: Start at sqrt(lists). For an index with 1,000 lists, that is approximately 32 probes. Measure recall against ground truth and adjust. Production values typically range from sqrt(lists) to lists / 4.

Setting probes equal to lists produces a full scan of all partitions — sequential scan performance with the overhead of the index structure. At that point, the index is not serving its purpose.

Measuring Recall

I would be remiss if I did not address how to measure what you are tuning for. Recall is the fraction of true nearest neighbors returned by the approximate search. If the exact top-10 nearest neighbors are {A, B, C, D, E, F, G, H, I, J} and the ANN search returns {A, B, C, D, E, F, G, H, K, L}, recall is 80% (8 out of 10 correct).

To measure recall, compare the results of an indexed search against an exact sequential scan on a representative sample:

-- Step 1: Get exact results (ground truth)
SET enable_indexscan = off;
SET enable_bitmapscan = off;

SELECT id
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Step 2: Get approximate results (indexed search)
RESET enable_indexscan;
RESET enable_bitmapscan;

SELECT id
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Step 3: Compare the two result sets

Automate this by running the comparison across 100–1,000 randomly sampled query vectors and averaging the recall. This gives you a statistically meaningful recall measurement for your specific dataset and index configuration.
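
In application code, the comparison and averaging might look like this (a pure-Python sketch; fetching the two id lists from PostgreSQL is left to your client library, and the function names are mine):

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of true top-K ids that the ANN search returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def average_recall(query_results):
    """query_results: iterable of (exact_ids, approx_ids) pairs,
    one pair per sampled query vector."""
    pairs = list(query_results)
    return sum(recall_at_k(e, a) for e, a in pairs) / len(pairs)

# The worked example from the text: 8 of 10 true neighbors found.
exact = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
approx = ["A", "B", "C", "D", "E", "F", "G", "H", "K", "L"]
print(recall_at_k(exact, approx))  # 0.8
```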

Target recall guidelines:

  • 90–95%: Acceptable for exploratory search, content discovery, "more like this" features
  • 95–99%: Appropriate for most production applications, including product search and recommendations
  • 99%+: Required for RAG retrieval, de-duplication, and search-critical applications where missing a relevant result has a measurable cost

Batch Insert Optimization

A word on inserts, if you'll permit me. Default single-row inserts into a table with an HNSW index are slow — slower than most teams expect. Each INSERT triggers an index update: the new vector must be inserted into the graph, which involves neighbor search and connection updates. At scale, this per-row overhead dominates insert time.

Strategy 1: Bulk Load Without Index

The fastest approach for initial data loads is to drop the index, load the data, and rebuild the index.

-- Drop the index
DROP INDEX items_embedding_idx;

-- Bulk load data (COPY is fastest)
COPY items (id, content, embedding) FROM '/path/to/data.csv' WITH (FORMAT csv);

-- Alternatively, batch INSERT
INSERT INTO items (id, content, embedding)
VALUES
  (1, 'text1', '[0.1, 0.2, ...]'),
  (2, 'text2', '[0.3, 0.4, ...]'),
  -- ... thousands of rows per statement
  ;

-- Set memory for index build
SET maintenance_work_mem = '4GB';

-- Rebuild the index
CREATE INDEX items_embedding_idx ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 200);

-- Reset
RESET maintenance_work_mem;

HNSW index build time on 1 million vectors with 1536 dimensions: approximately 10–30 minutes depending on hardware, m, and ef_construction settings. IVFFlat builds on the same data complete in 1–3 minutes.

Strategy 2: Batch Inserts with Index in Place

For ongoing data ingestion into a table with an existing index, insert in batches within a single transaction. Each row still triggers a graph update, but batching amortizes the per-transaction costs of commit processing and WAL flushing across thousands of rows rather than paying them once per row.

-- Insert in batches of 1,000-10,000 rows per transaction
BEGIN;
INSERT INTO items (id, content, embedding) VALUES
  (1001, 'text', '[...]'),
  (1002, 'text', '[...]'),
  -- ... up to 10,000 rows
  ;
COMMIT;

The optimal batch size depends on vector dimensions and available memory. For 1536-dimensional vectors, batches of 1,000–5,000 rows are a reasonable starting point.
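
On the application side, chunking rows before issuing the batched INSERTs is straightforward. A minimal helper (my naming; Python 3.12's itertools.batched does much the same), with each yielded batch intended for one transaction:

```python
from itertools import islice

def batched(rows, size=5000):
    """Yield lists of up to `size` rows, each meant to be inserted
    inside a single transaction."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

batches = list(batched(range(12_000), size=5_000))
print([len(b) for b in batches])  # [5000, 5000, 2000]
```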

Parallel Index Builds

pgvector 0.6.0 and later support parallel HNSW index builds. Enable parallelism by setting max_parallel_maintenance_workers:

-- Allow parallel workers for index build
SET max_parallel_maintenance_workers = 4;
SET maintenance_work_mem = '2GB';

CREATE INDEX items_embedding_idx ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 200);

Parallel builds can reduce HNSW index creation time by 2–4x depending on the number of workers and available CPU cores.

Filtering with WHERE Clauses

The Filtering Problem

This is where things become interesting. Most real-world vector search queries include a filter. You want the 10 most similar items, but only within a specific category, tenant, or time range.

-- Common pattern: vector search with a filter
SELECT id, content
FROM items
WHERE category = 'electronics'
ORDER BY embedding <-> query_vector
LIMIT 10;

Without a specific strategy, pgvector handles this by searching the full ANN index for the top candidates, then applying the WHERE filter to the results. If the filter is selective — say, "electronics" represents only 2% of the data — most of the ANN results will be discarded. You asked for 10 results, the index returned its top candidates, and the filter removed most of them. The effective recall drops, and you may get fewer than 10 results or results that are not truly the nearest neighbors within the filtered set.

This is the post-filtering problem, and it is worth understanding clearly — it affects all vector indexes that are not aware of metadata during traversal.

Partial Indexes

Create a separate HNSW index for each filter value:

-- One index per category
CREATE INDEX items_electronics_idx ON items
USING hnsw (embedding vector_l2_ops)
WHERE category = 'electronics';

CREATE INDEX items_clothing_idx ON items
USING hnsw (embedding vector_l2_ops)
WHERE category = 'clothing';

The query planner automatically selects the correct partial index when the WHERE clause matches.

Best for: Low-cardinality filters — tens of categories, not thousands. Each partial index consumes memory and must be maintained on inserts, so this approach does not scale to hundreds or thousands of distinct filter values.

Partitioning

Partition the table by the filter column. Each partition gets its own vector index, and the query planner directs the search to the correct partition.

-- Partition by tenant
CREATE TABLE items (
  id bigint,
  tenant_id int,
  content text,
  embedding vector(1536)
) PARTITION BY LIST (tenant_id);

CREATE TABLE items_tenant_1 PARTITION OF items FOR VALUES IN (1);
CREATE TABLE items_tenant_2 PARTITION OF items FOR VALUES IN (2);

-- Each partition gets its own HNSW index
CREATE INDEX ON items_tenant_1 USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON items_tenant_2 USING hnsw (embedding vector_l2_ops);

Best for: High-cardinality filters with natural partitioning — tenant isolation, date ranges, geographic regions. The query planner eliminates irrelevant partitions before the vector search begins, so the ANN search runs only within the filtered set.

Over-Fetching with Post-Filtering

The simplest approach: fetch more candidates than you need, then filter in a subquery.

-- Over-fetch and post-filter
SELECT * FROM (
  SELECT id, content, category, embedding <-> query_vector AS distance
  FROM items
  ORDER BY embedding <-> query_vector
  LIMIT 200
) candidates
WHERE category = 'electronics'
ORDER BY distance
LIMIT 10;

This works when the filter matches a substantial fraction of the data (>10%). If the filter is very selective, you need an impractically large LIMIT to reliably get enough matching results. I would characterize this as a stopgap — useful in the interim, but not a long-term arrangement for selective filters.
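
A back-of-envelope helper for sizing that inner LIMIT, assuming you know (or can estimate) the filter's selectivity. The safety factor here is my own assumption, padding against local clustering of non-matching rows:

```python
import math

def overfetch_limit(k, selectivity, safety=3):
    """Estimate the inner LIMIT needed so that roughly k matching rows
    survive post-filtering. selectivity = fraction of rows matching."""
    return math.ceil(k / selectivity) * safety

print(overfetch_limit(10, 0.10))  # 300
print(overfetch_limit(10, 0.02))  # 1500 -- impractical; prefer another strategy
```

When the computed LIMIT runs into the thousands, that is your cue to reach for partial indexes or partitioning instead.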

Hardware and Configuration Considerations

If you'll allow me a brief tour of the infrastructure requirements — vector indexes are memory-intensive and benefit from specific PostgreSQL configuration adjustments.

shared_buffers: Vector indexes perform best when cached in PostgreSQL's buffer pool. If the index does not fit in shared_buffers, queries will incur disk I/O on every graph traversal step, dramatically increasing latency.

An HNSW index on 1 million vectors with 1536 dimensions consumes approximately 6–8 GB. If your shared_buffers is set to the default 128 MB, none of the index will be cached.

-- Check index size
SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));

Set shared_buffers large enough to hold your vector index plus your regular working set. A common starting point is 25% of available RAM, but workloads with large vector indexes may need more.

effective_cache_size: This parameter tells the query planner how much total cache (shared_buffers + OS page cache) is available. Set it accurately so the planner can make informed decisions about whether to use the index or fall back to sequential scan.

work_mem: Affects sort operations in vector queries. If EXPLAIN shows external merge sorts, increase work_mem for the session.

SSD vs HDD: HNSW graph traversal involves random access patterns — following graph edges to non-adjacent memory locations. On spinning disks, each random read incurs a full seek time (5–10ms). On SSDs, random reads complete in microseconds. I should be direct: SSDs are not optional for production vector search. The performance difference is not marginal — it is categorical.

RAM sizing: For optimal performance, the entire vector index should fit in memory (shared_buffers + OS page cache). If the index exceeds available memory, consider:

  • Reducing m (fewer connections per node, smaller index)
  • Using IVFFlat instead of HNSW (more compact index structure)
  • Reducing vector dimensions (if your embedding model supports dimensionality reduction)
  • Partitioning so only active partitions need to be in memory

When pgvector Is Not Enough — An Honest Assessment

pgvector is an excellent choice for the majority of vector search workloads. But a guide that did not acknowledge its limitations would be doing you a disservice — and would be an embarrassment to me.

Billion-scale datasets: HNSW index build time and memory requirements become prohibitive at billions of vectors on a single PostgreSQL instance. Dedicated vector databases with horizontal sharding across multiple nodes handle this scale more naturally.

Sub-millisecond latency requirements at scale: For workloads that require sub-millisecond p99 latency over millions of vectors with concurrent queries, dedicated vector databases with GPU acceleration or optimized memory layouts may be necessary. pgvector achieves low single-digit milliseconds, but sub-millisecond at scale is not its design target.

Complex hybrid search: If your application needs dense vector similarity combined with sparse vector matching, BM25 keyword ranking, and metadata scoring in a single query, purpose-built systems like Weaviate or Vespa have more mature hybrid search pipelines. pgvector can be combined with PostgreSQL's tsvector for basic hybrid search, but the integration is manual.

Newer ANN algorithms: The vector search research community ships new algorithms regularly — DiskANN, ScaNN, graph+quantization hybrids. Dedicated vector databases adopt these faster than pgvector, which follows PostgreSQL's more deliberate release cadence.

pgvector covers the needs of roughly 80% of applications that require vector search. If you are in the other 20%, you likely already know it — because you have encountered a specific, measurable limitation, not a hypothetical one. For the broader comparison, see the vector database comparison. For the decision framework on whether you need a dedicated vector database at all, see Do You Need a Vector Database?

How Gold Lapel Attends to Vector Queries

Gold Lapel detects pgvector queries that are missing HNSW indexes and recommends index creation. It identifies expensive vector search patterns — queries that would benefit from ef_search tuning or partial indexes for filtered search — and surfaces them as actionable recommendations. It also monitors pgvector query performance over time, so you are informed when degradation occurs as your vector data grows, rather than discovering it from a user complaint.

Frequently asked questions