Do You Need a Vector Database? A Decision Framework
Five options. One decision framework. Before you add another database to your stack, allow me to help you determine whether you actually need one.
The question worth asking before adding infrastructure
Good evening. I see you have arrived with embeddings.
Retrieval-augmented generation, semantic search, recommendation engines, anomaly detection, image similarity — all of them rely on vector representations and nearest-neighbor search. The tooling ecosystem has responded accordingly: purpose-built vector databases have proliferated, each promising to solve the embedding storage and retrieval problem. They are, I should say, genuinely impressive software.
The default advice, repeated across blog posts, conference talks, and AI tutorials, is straightforward: "use a vector database." But the conversation tends to skip a prior question — do you need a separate one?
This article applies the same thesis that underlies You Don't Need Redis: explore what you already have before adding what you don't. Dedicated vector databases are excellent, well-engineered tools solving real problems at scale. The decision to adopt one should be informed by your actual workload, not by default assumptions about what "doing AI" requires.
The framework that follows examines five options — pgvector (PostgreSQL's vector extension), Pinecone, Weaviate, Qdrant, and Milvus — through the lens of five questions. By the end, you should know which questions to ask about your own workload, and which tool those answers point toward.
If you have already read PostgreSQL vs Redis for Caching, the reasoning pattern will be familiar: start with what you have, optimize first, scale later.
The five contenders
Before walking through the decision framework, allow me to introduce the principals.
pgvector is a PostgreSQL extension (available on GitHub) that adds vector data types and similarity search operators directly to PostgreSQL. Vectors are stored alongside relational data in ordinary tables, queried with SQL, and managed with standard PostgreSQL tooling. It supports HNSW and IVFFlat indexing algorithms. For existing PostgreSQL users, adoption requires installing an extension — not adding a database.
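To make the adoption path concrete, here is a minimal sketch. The table and column names (`products`, `product_embeddings`) and the 1536-dimension figure are illustrative assumptions, not prescriptions; match the dimensionality to your embedding model.

```sql
-- Install the extension (assumes pgvector is available on the server)
CREATE EXTENSION IF NOT EXISTS vector;

-- A hypothetical table: embeddings stored beside relational data
CREATE TABLE product_embeddings (
    product_id bigint PRIMARY KEY REFERENCES products (id),
    embedding  vector(1536) NOT NULL
);

-- HNSW index for cosine distance (the <=> operator);
-- m and ef_construction shown at their defaults, tune per workload
CREATE INDEX ON product_embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```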
Pinecone is a fully managed, serverless-first vector database. There is no self-hosted option. You send vectors through an API, Pinecone handles indexing, sharding, replication, and scaling. It is designed to minimize operational overhead.
Weaviate is an open-source vector database with a schema-driven design. It supports multiple data modalities, offers built-in hybrid search (combining vector and keyword search), and includes integrations with popular embedding providers. Available self-hosted or as a managed cloud offering.
Qdrant is an open-source, Rust-based vector database designed with a strong emphasis on filtering performance. It implements a custom HNSW variant optimized for combining vector search with metadata filters. Available self-hosted or through Qdrant Cloud.
Milvus is an open-source, distributed-first vector database designed for large-scale deployments. Its architecture separates storage and compute, allowing independent scaling. Zilliz Cloud offers a managed version.
What they all have in common
Despite significant architectural differences, all five share core capabilities:
- Approximate nearest neighbor (ANN) search. All five implement ANN algorithms that trade a small amount of recall accuracy for dramatically faster search times.
- Multiple distance metrics. Cosine similarity, Euclidean distance (L2), and inner product are supported across all options.
- Metadata filtering. All five allow attaching metadata to vectors and filtering results during search.
- API access. The dedicated vector databases expose REST and/or gRPC APIs with SDKs. pgvector uses SQL, which is both its interface and its advantage.
- Ecosystem integration. All five integrate with LangChain, LlamaIndex, and other orchestration frameworks.
The differences — and they are meaningful — lie in scale limits, operational models, query capabilities, and how well each tool integrates with the rest of your data infrastructure.
The decision framework
Rather than comparing feature lists and benchmarks (which rarely match your actual workload), I find it more productive to pose five questions. Your answers narrow the field.
Question 1 — How large is your vector dataset?
If you'll permit me to be direct: dataset size is the single strongest predictor of whether pgvector is sufficient or whether a dedicated vector database provides meaningful advantages.
Under 1 million vectors. pgvector handles this comfortably. With an HNSW index, query latencies sit consistently in the low single-digit milliseconds. At this scale, the overhead of operating a separate database is difficult to justify on performance grounds alone.
1 to 10 million vectors. pgvector with HNSW indexing continues to perform well. The decision at this tier usually comes down to operational preferences and retrieval patterns rather than raw performance.
10 to 100 million vectors. Dedicated vector databases begin showing clearer advantages. HNSW index build times in pgvector become substantial, and the memory required can dominate the PostgreSQL instance's resources.
100 million vectors and beyond. This is the territory where distributed vector databases — Milvus in particular, and Pinecone's managed infrastructure — are purpose-built. Attempting pgvector at this scale would require manual sharding strategies that negate its simplicity advantage entirely.
An important caveat: these thresholds are not fixed. pgvector's performance improves with each release, and hardware improvements shift the boundaries upward. Benchmark with your own data and queries. Thresholds from an article — even this one — are orientation, not gospel.
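When you run those benchmarks, note that pgvector exposes a query-time recall/speed knob for HNSW indexes. The sketch below uses a hypothetical `product_embeddings` table and a toy 3-dimensional query vector; real queries use your model's dimensionality, and the `ef_search` value is a starting point, not a recommendation.

```sql
-- hnsw.ef_search (pgvector 0.5+) trades recall for speed at query time;
-- higher values mean better recall and slower queries
SET hnsw.ef_search = 100;

-- Verify the HNSW index is actually used for your query shape
EXPLAIN ANALYZE
SELECT product_id
FROM product_embeddings
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
```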
Question 2 — Do your vectors live with relational data?
This question does not receive the attention it deserves, and it is often the deciding factor.
If your vectors are derived from or associated with relational data — product embeddings linked to a product catalog, document embeddings associated with access control metadata, user embeddings joined to account information — then pgvector offers something no dedicated vector database provides: the vector and the relational data live in the same database, queryable in the same SQL statement. For the detailed head-to-head, see pgvector vs Pinecone.
```sql
SELECT
    p.name,
    p.price,
    p.image_url,
    e.embedding <=> $1 AS distance
FROM product_embeddings e
JOIN products p ON p.id = e.product_id
WHERE p.category_id = $2
  AND p.in_stock = true
  AND p.price BETWEEN $3 AND $4
ORDER BY distance
LIMIT 20;
```

One query. Ordinary SQL with a distance operator. Nothing remarkable about it — which is precisely the point. Replicating this with a dedicated vector database requires synchronizing the product metadata to the vector database, or performing the vector search in one system and the relational filtering in another, then merging results in the application layer.
If your vectors are standalone — embeddings of external documents with no relational context in your database — this advantage does not apply, and a dedicated vector database introduces no synchronization overhead.
Question 3 — What are your latency requirements?
Latency is the most frequently cited reason for choosing a dedicated vector database over pgvector. It is also, I'm afraid, the most frequently overstated.
Low-millisecond latency (1–10ms). pgvector, Qdrant, Pinecone, and Weaviate all deliver consistently in this range for datasets up to several million vectors. At this performance tier, the choice depends on other factors, not latency.
Sub-millisecond latency. At very large scale or extremely high query throughput, purpose-built vector databases have structural advantages. For applications where p99 latency below 1ms is a genuine requirement, not an aspirational target, dedicated databases are the appropriate tool.
The more productive question is often: where is your actual latency bottleneck? In many RAG and semantic search applications, the embedding generation step takes 50–200ms. If your embedding step is 100ms and your vector search is 5ms, reducing the search to 1ms improves total request time by under 4%. Measure the full pipeline first, then optimize what matters.
Question 4 — How much operational complexity can you absorb?
Every database in your stack is not merely a deployment. It is a member of the household staff — requiring ongoing attention: monitoring, alerting, backups, upgrades, security patching, capacity planning, and incident response.
pgvector adds zero operational overhead if you already run PostgreSQL. The extension installs in seconds, and vector data is managed with the same tools you already use. For tuning guidance, see the pgvector performance tuning guide and pgvector query optimization guide.
Pinecone adds no operational overhead of its own: it is fully managed. The tradeoff is vendor dependency.
Weaviate, Qdrant, and Milvus (self-hosted) add the full operational burden of another database. Deployment, monitoring, backups, upgrades, capacity planning, and networking all fall on your team.
Operational cost is ongoing, not one-time. The deployment is day one. The configuration tuning, capacity adjustments, version upgrades, incident debugging, and knowledge maintenance are every day after that.
Question 5 — What is your retrieval pattern?
Simple similarity search — find the K nearest vectors. Every tool handles this efficiently. This pattern does not differentiate between options.
Filtered similarity search — find the K nearest vectors matching metadata conditions. This is arguably the most common production pattern. In pgvector, it is standard SQL WHERE clauses. Qdrant has invested heavily in making filtered search performant through its modified HNSW implementation.
Hybrid search — combining vector similarity with full-text keyword search. pgvector combined with PostgreSQL's built-in tsvector provides native hybrid search. Weaviate includes built-in hybrid search with configurable weighting.
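On the pgvector side, a hybrid query might be sketched as follows. This assumes a hypothetical `documents` table with a `tsvector` column `tsv` and an `embedding` column; the 0.3/0.7 weights are purely illustrative, and blending a rank (higher is better) with a distance (lower is better) requires the sign flip shown.

```sql
-- Hypothetical hybrid query: blend full-text rank with vector distance
SELECT d.id,
       0.3 * ts_rank(d.tsv, plainto_tsquery('english', $1))
     - 0.7 * (d.embedding <=> $2) AS score
FROM documents d
WHERE d.tsv @@ plainto_tsquery('english', $1)
ORDER BY score DESC
LIMIT 20;
```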
Multi-vector queries and cross-collection search — querying across multiple vector spaces or multi-modal search. Dedicated vector databases are purpose-built for these patterns.
Comparison table
Allow me to present the differences at a glance. This table is a reference, not a decision-maker — use the framework questions above to determine which rows matter most for your workload.
| Feature | pgvector | Pinecone | Weaviate | Qdrant | Milvus |
|---|---|---|---|---|---|
| Hosting model | Self-hosted (or managed PostgreSQL) | Fully managed only | Self-hosted or cloud | Self-hosted or cloud | Self-hosted or cloud (Zilliz) |
| Practical dataset size | Up to ~10M vectors per instance | Billions (managed scaling) | Hundreds of millions | Hundreds of millions | Billions (distributed) |
| Indexing algorithms | HNSW, IVFFlat | Proprietary (HNSW-based) | HNSW, flat | HNSW (custom filtered variant) | HNSW, IVF_FLAT, IVF_SQ8, DiskANN, and others |
| Filtering | SQL WHERE clauses | Metadata filters | GraphQL-style filters | Payload filters (optimized) | Boolean expressions |
| Hybrid search | Native (tsvector + pgvector) | Sparse-dense vectors | Built-in hybrid (BM25 + vector) | Full-text index + vector | Sparse + dense vectors |
| Relational JOINs | Native SQL JOINs | Not supported | Not supported | Not supported | Not supported |
| Transaction support | Full ACID transactions | Not applicable | Not supported | Not supported | Not supported |
| Operational complexity | None (if you run PostgreSQL) | None (fully managed) | Moderate / Low (cloud) | Moderate / Low (cloud) | High / Moderate (cloud) |
| Open source | Yes (PostgreSQL License) | No (proprietary) | Yes (BSD-3-Clause) | Yes (Apache 2.0) | Yes (Apache 2.0) |
A few notes: Practical dataset size reflects general-purpose guidance, not hard limits. Operational complexity is relative to an organization already running PostgreSQL.
The pgvector starting point
If you already run PostgreSQL, pgvector is where I would recommend you begin. This is not because it is the best vector database in every dimension — it is not, and a butler who overstated that case would be no butler at all. It is because starting with pgvector lets you defer a significant infrastructure decision until you have real-world data about your workload.
What pgvector does well:
- Datasets up to several million vectors, with query latencies in the low single-digit milliseconds using HNSW indexes.
- Filtered vector search expressed as ordinary SQL, taking advantage of PostgreSQL's query planner and existing indexes on metadata columns.
- Relational integration — vectors stored alongside the data they represent, queryable with JOINs, managed within transactions.
- Hybrid search combining semantic similarity (the `<=>` operator) with full-text search (`tsvector`, `ts_rank`) in a single query.
- Operational simplicity — no new database to deploy, monitor, back up, or secure.
For guidance on getting the most out of pgvector, see pgvector Query Optimization.
What pgvector does not do:
- Distributed vector search. pgvector runs on a single PostgreSQL instance. There is no built-in vector-aware sharding.
- Sub-millisecond latency at very large scale.
- Multi-modal indexing. pgvector does not include built-in model integrations for generating embeddings.
The "graduate when you need to" approach is not settling for a lesser tool. It is recognizing that most teams' vector workloads start small and grow. pgvector lets you build the feature, ship it, learn how your users interact with it, and observe real-world query patterns. If and when you reach pgvector's ceiling — and many teams never will — you migrate to a dedicated database with concrete knowledge of your requirements rather than assumptions.
The vectors themselves are portable. A 1536-dimensional embedding is the same array of 1,536 floats whether stored in pgvector, Pinecone, or Qdrant. The migration effort is in adapting the query layer, not converting the data.
When you genuinely need a dedicated vector database
I should be forthcoming about when the framework leads naturally to a dedicated vector database, because pretending these scenarios do not exist would be a disservice to you.
Your dataset exceeds what a single PostgreSQL instance handles efficiently. If you are indexing tens of millions of vectors with high dimensionality and your queries demand consistent low-latency responses, a database designed for this workload will perform better and be easier to operate.
Your retrieval patterns are vector-only with no relational context. If your vectors represent standalone documents with no meaningful relationship to data in your relational database, pgvector's primary advantage does not apply.
You need features specific to a dedicated database. Multi-modal search, advanced server-side reranking, built-in RAG pipeline components, or native model integrations are features that dedicated databases invest in.
You are building vector search as a platform capability. If vector search is a shared service consumed by many teams and applications, the operational and scaling requirements align with purpose-built databases.
Your query throughput exceeds what a single PostgreSQL instance can serve. The ability to independently scale the vector search tier becomes a meaningful advantage.
The mistakes worth avoiding
Adding a vector database "because AI." The presence of embeddings does not automatically require a dedicated vector database. If your dataset is 100,000 product embeddings joined to a product catalog, pgvector handles this comfortably. Evaluate based on your workload, not on assumptions about what AI workloads need.
Choosing based on benchmarks that do not match your workload. Published benchmarks typically measure throughput on uniform, synthetic datasets with no filtering. Production workloads involve filtered queries, concurrent writes, and metadata lookups. Run your own benchmarks. The detailed performance comparison for pgvector specifically is covered in pgvector vs Pinecone.
Underestimating the synchronization cost. If your vectors are derived from relational data, keeping a separate vector database in sync is a non-trivial engineering commitment. You need a synchronization pipeline that handles creates, updates, and deletes. You need to monitor for drift. This cost is ongoing.
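To give a sense of what that commitment looks like, one common pattern is an outbox table populated by a trigger, which a worker process then replays into the vector database. This is a hedged sketch, not a complete pipeline: the `products` table is hypothetical, and drift detection, retries, and the worker itself are left out.

```sql
-- An outbox capturing changes to replay into the external vector database
CREATE TABLE vector_sync_outbox (
    id         bigserial PRIMARY KEY,
    product_id bigint NOT NULL,
    op         text   NOT NULL CHECK (op IN ('upsert', 'delete')),
    queued_at  timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION enqueue_vector_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO vector_sync_outbox (product_id, op) VALUES (OLD.id, 'delete');
    ELSE
        INSERT INTO vector_sync_outbox (product_id, op) VALUES (NEW.id, 'upsert');
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_vector_sync
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION enqueue_vector_sync();
```

Even with the capture side handled, the worker that drains the outbox, the retry policy, and the drift monitoring are all code you now own.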
Over-indexing on search latency when your bottleneck is elsewhere. In a typical RAG pipeline, the embedding step takes 50–200ms. If you are spending 150ms on embedding generation and deliberating whether your vector search takes 3ms or 0.5ms, I would gently suggest the attention is on the wrong component.
Choosing distributed infrastructure for a workload that does not require it. Running self-hosted Milvus for a 500,000-vector dataset introduces distributed systems overhead that far exceeds any performance benefit. Match the tool's complexity to the problem's complexity.
A practical recommendation
Start with pgvector if you already run PostgreSQL. The migration path out is straightforward — vectors are portable arrays of floats — and you avoid the operational and synchronization costs of a separate system. See Why Postgres Won for the broader argument.
Choose Pinecone if you want fully managed and your vectors are decoupled from relational data. Pinecone eliminates operational overhead entirely, and its serverless pricing model can be cost-effective for bursty workloads.
Choose Qdrant or Weaviate (self-hosted) if you need open-source control at scale. Both are well-engineered and suitable for production at the tens-of-millions scale. Qdrant excels at filtered search. Weaviate excels at hybrid search and multi-modal use cases. For a detailed comparison of these three options, see pgvector vs Weaviate vs Qdrant.
Choose Milvus if you are operating at genuine large scale with distributed requirements. Hundreds of millions to billions of vectors, horizontal scaling, and distributed database infrastructure.
Revisit the decision when your workload changes. The right tool for your 500K-vector prototype may not be the right tool for your 50M-vector production system. Infrastructure decisions age. The good ones age gracefully; the rest benefit from periodic review.