pgvector vs Pinecone vs Chroma: Which Vector Store for Your LLM App?
Three vendors were evaluated. One works in the building we already lease.
Three Vector Stores Walk Into a LangChain Tutorial
Good evening. If you are building an LLM application — RAG, semantic search, document Q&A — you need somewhere to store your vectors. And the three names you will encounter before your first coffee break are pgvector, Pinecone, and Chroma.
Chroma appears in nearly every LangChain and LlamaIndex tutorial. It works in three lines of code, runs in-process, and requires no infrastructure. Pinecone is the managed SaaS option — sign up, get an API key, start upserting vectors. pgvector is the PostgreSQL extension — vectors stored alongside your relational data in a database you already operate. All three are well-engineered tools built by talented teams solving real problems.
Each solves vector storage differently, and “works in a tutorial” is a rather different conversation from “works in production.” If you will permit me, I would like to compare the three through the lens of LLM application development: architecture, embedding workflow, filtering, framework integration, scaling from prototype to production, and operational cost.
This comparison focuses on the three stores that LLM application developers actually encounter. For teams evaluating Weaviate, Qdrant, Milvus, and other self-hosted engines, see the vector database comparison. For a deep 1:1 pgvector vs Pinecone analysis with benchmarks and cost modeling, see the dedicated comparison. For the broader question of whether you need a separate vector database at all, see Do You Need a Vector Database?
Architecture — Where Your Vectors Actually Live
pgvector — Vectors Inside Your Existing Database
pgvector is a PostgreSQL extension. It adds a vector column type and distance operators to the database you already run. Vectors live as a native column alongside your relational data — same tables, same schema, same infrastructure.
What this means in practice:
- Same infrastructure: Backup, replication, monitoring, connection pooling, and upgrades cover your vector data automatically. No new operational surface.
- HNSW and IVFFlat indexes for approximate nearest-neighbor search. HNSW provides better recall; IVFFlat uses less memory.
- Transactional consistency: Vector writes and relational writes in the same transaction. Insert a document, store its embedding, and update a metadata table — all atomically.
- Full SQL: JOINs, subqueries, CTEs, window functions, and WHERE clauses all work with vector columns.
The trade-off is worth knowing: pgvector shares resources with your OLTP workload. Heavy vector search can compete with your application's transactional queries for CPU and memory. Scaling is vertical (larger instance) rather than horizontal.
The mental model: vector search as a PostgreSQL feature, not a separate system.
Pinecone — Managed Vector Infrastructure
Pinecone is a purpose-built managed vector database — genuinely well-engineered and designed from the ground up for vector workloads. Serverless and pod-based deployment options, no infrastructure to operate, and scaling handled by the service.
What this means in practice:
- No infrastructure management: Provisioning, scaling, replication, and failover are Pinecone's responsibility. That is the value proposition, and it is a real one.
- Separate system: Vectors exist outside your application database. Your application inserts vectors into Pinecone via API and queries them via API. Keeping Pinecone in sync with your source-of-truth database is your responsibility.
- Namespaces for logical isolation. Metadata filtering is built into the query engine.
- Mature managed service: Production-tested at scale, with SLAs, monitoring dashboards, and support.
The trade-off: another service to integrate, another bill to manage, and eventual consistency between your application database and your vector store. The synchronization pipeline — ensuring Pinecone has the same documents as your database — is the primary engineering investment.
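That sync pipeline can be sketched in a few lines. Everything here is illustrative: FakeIndex stands in for a real pinecone.Index, and fetching changed rows from your source-of-truth database is reduced to a plain list argument.

```python
# Illustrative sync loop: mirror rows changed since the last run into the
# vector store. FakeIndex is a stand-in; a real pipeline would call
# pinecone.Index(...).upsert(...) and .delete(...) instead.

class FakeIndex:
    def __init__(self):
        self.vectors = {}

    def upsert(self, vectors):
        # vectors: list of (id, embedding, metadata) tuples
        for vec_id, embedding, metadata in vectors:
            self.vectors[vec_id] = (embedding, metadata)

    def delete(self, ids):
        for vec_id in ids:
            self.vectors.pop(vec_id, None)


def sync(index, changed_docs, deleted_ids, embed):
    """Push source-of-truth changes into the vector index."""
    if changed_docs:
        index.upsert([
            (doc["id"], embed(doc["text"]), {"updated_at": doc["updated_at"]})
            for doc in changed_docs
        ])
    if deleted_ids:
        index.delete(deleted_ids)


# Usage with stub data and a dummy embedding function:
index = FakeIndex()
docs = [{"id": "doc1", "text": "hello", "updated_at": "2026-01-01"}]
sync(index, docs, deleted_ids=[], embed=lambda text: [0.0] * 8)
```

In production this loop is typically driven by an updated_at watermark, an outbox table, or change data capture; the deletion path matters as much as the upsert path, or Pinecone accumulates vectors for documents your database no longer has.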
The mental model: vector search as a managed API.
Chroma — The Notebook-Friendly Option
Chroma is an open-source embedding database — and an excellent one for getting started quickly. It runs in three modes:
- In-process — import chromadb; client = chromadb.Client(). Zero setup. Data lives in memory or local SQLite/DuckDB storage. This is the mode used in tutorials.
- Client-server — a Docker container with persistent storage for multi-process access. A real service, but one you operate yourself.
- Chroma Cloud — managed hosting. A newer offering that is still maturing relative to Pinecone's established service.
What this means in practice:
- Lowest friction for prototyping: Three lines of code to a working vector store, no account, no network, no configuration.
- Built-in embedding: Chroma can call embedding models at insert and query time. You pass text; it returns results. Convenient for getting started quickly.
- In-process limitations: Data durability depends on the storage backend (memory = ephemeral, SQLite = single-process). Not production infrastructure in this mode.
The trade-off: in-process mode is excellent for prototyping but is not production infrastructure. Client-server mode is a real service you deploy and operate yourself. Cloud is a newer managed offering that is growing into its potential.
The mental model: vector search as a Python library that can optionally become a service.
Embedding Workflow — Who Turns Text Into Vectors
pgvector — You Manage Embeddings Yourself
pgvector stores vectors but does not generate them. Your application calls an embedding API (OpenAI, Cohere, sentence-transformers) and inserts the resulting vector into PostgreSQL.
This gives you full control: choose any embedding model, swap models, version embeddings, batch efficiently, and manage costs. The responsibility that comes with that control is building and maintaining the embedding pipeline — batching, rate limiting, error handling, and re-embedding when models change.
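A minimal sketch of that batching-and-retry logic, with the embedding call left as a parameter so the pattern works with any provider (embed_batch here is a stand-in, not a real client API):

```python
import time

def embed_in_batches(texts, embed_batch, batch_size=64, max_retries=3):
    """Embed texts in fixed-size batches with simple retry and backoff.

    embed_batch is whatever client call you use (OpenAI, Cohere,
    sentence-transformers); keeping it a parameter keeps the batching
    logic model-agnostic.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff before retrying

    return vectors

# Usage with a stand-in embedding function returning 4-dim zero vectors:
vecs = embed_in_batches(
    ["a", "b", "c"],
    embed_batch=lambda batch: [[0.0] * 4 for _ in batch],
    batch_size=2,
)
```

The same function serves re-embedding after a model change: run it over all documents and swap the stored vectors in one pass.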
LangChain and LlamaIndex abstract the embedding call, but the API call still happens in your application code. pgvector is purely storage and search.
Pinecone — You Manage Embeddings, Pinecone Stores Them
The same model as pgvector: your application generates embeddings and upserts them to Pinecone via its API. Pinecone is storage and search.
Pinecone Inference is a newer feature that can generate embeddings through Pinecone's API, but it is limited to supported models. Most production setups generate embeddings application-side and use Pinecone purely for storage and retrieval.
Chroma — Built-In Embedding (Convenient, With Caveats)
Chroma can embed at insert time:
```python
collection.add(
    documents=["PostgreSQL supports JSONB for semi-structured data..."],
    ids=["doc1"]
)
```

No embedding vector needed — Chroma calls the configured embedding function automatically. The default is all-MiniLM-L6-v2 via sentence-transformers (384 dimensions). You can swap it for OpenAI, Cohere, or a custom function.
This is genuinely convenient for prototyping — it removes an entire step from your workflow. For production, however, the caveats are worth your attention:
- Model versioning: Changing the embedding function after data is stored produces incompatible vectors. Old embeddings and new embeddings occupy different vector spaces, making similarity meaningless. I mention this not to alarm, but because it is the kind of thing one discovers at the least convenient moment.
- Less control over batching and rate limiting compared to managing the embedding pipeline yourself.
- Implicit dependency: The embedding model is a configuration of the collection, not an explicit step in your code. This makes debugging retrieval quality harder.
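One defensive pattern (application-level bookkeeping, not a Chroma feature) is to record the embedding model alongside the collection and fail loudly on a mismatch. A sketch, with a plain dict standing in for the store:

```python
def make_collection(name, model_name, store):
    """Record which embedding model a collection was built with."""
    store[name] = {"model": model_name, "vectors": {}}
    return store[name]

def check_model(collection, model_name):
    """Fail loudly instead of silently mixing vector spaces."""
    stored = collection["model"]
    if stored != model_name:
        raise ValueError(
            f"Collection embedded with {stored!r}, queried with {model_name!r}: "
            "similarity scores across different models are meaningless."
        )

store = {}
col = make_collection("docs", "all-MiniLM-L6-v2", store)
check_model(col, "all-MiniLM-L6-v2")  # same model: passes silently

mismatch_caught = False
try:
    check_model(col, "text-embedding-3-small")  # different model: rejected
except ValueError:
    mismatch_caught = True
```

An exception at query time is far cheaper than quietly degraded retrieval that only shows up in answer quality.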
Filtering and Hybrid Search
pgvector — Full SQL at Your Disposal
This is where pgvector's heritage as a PostgreSQL extension pays dividends. pgvector filtering is PostgreSQL filtering — standard WHERE clauses combine with vector similarity in a single query:
```sql
SELECT content, 1 - (embedding <=> query_vec) AS similarity
FROM documents
WHERE metadata->>'category' = 'product-docs'
  AND created_at > '2026-01-01'
ORDER BY embedding <=> query_vec
LIMIT 5;
```

JOINs, subqueries, CTEs, and window functions — all available. GIN indexes on JSONB metadata columns provide efficient filtered vector search. There are no filter syntax limitations because the filter language is SQL.
For hybrid search combining full-text search and vector similarity, pgvector can be used alongside PostgreSQL's built-in tsvector full-text search. The two search methods run independently; score fusion happens in the application or via SQL.
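That application-side fusion can be as simple as reciprocal rank fusion (RRF) over the two ranked ID lists. A sketch, assuming each search returns document IDs best-first:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked ID lists; k=60 is the constant from the original RRF paper."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Each appearance contributes 1 / (k + rank); higher ranks count more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# doc2 ranks near the top of both lists, so it wins the fused ranking:
fused = reciprocal_rank_fusion([
    ["doc1", "doc2", "doc3"],   # tsvector full-text results
    ["doc2", "doc4", "doc1"],   # vector similarity results
])
```

RRF needs only ranks, not raw scores, which sidesteps the problem that ts_rank values and cosine similarities live on incomparable scales.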
I should be forthcoming about one nuance: pgvector's default behavior applies the WHERE filter after the approximate nearest-neighbor search (post-filtering). With highly selective filters, this can reduce recall — the ANN search may not return enough candidates that also pass the filter. For most workloads, this is not a practical concern, but it is worth knowing. See the pgvector query optimization guide for tuning strategies.
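One application-side mitigation is to over-fetch candidates and post-filter down to k; pgvector itself exposes a related knob, hnsw.ef_search, which enlarges the ANN candidate pool. A sketch with stub search and filter functions:

```python
def filtered_top_k(ann_search, passes_filter, k=5, overfetch=10):
    """Request k * overfetch candidates from ANN, then post-filter down to k.

    With a selective filter, fetching only k candidates may leave fewer
    than k survivors; over-fetching restores recall at modest extra cost.
    """
    candidates = ann_search(k * overfetch)
    return [doc for doc in candidates if passes_filter(doc)][:k]

# Stub ANN returns ids 0..n-1; the filter keeps only even ids
# (a stand-in for a filter that half the corpus fails).
results = filtered_top_k(
    ann_search=lambda n: list(range(n)),
    passes_filter=lambda doc: doc % 2 == 0,
    k=5,
)
```

The overfetch multiplier is a tuning choice: too low and selective filters starve the result set, too high and you pay for distance computations you discard.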
Pinecone — Metadata Filtering Built for Scale
Pinecone handles metadata filtering well. Filters apply to key-value pairs attached to each vector:
```python
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"category": "product-docs", "year": {"$gte": 2026}}
)
```

Supported operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin. Filtering is integrated into the search algorithm — Pinecone applies filters during the search, not after, which provides better recall under selective filters than post-filter approaches.
The boundary: there are no JOINs, no subqueries, no SQL. Metadata must be denormalized into the vector record at upsert time. If you need to filter by data that lives in your application database, you will need to synchronize it into Pinecone's metadata.
Pinecone also supports sparse-dense hybrid search via sparse vectors, enabling keyword + semantic combined search.
Chroma — Simple Metadata Filters
Chroma supports metadata filtering with a syntax similar to Pinecone's:
```python
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"category": "product-docs"},
    where_document={"$contains": "postgresql"}
)
```

Supported operators mirror Pinecone's: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin. The where_document filter adds keyword filtering on the stored document text.
As with Pinecone, there is no JOIN capability. Metadata must be stored alongside the embedding at insert time. The filter expressiveness sits between Pinecone's (which integrates filtering more deeply with the search algorithm) and pgvector's (which offers the full SQL language).
Framework Integration — LangChain, LlamaIndex, and Friends
Allow me to address this quickly: all three vector stores have integrations with the major LLM frameworks. This is largely a settled question — availability is not a differentiator.
| Framework | pgvector | Pinecone | Chroma |
|---|---|---|---|
| LangChain | PGVector (first-class) | Pinecone (first-class) | Chroma (first-class, tutorial default) |
| LlamaIndex | PGVectorStore (first-class) | PineconeVectorStore (first-class) | ChromaVectorStore (first-class) |
| Haystack | Official integration | Official integration | Community integration |
| Semantic Kernel | Official connector | Official connector | Community connector |
Chroma is the default in most LangChain tutorials because of its zero-setup in-process mode — it lets tutorial authors skip infrastructure setup and get to the interesting parts. This is a convenience choice, not a quality signal. A reasonable one, to be clear — tutorials have different priorities than production applications.
For direct API usage without a framework: pgvector uses SQL (via psycopg, SQLAlchemy, or any PostgreSQL driver), Pinecone has a Python SDK, and Chroma has a Python SDK. All three have straightforward direct APIs.
Scaling — From Prototype to Production
The Prototype Phase (0–10K Vectors)
All three work well at small scale. This is not where the decision matters.
- Chroma has the lowest friction: in-process mode, no infrastructure, no account signup, three lines of code.
- pgvector requires a PostgreSQL instance — but if you are building an application, you very likely have one already.
- Pinecone requires account signup and an API key. The free tier handles prototype-scale data comfortably.
My recommendation: prototype with whatever gets you to a working demo fastest. The prototype vector store is not necessarily the production vector store.
The Production Phase (10K–1M Vectors)
This is where the decision earns its keep.
- pgvector: Well within comfortable range on a standard PostgreSQL instance with an HNSW index. Single-digit millisecond queries. The shared-infrastructure advantage is genuine — no new service to deploy or monitor.
- Pinecone: Serverless tier handles this effortlessly. Fully managed operations. Cost is manageable at this scale (typically under $100/month for moderate query volume).
- Chroma: Client-server mode is now required — in-process mode is not production infrastructure. You are operating a separate stateful service. Chroma Cloud is an option, and a maturing one.
The operational cost question arrives at this phase: do you want to operate Chroma as a separate service, pay for Pinecone's managed offering, or use the database you already run?
The Scale Phase (1M–10M+ Vectors)
- pgvector: Performance is strong at 1–5 million vectors. Beyond 5–10 million, purpose-built engines have real advantages in index build time, memory efficiency, and horizontal scaling. pgvector's ceiling is the single PostgreSQL instance, and I would be doing you a disservice to pretend otherwise.
- Pinecone: Designed for this scale. Serverless pricing grows linearly. No operational scaling work on your end.
- Chroma: The horizontal scaling story is still developing. Self-hosted Chroma at this scale requires meaningful operational investment.
The honest assessment: if you know you will have tens of millions of vectors, evaluate Pinecone's managed model or a self-hosted engine like Qdrant from the start. If your dataset is in the millions, pgvector handles it well and avoids the operational complexity of a separate system.
Operational Cost — The Bill Nobody Reads in Tutorials, Until Production
Infrastructure Cost
- pgvector: Incremental. Vectors consume storage and RAM on your existing PostgreSQL instance. No new service bill. For 500K vectors at 1536 dimensions: approximately 3 GB of vector data, 6–12 GB of index storage.
- Pinecone: Serverless pricing based on read units, write units, and storage. For 500K vectors at 1536 dimensions with 50 queries/second: approximately $100–250/month. Pod-based pricing is higher but provides dedicated resources.
- Chroma (self-hosted): Free software, but you pay for the compute and storage to run it. A Docker container on a $50–100/month VM handles this scale.
- Chroma Cloud: Consumption-based pricing. Newer offering with less established pricing predictability than Pinecone.
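The pgvector storage figures are easy to sanity-check: a float32 vector costs 4 bytes per dimension, and an HNSW index on top typically adds a low multiple of the raw data (the 2-4x factor below is a planning heuristic, not a measured benchmark):

```python
vectors = 500_000
dims = 1536
bytes_per_float = 4  # float32 components

# Raw vector data: ~2.9 GB, matching the "approximately 3 GB" figure above.
raw_gb = vectors * dims * bytes_per_float / 1024**3

# HNSW index overhead varies with m and ef_construction; a 2-4x multiple
# of the raw data is a rough planning range, giving roughly 6-12 GB here.
index_gb_low, index_gb_high = 2 * raw_gb, 4 * raw_gb
```

The same arithmetic scales linearly: halving the embedding dimensions (e.g. a 768-dim model) halves both the data and the index footprint.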
Engineering Cost
- pgvector: Near-zero if you already know SQL and run PostgreSQL. The embedding pipeline is the main development investment — and you build that regardless of which vector store you choose.
- Pinecone: SDK integration is straightforward. The engineering investment is the sync pipeline between your application database and Pinecone — keeping documents, metadata, and vectors consistent across two systems.
- Chroma: Lowest initial investment (in-process for development). The cost that arrives later: many teams prototype with in-process Chroma and later face a migration to client-server mode or a different store entirely when production durability becomes the priority.
Operational Burden
- pgvector: Zero incremental operations. Your existing PostgreSQL practices — backups, monitoring, upgrades, connection pooling — cover vectors automatically.
- Pinecone: Zero operations by design. That is the value proposition, and it is a genuine one. Monitoring, scaling, and availability are Pinecone's responsibility.
- Chroma (self-hosted): A separate stateful service to deploy, monitor, back up, and upgrade. The operational tooling and community knowledge base are still growing relative to PostgreSQL's decades of production experience.
- Chroma Cloud: Reduces the operational burden and is developing its operational guarantees.
The Decision Framework
If I may, three profiles and three recommendations:
“I have PostgreSQL and want to keep things simple”
Use pgvector. Your vectors live with your data. No new infrastructure. Full SQL filtering. JOINs between vectors and relational data. Scales to millions of vectors on a single instance. The operational cost is zero incremental because you already operate PostgreSQL.
For a complete implementation guide, see Building a RAG Pipeline with pgvector and Python.
“I want fully managed and don't mind a separate service”
Use Pinecone. Zero operations. Transparent scaling. A mature managed service with SLAs. This is a perfectly reasonable choice if operational simplicity is the priority and your team would otherwise spend engineering time on database operations. The trade-off to accept: maintaining a sync pipeline between your application database and Pinecone.
“I'm prototyping and need zero friction right now”
Use Chroma. In-process, no setup, instant. The fastest path to a working demo, and there is real value in that speed. Do plan the migration to pgvector or Pinecone before production — in-process Chroma is prototyping infrastructure, not production infrastructure.
The Honest Take
Chroma is excellent for prototyping and local development. The developer experience for getting started is genuinely unmatched. The production story is improving — client-server mode and Chroma Cloud are real options — but neither yet has the operational maturity of PostgreSQL (decades of production tooling) or Pinecone (purpose-built managed service).
For teams already running PostgreSQL, pgvector is my default recommendation. You avoid a separate service, keep your data in one place, and gain full SQL filtering. You can always migrate to Pinecone later if you genuinely outgrow pgvector — and I would rather you start with the simpler architecture and migrate when the need is real than add complexity on speculation. Starting with Pinecone or Chroma and migrating to pgvector later is harder than starting with pgvector and migrating out — the SQL integration patterns are easier to replace than to adopt retroactively.
What Gold Lapel Adds
Gold Lapel's proxy sees pgvector similarity queries alongside your OLTP workload. Applications running LLM pipelines also run traditional queries — user authentication, session management, application state. Gold Lapel attends to both workloads simultaneously.
The comparison above stands on its own — you should leave this article having learned something useful regardless of any purchasing decision. Gold Lapel is one piece of the operational simplicity argument for staying on PostgreSQL: the more you can do in one database with one set of tools, the less infrastructure surface you maintain.