
PostgreSQL vs Elasticsearch

Do You Need a Search Engine?

We searched for a suitable illustration. Both candidates found one in under 5ms. One of them, however, required three additional staff members and a separate filing cabinet to do so.

The Search Question

Ah. The search question. Every application with a search bar eventually arrives at this door: do we need Elasticsearch?

The question usually arrives at a specific moment. Search works — users can find things — but someone wants fuzzy matching, autocomplete, faceted navigation, or more control over relevance ranking. The team evaluates Elasticsearch, sees its capabilities, and faces the decision: adopt a second system or push PostgreSQL further.

The honest answer depends on two factors: how complex your search needs are, and at what scale. PostgreSQL's native search capabilities — tsvector for full-text search, pg_trgm for fuzzy matching, GIN indexes for performance — cover more ground than most teams realize. Elasticsearch is genuinely excellent software that earns its place in specific scenarios. This guide is intended to help you determine which scenario you are in — without adding infrastructure you do not need, or underinvesting in search when your users deserve better.

What PostgreSQL Offers for Search — More Than You Might Expect

Full-Text Search with tsvector and tsquery

PostgreSQL has had full-text search since version 8.3 (released in 2008). It is not an afterthought or a bolted-on feature — it is a mature, well-optimized subsystem built into the core database. This is worth knowing before evaluating alternatives.

tsvector is the document representation: a sorted list of normalized lexemes (word stems) with optional positional information. Normalizing words to their stems means that "running" and "runs" both map to "run." (Irregular forms such as "ran" are not caught — stemming strips suffixes; it does not know English morphology.)
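The normalization is easy to inspect directly (lexeme output shown for the built-in english configuration; stop words like "the" are dropped entirely):

```
-- Inspect how text is normalized into lexemes
SELECT to_tsvector('english', 'Running runs quickly');
-- → 'quick':3 'run':1,2
```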

tsquery is the query language: boolean operators (AND, OR, NOT), phrase matching, and prefix matching.

-- Add a generated tsvector column to an existing table
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

-- Create a GIN index for fast search
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

-- Search with ranking
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

This produces ranked, language-aware full-text search with single-digit-millisecond response times on millions of rows when backed by a GIN index.

Key capabilities:

  • Language-aware stemming: Built-in support for English, French, German, Spanish, Portuguese, Italian, Russian, and many more. "Running" matches "run" automatically.
  • Boolean queries: to_tsquery('english', 'database & (performance | tuning)') — AND, OR, NOT, grouping.
  • Phrase matching: phraseto_tsquery('english', 'connection pooling') — matches words in sequence.
  • Prefix matching: to_tsquery('english', 'optim:*') — matches "optimize," "optimization," "optimizer."
  • Relevance ranking: ts_rank scores results by lexeme frequency. ts_rank_cd uses cover density, which penalizes matches spread across the document.

Performance: A GIN-indexed tsvector search on 1 million rows typically returns results in 1–5ms. On 5 million rows, expect 5–15ms. These numbers are competitive with Elasticsearch at the same scale.

For framework-specific implementation details, see the Django full-text search guide and the Laravel Scout + tsvector guide.

Fuzzy Matching with pg_trgm

The pg_trgm extension provides trigram-based similarity matching. It decomposes strings into three-character subsequences and compares the overlap between them. This enables "did you mean?" suggestions, typo-tolerant search, and fast LIKE/ILIKE queries.

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Create a GIN trigram index
CREATE INDEX products_name_trgm_idx ON products USING GIN (name gin_trgm_ops);

-- Fuzzy product name search with similarity ranking
SELECT name, similarity(name, 'postgressql') AS sim
FROM products
WHERE name % 'postgressql'  -- % operator uses similarity threshold
ORDER BY sim DESC
LIMIT 10;

The % operator returns rows with a similarity score above the configurable threshold (default 0.3). The similarity() function returns a score between 0 and 1.
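To make the similarity score concrete, here is a simplified Python sketch of the trigram computation. (pg_trgm also strips non-alphanumeric characters and processes multi-word strings word by word; treat this as an approximation of the algorithm, not the exact implementation.)

```python
def trigrams(s: str) -> set[str]:
    # pg_trgm lowercases input and pads each word with two leading
    # spaces and one trailing space before extracting trigrams
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a: str, b: str) -> float:
    # similarity = shared trigrams / total distinct trigrams
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(round(similarity("postgresql", "postgressql"), 2))  # → 0.77
```

A doubled letter costs only a couple of trigrams out of a dozen, which is why the misspelling still scores well above the 0.3 default threshold.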

LIKE and ILIKE acceleration: pg_trgm's GIN index also accelerates LIKE and ILIKE queries that would otherwise require a sequential scan:

-- Without pg_trgm GIN index: sequential scan
-- With pg_trgm GIN index: index scan
SELECT * FROM products WHERE name ILIKE '%widget%';

Performance: GIN trigram index queries on 1 million rows typically return in 5–20ms for LIKE queries and 10–30ms for similarity ranking. This is sufficient for autocomplete and fuzzy search in most applications.

Combining tsvector + pg_trgm

The most effective PostgreSQL search strategy combines both tools: tsvector for structured full-text search (exact terms, boolean logic, phrase matching, ranked results) and pg_trgm for fuzzy/typo-tolerant matching (misspelled queries, partial matches, autocomplete).

A common pattern is to search with tsvector first (fast, precise) and fall back to pg_trgm if no results are found (handles typos and variations):

-- Combined search: tsvector primary, pg_trgm fallback
WITH fts_results AS (
  SELECT id, title, ts_rank(search_vector, query) AS rank, 'fts' AS source
  FROM articles, to_tsquery('english', 'postgresq:*') AS query
  WHERE search_vector @@ query
  ORDER BY rank DESC
  LIMIT 20
),
trgm_results AS (
  SELECT id, title, similarity(title, 'postgresq') AS rank, 'trgm' AS source
  FROM articles
  WHERE title % 'postgresq'
    AND id NOT IN (SELECT id FROM fts_results)
  ORDER BY rank DESC
  LIMIT 20
)
SELECT * FROM fts_results
UNION ALL
SELECT * FROM trgm_results
LIMIT 20;

This combination covers an estimated 80–90% of application search requirements. Users get precise results for well-formed queries and reasonable results for misspelled or partial queries.

Generated Columns and Materialized Views for Search

Generated columns automate tsvector maintenance — the search vector updates automatically when the source columns change:

-- Generated tsvector column with weighted fields
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(tags, '')), 'D')
  ) STORED;

The setweight() function assigns weight classes A through D to different fields. ts_rank uses these weights when scoring: a match in the title (weight A) scores higher than a match in the body (weight C). This is PostgreSQL's equivalent of field-level boosting.
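The weights themselves are adjustable: ts_rank and ts_rank_cd accept an optional four-element array giving the multipliers for classes {D, C, B, A} (the default is {0.1, 0.2, 0.4, 1.0}):

```
-- Double the influence of title (weight A) matches
SELECT title,
       ts_rank('{0.1, 0.2, 0.4, 2.0}', search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
```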

Materialized views combine multiple tables into a single searchable surface:

-- Materialized view for multi-table search
CREATE MATERIALIZED VIEW search_index AS
SELECT
  'product' AS type,
  p.id,
  p.name AS title,
  p.description,
  setweight(to_tsvector('english', p.name), 'A') ||
  setweight(to_tsvector('english', coalesce(p.description, '')), 'B') ||
  setweight(to_tsvector('english', coalesce(c.name, '')), 'C') AS search_vector
FROM products p
LEFT JOIN categories c ON p.category_id = c.id

UNION ALL

SELECT
  'article' AS type,
  a.id,
  a.title,
  a.excerpt AS description,
  setweight(to_tsvector('english', a.title), 'A') ||
  setweight(to_tsvector('english', coalesce(a.body, '')), 'B') AS search_vector
FROM articles a;

CREATE INDEX search_index_vector_idx ON search_index USING GIN (search_vector);

Materialized views require periodic refresh (REFRESH MATERIALIZED VIEW CONCURRENTLY search_index), which introduces a small staleness window. For most applications, refreshing every few minutes or after write operations is sufficient. For more on materialized view patterns, see the materialized views guide.
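Note that REFRESH ... CONCURRENTLY requires a unique index on the materialized view (built on column names only, covering all rows) so PostgreSQL can diff old and new contents. Here the (type, id) pair serves:

```
-- Required before the first concurrent refresh
CREATE UNIQUE INDEX search_index_type_id_idx ON search_index (type, id);

REFRESH MATERIALIZED VIEW CONCURRENTLY search_index;
```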

What Elasticsearch Offers — and It Deserves Genuine Respect

Elasticsearch is excellent software — well-engineered and purpose-built for the workloads it serves. Acknowledging its strengths is not generosity; it is accuracy. These are the areas where Elasticsearch provides capabilities that PostgreSQL cannot match or can only approximate with significant effort:

Complex text analysis pipelines. Elasticsearch's analyzer framework is composable and deep. You can chain character filters, tokenizers, and token filters to build custom analysis pipelines: synonym expansion, phonetic matching (Soundex, Metaphone), language detection, HTML stripping, edge-ngram tokenization for search-as-you-type, compound word decomposition for German, and decompounding for Nordic languages. PostgreSQL has text search configurations and dictionaries, but the composability and breadth of Elasticsearch's analyzer ecosystem is substantially greater.

Nested multi-level aggregations. This is Elasticsearch's strongest analytical feature. A single query can compute:

Elasticsearch nested aggregations
{
  "aggs": {
    "categories": {
      "terms": { "field": "category" },
      "aggs": {
        "brands": {
          "terms": { "field": "brand" },
          "aggs": {
            "price_ranges": {
              "range": {
                "field": "price",
                "ranges": [
                  { "to": 50 },
                  { "from": 50, "to": 200 },
                  { "from": 200 }
                ]
              }
            }
          }
        }
      }
    }
  }
}

This returns a hierarchy: categories with counts, brands within each category with counts, and price ranges within each brand with counts — in a single query. Achieving the same in PostgreSQL requires multiple CTEs or separate queries, and the performance degrades as the number of aggregation levels increases.

Search-as-you-type autocomplete. Elasticsearch's completion suggester and edge-ngram analyzer are purpose-built for autocomplete UX. The completion suggester uses a finite-state transducer (FST) for in-memory prefix matching, returning suggestions in microseconds.
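A minimal sketch of a completion-suggester query with typo tolerance (the field name is illustrative and must be mapped with type "completion"):

```
{
  "suggest": {
    "product_suggest": {
      "prefix": "blu",
      "completion": {
        "field": "name_suggest",
        "fuzzy": { "fuzziness": 1 }
      }
    }
  }
}
```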

Log and event analytics. The ELK stack (Elasticsearch, Logstash, Kibana) exists for a reason. Elasticsearch handles append-only, time-series data at massive scale. PostgreSQL can store and query logs, but it is not designed for the ingestion throughput and analytical query patterns of centralized logging.

Distributed full-text search at scale. Beyond 10 million documents, Elasticsearch's sharding provides consistent, low-latency search across distributed nodes.

Highlighting and snippets. Elasticsearch extracts and highlights matching fragments from documents for search result display. PostgreSQL has ts_headline, which provides similar functionality, but Elasticsearch offers multiple highlighting strategies and better handling of large documents.

Honest Counterpoint — Where Elasticsearch Asks More of You

Not a database. Elasticsearch has no transactions, no referential integrity, no foreign keys, and no JOINs. You must denormalize your data model — every document must contain all the fields you want to search and display.

Data synchronization. Your source of truth remains PostgreSQL. You must build and maintain a pipeline to keep Elasticsearch in sync: CDC (Change Data Capture) with Debezium, application-level dual writes, or batch ETL. Each approach has failure modes.

Operational complexity. Elasticsearch is a JVM application. Production operation requires: JVM heap sizing, garbage collection tuning, shard sizing and rebalancing, rolling upgrade procedures, split-brain prevention, and index lifecycle management.

Cost. Elasticsearch is memory-intensive. Running Elasticsearch alongside PostgreSQL approximately doubles your infrastructure cost and more than doubles your operational cost. These are not reasons to avoid Elasticsearch — they are reasons to be certain you need it before committing.

Feature Comparison

Feature | PostgreSQL | Elasticsearch
Full-text search | tsvector + GIN index | Inverted index (Lucene)
Stemming/language support | Built-in for 15+ languages | Extensive analyzer ecosystem (30+ languages, custom analyzers)
Fuzzy matching | pg_trgm (trigram similarity) | Native fuzzy queries, phonetic plugins, edit distance
Autocomplete | pg_trgm + LIMIT + ORDER BY similarity | Completion suggester (FST), edge-ngrams
Faceted search / aggregations | GROUP BY + COUNT (single level efficient) | Nested multi-level aggregations (purpose-built)
Relevance tuning | ts_rank, ts_rank_cd, setweight (A/B/C/D) | function_score, boosting, custom scoring scripts
Synonyms | Synonym dictionaries, ts_rewrite | Synonym token filter (inline or file-based)
Highlighting | ts_headline | Multiple highlighter strategies (unified, plain, FVH)
Horizontal scaling | Read replicas, table partitioning | Native sharding + replication
ACID transactions | Yes | No
JOINs | Yes | No (denormalize everything)
Real-time indexing | Immediate (same transaction) | ~1 second refresh interval
Data consistency | Source of truth | Requires sync pipeline from PostgreSQL

Performance Comparison

Full-Text Search Latency

Performance at moderate scale (under 5 million documents) is closer than most teams expect.

Scale | PostgreSQL (tsvector + GIN) | Elasticsearch
100K documents | <1ms | <1ms
1M documents | 1–5ms | 1–5ms
5M documents | 5–15ms | 3–10ms
10M documents | 10–30ms (single node) | 5–15ms (sharded)
50M+ documents | Degrades on single node | Scales horizontally

At 1 million documents, both systems return results in single-digit milliseconds with proper indexing. The difference between 3ms and 5ms is invisible to users. At 10 million documents, Elasticsearch's sharding provides more consistent latency because the workload is distributed across nodes.

Aggregation Performance

Simple aggregations (COUNT by category, top 10 brands): PostgreSQL is competitive with proper indexes.

Multi-level nested aggregations (category > subcategory > brand > price range): Elasticsearch is purpose-built for this and significantly faster. A 4-level faceted navigation query completes in 10–50ms in Elasticsearch. The equivalent in PostgreSQL requires multiple CTEs and may take 100–500ms.
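PostgreSQL can approximate multi-level facet counts in a single scan with ROLLUP, though it returns a flat result set rather than a ranked hierarchy. A sketch, assuming a products table with category, brand, and price columns:

```
-- Counts at every level — (category), (category, brand), and
-- (category, brand, price band) — in one pass; rolled-up levels
-- appear as rows with NULL in the finer-grained columns
SELECT category, brand,
       width_bucket(price, ARRAY[50, 200]) AS price_band,
       count(*) AS hits
FROM products
GROUP BY ROLLUP (category, brand, width_bucket(price, ARRAY[50, 200]));
```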

If your search UI requires multi-level faceted navigation with counts at each level, and this is a core feature rather than an occasional reporting need, Elasticsearch earns its operational cost.

The Operational Cost Comparison — A Matter Worth Discussing Frankly

One System vs Two Systems

With PostgreSQL search: Your application talks to one database. Your backup covers everything. Your monitoring watches one service. Your on-call team understands one system. Your search results are transactionally consistent with your application data — if a product is deleted, it is immediately gone from search results.

With Elasticsearch: You add a stateful distributed system. This means a separate deployment pipeline, monitoring stack, backup procedure, data synchronization pipeline, and on-call competency.

The Sync Pipeline

If you'll permit me to dwell on this topic — the data synchronization pipeline deserves careful attention because it is the primary source of bugs and operational incidents in PostgreSQL + Elasticsearch architectures.

CDC with Debezium: Streams PostgreSQL WAL changes to Elasticsearch in near real-time. The recommended approach — it captures all changes without application code modifications. But Debezium itself is a distributed system (Kafka Connect cluster) that requires deployment, monitoring, and operational knowledge.

Application-level dual writes: Simpler to implement, but introduces consistency risks: if one write succeeds and the other fails, the systems diverge.

Batch ETL: A scheduled job reads changes from PostgreSQL and writes them to Elasticsearch. Simplest to build, but introduces staleness.
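The watermark pattern behind batch ETL can be sketched in a few lines (function names are hypothetical; a real job would read from PostgreSQL and bulk-index into Elasticsearch):

```python
def sync_batch(fetch_changed_since, index_docs, watermark):
    """One incremental sync pass.

    fetch_changed_since(ts) returns rows (dicts with an 'updated_at' key)
    modified after ts; index_docs(rows) pushes them to the search index.
    Returns the new watermark. Caveat: this pattern misses hard deletes,
    which need soft-delete flags or a tombstone table.
    """
    rows = fetch_changed_since(watermark)
    if not rows:
        return watermark
    index_docs(rows)
    return max(row["updated_at"] for row in rows)

# Usage with in-memory stand-ins
source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 25}]
indexed = []
new_mark = sync_batch(
    lambda ts: [r for r in source if r["updated_at"] > ts],
    indexed.extend,
    watermark=15,
)
print(new_mark)  # → 25  (only id=2 was re-indexed)
```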

Each approach has failure modes. The sync pipeline must be monitored for lag, errors, and consistency drift.

Team Expertise

PostgreSQL search: SQL you already know. GIN indexes you have probably already created. The learning curve is a few new functions (to_tsvector, to_tsquery, ts_rank) and one new index type.

Elasticsearch: The Query DSL is a JSON-based query language with its own concepts. Cluster management requires understanding shards, replicas, node roles, and heap sizing. A team proficient in PostgreSQL will take weeks to become operationally competent with Elasticsearch.

Decision Framework — When to Stay with PostgreSQL, When to Add Elasticsearch

PostgreSQL Search Is Enough When

  • Your search corpus is under 5 million documents. PostgreSQL tsvector + GIN delivers single-digit millisecond search at this scale.
  • You need full-text search with stemming and relevance ranking. tsvector provides this natively with language-aware stemming for 15+ languages.
  • You need fuzzy/typo-tolerant matching. pg_trgm handles misspellings, partial matches, and LIKE queries with index support.
  • Your faceted search has 1–2 levels of aggregation. Simple GROUP BY + COUNT queries with proper indexes are fast.
  • Search is one feature among many, not the core product. If search is part of a broader application, PostgreSQL search is almost certainly sufficient.
  • You value transactional consistency. Search results are always consistent with application data — no sync lag, no stale results.

Consider Elasticsearch When

  • Your search corpus exceeds 10 million documents and is growing. Elasticsearch's sharding provides more consistent latency and horizontal scalability.
  • You need complex text analysis pipelines. Custom analyzers, synonym expansion, phonetic matching, language detection.
  • Your UI requires multi-level nested aggregations. 3+ levels of faceted navigation with counts at each level.
  • Search is the core user experience. A job board, a real estate listing site, a legal document search tool.
  • You are building log analytics or event monitoring. The ELK stack is the standard solution for centralized logging.
  • You need sophisticated search-as-you-type autocomplete. Elasticsearch's completion suggesters and edge-ngram analyzers provide richer autocomplete than pg_trgm.

The Honest Middle Ground

Many teams add Elasticsearch before PostgreSQL search has been properly explored. A tsvector column with a GIN index, combined with pg_trgm for fuzzy matching and setweight for field-level relevance, covers 80–90% of application search requirements. Most teams that believe they need Elasticsearch have not yet tried PostgreSQL search with proper indexing and ranking.

My recommendation: try PostgreSQL search first. Implement tsvector with weighted fields, add pg_trgm for fuzzy matching, create proper GIN indexes, and measure the results against your actual requirements. If you hit a specific limitation — not a hypothetical one — then Elasticsearch has earned its place.

Migrating from PostgreSQL search to Elasticsearch later is straightforward. Your data is already structured, your search requirements are well understood, and your search queries map to Elasticsearch's Query DSL. Starting with PostgreSQL search does not lock you in — it gives you a working baseline to evaluate against.

Implementation Patterns

PostgreSQL Search Done Right

Pattern 1: Weighted multi-field search with tsvector + GIN

-- Table setup with generated search vector
ALTER TABLE products ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(brand, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'D')
  ) STORED;

CREATE INDEX products_search_idx ON products USING GIN (search_vector);

-- Ranked search query
SELECT name, brand, category,
       ts_rank_cd(search_vector, query) AS rank
FROM products, websearch_to_tsquery('english', 'wireless bluetooth headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

websearch_to_tsquery accepts natural-language search input (including quoted phrases and minus signs for exclusion), making it suitable for user-facing search bars without requiring users to learn tsquery syntax.
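For example, quoted phrases become phrase operators and a leading minus becomes negation:

```
SELECT websearch_to_tsquery('english', '"connection pooling" -mysql');
-- roughly: 'connect' <-> 'pool' & !'mysql'
```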

Pattern 2: Fuzzy autocomplete with pg_trgm

CREATE INDEX products_name_trgm_idx ON products USING GIN (name gin_trgm_ops);

-- Autocomplete query (handles typos)
SELECT name, similarity(name, 'blutooth') AS sim
FROM products
WHERE name % 'blutooth'
ORDER BY sim DESC
LIMIT 5;

Pattern 3: Combined search with tsvector primary + pg_trgm fallback

-- Search function: exact match first, fuzzy fallback if no results
CREATE OR REPLACE FUNCTION search_products(search_term text, result_limit int DEFAULT 20)
RETURNS TABLE(id int, name text, rank real, match_type text) AS $$
BEGIN
  -- Try full-text search first
  RETURN QUERY
    SELECT p.id, p.name, ts_rank_cd(p.search_vector, query)::real, 'exact'::text
    FROM products p, websearch_to_tsquery('english', search_term) AS query
    WHERE p.search_vector @@ query
    ORDER BY ts_rank_cd(p.search_vector, query) DESC
    LIMIT result_limit;

  -- If no full-text results, fall back to trigram similarity
  IF NOT FOUND THEN
    RETURN QUERY
      SELECT p.id, p.name, similarity(p.name, search_term)::real, 'fuzzy'::text
      FROM products p
      WHERE p.name % search_term
      ORDER BY similarity(p.name, search_term) DESC
      LIMIT result_limit;
  END IF;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM search_products('wireless headphons');

These three patterns — weighted full-text search, fuzzy autocomplete, and combined search with fallback — cover most application search needs. I would recommend implementing them before evaluating Elasticsearch. You may find the evaluation unnecessary.

When You Do Add Elasticsearch

If you determine that Elasticsearch is necessary, these architectural patterns reduce operational risk:

Keep PostgreSQL as the source of truth. Elasticsearch is a search index, not a database. All writes go to PostgreSQL first.

Use CDC for sync rather than dual writes. Debezium or a similar CDC tool captures PostgreSQL WAL changes and streams them to Elasticsearch. This eliminates the consistency bugs inherent in application-level dual writes.

Abstract the search backend. Build your search API so the backend is swappable. This lets you switch between backends without frontend changes and provides a fallback if Elasticsearch is unavailable.

# Abstract search interface
from abc import ABC, abstractmethod

class SearchBackend(ABC):
    @abstractmethod
    def search(self, query: str, filters: dict, limit: int) -> list:
        """Return ranked results for query, constrained by filters."""

class PostgresSearchBackend(SearchBackend):
    def search(self, query, filters, limit):
        # tsvector + pg_trgm implementation
        ...

class ElasticsearchBackend(SearchBackend):
    def search(self, query, filters, limit):
        # Elasticsearch Query DSL implementation
        ...
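One way to realize the fallback is a wrapper with the same interface (a sketch; the class names beyond the interface above are illustrative):

```python
class FallbackSearchBackend:
    """Delegate to a primary backend; fall back to a secondary if it raises.

    Keeps search available when Elasticsearch is unreachable, at the cost
    of whatever ranking differences the fallback backend introduces.
    """
    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def search(self, query, filters, limit):
        try:
            return self.primary.search(query, filters, limit)
        except Exception:
            # In production: log the failure and emit a metric here
            return self.fallback.search(query, filters, limit)

# Usage with stand-in backends
class Broken:
    def search(self, query, filters, limit):
        raise ConnectionError("cluster unreachable")

class Stub:
    def search(self, query, filters, limit):
        return [f"pg result for {query}"]

backend = FallbackSearchBackend(Broken(), Stub())
print(backend.search("widgets", {}, 10))  # → ['pg result for widgets']
```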

How Gold Lapel Relates

Gold Lapel monitors PostgreSQL full-text search query performance. It identifies slow tsvector queries that would benefit from GIN indexes or search vector optimization, and detects missing GIN indexes on tsvector and pg_trgm-indexed columns. The comparison above stands on its own merits regardless of any specific tooling.

Frequently asked questions