Chapter 10: Aggregations Without a Separate System

The Waiter of Gold Lapel · Published Apr 12, 2026 · 12 min

Welcome to Part III. I suspect you were not expecting this room.

Parts I and II made the case that PostgreSQL handles search — full-text, fuzzy, phonetic, semantic, autocomplete. Five pillars, each demonstrated with runnable SQL, each mapped to a specific Elasticsearch feature. The search argument, I trust, has been made.

But if you have used Elasticsearch, you know that search is only part of the story. Teams adopt Elasticsearch for search. They stay because of everything else. And the feature they cite most often when explaining why they cannot leave — more often than the query DSL, more often than distributed scale, more often than the ecosystem — is aggregations.

Faceted search results. “Showing 342 results: Furniture (127), Office (89), Outdoor (73), Kids (53).” The category counts that help users navigate large result sets. The price range filters on the sidebar. The “average rating by brand” analysis. These are how users make sense of search results, and they are one of the most visible features of any e-commerce or content search interface.

When a team considers migrating away from Elasticsearch, aggregations are often the conversation-stopper. “We use aggregations everywhere. We can’t leave.”

I understand why they believe this. I would like to demonstrate why it is not the case. Aggregations are SQL. GROUP BY and COUNT(*) have been doing this since before Elasticsearch was conceived. SQL did not learn aggregations from Elasticsearch. If anything, the chronology runs the other direction — and I say this not to diminish Elasticsearch’s implementation, which is well-engineered, but to observe that the capability the team thought was locked inside Elasticsearch has been in their database the entire time.

Elasticsearch’s Aggregation Model

Before I present the PostgreSQL approach, you should see what Elasticsearch provides — because the mapping should be clear.

Terms aggregation. Group by a field, count documents per group. The faceted search pattern — category counts, tag counts, brand counts.

Metric aggregations. Compute sum, avg, min, max, stats over numeric fields, optionally grouped.

Bucket aggregations. Date histograms, range buckets, filters — grouping documents into buckets by criteria more complex than a single field value.

Pipeline aggregations. Aggregations computed on top of other aggregations — averages of counts, derivatives of sums, bucket selectors that filter aggregation results.

These are expressed in Elasticsearch’s JSON query DSL — a nested object structure where each aggregation contains its configuration, and nested aggregations appear as sub-objects within parent aggregations. The system works, and I acknowledge the engineering behind it. The question this chapter asks is not whether Elasticsearch does aggregations well, but whether you need a separate service to do them at all.

The SQL Approach

How to implement faceted search in PostgreSQL:

SQL

SELECT category, COUNT(*) AS count
FROM search_products
WHERE search_vector @@ plainto_tsquery('english', 'chair')
GROUP BY category
ORDER BY count DESC;

Result:

Result

  category  | count
------------+-------
 Furniture  |   127
 Office     |    89
 Outdoor    |    73
 Kids       |    53
(4 rows)

That is a faceted search result. Full-text search filtered by the query “chair,” grouped by category, counted, ordered by frequency. One SQL query. No aggregation framework. No JSON DSL. I trust you will notice that the SQL is considerably shorter than the Elasticsearch equivalent — and, if I may observe, rather easier to read.

Metric aggregations are equally direct:

SQL

SELECT category,
    COUNT(*) AS count,
    AVG(price)::numeric(10,2) AS avg_price,
    MIN(price) AS min_price,
    MAX(price) AS max_price
FROM search_products
WHERE search_vector @@ plainto_tsquery('english', 'chair')
GROUP BY category
ORDER BY count DESC;

Result:

Result

  category  | count | avg_price | min_price | max_price
------------+-------+-----------+-----------+-----------
 Furniture  |   127 |    189.50 |     29.99 |    899.00
 Office     |    89 |    245.75 |     49.99 |   1299.00
 Outdoor    |    73 |    156.30 |     39.99 |    599.00
 Kids       |    53 |     89.95 |     19.99 |    249.00
(4 rows)

Category counts AND price statistics. One query. This is what Elasticsearch’s terms + metric aggregation combination produces — expressed as SQL you learned years before you heard the word “Elasticsearch.”

Gold Lapel’s facets() and aggregate()

facets():

gl.facets("products", "category", "name", "chair")

Generates a full-text search filtered GROUP BY category with COUNT(*). Returns: [{category: "Furniture", count: 127}, {category: "Office", count: 89}, ...]. The Elasticsearch terms aggregation — in one method call. The SQL is generated. The developer writes one line.

aggregate():

gl.aggregate("products", "price", "avg", group_by="category")

Generates: SELECT category, AVG(price) FROM products GROUP BY category. Supports count, sum, avg, min, max. Optional group_by parameter. Optional where clause for filtering. The interface is deliberately simple because the operation is, at its core, simple. It has always been simple. It just lived in the wrong service.

Where SQL Surpasses Elasticsearch Aggregations

Can PostgreSQL replace Elasticsearch aggregations? It can. And I should be forthcoming about something you may not expect: in several meaningful ways, SQL aggregations are not merely equivalent to Elasticsearch’s aggregation DSL. They are more powerful. This is not a claim I make lightly, so allow me to demonstrate.

Window Functions

Window functions compute aggregates alongside individual rows — not just summaries, but summaries in context:

SQL

SELECT name, price, category,
    AVG(price) OVER (PARTITION BY category) AS category_avg,
    price - AVG(price) OVER (PARTITION BY category) AS diff_from_avg,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM search_products
WHERE search_vector @@ plainto_tsquery('english', 'chair');

This returns each matching product with its price, the average price in its category, how far this product is from that category average, and its price rank within the category. One query. Every row includes both its own data and the aggregate context around it.

The user can see: “This chair costs $150. The average in Furniture is $189. It is $39 below average. It is the 47th most expensive chair in Furniture.” All computed by the database, in one pass, in the same query that found the search results.

Elasticsearch has no equivalent to window functions. Its aggregation DSL produces summaries separate from the result documents. Combining per-document data with aggregate context requires multiple requests or client-side computation. Window functions are, I would suggest, one of SQL’s most valuable capabilities — and one of the most underused in search contexts. I hope to correct that.

JOINs

Aggregate across related tables without denormalization:

SQL

SELECT c.name AS category, COUNT(p.id) AS count, AVG(p.price)::numeric(10,2) AS avg_price
FROM categories c
JOIN search_products p ON p.category_id = c.id
WHERE p.search_vector @@ plainto_tsquery('english', 'chair')
GROUP BY c.name
ORDER BY count DESC;

Elasticsearch requires denormalization — category data must be flattened into each product document before indexing. The category name, the category description, the category hierarchy — all copied into every product document. When a category name changes, every document in that category must be updated and re-indexed.

PostgreSQL aggregates across normalized, relational data. The category name lives in one place. The aggregation is a query, not a data transformation. The data stays clean. I find this a more considerate approach to data management.

CTEs (Common Table Expressions)

Multi-step aggregation pipelines expressed as readable SQL:

SQL

WITH search_results AS (
    SELECT * FROM search_products
    WHERE search_vector @@ plainto_tsquery('english', 'chair')
),
category_stats AS (
    SELECT category,
        COUNT(*) AS count,
        AVG(price)::numeric(10,2) AS avg_price
    FROM search_results
    GROUP BY category
)
SELECT * FROM category_stats
WHERE count > 5
ORDER BY count DESC;

This reads top to bottom: filter search results → compute category statistics → keep only categories with more than 5 results. Each step is named. Each step is readable. Each step can be tested independently.

Elasticsearch’s pipeline aggregations achieve some of this, but the syntax is deeply nested JSON objects — aggregations inside aggregations inside aggregations. The nesting makes debugging difficult and modification error-prone. SQL CTEs are flat, sequential, and debuggable. One reads them the way one reads instructions: step one, then step two, then step three.

HAVING

“Find categories where the average product price in search results is above $100”:

SQL

SELECT category, COUNT(*) AS count, AVG(price)::numeric(10,2) AS avg_price
FROM search_products
WHERE search_vector @@ plainto_tsquery('english', 'chair')
GROUP BY category
HAVING AVG(price) > 100
ORDER BY avg_price DESC;

One clause. In Elasticsearch, this requires a bucket selector pipeline aggregation nested inside the terms aggregation — a structure that is correct and functional but requires the developer to learn Elasticsearch’s specific syntax for a concept that SQL expresses with a single keyword. I respect the Elasticsearch implementation. I also note that HAVING has been available in SQL since 1992.

The Combined Pattern: Search Results + Facets

A real search results page needs search results (ranked), faceted counts (by category, by price range, by brand), and a total result count. Here are three approaches, from simplest to most comprehensive:

Approach 1: Two queries

SQL

-- Search results
SELECT name, category, price, ts_rank(search_vector, q) AS score
FROM search_products, plainto_tsquery('english', 'chair') AS q
WHERE search_vector @@ q
ORDER BY score DESC LIMIT 20;

-- Faceted counts
SELECT category, COUNT(*)
FROM search_products
WHERE search_vector @@ plainto_tsquery('english', 'chair')
GROUP BY category ORDER BY count DESC;

Two queries, same materialized view, same GIN index. Simple and fast. Most applications use this approach, and it is the one I would recommend starting with. Two round trips is rarely a performance concern when each query returns in milliseconds.

Approach 2: Single query with CTE

SQL

WITH results AS (
    SELECT *, ts_rank(search_vector, plainto_tsquery('english', 'chair')) AS score
    FROM search_products
    WHERE search_vector @@ plainto_tsquery('english', 'chair')
)
SELECT 'result' AS type, name, category, price, score, NULL::bigint AS count
FROM results ORDER BY score DESC LIMIT 20
UNION ALL
SELECT 'facet', NULL, category, NULL, NULL, COUNT(*)
FROM results GROUP BY category;

One round trip. Results and facets in a single response. The SQL is more complex, but it eliminates a network round trip — useful for latency-sensitive applications where every millisecond in the API response matters.

Approach 3: Multi-facet navigation

For e-commerce faceted navigation with multiple facet dimensions — category, brand, and price range, each computed from the same search results:

SQL

WITH results AS (
    SELECT * FROM search_products
    WHERE search_vector @@ plainto_tsquery('english', 'chair')
)
SELECT 'category' AS facet, category AS value, COUNT(*) FROM results GROUP BY category
UNION ALL
SELECT 'brand', brand, COUNT(*) FROM results GROUP BY brand
UNION ALL
SELECT 'price_range',
    CASE WHEN price < 50 THEN 'Under $50'
         WHEN price < 100 THEN '$50-$100'
         WHEN price < 200 THEN '$100-$200'
         ELSE 'Over $200' END,
    COUNT(*)
FROM results GROUP BY 2;

Multiple facet dimensions from one base query. One CTE, multiple aggregation passes. This is the e-commerce faceted navigation pattern — the sidebar with category counts, brand counts, and price range counts, all computed from the same search results. Many teams assume this pattern requires Elasticsearch. It requires a CTE and three GROUP BY clauses.

Elasticsearch Aggregations vs. PostgreSQL SQL

Aspect	Elasticsearch Aggregations	PostgreSQL SQL
Terms aggregation	`"aggs": {...}`	`GROUP BY category` + `COUNT(*)`
Metric aggregation	`"aggs": {...}`	`AVG(price)`
Pipeline aggregation	Nested JSON, bucket selectors	CTEs, HAVING, subqueries
Window functions	Not available	`AVG(price) OVER (PARTITION BY category)`
JOINs in aggregation	Not available (requires denormalization)	Standard SQL JOINs
Filtering aggregated results	Bucket selector pipeline	HAVING clause
Combined search + aggs	Single compound query	Two queries or CTE + UNION
Query language	Elasticsearch JSON DSL	SQL
Infrastructure	Elasticsearch cluster	Same database

I present this table without editorial comment. You may draw your own conclusions about which column you find more approachable. I will observe only that one of these query languages is specific to a single product, and the other is an ISO standard you have likely been writing for years.

Honest Boundary

Elasticsearch’s aggregations are optimized for the search-plus-aggregate pattern on inverted indexes at distributed scale. For very large-scale aggregations over billions of documents across a distributed cluster, Elasticsearch’s architecture may provide faster per-query performance. Its aggregation engine is well-engineered, and I acknowledge the work that went into it.

PostgreSQL’s advantage is flexibility and composability. Window functions, JOINs, CTEs, subqueries, HAVING — these compose in ways Elasticsearch’s DSL cannot match. For the vast majority of application workloads — product search, content search, user search — PostgreSQL’s aggregations are fast enough and meaningfully more expressive. The team that believes they need Elasticsearch for aggregations should, I would respectfully suggest, try writing the query in SQL first. They may find it was there all along.

ParadeDB’s pgfaceting extension offers an alternative approach using roaring bitmaps for very high-performance faceting. It is worth knowing about for specialized workloads. For most applications, standard SQL GROUP BY handles faceting without additional extensions — and without learning a new API.

Aggregations are the feature that keeps teams on Elasticsearch after they have considered leaving. I hope this chapter has addressed that concern — and addressed it, in the case of window functions and JOINs, with capabilities that Elasticsearch does not provide. SQL has been doing aggregations quietly and well for a very long time. It did not need a separate service to learn how.

But there is another feature that keeps teams on Elasticsearch, and it is less well-known than aggregations. It is also, if I may say, more surprising.

Reverse search. Instead of “find documents matching this query,” it asks: “find stored queries matching this document.” You store the queries. You feed it documents. It tells you which queries match. Alerts, content classification, email routing, saved search notifications — all from one pattern.

Elasticsearch calls it the percolator. Most developers assume it requires a dedicated search engine. It does not.

PostgreSQL handles it with stored tsquery values and a GIN index. Chapter 11 will show you how — and I suspect it may be the most unexpected chapter in this book. If you will follow me, I should very much like to make the introduction.