Goroutine-Based Query Batching for PostgreSQL: The 10x Throughput Pattern
200 goroutines, 200 identical queries, one database. There is a better way, and it involves a channel and some patience.
Good evening. You appear to be doing the same thing 200 times simultaneously.
Go makes concurrency effortless. go func(), a goroutine spins up, the query fires, the result comes back. Multiply by 200 and you have 200 queries executing in parallel. Fast. Clean. Idiomatic.
Also: wasteful. Each of those 200 goroutines sends the exact same SELECT ... WHERE id = $1 query with a different ID. Each one pays the network round-trip cost. Each one asks PostgreSQL to parse the SQL, plan the query, perform an index lookup, and return one row. Two hundred times. For what could have been a single WHERE id = ANY($1) returning all 200 rows in one pass.
This is the peculiar tragedy of Go's concurrency model applied to database access. The language makes it trivially easy to fan out work across goroutines, and the goroutine scheduler makes that fan-out look efficient by parallelizing the wait times. But the database on the receiving end does not experience fan-out as efficiency. It experiences fan-out as 200 separate conversations, 200 separate handshakes, 200 separate trips to the filing cabinet when someone could have asked for all 200 files at once.
I find this behaviour — spawning 200 goroutines to perform 200 identical operations that differ only in their parameter — to be the infrastructural equivalent of sending 200 individual letters to the same address when a single parcel would have sufficed. The postman is polite about it, but he does notice.
The team at lakeFS discovered this pattern while building their open-source data versioning system. Their metadata layer was hammering PostgreSQL with thousands of concurrent point queries per second. Each query was fast individually. Collectively, they saturated their connection pool and drove PostgreSQL CPU to the ceiling. The fix was a query coalescer: a goroutine that collects incoming point queries over a short time window and fuses them into a single multi-row SELECT. The result was 10x throughput improvement with lower database CPU.
This article builds that coalescer from scratch. You will have a working, production-ready implementation by the end. You will understand the SQL mechanics that make ANY($1) faster than repeated point queries. You will know how to test it, tune it, and recognise when it is not the right tool. You will also understand why this is exactly the sort of thing that should not require application-level code to accomplish.
The problem: 200 goroutines, 200 round trips
Consider a team dashboard endpoint. It loads 50-200 users visible to the current team admin. The naive approach, which is also the approach Go's concurrency primitives actively encourage, spawns a goroutine per user.
// The starting point: 200 goroutines, each fetching one row.
// This is how most Go services hit PostgreSQL.
func GetUser(ctx context.Context, pool *pgxpool.Pool, id int64) (*User, error) {
var u User
err := pool.QueryRow(ctx,
"SELECT id, name, email, plan FROM users WHERE id = $1", id,
).Scan(&u.ID, &u.Name, &u.Email, &u.Plan)
return &u, err
}
// In the HTTP handler:
func handleDashboard(w http.ResponseWriter, r *http.Request) {
userIDs := extractVisibleUserIDs(r) // 50-200 IDs from a team dashboard
var wg sync.WaitGroup
users := make([]*User, len(userIDs))
errs := make([]error, len(userIDs))
for i, id := range userIDs {
wg.Add(1)
go func(i int, id int64) {
defer wg.Done()
users[i], errs[i] = GetUser(r.Context(), db, id)
}(i, id)
}
wg.Wait()
// 200 goroutines. 200 queries. 200 round trips to PostgreSQL.
// Each query: ~0.3ms network + ~0.1ms parse + ~0.05ms execute.
// Total wall time: ~2ms (parallel), but total database time: ~90ms.
// PostgreSQL processed 200 WHERE id = $1 queries that could have
// been a single WHERE id = ANY($1) returning all 200 rows at once.
}

Wall-clock time is excellent. Go's goroutine scheduler distributes the queries across the connection pool, pgx handles the multiplexing, and the dashboard loads in 2-3ms. By any latency metric, this looks fine.
By every other metric, it is a problem.
Connection pool pressure. 200 concurrent queries need 200 connections from pgx's pool. The default pool_max_conns is the greater of 4 and the number of CPUs. Most teams bump it to 20-50. At 200 concurrent goroutines, the pool queues requests, adding tail latency. Under sustained load, the queue grows unbounded.
PostgreSQL backend load. Each query requires a separate parse, plan, and execute cycle. With prepared statements, you save the parse, but the plan and execute remain. PostgreSQL's per-query overhead is small but not zero: catalog cache lookup, buffer pool access, lock acquisition on the index page. Multiply by 200 and it adds up.
Network amplification. Each query generates a separate protocol exchange: Parse, Bind, Describe, Execute, Sync on the way out; ParseComplete, BindComplete, RowDescription, DataRow, CommandComplete, ReadyForQuery on the way back. Eleven protocol messages per query. 2,200 messages for 200 lookups.
Snapshot overhead. Each query, even on the same connection, acquires a transaction snapshot. For a primary key lookup on a small row, the snapshot acquisition can cost more than the actual index scan. PostgreSQL must check the proc array, evaluate visibility, and determine the current transaction horizon. This is fast per-query. It is not fast 200 times per request.
The fundamental issue: the application knows these are all "get user by ID" queries. PostgreSQL does not. It sees 200 independent requests from 200 independent connections. The information that these queries are related exists only in the application's call stack, and it is discarded the moment the goroutines scatter.
What PostgreSQL actually does with those 200 queries
I should be specific about the server-side cost, because the numbers make the case more persuasively than adjectives.
-- What PostgreSQL actually does for each of those 200 queries.
-- Run EXPLAIN (ANALYZE, BUFFERS, TIMING) to see it:
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT id, name, email, plan FROM users WHERE id = 42;
-- QUERY PLAN
-- Index Scan using users_pkey on users (cost=0.42..8.44 rows=1 width=72)
-- Index Cond: (id = 42)
-- Buffers: shared hit=3
-- Planning Time: 0.068 ms
-- Execution Time: 0.029 ms
-- 0.068ms planning. 0.029ms executing. Multiply by 200:
-- Planning alone: 13.6ms of CPU time.
-- Execution: 5.8ms.
-- Total server CPU: ~19.4ms for what is logically one operation.
-- Now compare with the coalesced version:
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT id, name, email, plan FROM users WHERE id = ANY(ARRAY[1,2,3,...,200]);
-- QUERY PLAN
-- Index Scan using users_pkey on users (cost=0.42..408.44 rows=200 width=72)
-- Index Cond: (id = ANY ('{1,2,3,...,200}'::bigint[]))
-- Buffers: shared hit=203
-- Planning Time: 0.094 ms
-- Execution Time: 0.312 ms
-- 0.094ms planning (once). 0.312ms executing (once).
-- Total server CPU: ~0.4ms. That is a 48x reduction.

The planning time is the detail that surprises most Go engineers. A primary key lookup takes 0.029ms to execute but 0.068ms to plan. Planning is more expensive than execution. PostgreSQL must parse the SQL text, resolve table and column names against the catalog, check permissions, determine the optimal access path, and emit an execution tree. For a simple WHERE id = $1 with a primary key index, there is only one sensible plan, but PostgreSQL still runs the full planner pipeline to confirm it.
With prepared statements (which pgx uses by default), the parse step is cached. But the plan step is still performed per execution in many cases. PostgreSQL creates a generic plan after five executions of a prepared statement, but it compares the generic plan's estimated cost against a custom plan and may still choose to replan. This is the plan_cache_mode setting introduced in PostgreSQL 12, and its default is auto.
The coalesced query pays the planning cost once: 0.094ms instead of 13.6ms. On the execution side, PostgreSQL handles ANY(array) inside a single index scan node: it sorts and deduplicates the array values during preprocessing and probes the index in sorted order, so buffer accesses stay local and each index page tends to be touched once. (PostgreSQL 17 improves this further by satisfying runs of adjacent values without re-descending the B-tree.) Even where the executor still descends once per value, the work happens inside one plan, one executor startup, and one snapshot — dramatically cheaper than 200 independent queries.
I should note: these numbers are from a warm buffer cache on a local instance. Over a network with cold caches, the per-query overhead grows because each index scan may require disk I/O. The coalesced query is even more advantageous in that scenario because it benefits from spatial locality — the 200 index pages it needs are likely adjacent, and prefetching works in its favour.
A brief digression on ANY($1) versus IN (...)
Before we build the coalescer, a SQL detail that matters for production use. The coalescer rewrites 200 point queries into a single WHERE id = ANY($1). You may wonder why not WHERE id IN ($1, $2, ..., $200). The distinction is not academic.
-- WHERE id = ANY($1) vs WHERE id IN (...)
-- Both produce the same query plan. The difference is parameterization.
-- ANY($1) accepts a PostgreSQL array parameter:
SELECT * FROM users WHERE id = ANY($1);
-- $1 is a []int64 in Go, which pgx sends as a PostgreSQL bigint[].
-- One parameter, regardless of how many IDs.
-- IN (...) requires listing values:
SELECT * FROM users WHERE id IN ($1, $2, $3, ..., $200);
-- 200 parameters. 200 placeholders in the SQL string.
-- Each distinct count generates a different prepared statement.
-- With 200 IDs, the IN version generates:
-- "SELECT * FROM users WHERE id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,
-- $11,$12,$13,$14,...,$200)"
-- That is a unique SQL string. PostgreSQL parses it fresh.
-- Tomorrow's request with 187 IDs generates a different SQL string.
-- Another fresh parse.
-- The ANY version is always the same SQL string regardless of ID count:
-- "SELECT * FROM users WHERE id = ANY($1)"
-- One prepared statement. Used for 50 IDs or 500 IDs.
-- PostgreSQL caches the plan and reuses it.

ANY($1) accepts a single array parameter. The SQL text is identical regardless of how many IDs you pass. This means PostgreSQL can prepare the statement once and reuse it for batches of 50, 200, or 500 IDs. One cached plan. One entry in pg_stat_statements. Clean monitoring.
IN ($1, $2, ..., $N) embeds N parameter placeholders in the SQL text. A batch of 200 generates a different SQL string than a batch of 187. Each unique SQL string gets its own prepared statement, its own plan cache entry, and its own row in pg_stat_statements. If your batch sizes vary — and they will — you accumulate hundreds of nearly identical prepared statements that differ only in placeholder count.
Some ORMs and query builders generate the IN variant by default. This is one of those quiet inefficiencies that does not cause an incident but steadily degrades performance as the prepared statement cache grows. If you are building a coalescer by hand, use ANY($1). If your ORM does not support it, this is a reasonable occasion to write raw SQL.
There is one caveat: ANY($1) with very large arrays (10,000+ elements) can cause the planner to produce a suboptimal plan. PostgreSQL may choose a sequential scan if it estimates the array covers a significant fraction of the table. For arrays under 1,000 elements — a ceiling well above any reasonable batch size — this is not a concern. The planner consistently chooses the index scan.
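The prepared-statement argument is easy to make concrete in a few lines of Go. The sketch below uses hypothetical helper names (anySQL, inSQL are mine, not pgx APIs) to build both SQL texts and show that the IN text changes with every batch size while the ANY text never does:

```go
package main

import (
	"fmt"
	"strings"
)

// anySQL is one SQL text for every batch size: a single array parameter.
func anySQL() string {
	return "SELECT id, name, email, plan FROM users WHERE id = ANY($1)"
}

// inSQL embeds one placeholder per ID, so every distinct batch size
// yields a distinct SQL text — and a distinct prepared statement.
func inSQL(n int) string {
	ph := make([]string, n)
	for i := range ph {
		ph[i] = fmt.Sprintf("$%d", i+1)
	}
	return "SELECT id, name, email, plan FROM users WHERE id IN (" + strings.Join(ph, ",") + ")"
}

func main() {
	fmt.Println(inSQL(200) == inSQL(187)) // false: two plan-cache entries
	fmt.Println(anySQL())                 // identical for 50 IDs or 500
}
```

Every distinct string returned by inSQL is a separate entry in the server's prepared-statement cache and in pg_stat_statements; anySQL is always one entry.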
The coalescer pattern: channels as a batching layer
The insight behind the coalescer is that Go channels are a natural batching primitive. A channel is a queue. Goroutines writing to a channel are producers. A single goroutine reading from the channel is the consumer. The consumer can accumulate items over a time window and process them as a batch.
This is a pattern that Go is uniquely well-suited for. In Java, you would need a thread pool and a concurrent queue and a scheduled executor. In Python, you would need asyncio and a complicated dance of futures. In Go, you need a channel and a goroutine. The language primitives are exactly the building blocks the pattern requires.
The architecture is three components.
1. The request channel. Each goroutine that needs a user sends a request containing the user ID and a response channel. The request channel is buffered to avoid blocking producers.
2. The collector loop. A single goroutine reads from the request channel. It waits for the first request, then collects additional requests for a short window (500 microseconds to 2 milliseconds). When the window closes or the batch hits a size limit, it flushes.
3. The batch executor. Takes the collected IDs, deduplicates them, sends one SELECT ... WHERE id = ANY($1), and distributes results back through each request's response channel.
// QueryCoalescer collects individual point queries arriving from
// concurrent goroutines and batches them into a single multi-row SELECT.
//
// Instead of 200 x "SELECT ... WHERE id = $1"
// it sends 1 x "SELECT ... WHERE id = ANY($1)" with all 200 IDs.
type QueryCoalescer struct {
pool *pgxpool.Pool
incoming chan *queryRequest
window time.Duration // how long to collect before flushing
maxBatch int // flush early if we hit this count
}
type queryRequest struct {
id int64
result chan *queryResult
}
type queryResult struct {
user *User
err error
}
func NewQueryCoalescer(pool *pgxpool.Pool, window time.Duration, maxBatch int) *QueryCoalescer {
qc := &QueryCoalescer{
pool: pool,
incoming: make(chan *queryRequest, maxBatch*2),
window: window,
maxBatch: maxBatch,
}
go qc.loop()
return qc
}

The queryRequest struct is the message that flows through the channel. It carries the input (the ID) and a return address (the result channel). This is a standard Go concurrency pattern: when a goroutine needs to receive a response, it includes a channel in its request. The response channel is buffered with capacity 1, which ensures the sender (the batch executor) never blocks even if the receiver (the calling goroutine) has already returned due to context cancellation.
The incoming channel is buffered at maxBatch*2. This is not arbitrary. During a burst, the collector loop may be executing a batch while new requests arrive. The buffer absorbs one full batch worth of incoming requests during execution. If the buffer fills, producers block on the channel send — which is the correct backpressure behaviour. The alternative is dropping requests, which is never acceptable.
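The capacity maths is observable with a plain buffered channel. In this toy demonstration a non-blocking send stands in for the producer (so the demo terminates instead of blocking); a real producer would block at the point marked default, which is the backpressure described above:

```go
package main

import "fmt"

func main() {
	// maxBatch*2 capacity: one batch's worth of headroom while the
	// collector is busy executing the previous batch.
	maxBatch := 4
	incoming := make(chan int64, maxBatch*2)

	accepted := 0
	for i := int64(0); i < 20; i++ {
		select {
		case incoming <- i:
			accepted++
		default:
			// Buffer full. A real producer blocks here instead,
			// which is exactly the backpressure we want: the burst
			// is throttled rather than dropped.
		}
	}
	fmt.Println(accepted, len(incoming)) // 8 8: the buffer caps the burst
}
```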
The collector loop: timing the batch window
The collector loop is where the batching happens. It blocks until the first request arrives, then races between collecting more requests and a timer. Whichever fires first — the timer expiring or the batch reaching maximum size — triggers a flush.
// The core loop: collect requests, flush as a batch.
func (qc *QueryCoalescer) loop() {
for {
// Block until the first request arrives.
first, ok := <-qc.incoming
if !ok {
return // channel closed, shut down
}
// Start collecting. We have our first request.
batch := []*queryRequest{first}
timer := time.NewTimer(qc.window)
collect:
for {
select {
case req, ok := <-qc.incoming:
if !ok {
timer.Stop()
break collect
}
batch = append(batch, req)
if len(batch) >= qc.maxBatch {
timer.Stop()
break collect
}
case <-timer.C:
break collect
}
}
// Flush the batch.
qc.executeBatch(batch)
}
}

Several details matter here, and each deserves attention.
First, the loop blocks on the channel read for the initial request. There is no busy-waiting, no polling, no CPU consumed while idle. The goroutine parks until work arrives. This is Go's channel semantics doing exactly what they were designed for. A coalescer serving zero traffic consumes zero CPU. It is the model of a well-behaved servant: invisible when not needed, instantly available when called upon.
Second, the timer starts after the first request, not before. This means a single isolated request waits at most window duration before being executed. In the common case where many goroutines are active simultaneously, the batch fills quickly and the timer never fires. The batch flushes at maxBatch in microseconds.
Third, timer.Stop() is called in both early-exit paths: channel close and max batch size. This prevents a timer leak. Go's time.NewTimer allocates a runtime timer that must be stopped if it has not fired, or it will linger until it fires and then send to a channel nobody is reading. In a long-running service, leaking timers is a slow-motion resource exhaustion.
Fourth, the labelled break collect is idiomatic Go for breaking out of a select inside a for loop. Without the label, break only exits the select. With the label, it exits the for loop. This is one of those Go idioms that reads oddly the first time but becomes natural quickly.
The worst case is a single request arriving during a quiet period. It waits the full window duration (say, 500 microseconds) before executing. For a web service with sub-100ms SLOs, half a millisecond of added latency is invisible. For a latency-critical path, reduce the window or bypass the coalescer for low-concurrency scenarios.
Batch execution and result distribution
The executor takes the accumulated batch, deduplicates IDs (because two goroutines may request the same user within the same window), runs one query, and fans results back out.
// executeBatch sends one query for all collected IDs,
// then distributes results back to waiting goroutines.
func (qc *QueryCoalescer) executeBatch(batch []*queryRequest) {
// Collect unique IDs. Deduplicate — multiple goroutines may
// request the same user within the same batch window.
idSet := make(map[int64][]*queryRequest, len(batch))
ids := make([]int64, 0, len(batch))
for _, req := range batch {
if _, exists := idSet[req.id]; !exists {
ids = append(ids, req.id)
}
idSet[req.id] = append(idSet[req.id], req)
}
// One query. One round trip.
rows, err := qc.pool.Query(context.Background(),
"SELECT id, name, email, plan FROM users WHERE id = ANY($1)",
ids,
)
if err != nil {
// Distribute the error to all waiting goroutines.
for _, reqs := range idSet {
for _, req := range reqs {
req.result <- &queryResult{err: err}
}
}
return
}
defer rows.Close()
// Build result map.
found := make(map[int64]*User, len(ids))
for rows.Next() {
var u User
if err := rows.Scan(&u.ID, &u.Name, &u.Email, &u.Plan); err != nil {
// Scan error — distribute to all waiters.
for _, reqs := range idSet {
for _, req := range reqs {
req.result <- &queryResult{err: err}
}
}
return
}
found[u.ID] = &u
}
// Distribute results. Each goroutine gets its answer.
for id, reqs := range idSet {
user, ok := found[id]
for _, req := range reqs {
if ok {
req.result <- &queryResult{user: user}
} else {
req.result <- &queryResult{err: pgx.ErrNoRows}
}
}
}
}

The deduplication step is worth highlighting. If 30 out of 200 goroutines request the same user (common for the team admin or a shared reference), the query sends 171 unique IDs instead of 200. The 30 duplicate requests all receive the same result. The database does less work, and the protocol carries fewer bytes.
I should note: the deduplication shares the same *User pointer across all goroutines that requested the same ID. This is safe because the result is read-only — goroutines receive it and use it for rendering or serialization, never mutation. If your use case requires mutating the returned struct, you will need to copy it per-requester. A minor detail, but the sort of thing that surfaces at three in the morning if left unattended.
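The bookkeeping is easier to see in isolation. Here is a minimal sketch of the same map-plus-slice deduplication (the dedupIDs name is mine; plain int64s stand in for the request structs):

```go
package main

import "fmt"

// dedupIDs returns the unique IDs in first-seen order plus a count of
// how many requests each ID had — the same bookkeeping executeBatch
// performs with its map of ID -> waiting requests.
func dedupIDs(requested []int64) (unique []int64, waiters map[int64]int) {
	waiters = make(map[int64]int, len(requested))
	for _, id := range requested {
		if waiters[id] == 0 {
			unique = append(unique, id)
		}
		waiters[id]++
	}
	return unique, waiters
}

func main() {
	// 200 requests where 30 goroutines all ask for the admin (ID 1):
	// 170 other distinct IDs + 1 shared ID = 171 unique.
	reqs := make([]int64, 0, 200)
	for i := 0; i < 30; i++ {
		reqs = append(reqs, 1)
	}
	for id := int64(2); id <= 171; id++ {
		reqs = append(reqs, id)
	}
	unique, waiters := dedupIDs(reqs)
	fmt.Println(len(reqs), len(unique), waiters[1]) // 200 171 30
}
```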
Error handling follows the fan-out pattern: if the query fails, every waiting goroutine receives the error. If a specific ID is not found, that goroutine receives pgx.ErrNoRows while others receive their results normally. The error boundary is per-request, not per-batch.
There is a subtlety in the scan error path. If a scan fails partway through the result set, goroutines whose rows have already been scanned successfully do not receive their results — they receive the scan error instead. This is a deliberate choice. A partial scan failure indicates something unexpected about the data (a type mismatch, a null in a non-nullable field), and distributing partial results while hiding the error from some goroutines would create an extremely difficult debugging situation. When something is wrong, everyone hears about it. That is the responsible approach.
The caller API: transparent to the consumer
The coalescer exposes the same function signature as a direct query. The caller sends a request to the channel and blocks on its result channel. From the caller's perspective, it called a function and got a result. The batching is invisible.
// The caller API: identical signature to a direct query.
// The goroutine blocks on its result channel until the batch executes.
func (qc *QueryCoalescer) GetUser(ctx context.Context, id int64) (*User, error) {
req := &queryRequest{
id: id,
result: make(chan *queryResult, 1),
}
select {
case qc.incoming <- req:
case <-ctx.Done():
return nil, ctx.Err()
}
select {
case res := <-req.result:
return res.user, res.err
case <-ctx.Done():
return nil, ctx.Err()
}
}
// Usage is identical to the naive version, except the
// coalescer handles batching transparently:
func handleDashboard(w http.ResponseWriter, r *http.Request) {
userIDs := extractVisibleUserIDs(r)
var wg sync.WaitGroup
users := make([]*User, len(userIDs))
errs := make([]error, len(userIDs))
for i, id := range userIDs {
wg.Add(1)
go func(i int, id int64) {
defer wg.Done()
users[i], errs[i] = coalescer.GetUser(r.Context(), id)
}(i, id)
}
wg.Wait()
// Same interface. Same concurrency pattern.
// But now: 1 query instead of 200.
}

Context cancellation is handled at both stages: sending the request and waiting for the result. If the caller's context is cancelled (HTTP request timed out, client disconnected), the goroutine unblocks immediately. The request may still be in the batch, but the caller is no longer waiting for it. The result is sent to a buffered channel and garbage-collected.
This transparency is what makes the pattern practical. You do not need to restructure your handlers, change your middleware, or think about batching at the call site. You replace GetUser(ctx, pool, id) with coalescer.GetUser(ctx, id) and the rest of the code is unchanged.
A practical note on wiring the coalescer into your application: create it once, at startup, and inject it into handlers via dependency injection or a shared package. The coalescer is goroutine-safe — that is the entire point. A common pattern is to embed it in your application struct alongside the database pool and config.
Context propagation: the subtle detail
The implementation above uses context.Background() for the batch query. This is the simplest approach, but it has a trade-off worth understanding.
// Context propagation: the subtle detail that matters in production.
//
// The naive version naturally propagates the HTTP request context:
// GetUser(r.Context(), pool, id)
// If the client disconnects, the context cancels, the query cancels.
//
// The coalescer introduces a wrinkle: the batch query serves
// multiple callers with potentially different contexts.
func (qc *QueryCoalescer) executeBatchWithContext(batch []*queryRequest) {
// Only one option can be live at a time; the first two are shown
// commented out so the function compiles.
//
// Option 1: Use context.Background() for the batch query.
// The batch always completes, even if some callers have cancelled.
// Simple, but the query runs even when nobody is waiting.
//   ctx := context.Background()
//
// Option 2: Derive a context that cancels when ALL callers cancel.
// More complex, but avoids wasted work.
//   ctx, cancel := mergedContext(batch)
//   defer cancel()
//
// Option 3 (recommended): Use a timeout derived from your SLO.
// If the batch query takes longer than your SLO, something is wrong.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
rows, err := qc.pool.Query(ctx,
"SELECT id, name, email, plan FROM users WHERE id = ANY($1)",
collectIDs(batch),
)
// ... distribute results as before
}
// mergedContext returns a context that cancels only when every
// input context has cancelled.
func mergedContext(batch []*queryRequest) (context.Context, context.CancelFunc) {
ctx, cancel := context.WithCancel(context.Background())
var remaining int32 = int32(len(batch))
for _, req := range batch {
go func(reqCtx context.Context) {
<-reqCtx.Done()
if atomic.AddInt32(&remaining, -1) == 0 {
cancel()
}
}(req.ctx) // assumes queryRequest carries a ctx field
}
return ctx, cancel
} When 200 goroutines submit requests and 3 of them cancel their contexts (client disconnected, timeout exceeded), the batch query still executes for all 200 IDs. The 3 cancelled goroutines have already returned errors to their callers, so the results are computed and discarded. This is usually fine — the cost of fetching 3 extra rows from an already-running query is negligible.
But if all callers cancel — perhaps a load balancer health check timed out and aborted every in-flight request — the batch query runs to completion against a database that is potentially under stress, returning results that nobody will read. For the common case, this is harmless. For the adversarial case (mass cancellation during a database incident), it wastes a connection.
The recommended approach for most services is Option 3: a static timeout derived from your SLO. If your service's P99 target is 500ms, a 5-second timeout on the batch query catches runaway queries without adding complexity. The merged-context approach (Option 2) is elegant but adds a goroutine per request in the batch, which partially defeats the purpose of reducing goroutine overhead.
I would counsel against over-engineering this. The coalescer's primary value is reducing 200 queries to 1. Whether that 1 query runs with context.Background() or a carefully merged context is, for most services, the difference between a 10x improvement and a 10.01x improvement. Start with the simple version. Optimise the context propagation only if monitoring reveals wasted work during failure scenarios.
How this compares to pgx SendBatch
pgx has a built-in SendBatch mechanism that packs multiple queries into a single protocol exchange. It reduces round trips but does not coalesce queries.
// Alternative: pgx's built-in SendBatch for explicit batching.
// This is useful when you know the full set of IDs upfront.
func GetUsersBatch(ctx context.Context, pool *pgxpool.Pool, ids []int64) ([]*User, error) {
batch := &pgx.Batch{}
for _, id := range ids {
batch.Queue(
"SELECT id, name, email, plan FROM users WHERE id = $1",
id,
)
}
br := pool.SendBatch(ctx, batch)
defer br.Close()
users := make([]*User, 0, len(ids))
for range ids {
var u User
err := br.QueryRow().Scan(&u.ID, &u.Name, &u.Email, &u.Plan)
if err != nil {
return nil, err
}
users = append(users, &u)
}
return users, nil
}
// SendBatch packs multiple queries into a single protocol exchange.
// PostgreSQL receives them all at once and returns results in order.
//
// Pros: built into pgx, no extra infrastructure.
// Cons: requires knowing all IDs upfront. Not transparent.
// Sends N separate queries in one round trip —
// still N parse/plan/execute cycles on the server.
//
// The coalescer approach sends ONE query with ALL IDs.
// One parse. One plan. One index scan returning all rows.
// That is the difference between 10x and 2x.

The distinction is fundamental. SendBatch sends N separate queries in one network round trip. PostgreSQL still parses, plans, and executes each one individually. The coalescer sends one query that returns N rows. PostgreSQL parses it once, plans it once, and performs one index scan that returns all matching rows.
SendBatch saves network time. The coalescer saves network time and server time. On a database under CPU pressure from query planning overhead, the difference is substantial.
SendBatch also requires knowing all the IDs upfront. It is a synchronous API: collect your IDs, build the batch, send it. The coalescer works with goroutines that arrive independently over time. It collects them transparently. The two approaches solve different problems, and the coalescer is the right tool when queries originate from concurrent goroutines that do not know about each other.
There is a third distinction that is often overlooked: prepared statement behaviour. With SendBatch, each queued query uses the same prepared statement (if the SQL text is identical), so the parse cost is paid once. But the bind-and-execute step runs N times. With the coalescer's ANY($1) approach, there is one bind and one execute. On a database with hundreds of concurrent SendBatch operations, the bind/execute overhead becomes measurable — each one acquires lightweight locks and traverses the index independently.
I should be fair to SendBatch: it has genuine advantages. It works with heterogeneous queries — you can batch a SELECT, an UPDATE, and a DELETE in a single round trip. The coalescer is limited to homogeneous point queries on a single table. SendBatch also requires no background goroutine, no channel, no tuning parameters. It is simpler infrastructure. If you know your IDs upfront and your concurrency is moderate, SendBatch is a perfectly reasonable choice. The coalescer earns its complexity when queries arrive from concurrent goroutines that cannot coordinate.
The connection pool effect
The throughput improvement is the headline, but the connection pool impact may be more important for the long-term health of your service. The numbers tell the story.
| Approach | Connections needed | Pool queuing | Tail latency |
|---|---|---|---|
| Naive (200 goroutines) | 200 | Severe | ~15ms P99 |
| Naive (pool_max=50) | 50 (150 queued) | Heavy | ~8ms P99 |
| pgx SendBatch | 1 | None | ~2.5ms P99 |
| Query coalescer | 1 | None | ~1.2ms P99 |
The naive approach with 200 goroutines needs 200 connections to avoid queuing. No production PostgreSQL instance should have 200 connections per application server. The canonical recommendation is connections = (2 * CPU cores) + effective_spindle_count, which yields 10-20 for most servers. A single dashboard load from a single user exceeds the entire connection budget.
With a pool cap of 50, 150 goroutines queue for a connection. The first 50 execute in ~0.45ms each. The queued goroutines wait for a connection to be released, execute, and release. The P99 latency balloons because the 200th goroutine waits for 3 full cycles of connection reuse.
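The queuing arithmetic behind that claim is simple enough to sketch (numbers are the assumed figures from the table above: pool cap 50, ~0.45ms per query):

```go
package main

import "fmt"

// waveLatency estimates worst-case queue wait when n concurrent point
// queries compete for a pool of size poolMax, each query holding its
// connection for perQueryMs milliseconds.
func waveLatency(n, poolMax int, perQueryMs float64) (waves int, worstWaitMs float64) {
	waves = (n + poolMax - 1) / poolMax // ceil(n / poolMax)
	worstWaitMs = float64(waves-1) * perQueryMs
	return waves, worstWaitMs
}

func main() {
	waves, wait := waveLatency(200, 50, 0.45)
	fmt.Printf("%d waves, last waits ~%.2fms before acquiring a connection\n", waves, wait)
}
```

This is a lower bound: it ignores scheduler jitter and connection release latency, both of which push the real P99 higher.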
The coalescer needs one connection. One. The batch query occupies a single connection for ~0.6ms, and the other 19 connections in your pool are available for other work. Under sustained load, when 10 dashboard requests arrive simultaneously, the naive approach needs 2,000 connections. The coalescer needs 10.
This is where the pattern transforms from "nice optimisation" to "architectural necessity." Go services are particularly vulnerable to connection exhaustion because goroutines make it so easy to spawn concurrent database operations. A service with 20 endpoints, each spawning 10-50 concurrent queries, can demand hundreds of connections during a traffic spike. The coalescer collapses that demand by an order of magnitude.
"Go's concurrency model is a gift and a liability in equal measure. Goroutines make parallelism trivial, which means database connection exhaustion is also trivial. The discipline is ensuring that easy concurrency does not become careless concurrency."
— from You Don't Need Redis, Chapter 12: Go: No Fuss, No Nonsense
Benchmarks: where the 10x comes from
Measured on a local PostgreSQL 16 instance, 200 user lookups by primary key, users table with 100,000 rows, pgx v5, Go 1.22.
| Approach | Queries sent | Round trips | Plan/execute cycles | Index scans | Wall time | Throughput |
|---|---|---|---|---|---|---|
| Sequential queries | 200 | 200 | 200 | 200 | ~60ms | 1x |
| Concurrent goroutines (naive) | 200 | 200 | 200 | 200 | ~3ms | ~3x |
| pgx SendBatch | 200 | 1 | 200 | 200 | ~2ms | ~4x |
| Query coalescer (ANY) | 1 | 1 | 1 | 1 | ~0.6ms | ~10x |
| Gold Lapel (automatic) | 1 | 1 | 1 | 1 | ~0.7ms | ~9x |
The sequential baseline is 60ms. Concurrent goroutines bring it to 3ms by parallelizing the round trips, but the total database work is unchanged. pgx SendBatch eliminates the round trips, bringing it to 2ms, but PostgreSQL still processes 200 individual queries. The coalescer collapses everything into a single query, a single round trip, and a single index scan that returns 200 rows in 0.6ms.
The 10x improvement over naive concurrent goroutines becomes more dramatic at scale. At 1,000 concurrent lookups, the naive approach starts hitting connection pool limits and tail latency spikes. The coalescer stays flat: one query, one round trip, regardless of concurrency.
I should provide context for these numbers. The 10x figure is measured on a local PostgreSQL instance with zero network latency. Over a real network — a separate database server, or worse, a managed database in a different availability zone — the per-query overhead grows because each round trip now includes 0.5-2ms of network time. In that environment, the coalescer's advantage is even larger, because it eliminates 199 round trips that each carry real network cost. I have observed 20-30x improvements when the database is across a network boundary. The local measurement is the conservative case.
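A back-of-envelope model makes the network argument tangible. Everything here is an illustrative assumption, not a measurement, and the model deliberately ignores connection queuing and server CPU, both of which make the naive case worse still:

```go
package main

import "fmt"

// naiveMs models the fan-out approach: each round trip pays the
// network RTT, and queries execute in waves limited by the pool size.
func naiveMs(n int, rttMs, execMs float64, poolSize int) float64 {
	waves := float64((n + poolSize - 1) / poolSize)
	return waves * (rttMs + execMs)
}

// coalescedMs models the coalescer: one round trip, one batch query.
func coalescedMs(rttMs, batchExecMs float64) float64 {
	return rttMs + batchExecMs
}

func main() {
	// 200 lookups, 1ms RTT, 0.2ms per-query execution, 20 connections.
	fmt.Printf("naive:     ~%.1fms\n", naiveMs(200, 1.0, 0.2, 20))
	fmt.Printf("coalesced: ~%.1fms\n", coalescedMs(1.0, 0.6))
}
```

Even this latency-only model shows the cross-network gap widening, and it understates the real difference because the 200 queued round trips also contend for connections and server CPU.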
Gold Lapel achieves the same result automatically, at the proxy layer; the extra 0.1ms is protocol parsing overhead. The application sends 200 individual queries and Gold Lapel coalesces them before they reach PostgreSQL. No query changes and no coalescer to build or maintain: install the proxy and connect through it.
Tuning the batch window
The batch window duration is the single most important tuning parameter. It controls the trade-off between batching effectiveness (longer windows collect more queries) and latency (longer windows delay individual requests).
// Tuning the batch window: the central trade-off.
//
// Too short (< 100us): batches are small, limited coalescing benefit.
// Too long (> 5ms): adds visible latency to individual requests.
// Sweet spot: 500us - 2ms for web services.
// For lakeFS's workload (metadata-heavy Git operations):
// window: 1ms, maxBatch: 500
//
// For a typical API (dashboard loads, list endpoints):
// window: 500us, maxBatch: 200
//
// For background jobs (batch processing, ETL):
// window: 5ms, maxBatch: 1000
coalescer := NewQueryCoalescer(pool, 500*time.Microsecond, 200)
// Monitor batch effectiveness with metrics:
type CoalescerMetrics struct {
BatchesExecuted uint64 // total batches sent
QueriesCoalesced uint64 // total individual requests served
AvgBatchSize float64 // QueriesCoalesced / BatchesExecuted
P99Latency time.Duration
}
// If AvgBatchSize < 2: your window is too short or concurrency is too low.
// The coalescer adds overhead without benefit. Use direct queries.
//
// If AvgBatchSize > 100: you might benefit from a longer window.
// But check P99 — if it exceeds your SLO, the window is too long.
//
// If AvgBatchSize is 5-50: the sweet spot for most web services.

The average batch size is your primary metric. If it is consistently below 2, the coalescer is adding latency without meaningful batching. This happens when your concurrency is low or requests arrive spaced out. Below a batch size of 2, you are better off with direct queries.
If batches frequently fill to maxBatch before the timer fires, the window is too long for your traffic level. Reduce the window until the timer becomes the more common flush trigger.
For the lakeFS workload — high-concurrency metadata operations within Git-like transactions — a 1ms window with a 500-item max batch was optimal. For a typical REST API serving dashboard pages, 500 microseconds with a 200-item max is a reasonable starting point.
There is a third scenario that catches teams off guard: batch size oscillation. If your traffic is bursty — quiet for 50ms, then 200 requests in 1ms, then quiet again — the average batch size looks healthy but the distribution is bimodal. Most batches are either 1 (during quiet periods, adding unnecessary latency) or 200 (during bursts, where the window is irrelevant because the batch fills instantly). For bursty traffic, a shorter window minimises the quiet-period penalty while the burst-period batching happens naturally through channel buffering.
Adaptive windows: when static tuning is not enough
A static batch window is a compromise. The optimal window depends on traffic volume, which changes throughout the day. During peak hours, a 500-microsecond window collects 50-100 queries per batch. At 3 AM, the same window collects 1-2 queries and adds 500 microseconds of latency to every request for negligible benefit.
// Adaptive window: adjust the batch window based on observed traffic.
// This avoids the worst case of a static window during low-traffic periods.
type AdaptiveCoalescer struct {
pool *pgxpool.Pool
incoming chan *queryRequest
minWindow time.Duration
maxWindow time.Duration
maxBatch int
// Rolling average of batch sizes over the last 100 batches.
recentSizes [100]int
sizeIdx int
mu sync.Mutex
}
func (ac *AdaptiveCoalescer) currentWindow() time.Duration {
ac.mu.Lock()
defer ac.mu.Unlock()
var sum int
var count int
for _, s := range ac.recentSizes {
if s > 0 {
sum += s
count++
}
}
if count == 0 {
return ac.minWindow
}
avg := float64(sum) / float64(count)
// Low batch sizes → shorter window (less waiting, less overhead).
// High batch sizes → longer window (more coalescing benefit).
if avg < 3 {
return ac.minWindow
}
if avg > 50 {
return ac.maxWindow
}
// Linear interpolation between min and max.
ratio := (avg - 3) / (50 - 3)
window := ac.minWindow + time.Duration(
float64(ac.maxWindow-ac.minWindow)*ratio,
)
return window
}
func (ac *AdaptiveCoalescer) recordBatch(size int) {
ac.mu.Lock()
ac.recentSizes[ac.sizeIdx%100] = size
ac.sizeIdx++
ac.mu.Unlock()
}

The adaptive coalescer tracks recent batch sizes and adjusts the window accordingly. When batches are small (low traffic), it shortens the window to minimise latency overhead. When batches are large (high traffic), it lengthens the window to capture more queries per batch.
This is not strictly necessary — a well-tuned static window works for most services. But if your traffic patterns have a strong diurnal cycle (peak during business hours, quiet overnight) and you have tight latency SLOs, the adaptive approach avoids the compromise of tuning for either extreme.
I should note that this adds meaningful complexity. The rolling average introduces state, the lock introduces contention (minor, but nonzero), and the linear interpolation is a heuristic that may not suit all traffic shapes. If you find yourself tuning the tuner, you have likely overshot. The static window is the right starting point. Graduate to adaptive only if monitoring reveals the static window's limitations.
An honest assessment: when the coalescer is the wrong tool
I should be forthcoming about the pattern's boundaries, because pretending they do not exist would be a disservice to you and an embarrassment to me.
Low-concurrency services. If your service handles 10 requests per second and each request performs 1-3 database queries, the coalescer adds latency and complexity for no benefit. Batching requires concurrent queries to batch. A service that processes requests one at a time has nothing to coalesce. The sweet spot begins around 50 concurrent queries per batch window, which typically means 500+ requests per second with fan-out endpoints.
Non-point queries. The coalescer works for WHERE column = $1 patterns. It does not generalise to range queries (WHERE created_at > $1), multi-column filters (WHERE org_id = $1 AND role = $2), joins, or aggregations. Those are different problems that require different solutions. Attempting to force them into a coalescer creates a maintenance burden disproportionate to the benefit.
Write-heavy workloads. The pattern is designed for reads. You could, in theory, coalesce INSERTs into a single multi-row INSERT, but the error semantics become treacherous. If one row in a batch INSERT violates a unique constraint, the entire batch fails. Distributing that failure correctly to the goroutine that submitted the conflicting row — while succeeding for all others — requires per-row error handling that PostgreSQL's multi-row INSERT does not provide. Use individual INSERTs or explicit transactions for writes.
Latency-sensitive single lookups. If your critical path is a single user lookup that must complete in under 1ms, the coalescer's batch window adds 0.5-2ms of mandatory delay. You can tune the window down to 100 microseconds, but at that point the coalescer rarely collects more than one query and you are paying overhead for no batching. For latency-critical single lookups, a direct query is faster.
Services behind connection poolers. If your Go service connects through PgBouncer in transaction mode, the coalescer's ANY($1) query works fine — it is a single statement, no transaction required. But if PgBouncer is already providing connection multiplexing that mitigates the pool exhaustion problem, the coalescer's value is reduced to server-side CPU savings. Still meaningful, but less dramatic.
The pattern is powerful precisely because it targets a specific, common scenario: many goroutines concurrently fetching individual rows by a single key from the same table. If that describes your workload, the coalescer is transformative. If it does not, the coalescer is overhead.
Testing the coalescer
Testing concurrent batching logic is notoriously tricky. A unit test that calls GetUser once in a single goroutine proves nothing — it exercises the degenerate case where the batch contains one item. The coalescer's value is emergent, and your tests must create the conditions for it to emerge.
// Testing a coalescer is tricky because the value is emergent —
// it only helps when multiple goroutines are active simultaneously.
// Unit tests with a single goroutine prove nothing.
func TestCoalescerBatching(t *testing.T) {
pool := setupTestDB(t)
seedUsers(t, pool, 1000) // seed 1000 test users
qc := NewQueryCoalescer(pool, 1*time.Millisecond, 100)
defer qc.Close()
// Track how many actual queries hit the database.
var queryCount atomic.Int64
// (In production, instrument the pool or use pg_stat_statements.)
const concurrency = 100
var wg sync.WaitGroup
results := make([]*User, concurrency)
errs := make([]error, concurrency)
// Launch all goroutines simultaneously using a barrier.
barrier := make(chan struct{})
for i := 0; i < concurrency; i++ {
wg.Add(1)
go func(idx int) {
defer wg.Done()
<-barrier // wait for the starting gun
results[idx], errs[idx] = qc.GetUser(
context.Background(), int64(idx+1),
)
}(i)
}
// Fire the starting gun.
close(barrier)
wg.Wait()
// Verify correctness: every goroutine got its result.
for i := 0; i < concurrency; i++ {
if errs[i] != nil {
t.Errorf("goroutine %d: unexpected error: %v", i, errs[i])
continue
}
if results[i].ID != int64(i+1) {
t.Errorf("goroutine %d: expected ID %d, got %d",
i, i+1, results[i].ID)
}
}
// Verify batching: should be far fewer batches than goroutines.
batches := qc.batchCount.Load()
requests := qc.requestCount.Load()
avgBatchSize := float64(requests) / float64(batches)
t.Logf("batches=%d requests=%d avg=%.1f", batches, requests, avgBatchSize)
if batches >= uint64(concurrency)/2 {
t.Errorf("expected significant batching, got %d batches for %d requests",
batches, concurrency)
}
if avgBatchSize < 5.0 {
t.Errorf("average batch size %.1f is too low — coalescing not working",
avgBatchSize)
}
}
func TestCoalescerContextCancellation(t *testing.T) {
pool := setupTestDB(t)
qc := NewQueryCoalescer(pool, 10*time.Millisecond, 100) // long window
defer qc.Close()
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Millisecond)
defer cancel()
// The context expires before the batch window closes.
_, err := qc.GetUser(ctx, 1)
if !errors.Is(err, context.DeadlineExceeded) {
t.Errorf("expected DeadlineExceeded, got %v", err)
}
}
func TestCoalescerDeduplication(t *testing.T) {
pool := setupTestDB(t)
seedUsers(t, pool, 10)
qc := NewQueryCoalescer(pool, 1*time.Millisecond, 100)
defer qc.Close()
// 50 goroutines all requesting the same user.
const concurrency = 50
var wg sync.WaitGroup
results := make([]*User, concurrency)
barrier := make(chan struct{})
for i := 0; i < concurrency; i++ {
wg.Add(1)
go func(idx int) {
defer wg.Done()
<-barrier
results[idx], _ = qc.GetUser(context.Background(), 42)
}(i)
}
close(barrier)
wg.Wait()
// All 50 goroutines should get the same user.
for i, u := range results {
if u == nil || u.ID != 42 {
t.Errorf("goroutine %d: expected user 42, got %v", i, u)
}
}
// The batch should have sent only 1 unique ID to PostgreSQL.
// (Verify via pg_stat_statements or instrumented pool.)
}

Three test categories matter.
Correctness under concurrency. Launch N goroutines simultaneously (using a barrier channel), each requesting a different ID. Verify that every goroutine receives the correct result. This catches off-by-one errors in the ID-to-result mapping, race conditions in the deduplication map, and channel misroutings.
Batching effectiveness. With 100 concurrent goroutines and a 1ms window, the coalescer should produce far fewer than 100 batches. If the batch count equals the goroutine count, the collector loop is not waiting long enough, the channel is unbuffered, or goroutines are not arriving concurrently. The barrier pattern — all goroutines block on a closed channel, then race to submit — creates the simultaneous arrival that the coalescer is designed for.
Context cancellation. A context that expires before the batch window closes should return immediately with context.DeadlineExceeded, not block until the batch fires. This test catches missing context checks in the caller API.
I should note that these tests are inherently non-deterministic. Goroutine scheduling, timer resolution, and channel contention all vary between runs. Do not assert exact batch counts — assert ranges. "Fewer than N/2 batches for N goroutines" is a stable assertion. "Exactly 3 batches" is a flaky test waiting to fail on a busy CI server.
The complete implementation
Here is the full, production-ready coalescer with graceful shutdown, context handling, and basic metrics. Copy it. Adapt it to your schema. Measure the batch sizes.
// Production-ready coalescer with context, metrics, and graceful shutdown.
package main
import (
"context"
"fmt"
"log"
"sync"
"sync/atomic"
"time"
"github.com/jackc/pgx/v5"
"github.com/jackc/pgx/v5/pgxpool"
)
type User struct {
ID int64
Name string
Email string
Plan string
}
type QueryCoalescer struct {
pool *pgxpool.Pool
incoming chan *queryRequest
window time.Duration
maxBatch int
done chan struct{}
// Metrics
batchCount atomic.Uint64
requestCount atomic.Uint64
}
type queryRequest struct {
id int64
result chan *queryResult
}
type queryResult struct {
user *User
err error
}
func NewQueryCoalescer(pool *pgxpool.Pool, window time.Duration, maxBatch int) *QueryCoalescer {
qc := &QueryCoalescer{
pool: pool,
incoming: make(chan *queryRequest, maxBatch*2),
window: window,
maxBatch: maxBatch,
done: make(chan struct{}),
}
go qc.loop()
return qc
}
// Close must be called only after all callers have stopped:
// a GetUser racing with Close would send on a closed channel and panic.
func (qc *QueryCoalescer) Close() {
close(qc.incoming)
<-qc.done
}
func (qc *QueryCoalescer) GetUser(ctx context.Context, id int64) (*User, error) {
req := &queryRequest{
id: id,
result: make(chan *queryResult, 1),
}
select {
case qc.incoming <- req:
case <-ctx.Done():
return nil, ctx.Err()
}
select {
case res := <-req.result:
return res.user, res.err
case <-ctx.Done():
return nil, ctx.Err()
}
}
func (qc *QueryCoalescer) loop() {
defer close(qc.done)
for {
first, ok := <-qc.incoming
if !ok {
return
}
batch := []*queryRequest{first}
timer := time.NewTimer(qc.window)
collect:
for {
select {
case req, ok := <-qc.incoming:
if !ok {
timer.Stop()
break collect
}
batch = append(batch, req)
if len(batch) >= qc.maxBatch {
timer.Stop()
break collect
}
case <-timer.C:
break collect
}
}
qc.executeBatch(batch)
}
}
func (qc *QueryCoalescer) executeBatch(batch []*queryRequest) {
qc.batchCount.Add(1)
qc.requestCount.Add(uint64(len(batch)))
idSet := make(map[int64][]*queryRequest, len(batch))
ids := make([]int64, 0, len(batch))
for _, req := range batch {
if _, exists := idSet[req.id]; !exists {
ids = append(ids, req.id)
}
idSet[req.id] = append(idSet[req.id], req)
}
// The batch serves many callers, so no single caller's context applies.
// A production version would attach a timeout here.
rows, err := qc.pool.Query(context.Background(),
"SELECT id, name, email, plan FROM users WHERE id = ANY($1)", ids,
)
if err != nil {
for _, reqs := range idSet {
for _, req := range reqs {
req.result <- &queryResult{err: err}
}
}
return
}
defer rows.Close()
found := make(map[int64]*User, len(ids))
for rows.Next() {
var u User
if scanErr := rows.Scan(&u.ID, &u.Name, &u.Email, &u.Plan); scanErr != nil {
for _, reqs := range idSet {
for _, req := range reqs {
req.result <- &queryResult{err: scanErr}
}
}
return
}
found[u.ID] = &u
}
for id, reqs := range idSet {
user, ok := found[id]
for _, req := range reqs {
if ok {
req.result <- &queryResult{user: user}
} else {
req.result <- &queryResult{err: pgx.ErrNoRows}
}
}
}
}
func main() {
pool, err := pgxpool.New(context.Background(),
"postgres://user:pass@localhost:5432/mydb?pool_max_conns=20",
)
if err != nil {
log.Fatal(err)
}
defer pool.Close()
qc := NewQueryCoalescer(pool, 500*time.Microsecond, 200)
defer qc.Close()
// Simulate 200 concurrent lookups.
var wg sync.WaitGroup
start := time.Now()
for i := int64(1); i <= 200; i++ {
wg.Add(1)
go func(id int64) {
defer wg.Done()
user, err := qc.GetUser(context.Background(), id)
if err != nil {
log.Printf("id=%d err=%v", id, err)
return
}
_ = user
}(i)
}
wg.Wait()
elapsed := time.Since(start)
batches := qc.batchCount.Load()
requests := qc.requestCount.Load()
fmt.Printf("200 lookups in %v (%d batches, avg size %.1f)\n",
elapsed, batches, float64(requests)/float64(batches))
}

Extending this to multiple query types is straightforward: create one coalescer per query pattern. A UserCoalescer, an OrderCoalescer, a ProductCoalescer. Each one batches its respective point queries independently. The pattern generalizes to any SELECT ... WHERE column = $1 that runs concurrently.
It does not generalize easily to range queries, joins, or queries with multiple filter conditions. Those are different problems. The coalescer pattern is specifically for the case where many goroutines are fetching individual rows by a single key — which happens to be one of the most common query patterns in web applications.
A note on the main() function: the simulation launches 200 goroutines simultaneously, which is the ideal case for the coalescer. In production, goroutines arrive over time rather than all at once. The batch sizes you observe in the simulation will be larger than production batch sizes. Use the simulation to verify correctness and measure ceiling performance. Use production metrics to evaluate actual effectiveness.
Operational considerations for production
The coalescer is a piece of concurrent infrastructure running in your application process. It deserves the same operational attention as your HTTP server or database pool.
Graceful shutdown. The Close() method closes the incoming channel and waits for the loop goroutine to drain remaining batches. Call it during your shutdown sequence, after your HTTP server stops accepting new requests but before closing the database pool. The order matters: if you close the pool before draining the coalescer, in-flight batches fail.
Health checks. A coalescer with a growing channel backlog is a symptom of slow batch execution — typically caused by database latency or connection pool exhaustion. Expose the channel length (len(qc.incoming)) as a metric. If it consistently exceeds maxBatch, batches are not draining fast enough. Either the database is slow, the pool is too small, or the batch window is too long.
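A minimal sketch of that health signal, with a plain channel standing in for qc.incoming; the probe type and method names are hypothetical, and in practice the gauge would be sampled periodically and exported to your metrics system:

```go
package main

import "fmt"

// coalescerProbe wraps the backlog gauge. An int channel stands in
// for the coalescer's request channel.
type coalescerProbe struct {
	incoming chan int
	maxBatch int
}

// Depth is the number of requests waiting to be collected.
func (p coalescerProbe) Depth() int { return len(p.incoming) }

// Healthy reports whether batches are draining. A backlog that
// persistently exceeds maxBatch means execution cannot keep up with
// arrivals: slow database, exhausted pool, or an overlong window.
func (p coalescerProbe) Healthy() bool { return p.Depth() <= p.maxBatch }

func main() {
	p := coalescerProbe{incoming: make(chan int, 400), maxBatch: 200}
	for i := 0; i < 250; i++ {
		p.incoming <- i // simulate a backlog building up
	}
	fmt.Println(p.Depth(), p.Healthy())
}
```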
Observability. The metrics in the implementation (batch count and request count) give you the average batch size. In production, you will also want a latency histogram: how long does a caller wait between submitting a request and receiving a result? This captures both the batch window delay and the query execution time. If P99 exceeds your SLO, the batch window is the first suspect.
Multiple coalescers per service. Each query pattern gets its own coalescer. This is fine for 3-5 patterns. At 20+ patterns, the management overhead (initialisation, shutdown, monitoring) becomes burdensome. If you find yourself building a registry of coalescers with dynamic creation, you have outgrown the application-level pattern and should consider infrastructure-level coalescing.
Memory pressure during bursts. Each in-flight request allocates a queryRequest struct and a buffered channel. At 10,000 concurrent requests, that is 10,000 allocations. Go's garbage collector handles this efficiently, but the allocations are visible in heap profiles. If you observe GC pressure during traffic spikes, a sync.Pool for queryRequest structs can reduce allocation rate. In practice, this optimisation is rarely necessary — the GC cost is dwarfed by the database savings.
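If profiles do show allocation pressure, the sync.Pool looks roughly like this. A sketch mirroring the article's queryRequest shape; the drain-before-Put step is the part that matters, because a stale result left in a reused channel would be delivered to the wrong caller on the next checkout:

```go
package main

import (
	"fmt"
	"sync"
)

type queryResult struct{ err error }

type queryRequest struct {
	id     int64
	result chan *queryResult
}

// reqPool recycles request structs and their one-slot result channels.
var reqPool = sync.Pool{
	New: func() any {
		return &queryRequest{result: make(chan *queryResult, 1)}
	},
}

func getRequest(id int64) *queryRequest {
	req := reqPool.Get().(*queryRequest)
	req.id = id
	return req
}

func putRequest(req *queryRequest) {
	// Drain any unread result so the next caller sees an empty channel.
	select {
	case <-req.result:
	default:
	}
	reqPool.Put(req)
}

func main() {
	req := getRequest(42)
	req.result <- &queryResult{}
	putRequest(req) // returned with its result drained
	reused := getRequest(7)
	fmt.Println(reused.id, len(reused.result))
}
```

Note that putRequest must only be called once the caller has finished with the result, which is why the article treats this as a last-resort optimisation.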
The pattern in the wider Go ecosystem
The coalescer pattern is not unique to lakeFS. It appears, in various forms, across the Go ecosystem wherever concurrent database access creates a fan-out problem.
graph-gophers/dataloader implements the same concept for GraphQL resolvers, where field-level resolution naturally creates N+1 query patterns across goroutines. Facebook's original DataLoader (JavaScript) popularised the approach; the Go port adapts it to goroutines and channels.
vikstrous/dataloadgen takes a code-generation approach: you define your loader function and batch function, and it generates the channel-based coalescer with type safety. This eliminates the boilerplate of writing a coalescer per query type, at the cost of a code generation step in your build.
The Sourcegraph codebase uses batched loading extensively in their code intelligence layer, where resolving symbols across thousands of files generates thousands of concurrent database lookups. Their approach is similar to the coalescer presented here, with additional complexity for cross-repository lookups.
The underlying idea — collect concurrent requests and execute them as a batch — is a general concurrency pattern. In the database world, it is sometimes called "request coalescing" or "query batching." In networking, the same idea appears as Nagle's algorithm (buffer small writes, send them together). In operating systems, it appears as I/O merging in the block layer. The principle is universal: when many small operations can be combined into one large operation with lower per-item overhead, the combination wins.
Why this should not be application code
The coalescer works. It delivers 10x throughput. It is a satisfying piece of engineering. And it is approximately 200 lines of Go that you now need to maintain, test, monitor, and debug in every service that could benefit from it.
Consider what the pattern requires at the application level.
One coalescer instance per query pattern per service. A service with 8 point-query endpoints needs 8 coalescers, each with its own channel, goroutine, and tuning parameters.
Monitoring for each coalescer: batch sizes, flush frequencies, queue depths, latency histograms. Without monitoring, you cannot tune the window, and a poorly tuned window either wastes the optimization or adds latency.
Integration tests that verify batching behavior under concurrent load. Unit tests are insufficient because the coalescer's value is emergent — it only helps when multiple goroutines are active simultaneously.
Graceful shutdown handling. The coalescer's goroutine must drain its channel before the process exits, or in-flight requests are dropped.
Team knowledge. Every engineer who touches a database query must know the coalescer exists and choose to use it. A new endpoint that performs point queries without routing through the coalescer silently forgoes the optimisation. There is no lint rule, no compiler warning, no safety net. The pattern is opt-in per call site, and opt-in patterns have a half-life measured in team rotations.
Cross-service coordination. If you have 5 Go services that hit the same PostgreSQL instance, each needs its own coalescer infrastructure. The tuning parameters differ per service because traffic patterns differ. A batch window that works for the API gateway is wrong for the background worker. You are now maintaining 5 independent instances of the same pattern with 5 independent tuning configurations.
And the fundamental limitation: the coalescer can only batch queries it knows about. If a new endpoint performs point queries without using the coalescer, the optimization silently does not apply. There is no safety net.
Gold Lapel performs this coalescing at the proxy layer, where it sees every query from every service. It identifies repeated point-query patterns automatically, coalesces them using the same WHERE id = ANY($1) rewrite, and tunes the batch window based on observed traffic. No per-service code. No per-endpoint configuration. No monitoring infrastructure to maintain. One connection string change, and every point query across every service gets the benefit.
The coalescer pattern in this article is real, and it works. The lakeFS team shipped it and measured the 10x gain. But the question it leaves you with is not "can I build this?" — you now have the code. The question is "should every Go service that talks to PostgreSQL carry its own batching infrastructure?" The answer, respectfully, is no. That is infrastructure's job. And a well-run household does not ask every guest to manage the plumbing.
The batching pattern you have just learned is, at its heart, an elegant cure for the N+1 problem. I have written a companion piece on N+1 queries that examines the affliction from every angle — detection, ORM-level fixes, and the proxy-based approaches that catch what code review misses.