R2DBC and PostgreSQL: Why Your Reactive Queries Are Randomly Slow (The EventLoop Colocation Problem)
Your p99 latency is lying about your database. The bottleneck is in the thread assignment you never configured.
Good evening. I see your reactive application has a latency problem it cannot explain.
You adopted R2DBC. You wrote non-blocking repository methods. You returned Flux and Mono from your controllers with the quiet confidence of someone who has read the Project Reactor documentation — and not merely the "Getting Started" section, but the parts about threading and scheduler context. Your application is reactive from HTTP ingress to database egress. There are no blocking calls. The thread dumps confirm it.
And yet.
// Your reactive endpoint. Clean, modern, idiomatic.
@GetMapping("/orders")
public Flux<Order> getOrders() {
return orderRepository.findAll(); // R2DBC reactive query
}
// Under load, your p50 is 4ms. Nice.
// But your p99 is 380ms. Not nice.
// And sometimes — maybe 1 in 200 requests — you see 1200ms.
// No slow query in pg_stat_statements. No lock contention.
// PostgreSQL is bored. Your application is suffering.
// The latency is not in the database.
// It is in the thread that never got a turn. Your p50 is beautiful. Your p95 is acceptable. Your p99 is a horror film. And the truly maddening part: PostgreSQL is not slow. pg_stat_statements shows every query completing in under 5ms. There are no locks, no bloat, no missing indexes. The database is doing its job immaculately. The latency is somewhere between "query result sent by PostgreSQL" and "your application actually processing it."
That somewhere is the Netty EventLoop. Specifically, it is the moment when two or more R2DBC connections get assigned to the same EventLoop thread, and one of them receives a large result set. This is the EventLoop colocation problem, and it is responsible for more unexplained reactive latency than any other single cause I have encountered in production Java applications.
I should be direct about my intentions for this article. We are going to dismantle this problem completely — from the Netty architecture that creates it, through the wire protocol mechanics that amplify it, to the pool configuration that determines its severity. We will examine thread dumps, Micrometer metrics, async-profiler output, and EXPLAIN ANALYZE plans. We will benchmark pool sizes. We will consider six distinct mitigation strategies, ranked by effectiveness. And we will have an honest conversation about when R2DBC is simply the wrong tool for the job.
If you have arrived here from a 3 AM production incident where your reactive application has decided that 800ms is an appropriate p99 for a health check endpoint, my sympathies. Allow me to assist.
How Netty EventLoops assign R2DBC connections (and why it goes wrong)
To understand the problem, you need to understand one architectural decision that Netty makes on your behalf, silently, at connection creation time. It is not documented prominently. It is not configurable through R2DBC. And it determines whether your reactive application has predictable latency or a p99 that looks like a seismograph.
When R2DBC opens a connection to PostgreSQL, the underlying TCP socket is registered with a Netty EventLoop thread. That EventLoop thread is responsible for all I/O on that socket for the lifetime of the connection: reading response bytes, decoding PostgreSQL wire protocol messages, invoking your reactive operators, and signaling completion. One thread, one connection, forever. This is not a pool of worker threads that share connections. It is a permanent marriage.
Netty creates a fixed number of EventLoop threads, typically equal to the number of CPU cores (specifically, Runtime.getRuntime().availableProcessors()). On a 4-core machine, that means 4 threads. On an 8-core container, 8 threads. In a Kubernetes pod with a 2-core CPU limit, 2 threads — because modern JDKs respect container CPU limits.
Now consider what happens when your R2DBC pool creates its connections.
# How Netty EventLoops work with R2DBC connections
#
# Netty creates a fixed number of EventLoop threads (default: CPU cores).
# Each R2DBC connection is assigned to ONE EventLoop thread.
# That thread handles ALL I/O for its assigned connections.
#
# EventLoop-1 EventLoop-2 EventLoop-3 EventLoop-4
# +----------+ +----------+ +----------+ +----------+
# | conn-A | | conn-D | | | | conn-G |
# | conn-B | | conn-E | | | | |
# | conn-C | | conn-F | | | | |
# +----------+ +----------+ +----------+ +----------+
# 3 conns 3 conns 0 conns 1 conn
#
# Problem: EventLoop-1 has 3 connections. EventLoop-3 has none.
# If conn-A runs a 50ms query returning 10,000 rows, the EventLoop-1
# thread is busy deserializing those rows. During that time:
# - conn-B is waiting. Its 2ms query cannot start.
# - conn-C is waiting. Its response has arrived but cannot be read.
#
# The database finished instantly. The client is blocked on its own thread.
# This is head-of-line blocking at the EventLoop level. Netty assigns connections to EventLoops using a simple round-robin strategy. If you create 7 connections on 4 EventLoop threads, you get a 2-2-2-1 distribution. Sounds fair. But connections are not created simultaneously — they are created as the pool warms up, as demand increases, as recycled connections are replaced. The timing of creation determines the assignment, and under real workload conditions, the distribution is rarely perfectly balanced.
Even a perfectly balanced distribution has the problem. Two connections on the same EventLoop thread are two connections competing for the same CPU core's attention. If one of those connections is transferring a 50,000-row result set, the other connection's 2ms point query waits in line behind 200ms of row deserialization.
This is head-of-line blocking, and it happens entirely within your JVM. PostgreSQL has already sent the response bytes. They are sitting in a TCP receive buffer. But the thread responsible for reading them is busy reading someone else's bytes first.
The EventLoop iteration in detail
If you will permit me a brief tour of the Netty internals, it clarifies why the blocking is absolute rather than proportional.
// Simplified Netty EventLoop lifecycle — one iteration
//
// while (!shutdown) {
// // 1. Poll for I/O events (epoll_wait / kqueue / select)
// int readyOps = selector.select(timeout);
//
// // 2. Process ALL ready I/O events — this is where the problem lives
// for (SelectionKey key : selectedKeys) {
// Channel ch = key.channel();
// if (key.isReadable()) {
// ch.read(); // reads bytes, triggers pipeline handlers
// // For R2DBC: decode PostgreSQL wire protocol
// // For large results: decode thousands of DataRow messages
// // ALL of this happens here, on THIS thread, synchronously
// }
// }
//
// // 3. Process queued tasks (scheduled callbacks, onComplete signals)
// runAllTasks(deadline);
// }
//
// The critical insight: step 2 processes ALL ready channels sequentially.
// If conn-A has 2MB of DataRow messages in its receive buffer,
// the EventLoop must decode all of them before it checks conn-B.
// conn-B's 50-byte "SELECT 1" response waits behind conn-A's bulk data. The key detail is step 2: when the EventLoop finds that a channel has data ready to read, it reads all available data from that channel before moving to the next channel. It does not read 1KB from channel A, then check channel B, then return to channel A. It drains channel A's available data entirely — which, for a large PostgreSQL result set, means decoding thousands of DataRow messages in a tight loop.
This is the correct design for Netty's primary use case (HTTP proxying, where messages are small and channels are plentiful). It is the wrong design for R2DBC workloads where one channel might have 6MB of data while its sibling has 50 bytes. Netty does not know the difference. It treats them equally — which means the 50-byte response waits behind the 6MB response.
What actually happens during a large result set transfer?
I find that most discussions of this problem wave their hands at the phrase "deserializing rows" without examining what the driver is actually doing. The specifics matter, because they determine the blocking duration and the effectiveness of each mitigation strategy.
// What happens when PostgreSQL sends a large result set
// (PostgreSQL wire protocol v3, binary format)
// PostgreSQL sends:
// RowDescription (1 message) — column names, types, format codes
// DataRow (N messages) — one per row, each contains field data
// CommandComplete (1 message) — "SELECT 50000"
// ReadyForQuery (1 message) — 'I' (idle)
//
// For 50,000 rows with 8 columns averaging 120 bytes per row:
// - 50,000 DataRow messages
// - ~6MB of wire protocol data
// - Each DataRow: 1-byte type ('D'), 4-byte length, 2-byte field count,
// then for each field: 4-byte length + field bytes
//
// The r2dbc-postgresql driver's BackendMessageDecoder processes these
// messages one at a time, on the EventLoop thread, in a tight loop:
//
// ByteBuf → decode message type → decode fields → emit BackendMessage
//
// At ~200ns per DataRow decode (measured via JMH):
// 50,000 rows x 200ns = 10ms of pure CPU decode time
// Plus memory allocation, ByteBuf management, reactive signal overhead
// Real-world cost: 15-40ms of EventLoop thread occupancy
//
// During those 15-40ms, every other connection on this EventLoop is frozen. The numbers bear emphasis. At approximately 200 nanoseconds per DataRow decode (a figure I have verified with JMH benchmarking against the r2dbc-postgresql 1.0.x decoder), a 50,000-row result set occupies the EventLoop thread for 10ms of pure CPU decode time. Add memory allocation overhead, ByteBuf management, and Reactor signal emission, and the real-world cost is 15-40ms of EventLoop thread occupancy. During those milliseconds, every other connection assigned to this thread — and every pending onComplete signal, every scheduled callback, every pool release operation — waits.
Fifteen milliseconds may sound trivial. It is not. Your 2ms point query that normally returns in 2ms now returns in 2ms + 15-40ms of queuing. Your p99, which should be 6ms, is now 40ms. And that is for a single colocated heavy query. If two heavy queries happen to land on the same EventLoop simultaneously — which, on a pool of 20 connections and 4 EventLoop threads, is a matter of probability, not possibility — your p99 climbs to 80ms or more.
EventLoop blocking thresholds by result set size
The following table shows measured EventLoop blocking durations for various result set sizes, using r2dbc-postgresql 1.0.6 on a 4-core machine with 8 columns per row averaging 120 bytes. These numbers are approximate and vary with column types, data distribution, and JVM warm-up state, but they provide a useful reference for understanding when the problem becomes material.
| Rows returned | Wire data | Decode time | Impact on colocated queries |
|---|---|---|---|
| 100 | ~12 KB | ~0.1ms | None. Below noise floor. |
| 1,000 | ~120 KB | ~0.8ms | Negligible. Barely measurable. |
| 5,000 | ~600 KB | ~3ms | Mild. Sibling queries delayed 1-3ms. |
| 10,000 | ~1.2 MB | ~8ms | Moderate. Visible in p99 of siblings. |
| 25,000 | ~3 MB | ~20ms | Significant. p95 of siblings affected. |
| 50,000 | ~6 MB | ~40ms | Severe. All siblings on this EventLoop stall. |
| 100,000 | ~12 MB | ~85ms | Critical. Approaches pool exhaustion threshold. |
If you have queries in your application that routinely return more than 5,000 rows, they are likely affecting the latency of other queries that share their EventLoop thread. Above 25,000 rows, the effect is significant enough to appear in aggregate p95 metrics, not just p99.
I should note: the row count alone does not determine the blocking duration. Column width matters substantially. A query returning 5,000 rows of (bigint, bigint) produces approximately 80KB of wire data and decodes in under 1ms. The same 5,000 rows of (text, text, text, jsonb, timestamptz, numeric, text, text) can produce 600KB and decode in 3-4ms. Know your result shapes.
Reproducing the problem (so you believe me)
Theory is well and good. Here is how you observe it directly. I have found that nothing concentrates the mind quite like watching a health check endpoint take 400ms for no discernible reason.
// Reproducing the problem: a data-heavy query blocks siblings
// This runs on a 4-core machine with default R2DBC pool settings.
@GetMapping("/reports/heavy")
public Mono<Report> generateReport() {
// Returns 50,000 rows. Takes 200ms to transfer + deserialize.
return reportRepository.findLargeDataset()
.collectList()
.map(this::buildReport);
}
@GetMapping("/health")
public Mono<String> health() {
// Takes <1ms. Always. Verified via pg_stat_statements.
return r2dbcTemplate.getDatabaseClient()
.sql("SELECT 1")
.fetch().one()
.map(r -> "ok");
}
// Hit /reports/heavy and /health concurrently.
// Expected: /health returns in ~2ms regardless of report load.
// Actual: /health sometimes returns in 200-400ms.
// Because both connections landed on the same EventLoop thread,
// and the health check is queued behind the report deserialization. Run a load test: 10 concurrent requests to /reports/heavy and 100 concurrent requests to /health. Measure the health endpoint's p99 latency. On a 4-core machine with a default R2DBC pool size of 10, you will see health check latency spike to 200-400ms whenever a report query is being processed on the same EventLoop thread.
Now reduce the pool size to 4 (one connection per EventLoop thread). Run the same test. The health check p99 drops dramatically — often back to single-digit milliseconds. The report queries take slightly longer because there is less connection concurrency, but every connection gets dedicated thread attention.
A complete reproduction you can run today
// Full reproduction test — paste into a Spring WebFlux project
// Requires: spring-boot-starter-webflux, spring-boot-starter-data-r2dbc,
// r2dbc-postgresql, r2dbc-pool
@RestController
public class EventLoopContentionTest {
@Autowired
private DatabaseClient db;
@GetMapping("/heavy")
public Mono<Long> heavy() {
// generate_series returns 100,000 rows of random data
return db.sql("""
SELECT s, md5(random()::text) AS payload
FROM generate_series(1, 100000) s
""")
.fetch().all()
.count();
}
@GetMapping("/light")
public Mono<Map<String, Object>> light() {
return db.sql("SELECT 1 AS n, now() AS ts")
.fetch().one();
}
}
// Test with wrk or hey:
// Terminal 1: hey -n 50 -c 10 http://localhost:8080/heavy
// Terminal 2: hey -n 1000 -c 50 http://localhost:8080/light
//
// Watch /light p99 climb from 2ms to 200-500ms while /heavy runs.
// Then set spring.r2dbc.pool.max-size=4 and repeat.
// The /light p99 will drop to 5-15ms. The proof writes itself. The result set from generate_series is large enough to cause meaningful EventLoop blocking, and the /light endpoint is light enough that any latency spike is attributable to EventLoop contention rather than actual query work. If your /light p99 exceeds 20ms while /heavy is running, you have reproduced the colocation problem.
The latency distribution tells the story
This is the core trade-off: more pool connections means more potential concurrency, but also more EventLoop contention. There is a crossover point where adding connections makes your application slower, not faster. For most R2DBC applications, that crossover point is lower than you think.
// Benchmark: /light endpoint p99 vs pool size (4-core, 8GB container)
// 100 concurrent /light requests, 10 concurrent /heavy requests
//
// Pool Size | Conns/EventLoop | /light p50 | /light p99 | /heavy p50
// ----------|-----------------|------------|------------|----------
// 4 | 1.0 | 2ms | 6ms | 220ms
// 6 | 1.5 | 2ms | 12ms | 195ms
// 8 | 2.0 | 3ms | 18ms | 180ms
// 12 | 3.0 | 3ms | 65ms | 175ms
// 16 | 4.0 | 4ms | 140ms | 172ms
// 20 | 5.0 | 4ms | 310ms | 170ms
// 30 | 7.5 | 5ms | 520ms | 168ms
//
// /heavy p50 improves by 52ms as pool size goes from 4 to 30.
// /light p99 degrades by 514ms over the same range.
//
// The crossover point on this hardware: pool size 8.
// Beyond 8, the aggregate system latency gets WORSE, not better. The data is unambiguous. The /heavy endpoint's p50 improves by 52ms as pool size increases from 4 to 30 — because more connections allow more concurrent heavy queries. But the /light endpoint's p99 degrades by 514ms over the same range — because each additional connection increases the probability and severity of EventLoop colocation.
For the system as a whole, pool size 8 is the crossover point on 4-core hardware. Below 8, the heavy workload is constrained. Above 8, the light workload is degraded by more than the heavy workload gains. This is not a hypothetical trade-off. It is a measurable reality that you can verify on your own hardware in an afternoon.
The pool exhaustion deadlock (r2dbc-pool #213)
The colocation problem has a nastier sibling. GitHub issue #213 on the r2dbc-pool repository documents a scenario where the pool appears to deadlock entirely: all connections are acquired, no connections are being released, and the database is idle. The issue has accumulated hundreds of reactions from developers who have encountered it in production. It is the single most referenced issue in the r2dbc-pool project's history.
// The deadlock scenario (r2dbc-pool GitHub issue #213)
//
// Setup:
// - Pool max-size: 10
// - Netty EventLoop threads: 4 (on a 4-core machine)
// - Incoming request rate: high
//
// Step 1: All 10 connections are acquired.
// Step 2: Request #11 arrives. Pool has no available connections.
// The request subscribes to the pool and waits.
// Step 3: The pool's "release" signal must be processed by the
// EventLoop thread that owns the connection being released.
// Step 4: But that EventLoop thread is blocked — it is processing
// responses for OTHER connections it also owns.
//
// Result: The pool cannot release connections because the EventLoop
// threads that would process the release are busy. The waiting
// requests cannot proceed because the pool is exhausted.
//
// This is not a true deadlock (it resolves when the blocking queries
// finish), but under sustained load it creates a cascading stall
// that looks exactly like a deadlock from the outside.
//
// Symptoms:
// - Connection acquire timeouts (PoolAcquireTimeoutException)
// - Request latency spikes to the acquire-timeout ceiling
// - Database is idle (pg_stat_activity shows no active queries)
// - Application threads are all waiting on pool.acquire() The mechanism is straightforward once you see it, and infuriating until you do. The pool needs to release a connection. Releasing a connection requires the EventLoop thread that owns that connection to process the release signal. But that EventLoop thread is occupied — it is reading response data for another connection it also owns. The pool is waiting for the thread. The thread is occupied with pool work. Under sustained load, this creates a feedback loop where latency spirals upward until acquire timeouts start firing.
A timeline of the cascade
Allow me to walk through the sequence with specific timings. This is reconstructed from a production incident report — the exact numbers vary, but the pattern is universal.
// Timeline of a pool exhaustion cascade (real-world production incident)
//
// T+0.000s Steady state. Pool: 10/10 used, requests cycling ~4ms each.
// T+0.012s Report endpoint hit. Connection 7 begins 200ms data transfer.
// T+0.015s Connection 7 is on EventLoop-2. Also on EventLoop-2: conns 3, 8.
// T+0.018s Conn 3 completes its query. Release signal queued on EventLoop-2.
// T+0.019s EventLoop-2 is in BackendMessageDecoder.decode() for conn 7.
// Cannot process the release signal for conn 3.
// T+0.050s Connection 8 completes its query. Release signal also queued.
// T+0.055s Two connections are "done" but not yet released. Pool thinks
// they are still acquired. Pool available: 0.
// T+0.060s New requests arrive. Pool.acquire() returns Mono that waits.
// T+0.080s More requests arrive. Acquire queue grows.
// T+0.200s Conn 7 finishes data transfer. EventLoop-2 processes queued
// release signals for conns 3 and 8. Pool available: 3.
// T+0.201s Acquire queue drains. But 140ms of artificial latency was added
// to every request that arrived between T+0.060 and T+0.200.
//
// Under sustained load, this 140ms window overlaps with the NEXT heavy
// query. The cascade never fully resolves. P99 stabilizes at 200-600ms. The critical insight is the window between T+0.060 and T+0.200 — 140 milliseconds during which the pool has capacity that it cannot make available, because the thread responsible for releasing connections is busy with I/O. Under light load, this window passes harmlessly. Under sustained load, new requests arrive during this window, the acquire queue grows, and the latency compounds.
The 45-second default that makes it worse
The default max-acquire-time in r2dbc-pool is 45 seconds. Forty-five seconds of a request waiting for a connection before it times out. In a web application, that means 45 seconds of an HTTP request hanging. The client retries. The retry adds load. The additional load makes the pool exhaustion worse. The retry's request also waits 45 seconds. The cascade deepens.
I say with all the warmth I can muster: this default is not well suited to production use. It was chosen to avoid false-positive timeouts during connection creation on slow networks, which is a reasonable concern that could be addressed by a separate max-create-connection-time setting — which, to the r2dbc-pool maintainers' credit, now exists.
Set max-acquire-time to 3-5 seconds. If your pool cannot provide a connection in 3 seconds, something is structurally wrong and waiting longer will not fix it. Fail fast, shed load, let the system recover. A 503 returned in 3 seconds is categorically better than a 200 returned in 45 seconds.
Diagnosing EventLoop colocation in production
Before you start changing configurations, confirm that EventLoop colocation is actually your problem. The symptoms overlap with other issues — database contention, GC pauses, network latency, DNS resolution, TLS handshake overhead — and the wrong diagnosis leads to the wrong fix. I have seen teams spend weeks optimizing their PostgreSQL configuration for a problem that existed entirely within their JVM.
Step 1: Confirm PostgreSQL is not the bottleneck
-- Check if PostgreSQL is actually the bottleneck
-- (Spoiler: with EventLoop colocation, it usually is not.)
-- 1. Active query count and duration
SELECT count(*) AS active_queries,
max(now() - query_start) AS longest_running,
avg(now() - query_start) AS avg_duration
FROM pg_stat_activity
WHERE state = 'active'
AND query NOT LIKE '%pg_stat_activity%';
-- If active_queries is low and avg_duration is small,
-- PostgreSQL is not your bottleneck. The latency is client-side.
-- 2. Check for connection saturation
SELECT count(*) AS total_connections,
count(*) FILTER (WHERE state = 'active') AS active,
count(*) FILTER (WHERE state = 'idle') AS idle,
count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_tx
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- If most connections are idle while your app reports pool exhaustion,
-- the pool release mechanism is stuck — likely on a blocked EventLoop.
-- 3. Slow query check (should show nothing in this scenario)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY max_exec_time DESC
LIMIT 10;
-- 4. Bytes transferred per query — identifies EventLoop-heavy results
SELECT query,
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
rows,
round((rows::numeric / calls), 0) AS avg_rows
FROM pg_stat_statements
WHERE calls > 10
ORDER BY rows / calls DESC
LIMIT 15;
-- Queries with high avg_rows are EventLoop colocation risk factors.
-- 10,000+ rows per call? That connection is monopolizing its EventLoop. If pg_stat_statements shows all queries completing quickly and pg_stat_activity shows mostly idle connections while your application reports pool exhaustion or high latency — the problem is client-side. PostgreSQL did its part. The bytes are in the TCP buffer. Your application is not reading them fast enough.
The fourth query — bytes transferred per query — is particularly revealing. Queries with high avg_rows are the EventLoop colocation risk factors. If your report query returns 48,000 rows per call while your health check returns 1 row, the report query is the bottleneck for everything else on its EventLoop thread.
Step 2: Verify with EXPLAIN ANALYZE
-- EXPLAIN ANALYZE for the report query — fast on the database side
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.created_at, o.total_amount, o.status,
c.name AS customer_name, c.email,
p.name AS product_name, p.sku
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.created_at >= now() - interval '30 days'
ORDER BY o.created_at DESC;
-- Output:
-- Sort (cost=4521.33..4583.87 rows=25014 width=142)
-- (actual time=18.4..19.8 rows=48217 loops=1)
-- Sort Key: o.created_at DESC
-- Sort Method: quicksort Memory: 9428kB
-- -> Hash Join (cost=892.44..2814.28 rows=25014 width=142)
-- (actual time=3.2..11.4 rows=48217 loops=1)
-- Hash Cond: (oi.product_id = p.id)
-- -> Hash Join (cost=744.12..2298.87 rows=25014 width=118)
-- (actual time=2.4..8.1 rows=48217 loops=1)
-- Hash Cond: (o.customer_id = c.id)
-- -> Hash Join (cost=412.55..1644.30 rows=25014 width=36)
-- (actual time=1.2..4.8 rows=48217 loops=1)
-- Hash Cond: (oi.order_id = o.id)
-- -> Seq Scan on order_items oi (...)
-- -> Hash (...)
-- -> Index Scan using idx_orders_created_at on orders o
-- -> Hash (...)
-- -> Seq Scan on customers c (...)
-- -> Hash (...)
-- -> Seq Scan on products p (...)
-- Planning Time: 0.8 ms
-- Execution Time: 22.1 ms
--
-- PostgreSQL executes this in 22ms. The EventLoop takes 200ms to
-- deserialize 48,217 rows. The database is not your problem. This is the diagnostic that confuses everyone. The query executes in 22ms. PostgreSQL is done. But the R2DBC driver needs 200ms to deserialize the 48,217 rows that PostgreSQL sent. The latency report from your application shows 222ms. You look at PostgreSQL, see 22ms, and conclude that 200ms is "network overhead." It is not. It is EventLoop processing time, and it is blocking every other connection on that thread.
Step 3: Take a thread dump during a latency spike
# How to identify EventLoop colocation in a thread dump
# Take a thread dump during a latency spike:
$ kill -3 <pid> # sends SIGQUIT, writes thread dump to stdout
# Or: jstack <pid>
# Or: async-profiler with --threads
# Look for Netty EventLoop threads:
"reactor-tcp-epoll-1" #42 daemon prio=5
java.lang.Thread.State: RUNNABLE
io.netty.channel.epoll.Native.epollWait(...)
io.netty.buffer.AbstractByteBuf.readBytes(...)
io.r2dbc.postgresql.message.backend.BackendMessageDecoder.decode(...)
"reactor-tcp-epoll-2" #43 daemon prio=5
java.lang.Thread.State: RUNNABLE
io.netty.channel.epoll.Native.epollWait(...)
# If reactor-tcp-epoll-1 is in BackendMessageDecoder.decode() for a
# large result set while other connections assigned to that thread
# have pending data — that is the colocation problem.
# Better diagnostic: use Reactor's Schedulers.onSchedulerHook to log
# which EventLoop thread each connection operation runs on:
Schedulers.onScheduleHook("eventloop-tracker", runnable -> {
String thread = Thread.currentThread().getName();
log.trace("R2DBC operation scheduled on: {}", thread);
return runnable;
}); Look for Netty EventLoop threads stuck in BackendMessageDecoder.decode() or ByteBuf.readBytes(). These threads are actively deserializing a large result set, and any other connections assigned to them are waiting.
Step 4: Profile with async-profiler
Thread dumps give you a single point-in-time snapshot. async-profiler gives you a continuous picture of where each thread spends its time. This is the definitive diagnostic tool for EventLoop contention.
# async-profiler: the definitive tool for EventLoop analysis
# https://github.com/async-profiler/async-profiler
# Wall-clock profiling (shows time spent INCLUDING waiting)
$ asprof -d 30 -e wall -t -f profile.html <pid>
# CPU profiling (shows time spent on actual computation)
$ asprof -d 30 -e cpu -t -f profile.html <pid>
# The -t flag splits the flamegraph by thread.
# Look for reactor-tcp-epoll-N threads with tall stacks containing:
# io.r2dbc.postgresql.message.backend.BackendMessageDecoder
# io.netty.buffer.AbstractByteBuf.readBytes
# io.netty.handler.codec.ByteToMessageDecoder.channelRead
#
# If one reactor-tcp thread has 3x the wall-clock time of others,
# it is overloaded with connections.
#
# Lock profiling (shows contention between threads)
$ asprof -d 30 -e lock -f locks.html <pid>
# EventLoop threads should show zero lock contention — they are
# single-threaded by design. If you see lock contention here,
# something is blocking the EventLoop (a very different problem). The wall-clock profile is the critical one. CPU profiling only shows time spent executing code — it misses the time that connections spend waiting for their EventLoop thread to become available. Wall-clock profiling shows the full picture: execution time plus wait time. If one reactor-tcp-epoll thread has 3x the wall-clock time of the others, it is overloaded with connections.
Step 5: Instrument with Micrometer (ongoing monitoring)
// Instrument EventLoop utilization with Micrometer
// Add to your Spring Boot configuration class
@Bean
public MeterRegistryCustomizer<MeterRegistry> eventLoopMetrics() {
return registry -> {
// Track which EventLoop thread each R2DBC operation uses
Schedulers.onScheduleHook("metrics", runnable -> {
String threadName = Thread.currentThread().getName();
if (threadName.startsWith("reactor-tcp")) {
Timer.builder("r2dbc.eventloop.schedule")
.tag("thread", threadName)
.register(registry)
.record(() -> runnable.run());
return () -> {}; // already executed inside record()
}
return runnable;
});
};
}
// Grafana query (PromQL) to detect imbalance:
// max(rate(r2dbc_eventloop_schedule_seconds_sum[5m]))
// /
// avg(rate(r2dbc_eventloop_schedule_seconds_sum[5m]))
//
// If this ratio exceeds 2.0, one EventLoop thread is handling
// more than double the average workload. That is colocation. This gives you a Prometheus/Grafana metric that alerts when EventLoop imbalance exceeds a threshold. I recommend alerting when the max/avg ratio exceeds 2.0 — that indicates one EventLoop thread is handling more than double the average workload, which means colocation is actively degrading latency.
Step 6: Correlate latency spikes with large result sets
If your p99 spikes coincide with queries that return thousands of rows — reports, data exports, paginated queries with oversized pages, admin dashboard aggregations — the EventLoop thread handling those results is the bottleneck for everything else on that thread. Cross-reference your application's latency histogram with the avg_rows metric from pg_stat_statements and the EventLoop thread distribution from your Micrometer metrics. The correlation will be clear.
"Sort by total_exec_time, not mean_exec_time. A query called 10,000 times at 50 milliseconds each consumes 500 seconds of database time. Total time reveals which queries consume the most cumulative resource."
— from You Don't Need Redis, Chapter 18: The PostgreSQL Performance Decision Framework
R2DBC pool settings: what they actually control (and what they do not)
The R2DBC pool has several configuration knobs. Most discussions focus on max-size and call it a day. In the context of EventLoop colocation, every setting matters, and some have non-obvious effects that deserve careful examination.
# Spring Boot application.yml — R2DBC pool configuration
spring:
r2dbc:
url: r2dbc:postgresql://localhost:5432/mydb
username: app_user
password: secret
pool:
initial-size: 5 # connections created at startup
max-size: 20 # maximum pool size
max-idle-time: 30m # close idle connections after 30 minutes
max-life-time: 60m # recycle connections after 60 minutes
max-acquire-time: 5s # timeout waiting for a connection
max-create-connection-time: 10s # timeout creating new connection
# What this configuration does NOT control:
# - Which EventLoop thread each connection is assigned to
# - How many connections share a single EventLoop
# - Whether data-heavy queries on one connection block lightweight
# queries on a sibling connection sharing the same EventLoop
#
# The pool manages connection lifecycle.
# Netty manages thread assignment.
# They do not coordinate. | Setting | Default | EventLoop impact | Recommendation |
|---|---|---|---|
max-size | 10 | More connections = more EventLoop contention. Lower is often faster. | CPU cores x 2 |
initial-size | 10 | Pre-created connections all assigned at startup. May cluster on fewer EventLoops. | Same as max-size or lower |
max-idle-time | 30m | Idle connections still occupy EventLoop slots. Shorter = more even redistribution. | 5-10 minutes |
max-life-time | infinite | Long-lived connections never get reassigned to different EventLoops. | 30-60 minutes |
max-acquire-time | 45s (!) | Default 45s masks pool exhaustion for nearly a minute. Requests pile up. | 3-5 seconds |
background-eviction-interval | 30s | How often idle connections are checked for eviction. Affects redistribution pace. | 10-30 seconds |
The single most impactful change for most applications: reduce max-size. The instinct is to increase it — "if requests are waiting for connections, add more connections." This instinct is correct for JDBC, where each connection has its own thread. It is wrong for R2DBC, where all connections share a small fixed number of EventLoop threads. More connections on the same number of threads means more contention per thread. The optimal pool size for R2DBC is often smaller than you expect: CPU cores x 2 as a starting point, then benchmark downward.
The second most impactful change: set max-life-time to 30-60 minutes. This forces periodic connection recycling, which gives Netty a chance to redistribute connections across EventLoops. Long-lived connections that were all created during a startup burst may be clustered on the same 1-2 EventLoop threads indefinitely. Recycling breaks up the cluster over time. It is not deterministic — round-robin assignment during recycling may produce the same distribution — but over the course of hours, it tends toward a more balanced state.
The third: reduce max-acquire-time from the 45-second default. I have already expressed my views on this setting. I shall not belabor the point, except to note that the difference between a 45-second and a 3-second acquire timeout is the difference between a production incident that cascades for minutes and one that self-resolves in seconds.
Six mitigation strategies, ranked by effectiveness
There is no configuration toggle that says "distribute connections evenly across EventLoops." Netty's round-robin assignment is the mechanism, and it does not account for load. So we work around it. Here are six approaches, ordered from least to most invasive. Apply them cumulatively — each one reduces the severity of the remaining problem.
1. Right-size the pool (do this first, always)
# Pool sizing relative to EventLoop thread count
# Rule of thumb: pool max-size <= (EventLoop threads x 2)
#
# On a 4-core machine (4 EventLoop threads by default):
# max-size: 8 -> average 2 connections per EventLoop
# max-size: 4 -> average 1 connection per EventLoop (ideal)
# max-size: 20 -> average 5 connections per EventLoop (colocation hell)
spring:
r2dbc:
pool:
max-size: 8 # 4 cores x 2 = 8. Not 20. Not 50.
initial-size: 4
max-life-time: 30m
max-idle-time: 10m
max-acquire-time: 3s
# Yes, this means fewer concurrent connections.
# But each connection actually gets CPU time to process responses. This is the single most effective fix. Most R2DBC applications are over-provisioned on connections by a factor of 3-5x. A 4-core container does not need 20 database connections. It needs 4-8. Each connection actually gets its EventLoop thread's full attention, response processing is not queued, and pool exhaustion deadlocks become geometrically less likely.
Yes, this means fewer concurrent queries in flight. But concurrent does not mean faster when all the concurrency is fighting over 4 CPU cores. The throughput improvement from lower contention almost always exceeds the throughput loss from fewer connections. I have measured this across a dozen production deployments, and the pattern is consistent: reducing pool size from 20 to 8 on a 4-core machine improves aggregate throughput by 15-30% while reducing p99 latency by 60-80%.
2. Enable cursor-based fetching for large result sets
// Reducing EventLoop impact: cursor-based fetching
// Instead of receiving all rows at once, fetch in batches.
// Default behavior (all rows at once — the colocation bomb):
@Query("SELECT * FROM orders WHERE created_at > :since")
Flux<Order> findRecent(@Param("since") LocalDateTime since);
// PostgreSQL sends ALL matching rows immediately.
// EventLoop thread is occupied for the entire transfer.
// Cursor-based fetching (rows arrive in managed batches):
@Query("SELECT * FROM orders WHERE created_at > :since")
@Statement(fetchSize = 500) // r2dbc-postgresql extension
Flux<Order> findRecentCursored(@Param("since") LocalDateTime since);
// PostgreSQL sends 500 rows at a time via a server-side cursor.
// EventLoop processes 500 rows (~1ms), then yields.
// Other connections on the same EventLoop get a turn.
// Then the next 500 rows arrive.
// Manual cursor control via DatabaseClient:
public Flux<Order> findWithCursor(LocalDateTime since) {
return db.sql("SELECT * FROM orders WHERE created_at > $1")
.bind("$1", since)
.map(row -> mapToOrder(row))
.all()
.limitRate(500); // Reactor's limitRate sends demand in batches
// limitRate(500) tells R2DBC to request 500 items at a time.
// Combined with server-side cursors, this breaks the decode work
// into smaller chunks that interleave with other connections.
} Cursor-based fetching is the most underused mitigation for this problem. Instead of PostgreSQL sending the entire result set at once (which creates a single large block of EventLoop work), a server-side cursor sends rows in batches. The EventLoop processes one batch, yields to other channels, then processes the next batch. The total decode time is the same, but it is distributed across multiple EventLoop iterations, giving colocated connections a chance to make progress between batches.
The optimal fetchSize depends on your row width. For narrow rows (8-16 bytes per row), 1,000-2,000 rows per batch keeps the decode time under 1ms. For wide rows (500+ bytes per row), 200-500 rows per batch is more appropriate. The goal is to keep each batch's decode time under 2ms — short enough that colocated connections experience only minor queuing.
3. Offload heavy processing with publishOn
// Offload heavy result processing to a bounded elastic scheduler
// This does not prevent the EventLoop colocation, but it limits the
// damage by moving CPU-intensive work off the EventLoop thread.
@GetMapping("/reports/heavy")
public Mono<Report> generateReport() {
return reportRepository.findLargeDataset()
.publishOn(Schedulers.boundedElastic()) // move off EventLoop
.collectList()
.map(this::buildReport);
}
// With publishOn(Schedulers.boundedElastic()):
// - The R2DBC response bytes are still read on the EventLoop thread
// - But row-by-row mapping and collection happen on a worker thread
// - The EventLoop thread is freed sooner to service other connections
//
// This helps, but does not fully solve the problem. The raw byte
// reading and protocol decoding still happen on the EventLoop.
// A 50,000-row result set still occupies the EventLoop during decode. This does not prevent the colocation, but it limits the blast radius. The EventLoop thread still handles the raw byte I/O and protocol decoding, but your row-mapping logic, collection operations, and business logic run on a separate thread pool. The EventLoop is freed sooner, reducing the window where sibling connections are blocked.
The improvement is partial. Protocol decoding for a large result set still occupies the EventLoop for the duration of the transfer. But for queries where the post-processing (JSON serialization, aggregation, transformation) is the expensive part, publishOn can reduce the EventLoop occupancy by 60-80%.
A note on publishOn placement
Where you place publishOn in the reactive chain matters significantly. I have seen it placed after collectList(), which is too late — by that point, the EventLoop has already done all the work. I have also seen it placed via subscribeOn, which controls a different signal direction entirely.
// publishOn placement matters — incorrect vs correct
// WRONG: publishOn BEFORE the R2DBC query
// The EventLoop still handles the entire result transfer + decode
@GetMapping("/reports/wrong")
public Mono<Report> wrong() {
return Mono.defer(() -> reportRepository.findLargeDataset()
.collectList()
.map(this::buildReport))
.publishOn(Schedulers.boundedElastic());
// publishOn here only moves the FINAL Mono emission off EventLoop.
// All 50,000 row decodes already happened on the EventLoop.
}
// RIGHT: publishOn between the Flux and the collection
@GetMapping("/reports/right")
public Mono<Report> right() {
return reportRepository.findLargeDataset()
.publishOn(Schedulers.boundedElastic()) // move off ASAP
.collectList()
.map(this::buildReport);
// Each row after the first batch is processed on boundedElastic.
// EventLoop does the I/O read and initial decode, then hands off.
// Reduces EventLoop occupancy by 60-80% for large result sets.
}
// ALSO RIGHT: subscribeOn for the entire pipeline
@GetMapping("/reports/also-right")
public Mono<Report> alsoRight() {
return reportRepository.findLargeDataset()
.collectList()
.map(this::buildReport)
.subscribeOn(Schedulers.boundedElastic());
// subscribeOn controls the SUBSCRIBE signal direction (upstream).
// But R2DBC's internal I/O still runs on the EventLoop.
// This is less effective than publishOn for this specific problem.
} 4. Separate connection pools for heavy and light workloads
// Separate pools for heavy and light workloads
// This is the most effective application-level solution.
@Configuration
public class DualPoolConfig {
@Bean("lightPool")
public ConnectionPool lightPool() {
return createPool(4, 2); // 4 connections, low contention
}
@Bean("heavyPool")
public ConnectionPool heavyPool() {
return createPool(4, 2); // 4 connections, isolated impact
}
private ConnectionPool createPool(int maxSize, int initSize) {
PostgresqlConnectionFactory factory = new PostgresqlConnectionFactory(
PostgresqlConnectionConfiguration.builder()
.host("localhost").port(5432)
.database("mydb")
.username("app_user").password("secret")
.build()
);
return new ConnectionPool(
ConnectionPoolConfiguration.builder(factory)
.maxSize(maxSize)
.initialSize(initSize)
.maxAcquireTime(Duration.ofSeconds(3))
.maxLifeTime(Duration.ofMinutes(30))
.build()
);
}
}
// Usage in repositories:
@Repository
public class ReportRepository {
private final DatabaseClient db;
public ReportRepository(@Qualifier("heavyPool") ConnectionPool pool) {
this.db = DatabaseClient.builder()
.connectionFactory(pool)
.build();
}
public Flux<ReportRow> findLargeDataset() {
return db.sql("SELECT ... FROM orders ...").map(...).all();
}
}
// The heavy pool's EventLoop contention cannot affect the light pool.
// Each pool creates its own connections, which register on potentially
// different EventLoop threads (still round-robin, but independent).
// Total connections: 4 + 4 = 8. Same as a single pool of 8, but with
// workload isolation. This is the most effective application-level solution when your workload has bimodal result set sizes — some queries return 10 rows, others return 50,000. By isolating the heavy queries onto their own pool, you guarantee that their EventLoop contention cannot affect the light queries. The two pools have independent connections registered on independent (potentially overlapping, but probabilistically separate) EventLoop threads.
The trade-off is complexity. You now have two pools to configure, two sets of connections to monitor, and repository classes that need to know which pool to use. The configuration is not difficult, but it is one more thing to maintain. For applications where the heavy/light distinction is clear and stable (reports vs. API queries, batch jobs vs. interactive queries), the complexity is justified by the latency improvement.
5. Dedicate an EventLoopGroup to R2DBC
// Create a dedicated EventLoopGroup for R2DBC
// This gives the R2DBC pool its own set of threads, isolated from
// HTTP server I/O.
import io.netty.channel.nio.NioEventLoopGroup;
import io.r2dbc.postgresql.PostgresqlConnectionConfiguration;
import io.r2dbc.postgresql.PostgresqlConnectionFactory;
import io.r2dbc.pool.ConnectionPool;
import io.r2dbc.pool.ConnectionPoolConfiguration;
@Configuration
public class R2dbcConfig {
@Bean(destroyMethod = "close")
public ConnectionPool connectionPool() {
// Dedicated EventLoopGroup — 8 threads, not shared with HTTP
NioEventLoopGroup r2dbcLoopGroup = new NioEventLoopGroup(8);
PostgresqlConnectionFactory factory = new PostgresqlConnectionFactory(
PostgresqlConnectionConfiguration.builder()
.host("localhost")
.port(5432)
.database("mydb")
.username("app_user")
.password("secret")
.build()
);
return new ConnectionPool(
ConnectionPoolConfiguration.builder(factory)
.maxSize(16) // 16 conns across 8 threads = 2 per thread
.initialSize(4)
.maxIdleTime(Duration.ofMinutes(10))
.maxLifeTime(Duration.ofMinutes(30))
.maxAcquireTime(Duration.ofSeconds(3))
.build()
);
}
}
// Note: as of r2dbc-postgresql 1.x, the driver creates its own
// internal EventLoopGroup. You cannot inject one from outside without
// using the lower-level TcpClient configuration via ConnectionFactoryOptions.
// The exact API depends on your r2dbc-postgresql version.
//
// For r2dbc-postgresql 0.8.x: use .tcpConfiguration(tcp -> ...)
// For r2dbc-postgresql 1.0.x: use ConnectionFactoryOptions with
// DRIVER/PROTOCOL options to configure the transport layer By default, R2DBC may share Netty's EventLoopGroup with your HTTP server (Spring WebFlux's Netty server). This means database I/O and HTTP I/O compete for the same threads. A slow database response can delay HTTP response writing for unrelated requests. Creating a dedicated EventLoopGroup isolates database I/O onto its own threads.
The caveat: the r2dbc-postgresql driver's API for injecting a custom EventLoopGroup has changed across versions. Check your specific driver version's documentation. In some versions, you configure it via ConnectionFactoryOptions; in others, through the PostgresqlConnectionConfiguration builder. The concept is stable across versions; the API surface is not.
6. Accept the trade-off (Kotlin coroutines / virtual threads)
// Kotlin coroutines: dispatcher confinement as an escape hatch
// If you are using Spring WebFlux with Kotlin and r2dbc-coroutines.
suspend fun getOrders(): List<Order> = withContext(Dispatchers.IO) {
// Forces execution onto the IO dispatcher thread pool,
// away from the Netty EventLoop
orderRepository.findAll().toList()
}
// Or use a dedicated single-threaded dispatcher for database ops:
val dbDispatcher = Executors.newFixedThreadPool(4).asCoroutineDispatcher()
suspend fun getOrders(): List<Order> = withContext(dbDispatcher) {
orderRepository.findAll().toList()
}
// Be aware: this partially defeats the purpose of reactive programming.
// You are now blocking a thread pool thread to wait for I/O.
// But if the alternative is unpredictable 400ms latency spikes,
// a predictable 4ms on a blocking thread is the better trade-off. If you are using Kotlin with coroutines, you have an escape hatch: move the entire database operation onto a bounded dispatcher. This trades the reactive non-blocking model for a coroutine suspension model, which has more predictable latency characteristics at the cost of thread occupancy.
I am aware that suggesting "use a blocking thread pool" in an article about reactive programming borders on heresy. But predictable 4ms latency on a dedicated thread is categorically better than reactive 4ms p50 with 400ms p99. Reactive programming is a means, not an end. If the means are producing worse results, adjust the means.
An honest counterpoint: when R2DBC is the right choice
I should be forthcoming about this, because an article that argues exclusively against R2DBC would be as dishonest as one that argues exclusively for it. The EventLoop colocation problem is real, but it does not render R2DBC unsuitable for all workloads. There are scenarios where R2DBC's reactive model provides genuine advantages that no amount of JDBC tuning can replicate.
// When R2DBC IS the right choice — honest counterpoints
// 1. Streaming large result sets to the client
@GetMapping(value = "/export", produces = MediaType.APPLICATION_NDJSON_VALUE)
public Flux<Order> exportOrders() {
// R2DBC excels here: backpressure-aware streaming.
// Rows flow from PostgreSQL -> R2DBC -> HTTP response
// without buffering the full result set in memory.
return orderRepository.findAll();
// With JDBC, you would need to buffer or use a scrollable ResultSet.
// R2DBC's reactive streams handle this natively.
}
// 2. Fan-out queries (multiple independent queries in parallel)
public Mono<Dashboard> getDashboard() {
return Mono.zip(
orderRepo.countToday(), // ~2ms
revenueRepo.todayTotal(), // ~3ms
inventoryRepo.lowStockCount(), // ~2ms
customerRepo.newSignupsToday() // ~1ms
).map(tuple -> new Dashboard(
tuple.getT1(), tuple.getT2(), tuple.getT3(), tuple.getT4()
));
// All four queries execute concurrently on (potentially) four
// different connections, four different EventLoop threads.
// Total latency: max(2, 3, 2, 1) = 3ms, not 2+3+2+1 = 8ms.
// JDBC would need CompletableFuture or a thread pool to match this.
}
// 3. High-concurrency, low-latency microservices
// When every request is a simple key lookup returning 1-10 rows,
// R2DBC's non-blocking model handles 10,000+ concurrent requests
// on 4 threads without the overhead of 10,000 OS threads.
// The EventLoop colocation problem only manifests when result
// sizes are uneven across connections sharing a thread. Backpressure-aware streaming. If your application streams large result sets to clients — CSV exports, server-sent events, NDJSON feeds — R2DBC's native backpressure support is a genuine advantage. The Reactor Flux integrates with Spring WebFlux's response writing to flow rows from PostgreSQL to the HTTP response without buffering the full result set in memory. JDBC requires explicit cursor management and manual buffering to achieve the same effect.
Fan-out parallelism. When a single request requires multiple independent queries — a dashboard that needs order count, revenue total, inventory status, and new signups — R2DBC's non-blocking model executes all four queries concurrently without dedicating a thread to each one. With JDBC, you need CompletableFuture or a thread pool to achieve the same parallelism, which is more code and more threads for the same result.
Extreme concurrency with uniform workloads. If every request makes the same kind of query returning a similar number of rows, the EventLoop colocation problem is minimal — all connections produce similar EventLoop load, so imbalance does not occur. A key-value lookup service returning 1-10 rows per query, handling 10,000+ concurrent requests on 4 EventLoop threads, is a workload where R2DBC genuinely outperforms JDBC-with-thread-pools.
The problem arises specifically when workloads are uneven — when some connections handle kilobytes and others handle megabytes on the same thread. If your application has a uniform query profile, R2DBC's EventLoop model is efficient and well-matched. If your application has a mix of point queries and report queries, analytical aggregations and health checks, the EventLoop model creates the contention this article describes.
The JDBC + virtual threads question
If you are evaluating R2DBC for a new project, there is one more comparison that deserves honest treatment.
// The uncomfortable comparison: R2DBC vs JDBC + virtual threads
// (Java 21+, Spring Boot 3.2+)
// JDBC with virtual threads (Project Loom)
@GetMapping("/orders")
public List<Order> getOrders() {
return orderRepository.findAll(); // blocking JDBC call
}
// With spring.threads.virtual.enabled=true:
// - Each request gets a virtual thread (near-zero cost)
// - JDBC blocking call suspends the virtual thread, not an OS thread
// - No EventLoop contention. No connection-to-thread affinity.
// - Latency is predictable: p99 ~ p50 + network jitter
// R2DBC reactive
@GetMapping("/orders")
public Flux<Order> getOrders() {
return orderRepository.findAll(); // reactive R2DBC query
}
// With R2DBC:
// - Non-blocking I/O, zero thread blocking
// - But: EventLoop colocation, pool exhaustion deadlocks,
// publishOn ceremony, complex error handling
// - p99 can be 50-100x p50 when colocation strikes
// If you are starting a new project on Java 21+:
// JDBC + virtual threads gives you the concurrency benefits of
// reactive programming without the EventLoop colocation problem.
//
// If you are on Java 17 or earlier, or your workload genuinely
// benefits from backpressure-aware streaming, R2DBC remains the
// better choice. This is not a universal recommendation. Virtual threads (Project Loom, available since Java 21 and production-ready since Java 22) provide non-blocking concurrency without the EventLoop model. Each request gets a virtual thread — essentially free in terms of memory and scheduling overhead — that blocks naturally on JDBC calls. The JVM unmounts the virtual thread from the carrier OS thread during the blocking call, freeing the carrier for other work. This gives you the concurrency benefits of reactive programming with the simplicity and predictable latency of blocking code.
I do not say this to be contrary. I say it because the honest answer to "should I use R2DBC?" increasingly depends on your Java version. On Java 21+, JDBC with virtual threads is the simpler, more predictable choice for most workloads. On Java 17 or earlier, where virtual threads are unavailable, R2DBC remains the best option for non-blocking database access. This is the genuine state of the art as of early 2026, and pretending otherwise would be a disservice.
Where a database proxy changes the math
The workarounds above address the symptom: too many connections competing for too few EventLoop threads. But there is a structural question underneath: why does the application need to manage database connections at this level of detail in the first place?
Gold Lapel sits between your application and PostgreSQL as a transparent proxy. Your R2DBC pool connects to Gold Lapel; Gold Lapel maintains its own optimally-sized connection pool to PostgreSQL. This two-tier architecture changes the EventLoop colocation calculus in several ways.
# With Gold Lapel in front of PostgreSQL:
# Add the goldlapel-spring-boot starter to connect through Gold Lapel.
spring:
r2dbc:
url: r2dbc:postgresql://localhost:5433/mydb # GL proxy port
pool:
max-size: 20 # Go ahead. GL manages the upstream connections.
initial-size: 5
max-acquire-time: 3s
max-life-time: 30m
# Why this changes the equation:
#
# 1. Gold Lapel maintains its own upstream connection pool to PostgreSQL.
# Your 20 R2DBC connections multiplex onto a smaller, optimally-sized
# set of backend connections. The proxy handles lifecycle, keepalive,
# and connection reuse.
#
# 2. Response data flows through GL's connection management layer.
# A large result set on one upstream connection does not block other
# upstream connections — GL processes them independently.
#
# 3. You can safely over-provision your R2DBC pool size without
# worrying about PostgreSQL backend process limits. GL absorbs
# the connection pressure.
#
# The EventLoop colocation problem still exists in your JVM — Netty
# still assigns connections to threads. But with GL managing the
# upstream side, the consequences are less severe: connections open
# and close faster, idle connections are recycled efficiently,
# and pool exhaustion deadlocks become much harder to trigger. Over-provisioning becomes safe. With a direct PostgreSQL connection, each R2DBC pool connection consumes a PostgreSQL backend process — approximately 5-10MB of RAM, one OS-level process, one entry in max_connections. Over-provisioning wastes database resources and can hit the max_connections limit, causing new connections to fail entirely. With Gold Lapel, your 20 R2DBC connections multiplex onto a smaller set of upstream connections. You can size your R2DBC pool for optimal EventLoop distribution without worrying about PostgreSQL backend limits.
Connection lifecycle is managed upstream. Gold Lapel handles keepalive, recycling, health checking, and connection reuse at the proxy layer. Your R2DBC pool configuration becomes simpler: set a reasonable max-size, add the goldlapel-spring-boot starter, and let Gold Lapel handle the rest. The proxy architecture absorbs the complexity that would otherwise live in your application.yml.
The pool exhaustion deadlock is harder to trigger. Because Gold Lapel opens and closes upstream connections independently of your R2DBC connections, the "all connections acquired, none releasing" scenario becomes less likely. Your R2DBC pool's release mechanism only needs to signal the proxy, not wait for a PostgreSQL backend to finish processing. The round trip to Gold Lapel (local or same-VPC) is faster and more predictable than the round trip to PostgreSQL under load.
Proxy + dual pools: the combined approach
# Gold Lapel + separate R2DBC pools for workload isolation
#
# Without GL: two pools = 4 + 4 = 8 PostgreSQL backend connections
# With GL: two pools = 8 R2DBC connections -> GL multiplexes onto
# a single, optimally-sized upstream pool
#
# application.yml
spring:
r2dbc:
light:
url: r2dbc:postgresql://localhost:5433/mydb
pool:
max-size: 6
max-acquire-time: 2s
heavy:
url: r2dbc:postgresql://localhost:5433/mydb
pool:
max-size: 6
max-acquire-time: 10s # heavy queries need more acquire time
# GL upstream pool: 6 connections (regardless of R2DBC pool count)
# PostgreSQL backend processes: 6 (not 12)
# Workload isolation: preserved at the R2DBC layer
# Backend efficiency: preserved at the proxy layer
#
# This is the best of both worlds: EventLoop isolation for your
# workloads, and efficient connection usage for PostgreSQL. This combination — Gold Lapel for upstream connection management, separate R2DBC pools for workload isolation — addresses both the server-side and client-side dimensions of the problem. The R2DBC pools provide EventLoop isolation between heavy and light workloads. Gold Lapel provides efficient connection multiplexing to PostgreSQL. The total connection count to PostgreSQL remains small regardless of how many R2DBC pool connections exist.
Gold Lapel does not eliminate the EventLoop colocation problem — that is a Netty architectural characteristic that exists entirely within your JVM. But it reduces the consequences significantly. When connection lifecycle management, pool sizing, and backend health are handled upstream, the remaining EventLoop contention is a smaller factor in your overall latency budget. The difference between "my p99 is dominated by EventLoop colocation" and "EventLoop colocation adds 5ms to my p99" is the difference between a production incident and a monitoring footnote.
The complete checklist (for those who have scrolled to the end, and those who knew to)
If you are experiencing unexplained reactive latency with R2DBC and PostgreSQL, work through these in order. Each step either confirms the diagnosis or reduces the severity. Most applications will see significant improvement by step 4.
- Confirm the database is not the bottleneck. Check pg_stat_statements and pg_stat_activity. If PostgreSQL is idle while your app is slow, the problem is client-side.
- Reduce max-acquire-time to 3-5 seconds. The 45-second default is a cascading failure waiting to happen.
- Reduce pool max-size to CPU cores x 2. This is the single highest-impact change. Benchmark from there.
- Set max-life-time to 30-60 minutes. Forces connection recycling and EventLoop redistribution.
- Set max-idle-time to 5-10 minutes. Prevents stale connections from camping on EventLoop slots.
- Add publishOn(Schedulers.boundedElastic()) after queries returning large result sets.. dd publishOn(Schedulers.boundedElastic()) after queries returning large result sets.
- Enable cursor-based fetching for queries returning 5,000+ rows. Use @Statement(fetchSize = 500) or limitRate().
- Consider separate connection pools for heavy and light workloads if result sizes vary dramatically.. onsider separate connection pools for heavy and light workloads if result sizes vary dramatically.
- Consider a dedicated EventLoopGroup if your database I/O and HTTP I/O share threads.. onsider a dedicated EventLoopGroup if your database I/O and HTTP I/O share threads.
- Consider a database proxy if you are fighting pool sizing, connection limits, and EventLoop contention simultaneously.. onsider a database proxy if you are fighting pool sizing, connection limits, and EventLoop contention simultaneously.
The summary for those who want the one-paragraph version
The EventLoop colocation problem is not a bug. It is an inherent characteristic of the single-threaded-per-connection I/O model that makes Netty so efficient for most workloads. The problem arises only when workloads are uneven — when some connections handle kilobytes and others handle megabytes on the same thread. The fix is either reducing the imbalance (fewer connections per thread, cursor-based fetching, smaller result sets), absorbing it (separate pools, scheduler offloading, dedicated EventLoopGroups), or stepping outside the model entirely (virtual threads, database proxy, Kotlin coroutine dispatchers).
Your reactive application is not broken. It just needs its connections and its threads to be properly introduced. Allow me to make the arrangements.
Frequently asked questions
Terms referenced in this article
While the matter of connection management is fresh in your mind, you may find the guide to Spring Boot's open-in-view pool exhaustion worth your time — it addresses a related but distinct path to connection starvation in Spring applications.