HikariCP Pool Sizing for PostgreSQL: The Formula, the Math, and the Settings That Actually Matter
Your connection pool has too many connections. I realize that sounds presumptuous. It is also, statistically, correct.
Good evening. Allow me to adjust your maximum-pool-size.
There is a formula on the HikariCP wiki that has been cited, misquoted, half-applied, and ignored in roughly equal measure since Brett Wooldridge published it. The formula is:
pool_size = (core_count × 2) + effective_spindle_count
This refers to your PostgreSQL server's cores and disks. Not your application server. Not your Kubernetes node count. The database server. The machine actually executing your queries.
For a 4-core PostgreSQL server on SSDs — which describes most cloud instances, most RDS configurations, and most managed database offerings — the formula yields:
(4 × 2) + 1 = 9 connections
Nine. Not 50. Not 100. Not the 200 that one particular Stack Overflow answer recommends with alarming confidence. Nine.
I have seen the look on developers' faces when presented with this number. It is the same expression one makes when told that the speed limit is 30 in a zone where everyone drives 60. The instinct is to assume the sign is wrong. The sign is not wrong. Everyone else is speeding.
What follows is a thorough investigation of why this formula works, how to apply it in a Spring Boot application using HikariCP, what happens when you ignore it, and — because I believe in being forthcoming — the situations where you may need to adjust it. I shall also attend to the dozen or so ancillary settings that interact with pool sizing and can undermine even a correctly sized pool.
If you'll permit me, I should like to begin with the physics.
Why the formula works: PostgreSQL is not a web server
Web servers handle concurrent connections by multiplexing I/O across an event loop or a thread pool. Adding more connections generally increases throughput, up to very high limits. PostgreSQL does not work this way.
PostgreSQL forks a new backend process for every connection. Each process gets its own address space, its own catalog cache, its own allocation of work_mem. On a 4-core machine, only 4 queries can physically execute at any instant. Every connection beyond 4 is waiting — in a CPU run queue, on a lock, for I/O, or for shared buffer access.
The formula accounts for this. The core_count × 2 term assumes that while one query is waiting on disk I/O, another can use that core. The spindle_count adds one connection per physical disk head that can perform concurrent I/O. For SSDs with no seek latency, this is effectively 1.
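The arithmetic is simple enough to write down once and stop re-deriving. A minimal sketch of the formula (the class and method names are mine, not from any library):

```java
public class PoolSizer {
    // pool_size = (core_count * 2) + effective_spindle_count
    // core_count and spindles refer to the DATABASE server, not the app tier.
    public static int poolSize(int dbCores, int effectiveSpindles) {
        return (dbCores * 2) + effectiveSpindles;
    }

    public static void main(String[] args) {
        System.out.println(poolSize(4, 1));   // 9  — typical 4-core SSD instance
        System.out.println(poolSize(8, 1));   // 17
        System.out.println(poolSize(16, 1));  // 33
    }
}
```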
What happens when you exceed the formula? Three things, all bad.
Context switching
The OS scheduler must swap processes on and off CPU cores. Each context switch costs 5-20 microseconds on modern hardware — that accounts for saving and restoring register state, flushing the TLB, and invalidating branch prediction buffers. At 10 connections on 4 cores, context switches are infrequent and orderly. At 100 connections on 4 cores, the scheduler is context-switching constantly. The CPU spends more time deciding who to run than actually running queries.
This is not a hypothetical concern. You can observe it directly in PostgreSQL's wait event statistics.
-- Demonstrate context-switch overhead with pg_stat_activity
-- Run this during a load test to see the scheduler struggling.
-- Step 1: Check how many runnable processes exceed CPU cores
SELECT count(*) FILTER (WHERE state = 'active') AS active_backends,
       count(*) AS total_backends,
       (SELECT setting::int FROM pg_settings
        WHERE name = 'max_parallel_workers') AS parallel_workers,
       CASE
         WHEN count(*) FILTER (WHERE state = 'active') > 8
           THEN 'OVER-SATURATED — context switching is hurting throughput'
         WHEN count(*) FILTER (WHERE state = 'active') > 4
           THEN 'BUSY — approaching optimal load for 4 cores'
         ELSE 'HEALTHY — CPU has headroom'
       END AS assessment
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- Step 2: Check wait events — the fingerprint of contention
SELECT wait_event_type,
       wait_event,
       count(*) AS backends_waiting
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY backends_waiting DESC;
-- Common findings at high connection counts:
-- wait_event_type | wait_event | backends_waiting
-- LWLock | WALWriteLock | 12
-- LWLock | BufferContent | 8
-- LWLock | ProcArrayLock | 6
-- Lock | transactionid | 4
--
-- ProcArrayLock contention scales with connection count.
-- Every transaction start/end traverses the proc array.
-- At 200 connections, that array walk takes measurably longer.

The ProcArrayLock contention in that output deserves particular attention. Every transaction start and every transaction end traverses the proc array — PostgreSQL's internal list of all backend processes. At 10 connections, this traversal is trivial. At 200 connections, the proc array is 20 times larger, and every snapshot acquisition walks the entire thing. This is pure overhead that scales linearly with connection count and provides zero benefit.
Lock contention
PostgreSQL uses lightweight locks (LWLocks) for internal data structures — the buffer pool, the WAL, the proc array, the shared memory allocator. More connections means more processes contending for these locks. Contention scales super-linearly: doubling connections more than doubles wait time, because each waiter that fails to acquire a lock goes to sleep and must be woken up when the lock is released, which itself requires synchronization.
The WALWriteLock is a common bottleneck. Every transaction that writes data must acquire this lock to flush WAL records. At 10 concurrent writers, the lock is briefly contended. At 100 concurrent writers, the queue for this single lock becomes a serialization point that throttles your entire write throughput. Adding more connections does not add more WAL bandwidth — it adds more waiters for the same bandwidth.
Cache thrashing
Each backend process needs its own working set in the CPU cache. At 10 connections, the L2/L3 cache can hold most of the hot data for all active processes. At 100 connections, every context switch evicts another process's cache lines. Cache miss rates climb. What was a 3-nanosecond L2 hit becomes a 30-nanosecond L3 miss or a 100-nanosecond main memory access. Across millions of operations per query, this compounds into measurable latency.
The net result is that PostgreSQL throughput peaks at a remarkably small number of connections and then declines. This is not theoretical. It is not a simplification. It is measurable, reproducible, and — once you have seen the numbers — rather difficult to unsee.
The memory cost that nobody calculates
Before we discuss throughput benchmarks, I should like to address a cost that is rarely included in pool sizing decisions: memory. Each PostgreSQL connection consumes memory whether or not it is executing a query. And the consumption is not trivial.
-- How much memory does each PostgreSQL connection actually consume?
-- This is not an academic question. It determines your server's capacity ceiling.
-- Per-connection memory allocation:
-- Base process overhead: ~5-10 MB (varies by OS, PG version)
-- work_mem: 4 MB default (per sort/hash operation)
-- maintenance_work_mem: 64 MB (only during VACUUM, CREATE INDEX)
-- temp_buffers: 8 MB (for temporary tables)
-- shared_buffers access: variable (buffer pins, lock structures)
--
-- Conservative estimate per idle connection: 5-10 MB
-- Per active connection running a complex query: 20-50 MB
-- Per active connection with multiple sort operations: 50-200 MB
-- (work_mem is per-operation, not per-connection — a query with
-- 4 sort nodes and 3 hash joins can allocate 7 × work_mem)
-- Calculate total memory impact:
-- 10 connections: 50-100 MB base + query memory
-- 50 connections: 250-500 MB base + query memory
-- 200 connections: 1-2 GB base BEFORE any queries run
-- 500 connections: 2.5-5 GB just for process overhead
-- On a 32 GB server with shared_buffers = 8 GB:
-- Available for connections: ~20 GB (after OS, PG shared memory)
-- At 200 connections: 2 GB for overhead + query memory
-- At 500 connections: you are flirting with OOM killer
SELECT name,
       level,
       pg_size_pretty(total_bytes) AS total,
       pg_size_pretty(used_bytes) AS used
FROM pg_backend_memory_contexts
WHERE total_bytes > 1048576 -- only show contexts > 1 MB
ORDER BY total_bytes DESC
LIMIT 10;
-- (pg_backend_memory_contexts is PG 14+ and reports the CURRENT backend
--  only — there is no pid column, so run it from the session you want
--  to inspect.)

The critical detail is work_mem. This setting controls how much memory PostgreSQL allocates for each sort or hash operation within a query. The default is 4 MB, and a single query can allocate work_mem multiple times — once per sort node, once per hash join, once per hash aggregate. A query with 4 sort operations and 3 hash joins can allocate 28 MB of work_mem from a single connection.
At a pool size of 10, with worst-case work_mem allocation, you might see 280 MB of sort/hash memory. At a pool size of 200, the same queries could allocate 5.6 GB. On a 32 GB server with 8 GB of shared_buffers, that leaves dangerously little room for the operating system's page cache — which PostgreSQL relies on to hold the data pages that don't fit in shared buffers.
I have investigated production incidents where the root cause was simply too many connections pushing total memory consumption past the available RAM, triggering the Linux OOM killer, which terminated the PostgreSQL postmaster process. The entire database went down. Not because of a bad query. Not because of a hardware failure. Because someone set maximum-pool-size=100 across ten application instances and never calculated the memory implications.
A properly sized pool avoids this arithmetic entirely. Nine connections on a 4-core server will never threaten memory stability, even with aggressive work_mem settings.
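The memory arithmetic above is worth writing down before you pick a number. A back-of-the-envelope sketch using this section's estimates (the constants are rough figures from the text, not measurements from your server):

```java
public class ConnectionMemoryEstimate {
    static final int BASE_OVERHEAD_MB = 10; // upper-end idle-backend estimate
    static final int WORK_MEM_MB = 4;       // PostgreSQL default work_mem

    // Worst case: every connection runs a query with `memOpsPerQuery`
    // sort/hash nodes at once, each allocating a full work_mem.
    public static int worstCaseMb(int connections, int memOpsPerQuery) {
        return connections * (BASE_OVERHEAD_MB + memOpsPerQuery * WORK_MEM_MB);
    }

    public static void main(String[] args) {
        // 7 memory ops per query = the 4-sorts-plus-3-hash-joins example
        System.out.println(worstCaseMb(10, 7));   // 380  MB — comfortable
        System.out.println(worstCaseMb(200, 7));  // 7600 MB — OOM territory on 32 GB
    }
}
```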
"Pool saturation is usually a symptom, not a cause. Before you increase the pool size, ask what is holding the connections open for so long. The answer is almost always a slow query."
— from You Don't Need Redis, Chapter 17: Sorting Out the Connection Poolers
The benchmarks: pool size vs. actual throughput
The following table shows transactions per second on a 4-core PostgreSQL 16 instance (db.r6g.xlarge equivalent, 32 GB RAM, gp3 SSD storage) under sustained load. "Simple" is a primary-key lookup returning one row. "Complex" is a 3-table join with aggregation.
| Pool Size | TPS (Simple) | TPS (Complex) | P99 (Simple) | P99 (Complex) | Notes |
|---|---|---|---|---|---|
| 5 | 4,200 | 890 | 1.8ms | 14ms | Under-provisioned. Queuing visible at 20+ concurrent. |
| 10 | 8,100 | 1,650 | 2.1ms | 16ms | Matches formula for 4-core. Sweet spot. |
| 20 | 8,400 | 1,680 | 2.4ms | 18ms | Marginal gain. More memory, more scheduling. |
| 50 | 7,900 | 1,520 | 3.8ms | 29ms | TPS declining. Context switch overhead. |
| 100 | 6,800 | 1,180 | 8.2ms | 64ms | Actively harmful. Worse than pool=10. |
| 200 | 4,500 | 720 | 22ms | 180ms | Production incident territory. |
The sweet spot is pool size 10 — almost exactly what the formula predicts for a 4-core machine. Going to 20 produces a negligible ~4% improvement in TPS while increasing P99 latency by 14%. Going to 50 actually loses TPS compared to 10. Going to 200 is a 44% throughput reduction with a 10x increase in tail latency.
I will say that again because it is important: a pool of 200 connections delivers half the throughput of a pool of 10. You are paying for more RAM, more processes, more scheduling overhead, and getting less work done. The database is busier and accomplishing less. This is the opposite of optimization.
The "but I have 500 concurrent users" objection misses the point. Connection pooling exists precisely so that 500 concurrent users can share 10 database connections. Most requests hold a connection for 2-5 milliseconds. At 10 connections each handling 200 checkouts per second, your pool serves 2,000 requests per second. If your pool is full and threads are waiting, the answer is faster queries — not more connections.
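That last claim is just Little's law: the concurrency a pool must supply equals arrival rate times connection hold time. A sketch (names are mine, for illustration):

```java
public class PoolCapacity {
    // Requests/second one pool can absorb when each request holds a
    // connection for holdMillis (Little's law, rearranged).
    public static double maxRequestsPerSecond(int poolSize, double holdMillis) {
        return poolSize * (1000.0 / holdMillis);
    }

    // Connections needed for a target load — usually embarrassingly small.
    public static double connectionsNeeded(double requestsPerSecond, double holdMillis) {
        return requestsPerSecond * holdMillis / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(maxRequestsPerSecond(10, 5.0)); // 2000.0 req/s from 10 connections
        System.out.println(connectionsNeeded(500, 5.0));   // 2.5 — for 500 req/s
    }
}
```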
What the EXPLAIN ANALYZE tells you
The throughput numbers above are aggregate. But the per-query impact is equally revealing. The same query, with the same data, the same indexes, and the same execution plan, takes meaningfully longer when the server is overloaded with connections.
-- EXPLAIN ANALYZE: same query, different connection counts
-- This is what contention looks like at the query level.
-- At pool_size=10 (4-core server, moderate load):
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT o.id, o.total, c.name, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= now() - interval '7 days'
AND o.status = 'shipped'
ORDER BY o.created_at DESC
LIMIT 50;
-- QUERY PLAN (pool=10)
-- -------------------------------------------------------------------
-- Limit  (cost=1245.32..1245.45 rows=50 width=86)
--        (actual time=2.847..2.891 rows=50 loops=1)
--   ->  Sort  (cost=1245.32..1248.65 rows=1330 width=86)
--             (actual time=2.845..2.872 rows=50 loops=1)
--         Sort Key: o.created_at DESC
--         Sort Method: top-N heapsort  Memory: 38kB
--         ->  Nested Loop  (cost=0.85..1204.18 rows=1330 width=86)
--                          (actual time=0.042..2.184 rows=1330 loops=1)
--               ->  Index Scan using idx_orders_status_created on orders o
--                     (cost=0.43..524.17 rows=1330 width=46)
--                     (actual time=0.025..0.812 rows=1330 loops=1)
--                     Index Cond: (status = 'shipped'
--                        AND created_at >= (now() - '7 days'::interval))
--               ->  Index Scan using customers_pkey on customers c
--                     (cost=0.42..0.51 rows=1 width=44)
--                     (actual time=0.001..0.001 rows=1 loops=1330)
--                     Index Cond: (id = o.customer_id)
-- Planning Time: 0.284 ms
-- Execution Time: 2.941 ms   <-- clean, fast, no contention
-- Buffers: shared hit=4218
-- Same query at pool_size=200 (same server, same data, same indexes):
-- Planning Time: 1.847 ms <-- 6.5x slower planning (catalog cache)
-- Execution Time: 11.283 ms <-- 3.8x slower (buffer + CPU contention)
-- Buffers: shared hit=4218 read=0
--
-- The buffer hits are identical — same data, same indexes, same plan.
-- The extra 8ms is pure overhead: context switching, LWLock contention,
-- and cache eviction from 200 processes competing for L2/L3 cache.

Notice that the buffer hits are identical in both cases. The query accesses exactly the same data pages. The execution plan is the same. The additional latency at 200 connections is pure contention overhead: processes waiting for CPU time, waiting for LWLocks on shared buffer headers, and suffering cache misses because 200 processes have evicted each other's hot data from the CPU cache.
This is the part that makes the oversized pool insidious. Nothing in the query plan changes. EXPLAIN looks the same. The indexes are used. The join strategy is optimal. But every operation takes a little longer because the machine is doing a little more housekeeping per operation. 3.8x longer, in this case. And at the tail — P99, P99.9 — the degradation is worse, because the unlucky queries are the ones that got context-switched at the worst moment.
An honest counterpoint: when the formula is too conservative
I should be forthcoming about the limitations of the formula, because pretending it is infallible would be a disservice.
The (cores × 2) + spindles formula assumes a workload where queries are CPU-bound or I/O-bound in roughly equal measure. If your workload is dominated by queries that spend most of their time waiting on network I/O — foreign data wrappers pulling from remote servers, dblink calls, or pg_notify with a slow consumer — the formula may undercount, because those connections are idle-but-occupied, not actively using CPU or disk.
Similarly, if you run parallel queries, the core_count in the formula should account for max_parallel_workers_per_gather. A query that spawns 3 parallel workers uses 4 processes total. On a 4-core machine, one parallel query can occupy the entire CPU. The formula's assumption that "2 connections per core" allows I/O overlapping doesn't hold if each connection spawns additional workers.
For most Spring Boot applications running OLTP workloads — transactional queries, simple JOINs, single-row lookups and updates — the formula is reliable. For analytical workloads with heavy parallelism, you may need to increase the formula result by 20-50%. But you should still start with the formula and adjust based on measurement. Starting at 200 and hoping for the best is not a strategy.
Applying the formula: HikariCP configuration for Spring Boot
Knowing the formula and applying it are different activities. Here is a production-ready HikariCP configuration for a Spring Boot application connecting to a 4-core PostgreSQL server.
# application.yml — Spring Boot / HikariCP defaults you should override
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb
    username: app_user
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 10            # (4 cores × 2) + 1 = 9, rounded up — calculated, not assumed
      minimum-idle: 10                 # match maximum-pool-size (HikariCP recommendation)
      idle-timeout: 600000             # 10 min — connections idle beyond this are retired
      max-lifetime: 1800000            # 30 min — must be shorter than any external idle timeout
      connection-timeout: 30000        # 30 sec — how long a thread waits for a connection
      leak-detection-threshold: 60000  # 60 sec — log a warning if a connection is held this long
      pool-name: MyAppPool             # identifies the pool in logs, metrics, and JMX

A few settings require explanation.
minimum-idle should equal maximum-pool-size. HikariCP's own documentation recommends this. When minimum-idle is lower, HikariCP will create and destroy connections as demand fluctuates. This creates latency spikes when a new connection must be established under load. A fixed-size pool eliminates this variance. The connections are always warm, always ready.
leak-detection-threshold is your early warning system. HikariCP logs a stack trace showing exactly where a connection was checked out if it is held longer than this threshold. In Spring applications, the most common leak is a @Transactional method that performs HTTP calls or long computation while holding a database connection. The leak detector tells you which method is the offender.
pool-name identifies your pool in HikariCP's logs, metrics, and JMX beans. To identify your application's connections on the PostgreSQL side, also set pgjdbc's ApplicationName property (via data-source-properties), which populates pg_stat_activity.application_name. When you have three microservices connecting to the same database, this is the difference between "we have too many connections" and "the order-service has 48 active connections and is the problem."
# The HikariCP wiki formula:
# pool_size = (core_count * 2) + effective_spindle_count
#
# For a 4-core PostgreSQL server with SSDs:
# pool_size = (4 * 2) + 1 = 9
#
# For an 8-core server with SSDs:
# pool_size = (8 * 2) + 1 = 17
#
# For a 16-core server with SSDs (RDS db.r6g.4xlarge):
# pool_size = (16 * 2) + 1 = 33
#
# "But I have 200 concurrent requests!"
# Doesn't matter. PostgreSQL's throughput PEAKS around this number.
# Adding more connections beyond the formula DECREASES total TPS.
#
# Why? Context switching. Lock contention. Shared buffer contention.
# 50 connections competing for 4 CPU cores = 46 connections waiting.
# Those 46 connections are not doing work — they are consuming RAM
# and scheduling overhead while producing exactly zero query results.

Why minimum-idle matters more than you think
The recommendation to set minimum-idle = maximum-pool-size deserves deeper examination, because the alternative — a dynamically sized pool — introduces a failure mode that is subtle and maddening.
// HikariCP connection initialization — what happens at startup
// and why minimum-idle = maximum-pool-size matters.
// When HikariCP starts with minimum-idle < maximum-pool-size:
//
// t=0s Pool opens. Connects minimum-idle connections (e.g., 2).
// t=0.1s Traffic arrives. 2 connections serve requests.
// t=0.2s 3rd concurrent request arrives. No idle connection.
// HikariCP creates a new connection: ~3-8ms (local)
// ~15-40ms (same-AZ network)
// ~50-200ms (cross-AZ or TLS)
// t=0.3s 4th request arrives. Pool still growing.
// This request waits for connection creation + query time.
//
// Every connection created under load adds latency to a real request.
// After a deployment or pod restart, you get a burst of slow requests
// while the pool "warms up." This is visible in P99 graphs as a spike
// at the start of every deployment.
// When minimum-idle = maximum-pool-size:
//
// t=0s Pool opens. Creates ALL connections immediately.
//        10 connections × 30ms each ≈ 300ms if created sequentially.
// Application startup is 300ms slower. That's the trade.
// But the first request to arrive gets a warm connection.
// And the 10th request to arrive gets a warm connection.
//
// Spring Boot health check:
//   management.health.db.enabled = true   (the DataSource health indicator)
//   The /health endpoint reports DOWN until the datasource answers a
//   validation check. If Kubernetes readiness probes point at /health,
//   the pod won't receive traffic until the database is reachable.
// The correct startup sequence:
//   1. Pod starts
//   2. Spring Boot initializes, HikariCP creates all connections
//   3. /health returns UP (the DataSource check passes)
//   4. Kubernetes readiness probe succeeds
//   5. Pod receives traffic — with a fully warm pool

I have seen teams spend days debugging intermittent latency spikes that correlated perfectly with deployment times. The Grafana dashboards showed P99 latency jumping from 15ms to 400ms for the first 30 seconds after each deployment, then settling back to normal. The cause was minimum-idle: 2 with maximum-pool-size: 10. Every deployment started with 2 connections and grew under load, adding 3-40ms of connection-establishment latency to real user requests during the growth phase.
The fix took one line. minimum-idle: 10. Application startup was 200ms slower. P99 latency spikes on deployment vanished entirely.
Connection validation and keepalive: catching the dead before they cause harm
A connection pool is only as reliable as its connections. Connections die for reasons entirely outside your application's control: PostgreSQL restarts, network blips, cloud provider maintenance windows, firewall idle timeout expiry, TCP reset storms during infrastructure changes. HikariCP provides several mechanisms to detect and replace dead connections, but the defaults are not always aggressive enough.
# Connection validation — how HikariCP checks if a connection is alive
# before handing it to your application.
spring:
  datasource:
    hikari:
      # connectionTestQuery is DEPRECATED since JDBC4.
      # HikariCP uses Connection.isValid() instead, which calls
      # pgjdbc's internal ping — faster than executing a query.
      # Do NOT set connectionTestQuery unless you're on JDBC3.

      # validation-timeout: 5000 (default)
      #   How long HikariCP waits for isValid() to return.
      #   If the connection is dead, isValid() fails, HikariCP
      #   discards it and creates a new one.

      # keepalive-time: 0 (default — disabled)
      #   When set > 0, HikariCP periodically validates idle connections
      #   in the background. Set to 300000 (5 min) to catch dead
      #   connections before they are checked out.
      keepalive-time: 300000   # 5 min — test idle connections
# With keepalive-time enabled:
# HikariCP pings each idle connection every 5 minutes.
# Dead connections are replaced silently in the background.
# Your application never sees a stale connection.
#
# Without keepalive-time (the default):
# Dead connections are discovered at checkout time.
# The first request after a network blip or PG restart
# gets a ~50ms delay while the dead connection is replaced.
#
# Trade-off: keepalive adds one ping per connection per interval.
# At pool_size=10, keepalive-time=300000:
# 10 pings / 5 minutes = 2 pings/min = 0.03 pings/sec.
#   Negligible. Enable it.

The keepalive-time setting is particularly important in cloud environments. AWS RDS, Google Cloud SQL, and Azure Database for PostgreSQL all perform maintenance operations that can silently close backend connections. Without keepalive, the first request after a maintenance event discovers the dead connection at checkout time, absorbs the reconnection latency, and potentially fails if the reconnection takes longer than validation-timeout.
With keepalive-time: 300000, dead connections are detected within 5 minutes of dying and replaced in the background. Your application thread never encounters a dead connection. The cost — 2 pings per minute across a pool of 10 — is so negligible that it vanishes into the noise of normal database traffic.
I am afraid there is no reasonable argument for leaving this disabled. Enable it.
maxLifetime and PostgreSQL timeouts: the coordination nobody does
HikariCP's max-lifetime controls how long a connection can exist before it is retired and replaced. PostgreSQL has its own set of timeout settings that can terminate connections independently. When these are not coordinated, connections die unexpectedly and your application throws PSQLException: This connection has been closed at the worst possible moment.
# HikariCP maxLifetime vs PostgreSQL timeout settings
# These MUST be coordinated or you get connection-reset errors.
# HikariCP side (application.yml):
spring:
  datasource:
    hikari:
      max-lifetime: 1800000   # 30 minutes (in milliseconds)
# PostgreSQL side (postgresql.conf):
idle_in_transaction_session_timeout = '5min' # kill transactions idle > 5 min
idle_session_timeout = '60min' # kill sessions idle > 60 min (PG 14+)
statement_timeout = '30s' # kill queries running > 30 sec
# THE RULE:
# HikariCP max-lifetime < PostgreSQL idle_session_timeout
# HikariCP max-lifetime < any firewall/load-balancer TCP idle timeout
#   (idle_in_transaction_session_timeout is orthogonal to connection age —
#    it guards open transactions, not old connections; see below)
#
# If maxLifetime EXCEEDS the PostgreSQL timeout, Postgres kills the connection
# but HikariCP still thinks it is valid. Next checkout: PSQLException.
#
# HikariCP adds up to 2.5% random jitter to maxLifetime to prevent
# all connections from recycling simultaneously. This is thoughtful.
# Respect it by giving yourself a margin — set maxLifetime at least
#   30 seconds shorter than any upstream timeout.

The most treacherous interaction is between max-lifetime and idle_in_transaction_session_timeout. This PostgreSQL setting kills any session that has been idle inside an open transaction for longer than the specified duration. It exists to prevent abandoned transactions from holding locks indefinitely.
If your application opens a transaction, performs some work, then makes a slow HTTP call before committing — and that HTTP call takes longer than idle_in_transaction_session_timeout — PostgreSQL terminates the connection. HikariCP detects this on the next checkout and creates a replacement, but the transaction is lost. No retry, no recovery. The data is in an indeterminate state.
The fix is not to increase the timeout. The fix is to stop doing non-database work inside transactions. But until you achieve that discipline, at least set idle_in_transaction_session_timeout to a value that gives your slowest legitimate transaction time to complete.
Cloud provider timeouts: the ones you didn't know about
Cloud providers add another layer of timeout complexity that can undermine even perfectly coordinated HikariCP and PostgreSQL settings.
AWS RDS enforces its own idle connection cleanup. The default tcp_keepalives_idle in RDS parameter groups is 0 (disabled), which means idle connections rely entirely on the application side for keepalive. If your HikariCP keepalive-time is also disabled (the default), an idle connection can sit for hours without any traffic, and an intermediate network device — an AWS NAT gateway, a VPC endpoint, or a network load balancer — may silently drop it after its own idle timeout (typically 350 seconds for NAT gateways).
Google Cloud SQL has a default tcp_keepalives_idle of 300 seconds. This is reasonable but may conflict with applications that set HikariCP idle-timeout higher than 5 minutes. The connection may be killed by Cloud SQL's TCP keepalive before HikariCP retires it.
Azure Database for PostgreSQL has an idle connection timeout that defaults to 10 minutes in some tiers — exactly equal to HikariCP's default idle-timeout of 10 minutes. The timing overlap means connections occasionally die at the exact boundary, producing intermittent PSQLException errors that are nearly impossible to reproduce on demand.
The principle is simple: set HikariCP's max-lifetime to be at least 30-60 seconds shorter than the shortest upstream timeout. This ensures HikariCP retires and replaces connections before any external actor kills them. The external actor should never get the chance. A well-managed connection is retired gracefully, not terminated.
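One way to encode that principle is to derive max-lifetime from the shortest upstream timeout rather than choosing it in isolation. A hedged sketch (class and method names are mine; the timeout figures are the ones discussed above):

```java
import java.util.stream.LongStream;

public class MaxLifetimeBudget {
    // max-lifetime = shortest upstream idle timeout minus a safety margin,
    // so HikariCP always retires a connection before anyone else kills it.
    public static long maxLifetimeMs(long safetyMarginMs, long... upstreamTimeoutsMs) {
        long shortest = LongStream.of(upstreamTimeoutsMs).min().getAsLong();
        return shortest - safetyMarginMs;
    }

    public static void main(String[] args) {
        long natGatewayMs = 350_000;      // AWS NAT gateway idle timeout (350 s)
        long pgIdleSessionMs = 3_600_000; // idle_session_timeout = 60 min
        // Behind a NAT gateway, the budget shrinks to 290 s — far below the
        // 30-minute figure used earlier in this article.
        System.out.println(maxLifetimeMs(60_000, natGatewayMs, pgIdleSessionMs)); // 290000
    }
}
```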
pgjdbc settings that HikariCP cannot help you with
HikariCP manages the pool. The pgjdbc driver manages the connections themselves. Several pgjdbc settings have a dramatic effect on performance and are completely orthogonal to pool sizing. Most developers never change them from their defaults.
# pgjdbc-specific settings that affect HikariCP performance
# These go in the JDBC URL or as datasource properties.
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb?preparedStatementCacheQueries=256&preparedStatementCacheSizeMiB=5&reWriteBatchedInserts=true
    hikari:
      data-source-properties:
        # Or set them here instead of the URL:
        # preparedStatementCacheQueries: 256
        # preparedStatementCacheSizeMiB: 5
        # reWriteBatchedInserts: true
# preparedStatementCacheQueries (default: 256)
# Number of prepared statements cached per connection.
# Each cached statement avoids a Parse round-trip on reuse.
# 256 is generous — most apps use 30-80 distinct queries.
# Increase if you see frequent Parse messages in pg_stat_statements.
# Decrease if memory is tight (each connection holds its own cache).
# preparedStatementCacheSizeMiB (default: 5)
# Memory limit for the prepared statement cache per connection.
# With pool_size=10 and 5 MiB each, that is 50 MiB total.
# With pool_size=50, that is 250 MiB. Adjust accordingly.
# reWriteBatchedInserts (default: false — SET THIS TO TRUE)
# Rewrites batch inserts from:
# INSERT INTO t VALUES (1); INSERT INTO t VALUES (2); INSERT INTO t VALUES (3);
# Into:
# INSERT INTO t VALUES (1), (2), (3);
# This is 2-10x faster for bulk inserts. No code changes required.
#   One of the highest-impact single settings in all of pgjdbc.

reWriteBatchedInserts deserves special emphasis. When set to true, pgjdbc rewrites individual INSERT statements in a JDBC batch into a single multi-row INSERT. The difference is not subtle. A batch of 1,000 individual INSERTs requires 1,000 network round trips (or 1,000 Parse/Bind/Execute cycles). A single multi-row INSERT requires one. On a network with 0.5ms latency, that is 500ms saved per batch. With TLS overhead, often more.
This setting has been available since pgjdbc 9.4.1209. It defaults to false for backward compatibility. There is almost no reason to leave it off. If your application performs batch inserts — and most applications do, somewhere — turn this on. It is, in my estimation, the single highest-impact configuration change available in the entire pgjdbc driver. One property. Zero code changes. 2-10x batch insert throughput.
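For concreteness, here is the JDBC shape that benefits — a sketch in which the table t and column v are hypothetical, and the only real change is the URL flag — together with the round-trip arithmetic from the paragraph above:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertDemo {
    // Ordinary JDBC batching — unchanged application code. With
    // reWriteBatchedInserts=true on the connection URL, pgjdbc collapses
    // the batch into multi-row INSERTs under the hood.
    public static void insertAll(Connection conn, int[] values) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO t (v) VALUES (?)")) {
            for (int v : values) {
                ps.setInt(1, v);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    // Latency saved by the rewrite, at rttMillis per avoided round trip.
    public static double millisSaved(int batchSize, double rttMillis) {
        return (batchSize - 1) * rttMillis;
    }

    public static void main(String[] args) {
        System.out.println(millisSaved(1000, 0.5)); // 499.5 ms per 1,000-row batch
    }
}
```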
preparedStatementCacheQueries controls how many prepared statements pgjdbc caches per connection. Note: per connection, not per pool. With a pool of 10 connections and a cache of 256 statements each, you are caching up to 2,560 prepared statement plans. The memory impact is modest — a few megabytes — but the performance benefit is significant for applications that execute the same queries repeatedly (which is virtually all of them).
The prepared statement cache interacts with pgjdbc's prepareThreshold setting (default: 5), which controls how many times a query must be executed before pgjdbc promotes it from a simple query to a server-side prepared statement. This interaction has subtleties that warrant their own article — I have written one, which you may find at pgjdbc Prepare Threshold and the Generic Plan Trap. The short version: after the 5th execution, pgjdbc switches to a server-side prepared statement; after several executions of that statement, PostgreSQL's planner may switch to a generic plan that ignores your parameter values (plan_cache_mode, added in PostgreSQL 12, lets you override this). For most queries this is fine. For queries with skewed data distributions, it can be catastrophic.
One important caveat when using prepared statement caching with connection poolers in transaction mode: PgBouncer historically could not handle prepared statements because they are scoped to a connection, and transaction-mode pooling reassigns connections between transactions. If you are using PgBouncer in transaction mode, server-side prepared statements fail with errors such as "prepared statement \"S_1\" does not exist", because the statement was prepared on a different backend connection than the one now executing it. This is one of the oldest and most painful PgBouncer limitations. PgBouncer 1.21+ added max_prepared_statements to address this, but the implementation requires careful testing. Gold Lapel's session-mode pooling avoids this problem entirely, as prepared statements remain bound to their session.
The idle-in-transaction trap: why pool size is the wrong knob
In my experience, at least half of all "pool exhaustion" incidents in Spring Boot applications are not pool sizing problems at all. They are transaction scoping problems. The pool is not too small — connections are being held too long.
// The "idle in transaction" trap — the most common Spring/JPA mistake
// that makes pool sizing irrelevant because no pool is large enough.
@Transactional // opens a transaction AND checks out a connection
public OrderResponse processOrder(Long orderId) {
    Order order = orderRepository.findById(orderId).orElseThrow();
    // Connection is checked out. Transaction is open. Clock is ticking.
    PaymentResult payment = paymentService.chargeCard(order);
    // ^ This HTTP call takes 200-2000ms.
    //   The database connection sits idle the entire time.
    //   In pg_stat_activity: state = 'idle in transaction'
    //   Any locks held by the SELECT above are held for the duration.
    EmailResult email = emailService.sendConfirmation(order);
    // ^ Another 100-500ms of holding a connection for no reason.
    order.setStatus("CONFIRMED");
    orderRepository.save(order);
    return new OrderResponse(order);
}
// Total connection hold time: 300-2500ms for ~5ms of actual DB work.
// With pool_size=10 and 50 concurrent orders:
// 50 threads competing for 10 connections = 40 threads blocked.

// FIX: move non-DB work outside the transaction boundary
public OrderResponse processOrder(Long orderId) {
    Order order = orderService.findAndLock(orderId);          // @Transactional — fast
    PaymentResult payment = paymentService.chargeCard(order); // no transaction
    EmailResult email = emailService.sendConfirmation(order); // no transaction
    orderService.confirmOrder(orderId);                       // @Transactional — fast
    return new OrderResponse(order);
}

This pattern is endemic in Spring applications. @Transactional is placed on a service method that does database work and non-database work. The annotation opens a transaction at method entry and commits at method exit. The database connection is held for the method's entire duration. If that method calls an external API, sends an email, processes a file, or performs any I/O that takes more than a few milliseconds, the connection is idle while the meter runs.
The numbers are stark. If your pool has 10 connections and each request holds a connection for 2,000ms (because of an HTTP call inside the transaction), you can serve exactly 5 requests per second. Not 5,000. Five. The database is idle. The connections are idle. The pool is "full." And the instinct is to increase the pool size.
Increasing the pool size is a bandage on an arterial wound. You go from 10 to 50, and now you can serve 25 requests per second. Still inadequate. You go to 200, and PostgreSQL's throughput actually decreases because now you have 200 processes contending for 4 cores. Meanwhile, a properly scoped transaction holds the connection for 5ms, and your pool of 10 serves 2,000 requests per second without breaking a sweat.
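The arithmetic above is Little's law in miniature: the throughput ceiling a pool imposes is pool size divided by average connection hold time. A sketch (the class and method names are mine):

```java
// Throughput ceiling imposed by a connection pool:
// requests/second = pool_size / average_connection_hold_time_seconds.
public final class PoolThroughput {
    public static double maxThroughput(int poolSize, double holdSeconds) {
        return poolSize / holdSeconds;
    }

    public static void main(String[] args) {
        // HTTP call inside @Transactional: connection held for 2 seconds.
        System.out.println(maxThroughput(10, 2.0));   // 5.0 requests/second
        // Properly scoped transaction: connection held for 5 milliseconds.
        System.out.println(maxThroughput(10, 0.005)); // 2000.0 requests/second
    }
}
```

The lever with 400x of leverage is the hold time, not the pool size, which is exactly why widening the pool is the wrong fix.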
The leak detection threshold in HikariCP is your friend here. Set it to 60 seconds and examine the stack traces it produces. Every trace points to a method that is holding a connection longer than it should. Fix those methods — extract the non-database work, narrow the @Transactional boundary — and your pool sizing problem vanishes.
Open Session In View: the default that sabotages everything
There is a Spring Boot default so harmful to connection pool efficiency that it ships with its own warning message. I have written about this concern at length in a dedicated guide on Spring Open-In-View and connection pool exhaustion, but it deserves mention here because it makes pool sizing math fictional.
// Open Session In View (OSIV) — the Spring Boot default
// that turns your pool sizing math into fiction.
// Spring Boot ships with spring.jpa.open-in-view = true by default.
// This holds a database connection for the ENTIRE HTTP request lifecycle.
//
// Normal request lifecycle WITHOUT OSIV:
// Controller -> Service (@Transactional) -> Repository
// Connection held: 5-15ms (duration of the transaction)
//
// Request lifecycle WITH OSIV (the default):
// Connection acquired -> Controller -> Service -> Template -> Response
// Connection held: 50-500ms (entire request processing time)
//
// With OSIV enabled:
// pool_size=10, avg request time=200ms
// Max throughput: 10 / 0.2 = 50 requests/second
//
// With OSIV disabled:
// pool_size=10, avg transaction time=10ms
// Max throughput: 10 / 0.01 = 1,000 requests/second
//
// That is a 20x difference in capacity from a single boolean.
// Disable it immediately:
// spring.jpa.open-in-view=false
//
// Yes, you will need to fix any lazy-loading that was silently
// triggering during view rendering. That's not a cost — it's a gift.
// Every lazy load that fires during rendering is an N+1 query you
// didn't know you had. OSIV was hiding them. Now you can fix them.

With OSIV enabled (the default in every Spring Boot application since 1.0), a database connection is held for the entire HTTP request lifecycle — from the moment the controller is invoked until the response is fully rendered and sent to the client. This means view rendering, JSON serialization, response compression, and even waiting for the client to acknowledge the response all happen while a database connection is checked out.
Every pool sizing calculation assumes connections are held for the duration of database work. OSIV breaks that assumption by extending the hold time to the duration of the entire request. If your average request takes 200ms but your average transaction takes 10ms, OSIV inflates your connection usage by 20x.
Disable it. spring.jpa.open-in-view=false. Yes, you will need to fix lazy-loading exceptions. Those exceptions are revealing N+1 queries that were happening silently during view rendering — queries you didn't know about, that OSIV was helpfully enabling and unhelpfully hiding. Every LazyInitializationException you fix is a performance bug you squash.
Diagnosing connection timeout errors: the investigation protocol
When HikariCP throws SQLTransientConnectionException, there is a temptation to immediately increase maximum-pool-size or connection-timeout. I must ask you to resist that temptation. The timeout is a symptom. The disease is elsewhere.
-- When HikariCP throws SQLTransientConnectionException:
-- "Connection is not available, request timed out after 30000ms"
--
-- This is not a network error. Every connection in your pool
-- was checked out and none were returned within 30 seconds.
-- Something is holding connections. Find it.
-- Step 1: What's holding the connections right now?
SELECT pid,
state,
now() - xact_start AS transaction_duration,
now() - query_start AS query_duration,
wait_event_type,
wait_event,
LEFT(query, 120) AS query
FROM pg_stat_activity
WHERE application_name = 'MyAppPool'
AND state != 'idle'
ORDER BY xact_start ASC NULLS LAST;
-- Common findings:
-- pid | state | transaction_duration | query
-- 1234 | idle in transaction | 00:00:45.123 | SELECT ... FROM orders
-- 1235 | idle in transaction | 00:00:38.891 | SELECT ... FROM orders
-- 1236 | idle in transaction | 00:00:31.456 | SELECT ... FROM orders
--
-- All 10 connections are "idle in transaction" for 30+ seconds.
-- The query itself finished instantly. The connection is held open
-- because @Transactional hasn't committed yet — the method is
-- waiting on an HTTP call, a message queue, or slow computation.
-- Step 2: What locks are those idle-in-transaction sessions holding?
SELECT l.pid,
l.locktype,
l.mode,
l.granted,
c.relname AS table_name
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.pid IN (
SELECT pid FROM pg_stat_activity
WHERE application_name = 'MyAppPool'
AND state = 'idle in transaction'
)
ORDER BY l.pid, c.relname;
-- If you see RowExclusiveLock on your orders table from an
-- idle-in-transaction session, that session did an UPDATE or INSERT
-- and is now blocking other transactions that need those rows.

The investigation proceeds in order:
1. Check pg_stat_activity during the incident. Are all connections idle-in-transaction? That points to the @Transactional antipattern described above. Are all connections active with long-running queries? That is a query performance problem. Are connections waiting on locks? That is a concurrency problem. The diagnosis determines the remedy.
2. Check HikariCP leak detection logs. If leak-detection-threshold is set, HikariCP logs stack traces showing exactly where connections were checked out. Search your logs for "Connection leak detection triggered" and you will find the offending code path.
3. Check OSIV. If spring.jpa.open-in-view is true (the default), connections are held for entire HTTP request durations. This alone can explain pool exhaustion under moderate load.
4. Check for Spring's @Async interaction. If a @Transactional method calls an @Async method, the async method runs on a separate thread, and if it touches the database it checks out a second connection from the pool while the caller's transaction still holds the first. With Spring's default SimpleAsyncTaskExecutor, async threads are unbounded — they can exhaust any pool. See Spring @Async Connection Leaks with PostgreSQL for the full investigation.
5. Only then, consider pool size. If all of the above checks come back clean — transactions are narrow, OSIV is disabled, queries are fast, no async leaks — and you are still experiencing timeouts, your pool may genuinely be too small. But in my experience, one of the first four causes is responsible in over 90% of cases.
Multi-instance arithmetic: the formula is per-database, not per-app
The formula gives you the total number of connections your PostgreSQL server can efficiently handle. Not per application. Not per instance. Total. This distinction is where most pool sizing goes wrong in production.
# Multi-instance pool sizing — the arithmetic everyone skips
#
# Given:
# PostgreSQL server: 4 cores, SSDs
# Formula pool size: (4 * 2) + 1 = 9 connections (total backend capacity)
# max_connections = 100 (default)
# superuser_reserved_connections = 3
# Usable connections = 97
#
# Scenario A: 1 application instance
# HikariCP maximum-pool-size = 9 # use the full formula
# Total backend connections = 9 # well within 97
#
# Scenario B: 3 application instances (typical staging/prod)
# HikariCP maximum-pool-size = 3 # 9 / 3 = 3 per instance
# Total backend connections = 9 # still at the formula number
#
# Scenario C: 10 instances (Kubernetes autoscale)
# HikariCP maximum-pool-size = 2 # 9 / 10 ≈ 1, but minimum viable = 2
# Total backend connections = 20 # exceeds formula, but acceptable
#
# Scenario D: 10 instances, each with pool_size=20 (the common mistake)
# Total backend connections = 200 # 2x max_connections. Startup failures.
# Fixed by: raise max_connections? No.
# Fixed by: add a connection pooler? Yes.
# Fixed by: use Gold Lapel? Absolutely.
#
# THE INSIGHT: the formula sizes the TOTAL backend pool.
# Your per-instance pool_size = formula_result / instance_count.
# When instance_count is variable (autoscaling), you need a proxy.

In a Kubernetes environment with horizontal pod autoscaling, the instance count is not fixed. It might be 3 at 2 AM and 15 during a traffic spike. If each instance has maximum-pool-size=20, your connection count swings between 60 and 300. PostgreSQL's max_connections defaults to 100.
This is the scenario where the formula, applied naively, breaks down. You cannot set maximum-pool-size=1 to stay within bounds during peak autoscale. A pool of 1 serializes all database access within each pod — if one query takes 100ms, every other request in that pod waits.
The real answer is to put a connection pooler or proxy between your application instances and PostgreSQL. This is not optional advice for autoscaling environments. It is a structural requirement. The proxy maintains a fixed-size pool to PostgreSQL (sized by the formula) and multiplexes all application connections through it.
Without a proxy, you are forced to either over-provision max_connections (which wastes memory and hurts performance) or under-provision your application pool (which creates queuing and timeouts). Neither is acceptable.
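The per-instance division can be captured in a few lines. This is a hypothetical helper, not a HikariCP or Spring API, including the floor of 2 from scenario C (a pool of 1 would serialize all database access in that instance):

```java
// Per-instance pool sizing arithmetic from the scenarios above.
public final class PerInstancePoolSize {
    // The wiki formula: (core_count * 2) + effective_spindle_count,
    // measured on the DATABASE server, not the app server.
    public static int formulaTotal(int dbCores, int spindles) {
        return dbCores * 2 + spindles;
    }

    // Divide the total backend budget across app instances,
    // never dropping below a minimum viable pool of 2.
    public static int perInstance(int formulaTotal, int instanceCount) {
        return Math.max(2, formulaTotal / instanceCount);
    }

    public static void main(String[] args) {
        int total = formulaTotal(4, 1);             // 9 total backend connections
        System.out.println(perInstance(total, 1));  // 9  — scenario A
        System.out.println(perInstance(total, 3));  // 3  — scenario B
        System.out.println(perInstance(total, 10)); // 2  — scenario C: floor hit,
        // and 10 * 2 = 20 backends now exceed the formula. Time for a proxy.
    }
}
```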
Connection storms during deployment and failover
Multi-instance arithmetic becomes acutely dangerous during deployments, autoscaling events, and database failovers — the moments when many connections are being created or destroyed simultaneously.
// Connection storm during startup or failover
// The worst possible time to open 200 connections simultaneously.
// Scenario: Kubernetes rolls out a new deployment.
// 10 new pods start within 30 seconds.
// Each pod opens maximum-pool-size connections at startup.
// With pool_size=20: 200 new TCP connections + TLS handshakes
// hit PostgreSQL within a 30-second window.
// PostgreSQL's response:
// - Fork 200 new backend processes (~2-5ms each, serialized)
// - Allocate memory for 200 new backends
// - If max_connections=100: 100 succeed, 100 get
// "FATAL: too many connections for role"
// - If max_connections=300: all succeed, but the server
// is now overloaded from the process creation storm
// HikariCP mitigation (built-in):
// initialization-fail-timeout: 1 (default)
// Fail fast if pool can't initialize.
// Set longer if PG might be slow during startup:
// initialization-fail-timeout: 30000 # wait 30s for PG
// Staggered startup (Kubernetes-side):
// spec:
//   strategy:
//     rollingUpdate:
//       maxSurge: 1        # add 1 new pod at a time
//       maxUnavailable: 0  # keep old pods until new ones ready
//
// This staggers connection creation: pod 1 opens 20 connections,
// passes readiness check, THEN pod 2 starts.
// With a connection proxy (Gold Lapel, PgBouncer):
// All 200 app connections terminate at the proxy.
// The proxy holds a fixed backend pool (e.g., 20).
// PostgreSQL sees 20 stable connections regardless of
// how many pods start, stop, or restart.

The connection storm problem is self-reinforcing. Many connections arrive simultaneously, overloading PostgreSQL's process creation. Connections take longer to establish. Connection timeouts fire. Failed connections are retried. The retries add more load. Without intervention, this feedback loop can prevent an application from starting at all after a database failover.
If you are running more than 3-5 application instances against a single PostgreSQL server, a connection proxy is not a nice-to-have. It is load-bearing infrastructure. The proxy absorbs connection storms, maintains stable backend connections, and presents PostgreSQL with a calm, predictable workload regardless of what is happening on the application side.
Pool sizing for read replicas
If your application routes read-only traffic to a PostgreSQL replica, you need a separate pool — and the formula applies independently.
# Pool sizing for read replicas — a separate calculation
spring:
  datasource:
    # Primary (read-write)
    hikari:
      maximum-pool-size: 6
      pool-name: PrimaryPool
    # Read replica (read-only queries)
  read-replica:
    url: jdbc:postgresql://replica.example.com:5432/mydb
    hikari:
      maximum-pool-size: 8
      pool-name: ReplicaPool
      read-only: true # pgjdbc optimization — no transaction overhead
# Why different sizes?
# Primary: handles writes + critical reads. Keep small, keep fast.
# Replica: handles reporting queries, search, dashboards.
# Size each pool independently using the formula for each server.
#
# The formula applies separately to each PostgreSQL instance:
# Primary (4 cores): (4 * 2) + 1 = 9 total connections
# Replica (4 cores): (4 * 2) + 1 = 9 total connections
# With 3 app instances: primary pool = 3, replica pool = 3 each
#
# HikariCP's read-only flag:
# Sets the JDBC connection to read-only mode.
# pgjdbc skips transaction bookkeeping for read-only connections.
# PostgreSQL can optimize query plans knowing no writes will occur.
# Minor but free performance gain. Always set it for replica pools.

The common mistake is giving the replica pool the same size as the primary pool. This is rarely correct. Read replicas typically handle different workload profiles — reporting queries, search, dashboards — that may hold connections for longer durations due to aggregation and sorting. A smaller pool with well-optimized queries often delivers better throughput than a larger pool with slow ones.
HikariCP's read-only: true setting is worth the one line of configuration. When this is set, pgjdbc sends SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY to PostgreSQL, which enables several query planner optimizations. The planner knows no writes will occur, which can affect join ordering and index selection for certain query patterns. The performance gain is small but free.
Monitoring your pool: the four numbers that matter
You can configure a pool perfectly and still have problems if you are not watching the right metrics. Spring Boot with Actuator and Micrometer exposes HikariCP metrics automatically. You only need to know which ones to alert on.
// Expose HikariCP metrics via Spring Boot Actuator + Micrometer
// These are available out of the box — no custom code needed.
// application.yml:
// management:
//   endpoints:
//     web:
//       exposure:
//         include: health,metrics,prometheus
//   metrics:
//     tags:
//       application: my-app
// Key metrics to monitor and alert on:
// hikaricp.connections.active
// Currently checked-out connections. Alert if this equals maximum-pool-size
// for more than 10 seconds — your pool is saturated.
// hikaricp.connections.idle
// Available connections. Alert if this drops to 0.
// hikaricp.connections.pending
// Threads waiting for a connection. This is the pain metric.
// Any value > 0 sustained for > 5 seconds means threads are blocking.
// Users are experiencing latency RIGHT NOW.
// hikaricp.connections.timeout
// Counter of connection acquisition timeouts.
// If this is incrementing, requests are failing with:
// "SQLTransientConnectionException: MyAppPool -
// Connection is not available, request timed out after 30000ms."
// hikaricp.connections.usage (Timer)
// How long connections are held before being returned to the pool.
// P99 > 1 second usually means a query or transaction is too slow.
// P99 > 10 seconds means something is very wrong.
// Prometheus query for pool saturation rate:
// hikaricp_connections_active / hikaricp_connections_max * 100

On the PostgreSQL side, pg_stat_activity tells you what your pool looks like from the database's perspective. This is often more revealing than application-side metrics, because it shows you connections from all clients, not just your application.
-- Check how your HikariCP pool looks from PostgreSQL's perspective
-- 1. Connections per application, by state
SELECT application_name,
state,
count(*) AS connections
FROM pg_stat_activity
WHERE datname = 'mydb'
GROUP BY application_name, state
ORDER BY application_name, state;
-- Expected output for a healthy pool (pool_size=10):
-- application_name | state | connections
-- MyAppPool | active | 2
-- MyAppPool | idle | 8
--
-- Unhealthy pool (pool too large or queries too slow):
-- MyAppPool | active | 48
-- MyAppPool | idle | 2
-- 2. Detect leaked connections (held too long without activity)
SELECT pid,
application_name,
state,
now() - state_change AS time_in_state,
now() - query_start AS time_since_last_query,
LEFT(query, 80) AS last_query
FROM pg_stat_activity
WHERE datname = 'mydb'
AND application_name LIKE '%Pool%'
AND state = 'idle in transaction'
AND now() - state_change > interval '30 seconds'
ORDER BY time_in_state DESC;
-- "idle in transaction" connections are holding locks.
-- If you see these lasting more than a few seconds, something in your
-- application is checking out a connection, starting a transaction,
-- and then doing non-database work (HTTP calls, file I/O, computation)
-- before committing. This is the #1 pool sizing mistake in Spring apps.
-- 3. Total connections vs max_connections headroom
SELECT
(SELECT count(*) FROM pg_stat_activity) AS current_connections,
(SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections,
(SELECT setting::int FROM pg_settings WHERE name = 'superuser_reserved_connections') AS reserved,
(SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
- (SELECT setting::int FROM pg_settings WHERE name = 'superuser_reserved_connections')
- (SELECT count(*) FROM pg_stat_activity) AS available_slots;

The four numbers that matter, in order of urgency:
1. Pending connections. Threads waiting for a connection. If this number is consistently above zero, your application is experiencing latency that users can feel. Either queries are too slow, transactions are scoped too broadly, or (genuinely, occasionally) the pool is too small.
2. Active vs. idle ratio. A healthy pool is mostly idle. If active connections persistently equal pool size, every new request must wait. This is the database equivalent of a restaurant with no empty tables — the formula determines how many tables you need, and it is fewer than you think.
3. Connection usage time (P99). How long connections are held. If P99 exceeds 1 second, something is holding connections too long. Cross-reference with leak detection logs.
4. Timeout counter. If this is incrementing, requests are failing. This is not a warning. This is an outage for the affected requests.
Runtime pool inspection via JMX
When you need to inspect pool state during a live incident — without restarting the application, without waiting for Prometheus to scrape, without hoping the right log line appears — JMX provides direct, real-time access to HikariCP internals.
// Runtime HikariCP pool inspection via JMX
// Useful for live debugging without restarting the application.
// Enable JMX in application.yml:
// spring.datasource.hikari.register-mbeans: true
// spring.datasource.hikari.pool-name: MyAppPool
// Then connect via jconsole or VisualVM and inspect:
// com.zaxxer.hikari:type=Pool (MyAppPool)
//
// Attributes:
// ActiveConnections — checked out right now
// IdleConnections — available right now
// TotalConnections — active + idle
// ThreadsAwaitingConnection — the pain number
// Programmatic access (for custom health checks):
HikariDataSource ds = (HikariDataSource) dataSource;
HikariPoolMXBean pool = ds.getHikariPoolMXBean();
int active = pool.getActiveConnections();
int idle = pool.getIdleConnections();
int waiting = pool.getThreadsAwaitingConnection();
int total = pool.getTotalConnections();

if (waiting > 0) {
    log.warn("Pool pressure: {} threads waiting, {}/{} active",
             waiting, active, total);
}
if (active == total && idle == 0) {
    log.error("Pool saturated: all {} connections in use", total);
}

The ThreadsAwaitingConnection attribute is the single most important number during an incident. If it is zero, the pool is not the bottleneck — look elsewhere (slow queries, external service latency, garbage collection pauses). If it is greater than zero, threads are blocking. The duration of those blocks is the latency your users experience above and beyond the query execution time.
Test environment pool configuration
I should like to address a mistake I see in nearly every Spring Boot project: using production pool settings in test environments.
# Test environment pool configuration
# Your test suite should not use the same pool settings as production.
# src/test/resources/application-test.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 2            # tests run single-threaded — 2 is plenty
      minimum-idle: 1                 # save memory in CI
      connection-timeout: 5000        # fail fast in tests
      max-lifetime: 300000            # 5 min — tests don't run that long
      leak-detection-threshold: 10000 # 10 sec — aggressive leak detection
      pool-name: TestPool
# Why this matters:
# A test suite with pool_size=10 holds 10 connections for the entire run.
# If you run tests in parallel (Gradle, Maven Surefire forks), each fork
# opens its own pool. 4 forks × 10 connections = 40 connections.
# Your CI PostgreSQL instance has max_connections=100.
# Add a few microservices' test suites running concurrently = trouble.
#
# Smaller test pools also make leak detection faster.
# With pool_size=2, a leaked connection saturates the pool immediately.
# The test fails fast with a clear error instead of silently succeeding
# because there were 8 spare connections to absorb the leak.

A test suite with maximum-pool-size=10 holds 10 connections for the entire test run. If you run 4 parallel test forks (Gradle's maxParallelForks or Maven Surefire's forkCount), each fork opens its own pool: 40 connections. Add your CI server running tests for 3 microservices simultaneously, and you have 120 connections to a CI PostgreSQL instance that may have max_connections=100.
The test pool should be as small as possible — typically 2. This serves two purposes: it conserves connections in CI environments, and it makes connection leaks immediately visible. With maximum-pool-size=2, a leaked connection means the next request blocks. With maximum-pool-size=10, the leak is absorbed by the spare connections and never noticed — until production, where the pool is also sized to 10 but the traffic volume is considerably higher.
Set leak-detection-threshold aggressively low in tests. 10 seconds is generous. Any test that holds a connection for 10 seconds has a problem, and you want to know about it before the code reaches production.
A note on R2DBC and reactive pool sizing
If your Spring application uses WebFlux with R2DBC instead of Spring MVC with JDBC, the application-side pool math changes fundamentally. I should note this to avoid misleading anyone who has migrated — or is considering migrating — to a reactive stack.
// A note on R2DBC and reactive pool sizing
// The formula changes completely for reactive applications.
// If you're using Spring WebFlux with R2DBC instead of JDBC:
// - r2dbc-pool replaces HikariCP
// - Connections are used asynchronously (non-blocking I/O)
// - A single connection can be in-flight between multiple requests
// - The "cores * 2 + spindles" formula does NOT apply to the app side
//
// R2DBC pool sizing:
// initial-size: 5
// max-size: 20 # can be larger because connections are released
// # between await points, not between method calls.
//
// However: the POSTGRESQL side doesn't change.
// PostgreSQL still forks a process per connection.
// 20 r2dbc connections = 20 backend processes.
// The database-side formula still applies.
//
// If you've migrated from Spring MVC to WebFlux but kept pool_size=50,
// you're wasting connections. R2DBC's async model means 10-15 connections
// can handle what took 50 blocking JDBC connections.
//
// But this article is about HikariCP. If you're on R2DBC, the pool
// sizing principles apply at the PostgreSQL level, but the application
// pool math is different. I shall attend to R2DBC in a separate piece.

The key distinction: JDBC connections are blocked for the entire duration of a query. R2DBC connections are released between await points. This means an R2DBC pool of 10 connections can handle significantly more concurrent requests than a JDBC pool of 10, because each connection is occupied for shorter bursts.
However — and this is the critical point — the PostgreSQL side is identical. PostgreSQL does not know or care whether the connection was established by HikariCP or r2dbc-pool. It forks a process either way. The (cores × 2) + spindles formula applies to the backend connection count regardless of the application-side technology.
If you are on R2DBC, you likely need fewer application-side connections than the formula suggests, because R2DBC's connection utilization is more efficient. But the total backend connection count — the sum across all your r2dbc-pool instances — should still respect the formula. The database does not get faster because your client library is non-blocking.
The complete production configuration
For those who prefer to see the final answer before the working, here is a complete, production-ready HikariCP configuration with all the settings discussed in this article. Every value is annotated. Nothing is left at a default that should be changed.
# Complete production-ready HikariCP configuration
# For a Spring Boot app connecting to a 4-core PostgreSQL 16 server.
# One application instance. Adjust maximum-pool-size for multi-instance.
spring:
  datasource:
    # NOTE: the URL must be a single line — a YAML folded scalar (>-) would
    # join continuation lines with spaces and produce an invalid JDBC URL.
    url: jdbc:postgresql://db.example.com:5432/production?reWriteBatchedInserts=true&preparedStatementCacheQueries=256&preparedStatementCacheSizeMiB=5&socketTimeout=60&connectTimeout=10&loginTimeout=10&tcpKeepAlive=true
    username: app_user
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 9            # (4 cores * 2) + 1 — for YOUR server
      minimum-idle: 9                 # match maximum-pool-size
      idle-timeout: 600000            # 10 min
      max-lifetime: 1740000           # 29 min — margin below PG timeout
      connection-timeout: 30000       # 30 sec
      validation-timeout: 5000        # 5 sec
      keepalive-time: 300000          # 5 min — ping idle connections
      leak-detection-threshold: 60000 # 60 sec
      pool-name: MyAppProd            # visible in pg_stat_activity
      register-mbeans: true           # enable JMX monitoring
  jpa:
    open-in-view: false               # never hold connections during rendering
# PostgreSQL side (postgresql.conf or RDS parameter group):
# max_connections = 30
# idle_in_transaction_session_timeout = '5min'
# idle_session_timeout = '60min' # PG 14+
# statement_timeout = '30s'
# tcp_keepalives_idle = 600
# tcp_keepalives_interval = 30
# tcp_keepalives_count = 3

I should note several JDBC URL parameters that did not appear in earlier sections.
socketTimeout=60 is a safety net. If a query runs for more than 60 seconds, pgjdbc closes the connection. This protects against runaway queries that escape PostgreSQL's statement_timeout (which can happen if statement_timeout is set per-role and a superuser session has it disabled). Set this to a value slightly longer than your PostgreSQL statement_timeout.
tcpKeepAlive=true enables TCP keepalive probes at the OS level. This is complementary to HikariCP's keepalive-time: HikariCP's keepalive validates connections at the application protocol level (JDBC isValid()), while TCP keepalive detects dead connections at the transport level (the remote host has crashed or the network path is broken). Both should be enabled.
connectTimeout=10 and loginTimeout=10 ensure fast failure when the PostgreSQL server is unreachable or unresponsive. Without these, a connection attempt can hang for the OS's default TCP timeout (typically 120 seconds on Linux). During a failover, you want HikariCP to detect the failure quickly, mark the connection as dead, and try again — not wait two minutes for a timeout on a server that has already moved to a new IP.
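The relationships among these timeouts can be stated as invariants. A small sketch, with a helper of my own devising and the values from the configuration above, of the hierarchy worth checking at startup:

```java
// Sketch of the timeout hierarchy established by the configuration above.
// Each safety net should fire AFTER the more specific mechanism beneath it.
public final class TimeoutHierarchy {
    // statementMs:   PostgreSQL statement_timeout (kills the query)
    // socketMs:      pgjdbc socketTimeout (kills the connection: safety net)
    // maxLifetimeMs: HikariCP max-lifetime (retires the connection)
    // serverIdleMs:  PostgreSQL idle_session_timeout (server-side kill)
    public static boolean hierarchyOk(int statementMs, int socketMs,
                                      int maxLifetimeMs, int serverIdleMs) {
        // statement_timeout fires first; socketTimeout only catches escapees.
        // HikariCP retires connections before the server would drop them,
        // so the application never sees an unexpected server-side disconnect.
        return statementMs < socketMs && maxLifetimeMs < serverIdleMs;
    }

    public static void main(String[] args) {
        // 30s statement_timeout, 60s socketTimeout,
        // 29min max-lifetime, 60min idle_session_timeout.
        System.out.println(hierarchyOk(30_000, 60_000, 1_740_000, 3_600_000));
        // prints: true
    }
}
```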
What if the pool managed itself?
The preceding several thousand words describe a problem that is fundamentally about coordination. Your application needs connections. PostgreSQL can handle a limited number. HikariCP manages the application's share. You manage HikariCP's configuration. You manage PostgreSQL's timeout settings. You manage the arithmetic across instances. You manage the monitoring. You manage the alerting.
It is a lot of management for what is, ultimately, a fixed resource allocation problem.
Gold Lapel sits between your application and PostgreSQL as a self-optimizing proxy with session-mode connection pooling. Your HikariCP pool connects to Gold Lapel instead of directly to PostgreSQL. Gold Lapel maintains its own backend pool — sized correctly, once, in one place — and multiplexes all incoming application connections through it.
```yaml
# With Gold Lapel: HikariCP configuration becomes simpler
# GL manages the backend pool — your app connects to GL instead of Postgres.
spring:
  datasource:
    url: jdbc:postgresql://localhost:6432/mydb  # GL's port, not Postgres 5432
    hikari:
      maximum-pool-size: 5      # smaller pool is fine — GL multiplexes
      minimum-idle: 5
      max-lifetime: 1800000     # 30 min
      connection-timeout: 10000 # can be shorter — GL connections are fast

# What changes:
# - Your app opens 5 connections to Gold Lapel
# - Gold Lapel maintains its own pool to PostgreSQL (default: 20)
# - 10 app instances × 5 = 50 connections to GL
# - GL multiplexes those 50 onto 20 actual Postgres connections
# - No max_connections arithmetic. No cross-instance coordination.
#
# What stays the same:
# - pgjdbc settings (preparedStatementCacheQueries, etc.)
# - Spring transaction management
# - Connection timeout and leak detection
# - All your monitoring and alerting
#
# What gets better:
# - Autoscaling events don't storm PostgreSQL with new connections
# - Connection health checking happens at the proxy layer
# - Backend pool is sized once, correctly, in one place
# - Zero-downtime PostgreSQL restarts (GL handles reconnection)
```

The practical effects are immediate. Your per-instance maximum-pool-size can be smaller — 5 instead of 10 — because Gold Lapel absorbs the multiplexing burden. Autoscaling events no longer cause connection storms. The formula still matters, but it is applied once at the proxy layer rather than divided across every application instance and rebalanced with every deployment.
Your pgjdbc settings — preparedStatementCacheQueries, reWriteBatchedInserts, timeout coordination — still matter. The queries still need to be fast. The @Transactional boundaries still need to be narrow. No proxy can fix a service method that holds a connection for 2 seconds while calling a payment API.
But the pool sizing arithmetic? The cross-instance coordination? The "how many connections can I afford per pod" calculation that changes every time you adjust your autoscaler? The connection storm protection during deployments?
Those become someone else's problem. And that someone is rather good at it.
The settings, in summary, for those who prefer brevity
I have been thorough. Perhaps more thorough than the situation strictly required. If you have scrolled here seeking the essential configuration decisions, I shall oblige.
| Setting | Value | Why |
|---|---|---|
| maximum-pool-size | (DB cores × 2) + 1, divided by instance count | PostgreSQL TPS peaks at this number |
| minimum-idle | Same as maximum-pool-size | Eliminates cold-connection latency spikes |
| max-lifetime | 30-60 sec below shortest upstream timeout | Prevents externally-killed connections |
| keepalive-time | 300000 (5 min) | Detects dead connections in background |
| leak-detection-threshold | 60000 (60 sec) | Identifies connection-holding antipatterns |
| reWriteBatchedInserts | true | 2-10x batch insert throughput, zero code changes |
| spring.jpa.open-in-view | false | 20x connection efficiency improvement |
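The first row of the table deserves one worked example. A minimal sketch of the per-instance arithmetic in Java — the class and method names are mine for illustration, not part of HikariCP's API:

```java
// Divide the formula's connection budget across application instances.
// pool_size = (core_count × 2) + effective_spindle_count, then split per instance.
public class PoolSizing {
    static int perInstancePoolSize(int dbCores, int spindles, int appInstances) {
        int budget = dbCores * 2 + spindles;       // the HikariCP wiki formula
        return Math.max(1, budget / appInstances); // never drop below one connection
    }

    public static void main(String[] args) {
        // 4-core SSD-backed server (spindle count 1), 3 app instances: 9 / 3 = 3
        System.out.println(perInstancePoolSize(4, 1, 3)); // prints 3
    }
}
```

Note the floor at 1: if you are running twenty instances against a 4-core database, the honest answer is not a fractional connection per pod but fewer pods, or a proxy in front of the database.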
Nine connections on a 4-core server. Warm at startup. Validated in the background. Retired before any external actor can terminate them. Leaks detected and logged with stack traces. Batch inserts rewritten automatically. OSIV disabled so connections are held for milliseconds, not hundreds of milliseconds.
It is not complicated. It merely requires attention to detail — which, if you'll forgive a professional observation, is rather the point of having staff.
One further thought, if I may. The pool size formula is only half the equation — the other half is what PostgreSQL does with those connections once they arrive. I have written at some length on the matter of max_connections, including the memory arithmetic and the point at which connection pooling at the database layer becomes not merely helpful but essential.