FastAPI + PostgreSQL Connection Pool Lifecycle: Permit Me to Manage the Introductions
Your application and your database need a proper introduction. A good pool manager ensures they meet under the right circumstances and part on civil terms.
Good evening. We need to discuss how your connections are being managed.
FastAPI is splendid. PostgreSQL is exemplary. But the manner in which they are introduced to each other in most tutorials is, if I may be direct, a disgrace.
The official FastAPI documentation demonstrates database access using SQLModel with an SQLite file. A perfectly reasonable teaching example. An entirely unreasonable production architecture. When you swap in PostgreSQL and async drivers, the SQLite example provides approximately zero guidance on connection pool lifecycle, lifespan event wiring, or dependency injection patterns that do not leak connections under load.
I have, in recent months, reviewed a considerable number of FastAPI applications in production. The pattern is remarkably consistent: the application works beautifully during development, handles the first few hundred users without complaint, and then — at some point between midnight and 3 AM, because these things have a sense of occasion — begins returning 500 errors. The PostgreSQL logs show FATAL: too many clients already. The application logs show ConnectionRefusedError. And the on-call engineer, bleary-eyed and resentful, begins searching for answers that the tutorial did not provide.
There are three mainstream approaches to managing a PostgreSQL connection pool in FastAPI: raw asyncpg, SQLAlchemy's async engine, and psycopg3's AsyncConnectionPool. Each has different creation semantics, different checkout mechanisms, different timeout behaviors, and different failure modes. No single resource covers all three with proper configuration, monitoring, and the honest discussion of trade-offs that the topic deserves.
That is what this guide is for.
What a connection pool actually does — and what it costs
Before we examine the three drivers, a brief word on what we are managing and why it requires such care.
A PostgreSQL connection is not cheap. Each connection spawns a dedicated backend process on the server — a full operating system process with its own memory allocation (typically 5-10 MB), its own file descriptors, and its own entry in the pg_stat_activity catalog. Creating one requires a TCP handshake (1-3ms on localhost, 10-50ms across a network), TLS negotiation if encrypted (another 5-15ms), PostgreSQL authentication (password hashing, SCRAM exchange), and backend process initialization.
The total cost of creating a fresh connection ranges from 3ms on localhost to 50ms or more across a network with TLS. That may sound trivial. It is not. At 1,000 requests per second, if each request creates and destroys its own connection, you are spending 3-50 seconds of cumulative connection-setup time every second — most of it network and authentication latency — on overhead alone. You have hired a footman whose sole occupation is opening and closing the front door.
A connection pool eliminates this overhead by maintaining a set of pre-established connections that are borrowed, used, and returned. The cost of "creating a connection" drops from 3-50ms to less than 0.1ms — the time to check out an already-open connection from the pool. The savings are not incremental. They are transformational.
But a pool introduces its own complexities: connections go stale, the pool can be exhausted, the sizing must be coordinated across replicas, and the lifecycle must be managed with precision. A poorly configured pool is often worse than no pool at all — it creates a false sense of security while silently leaking connections or holding them long past their useful life.
# The full lifecycle of a connection through the pool
# Understanding this helps you debug every pool-related issue
import asyncpg
import logging

logger = logging.getLogger("pool")

async def trace_connection_lifecycle(pool):
    """
    1. ACQUIRE: connection checked out from pool
       - If idle connection available: immediate (< 0.1ms)
       - If pool at max_size: blocks until one is returned
       - If pool below max_size: creates new connection (3-15ms)
    """
    async with pool.acquire() as conn:
        logger.info(f"Pool state: {pool.get_size()} total, "
                    f"{pool.get_idle_size()} idle")
        # 2. USE: connection is exclusively yours
        #    No other coroutine can use it until you release it
        await conn.execute("SELECT 1")
    # 3. RETURN: connection goes back to idle pool
    #    Happens automatically when 'async with' block exits
    # 4. IDLE: connection sits in pool waiting for next acquire()
    #    After max_inactive_connection_lifetime: closed and removed
    # 5. RECYCLE: if connection exceeds max age, it is closed
    #    and a fresh one is created on next acquire()

# The critical insight: connection TIME is what you pay for.
# A connection held for 500ms during a slow query is 500ms
# during which no other request can use that slot.
# 20 connections * 500ms average hold time = 40 requests/second max.
# 20 connections * 5ms average hold time = 4,000 requests/second max.
# The pool size matters far less than the hold duration.

The critical insight in that example deserves emphasis: the pool's throughput is determined not by its size but by how quickly connections are returned. Twenty connections with a 5ms average hold time can serve 4,000 requests per second. Those same twenty connections with a 500ms hold time — perhaps due to an unoptimized query or a forgotten SELECT * — serve only 40. The pool has not changed. The queries have.
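That arithmetic is simply Little's law, and it is worth keeping as a function when sizing a pool. A minimal sketch — the helper name is mine, not part of any driver:

```python
def max_throughput(pool_size: int, avg_hold_ms: float) -> float:
    """Upper bound on requests/second a pool can serve:
    each connection can be borrowed (1000 / avg_hold_ms) times per second."""
    return pool_size * (1000.0 / avg_hold_ms)

print(max_throughput(20, 500))  # slow queries: 40.0 requests/second
print(max_throughput(20, 5))    # fast queries: 4000.0 requests/second
```

The inverse is equally useful: if you know your target throughput and your measured average hold time, the minimum pool size is `target_rps * avg_hold_ms / 1000`.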
I shall return to this point when we discuss pool sizing. For now, hold it in mind: the pool is a multiplier of your query performance, not a substitute for it.
The lifespan event: where pools are born and where they must die
FastAPI's lifespan context manager replaced the older on_startup/on_shutdown events in version 0.93. It is the correct place to create and destroy your connection pool. The pattern is identical across all three drivers: create the pool before yield, tear it down after.
The principle is straightforward: one pool, created once, shared by all requests, destroyed on shutdown. Anything else — pools created per-request, pools created at module scope with no teardown, engines floating in global variables with no lifecycle management — leads to connection leaks, resource exhaustion, or both.
I have arranged all three approaches below, complete and production-ready. Each one has been tested under load. Each one handles teardown correctly. I should note, however, that "production-ready" is a statement about the pool wiring — your actual production readiness depends on the timeout configuration, monitoring, and graceful shutdown patterns that we shall address in subsequent sections.
Approach 1: Raw asyncpg
asyncpg gives you the thinnest abstraction over PostgreSQL's binary protocol. No ORM, no session management, no query building. You write SQL, you get rows. The pool is managed by asyncpg itself, and it is fast — the fastest pure-Python PostgreSQL driver by a meaningful margin, owing to its use of the binary protocol and its Cython-accelerated parser.
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI, Request

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: create the pool once
    app.state.pool = await asyncpg.create_pool(
        "postgresql://user:pass@localhost:5432/mydb",
        min_size=5,
        max_size=20,
        max_inactive_connection_lifetime=300,  # close idle connections after 5 min
        command_timeout=30,
    )
    yield
    # Shutdown: close every connection cleanly
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)

async def get_pool(request: Request):
    return request.app.state.pool

@app.get("/users/{user_id}")
async def get_user(request: Request, user_id: int):
    pool = await get_pool(request)
    async with pool.acquire() as conn:
        return await conn.fetchrow(
            "SELECT id, name, email FROM users WHERE id = $1", user_id
        )

Note the max_inactive_connection_lifetime parameter. Without it, idle connections persist indefinitely — fine for a single-instance application, problematic when autoscaling creates and destroys replicas. Stale connections accumulate on the PostgreSQL side, consuming slots and memory. I have observed production systems where terminated application instances left behind dozens of zombie connections that persisted until the operating system's TCP keepalive timeout expired — which on Linux defaults to two hours.
Also note what is absent: asyncpg's pool has no default acquisition timeout. If all connections are busy and you call pool.acquire() without its optional per-call timeout argument, it blocks indefinitely. Indefinitely. In the sense of "until the heat death of the universe or your Kubernetes liveness probe kills the pod, whichever comes first." We shall address this in the timeout configuration section.
Approach 2: SQLAlchemy async engine
SQLAlchemy 2.0's async engine wraps asyncpg (or psycopg3) with its own connection pool, session management, and ORM layer. The pool configuration lives on the engine, not on the underlying driver. You might expect two pools in series — SQLAlchemy's pool on top of the driver's — but in practice there is only one: SQLAlchemy bypasses asyncpg's pool entirely, opening connections directly and managing their reuse itself.
from contextlib import asynccontextmanager

from fastapi import FastAPI, Depends, Request
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker, AsyncSession

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.engine = create_async_engine(
        "postgresql+asyncpg://user:pass@localhost:5432/mydb",
        pool_size=20,
        max_overflow=5,      # 5 extra connections under burst
        pool_timeout=30,     # seconds to wait for a free connection
        pool_recycle=1800,   # recycle connections every 30 min
        pool_pre_ping=True,  # verify connection health before checkout
    )
    app.state.session_factory = async_sessionmaker(
        app.state.engine, expire_on_commit=False
    )
    yield
    await app.state.engine.dispose()

app = FastAPI(lifespan=lifespan)

async def get_session(request: Request) -> AsyncSession:
    async with request.app.state.session_factory() as session:
        yield session

@app.get("/orders")
async def list_orders(session: AsyncSession = Depends(get_session)):
    result = await session.execute(
        text("SELECT id, total, status FROM orders WHERE status = :s"),
        {"s": "pending"}
    )
    return result.mappings().all()

Several details matter here. First, pool_pre_ping=True sends a lightweight check before handing out a connection. This catches connections that PostgreSQL terminated due to idle timeout, network hiccups, or server restarts — the dreaded "connection reset by peer" error that appears twenty minutes after a deployment. The cost is approximately 1ms of overhead per connection checkout. Some engineers disable it for performance. I would advise against this economy. The 1ms you save is dwarfed by the 30 seconds of 500 errors you will experience when a stale connection is handed to a request.
Second, expire_on_commit=False prevents SQLAlchemy from lazily reloading all object attributes after a commit, which triggers unexpected queries and confuses people who expected their objects to remain populated. This is one of those defaults that is technically correct (the session's view of the database is stale after a commit) but practically infuriating (the object was populated 50 milliseconds ago, and the data has not changed).
Third, max_overflow=5 allows the pool to temporarily exceed its pool_size by 5 connections during traffic bursts. This is a feature unique to SQLAlchemy — neither asyncpg nor psycopg3 offers it. The overflow connections are closed as soon as they are returned to the pool, so they do not accumulate. It is a useful safety valve, but it is not a substitute for proper pool sizing. If your pool is routinely in overflow, it is too small.
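SQLAlchemy exposes the counters you need to detect routine overflow directly on engine.pool. The classification helper below is my own illustration; checkedout() and overflow() are real QueuePool methods, and overflow() only goes positive once connections beyond pool_size exist:

```python
def pool_pressure(checked_out: int, overflow: int, pool_size: int) -> str:
    """Classify pool load from SQLAlchemy's QueuePool counters.
    overflow() is positive only when connections beyond pool_size are open."""
    if overflow > 0:
        return "overflow"    # routinely here? your pool_size is too small
    if checked_out >= pool_size:
        return "saturated"   # at capacity, about to borrow overflow
    return "ok"

# In a periodic task or metrics endpoint, feed it the live counters:
# status = pool_pressure(engine.pool.checkedout(), engine.pool.overflow(), 20)
print(pool_pressure(checked_out=25, overflow=5, pool_size=20))  # overflow
print(pool_pressure(checked_out=3, overflow=0, pool_size=20))   # ok
```

Log this once a minute and you will know whether your "temporary" overflow has quietly become permanent.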
Approach 3: psycopg3 AsyncConnectionPool
psycopg3's pool is built into the psycopg_pool package (installed separately from psycopg itself — a detail that catches people). It offers the most granular lifecycle control of the three — separate open() and wait() steps, distinct idle and lifetime timeouts, and a get_stats() method that returns pool health metrics as a dictionary. If asyncpg is the sports car and SQLAlchemy is the luxury sedan, psycopg3 is the vehicle where every dial and gauge is exposed and labeled.
from contextlib import asynccontextmanager

from fastapi import FastAPI, Depends, Request
from psycopg_pool import AsyncConnectionPool

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.pool = AsyncConnectionPool(
        "postgresql://user:pass@localhost:5432/mydb",
        min_size=5,
        max_size=20,
        max_idle=300,       # close idle connections after 5 min
        max_lifetime=1800,  # recycle connections every 30 min
        open=False,         # don't connect in constructor
    )
    await app.state.pool.open()  # connect explicitly in lifespan
    await app.state.pool.wait()  # block until min_size connections are ready
    yield
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)

async def get_conn(request: Request):
    async with request.app.state.pool.connection() as conn:
        yield conn

@app.get("/users/{user_id}")
async def get_user(user_id: int, conn=Depends(get_conn)):
    cur = await conn.execute(  # execute() returns an async cursor
        "SELECT id, name, email FROM users WHERE id = %s", (user_id,)
    )
    return await cur.fetchone()

The open=False / await pool.open() / await pool.wait() sequence deserves attention. By default, the psycopg3 pool constructor attempts to open connections immediately — which fails if called outside an async context (and the constructor, being a regular __init__, is not async). Setting open=False defers connection creation to the explicit open() call inside the lifespan. The subsequent wait() blocks until min_size connections are established, ensuring your application does not begin accepting requests before the database is reachable.
This is a detail that matters more than it appears. Without wait(), your application can start accepting requests while the pool is still establishing its initial connections. The first few requests arrive, find an empty pool, and either block or fail depending on your timeout configuration. Your health check returns 200, your load balancer sends traffic, and your users see errors. The wait() call prevents this by making startup explicitly synchronous: the pool is ready, or we do not proceed.
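Since psycopg_pool exposes get_stats() as a plain dictionary, it is straightforward to turn into a log line or a readiness signal. A sketch: the formatting helper is mine, and I am assuming the documented stat keys (pool_size, pool_available, requests_waiting) — verify them against your psycopg_pool version:

```python
def summarize_pool(stats: dict) -> str:
    """Render the interesting psycopg_pool stats as one log line.
    Assumed keys: pool_size, pool_available, requests_waiting."""
    size = stats.get("pool_size", 0)
    available = stats.get("pool_available", 0)
    waiting = stats.get("requests_waiting", 0)
    state = "WAITING" if waiting > 0 else "ok"
    return f"pool={size} available={available} waiting={waiting} [{state}]"

# In production, something like:
# logger.info(summarize_pool(app.state.pool.get_stats()))
print(summarize_pool({"pool_size": 20, "pool_available": 3, "requests_waiting": 0}))
```

A non-zero requests_waiting is the earliest warning you will get that the pool is undersized for the current load.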
How do the three approaches compare?
A side-by-side reference for the decisions you will actually face during implementation.
| Feature | asyncpg | SQLAlchemy async | psycopg3 |
|---|---|---|---|
| Pool creation | asyncpg.create_pool() | create_async_engine() | AsyncConnectionPool() |
| Connection checkout | pool.acquire() | session_factory() | pool.connection() |
| Parametrized queries | $1, $2 (positional) | :name (named) | %s or %(name)s |
| Health checking | Manual (no built-in ping) | pool_pre_ping=True | check=AsyncConnectionPool.check_connection |
| Max overflow | No (hard max_size limit) | max_overflow=N | No (hard max_size limit) |
| Pool timeout | None (blocks forever) | pool_timeout=N (seconds) | timeout=N on connection() |
| Pool stats | get_size(), get_idle_size() | pool.checkedout(), pool.overflow() | get_stats() (dict) |
| Connection recycling | max_inactive_connection_lifetime | pool_recycle (seconds) | max_lifetime, max_idle |
| Prepared statement support | Built-in (statement cache) | Via driver (asyncpg/psycopg3) | Built-in (automatic, prepare flag) |
| Best suited for | Raw speed, minimal abstraction | ORM models, complex queries, migrations | Modern Pythonic API, pipeline mode |
The choice depends on what else you need. If you want an ORM, migrations, and a mature query builder, SQLAlchemy is the only realistic option — and it uses asyncpg or psycopg3 underneath, so you get the driver's performance regardless. If you want maximum control with minimum abstraction, raw asyncpg or psycopg3 are both excellent. A separate comparison of asyncpg and psycopg3 for FastAPI examines the driver-level trade-offs in detail, should the distinction matter to your architecture.
I should offer an honest observation here: the "which driver" decision matters less than the "how you configure it" decision. I have seen asyncpg applications that leaked connections and psycopg3 applications that ran flawlessly for years. I have seen SQLAlchemy setups that were elegant and SQLAlchemy setups that were architectural fever dreams. The driver is the instrument. The configuration is the music.
The three timeouts you must understand
Every pool-related outage I have investigated traces back to one of three timeouts being misconfigured or — more commonly — not configured at all. The defaults across all three drivers are, if I may be blunt, irresponsible in their optimism.
# Timeout configuration — the three timeouts you must understand

# 1. POOL TIMEOUT: how long to wait for a connection from the pool
#    When all connections are busy, new requests queue here.
#    Too short: requests fail unnecessarily during brief spikes
#    Too long: requests hang, users stare at spinners

# asyncpg: no pool-wide default — a bare acquire() blocks indefinitely!
# Pass the per-call timeout argument (pool.acquire(timeout=5.0))
# or wrap it yourself:
import asyncio
from fastapi import HTTPException

async def get_conn_with_timeout(pool, timeout=5.0):
    try:
        return await asyncio.wait_for(pool.acquire(), timeout=timeout)
    except asyncio.TimeoutError:
        raise HTTPException(503, "Database pool exhausted")

# SQLAlchemy: pool_timeout parameter (default: 30 seconds)
engine = create_async_engine(
    "postgresql+asyncpg://...",
    pool_timeout=10,  # 10 seconds, then TimeoutError
)

# psycopg3: timeout parameter on connection()
async with pool.connection(timeout=10.0) as conn:
    ...  # raises PoolTimeout after 10 seconds

# 2. QUERY TIMEOUT: how long a single query can run
#    Prevents runaway queries from holding connections forever

# asyncpg: command_timeout on pool creation
pool = await asyncpg.create_pool(..., command_timeout=30)

# SQLAlchemy: set at the connection level
async with engine.connect() as conn:
    await conn.execute(text("SET statement_timeout = '30s'"))

# psycopg3: set via connection options
pool = AsyncConnectionPool(
    "postgresql://...?options=-c statement_timeout=30000",
    ...
)

# 3. CONNECTION TIMEOUT: how long to wait for initial TCP connection
#    Only matters during pool creation or when creating new connections

# asyncpg: timeout parameter
pool = await asyncpg.create_pool(..., timeout=10)  # TCP connect timeout

# SQLAlchemy: connect_args
engine = create_async_engine(
    "postgresql+asyncpg://...",
    connect_args={"timeout": 10},
)

# psycopg3: connect_timeout in connection string
pool = AsyncConnectionPool(
    "postgresql://...?connect_timeout=10",
    ...
)

The consequences of getting each one wrong are worth spelling out.
Pool timeout too short: During a brief traffic spike, requests that would have been served in 2 seconds receive a timeout error instead. Your monitoring fires, your on-call engineer wakes up, and by the time they look at the dashboard, the spike has passed. The system was fine. Your timeout was impatient.
Pool timeout too long (or absent): When the pool is genuinely exhausted — perhaps because a long-running query is holding all connections — requests queue silently. Response times climb from 50ms to 5 seconds to 30 seconds. Your users see a frozen page, not an error message. By the time the timeout fires, they have already left. I find a pool timeout of 5-10 seconds to be appropriate for most web applications. Long enough to survive a brief spike, short enough to fail visibly when something is genuinely wrong.
Query timeout absent: A single unoptimized query — an analyst's ad-hoc report, a missing index on a growing table, a SELECT * on a table with 50 million rows — holds its connection for the full duration. If that duration is 60 seconds, one query reduces your effective pool by one slot for a full minute. Five such queries and a quarter of your pool is occupied by work that should have been cancelled 55 seconds ago. Set statement_timeout. Set it on the pool, the connection, or the PostgreSQL configuration. But set it.
I should note that asyncpg's command_timeout is a client-side timeout — it cancels the Python coroutine but does not necessarily cancel the query on the PostgreSQL side. For true server-side cancellation, also set statement_timeout in PostgreSQL's configuration or via a SET statement when the connection is established. The server-side timeout ensures the query stops consuming resources even if the client disconnects.
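One way to get that server-side timeout onto every asyncpg connection is the pool's init callback, which runs once for each new connection before it enters the pool. A sketch — the timeout values are illustrative, not recommendations:

```python
async def set_server_timeouts(conn):
    """asyncpg `init` callback: runs once per new connection.
    Server-side limits mean PostgreSQL cancels the query itself,
    even if the Python client has already given up or disconnected."""
    await conn.execute("SET statement_timeout = '30s'")
    await conn.execute("SET idle_in_transaction_session_timeout = '60s'")

# Wiring (requires a live database):
# pool = await asyncpg.create_pool(dsn, init=set_server_timeouts)
```

The idle_in_transaction_session_timeout is a useful companion: it kills sessions that opened a transaction and then wandered off, which hold locks as well as connections.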
The two mistakes I see in every second codebase
I have reviewed enough FastAPI applications to develop opinions. Two patterns recur with distressing regularity.
Mistake 1: Creating the engine or pool per request
# WRONG: creating an engine on every request
@app.get("/users")
async def list_users():
    engine = create_async_engine("postgresql+asyncpg://...")
    async with engine.connect() as conn:
        result = await conn.execute(text("SELECT * FROM users"))
        return result.mappings().all()

# Engine is garbage collected. Connections leak or close abruptly.
# Under load: ConnectionRefusedError, ResourceWarning, "too many clients"

# WRONG: creating a pool in a dependency without caching it
async def get_pool():
    pool = await asyncpg.create_pool("postgresql://...")
    yield pool
    await pool.close()

# A new pool is created and destroyed per request.
# You are paying 3-5ms of connection setup on every single call.

A connection pool exists to reuse connections. Creating a new pool per request is the precise opposite of that — the infrastructural equivalent of buying a new car for every trip to the shops and abandoning it in the car park afterward.
Each create_pool() or create_async_engine() call establishes fresh TCP connections (3-5ms each, 10-20ms with TLS), performs authentication, and allocates memory. Under 100 concurrent requests, you are creating and destroying 100 pools simultaneously, exhausting PostgreSQL's max_connections and generating a cascade of ConnectionRefusedError exceptions.
The insidious part: this works perfectly in development. Your local PostgreSQL has 100 available connections. Your development server handles one request at a time. The pool is created, used, destroyed, and the next request creates a new one. Latency is perhaps 10ms higher than it should be, but you do not notice. It is only under concurrent load — in staging, in production, at 2 AM — that the pattern reveals its nature.
The fix is always the same: create the pool once in the lifespan event, attach it to app.state, and retrieve it via dependency injection.
Mistake 2: Mixing sync and async database access
# WRONG: mixing sync ORM calls in an async FastAPI app
from sqlalchemy import create_engine  # sync engine
from sqlalchemy.orm import Session

sync_engine = create_engine("postgresql://...")  # no +asyncpg

@app.get("/reports")
async def get_reports():
    # This blocks the event loop. Under concurrency, everything stalls.
    with Session(sync_engine) as session:
        return session.execute(text("SELECT * FROM reports")).all()

# What actually happens:
# - FastAPI runs on uvicorn's asyncio event loop
# - The sync Session blocks the thread running the event loop
# - All other requests queue behind it
# - At 50 concurrent requests, response times spike from 5ms to 2,000ms
# - The thread pool (default: 40 workers) becomes the bottleneck

# FIX: use the async engine and AsyncSession everywhere
# Or, if you must use sync code, wrap it explicitly:
@app.get("/reports")
async def get_reports():
    return await asyncio.to_thread(sync_query)  # at least offload to the thread pool

FastAPI runs on an asyncio event loop. When you call a synchronous database driver — psycopg2, SQLAlchemy's sync Session, or any blocking I/O — from an async def endpoint, you block the event loop thread. Every other request queues behind the blocked thread. Under concurrency, the queueing delays compound.
FastAPI does automatically run def (non-async) endpoints in a thread pool, which is why some tutorials appear to work with sync drivers. But the thread pool defaults to 40 workers — a hard ceiling on concurrency that defeats the purpose of using an async framework. If your endpoints are async def, every database call must be non-blocking. Full stop.
The diagnostic is simple: if your FastAPI application handles one request quickly but falls apart at 50 concurrent requests, check whether any async def route is calling synchronous database code. It almost certainly is.
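If you want the event loop itself to finger the culprit, asyncio's debug mode will log any callback that holds the loop longer than slow_callback_duration, so a sync database call shows up immediately in the logs. A small self-contained demonstration, where a 0.2-second time.sleep stands in for a blocking driver call:

```python
import asyncio
import logging
import time

def find_blocking_calls() -> list:
    """Run a coroutine under asyncio debug mode and collect the
    'Executing ... took N seconds' warnings the loop emits."""
    records = []
    handler = logging.Handler()
    handler.emit = lambda r: records.append(r.getMessage())
    asyncio_logger = logging.getLogger("asyncio")
    asyncio_logger.addHandler(handler)
    asyncio_logger.setLevel(logging.WARNING)

    async def endpoint():
        time.sleep(0.2)  # a sync DB call hiding inside an async def

    loop = asyncio.new_event_loop()
    loop.set_debug(True)                # enables slow-callback logging
    loop.slow_callback_duration = 0.05  # flag anything holding the loop > 50ms
    loop.run_until_complete(endpoint())
    loop.close()
    asyncio_logger.removeHandler(handler)
    return records

messages = find_blocking_calls()
print(messages)  # e.g. a warning that the task "took 0.200 seconds"
```

In production you would not run debug mode permanently — it has overhead — but enabling it in staging for an afternoon will name every blocking call in your codebase.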
I should add a counterpoint here, because intellectual honesty requires it: if your application genuinely handles fewer than 50 concurrent requests and you have an existing codebase built on psycopg2 or synchronous SQLAlchemy, migrating to async may not be worth the effort. Using def endpoints with a sync driver and a thread pool of 40 workers is perfectly adequate for many applications. The async advantage manifests under high concurrency — hundreds or thousands of simultaneous connections. If that is not your world, the sync approach is simpler, better documented, and has fewer sharp edges. There is no shame in it. A household does not need a ballroom if it never hosts a ball.
Dependency injection: the right way to provide connections
FastAPI's Depends() system is the correct mechanism for providing database connections to route handlers. But there is an important distinction between injecting a pool and injecting a connection, and the choice has consequences for transaction management, error handling, and connection hold time.
Injecting the pool means the route handler calls pool.acquire() itself. The handler controls when the connection is checked out and returned. This is appropriate when the handler needs fine-grained control — acquiring the connection late (after validating input), releasing it early (before doing non-database work), or acquiring multiple connections for different operations.
Injecting a connection (via a dependency that acquires from the pool and yields) means the connection is checked out before the handler runs and returned after it completes. This is simpler, more consistent, and the right default for most endpoints. The connection lifetime matches the request lifetime exactly.
For SQLAlchemy, inject the session, not the engine. The session handles connection checkout, transaction boundaries, and cleanup. Injecting the engine and manually creating sessions in each handler is verbose and error-prone — you will eventually forget to close a session, and connections will leak.
Transaction-scoped dependencies
For endpoints that perform multiple database operations that must succeed or fail together, wrap the dependency in a transaction scope.
# Transaction-scoped dependency injection
# For endpoints that need atomic operations
from fastapi import Depends, Request
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

async def get_session_with_transaction(request: Request):
    """Yield a session inside a transaction.
    Commits on success, rolls back on exception."""
    async with request.app.state.session_factory() as session:
        async with session.begin():
            yield session
            # If no exception: commits here
            # If exception: rolls back here

@app.post("/transfer")
async def transfer_funds(
    body: TransferRequest,
    session: AsyncSession = Depends(get_session_with_transaction),
):
    # Both operations happen in one transaction
    await session.execute(
        text("UPDATE accounts SET balance = balance - :amount WHERE id = :from_id"),
        {"amount": body.amount, "from_id": body.from_account},
    )
    await session.execute(
        text("UPDATE accounts SET balance = balance + :amount WHERE id = :to_id"),
        {"amount": body.amount, "to_id": body.to_account},
    )
    # Transaction commits automatically when the dependency yields back
    return {"status": "transferred"}

# For asyncpg, the pattern is similar but explicit:
async def get_conn_with_transaction(request: Request):
    pool = request.app.state.pool
    async with pool.acquire() as conn:
        async with conn.transaction():
            yield conn

# For psycopg3:
async def get_conn_with_transaction(request: Request):
    async with request.app.state.pool.connection() as conn:
        async with conn.transaction():
            yield conn

The elegance of this pattern is that the route handler has no transaction management code at all. It receives a session, uses it, and returns. The dependency handles commit and rollback. If the handler raises an exception, the transaction rolls back automatically. If it completes normally, the transaction commits. The handler need not — and should not — know about transaction boundaries.
One rule applies universally across all three drivers: every connection acquired in a dependency must be released in that same dependency's teardown. If you yield a connection, the code after yield must ensure the connection is returned to the pool — whether the request succeeded, failed, or raised an exception. Python's async with guarantees this. Manual try/finally blocks do not, because people forget the finally.
I have seen applications where a missing finally caused a connection leak that grew by one connection every time a particular error path was triggered. The application ran fine for weeks, leaking one or two connections per day, until the pool was exhausted and everything failed simultaneously. The fix was a single keyword: finally:. The investigation took six hours. Use async with.
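The guarantee that async with provides is easy to demonstrate without a database. The stub pool below is purely illustrative: it counts checkouts the way a real pool tracks acquired connections, and shows the connection going back even when the handler raises mid-request:

```python
import asyncio
from contextlib import asynccontextmanager

class StubPool:
    """Illustration only: counts connections currently checked out."""
    def __init__(self):
        self.in_use = 0

    @asynccontextmanager
    async def acquire(self):
        self.in_use += 1
        try:
            yield "connection"
        finally:
            self.in_use -= 1  # runs on success AND on exception

async def failing_handler(pool: StubPool):
    async with pool.acquire() as conn:
        raise ValueError("handler blew up mid-request")

pool = StubPool()
try:
    asyncio.run(failing_handler(pool))
except ValueError:
    pass

print(pool.in_use)  # → 0: the connection was returned despite the exception
```

Delete the finally from the stub and the counter stays at 1 after the exception — which is precisely the one-connection-per-error-path leak described above.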
Graceful shutdown: the part nobody gets right
When your application shuts down — whether for a deployment, a scaling event, or because Kubernetes decided your pod was no longer welcome — the pool must close cleanly. Connections must be returned. In-flight queries must complete or be cancelled. New requests must be rejected gracefully rather than accepted and then abandoned mid-execution.
The lifespan teardown (the code after yield) handles pool closure. But it does not handle the transition period between "stop accepting new requests" and "all in-flight requests have completed." That gap is where data corruption, partial writes, and user-facing errors live.
# Graceful shutdown: the part nobody gets right
# When uvicorn receives SIGTERM, the lifespan teardown runs.
# But what about in-flight requests?
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI
from fastapi.responses import JSONResponse

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create pool
    app.state.pool = await asyncpg.create_pool(
        "postgresql://user:pass@localhost:5432/mydb",
        min_size=5,
        max_size=20,
    )
    app.state.shutting_down = False
    yield
    # Signal that we're shutting down
    app.state.shutting_down = True
    # Give in-flight requests a grace period to finish
    # uvicorn's --timeout-graceful-shutdown controls the outer deadline
    # Default: None (waits forever) — set to 30 in production:
    #   uvicorn app:app --timeout-graceful-shutdown 30
    # Close the pool — this waits for all acquired connections to be returned
    await app.state.pool.close()
    # If connections are not returned within the grace period,
    # they are terminated. This can interrupt in-flight queries.

app = FastAPI(lifespan=lifespan)

# Middleware to reject new requests during shutdown
@app.middleware("http")
async def reject_during_shutdown(request, call_next):
    if getattr(request.app.state, "shutting_down", False):
        return JSONResponse(
            status_code=503,
            content={"detail": "Server shutting down"},
            headers={"Retry-After": "5"},
        )
    return await call_next(request)

# The Kubernetes/Docker pattern:
# 1. SIGTERM received → lifespan teardown begins
# 2. Middleware rejects new requests with 503
# 3. In-flight requests complete (or hit the grace period)
# 4. Pool closes (waits for connections to return)
# 5. Process exits
#
# Without this: during rolling deployments, the old pod's pool.close()
# terminates connections mid-query. Users see 500 errors.
# With this: clean handoff, zero errors during deploys.

The critical detail is the --timeout-graceful-shutdown flag on uvicorn. Without it, uvicorn waits indefinitely for in-flight requests to complete — which sounds safe until a hung request prevents your deployment from proceeding. With it set to, say, 30 seconds, uvicorn allows in-flight requests 30 seconds to finish before terminating them. The pool's close() method then cleans up any remaining connections.
In a Kubernetes environment, this interacts with the pod's terminationGracePeriodSeconds (default: 30 seconds). The chain is: SIGTERM received, uvicorn stops accepting connections, in-flight requests have timeout-graceful-shutdown seconds to complete, lifespan teardown runs (pool closes), process exits. If the entire sequence exceeds terminationGracePeriodSeconds, Kubernetes sends SIGKILL and the process dies immediately — no teardown, no pool cleanup, no graceful anything.
Set timeout-graceful-shutdown to a value comfortably less than your terminationGracePeriodSeconds. If Kubernetes gives you 30 seconds, set the uvicorn timeout to 20. That leaves 10 seconds for pool teardown and process cleanup.
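The grace-period idea can be sketched without any web framework: track in-flight work with a counter and let shutdown wait for it to drain, up to a deadline, before closing the pool. This is a minimal sketch under my own naming; `InFlightTracker` and the 2-second deadline are illustrative, not part of FastAPI or uvicorn.

```python
import asyncio

class InFlightTracker:
    """Counts in-flight requests so shutdown can wait for them to drain."""
    def __init__(self):
        self.count = 0
        self._drained = asyncio.Event()
        self._drained.set()  # drained while count == 0

    def enter(self):
        self.count += 1
        self._drained.clear()

    def leave(self):
        self.count -= 1
        if self.count == 0:
            self._drained.set()

    async def wait_drained(self, deadline: float) -> bool:
        """True if all in-flight requests finished before the deadline."""
        try:
            await asyncio.wait_for(self._drained.wait(), timeout=deadline)
            return True
        except asyncio.TimeoutError:
            return False  # stragglers remain; pool.close() will interrupt them

async def demo():
    tracker = InFlightTracker()

    async def fake_request(duration):
        tracker.enter()
        try:
            await asyncio.sleep(duration)  # stands in for a real handler
        finally:
            tracker.leave()

    tasks = [asyncio.create_task(fake_request(0.05)) for _ in range(2)]
    await asyncio.sleep(0)  # let both requests start
    drained = await tracker.wait_drained(deadline=2.0)
    await asyncio.gather(*tasks)
    return drained

print(asyncio.run(demo()))  # True: both requests finished inside the deadline
```

In a real application, `enter()` and `leave()` would wrap `call_next` in middleware, and the teardown half of the lifespan would call `wait_drained()` before `pool.close()`.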
Health checks that actually check the database
Your load balancer needs to know whether this application instance can serve requests. A health check that returns {"status": "ok"} without verifying database connectivity is not a health check — it is a statement of optimism.
# Health check endpoints that actually check the database
# Your load balancer needs to know if this instance can serve requests
from fastapi import Request
from fastapi.responses import JSONResponse

@app.get("/health")
async def health_check(request: Request):
    """Shallow health check — is the process alive?"""
    return {"status": "ok"}

@app.get("/health/ready")
async def readiness_check(request: Request):
    """Deep health check — can we actually query the database?
    Use this for Kubernetes readiness probes."""
    pool = request.app.state.pool
    # Check 1: pool has available connections
    idle = pool.get_idle_size()
    total = pool.get_size()
    if idle == 0 and total >= pool.get_max_size():  # pool exhausted
        return JSONResponse(
            status_code=503,
            content={"status": "pool_exhausted", "total": total, "idle": idle},
        )
    # Check 2: can we actually execute a query?
    try:
        async with pool.acquire() as conn:
            await conn.fetchval("SELECT 1")
    except Exception as e:
        return JSONResponse(
            status_code=503,
            content={"status": "db_unreachable", "error": str(e)},
        )
    return {
        "status": "ok",
        "pool_total": total,
        "pool_idle": idle,
        "pool_used": total - idle,
    }
# IMPORTANT: the readiness probe acquires a connection.
# If your probe interval is 5 seconds and the probe holds the
# connection for 50ms, that's 1% of one connection slot — negligible.
# But if the probe query is slow (> 1s), you're wasting a pool slot
# on health checks instead of real traffic. Keep it to SELECT 1.

The separation between a liveness probe (/health) and a readiness probe (/health/ready) is intentional and important. The liveness probe answers "is the process running?" — it should always return 200 unless the process has crashed. If a liveness probe fails, Kubernetes restarts the pod, which is aggressive medicine. The readiness probe answers "can this instance serve traffic?" — it can fail during startup (pool still initializing), during database outages, or during pool exhaustion without triggering a restart.
A word of caution: the readiness probe acquires a connection from the pool. If your probe interval is 5 seconds and the probe holds a connection for 50ms, that is negligible — 1% of one connection slot. But I have seen health checks that execute multi-table joins or run SELECT count(*) FROM large_table to "verify the database is responsive." This is akin to testing whether the house is structurally sound by removing a load-bearing wall. Use SELECT 1. It confirms connectivity. It does not contribute to the problem it is meant to detect.
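A related failure mode: a probe that can hang is worse than one that fails, because a database that accepts connections but never answers will stall the probe handler itself. A minimal sketch of bounding the check with asyncio.wait_for; `probe_with_deadline`, `check_db`, and the 1-second budget are my own illustrative names, with coroutine stubs standing in for the real pool-based SELECT 1.

```python
import asyncio

async def probe_with_deadline(check_db, timeout: float = 1.0) -> dict:
    """Run the database check, but never let the probe itself hang.

    check_db is any coroutine function that performs the real
    SELECT 1; here it is a stand-in for the pool-based check."""
    try:
        await asyncio.wait_for(check_db(), timeout=timeout)
        return {"status": "ok"}
    except asyncio.TimeoutError:
        # The connection was accepted but the query never answered.
        return {"status": "db_timeout"}
    except Exception as exc:
        return {"status": "db_unreachable", "error": str(exc)}

async def demo():
    async def healthy():  # answers quickly
        await asyncio.sleep(0.01)

    async def hung():     # never answers
        await asyncio.sleep(60)

    ok = await probe_with_deadline(healthy, timeout=0.5)
    bad = await probe_with_deadline(hung, timeout=0.1)
    return ok, bad

print(asyncio.run(demo()))  # → ({'status': 'ok'}, {'status': 'db_timeout'})
```

A probe that returns 503 within a second lets Kubernetes make a decision; a probe that hangs for the full probe timeout just delays the same decision.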
How do you know when your pool is exhausted?
Pool exhaustion is one of those failures that does not announce itself clearly. Response times increase gradually, then suddenly. Error messages vary by driver — "timeout waiting for a connection," "too many clients already," "QueuePool limit of size 20 overflow 5 reached" — and by the time you see them, your users have already noticed.
Monitor from both sides: the database and the application. Each provides information the other cannot.
Database-side monitoring
-- Check pool exhaustion symptoms on the PostgreSQL side
-- Run this when response times spike or you see "connection pool exhausted" errors

-- 1. Current connection count by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'mydb'
GROUP BY state;

-- Healthy output:          Unhealthy output:
--   active |  4              active | 20   (all connections busy)
--   idle   | 16              idle   |  0   (no connections available)
--   total  | 20              total  | 20

-- 2. Connections waiting on locks (sign of contention, not pool size)
SELECT count(*) AS waiting,
       avg(EXTRACT(EPOCH FROM now() - query_start))::numeric(10,2) AS avg_wait_sec
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- 3. Longest-running active queries (pool hogs)
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC
LIMIT 5;

-- 4. Connection churn — are connections being created and destroyed rapidly?
-- High values here mean your pool is misconfigured or not being reused
SELECT datname,
       numbackends AS current_connections,
       xact_commit + xact_rollback AS total_transactions
FROM pg_stat_database
WHERE datname = 'mydb';

-- 5. Connection age distribution — are connections being recycled?
SELECT pid,
       now() - backend_start AS connection_age,
       state,
       now() - state_change AS time_in_state
FROM pg_stat_activity
WHERE datname = 'mydb'
ORDER BY connection_age DESC;

The key signal is the ratio of active to idle connections. A healthy pool has most connections idle — they are available, waiting to serve requests. When every connection is active and none are idle, the next request to arrive must wait. When idle = 0 persists for more than a few seconds, you are either under-pooled or your queries are too slow.
The "pool hogs" query (query 3) is equally important. A single long-running query — an unindexed report, a forgotten SELECT * on a million-row table, a transaction left open by a debugger — holds its connection for the entire duration. One 30-second query in a pool of 20 reduces your effective pool size to 19. Five of them and you are operating at 75% capacity.
Query 4 — connection churn — reveals whether your pool is actually reusing connections. If the connection count fluctuates rapidly (connections being created and destroyed every few seconds), your pool is misconfigured. Either min_size is too low and connections are being destroyed during idle periods only to be recreated moments later, or something is closing connections prematurely.
Application-side monitoring
# Application-side pool monitoring for each driver

# asyncpg: check pool stats directly
pool = app.state.pool
print(f"Size: {pool.get_size()}")
print(f"Free: {pool.get_idle_size()}")
print(f"Used: {pool.get_size() - pool.get_idle_size()}")
# Alert when free == 0 for more than 5 seconds

# SQLAlchemy: use pool events
import logging
from sqlalchemy import event

logger = logging.getLogger(__name__)

@event.listens_for(engine.sync_engine, "checkout")
def on_checkout(dbapi_conn, connection_record, connection_proxy):
    pool = engine.pool
    print(f"Checked out: {pool.checkedout()}, Overflow: {pool.overflow()}")
    if pool.checkedout() >= pool.size():
        logger.warning("Pool at capacity — requests will queue")

# psycopg3: pool stats via get_stats()
stats = app.state.pool.get_stats()
print(f"Pool size: {stats['pool_size']}")
print(f"Requests waiting: {stats['requests_waiting']}")
print(f"Connections used: {stats['pool_size'] - stats['pool_available']}")
# Alert when requests_waiting > 0 persists for more than a few seconds

Each driver exposes pool metrics differently, but the critical numbers are the same: how many connections are in use, how many are available, and whether anything is waiting. Set up alerts on available == 0 sustained for more than 5 seconds, and on waiting > 0 sustained for more than 10 seconds. These are early warnings that precede the timeout errors your users will see.
Structured monitoring with Prometheus
For production systems that need dashboards and alerting — which is to say, all production systems — export pool metrics as Prometheus gauges.
# Structured pool monitoring with Prometheus metrics
# For production systems that need dashboards and alerting
import asyncio
import time
from prometheus_client import Gauge, Histogram, Counter

# Define metrics once at module scope
pool_total = Gauge("db_pool_connections_total", "Total connections in pool")
pool_idle = Gauge("db_pool_connections_idle", "Idle connections in pool")
pool_used = Gauge("db_pool_connections_used", "Active connections in pool")
pool_waiting = Gauge("db_pool_requests_waiting", "Requests waiting for connection")
checkout_duration = Histogram(
    "db_pool_checkout_seconds",
    "Time to acquire a connection from pool",
    buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
)
pool_exhausted = Counter("db_pool_exhausted_total", "Times pool was fully exhausted")

# Background task to collect metrics every 5 seconds
async def collect_pool_metrics(app):
    while True:
        pool = app.state.pool
        total = pool.get_size()
        idle = pool.get_idle_size()
        used = total - idle
        pool_total.set(total)
        pool_idle.set(idle)
        pool_used.set(used)
        if idle == 0 and total > 0:
            pool_exhausted.inc()
        await asyncio.sleep(5)

# Start the collector in your lifespan
@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.pool = await asyncpg.create_pool(...)
    metrics_task = asyncio.create_task(collect_pool_metrics(app))
    yield
    metrics_task.cancel()
    await app.state.pool.close()

# Middleware to time connection checkout
@app.middleware("http")
async def track_checkout_time(request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    # Note: this measures total request time, not just checkout.
    # For precise checkout timing, instrument the dependency.
    return response

The histogram on checkout duration is particularly revealing. In a healthy system, the p99 checkout time should be under 1ms — the connection is already in the pool, and checkout is essentially a list pop. When the p99 climbs to 100ms or more, connections are being created on demand because the pool was empty. When it climbs to seconds, requests are queuing behind an exhausted pool. The histogram tells you which regime you are in without requiring you to reproduce the problem.
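The note in the middleware points at the right fix: time the acquire call itself, inside the dependency, so query time is never mixed into the checkout metric. A minimal sketch, with a plain list standing in for the Prometheus histogram and a fake pool standing in for asyncpg; `timed_acquire` and `FakePool` are my own illustrative names, and the same wrapper would surround `pool.acquire()` in a real dependency.

```python
import asyncio
import time
from contextlib import asynccontextmanager

checkout_times: list[float] = []  # stand-in for the Prometheus histogram

@asynccontextmanager
async def timed_acquire(pool):
    """Acquire a connection, recording only the checkout wait."""
    start = time.monotonic()
    async with pool.acquire() as conn:
        checkout_times.append(time.monotonic() - start)  # checkout complete
        yield conn

# In a real app the FastAPI dependency would be:
# async def get_connection(request: Request):
#     async with timed_acquire(request.app.state.pool) as conn:
#         yield conn

class FakePool:
    """Minimal stand-in: acquire waits briefly, like a pool under light load."""
    @asynccontextmanager
    async def _acquire(self):
        await asyncio.sleep(0.01)  # simulated checkout wait
        yield "connection"

    def acquire(self):
        return self._acquire()

async def demo():
    pool = FakePool()
    async with timed_acquire(pool):
        await asyncio.sleep(0.2)  # the "query" time (not counted)
    return checkout_times[0]

elapsed = asyncio.run(demo())
print(f"checkout took {elapsed * 1000:.0f}ms")  # ≈10ms; the 200ms query is excluded
```

With the real histogram, the recorded value would go through `checkout_duration.observe()` instead of a list append.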
Pool sizing: the number everyone gets wrong
Application-side pool sizing and PostgreSQL-side max_connections are related but distinct settings, and getting either one wrong produces different flavors of trouble.
The application pool max_size controls how many connections this application instance can hold simultaneously. PostgreSQL's max_connections controls the total across all clients. If you have 4 application replicas each with max_size=20, that is 80 potential connections — already most of PostgreSQL's default max_connections=100.
# Pool sizing calculator for FastAPI deployments
# Based on: (CPU cores * 2) + effective_spindles

# Your PostgreSQL server:
pg_cpu_cores = 4
effective_spindles = 1  # SSD counts as 1
pg_optimal_connections = (pg_cpu_cores * 2) + effective_spindles  # = 9

# Reserve some for superuser, replication, monitoring:
pg_reserved = 3
pg_max_connections = 100  # PostgreSQL default (can be changed, but see below)
pg_available = pg_max_connections - pg_reserved  # = 97

# But optimal active queries is still only 9!
# The other 88 slots exist for connections that are mostly idle.

# Your deployment:
app_replicas = 4

# Each replica gets an equal share:
per_replica_max = pg_available // app_replicas  # = 24

# But remember: optimal active is 9, not 97.
# If all 4 replicas are doing queries simultaneously:
per_replica_target = pg_optimal_connections  # = 9 per replica... ideally

# In practice, not all replicas peak simultaneously, so:
per_replica_pool_size = 15  # reasonable middle ground
per_replica_min_size = 3    # keep a few warm connections ready

# The math that matters:
# 4 replicas * 15 max = 60 potential connections (within pg_available of 97)
# 4 replicas * 3 min  = 12 persistent connections (warm pool, minimal overhead)
# If a 5th replica autoscales: 5 * 15 = 75 (still within limits)
# If a 6th: 6 * 15 = 90 (getting close — time to revisit)

# WARNING: if per_replica * replicas > pg_available, new connections WILL fail.
# This is the #1 cause of "too many clients already" in autoscaling deployments.

The formula from the connection pooling guide applies: (CPU cores * 2) + effective_spindles for the optimal number of actively executing queries. For a 4-core PostgreSQL server, that is roughly 9 connections of actual concurrent query execution capacity. More active connections than that will contend on CPU scheduling, shared buffers, and lock management. They will not run faster. They will run slower.
This means your application pool should be sized based on your share of PostgreSQL's capacity, not based on your request concurrency. If you have 500 concurrent requests but only 20 effective database connections, the pool will queue requests — which is exactly correct behavior. The pool's job is to serialize access to a scarce resource, not to pretend the resource is unlimited.
A pool that queues briefly is healthy. A pool that queues for seconds is telling you that queries are too slow, not that the pool is too small. I cannot emphasize this enough, because the instinct — especially at 3 AM — is to increase max_size. Resist this instinct. A larger pool with slow queries does not solve the problem. It hides it for slightly longer while consuming more PostgreSQL resources and making the eventual failure more dramatic.
"Every connection pooler makes trade-offs. The question is not which pooler is best in the abstract, but which trade-offs your application can tolerate — and which ones it cannot."
— from You Don't Need Redis, Chapter 17: Sorting Out the Connection Poolers
The autoscaling trap
Autoscaling introduces a particularly dangerous arithmetic. If your Kubernetes cluster scales from 4 replicas to 8 during a traffic spike, and each replica has max_size=20, your potential connection count doubles from 80 to 160 — exceeding PostgreSQL's default max_connections=100. The new replicas attempt to connect, receive FATAL: too many clients already, fail their health checks, and Kubernetes restarts them. The restarted pods try again, fail again, and enter a crash loop. The autoscaler, observing that the new pods are unhealthy, may scale up further, making the problem worse.
The solution is to coordinate your pool sizing with your maximum expected replica count. If you might scale to 8 replicas, each replica's max_size must not exceed (max_connections - reserved) / 8. Or — and this is the approach I recommend — place a connection pooler between your application and PostgreSQL, which we shall discuss presently.
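The coordination rule above is simple enough to encode and check in CI. A small sketch; `safe_pool_max_size` is a name of my own, not from any library.

```python
def safe_pool_max_size(pg_max_connections: int,
                       reserved: int,
                       max_replicas: int) -> int:
    """Largest per-replica max_size that stays under PostgreSQL's limit
    even when the autoscaler is at its ceiling."""
    available = pg_max_connections - reserved
    if max_replicas <= 0 or available <= 0:
        raise ValueError("need at least one replica and some available slots")
    return available // max_replicas

# PostgreSQL default limit, 3 slots reserved, autoscaling up to 8 replicas:
print(safe_pool_max_size(100, 3, 8))  # → 12
# A 20-per-replica default would allow 8 * 20 = 160 potential connections,
# well past max_connections=100: the crash loop described above.
```

The important input is the autoscaler's maximum, not its current replica count; sizing against the current count is precisely how the trap springs.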
When to add PgBouncer (and what changes when you do)
If you have more than two or three application replicas, or if you autoscale, a connection pooler between your application and PostgreSQL is not optional — it is infrastructure hygiene.
# When to add PgBouncer between FastAPI and PostgreSQL
# And how to configure your app-side pool when you do

# WITHOUT PgBouncer:
#   App (pool: 20) → PostgreSQL (max_connections: 100)
#   Direct connection. App pool IS your only pool.

# WITH PgBouncer:
#   App (pool: 10) → PgBouncer (default_pool_size: 20) → PostgreSQL
#   Two pools in series. Each has its own configuration.

# The key change: your app-side pool can be SMALLER
# because PgBouncer multiplexes across all clients.

# App-side configuration when using PgBouncer:
pool = await asyncpg.create_pool(
    "postgresql://user:pass@pgbouncer-host:6432/mydb",  # PgBouncer port
    min_size=2,   # fewer warm connections needed
    max_size=10,  # PgBouncer handles the upstream limit
    # IMPORTANT: disable app-side prepared statements in transaction mode
    # PgBouncer's transaction pooling cannot track prepared statements
    # across different backend connections
    statement_cache_size=0,  # asyncpg-specific: disable statement cache
)

# For SQLAlchemy with PgBouncer:
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@pgbouncer-host:6432/mydb",
    pool_size=10,
    max_overflow=0,      # let PgBouncer handle overflow
    pool_pre_ping=True,  # still useful — PgBouncer can evict connections too
    connect_args={
        "statement_cache_size": 0,  # disable asyncpg statement cache
    },
)

# IMPORTANT CAVEAT: PgBouncer in transaction mode breaks:
# - LISTEN/NOTIFY (connection is reassigned between transactions)
# - Prepared statements (unless you disable them, as above)
# - SET statements that should persist across queries
# - Advisory locks
# - Temporary tables
# If you need any of these, use session mode — but session mode
# provides almost no multiplexing benefit.

PgBouncer (or its newer alternative, pgcat) multiplexes many client connections onto fewer PostgreSQL connections. Your 8 application replicas, each with 10 connections to PgBouncer, share a single pool of 20 connections to PostgreSQL. The arithmetic is no longer per-replica. It is centralized.
However — and this is where I must be forthcoming about the trade-offs — PgBouncer in transaction pooling mode breaks several PostgreSQL features that your application may depend on. Prepared statements are the most common casualty. asyncpg, in particular, uses prepared statements aggressively via its statement cache, and those prepared statements are associated with a specific PostgreSQL backend connection. When PgBouncer reassigns you to a different backend connection on the next transaction, your prepared statement does not exist there, and the query fails.
The fix — statement_cache_size=0 for asyncpg — disables the statement cache entirely. This means every query is parsed and planned from scratch on every execution. For simple queries, the overhead is negligible. For complex queries that benefit from a cached plan, you are trading connection multiplexing for query planning overhead. It is not a free lunch. It is a trade-off.
Session pooling mode avoids these issues by assigning each client a dedicated backend connection for the duration of the session. But session mode provides almost no multiplexing benefit — each client still holds a backend connection — so the reason for deploying PgBouncer evaporates.
I should also note that LISTEN/NOTIFY, advisory locks, temporary tables, and any session-level state (SET statements, search_path changes) are all broken by transaction-mode pooling. If your application uses any of these features, you must either switch to session mode, route those specific connections around PgBouncer, or restructure your application to avoid session-level state.
Testing your pool configuration
Pool configuration is notoriously difficult to test because the failure modes only manifest under concurrent load. Your unit tests pass. Your integration tests pass. Your staging environment handles light traffic without complaint. And then production, with its hundreds of concurrent users, reveals that your pool timeout is set to 30 seconds and your users have the patience of approximately 3.
# Testing your pool configuration without a real database
# Use this in CI to verify your lifespan and dependencies work correctly
# (requires pytest-asyncio)
import pytest
from httpx import AsyncClient, ASGITransport
from unittest.mock import AsyncMock, MagicMock, patch

from app.main import app  # your application module

@pytest.fixture
async def mock_pool():
    """Create a mock pool that behaves like asyncpg's pool."""
    # MagicMock, not AsyncMock: pool.acquire() must return a context
    # manager directly, not a coroutine, for `async with` to work.
    pool = MagicMock()
    pool.get_size.return_value = 5
    pool.get_idle_size.return_value = 3
    pool.close = AsyncMock()
    # Mock the acquire context manager
    conn = AsyncMock()
    conn.fetchrow.return_value = {"id": 1, "name": "Test", "email": "test@test.com"}
    pool.acquire.return_value.__aenter__ = AsyncMock(return_value=conn)
    pool.acquire.return_value.__aexit__ = AsyncMock(return_value=False)
    return pool

@pytest.fixture
async def client(mock_pool):
    """Create a test client with the mock pool injected."""
    with patch("app.main.asyncpg") as mock_asyncpg:
        mock_asyncpg.create_pool = AsyncMock(return_value=mock_pool)
        async with AsyncClient(
            transport=ASGITransport(app=app),
            base_url="http://test",
        ) as client:
            yield client

@pytest.mark.asyncio
async def test_get_user(client):
    response = await client.get("/users/1")
    assert response.status_code == 200
    assert response.json()["name"] == "Test"

# Integration test with a real database (use testcontainers or a test DB)
import asyncpg

@pytest.fixture(scope="session")
async def real_pool():
    pool = await asyncpg.create_pool(
        "postgresql://test:test@localhost:5432/test_db",
        min_size=1,
        max_size=5,
    )
    yield pool
    await pool.close()

@pytest.mark.asyncio
async def test_pool_lifecycle(real_pool):
    """Verify pool creates connections and returns them correctly."""
    assert real_pool.get_size() >= 1
    assert real_pool.get_idle_size() >= 1
    async with real_pool.acquire() as conn:
        result = await conn.fetchval("SELECT 1")
        assert result == 1
    # Connection should be back in idle pool
    assert real_pool.get_idle_size() >= 1

The unit tests with mocked pools verify that your lifespan wiring and dependency injection work correctly — that the pool is created on startup, injected into routes, and closed on shutdown. These tests are fast and reliable.
But they do not test pool behavior under load. For that, you need load tests against a real database. Tools like Locust or k6 can simulate hundreds of concurrent users, and you can observe pool metrics during the test to verify that your sizing, timeouts, and monitoring all behave as expected.
The load test scenario I recommend: gradually ramp from 1 to 200 concurrent users over 5 minutes, hold at 200 for 5 minutes, then ramp down. During the test, record pool checkout duration (p50, p95, p99), pool exhaustion events, error rates, and PostgreSQL connection count. If the p99 checkout duration exceeds 100ms during the ramp, your pool is too small or your queries are too slow. If you see any pool exhaustion events at 200 concurrent users, investigate which queries are holding connections the longest.
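You can get a feel for the "pool too small vs. queries too slow" distinction before running a real load test, because a pool is, for queueing purposes, a semaphore with connections attached. A stdlib-only sketch that models checkout as semaphore acquisition and reports wait times; all the numbers are illustrative, and this is no substitute for an actual Locust or k6 run against a real database.

```python
import asyncio
import time

async def simulate(pool_size, concurrent_users, query_seconds, queries_per_user=3):
    """Model pool checkout as semaphore acquisition; return p50 and max wait."""
    pool = asyncio.Semaphore(pool_size)
    waits = []

    async def user():
        for _ in range(queries_per_user):
            start = time.monotonic()
            async with pool:                        # "checkout"
                waits.append(time.monotonic() - start)
                await asyncio.sleep(query_seconds)  # "query"

    await asyncio.gather(*(user() for _ in range(concurrent_users)))
    waits.sort()
    return waits[len(waits) // 2], waits[-1]

async def main():
    # 50 users on a 20-connection pool: some queueing is inevitable
    p50_fast, max_fast = await simulate(20, 50, query_seconds=0.005)
    # Same pool, same users, 10x slower queries: waits grow roughly 10x too
    p50_slow, max_slow = await simulate(20, 50, query_seconds=0.05)
    print(f"fast queries: p50 wait {p50_fast*1000:.1f}ms, max {max_fast*1000:.1f}ms")
    print(f"slow queries: p50 wait {p50_slow*1000:.1f}ms, max {max_slow*1000:.1f}ms")
    return max_fast, max_slow

asyncio.run(main())
```

The pool size is identical in both runs; only query duration changes, and the wait times scale with it. That is the point of the paragraph above: a queueing pool usually indicates slow queries, not a small pool.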
An honest accounting of what can still go wrong
I have now presented a comprehensive guide to pool configuration in FastAPI. If I left the matter here, you might reasonably conclude that proper pool configuration solves all connection-related problems. It does not. And I would be a poor waiter indeed if I sent you away with that impression.
DNS resolution failures. If your PostgreSQL connection string uses a hostname, DNS resolution happens during pool creation and during connection recreation. A transient DNS failure during pool creation causes the entire application startup to fail. A DNS failure during connection recreation causes that slot to remain empty until the pool attempts again. Consider using IP addresses for connection strings in environments where DNS is unreliable, and set appropriate DNS TTLs where you must use hostnames.
Connection storms after network partitions. If the network between your application and PostgreSQL drops for 30 seconds, all pool connections die simultaneously. When the network recovers, all connections are recreated simultaneously — a "thundering herd" of connection attempts that can overwhelm PostgreSQL. asyncpg handles this relatively gracefully (it recreates connections on demand). SQLAlchemy's pool_pre_ping detects dead connections before checkout. psycopg3 with wait() blocks until connections are reestablished. But all three will attempt to recreate their full min_size simultaneously, and if multiple replicas experience the same partition, the combined reconnection attempt may exceed max_connections.
Memory leaks in long-lived connections. Connections that live for hours or days can accumulate memory — in the PostgreSQL backend process, in the driver's internal caches, or in the application's reference to the connection object. This is why pool_recycle (SQLAlchemy) and max_lifetime (psycopg3) exist: they close and recreate connections periodically, preventing memory from growing without bound. asyncpg's max_inactive_connection_lifetime only closes idle connections — a busy connection that has been alive for a week will not be recycled. If you are using asyncpg and see gradual memory growth in your PostgreSQL backend, consider implementing manual connection recycling based on age.
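One way to implement that manual recycling: asyncpg's Pool exposes an `expire_connections()` coroutine that marks currently open connections as expired so they are closed and replaced rather than reused. A periodic task calling it approximates age-based recycling. A hedged sketch with a stub pool standing in for asyncpg; `recycle_periodically`, `StubPool`, and the 6-hour interval are my own illustrative choices.

```python
import asyncio

async def recycle_periodically(pool, interval_seconds: float):
    """Periodically expire the pool's connections so each is replaced
    instead of reused. With a real asyncpg pool this calls
    Pool.expire_connections(); here `pool` is any object with that method."""
    while True:
        await asyncio.sleep(interval_seconds)
        await pool.expire_connections()

class StubPool:
    """Stand-in for asyncpg.Pool that just counts expirations."""
    def __init__(self):
        self.expired = 0

    async def expire_connections(self):
        self.expired += 1

async def demo():
    pool = StubPool()
    # In production, start this from your lifespan with interval_seconds=6 * 3600
    task = asyncio.create_task(recycle_periodically(pool, interval_seconds=0.01))
    await asyncio.sleep(0.05)  # let a few intervals elapse
    task.cancel()
    return pool.expired

print(asyncio.run(demo()))  # at least one expiration occurred
```

Note that expiry takes effect as connections cycle through the pool, so a connection held continuously by a long-running task still evades recycling, exactly the caveat described above.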
The ORM tax. If you are using SQLAlchemy's ORM (not just the Core expression language), be aware that the session object accumulates an identity map of all objects loaded during its lifetime. A long-running request that loads thousands of objects will hold those objects in memory for the duration of the session — which, if you are injecting sessions per-request, means the duration of the request. This is not a pool issue per se, but it manifests as one: the connection is held while the session processes a large result set, and the memory consumption compounds the problem. For endpoints that load large datasets, consider using yield_per() or stream_results() to process rows in batches rather than loading everything into memory at once.
What if the pool managed itself?
The configuration burden above — pool sizes per replica, max_connections arithmetic, idle timeouts, health checks, monitoring alerts, PgBouncer integration, prepared statement cache coordination, graceful shutdown sequencing — exists because your application connects directly to PostgreSQL and must negotiate its own share of a finite resource.
Gold Lapel changes the equation. It sits between your application and PostgreSQL as a proxy with built-in session-mode connection pooling. Your application connects to Gold Lapel. Gold Lapel manages the upstream connection pool to PostgreSQL — a single, properly-sized pool shared across all your application instances.
# With Gold Lapel: your pool configuration gets simpler
# GL manages upstream connections — your app just connects to GL
pool = await asyncpg.create_pool(
    "postgresql://user:pass@localhost:6432/mydb",  # GL's port
    min_size=2,
    max_size=10,  # smaller app-side pool is fine
    # GL maintains its own pool to PostgreSQL (default: 20)
    # Your 10 connections fan out to GL's 20 backend connections
    # Multiple app instances share the same backend pool
)

# Benefits:
# - App-side pool can be smaller (less memory per process)
# - Multiple app replicas share backend connections efficiently
# - GL handles connection health checking and recycling
# - No more "too many clients" errors during deploys or autoscaling
# - No need to disable prepared statements (GL handles them natively)
# - Pool sizing arithmetic becomes: "connect to GL, GL handles the rest"

The practical effect: your application-side pool becomes smaller and simpler. You no longer need to coordinate max_size across replicas to stay under max_connections. Health checking and connection recycling happen at the proxy layer. Autoscaling events — spinning up new replicas, tearing down old ones — no longer produce connection storms on PostgreSQL because Gold Lapel absorbs the churn. And because Gold Lapel is not limited by PgBouncer's transaction-mode constraints, your prepared statements continue to work without disabling the cache.
The pool configuration still matters. The timeout configuration still matters. The monitoring still matters. But the arithmetic becomes simpler, the failure modes become fewer, and the 3 AM pages become — in my experience — rather less frequent. These things tend to go well when there is a competent intermediary handling the introductions.
The pool sizing formula above assumes your queries run efficiently. If they do not, more connections will not save you — they will merely queue more slowly. I have written at some length on the hidden cost of slow queries, where connection overhead, query latency, and infrastructure cost compound in ways that pool tuning alone cannot address.