
Django Multi-Database Routing with PostgreSQL Read Replicas: Solving the Replication Lag Consistency Bug

Your user updates their profile, gets redirected, and sees the old data. The write went to the primary. The read came from the replica. The replica hadn't caught up yet. This is a solved problem, but not a simple one.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 38 min read
The illustration is replicating from the primary canvas. Current lag: 340 milliseconds.

Good evening. I understand you have deployed a read replica.

A sensible decision. Your PostgreSQL primary was handling reads and writes, the read load was growing, and adding a streaming replica to absorb SELECT traffic is one of the most well-established scaling patterns in the PostgreSQL ecosystem. You pointed Django's multi-database routing at both servers, wrote a 12-line router class, and watched your primary's CPU drop by 40%.

Then the bug reports started.

"I updated my email but it still shows the old one." "I added an item to my cart but the cart page is empty." "I submitted a comment and it disappeared." Each report follows the same pattern: a user performs a write, gets redirected, and the subsequent page load shows stale data. The write is not lost — refresh the page two seconds later and it appears. But those two seconds are enough to erode trust, generate support tickets, and make your application feel broken.

The underlying issue is PostgreSQL streaming replication lag. The primary applies a transaction, sends the WAL (write-ahead log) to the replica, and the replica applies it. This process is fast — typically under 100 milliseconds — but it is not instantaneous. And Django's database router has no concept of "this user just wrote something, so their next read should come from the primary." The router sees a read, routes it to the replica, and the replica serves data from 100 milliseconds ago.

100 milliseconds. That is all it takes.

I should be direct about the scope of what follows. This is not a conceptual overview. We will examine every practical approach to solving this problem in Django — from the 12-line router that breaks, through middleware-based pinning, library solutions, lag-aware routing, ASGI considerations, multi-replica topologies, and monitoring. By the end, you will understand not only how to fix the read-after-write bug, but why each fix works, where it falls short, and which trade-offs are appropriate for your particular situation.

How PostgreSQL streaming replication actually works

Before we fix the problem, allow me a brief tour of the mechanism that creates it. Understanding the replication pipeline will make every subsequent fix more intuitive — and, more importantly, will help you diagnose when something is wrong at 2 AM.

When a transaction commits on the primary, PostgreSQL does not send the data changes directly to the replica. It sends the WAL — a sequential record of every change made to the database's data files. The replica receives these WAL records and replays them against its own copy of the data. This is the same mechanism PostgreSQL uses for crash recovery: replay the WAL from the last checkpoint, and the database is restored to its committed state.

The five phases of WAL replication
-- What happens inside PostgreSQL when you commit a write:

-- 1. Transaction commits on the primary
--    The change is written to the WAL (write-ahead log) on disk.

-- 2. WAL sender streams the record
--    The primary's WAL sender process sends the new WAL bytes
--    to each connected replica over a persistent TCP connection.

-- 3. Replica WAL receiver accepts the bytes
--    The replica's WAL receiver process writes the WAL to its
--    own WAL file on disk. This is the "write" phase.

-- 4. Replica WAL receiver flushes to disk
--    The WAL bytes are fsynced. This is the "flush" phase.

-- 5. Replica startup process replays the WAL
--    The actual data pages are updated. This is the "replay" phase.
--    Only AFTER replay is the data visible to queries on the replica.

-- The lag you care about is the time between step 1 and step 5.
-- Each phase has its own lag metric in pg_stat_replication:
--   write_lag  → commit (step 1) to replica write (step 3)
--   flush_lag  → commit (step 1) to replica flush (step 4)
--   replay_lag → commit (step 1) to replica replay (step 5)

The critical insight is that there are five distinct phases, and each introduces potential delay. The WAL must be written on the primary, sent over the network, received by the replica, flushed to the replica's disk, and then replayed into the replica's data pages. Only after replay is the data visible to queries running on the replica.

Under normal conditions, this entire pipeline completes in under 100 milliseconds. Often under 10 milliseconds. But "normal conditions" is doing a great deal of heavy lifting in that sentence. Several things can increase lag dramatically:

  • Large transactions: A bulk UPDATE that modifies 500,000 rows generates a substantial volume of WAL. The replica must receive and replay all of it, and during replay, queries see the pre-update state. A migration that adds a column with a default value to a 10-million-row table can cause seconds of lag.
  • Replica under load: If the replica is running expensive analytical queries, the replay process competes for I/O and CPU. Worse, queries that conflict with replay can pause it outright — by default for up to 30 seconds (max_standby_streaming_delay) before PostgreSQL cancels the offending query. A long report query can hold back replay for everyone.
  • Network congestion: The WAL stream flows over a TCP connection between primary and replica. If that connection traverses congested network segments, or if the replica is in a different availability zone or region, network latency adds directly to replication lag.
  • VACUUM on the primary: This is one of the most common and least understood sources of lag spikes. A replica is read-only — autovacuum cannot run there — but when autovacuum runs on the primary, it generates WAL the replica must replay, and its cleanup records can conflict with queries running on the standby, pausing replay until the conflict resolves. The irony is unmistakable: a process designed to keep the database healthy is making the replica temporarily less useful.
  • WAL archiving contention: If the replica is also serving as a WAL archive target (common in backup topologies), the archival process competes with the replay process for disk I/O.

The important thing to understand is that replication lag is not a fixed number. It fluctuates. Under idle conditions it may be 1 millisecond. During a migration it may be 10 seconds. During a network partition it may be minutes. Any solution that assumes a fixed lag duration is fragile by design.

The naive router and why it breaks

Let us start with what most Django tutorials recommend. A database router that sends all reads to the replica and all writes to the primary.

settings.py — dual database configuration
# settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myapp',
        'HOST': 'primary.db.internal',
        'PORT': '5432',
        'USER': 'app',
        'PASSWORD': 'secret',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myapp',
        'HOST': 'replica.db.internal',
        'PORT': '5432',
        'USER': 'app_readonly',
        'PASSWORD': 'secret',
    },
}

DATABASE_ROUTERS = ['myapp.db_router.PrimaryReplicaRouter']
The 12-line router everyone starts with
# myapp/db_router.py

class PrimaryReplicaRouter:
    """Route reads to the replica, writes to the primary."""

    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'

This router is correct in the narrow sense: it routes reads and writes to the appropriate servers. It is also the source of every read-after-write consistency bug you are about to encounter.

The problem is visible in a simple view pair:

The read-your-writes bug in action
# views.py

from django.shortcuts import redirect, get_object_or_404
from django.views.decorators.http import require_POST

@require_POST
def update_profile(request):
    user = request.user
    user.display_name = request.POST['display_name']
    user.save()  # writes to 'default' (primary)
    return redirect('profile_detail', user_id=user.id)


def profile_detail(request, user_id):
    user = get_object_or_404(User, id=user_id)  # reads from 'replica'
    # The replica hasn't received the WAL yet.
    # User sees their OLD display name.
    # They click "Edit" again, confused. Save again.
    # Support ticket incoming.
    return render(request, 'profile.html', {'user': user})

The timeline:

  1. T+0ms: user.save() writes to the primary. The transaction commits. The primary begins streaming the WAL record to the replica.
  2. T+5ms: Django returns a 302 redirect to /profile/42/.
  3. T+15ms: The browser follows the redirect. Django receives the GET request.
  4. T+16ms: The router sends the read to the replica. The replica has not yet applied the WAL record — it is still in the replica's WAL receiver buffer.
  5. T+16ms: The user sees their old display name.
  6. T+120ms: The replica finishes applying the WAL. The data is now consistent. Nobody is looking.

The window is narrow. But the POST-redirect-GET pattern is so common in Django — form submissions, AJAX updates, API calls followed by page loads — that even a 50-millisecond lag window guarantees this bug will hit real users in production.
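The race is easy to reproduce without Django at all. A toy sketch — the dict "databases" and the 100-millisecond delay are illustrative — in which a background thread plays the role of WAL replay:

```python
import threading
import time

# Toy model: writes hit the "primary" immediately; a timer thread
# replays them to the "replica" after a fixed replication delay.
REPLICATION_DELAY = 0.1  # 100 ms — a plausible worst case for healthy streaming

primary = {'display_name': 'old'}
replica = {'display_name': 'old'}

def write(key, value):
    primary[key] = value
    # WAL ships asynchronously; the replica applies it later
    timer = threading.Timer(REPLICATION_DELAY, replica.__setitem__, (key, value))
    timer.start()
    return timer

timer = write('display_name', 'new')   # the POST commits on the primary
time.sleep(0.015)                      # browser follows the 302 ~15 ms later
stale = replica['display_name']        # the GET reads the replica
timer.join()                           # wait out the replication delay
fresh = replica['display_name']        # now the data has replayed
print(stale, fresh)  # old new
```

The 15-millisecond redirect loses the race against the 100-millisecond replay every time — exactly the timeline above.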

I should note that this is not a Django bug, nor a PostgreSQL bug. It is an architectural consequence of eventual consistency. Every system that splits reads and writes across different servers faces this problem — Rails with its multiple database configuration, Spring with read/write DataSource routing, any application with a load balancer distributing queries. Django's contribution is making the naive version extremely easy to deploy and providing no built-in mechanism to handle the consistency gap.

The Django documentation, to its credit, does mention this issue. In the multi-database topic guide, under a section titled "Database routing," there is a brief note that automatic routing "doesn't handle the issue of replication lag." It does not elaborate on what to do about it. That is, if you'll permit me, what we are here to address.

Fix 1: POST-then-redirect primary pinning

The most widely used solution is primary pinning: after any write request, pin the user's subsequent reads to the primary for a short window. The implementation uses a cookie to track the pin state and middleware to enforce it.

Middleware that pins reads to primary after writes
# myapp/middleware.py

import time

class PrimaryPinMiddleware:
    """After any POST/PUT/PATCH/DELETE, pin subsequent reads
    to the primary for a configurable window."""

    STICKY_COOKIE = 'db_pin_primary'
    PIN_SECONDS = 5  # how long to pin after a write

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Check if this request should be pinned to primary
        if request.COOKIES.get(self.STICKY_COOKIE):
            pin_until = float(request.COOKIES[self.STICKY_COOKIE])
            if time.time() < pin_until:
                request._db_pin_primary = True
            else:
                request._db_pin_primary = False
        else:
            request._db_pin_primary = False

        response = self.get_response(request)

        # If this was a write request, set the pin cookie
        if request.method in ('POST', 'PUT', 'PATCH', 'DELETE'):
            pin_until = time.time() + self.PIN_SECONDS
            response.set_cookie(
                self.STICKY_COOKIE,
                str(pin_until),
                max_age=self.PIN_SECONDS,
                httponly=True,
                samesite='Lax',
            )

        return response

The middleware sets a cookie with a timestamp after any POST, PUT, PATCH, or DELETE. On subsequent requests, if the cookie exists and the timestamp hasn't expired, the request is marked for primary reads.

The router needs to check for this flag, which requires getting the request object into the router. Django's database router has no access to the request — it receives only the model and hints. The standard solution is thread-local storage:

Thread-local middleware to make the request available to the router
# myapp/middleware.py (addition)

import threading

_local = threading.local()


class ThreadLocalRequestMiddleware:
    """Stash the current request in thread-local storage
    so the database router can access it."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _local.request = request
        try:
            response = self.get_response(request)
        finally:
            _local.request = None
        return response


# settings.py — order matters
MIDDLEWARE = [
    # ...
    'myapp.middleware.ThreadLocalRequestMiddleware',
    'myapp.middleware.PrimaryPinMiddleware',
    # ...
]
Router that respects the primary pin
# myapp/db_router.py

# Import the SAME thread-local object the middleware writes to.
# Defining a fresh threading.local() here would silently break the
# pin — get_current_request() would always return None.
from myapp.middleware import _local


def get_current_request():
    """Retrieve the request stashed by ThreadLocalRequestMiddleware."""
    return getattr(_local, 'request', None)


class PrimaryReplicaRouter:
    """Route reads to replica unless the user just wrote something."""

    def db_for_read(self, model, **hints):
        request = get_current_request()
        if request and getattr(request, '_db_pin_primary', False):
            return 'default'  # read from primary
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'

Choosing the pin duration

The pin duration — 5 seconds in this example — is a trade-off. Too short, and users with slow connections or high-lag replicas still see stale data. Too long, and you are routing reads back to the primary unnecessarily, negating the benefit of having a replica.

In practice, 5 seconds is generous. PostgreSQL streaming replication lag under normal conditions is 1-100 milliseconds. If your lag regularly exceeds 1 second, the replica is either under-provisioned or processing a heavy batch operation, and you have a larger problem than routing.

The 5-second window exists to cover edge cases: network blips, replica vacuum operations, large transaction replays. It is insurance, not a performance target.

Consider the arithmetic. If 20% of your traffic is write-initiated (POST/PUT/PATCH/DELETE), and each write pins that user for 5 seconds, and your average user generates a request every 3 seconds, then roughly 2 additional requests per write will hit the primary instead of the replica. For a site with 1,000 requests/second where 200 are writes, you are redirecting approximately 400 reads/second back to the primary — a 50% reduction in replica traffic during that window. Whether this matters depends on your headroom. For most applications, the primary can easily absorb this. For applications already at primary capacity, a 2-second pin window may be more appropriate.
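The same arithmetic as a sketch you can plug your own numbers into. (The prose rounds 5/3 requests per write up to 2; the exact figure is slightly lower.)

```python
def redirected_reads_per_second(write_rps, pin_seconds, request_interval):
    # Each write pins one user; during the pin window that user makes
    # pin_seconds / request_interval requests that go to the primary
    # instead of the replica.
    return write_rps * (pin_seconds / request_interval)

# The example in the text: 1,000 req/s total, 200 writes/s,
# a 5-second pin, one request per user every 3 seconds.
extra = redirected_reads_per_second(write_rps=200, pin_seconds=5, request_interval=3)
replica_share_lost = extra / (1000 - 200)  # 800 reads/s would otherwise hit the replica
print(round(extra), round(replica_share_lost, 2))  # 333 0.42
```

Shortening the pin to 2 seconds drops the redirected load proportionally — which is the knob to turn when the primary has no headroom.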

Cookie pinning vs. session pinning

The cookie approach has a subtle limitation: the pin state is visible to the client. While the cookie is httponly, a sophisticated user can clear it, and its value (a Unix timestamp) reveals information about your infrastructure. An alternative is to store the pin state in Django's session framework:

Session-based pinning — pin state stays server-side
# myapp/middleware.py — session-based pinning alternative

import time

class SessionPrimaryPinMiddleware:
    """Pin reads to primary using Django's session framework
    instead of a raw cookie. Works across subdomains and
    is not exposed to client-side JavaScript."""

    PIN_SECONDS = 5

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        pin_until = request.session.get('_db_pin_until', 0)
        request._db_pin_primary = time.time() < pin_until

        response = self.get_response(request)

        if request.method in ('POST', 'PUT', 'PATCH', 'DELETE'):
            request.session['_db_pin_until'] = (
                time.time() + self.PIN_SECONDS
            )

        return response

Session-based pinning keeps the state server-side, works across subdomains (since the session ID is the only cookie involved), and does not leak infrastructure details. The trade-off is a session store read on every request, which you likely already have if you are using Django's authentication system. If your session backend is cached (Redis or Memcached), the overhead is negligible.

One additional consideration: if you are using session pinning and your session backend is itself backed by the replica (a configuration I have encountered more often than I would care to admit), you have created a circular dependency. The session read goes to the replica. The session data says "read from primary." But the session data was written to the primary and might not have replicated yet. The pin state itself can be stale. Use a cache backend for sessions, or ensure your session storage reads from the primary. This is the kind of edge case that makes replication routing interesting.
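A minimal settings sketch that keeps session state out of the replication path entirely — the backend names are standard Django settings, but the Redis location is an assumption for illustration:

```python
# settings.py — sessions live in the cache, not in PostgreSQL,
# so the pin state can never itself be a stale replica read.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.redis.RedisCache',
        'LOCATION': 'redis://cache.internal:6379/1',  # hypothetical host
    },
}
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
```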

Fix 2: per-view @use_primary decorators

Some views should always read from the primary, regardless of whether the user recently wrote anything. Financial dashboards. Admin panels. Order confirmation pages. Anything where even momentary staleness is unacceptable.

Decorator for views that demand fresh data
# myapp/decorators.py

from functools import wraps


def use_primary(view_func):
    """Force a specific view to read from the primary database,
    regardless of cookie state."""
    @wraps(view_func)
    def wrapper(request, *args, **kwargs):
        request._db_pin_primary = True
        return view_func(request, *args, **kwargs)
    return wrapper


# Usage:
from myapp.decorators import use_primary

@use_primary
def dashboard(request):
    """This view always reads from the primary.
    Financial data, admin panels, anything that
    cannot tolerate even 100ms of staleness."""
    recent_orders = Order.objects.filter(
        created_at__gte=timezone.now() - timedelta(hours=1)
    ).select_related('customer')
    return render(request, 'dashboard.html', {'orders': recent_orders})

This approach is surgical. You know which views are consistency-sensitive, and you mark them explicitly. The router respects the flag. No cookies, no time windows, no middleware dependency.

The risk is forgetting to decorate a view that needs it. This is a "silent correctness bug" category — the page works, the data loads, it is just occasionally stale. These bugs survive code review because the reviewer has no way to know which views are consistency-sensitive without understanding the business logic.
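One mitigation is to make the pin inspectable: have the decorator set a marker attribute (the `_reads_primary` name is our convention, not Django's) and write a unit test that asserts every view on a hand-maintained sensitive list carries it:

```python
from functools import wraps

def use_primary(view_func):
    """Same decorator as above, plus a marker attribute a test can check."""
    @wraps(view_func)
    def wrapper(request, *args, **kwargs):
        request._db_pin_primary = True
        return view_func(request, *args, **kwargs)
    wrapper._reads_primary = True  # marker (our convention)
    return wrapper

@use_primary
def dashboard(request):
    return 'fresh data'

def order_history(request):  # consistency-sensitive, but nobody decorated it
    return 'possibly stale data'

# The test enumerates the views the business considers
# consistency-sensitive and fails loudly when one lacks the marker.
SENSITIVE_VIEWS = [dashboard, order_history]

missing = [v.__name__ for v in SENSITIVE_VIEWS
           if not getattr(v, '_reads_primary', False)]
print(missing)  # ['order_history']
```

The list still has to be maintained by hand, but a failing test in CI is far cheaper than a stale-read report from production.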

A context manager for partial pinning

The decorator pins an entire view to the primary. Sometimes that is more than necessary. A view might render historical data (replica-safe) alongside freshly created data (primary-required). For this, a context manager provides finer control:

Context manager for pinning within a view
# myapp/routing.py — context manager for primary reads

from contextlib import contextmanager


@contextmanager
def read_from_primary(request):
    """Temporarily pin reads to primary for a block of code.
    Useful when only part of a view needs fresh data."""
    original = getattr(request, '_db_pin_primary', False)
    request._db_pin_primary = True
    try:
        yield
    finally:
        request._db_pin_primary = original


# Usage in a view:
def order_confirmation(request, order_id):
    # This read can come from the replica — it's historical data
    customer = Customer.objects.get(id=request.user.customer_id)

    # But the order was just created — must read from primary
    with read_from_primary(request):
        # 'items' is a reverse relation: prefetch_related, not select_related
        order = Order.objects.prefetch_related('items').get(id=order_id)
        payment = Payment.objects.get(order=order)

    return render(request, 'confirmation.html', {
        'customer': customer,
        'order': order,
        'payment': payment,
    })

The context manager lets you pin reads to the primary for exactly the block of code that needs it, then release back to replica routing for the rest of the view. This is the most precise tool in the kit, but also the most demanding — you must understand, at the query level, which reads need fresh data and which tolerate staleness.

Explicit routing with .using()

You can also force routing at the query level with Django's .using() method:

Explicit per-query routing with .using()
# Explicit routing with Django's .using()
# When you need fine-grained control beyond the router

# Force a specific query to the primary
user = User.objects.using('default').get(id=42)

# Force a specific query to the replica
products = Product.objects.using('replica').filter(active=True)

# Save to a specific database
user.save(using='default')

# This is useful for one-off overrides, but does not scale.
# You'll end up scattering .using() calls across your codebase,
# and forgetting one means a stale read bug.

This is useful for one-off overrides. It does not scale as a primary routing strategy — you will end up scattering .using() calls across your codebase, and every forgotten call is a potential stale read. But it has its place: a one-off admin script that needs to read from the primary, a management command that performs a read-modify-write cycle, a debugging session where you want to confirm the primary and replica are returning different results.

DRF considerations

If you are using Django REST Framework, the write-then-read pattern appears in a slightly different form. A viewset's create() action writes the object, then serializes it for the response — and that serialization reads from the database. If the serializer includes related objects or computed fields, those reads go to the replica, which may not have the newly created object yet.

DRF mixin that pins primary after writes
# myapp/mixins.py — DRF mixin for write-then-read APIs

class PrimaryPinMixin:
    """For DRF viewsets that perform writes and return the
    updated object. Ensures the response reads from primary."""

    def perform_create(self, serializer):
        self.request._db_pin_primary = True
        serializer.save()

    def perform_update(self, serializer):
        self.request._db_pin_primary = True
        serializer.save()

    def perform_destroy(self, instance):
        self.request._db_pin_primary = True
        instance.delete()


# Usage:
from rest_framework.viewsets import ModelViewSet

class OrderViewSet(PrimaryPinMixin, ModelViewSet):
    queryset = Order.objects.all()
    serializer_class = OrderSerializer
    # After create/update/destroy, the response serializer
    # reads from primary — no stale data in the API response.

The mixin ensures that any read during the response serialization phase goes to the primary. Without it, you get the particularly confusing scenario where a POST /api/orders/ returns a 201 Created with the new order's ID, but a subsequent GET /api/orders/{id}/ returns 404 because the replica does not have it yet.

Fix 3: django-replicated for production

If you would rather not maintain your own middleware and router, django-replicated packages the POST-then-redirect pinning pattern as a drop-in library. It has been used in production at Yandex, handles the thread-local plumbing, and provides both @use_master and @use_slave decorators for per-view overrides.

django-replicated configuration and usage
# Using django-replicated for production-grade routing
# pip install django-replicated

# settings.py
INSTALLED_APPS = [
    # ...
    'django_replicated',
]

MIDDLEWARE = [
    # ...
    'django_replicated.middleware.ReplicationMiddleware',
]

DATABASE_ROUTERS = ['django_replicated.router.ReplicationRouter']

# After any POST, reads are pinned to primary for this many seconds
REPLICATED_CACHE_BACKEND = 'default'  # uses Django cache for pin state
REPLICATED_READ_ONLY_DOWNGRADE = True  # if primary is down, fall back to replica
REPLICATED_FORCE_MASTER_COOKIE_MAX_AGE = 5  # seconds to pin

# In views, you can force primary reads:
from django_replicated.decorators import use_master, use_slave

@use_master
def admin_dashboard(request):
    # All reads go to primary
    return render(request, 'admin/dashboard.html')

@use_slave
def public_catalog(request):
    # All reads go to replica, even after POST
    return render(request, 'catalog.html')

The library stores the pin state in Django's cache backend rather than a cookie, which means it survives across subdomains and is not subject to cookie size limits. It also supports a REPLICATED_READ_ONLY_DOWNGRADE setting that falls back to the replica if the primary is unreachable — a useful pattern for read-only degraded operation during primary failover.

For most Django applications with a single primary and one or two replicas, django-replicated is the right answer. It handles the common cases correctly, requires minimal configuration, and has been battle-tested at scale.

Where it falls short: it has no awareness of actual replication lag. The pin duration is a fixed time window, not a measurement. It does not know whether the replica is 10 milliseconds behind or 10 minutes behind. The pin expires after N seconds regardless of whether the replica has caught up.

An honest assessment of the library ecosystem

I should be forthcoming about the state of Django's replication routing ecosystem, because pretending it is more mature than it is would be a disservice to you.

django-replicated works. It has also not seen frequent updates. The Django ecosystem has several similar packages — django-multidb-router, django-balancer — none of which has emerged as a clear standard. This is in contrast to Rails, where the multiple database framework is built directly into Active Record, with automatic role switching and request-level pinning out of the box since Rails 6.1.

Django's position has historically been that database routing is an application concern, and the framework provides the hooks (DATABASE_ROUTERS, .using()) for you to implement your own strategy. This is philosophically defensible and practically frustrating. It means every Django team with a read replica ends up writing the same middleware, the same thread-local hack, the same cookie-based pin — or reaching for a third-party package that may or may not be actively maintained.

If you are choosing between writing your own middleware and using django-replicated, use django-replicated. If you are choosing between django-replicated and a proxy-level solution, the proxy will serve you better long-term. But if your budget or infrastructure constraints mean the application layer is where this must be solved, django-replicated is a reasonable choice.

The ASGI complication: when thread-local storage breaks

Everything described so far assumes Django running under WSGI — Gunicorn, uWSGI, mod_wsgi — where each request executes in its own thread with its own thread-local storage. If you are running Django under ASGI (Daphne, Uvicorn, Hypercorn) or using Django Channels, the thread-local approach to passing the request to the router will break.

The problem is straightforward. In ASGI, multiple requests can execute concurrently within the same thread. Thread-local storage is shared between them. If Request A stores itself in thread-local, and Request B overwrites it before Request A's database query executes, Request A's routing decision is based on Request B's pin state. The result is unpredictable — reads may be pinned to primary when they should not be, or not pinned when they should be.

The fix is contextvars, introduced in Python 3.7 and designed specifically for this pattern:

ASGI-safe request passing with contextvars
# myapp/middleware.py — ASGI-safe version using contextvars

import contextvars
import time

# contextvars work correctly in both WSGI and ASGI
_current_request = contextvars.ContextVar('current_request', default=None)


class ContextVarRequestMiddleware:
    """Stash the request in a context variable instead of
    thread-local storage. Works with Daphne, Uvicorn, and
    any ASGI server."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        token = _current_request.set(request)
        try:
            response = self.get_response(request)
        finally:
            _current_request.reset(token)
        return response


# Update the router to use the context variable
def get_current_request():
    return _current_request.get(None)

contextvars.ContextVar is scoped to the execution context (task or coroutine), not the thread. In ASGI, each request gets its own context, so the request object is isolated even when multiple requests share a thread.
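The difference is easy to demonstrate without an ASGI server. Two coroutines interleave on one thread; the thread-local value is clobbered across the await, while the ContextVar survives (all names here are illustrative):

```python
import asyncio
import contextvars
import threading

_tl = threading.local()
_cv = contextvars.ContextVar('request_id')

async def handle(request_id, results):
    """Simulate one request: stash an id, yield control, read both stores."""
    _tl.request_id = request_id
    _cv.set(request_id)
    await asyncio.sleep(0.01)  # the other "request" runs here
    results.append((request_id, getattr(_tl, 'request_id', None), _cv.get()))

async def main():
    results = []
    # Each task created by gather() runs in its own copy of the context
    await asyncio.gather(handle('A', results), handle('B', results))
    return results

results = asyncio.run(main())
for request_id, thread_local_saw, contextvar_saw in results:
    print(request_id, thread_local_saw, contextvar_saw)
# Request A finds the thread-local overwritten by B ('A B A');
# the ContextVar always matches the request that set it.
```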

If you are deploying on WSGI today but may migrate to ASGI in the future, I would recommend using contextvars from the start. It works correctly under both WSGI and ASGI, and replacing thread-local storage after the fact means auditing every piece of code that touches _local. An ounce of foresight, in this case, prevents a pound of debugging.

One subtlety worth noting: Django's own connections object (the per-connection handler) is deliberately bound to threads rather than coroutines — under ASGI, the ORM still executes synchronous queries in worker threads. Even with contextvars for your routing logic, Django's connection management may behave differently under ASGI than you expect, and native connection pooling for the psycopg backend did not arrive until Django 5.1. The interaction between ASGI, connection handling, database routers, and replica routing is an area where careful testing is essential. Do not assume that passing the WSGI test suite means the ASGI deployment is correct.

Measuring real replication lag with pg_stat_replication

All of the solutions above use time-based heuristics: pin to primary for N seconds after a write, then hope the replica has caught up. This works most of the time. But "most of the time" is not the same as "all of the time," and the gap between those two is where your support tickets live.

PostgreSQL provides precise replication lag data through two system views. On the primary server:

pg_stat_replication — view from the primary
-- Check replication lag on the PRIMARY server
-- This tells you how far behind each replica is

SELECT
    client_addr,
    application_name,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    -- Bytes of WAL not yet applied on the replica
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
    -- Time-based lag (PostgreSQL 10+)
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication
ORDER BY replay_lag DESC;

-- Example output:
--  client_addr   | application_name | state     | replay_lag_bytes | replay_lag
-- ---------------+------------------+-----------+------------------+------------
--  10.0.1.42     | replica1         | streaming |          1048576 | 00:00:00.34
--  10.0.1.43     | replica2         | streaming |             8192 | 00:00:00.01

And on the replica itself:

pg_stat_wal_receiver — view from the replica
-- Check replication lag on the REPLICA server itself
-- Useful when you can't access the primary

SELECT
    status,
    flushed_lsn,        -- named received_lsn before PostgreSQL 13
    latest_end_lsn,
    -- How far replay is behind what the replica has received
    pg_wal_lsn_diff(
        pg_last_wal_receive_lsn(),
        pg_last_wal_replay_lsn()
    ) AS local_replay_lag_bytes,
    -- When was the last WAL record received?
    last_msg_receipt_time,
    -- Seconds since the last transaction was replayed
    EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
        AS seconds_since_last_replay
FROM pg_stat_wal_receiver;

-- Caveat: on an IDLE primary, pg_last_xact_replay_timestamp() stops
-- advancing, so seconds_since_last_replay climbs even with zero real
-- lag. Read it together with local_replay_lag_bytes (0 = caught up).
-- If the seconds grow past 1 WHILE bytes are also accumulating, the
-- replica is meaningfully behind; past 5, users WILL see stale data
-- on read-after-write flows.

The key fields:

  • replay_lag_bytes: The number of WAL bytes the replica has received but not yet applied. This is the most reliable lag indicator. Under 8,192 bytes (one WAL page), the replica is essentially caught up.
  • replay_lag: A time-based interval (PostgreSQL 10+). This is derived from WAL timestamps, not wall-clock time, so it is accurate even if the server clocks drift. Values under 100ms are normal for a healthy streaming replica.
  • seconds_since_last_replay: How long since the replica applied its last transaction. If this exceeds 1 second during normal operation, something is wrong — the replica may be overwhelmed, running a long VACUUM, or experiencing network issues to the primary.

For a deeper understanding of what these numbers mean and how PostgreSQL handles WAL shipping, PostgreSQL's own monitoring documentation covers the replication statistics views in detail.

Understanding the three types of lag

I find that most teams treat "replication lag" as a single number, when in fact it decomposes into three distinct phases, each with different causes and different remedies:

  • Write lag (write_lag): Time for WAL to travel from primary to replica memory. This is essentially network latency. If write lag is high but flush and replay lag are normal, you have a network problem, not a replica problem. Same-AZ replicas should see sub-millisecond write lag. Cross-region replicas will see 10-100ms depending on distance.
  • Flush lag (flush_lag): Time for the replica to fsync the received WAL to disk. If flush lag is high but write lag is normal, the replica's disk I/O is the bottleneck. This often happens on replicas with undersized EBS volumes, shared disk, or aggressive VACUUM activity.
  • Replay lag (replay_lag): Time for the replica to apply the WAL to data pages. This is the one that matters for query freshness. If replay lag is high but flush lag is normal, the replica's startup process is struggling — often because a long-running query on the replica is holding a conflicting lock, or because the WAL contains a large transaction that takes time to replay.

Knowing which phase is lagging tells you where to look. I have seen teams spend weeks optimizing their network because "replication lag was high," when the actual problem was a nightly analytics query on the replica that held back WAL replay for 30 seconds.

Monitoring replication lag in production

Measuring lag once is useful. Measuring it continuously and alerting on anomalies is essential. The following query provides a structured view of replication health that maps well to monitoring dashboards and alerting rules:

Structured replication health monitoring query
-- A practical monitoring query for replication health
-- Run this on the PRIMARY on an interval (e.g., every 10 seconds)

SELECT
    application_name,
    client_addr,
    state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
    replay_lag,
    CASE
        WHEN replay_lag IS NULL THEN 'UNKNOWN'
        WHEN replay_lag < interval '100 milliseconds' THEN 'HEALTHY'
        WHEN replay_lag < interval '1 second' THEN 'WARNING'
        WHEN replay_lag < interval '5 seconds' THEN 'CRITICAL'
        ELSE 'EMERGENCY'
    END AS lag_status,
    CASE
        WHEN state != 'streaming' THEN 'REPLICA NOT STREAMING'
        WHEN pg_wal_lsn_diff(sent_lsn, write_lsn) > 0 THEN 'NETWORK DELAY'
        WHEN pg_wal_lsn_diff(write_lsn, flush_lsn) > 0 THEN 'DISK FLUSH DELAY'
        WHEN pg_wal_lsn_diff(flush_lsn, replay_lsn) > 0 THEN 'REPLAY DELAY'
        ELSE 'CAUGHT UP'
    END AS lag_phase
FROM pg_stat_replication
ORDER BY replay_lag DESC NULLS FIRST;

-- lag_phase tells you WHERE the bottleneck is:
--   NETWORK DELAY → WAL not arriving at replica (network issue)
--   DISK FLUSH DELAY → WAL arrived but not fsynced (slow replica disk)
--   REPLAY DELAY → WAL flushed but not replayed (heavy replay workload)

The lag_phase column tells you where in the pipeline the delay is occurring, which is far more actionable than a single lag number. "Replication lag is 2 seconds" is a symptom. "Replay delay on replica2" is a diagnosis.

Alert thresholds worth considering

For a Django application using the primary pinning approach with a 5-second window:

  • Warning at 500ms: Lag is elevated but within the pin window. No user impact, but investigate the cause. This often indicates a batch operation or heavy VACUUM.
  • Critical at 2 seconds: Lag is approaching the pin window. Users who did not perform a recent write may see stale data for reads that happen to land on this replica.
  • Emergency at 5 seconds: Lag exceeds the pin window. Even pinned users will see stale data once the cookie expires. This requires immediate intervention — identify the cause and decide whether to stop sending traffic to this replica.

If you are using a lag-aware router instead of time-based pinning, your alert thresholds align with the router's MAX_LAG_BYTES threshold. When the router starts sending all reads to the primary (because all replicas exceed the threshold), you want to know about it — your primary is now handling 100% of read traffic, and that may push it beyond its capacity.
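Those thresholds translate directly into a classifier that a monitoring poller can apply to the replay_lag values it pulls from pg_stat_replication. A minimal sketch — the function name and the exact cutoffs are illustrative, matched to the 5-second pin window discussed above:

```python
# Classify a replica's replay lag against the alert thresholds above.
# Cutoffs assume a 5-second primary-pin window; tune them to your own.

THRESHOLDS = [
    (0.5, 'OK'),        # under 500 ms: normal operation
    (2.0, 'WARNING'),   # elevated, but still inside the pin window
    (5.0, 'CRITICAL'),  # approaching the pin window
]

def classify_lag(replay_lag_seconds):
    """Map a replay lag (in seconds) to an alert level.

    None means the lag is unknown (e.g. the replica is not streaming),
    which gets its own alert state rather than a guess.
    """
    if replay_lag_seconds is None:
        return 'UNKNOWN'
    for limit, level in THRESHOLDS:
        if replay_lag_seconds < limit:
            return level
    return 'EMERGENCY'  # lag exceeds the pin window
```

A poller that runs the monitoring query every 10 seconds and pushes `classify_lag(...)` per replica into your alerting system covers the three thresholds above without any dashboard math.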

The replica conflict problem

There is a particularly insidious source of replay lag that deserves dedicated attention: replica conflicts. When a query on the replica holds a lock that conflicts with a WAL record that needs to be replayed, PostgreSQL has two choices: cancel the query or delay the replay. The max_standby_streaming_delay setting controls this trade-off.

The default is 30 seconds. That means a long-running analytical query on the replica can hold back WAL replay for up to 30 seconds before PostgreSQL cancels it. During those 30 seconds, the replica serves increasingly stale data. If you have set hot_standby_feedback = on (which prevents the primary from vacuuming rows the replica still needs), the delay can be even longer.

The correct setting depends on how you use the replica. If it serves application queries (where freshness matters), lower max_standby_streaming_delay to 1-5 seconds and accept that long analytical queries may be cancelled. If it serves analytical queries (where completion matters more than freshness), leave it at 30 seconds and do not rely on it for application reads. If you need both, you need two replicas — one for each workload. There is no setting that satisfies both requirements simultaneously.

Building a lag-aware database router

With access to real lag data, you can build a router that makes informed decisions rather than hopeful ones. Instead of pinning to primary for a fixed 5 seconds, you route to the replica only when you know it is sufficiently caught up.

Router that checks actual replication lag
# myapp/db_router.py — lag-aware version

from django.db import connections

# get_current_request() below is the request-lookup helper defined
# alongside the pinning middleware earlier in this article.

# Maximum acceptable lag in bytes before we stop sending reads to replica
MAX_LAG_BYTES = 1024 * 1024  # 1 MB of WAL behind = too stale


class LagAwareRouter:
    """Route reads to replica only when replication lag is acceptable."""

    def _replica_lag_bytes(self):
        """Query the replica to check how far behind it is."""
        try:
            with connections['replica'].cursor() as cursor:
                cursor.execute("""
                    SELECT pg_wal_lsn_diff(
                        pg_last_wal_receive_lsn(),
                        pg_last_wal_replay_lsn()
                    )
                """)
                lag = cursor.fetchone()[0]
                return lag or 0
        except Exception:
            # If we can't check lag, assume it's bad
            return float('inf')

    def db_for_read(self, model, **hints):
        # Check for per-request pinning first
        request = get_current_request()
        if request and getattr(request, '_db_pin_primary', False):
            return 'default'

        # Check replica lag
        lag = self._replica_lag_bytes()
        if lag > MAX_LAG_BYTES:
            return 'default'  # replica too far behind, read from primary

        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'

This router queries the replica on every read to check its lag. If the lag exceeds 1 MB of unapplied WAL, reads are routed to the primary instead. The 1 MB threshold is conservative — at typical write rates, 1 MB of WAL represents a few seconds of transactions.

The obvious problem: querying pg_wal_lsn_diff on every read adds a round trip to every request. For a page that executes 10 queries, you have just added 10 lag checks. This is worse than the disease.

The solution is caching the lag measurement with a short TTL:

Cached lag checks — one query every 2 seconds
# myapp/db_router.py — cached lag checks

import time
import threading
from django.db import connections

_lag_cache = threading.local()
LAG_CHECK_INTERVAL = 2  # seconds between lag checks
MAX_LAG_BYTES = 1024 * 1024


class CachedLagRouter:
    """Check replica lag, but cache the result to avoid
    querying pg_stat_wal_receiver on every single read."""

    def _get_replica_lag(self):
        now = time.time()
        last_check = getattr(_lag_cache, 'last_check', 0)
        cached_lag = getattr(_lag_cache, 'lag_bytes', float('inf'))

        if now - last_check < LAG_CHECK_INTERVAL:
            return cached_lag

        try:
            with connections['replica'].cursor() as cursor:
                cursor.execute("""
                    SELECT pg_wal_lsn_diff(
                        pg_last_wal_receive_lsn(),
                        pg_last_wal_replay_lsn()
                    )
                """)
                lag = cursor.fetchone()[0] or 0
        except Exception:
            lag = float('inf')

        _lag_cache.lag_bytes = lag
        _lag_cache.last_check = now
        return lag

    def db_for_read(self, model, **hints):
        request = get_current_request()
        if request and getattr(request, '_db_pin_primary', False):
            return 'default'

        if self._get_replica_lag() > MAX_LAG_BYTES:
            return 'default'

        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'

The cached version checks lag once every 2 seconds per thread. Between checks, it uses the cached value. This means your routing decisions are based on lag data that is at most 2 seconds old — which, given that you are trying to prevent users from seeing data that is milliseconds stale, is more than precise enough.

The lag-aware approach has a meaningful advantage over time-based pinning: it adapts to actual conditions. If the replica is running smoothly with sub-millisecond lag, reads flow to it immediately after a write — no wasted 5-second pin window. If the replica falls behind during a large migration or batch job, reads automatically shift to the primary until it catches up. No manual intervention, no stale data.

Choosing the right lag threshold

The MAX_LAG_BYTES threshold deserves careful consideration. The router above uses 1 MB, but the right value depends on your write volume and consistency requirements:

  • 8,192 bytes (1 WAL page): Extremely strict. The replica must be essentially caught up. This will route to primary frequently during normal write activity, especially if you have a high-throughput workload. Suitable for financial applications where even milliseconds of staleness are unacceptable.
  • 1 MB: Conservative. Tolerates brief lag spikes but catches significant delays. A good default for most web applications.
  • 10 MB: Permissive. The replica can be several seconds behind before reads shift to primary. Appropriate for content sites where slight staleness is acceptable — a blog comment appearing 3 seconds late is less damaging than a financial balance being wrong.

You can also vary the threshold by model. Reads to your User model (where read-after-write is common) might use a strict 8 KB threshold, while reads to your Product model (which changes infrequently) might use 10 MB. This adds complexity to the router but maximizes replica utilization.
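As a sketch of what per-model thresholds can look like — the model labels, byte values, and helper name here are illustrative, not prescriptive:

```python
# Per-model lag thresholds for a lag-aware router.
# Keys are 'app_label.ModelName'; values are max acceptable WAL lag in bytes.

DEFAULT_MAX_LAG_BYTES = 1024 * 1024            # 1 MB fallback for everything else

PER_MODEL_MAX_LAG_BYTES = {
    'auth.User': 8192,                         # strict: read-after-write is common
    'catalog.Product': 10 * 1024 * 1024,       # permissive: changes infrequently
}

def max_lag_for(model):
    """Resolve the lag threshold for a model via its 'app_label.ModelName' key."""
    label = f'{model._meta.app_label}.{model.__name__}'
    return PER_MODEL_MAX_LAG_BYTES.get(label, DEFAULT_MAX_LAG_BYTES)
```

Inside db_for_read, the fixed comparison against MAX_LAG_BYTES then becomes a comparison against `max_lag_for(model)`, and the rest of the router is unchanged.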

Multiple replicas: routing to the least-lagged

If you have multiple replicas — and at scale, you likely do — the routing decision becomes more interesting. Rather than treating all replicas as equivalent, you can route each read to the replica with the lowest lag:

Multi-replica router with lag-based selection
# myapp/db_router.py — multiple replicas with lag-based selection

import time
import threading
import random
from django.db import connections

_lag_cache = threading.local()
LAG_CHECK_INTERVAL = 2
MAX_LAG_BYTES = 1024 * 1024

# List your replica aliases from settings.DATABASES
REPLICAS = ['replica1', 'replica2', 'replica3']


class MultiReplicaRouter:
    """Route reads to the least-lagged replica, or primary if
    all replicas exceed the lag threshold."""

    def _get_replica_lags(self):
        now = time.time()
        last_check = getattr(_lag_cache, 'last_check', 0)
        cached = getattr(_lag_cache, 'lags', {})

        if now - last_check < LAG_CHECK_INTERVAL:
            return cached

        lags = {}
        for alias in REPLICAS:
            try:
                with connections[alias].cursor() as cursor:
                    cursor.execute("""
                        SELECT pg_wal_lsn_diff(
                            pg_last_wal_receive_lsn(),
                            pg_last_wal_replay_lsn()
                        )
                    """)
                    lags[alias] = cursor.fetchone()[0] or 0
            except Exception:
                lags[alias] = float('inf')

        _lag_cache.lags = lags
        _lag_cache.last_check = now
        return lags

    def db_for_read(self, model, **hints):
        request = get_current_request()
        if request and getattr(request, '_db_pin_primary', False):
            return 'default'

        lags = self._get_replica_lags()

        # Filter to replicas under the lag threshold
        healthy = [
            (alias, lag) for alias, lag in lags.items()
            if lag <= MAX_LAG_BYTES
        ]

        if not healthy:
            return 'default'  # all replicas too far behind

        # Pick the least-lagged replica
        # (random tiebreak if multiple are equally caught up)
        healthy.sort(key=lambda x: x[1])
        min_lag = healthy[0][1]
        best = [alias for alias, lag in healthy if lag == min_lag]
        return random.choice(best)

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'

This router checks the lag on all replicas (cached, of course), filters out any that exceed the threshold, and routes to the least-lagged one among the rest. If all replicas are over the threshold, reads fall back to the primary.

The multi-replica approach is particularly valuable during deployments and migrations. If a schema change causes one replica to fall behind (because it is replaying a heavy ALTER TABLE), the router automatically shifts traffic to the other replicas. When the lagging replica catches up, traffic redistributes. The application code is oblivious to all of this.

Synchronous replication: the nuclear option

I should mention synchronous replication, because you will encounter advice to use it, and I want to be honest about the trade-offs.

PostgreSQL supports synchronous replication at various levels of strictness. At the strictest level (synchronous_commit = 'remote_apply'), every COMMIT waits until at least one replica has replayed the transaction. Replication lag is zero by definition — the primary does not acknowledge the commit until the replica is caught up.

Synchronous replication configuration
-- Synchronous replication configuration on the PRIMARY
-- This eliminates lag but adds latency to every write.

-- postgresql.conf on primary:
-- synchronous_standby_names = 'replica1'
-- synchronous_commit = 'remote_apply'

-- The settings in order of strictness (with a synchronous standby configured):
--   'local'        → WAL flushed on the primary only (no replica wait)
--   'remote_write' → WAL written to the replica's OS, not yet fsynced
--   'on'           → WAL flushed to replica disk (the default level)
--   'remote_apply' → WAL replayed on replica (data visible to queries)

-- With remote_apply, every COMMIT waits until the replica
-- has replayed the transaction. Lag is zero by definition.
-- But every write now includes a network round trip to the replica.

-- You can set this per-transaction for critical writes.
-- SET LOCAL only takes effect inside an explicit transaction:
BEGIN;
SET LOCAL synchronous_commit = 'remote_apply';
UPDATE users SET email = 'new@example.com' WHERE id = 42;
COMMIT;
-- This single transaction waits for replica replay.
-- All other writes use the server default.

This eliminates the read-after-write problem entirely. It also adds the replica's replay time to every write's latency. If your write latency budget is 5ms and the replica adds 2ms of network + replay time, you have consumed 40% of your budget on replication. If the replica is under load and replay takes 50ms, every write in your application takes 50ms longer.

Worse: if the replica goes down, synchronous_commit = 'remote_apply' will block all writes on the primary until the replica comes back or you reconfigure. Your read replica — which you added for availability — becomes a single point of failure for writes. The irony is, I'm afraid, quite complete.

Synchronous replication has legitimate uses. Financial systems where regulatory requirements demand zero data loss. Multi-region setups where you need guaranteed consistency across sites. But for solving the Django read-after-write bug? It is the equivalent of hiring a full security detail because you misplaced your house key. The problem is real, but the solution is disproportionate.

The per-transaction SET LOCAL synchronous_commit = 'remote_apply' approach shown above is more surgical — you pay the synchronous penalty only for specific critical writes. But at that point, you have moved consistency logic into your transaction management, which is yet another layer of application code that must be maintained and tested. For most applications, time-based pinning or lag-aware routing is the more practical choice.
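If you do want the per-transaction variant from Django code, it can be wrapped in a small helper. A sketch under the assumption that you call it around a critical ORM write — synchronous_write and set_local_sql are names I am inventing for illustration:

```python
# Hypothetical helper: run one critical write with
# synchronous_commit = 'remote_apply' from Django.
from contextlib import contextmanager

ALLOWED_LEVELS = {'off', 'local', 'remote_write', 'on', 'remote_apply'}

def set_local_sql(level):
    """Build the SET LOCAL statement, rejecting unknown levels."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f'unknown synchronous_commit level: {level!r}')
    return f"SET LOCAL synchronous_commit = '{level}'"

@contextmanager
def synchronous_write(using='default', level='remote_apply'):
    """Open a transaction whose COMMIT waits for replica replay."""
    from django.db import connections, transaction  # deferred: needs Django setup
    with transaction.atomic(using=using):
        with connections[using].cursor() as cursor:
            cursor.execute(set_local_sql(level))
        yield
```

Used as `with synchronous_write(): user.save()` — the COMMIT that closes the atomic block then waits for the replica to replay the write, while every other transaction keeps the server default.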

"Read replicas, connection pooling, materialized views, proper indexing — there is a long and rewarding list of techniques to exhaust before the conversation turns to sharding or additional services."

— from You Don't Need Redis, Chapter 16: Everything to Try Before You Shard

Routing strategies compared

Here is a summary of every approach covered above, with the trade-offs that matter in production:

| Strategy | Mechanism | Read-after-write | Complexity | Lag-aware |
|---|---|---|---|---|
| Naive primary/replica split | Router sends all reads to replica | Broken | Low | No |
| POST-then-redirect pin | Cookie pins reads to primary for N seconds after write | Fixed (within window) | Medium | No |
| Per-view @use_primary | Decorator forces specific views to primary | Fixed (for decorated views) | Low | No |
| django_replicated | Production middleware + cache-backed pinning | Fixed | Low (library) | No |
| Lag-aware router | Query pg_stat_wal_receiver, route based on lag bytes | Partial (depends on threshold) | High | Yes |
| Proxy-level routing (Gold Lapel) | Intercept queries at wire protocol, route by real-time lag + query type | Fixed (automatic) | None (transparent) | Yes |

The progression is clear: each strategy adds sophistication, but also adds code you need to maintain, test, and debug. Time-based pinning is the sweet spot for most applications. Lag-aware routing is valuable when your replication lag is unpredictable or when you need to maximize replica utilization. Proxy-level routing eliminates the problem from your application code entirely.

I should be direct about one thing: the difference between "Medium" and "High" complexity in this table is not about initial implementation. It is about ongoing maintenance. The pin-based middleware is 40 lines of Python that, once working, rarely needs attention. The lag-aware router is 60 lines of Python that interacts with live database state, caches that state with TTL logic, and must be tested against actual replication topologies. When it breaks — and it will eventually break, because it depends on a running replica — the debugging surface area is larger.

When replication lag is not your actual problem

I would be a poor waiter indeed if I did not mention the scenarios where the techniques in this article are solving the wrong problem. Not every stale-data bug is a replication lag issue, and applying these fixes to the wrong diagnosis wastes time and masks the real cause.

ORM caching. Django's ORM caches querysets at the Python level. If you fetch a queryset, mutate an object in the database through a different code path, and then iterate the original queryset, you see stale data — but the staleness has nothing to do with replication. The queryset was evaluated once and cached in memory. Refreshing with .all() or refresh_from_db() solves this regardless of which database you are reading from.

Application-level caching. If you are using Django's cache framework or an external cache like Redis to cache rendered fragments or serialized objects, the cache may serve stale data long after the replica has caught up. A user updates their profile, the write goes to the primary, the replica catches up in 50ms, but the cached profile page lives in Redis for another 300 seconds. No amount of database routing fixes this — you need cache invalidation, which is its own well-documented ordeal.

CDN and browser caching. The same pattern at a different layer. If your CDN caches a page for 60 seconds, or the browser caches an API response because you set Cache-Control: max-age=60, the user sees stale data regardless of database freshness. Check your HTTP headers before reaching for replication fixes.

Transaction isolation. If a read occurs within a transaction that started before the write committed, the read sees the pre-write snapshot — even on the primary. This is MVCC working correctly, not replication lag. It typically manifests in long-running transactions or when using REPEATABLE READ isolation level.

Before implementing primary pinning, confirm that your stale-data bug actually correlates with write-then-read patterns across the primary/replica split. The simplest diagnostic: temporarily set your router to send all reads to the primary. If the bug disappears, it is replication lag. If it persists, the problem is elsewhere.
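The all-reads-to-primary diagnostic router is small enough to keep in the codebase behind a settings toggle. A sketch — the class name is my own:

```python
# Temporary diagnostic router: send every query to the primary.
# If the stale-data bug disappears with this router active, the cause
# is replication lag; if it persists, look at the caching layers instead.

class AllPrimaryRouter:
    def db_for_read(self, model, **hints):
        return 'default'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
```

Swap it into DATABASE_ROUTERS for a test deployment; if the stale reads persist with every query hitting the primary, replication lag is not your bug.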

Testing your replication routing

Replication bugs are notoriously difficult to test because they depend on timing. In your test environment, the "replica" is often the same database as the primary (Django's test runner uses a single database by default), so lag is zero and read-after-write always works.

Unit tests for router logic

Unit tests for router routing decisions
# tests/test_replication.py

from django.test import TestCase, override_settings
from django.contrib.auth import get_user_model
from myapp.db_router import PrimaryReplicaRouter

User = get_user_model()


class ReplicationRoutingTest(TestCase):
    databases = {'default', 'replica'}

    def test_reads_go_to_replica(self):
        router = PrimaryReplicaRouter()
        self.assertEqual(router.db_for_read(User), 'replica')

    def test_writes_go_to_primary(self):
        router = PrimaryReplicaRouter()
        self.assertEqual(router.db_for_write(User), 'default')

    def test_pinned_request_reads_from_primary(self):
        """After a write, reads should go to primary."""
        from django.test import RequestFactory
        factory = RequestFactory()
        request = factory.get('/profile/')
        request._db_pin_primary = True

        # Stash request in thread-local
        from myapp.db_router import _local
        _local.request = request

        router = PrimaryReplicaRouter()
        self.assertEqual(router.db_for_read(User), 'default')

        _local.request = None  # cleanup

These unit tests verify that the router makes correct routing decisions based on request state. They do not test actual replication lag — for that, you need an integration test environment with a real primary and replica.

Unit tests for middleware

The middleware is where the pin logic lives, and it deserves its own test suite:

Middleware tests — verifying pin behavior
# tests/test_middleware.py

from django.test import TestCase, RequestFactory
from myapp.middleware import PrimaryPinMiddleware

class PrimaryPinMiddlewareTest(TestCase):

    def _make_middleware(self, response=None):
        if response is None:
            from django.http import HttpResponse
            response = HttpResponse('ok')
        return PrimaryPinMiddleware(lambda r: response)

    def test_post_sets_pin_cookie(self):
        factory = RequestFactory()
        request = factory.post('/update/')
        request.COOKIES = {}
        middleware = self._make_middleware()
        response = middleware(request)
        self.assertIn('db_pin_primary', response.cookies)

    def test_get_without_cookie_is_not_pinned(self):
        factory = RequestFactory()
        request = factory.get('/profile/')
        request.COOKIES = {}
        middleware = self._make_middleware()
        middleware(request)
        self.assertFalse(getattr(request, '_db_pin_primary', False))

    def test_get_with_valid_cookie_is_pinned(self):
        import time
        factory = RequestFactory()
        request = factory.get('/profile/')
        request.COOKIES = {
            'db_pin_primary': str(time.time() + 10)
        }
        middleware = self._make_middleware()
        middleware(request)
        self.assertTrue(request._db_pin_primary)

    def test_expired_cookie_is_not_pinned(self):
        import time
        factory = RequestFactory()
        request = factory.get('/profile/')
        request.COOKIES = {
            'db_pin_primary': str(time.time() - 10)
        }
        middleware = self._make_middleware()
        middleware(request)
        self.assertFalse(request._db_pin_primary)

These tests verify the four critical behaviors: POST sets the cookie, GET without cookie is not pinned, GET with a valid cookie is pinned, and GET with an expired cookie is not pinned. If any of these fail, your read-after-write fix is broken.

Integration tests with real replication

Unit tests verify logic. Integration tests verify behavior under real conditions. For replication routing, this means testing against an actual primary and replica with genuine network lag.

Integration test with Playwright — testing the full POST-redirect-GET cycle
# tests/test_read_after_write.py
# Integration test with Playwright — requires a real primary+replica

import re
from playwright.sync_api import sync_playwright

BASE_URL = 'http://localhost:8000'

def test_profile_update_shows_new_value():
    """POST-redirect-GET must show the updated value,
    not the stale replica value."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        # Log in
        page.goto(f'{BASE_URL}/login/')
        page.fill('#id_username', 'testuser')
        page.fill('#id_password', 'testpass')
        page.click('button[type="submit"]')

        # Update display name to a unique value
        import uuid
        new_name = f'Test-{uuid.uuid4().hex[:8]}'

        page.goto(f'{BASE_URL}/profile/edit/')
        page.fill('#id_display_name', new_name)
        page.click('button[type="submit"]')

        # The redirect lands on the profile detail page.
        # The updated name MUST appear immediately.
        page.wait_for_url(re.compile(r'/profile/\d+/'))
        assert new_name in page.content(), (
            f'Expected "{new_name}" on page after POST-redirect-GET. '
            f'Got stale data from replica.'
        )

        browser.close()

# Run this 100 times in CI. If it fails once, your pinning is broken.

The critical detail in this test is running it many times. A read-after-write bug that manifests 5% of the time is still a bug — it just takes 20+ runs to catch it. In CI, run this test 100 times. If it fails once, your pin window is too short, your middleware is not loading in the correct order, or your replica is consistently slower than you assumed.

Three additional integration test scenarios worth covering in a staging environment with real replication:

  1. POST-redirect-GET flows: Submit a form, verify the redirect shows the updated data. The Playwright test above covers this. Run it against every critical form in your application.
  2. Pin expiration: Submit a form, wait longer than your pin window, then load the page. The data should still be correct — it just comes from the replica now that the lag has resolved. If it is stale, your replica is chronically behind.
  3. Replica failure: Take the replica offline and verify your application degrades gracefully. With REPLICATED_READ_ONLY_DOWNGRADE or equivalent, reads should fall back to the primary. Without it, you get a DatabaseError. Know which behavior you have before your users discover it for you.

Where a query-aware proxy handles this automatically

Every strategy above requires application-level code: middleware, decorators, router classes, cached lag queries, thread-local storage, context variables. This code works. It also means every Django application in your organization needs to implement the same patterns, and every new developer needs to understand the replication topology to avoid introducing stale-read bugs.

Gold Lapel operates at the PostgreSQL wire protocol level, between your application and the database. It sees every query, knows which are reads and which are writes, and can query pg_stat_replication for real-time lag data on every routing decision. When a connection sends a write followed by a read, Gold Lapel automatically routes the read to the primary — no middleware, no cookies, no pin windows. When lag drops below threshold, reads flow back to the replica automatically.

This is not a replacement for understanding replication. You should still know what pg_stat_replication tells you, what a healthy replay_lag looks like, and how to diagnose a replica that falls behind. But the per-query routing decisions — the thing that generates bugs at 2 AM when your replica hits a slow VACUUM — that can be handled at the infrastructure layer instead of the application layer.

Your database router goes from 50 lines of lag-aware, cache-backed, thread-local-dependent Python to zero lines. Django's DATABASES setting points at Gold Lapel's endpoint instead of at PostgreSQL directly, DATABASE_ROUTERS goes away, and the proxy handles the rest.
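A sketch of what the settings file can shrink to with a proxy in front — the host name and database names below are placeholders, not real endpoints:

```python
# settings.py with routing delegated to a wire-protocol proxy.
# One database alias, no router classes. HOST is a placeholder.

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'appdb',
        'USER': 'app',
        'HOST': 'gold-lapel-proxy.internal',  # proxy endpoint (placeholder)
        'PORT': 5432,
    }
}

# Read/write splitting now happens in the proxy, not in Python.
DATABASE_ROUTERS = []
```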

If you are running a connection pooler in front of your PostgreSQL instances — and you should be — Gold Lapel integrates with that topology as well. Pool connections on one side, primary and replica routing on the other, lag-aware decisions in between. The architecture is covered in depth in the query proxy overview.

A recommended path through the options

If you have read this far, you have encountered a great many options, and a great many trade-offs. Allow me to offer a recommendation, which you are of course free to disregard.

Start with primary pinning. Implement the cookie-based or session-based PrimaryPinMiddleware with a 5-second window. Use contextvars instead of thread-local for the request passing. Add the @use_primary decorator to any views that demand absolute freshness. This handles 95% of read-after-write bugs with minimal code and minimal performance impact.

Add monitoring immediately. Even before you fix the routing, start monitoring pg_stat_replication. You need to understand your baseline lag before you can set intelligent pin windows or lag thresholds. If your lag is consistently under 10ms, a 2-second pin window is more than sufficient. If your lag spikes to 5 seconds during nightly batch jobs, a 5-second window might not be enough — and you should also investigate why the batch jobs are causing that much lag.

Graduate to lag-aware routing when you need it. If your write traffic is high enough that the pin window wastes significant replica capacity, or if your lag is unpredictable enough that fixed windows are unreliable, implement the cached lag router. This typically becomes relevant at the point where you have multiple replicas and the cost of under-utilizing them justifies the additional routing complexity.

Consider a proxy when the application layer becomes unwieldy. When you have multiple Django services, multiple replicas, and the middleware/router code is duplicated across projects, the operational argument for moving routing to the infrastructure layer becomes compelling. This is not an argument against understanding the problem — quite the opposite. The teams that benefit most from proxy-level routing are the ones that already understand replication lag deeply enough to know where the application-level approach falls short.

Fix the routing. Measure the lag. Then let the proxy make the per-query decisions so your application code can focus on what it is actually good at: your product.

If you'll permit me a related observation — the replication lag you are measuring may matter less if the read-heavy queries hitting your replicas were served from materialized views instead. I have written a chapter on materialized views in Django that addresses exactly this pattern: offloading expensive aggregations so replicas handle simpler, faster reads.