
Neon PostgreSQL Performance: Optimizing for Serverless

Neon re-architects PostgreSQL's storage layer. That changes the optimization playbook entirely.

The Butler of Gold Lapel · March 29, 2026 · 22 min read

Neon's architecture — what makes it different

Neon is not a hosted PostgreSQL instance. It is a genuinely innovative re-architecture of PostgreSQL's storage layer that separates compute and storage into independent services, and this distinction drives every performance characteristic discussed in this guide.

Storage: the Pageserver. Traditional PostgreSQL reads and writes data to local disk. Neon replaces this with a custom storage engine called the Pageserver. Data lives in a distributed, shared storage layer — not on the compute node's disk. When PostgreSQL needs a page, it fetches it from the Pageserver over the network.

Compute: standard PostgreSQL. The compute layer runs a standard PostgreSQL process. It executes queries, manages connections, and handles transactions exactly as PostgreSQL does anywhere else. The difference is that every page read is a network call to the Pageserver rather than a local disk read.

This architecture enables capabilities that traditional PostgreSQL deployments cannot offer:

  • Auto-suspend: when no queries arrive for a configurable period, Neon shuts down the compute entirely. You pay nothing for idle time.
  • Auto-scaling: compute resources scale up and down based on query load.
  • Instant branching: creating a full copy of your database takes seconds regardless of size — it is a metadata operation on the shared storage layer.
  • Point-in-time restore: the Pageserver maintains a history of page versions.

But the architecture introduces trade-offs that I should be direct about:

  • Every page read has network latency. Local NVMe reads take microseconds; Pageserver fetches take single-digit milliseconds.
  • Cold starts exist. When auto-suspend shuts down the compute, the next query must wait for a new compute to provision.
  • Some PostgreSQL features behave differently when the storage layer is remote and the compute is ephemeral.

Understanding these trade-offs is a prerequisite to optimizing for them. Generic PostgreSQL tuning advice — written for instances with local storage — often does not apply to Neon.

Cold start optimization

What causes cold starts

When auto-suspend is enabled and no queries arrive within the configured timeout, Neon shuts down the compute instance. The next query triggers a cold start: Neon provisions a new compute, starts PostgreSQL, connects it to the Pageserver, and loads shared state. Typical cold start times range from 500ms to 2 seconds.
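Where cold starts cannot be avoided entirely, the first query after a suspend can be absorbed with a small retry wrapper. This is a minimal sketch in plain Node.js — `queryFn`, the retry count, and the delays are illustrative assumptions, not Neon APIs.

```javascript
// Retry a query a couple of times to absorb a cold start on the first
// connection after auto-suspend. Timings are illustrative assumptions.
async function queryWithColdStartRetry(queryFn, { retries = 2, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await queryFn();
    } catch (err) {
      lastError = err;
      // Back off roughly one cold-start interval before retrying.
      await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
    }
  }
  throw lastError;
}
```

The wrapper treats any error as potentially transient; in production you would inspect the error code and only retry connection failures.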

Reducing cold start impact

Increase the auto-suspend timeout. The suspend_timeout_seconds setting controls how long Neon waits after the last query before suspending. For production workloads with regular traffic, 300-600 seconds covers most gaps between request bursts. For latency-sensitive applications, disable auto-suspend entirely.

Use Neon's serverless driver for edge and serverless functions. The @neondatabase/serverless driver communicates over HTTP or WebSocket rather than traditional TCP, eliminating TCP connection overhead. See the Neon serverless HTTP vs WebSocket comparison.
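A minimal sketch of the HTTP flavor in Node.js looks like this; it assumes the @neondatabase/serverless package is installed and that a `users` table with `id` and `email` columns exists, so treat it as a shape rather than a drop-in.

```javascript
// Query over HTTP with Neon's serverless driver — no TCP handshake,
// which suits short-lived edge and serverless functions.
// Assumes @neondatabase/serverless is installed.
async function getUser(userId) {
  const { neon } = await import('@neondatabase/serverless');
  const sql = neon(process.env.DATABASE_URL);
  // Tagged-template queries are parameterized automatically.
  const rows = await sql`SELECT id, email FROM users WHERE id = ${userId}`;
  return rows[0];
}
```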

Minimize preloaded extensions. Every extension in shared_preload_libraries adds startup time.

Use Neon's built-in connection pooler. The PgBouncer-based pooler keeps connections warm and ready.

When cold starts do not matter

Not every workload needs cold start optimization:

  • Batch jobs and cron tasks. A 1-second cold start is irrelevant for a 5-minute job.
  • Development and staging environments. Auto-suspend saves significant cost for databases idle for hours between sessions.
  • Low-traffic applications. An internal tool with a few requests per hour benefits more from cost savings than it suffers from occasional cold starts.

Connection pooling on Neon

Neon's built-in pooler

Neon provides a PgBouncer-based connection pooler in transaction pooling mode: a backend connection is assigned to a client only for the duration of a transaction. This has a critical implication: prepared statements do not persist across transactions. ORMs need configuration adjustments.

For a comprehensive treatment of connection pooling modes, see the PostgreSQL connection pooling guide.

Configuring your application for the pooled endpoint

Prisma:

Prisma connection string
# In your connection string
DATABASE_URL="postgresql://user:pass@ep-cool-name-pooler.region.aws.neon.tech/dbname?pgbouncer=true&sslmode=require"

Rails / ActiveRecord:

Rails database.yml
# config/database.yml
production:
  adapter: postgresql
  url: <%= ENV['DATABASE_URL'] %>
  prepared_statements: false

Django:

Django settings.py
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'OPTIONS': {
            'options': '-c statement_timeout=30000',
        },
        'DISABLE_SERVER_SIDE_CURSORS': True,
    }
}

Node.js (node-postgres / pg):

Node.js pg pool
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  // Avoid named prepared statements with the pooled endpoint
  // Use the query method with unnamed statements
});

// This works with transaction pooling (unnamed prepared statement):
await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

// This may fail (named prepared statement persists in session):
// const prepared = { name: 'get-user', text: 'SELECT ...', values: [...] };

For serverless deployments on Vercel, Cloudflare Workers, or AWS Lambda, the pooled endpoint is essential.

When to use direct (non-pooled) connections

The pooled endpoint is not appropriate for: migrations that use advisory locks or temporary tables, administrative sessions, long-running operations (VACUUM, bulk loads), and interactive psql sessions. Use the direct connection string for these.
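One pragmatic pattern is to keep the direct connection string in the environment and route by operation. The sketch below derives the pooled hostname by appending -pooler to the endpoint segment, which matches the host convention shown in the Prisma example above; the operation names are my own assumptions.

```javascript
// Derive the pooled connection string from a direct Neon connection
// string by appending "-pooler" to the endpoint hostname segment.
// Assumes the standard ep-<name>.<region>.aws.neon.tech host layout.
function toPooledUrl(directUrl) {
  const url = new URL(directUrl);
  const [endpoint, ...rest] = url.hostname.split('.');
  if (!endpoint.endsWith('-pooler')) {
    url.hostname = [`${endpoint}-pooler`, ...rest].join('.');
  }
  return url.toString();
}

// Route by operation: migrations and admin work go direct; app traffic pooled.
function connectionStringFor(operation, directUrl) {
  const needsDirect = ['migration', 'vacuum', 'bulk-load', 'admin'];
  return needsDirect.includes(operation) ? directUrl : toPooledUrl(directUrl);
}
```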

Auto-suspend and auto-scaling

Tuning auto-suspend

Lower timeout equals lower cost but more cold starts. Higher timeout equals fewer cold starts but higher cost.

| Workload | Timeout | Rationale |
| --- | --- | --- |
| Production (consistent traffic) | Disabled | Cannot tolerate cold starts; traffic rarely gaps long enough to benefit |
| Production (bursty traffic) | 300-600 seconds | Covers gaps between request bursts |
| Staging | 300 seconds (default) | Idle most of the time; cold starts are a non-issue |
| Development | 60-300 seconds | Idle for hours between sessions; maximize cost savings |
| CI/CD branches | 60 seconds | Used briefly during test runs; should suspend immediately after |
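The timeout can also be scripted rather than set in the console. The sketch below targets what I believe is Neon's v2 API shape (a PATCH on the endpoint with suspend_timeout_seconds); treat the route and payload as assumptions and verify them against the current API reference.

```javascript
// Sketch: set an endpoint's auto-suspend timeout via the Neon API.
// The /api/v2 route and request body are assumptions drawn from
// Neon's public API docs; verify before relying on them.
function buildSuspendTimeoutRequest(projectId, endpointId, seconds) {
  return {
    url: `https://console.neon.tech/api/v2/projects/${projectId}/endpoints/${endpointId}`,
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${process.env.NEON_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ endpoint: { suspend_timeout_seconds: seconds } }),
  };
}

async function setSuspendTimeout(projectId, endpointId, seconds) {
  const { url, ...init } = buildSuspendTimeoutRequest(projectId, endpointId, seconds);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Neon API returned ${res.status}`);
  return res.json();
}
```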

Auto-scaling compute

Neon dynamically scales compute size based on query load. Configuration guidance:

  • Minimum CU: set to match your baseline workload. Setting it too low means every traffic spike requires a scale-up, adding latency.
  • Maximum CU: set based on peak expected load and budget. If traffic can spike to 4x your baseline, set the maximum to roughly 4x your minimum CU.
  • Monitor actual usage: the Neon dashboard shows CU consumption over time.

Branching — Neon's differentiator for development

What branching is

Neon can create an instant, full copy of your database — all tables, all data, all indexes — in seconds, regardless of database size. The mechanism is copy-on-write at the storage layer. No data is physically copied at creation time; only changed pages are stored independently as the branch diverges.

A branch of a 500 GB production database is created in the same time as a branch of a 5 MB development database: seconds.

Performance considerations for branches

  • Reads from unchanged pages are served from the parent's data — identical performance.
  • Modified pages are stored independently. Storage footprint grows proportionally to changes.
  • Branch computes are independent with their own cold start behavior and scaling settings.
  • Orphaned branches accumulate cost. Delete branches for merged PRs and completed tests.

Branch-per-PR workflow

The most impactful use of Neon branching:

  1. Create a branch from production when a PR is opened. Realistic test conditions.
  2. Run migrations against the branch. Discover timing issues before production.
  3. Connect to a preview deployment. Vercel, Netlify, and similar platforms support per-PR preview deployments.
  4. Delete the branch when the PR is merged.

This workflow is arguably one of Neon's strongest practical advantages over traditional PostgreSQL hosting. Automate it with Neon's API and GitHub integration.
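A CI job can drive steps 1 and 4 directly against Neon's API. This is a sketch under stated assumptions — the /api/v2 branch routes, payload shape, and the preview/pr-N naming convention are mine, and the official GitHub integration automates the same flow.

```javascript
// Sketch of the branch-per-PR lifecycle against the Neon API.
// Endpoint paths and payload shape are assumptions; verify against
// the current Neon API reference.
const NEON_API = 'https://console.neon.tech/api/v2';

function branchNameForPr(prNumber) {
  return `preview/pr-${prNumber}`; // one predictable branch per PR
}

async function createPrBranch(projectId, prNumber, apiKey) {
  const res = await fetch(`${NEON_API}/projects/${projectId}/branches`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ branch: { name: branchNameForPr(prNumber) } }),
  });
  if (!res.ok) throw new Error(`branch create failed: ${res.status}`);
  return res.json();
}

async function deletePrBranch(projectId, branchId, apiKey) {
  const res = await fetch(`${NEON_API}/projects/${projectId}/branches/${branchId}`, {
    method: 'DELETE',
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`branch delete failed: ${res.status}`);
}
```

Wire createPrBranch to the PR-opened event and deletePrBranch to the PR-merged or PR-closed event so orphaned branches never accumulate.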

Extensions and PostgreSQL feature compatibility

Neon supports a broad subset of PostgreSQL extensions:

| Extension | Purpose | Notes |
| --- | --- | --- |
| pg_stat_statements | Query performance statistics | Stats reset on compute restart (auto-suspend cycle) |
| pgvector | Vector similarity search | Full support including HNSW and IVFFlat indexes |
| PostGIS | Geospatial queries | Available on all plans |
| pg_trgm | Trigram text similarity | Useful for fuzzy search |
| uuid-ossp | UUID generation | Also consider gen_random_uuid() (built-in since PG 13) |

Key limitations:

  • pg_stat_statements resets on compute restart. Snapshot to a table on a schedule. See the pg_stat_statements guide for snapshot strategies.
  • Custom background workers are not supported.
  • pg_cron availability varies by plan. Use external schedulers where unavailable.
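The snapshot idea is simple enough to sketch: copy pg_stat_statements into a regular table on a schedule, so the history survives the next suspend. The table name pgss_snapshots and the runQuery callback are assumptions; run this from an external job runner against the direct (non-pooled) endpoint.

```javascript
// Sketch: SQL to snapshot pg_stat_statements into a regular table so
// stats survive auto-suspend restarts. "pgss_snapshots" is an assumed
// name; schedule the function from an external job runner.
const CREATE_SNAPSHOT_TABLE = `
  CREATE TABLE IF NOT EXISTS pgss_snapshots AS
    SELECT now() AS captured_at, *
    FROM pg_stat_statements
    WHERE false;`;

const TAKE_SNAPSHOT = `
  INSERT INTO pgss_snapshots
    SELECT now(), * FROM pg_stat_statements;`;

// runQuery is any (sql) => Promise executor, e.g. pool.query from pg.
function snapshotStatements(runQuery) {
  return runQuery(CREATE_SNAPSHOT_TABLE).then(() => runQuery(TAKE_SNAPSHOT));
}
```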

Always check Neon's current extension compatibility list for your PostgreSQL version and plan tier.

Cost optimization

Understanding Neon's billing model

Neon bills on three axes: compute time (CU-hours), storage (GB-month), and data transfer (egress). The billing model rewards both reducing active compute time and reducing per-query resource consumption.

Practical cost reduction

  • Right-size the auto-suspend timeout. This is the single largest cost lever.
  • Right-size compute. Start with the minimum CU that handles baseline load. Use auto-scaling for spikes.
  • Use connection pooling. Fewer backend connections means lower memory pressure, which means a smaller compute size.
  • Delete inactive branches. Automate cleanup in your CI/CD pipeline.
  • Optimize queries. On Neon, query optimization has a direct billing impact — faster queries reduce both latency and cost simultaneously. Use pg_stat_statements (snapshotted before suspension) and EXPLAIN ANALYZE to identify and fix expensive queries.

Honest counterpoint — when Neon is not the right choice

I should be forthcoming. Neon's compute-storage separation is its greatest strength and its most significant trade-off. Every page read crosses the network to the Pageserver, which makes Neon a poor fit for:

  • Heavy sequential scans on large tables. A full table scan is faster on local NVMe.
  • Large sort operations that spill to disk.
  • Bulk data processing — ETL, data warehouse queries, large aggregations.
  • Extension restrictions. Workloads that require custom background workers or unsupported extensions cannot run on Neon.
  • Consistent high-throughput workloads. If your database runs at high CPU 24/7, reserved instances on RDS may offer better price-performance.
  • Deep operational control. Neon does not expose kernel parameters, WAL configuration, or storage layout.

Neon excels at variable workloads with idle periods, development and preview workflows using branching, serverless application backends, and teams that value operational simplicity. For a broader framework on choosing the right scaling strategy, see the scaling decision framework.

Frequently asked questions