
The State of PostgreSQL Performance, 2026: A Thorough Assessment

PostgreSQL has never been more popular, more capable, or more in need of the tooling it lacks.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026 · 22 min read
A comprehensive infographic was planned. It is undergoing performance optimization.

Good evening. Shall we take stock of where things stand?

PostgreSQL enters 2026 as the dominant open-source database. This is no longer a trend to monitor — it is an established fact, confirmed by every credible survey in the field. The question that matters now is not whether organizations will use PostgreSQL. They will. The question is whether they will use it well.

On that question, the data is less encouraging.

I have spent considerable time examining query traffic in production PostgreSQL deployments — hundreds of them, across industries, team sizes, and cloud providers. The patterns are remarkably consistent. The same performance problems recur with a regularity that borders on the predictable. Missing indexes. N+1 query patterns generated by ORMs that should know better. Autovacuum configurations untouched since initial provisioning. Connection pools sized for worst-case imagination rather than measured reality.

These are not exotic failure modes. They are the default state of a PostgreSQL database that has been deployed competently but not optimized deliberately. And they represent, in aggregate, billions of dollars in unnecessary infrastructure spending across the industry.

This assessment examines four dimensions: where PostgreSQL stands in adoption, what the most common performance problems are and how to identify them, what the tooling ecosystem provides, and what it does not. I have aimed to be thorough rather than brief. The subject warrants it.

Where does PostgreSQL stand in 2026?

Source                          | Metric             | Value                            | Year
--------------------------------+--------------------+----------------------------------+-----
Stack Overflow Developer Survey | Most-used database | 1st place (3rd consecutive year) | 2024
DB-Engines Ranking              | DBMS of the Year   | 4th time winning                 | 2023
Timescale State of PostgreSQL   | Primary database   | 72% of respondents               | 2024
Percona Open Source Survey      | Database adoption  | +8% YoY growth                   | 2024

The numbers tell a story that no longer requires interpretation. PostgreSQL is not merely popular — it has become the default. When a team begins a new project in 2026, the question is not "should we use PostgreSQL?" but "is there a specific reason not to?" And that question, for most applications, has no compelling answer.

The PostgreSQL-compatible ecosystem has expanded in ways that reinforce this dominance. Neon offers serverless Postgres with branch-based development workflows. AlloyDB, Google's Postgres-compatible offering, promises up to 100x faster analytical queries through a disaggregated storage architecture. Aurora PostgreSQL handles replication and storage scaling transparently. Crunchy Data and Tembo are building managed platforms that treat extensions as first-class features rather than afterthoughts.

Choosing PostgreSQL is no longer choosing a database. It is choosing an ecosystem — one where the wire protocol has become a de facto standard, where your investment in learning the query planner and extension system transfers across providers, and where the community's collective knowledge deepens with each passing year.

PostgreSQL 17, released in September 2024, continued the steady drumbeat of performance improvements that have characterized recent releases:

-- PostgreSQL 17 performance highlights (released Sept 2024):

-- 1. Faster B-tree multi-value searches
-- Queries with IN (...) lists read fewer index pages,
-- making composite index lookups more efficient.

-- 2. VACUUM memory management
-- A new internal data structure cuts vacuum memory use
-- by up to 20x and removes the old 1GB allocation cap,
-- so large tables are vacuumed in fewer passes.

-- 3. SQL/JSON JSON_TABLE support
-- JSON_TABLE() per the SQL:2023 standard converts
-- JSON data into relational rows and columns.

-- 4. Faster COPY
-- Up to 2x faster when exporting large rows, plus a new
-- ON_ERROR option to skip malformed rows during loads.

Each of these improvements matters. The vacuum improvements alone can shrink maintenance windows on large tables substantially. The COPY improvements make bulk data movement — ETL pipelines, data warehouse refreshes, batch imports and exports — meaningfully faster. The JSON_TABLE support brings PostgreSQL further in line with SQL:2023, making it a genuine contender for workloads that previously required dedicated document stores.

And yet. Each release makes the engine faster. But the engine's speed is only one factor in real-world performance. The other factor — the quality of the queries sent to it — remains largely unchanged by database version upgrades. PostgreSQL 17 will execute a poorly designed query with admirable speed. It will not redesign the query.

I'm afraid this is what I keep finding

After examining query traffic across hundreds of PostgreSQL deployments, the same problems recur with remarkable consistency. I want to be specific about these, because specificity is what separates useful observation from hand-waving. These are not hypotheticals. They are what I encounter, repeatedly, in production systems operated by competent teams.

Problem                   | Frequency   | Impact                     | Fix difficulty          | Detection
--------------------------+-------------+----------------------------+-------------------------+-------------------------------------
Missing indexes           | Very common | 10-1000x slower queries    | Easy to fix             | EXPLAIN ANALYZE, pg_stat_user_tables
N+1 query patterns        | Very common | 50-500x more queries       | Medium (code changes)   | Query log analysis, Bullet gem
Autovacuum falling behind | Common      | 2-10x slower scans         | Easy (config change)    | n_dead_tup monitoring
Unoptimized aggregations  | Common      | 100-400x latency           | Medium (matview design) | pg_stat_statements top queries
Connection exhaustion     | Common      | Application errors         | Easy (add pooler)       | pg_stat_activity count
Table bloat               | Moderate    | 2-5x slower, disk pressure | Medium (pg_repack)      | pgstattuple extension
Lock contention           | Moderate    | Query pileups, outages     | Hard (architecture)     | pg_locks + pg_stat_activity

Allow me to attend to each of these in turn, because the devil — as is so often the case in database work — is in the specifics.

Missing indexes: the most consequential oversight in production

If I could fix only one problem across every PostgreSQL deployment I encounter, it would be this one. Missing indexes account for more wasted compute, more unnecessary infrastructure spending, and more degraded user experience than every other performance problem combined.

The mechanics are straightforward. Without an index on the filtered columns, PostgreSQL must read every row in the table to find the ones you want. This is a sequential scan — the database equivalent of reading every page in a book to find the chapter you need, rather than consulting the table of contents.

Missing index — before and after
-- The query:
SELECT * FROM orders
WHERE customer_id = 4217
  AND status = 'pending'
ORDER BY created_at DESC
LIMIT 20;

-- Without an index, PostgreSQL reads every row in the table.
-- EXPLAIN ANALYZE reveals:
--   Seq Scan on orders  (rows=12,400,000)
--   Filter: (customer_id = 4217 AND status = 'pending')
--   Rows Removed by Filter: 12,399,814
--   Execution Time: 847ms

-- The fix:
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);

-- After:
--   Index Scan using idx_orders_customer_status_created
--   Execution Time: 0.4ms

-- Same query. Same data. 2,117x faster.

The numbers are not exaggerated for effect. A table with 12.4 million rows, queried by a column that is not indexed, requires PostgreSQL to examine all 12.4 million rows and discard the ones that do not match. The database is doing precisely what it was asked to do. It was simply asked to do it in the most expensive way possible.

What makes this problem insidious is that it develops gradually. When the orders table had 10,000 rows, the sequential scan completed in 2ms. Nobody noticed. Nobody investigated. The table grew to 100,000 rows — 20ms, still fast enough. At 1 million rows, 80ms — starting to feel sluggish, but within acceptable bounds. At 12 million rows, 847ms — and now the on-call engineer's phone is ringing at 2 AM.

The fix, as you can see, is a single SQL statement. One line. No application code changes, no deployment, no downtime if you use CREATE INDEX CONCURRENTLY. The execution time drops from 847ms to 0.4ms. The same query, the same data, the same hardware — 2,117 times faster.

I should note the counterpoint: indexes are not free. Each index on a table adds write overhead, because every INSERT, UPDATE, and DELETE must maintain every index. A table with 15 indexes will have noticeably slower writes than one with 3. The discipline is adding the indexes that matter — the ones on columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses of your most expensive queries — and not reflexively indexing every column. pg_stat_statements tells you which queries are expensive. EXPLAIN tells you whether they need indexes. The evidence should drive the decision.
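
pg_stat_user_tables makes the hunt for missing indexes systematic rather than anecdotal. A sketch — the thresholds here are illustrative, not canonical:

```sql
-- Tables scanned sequentially far more often than by index —
-- prime candidates for a missing index:
SELECT relname,
       seq_scan,
       idx_scan,
       seq_tup_read,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_rows_per_seq_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000              -- illustrative threshold
ORDER BY seq_tup_read DESC
LIMIT 10;
```

A large avg_rows_per_seq_scan on a frequently scanned table is the statistical fingerprint of the problem described above; EXPLAIN on the offending queries confirms it.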

N+1 queries: a matter requiring frank discussion with your ORM

The N+1 pattern is the second most common performance problem I encounter, and it is the one most likely to be generated by your ORM without your knowledge or consent.

The N+1 pattern
-- The N+1 pattern in action.
-- Your ORM runs this for a page listing 50 orders:

-- Query 1: fetch the orders
SELECT * FROM orders WHERE status = 'recent' LIMIT 50;

-- Then, for EACH of the 50 orders:
SELECT * FROM customers WHERE id = 42;
SELECT * FROM customers WHERE id = 17;
SELECT * FROM customers WHERE id = 42;  -- yes, again
SELECT * FROM customers WHERE id = 91;
-- ... 46 more individual queries

-- 51 queries to do the work of 1:
SELECT o.*, c.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'recent'
LIMIT 50;

-- The ORM generated correct SQL. It did not generate efficient SQL.
-- There is a difference.

I regret to inform you that this is not a failure of the ORM. It is the ORM doing exactly what it was designed to do — lazily loading associated records on access. The design decision is defensible in principle: load data only when it is needed, avoid fetching records that will never be used. In practice, it means your page that displays 50 orders with customer names generates 51 database round trips instead of 1.

The impact compounds in ways that are not immediately obvious. Each round trip has latency — on a cloud database, typically 1-3ms of network overhead alone, before PostgreSQL even begins processing. Fifty-one queries at 2ms of network latency each is 102ms of pure waiting. The database might execute each individual query in 0.1ms, but the application experiences 102ms of latency because it is waiting for 51 sequential network round trips.

Every major ORM provides a mechanism to solve this. Rails has includes and eager_load. Django has select_related and prefetch_related. SQLAlchemy has joinedload and selectinload. Prisma has include. The fix is usually a single line of code in the query that loads the parent records. The difficulty is knowing which queries exhibit the pattern — and that requires either instrumentation (Bullet for Rails, nplusone for Django) or analysis of query logs for repeated patterns.

I should be honest: some N+1 patterns are intentional and correct. If you load a list of 50 items and only 3 of them will ever have their association accessed (based on a conditional in the template), eager loading all 50 associations wastes work. The lazy approach loads only the 3 you need. The judgment call is whether the access pattern is sparse (lazy wins) or dense (eager wins). In my experience, the vast majority of N+1 patterns in production are the dense variety — the page displays every association, every time. For those, eager loading is strictly better.
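
pg_stat_statements can also surface likely N+1 patterns without any application-side instrumentation: look for statements with enormous call counts and trivial per-call cost. A sketch with illustrative thresholds:

```sql
-- N+1 suspects: called constantly, cheap individually,
-- expensive in aggregate:
SELECT calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round(total_exec_time::numeric, 0) AS total_ms,
       left(query, 60) AS query_preview
FROM pg_stat_statements
WHERE calls > 10000
  AND mean_exec_time < 1
ORDER BY calls DESC
LIMIT 10;
```

A single-row primary-key lookup appearing hundreds of thousands of times at the top of this list is, almost always, an ORM lazily loading an association.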

Autovacuum: the household maintenance you have been neglecting

PostgreSQL uses a multiversion concurrency control (MVCC) model. When a row is updated or deleted, the old version is not removed immediately — it is marked as dead and retained until a VACUUM process reclaims the space. This is by design: other transactions may still need the old version for snapshot isolation.

Autovacuum is the background process responsible for cleaning up these dead tuples. When it falls behind — and it does, more often than most teams realize — the consequences accumulate silently.

Autovacuum monitoring and tuning
-- Check which tables have fallen behind on vacuuming:
SELECT
  schemaname || '.' || relname AS table,
  n_live_tup,
  n_dead_tup,
  round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1)
    AS dead_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;

-- If dead_pct exceeds 10-20%, autovacuum is falling behind.
-- The default threshold triggers at 20% dead tuples + 50 rows.
-- For large tables, lower the per-table threshold:

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,    -- 1% instead of 20%
  autovacuum_analyze_scale_factor = 0.005   -- 0.5% instead of 10%
);

-- On a 10M-row table, this changes the trigger from
-- 2,000,050 dead tuples to 100,050. A considerable difference.

A table with 20% dead tuples is physically 20% larger than it needs to be. Sequential scans must read through dead rows and skip them. Index scans must consult the visibility map to determine which tuples are live. The overhead is not catastrophic — it is gradual. A query that ran in 10ms now runs in 14ms. Nobody notices. Six months later, the table has 35% dead tuples and the same query takes 22ms. The monitoring dashboard shows a slow upward trend. Someone suggests upgrading the instance.

The default autovacuum settings are conservative by design — they are calibrated for a general-purpose workload on modest hardware. For high-write tables in production, they are almost always insufficient. The default autovacuum_vacuum_scale_factor of 0.2 means autovacuum triggers when 20% of the table consists of dead tuples. On a 10-million-row table, that is 2 million dead rows before cleanup begins. By the time autovacuum runs, the bloat has already accumulated.

Lowering the scale factor to 0.01-0.05 for your highest-write tables is one of the highest-impact configuration changes available. It costs nothing. It runs more frequently but for shorter durations. And it prevents the gradual performance degradation that leads teams to believe their database is slowing down when, in fact, it is merely cluttered.
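
After lowering the thresholds, it is worth verifying that autovacuum is in fact running more often. pg_stat_user_tables and pg_class make that a quick check:

```sql
-- Is autovacuum keeping up on the tuned table?
SELECT relname,
       last_autovacuum,
       autovacuum_count,
       n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Per-table overrides are stored in reloptions:
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';
```

If last_autovacuum advances regularly and n_dead_tup stays low, the tuning is working; if not, autovacuum worker count or cost limits may be the next bottleneck.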

Table bloat: when the clutter becomes structural

If autovacuum has been neglected for long enough, or if the table experiences heavy UPDATE patterns that autovacuum cannot keep pace with, the result is table bloat — a condition where a significant portion of the table's physical storage is occupied by dead tuples that have never been reclaimed.

Measuring table bloat
-- Check table bloat (requires pgstattuple extension):
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
  table_len,
  tuple_count,
  dead_tuple_count,
  dead_tuple_len,
  round(100.0 * dead_tuple_len / NULLIF(table_len, 0), 1)
    AS bloat_pct,
  round(dead_tuple_len / 1024.0 / 1024.0, 1)
    AS wasted_mb
FROM pgstattuple('orders');

-- bloat_pct of 30% means nearly a third of the table
-- is occupied by dead rows. PostgreSQL must skip over
-- them during sequential scans. The table is physically
-- larger than it needs to be, consuming disk and memory.

-- For severe bloat, pg_repack reclaims space online:
-- pg_repack --table orders --no-superuser-check -d mydb

Bloat at 30% is not uncommon in production tables with heavy write activity. I have encountered tables with 60% bloat — more than half the physical storage consumed by dead rows. The table was 40GB on disk. The live data occupied 16GB. Every sequential scan read 24GB of data it did not need.

Regular VACUUM can prevent bloat from accumulating, but it cannot reclaim space that has already been allocated to a table — it merely marks it as reusable. To actually shrink the physical file, you need either VACUUM FULL (which locks the table exclusively and rewrites it, making it unsuitable for production use) or pg_repack, which rebuilds the table online without exclusive locks.
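
A quick way to see the gap between physical size and live data, using the pgstattuple extension already shown above — a diagnostic, not a fix:

```sql
-- Physical size on disk vs. the portion occupied by live
-- and dead tuples:
SELECT pg_size_pretty(pg_relation_size('orders')) AS on_disk,
       pg_size_pretty((pgstattuple('orders')).tuple_len) AS live_data,
       pg_size_pretty((pgstattuple('orders')).dead_tuple_len) AS dead_data;
```

When on_disk is a multiple of live_data, a rebuild with pg_repack will reclaim the difference; mere VACUUM will not.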

The honest assessment: table bloat is a maintenance problem, not a design flaw. PostgreSQL's MVCC model is the correct engineering trade-off — it provides excellent read concurrency at the cost of requiring periodic cleanup. The issue is that "periodic cleanup" is not something most teams think about at provisioning time, and by the time they notice, the bloat has been accumulating for months.

Connection exhaustion: a pooling problem in capacity clothing

I encounter this conversation regularly. "We need a larger instance — we are running out of connections." Allow me to offer an alternative interpretation.

Connection state analysis
-- How many connections are actually doing work?
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count DESC;

--  state                         | count
-- -------------------------------+-------
--  idle                          |    87
--  idle in transaction           |    14
--  active                        |     3
--  idle in transaction (aborted) |     1

-- 105 connections. 3 doing work.
-- 14 holding transactions open while doing nothing —
-- blocking autovacuum and consuming memory.
-- This is not a capacity problem.

One hundred and five connections. Three actively executing queries. Eighty-seven sitting idle, holding memory and file descriptors while doing nothing. Fourteen holding transactions open — idle in transaction — which is worse than idle, because those connections prevent autovacuum from cleaning up the rows they can see.

Each PostgreSQL connection consumes approximately 5-10MB of RAM for the backend process, plus shared memory for session state. One hundred idle connections represent 500MB-1GB of RAM doing no useful work. On a db.r5.large with 16GB of RAM, that is 3-6% of total memory consumed by idle connections that could instead be allocated to shared_buffers, where it would actually accelerate queries.

The fix is a connection pooler — PgBouncer, pgcat, or Supavisor. A pooler allows hundreds of application connections to share a small number of database connections. In transaction mode, a database connection is assigned to an application connection only for the duration of a transaction, then returned to the pool. Twenty database connections can serve 500 application connections with headroom to spare, because at any given moment, the vast majority of those application connections are between transactions, not in them.

I should note the trade-off: transaction-mode pooling is incompatible with certain PostgreSQL features that depend on session state — prepared statements (unless the pooler handles them), advisory locks, SET commands, and temporary tables. PgBouncer in particular has historically not supported prepared statements in transaction mode, though pgcat and Supavisor handle them by rewriting named statements into protocol-level unnamed statements. Know what your application depends on before choosing a pooler and mode.
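
Before adding a pooler, it helps to see which sessions are the actual problem. The idle-in-transaction offenders can be listed directly; the five-minute cutoff is an arbitrary illustration:

```sql
-- Sessions holding a transaction open while doing nothing:
SELECT pid,
       usename,
       now() - state_change AS idle_for,
       left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes'
ORDER BY idle_for DESC;
```

The last_query column shows the final statement each session ran before going idle — usually enough to identify the code path that forgot to commit.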

Unoptimized aggregations: computing the same answer hundreds of times

Dashboard queries are a particular source of unnecessary database work. A reporting query that joins three tables, aggregates over a date range, and groups by category might take 340ms to execute. If the dashboard refreshes every 15 seconds for 20 users, that query runs 4,800 times per hour. The database spends 27 minutes of every hour computing an answer that changes, at most, once per minute.

Materialized view for repeated aggregations
-- A dashboard runs this aggregation 4,800 times per hour:
SELECT
  p.category,
  COUNT(*) AS total_orders,
  SUM(o.amount) AS revenue,
  AVG(o.amount) AS avg_order
FROM orders o
JOIN products p ON p.id = o.product_id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY p.category;

-- Execution time: 340ms (sequential scan, hash join, sort)
-- Hourly CPU cost: 340ms × 4,800 calls = 27 minutes of computation
-- For a result that changes meaningfully once per minute, at most.

-- The materialized view:
CREATE MATERIALIZED VIEW mv_category_revenue_30d AS
SELECT
  p.category,
  COUNT(*) AS total_orders,
  SUM(o.amount) AS revenue,
  AVG(o.amount) AS avg_order
FROM orders o
JOIN products p ON p.id = o.product_id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY p.category;

CREATE UNIQUE INDEX ON mv_category_revenue_30d (category);

-- Now the dashboard reads pre-computed rows:
-- SELECT * FROM mv_category_revenue_30d;
-- Execution time: 0.2ms. The join never runs.

A materialized view computes the result once and stores it as a table. Subsequent reads hit the pre-computed result directly — no joins, no aggregation, no computation. The query drops from 340ms to 0.2ms. The database's hourly CPU consumption for this query drops from 27 minutes to a fraction of a second.

The trade-off is staleness. A materialized view is a snapshot — it reflects the state of the underlying data at the time it was last refreshed. If you refresh every 5 minutes, your dashboard data can be up to 5 minutes stale. For a monthly revenue report, this is irrelevant. For a real-time trading dashboard, it is unacceptable. The question to ask is: how fresh does this data genuinely need to be? Not "how fresh would we like it to be" — how fresh does it actually need to be for the business decision it supports?

In my experience, the answer is "much less fresh than the team assumes." A dashboard that refreshes every 15 seconds, displaying data that is meaningful at the hourly or daily level, is performing theatrical freshness — the appearance of real-time without the substance. Refreshing the materialized view every 5 minutes, or even every hour, would serve the actual business need while reducing the database work by three orders of magnitude.

REFRESH MATERIALIZED VIEW CONCURRENTLY allows the refresh to happen without locking out readers, provided the view has a unique index. During the refresh, existing queries continue to see the previous version. When the refresh completes, new queries see the updated version. There is no downtime, no inconsistency, no moment where the dashboard shows "loading."
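
The refresh itself is one statement, and it can be automated. A sketch, assuming the pg_cron extension is available (the job name and five-minute cadence are illustrative):

```sql
-- Refresh without blocking readers (requires the unique
-- index created above):
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_category_revenue_30d;

-- With pg_cron, schedule the refresh every five minutes:
SELECT cron.schedule(
  'refresh-category-revenue',   -- job name (illustrative)
  '*/5 * * * *',                -- standard cron syntax
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY mv_category_revenue_30d$$
);
```

On managed platforms without pg_cron, the same statement runs equally well from an external scheduler; the database does not care who issues the refresh.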

"Materialized views that transform seven-second queries into seven-millisecond reads. Built-in pub/sub that replaces Redis for event-driven architectures. Connection pooling that makes serverless PostgreSQL viable at scale."

— from You Don't Need Redis, Chapter 1: Good Evening. We Have a Problem.

Lock contention: the quiet crisis

Lock contention is less common than the problems above, but its impact is disproportionately severe. When it occurs, it does not degrade gracefully — queries pile up behind the lock, each one waiting for the previous one to finish, and the pile grows faster than it drains. The result is a sudden, sharp performance collapse rather than a gradual slowdown.

Identifying lock contention
-- Identify queries waiting on locks:
SELECT
  blocked.pid AS blocked_pid,
  blocked_activity.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking_activity.query AS blocking_query,
  now() - blocked_activity.query_start AS waiting_duration
FROM pg_catalog.pg_locks blocked
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked.pid
JOIN pg_catalog.pg_locks blocking
  ON blocking.locktype = blocked.locktype
  AND blocking.database IS NOT DISTINCT FROM blocked.database
  AND blocking.relation IS NOT DISTINCT FROM blocked.relation
  AND blocking.page IS NOT DISTINCT FROM blocked.page
  AND blocking.tuple IS NOT DISTINCT FROM blocked.tuple
  AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
  AND blocking.pid != blocked.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking.pid
WHERE NOT blocked.granted
  AND blocking.granted;

The most common sources of lock contention in PostgreSQL are long-running transactions that hold row-level locks (preventing other transactions from updating the same rows), DDL operations that acquire ACCESS EXCLUSIVE locks (blocking all access to the table), and idle in transaction sessions that hold locks indefinitely without releasing them.

The fix depends on the cause. Long-running transactions need to be shortened — either by reducing the work done within the transaction or by breaking it into smaller transactions. DDL operations on busy tables should use CONCURRENTLY variants where available (CREATE INDEX CONCURRENTLY, DROP INDEX CONCURRENTLY). Idle-in-transaction sessions need application-level fixes — connection pool timeouts, explicit transaction management, or middleware that aborts transactions that have been idle for too long.
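
Two server-side guards contain the most common causes. A sketch with illustrative values that should be tuned to your workload:

```sql
-- Abort sessions that sit idle inside a transaction too long,
-- so abandoned transactions cannot hold locks (or block
-- autovacuum) indefinitely:
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
SELECT pg_reload_conf();

-- For risky DDL, set a local lock timeout so the statement
-- gives up quickly instead of queueing all other traffic
-- behind its lock request:
BEGIN;
SET LOCAL lock_timeout = '2s';
ALTER TABLE orders ADD COLUMN internal_note text;  -- example DDL
COMMIT;
```

The SET LOCAL scoping matters: the timeout applies only to that transaction, so a failed DDL attempt can simply be retried during a quieter window without changing global behavior.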

I classify this as "hard (architecture)" in the difficulty column because the fix often requires rethinking how the application interacts with the database at a design level, not merely adding an index or adjusting a configuration value. The good news is that it is also the least common of the major performance problems. The bad news is that when it occurs, it tends to present as an outage rather than a performance complaint.

Finding what is actually expensive: the starting point for everything

Every optimization begins with the same question: what is the database spending its time on? Not what feels slow. Not what the loudest engineer complains about. What the numbers say.

-- Enable pg_stat_statements (if not already):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the queries consuming the most total database time:
SELECT
  calls,
  round(total_exec_time::numeric, 1) AS total_ms,
  round(mean_exec_time::numeric, 1) AS mean_ms,
  round((100 * total_exec_time /
    sum(total_exec_time) OVER ())::numeric, 1) AS pct,
  substr(query, 1, 80) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

The output reveals the truth plainly:

pg_stat_statements — top queries by total time
 calls  | total_ms   | mean_ms | pct  | query_preview
--------+------------+---------+------+----------------------------------------------
  48291 | 19,903,892 |   412.1 | 41.2 | SELECT u.name, COUNT(o.id), SUM(o.total) ...
 112040 |  8,963,200 |    80.0 | 18.6 | SELECT * FROM products WHERE category_id ...
   9120 |  5,107,200 |   560.0 | 10.6 | SELECT d.name, SUM(s.amount) FROM departm...
  31200 |  3,744,000 |   120.0 |  7.8 | SELECT id, title, body FROM posts WHERE au...
   6480 |  2,332,800 |   360.0 |  4.8 | SELECT c.name, COUNT(p.id) FROM categories...

That first row — 48,291 calls at 412ms each — has consumed 19.9 million milliseconds. That is 5.5 hours of cumulative CPU time spent answering one question. It accounts for 41.2% of all database work. If you could eliminate only that query's cost, the database would be 41% less busy. If you could optimize the top three queries, you would recover 70% of the database's capacity.

This is the number that monitoring dashboards do not show you. CloudWatch reports CPU utilization at 82%, painted in alarming orange. It does not tell you that a single unindexed query is responsible for most of that utilization. The dashboard presents the problem in hardware terms — CPU, memory, IOPS — so the solution naturally follows in hardware terms: a bigger instance, more cores, more RAM.

But the problem was never hardware. It was a missing index. A $0 fix masquerading as a $700/month problem.

What does the tooling landscape look like?

PostgreSQL's tooling ecosystem is extensive — and categorically incomplete. There is no shortage of excellent tools. There is a shortage of tools that close the loop.

Category           | Tools                                         | What's missing
-------------------+-----------------------------------------------+------------------------------------------------------
Monitoring         | pganalyze, Datadog, pgwatch2, pg_stat_monitor | Tells you what is slow, not how to fix it
Connection pooling | PgBouncer, pgcat, Supavisor                   | Manages connections, does not touch queries
Query analysis     | pg_stat_statements, auto_explain, pgBadger    | Surfaces data, requires human interpretation
Index advice       | Dexter, pg_qualstats, HypoPG                  | Suggests indexes, does not create or manage them
Schema management  | pgloader, Flyway, Alembic                     | Manages schema, not runtime performance
Optimization proxy | Gold Lapel                                    | Closes the loop: detect → create → rewrite → maintain

Allow me to be specific about what "close the loop" means, because the phrase risks becoming vague.

pganalyze is excellent monitoring software. It collects pg_stat_statements data, provides historical query performance trends, identifies index recommendations, and alerts on anomalies. It tells you, with clarity and precision, that a particular query has degraded by 300% over the past week and that an index on orders.customer_id would likely resolve it. What it does not do is create that index. That step requires a human — a human who reads the alert, investigates the recommendation, writes the CREATE INDEX statement, tests it in staging, deploys it to production, and verifies the improvement. The cycle takes hours to days.

Dexter goes a step further — it analyzes query logs and suggests specific indexes, using HypoPG to simulate their impact without actually creating them. This is a genuine improvement over manual EXPLAIN analysis. But it still stops at suggestion. The index exists in hypothesis, not in the database.

PgBouncer solves connection pooling definitively — it has done so for over a decade. But connection pooling and query optimization are orthogonal concerns. A perfectly pooled connection that sends a sequential scan on a 12-million-row table is efficiently delivering a slow query.

The pattern is consistent across every category: each tool does its specific job well and stops there. The gap is not in any individual tool. It is in the space between them — the orchestration layer that takes the monitoring tool's observation, the index advisor's recommendation, the pooler's connection management, and the query planner's execution, and integrates them into a continuous, self-maintaining optimization cycle.

This gap creates a human bottleneck. While the alert waits for a developer to investigate, write the fix, test it in staging, and deploy it, the slow query continues running thousands of times, consuming CPU and degrading user experience. By the time the fix ships, the query has executed another 100,000 times unnecessarily.

What does under-optimization actually cost?

-- The average cost of not optimizing:

-- A typical SaaS application (50M rows, 200 qps):
-- Top 10 queries by total_exec_time account for 75% of CPU
-- 3 of those queries are missing indexes
-- 2 are N+1 patterns from ORM lazy loading
-- 1 is a repeated aggregation that could be materialized

-- Fixing these 6 query patterns typically:
-- Reduces database CPU by 60-80%
-- Drops P95 latency from 200-500ms to 10-50ms
-- Eliminates the need for read replicas
-- Allows downgrading the instance by 1-2 sizes
-- Saves $20K-50K/year in infrastructure costs

The cost is not just latency, though latency is the most visible symptom. The deeper cost is the infrastructure cascade that slow queries trigger.

A dashboard query takes 400ms. The team upgrades the database instance — $700/month more. Queries are still slow at peak traffic, because the instance upgrade improved performance by 30% but the query still performs a sequential scan. The team adds a read replica — another $730/month. Cache misses spike latency on certain endpoints. The team adds Redis — $410/month. The replica introduces replication lag, causing stale reads. The team upgrades the replica. Six months later, the infrastructure bill has grown by $2,900/month — $35,000 annually — and the original query still performs a sequential scan on a table that lacks an index.

Each compensating mechanism adds not just cost but operational complexity. The read replica requires application-level routing logic — reads go to the replica, writes go to the primary. The Redis cache requires cache invalidation logic, which is one of the two hard problems in computer science (the other being naming things and off-by-one errors). The connection pooler requires configuration tuning. Each layer is a new thing that can break at 3 AM.

The irony is that the fixes are often trivial. A CREATE INDEX statement. An eager_load call in the ORM. An autovacuum threshold adjustment. The individual fixes are minutes of work. The difficulty is knowing which fixes to apply and when — which requires either deep PostgreSQL expertise on staff or tooling that provides it automatically.
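To make "trivial" concrete, here is what two of those fixes look like in practice. Table and column names are illustrative, not taken from any real schema:

```sql
-- A missing-index fix: CONCURRENTLY builds the index without
-- blocking writes (it cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_account_created
    ON events (account_id, created_at);

-- An autovacuum threshold adjustment for a hot table: vacuum after
-- roughly 1% of rows change instead of the default 20%.
ALTER TABLE events
    SET (autovacuum_vacuum_scale_factor = 0.01);
```

Each statement is minutes of work to write; the expertise lies in knowing that this table, this column pair, and this threshold are the ones that matter.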

I have seen teams optimize their top five queries and subsequently downgrade their database instance, remove the read replica, and decommission the Redis cache. Their infrastructure bill dropped by $2,400/month. Their P95 latency improved. The system became simpler, cheaper, and faster simultaneously. This is not a common outcome in engineering. It is worth pursuing.

The honest counterpoints: where this assessment falls short

I would be a poor assessor if I did not acknowledge the boundaries of this analysis. Several counterpoints deserve honest treatment.

PostgreSQL is not the right tool for every workload. If you need sub-millisecond lookups on a hot key-value path serving 50,000 concurrent requests, Redis is the correct choice. If you need full-text search with faceted navigation, relevance scoring, and highlighting across a corpus of millions of documents, Elasticsearch or Meilisearch will serve you better than PostgreSQL's tsvector. If you need time-series data at millions of inserts per second with automatic retention policies, TimescaleDB (a PostgreSQL extension, to be fair) or ClickHouse is purpose-built for the job. PostgreSQL is an extraordinary general-purpose database, but "general-purpose" means "good at many things," not "best at everything."

Not every team has the bandwidth for optimization. A three-person startup shipping their first product should not spend a week learning the query planner. They should ship, grow, and optimize when the monitoring tells them to. The advice to "optimize before scaling" is correct for teams that have the scale to justify the investment. For a team serving 100 requests per second on a $200/month instance, the ROI of deep optimization work may not clear the threshold. Scale up, ship features, and optimize when the bill reaches a number that motivates it.

The tooling gap is narrowing. I have described a landscape where tools observe but do not act. This is changing. pganalyze's index recommendations are increasingly actionable. Cloud providers are adding intelligent query insights to their managed PostgreSQL offerings. The AI-assisted coding movement has made it easier for developers without deep database expertise to interpret EXPLAIN output and apply fixes. The gap is real, but it is not static — the ecosystem is improving, and some of the problems I have described may be substantially mitigated within a few years by better default tooling.

Some performance problems are genuinely hard. I have focused on the common, fixable problems — missing indexes, N+1 patterns, autovacuum tuning — because they represent the majority of performance issues and have clear solutions. But some production performance problems are genuinely complex: query plans that change unpredictably due to statistics drift, partition pruning that interacts poorly with parameterized queries, lock contention in high-concurrency write workloads. These problems do not have single-line fixes. They require sustained expertise, and no assessment of "the state of performance" should imply they are trivial.

What the ecosystem needs next

The missing piece is not another monitoring tool. It is not another dashboard. It is not another extension that surfaces data for a human to interpret and act upon. The PostgreSQL ecosystem has monitoring. It has data. It has, if anything, a surplus of visibility into what is happening.

What it lacks is a closed-loop system that observes, decides, and acts — the way autovacuum handles dead tuples, but for query optimization. Autovacuum is the existence proof that continuous, automatic maintenance of a PostgreSQL database is not merely possible but essential. No one manually runs VACUUM on production tables in 2026. The database handles it. Query optimization deserves the same treatment.

The requirements for this next layer are specific:

  • Continuous observation. Not periodic snapshots — real-time analysis of actual query traffic as it flows through the system. The expensive queries are not the ones you anticipated when designing the schema. They are the ones your ORM generated for a feature that was added six months ago and has grown from 1,000 daily users to 50,000.
  • Automatic action. When the evidence is clear — a query pattern that would benefit from an index, an aggregation that runs 500 times per hour with the same result — act without waiting for a human to read an alert, investigate, and deploy. The gap between detection and action is where the cost accumulates.
  • Self-maintenance. Remove optimizations that are no longer needed. An index created for a query pattern that no longer exists consumes write overhead without providing read benefit. A materialized view for a dashboard that was decommissioned three months ago is refreshing on schedule, consuming CPU for no purpose. The optimization layer must prune as well as create.
  • Transparency. It must work with any application, any ORM, any framework: no query changes, no application-level instrumentation, no vendor lock-in. The optimization layer must operate at the wire protocol level, between the application and the database, invisible to both.
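The self-maintenance requirement in particular is manual work today. The closest existing analogue is a periodic query against the standard statistics views for indexes that are never read, which might look like this:

```sql
-- Indexes with zero scans since statistics were last reset.
-- Treat these as candidates for removal, not certainties: an index
-- may exist to enforce uniqueness or serve a rare but critical query.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Someone has to remember to run it, interpret it, and act on it, which is precisely the loop that remains open.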

This is the problem Gold Lapel was designed to solve. Not monitoring — the ecosystem has monitoring. Not advice — the ecosystem has advisors. The missing link is a system that takes the advice and implements it, continuously, safely, and automatically: a transparent Rust proxy that sits between your application and PostgreSQL, observes query patterns as they happen in production, creates the appropriate indexes and materialized views, and rewrites queries to use them — without code changes, without a DBA on staff, without anyone remembering to run EXPLAIN ANALYZE on a quarterly basis.

I mention this not as a sales pitch — a waiter who overstates his case is no waiter at all — but as an honest description of the problem and the architecture we believe solves it. The reader should leave this assessment with a clear picture of the landscape, the common problems, and the detection methods, regardless of whether they ever use Gold Lapel. The knowledge is the point. The product is one way to apply it.

A closing observation

PostgreSQL in 2026 is remarkable software. Version 17 is faster, more capable, and more standards-compliant than any prior release. The ecosystem of compatible services, extensions, and tooling is broader than it has ever been. The community is vibrant, the development pace is steady, and the technical foundations are sound.

And yet the most common performance problems — missing indexes, N+1 patterns, stale autovacuum settings, bloated tables, idle connections — are the same problems they were a decade ago. The database has improved enormously. The way most applications use it has not.

This is not a criticism of the teams building on PostgreSQL. It is an observation about the ecosystem's tooling. The database is excellent. The monitoring is excellent. The advisory tools are excellent. What is missing is the layer that translates observation into action — the layer between knowing what is wrong and fixing it.

The gap is not in the database. It is in the space between the database and the applications that depend on it. Closing that gap — automatically, transparently, continuously — is the most impactful thing the PostgreSQL ecosystem can do next.

In the meantime, enable pg_stat_statements. Run EXPLAIN ANALYZE on your top five queries. Check your autovacuum dead tuple counts. Look at your connection states. The problems are findable. The fixes are often simple. And the impact — on latency, on infrastructure cost, on the quality of the experience your users receive — is considerable.
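The last two of those checks map directly onto the standard statistics views:

```sql
-- Dead tuples and autovacuum recency per table.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Connection states: a pile of 'idle in transaction' sessions is a
-- sign of connection leaks or transactions held open by the application.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```

Ten minutes with these queries will tell you more about your database's health than a quarter of dashboard-watching.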

The household is in good order. It could use a bit of attention to the details.


Much of what this assessment describes — the monitoring gap, the optimization gap, the tooling gap — is the subject of a longer work. I have written a book chapter on PostgreSQL performance optimization that addresses the gap not as a market observation but as a practical guide to closing it yourself.