
PostgreSQL on AWS RDS: A Proper Optimization Guide

Managed does not mean optimized. Allow me to attend to the configuration.

The Butler of Gold Lapel · March 29, 2026 · 22 min read
The illustrator is configuring a custom parameter group. We await the reboot.

RDS PostgreSQL is not self-hosted PostgreSQL

Good evening. I see you have arrived with an RDS instance. Allow me to take your coat and have a look at the configuration.

AWS RDS gives you a managed PostgreSQL instance — and "managed" is doing real work in that sentence. AWS handles patching, automated backups, failover orchestration, replication provisioning, and storage management. These are genuine operational burdens that RDS removes, and they are the reason most teams choose RDS over self-hosted PostgreSQL on EC2. A sound decision.

But "managed" does not mean "optimized." RDS does not attend to query performance, parameter tuning, connection management, index strategy, or schema design. These remain your responsibility, and they are the factors that determine whether your RDS PostgreSQL instance hums along quietly or struggles under load.

I should note that RDS is not simply self-hosted PostgreSQL with a nicer dashboard. The configuration surface is different. Parameter groups replace postgresql.conf. Some settings are locked by AWS and cannot be changed. Others use formulas based on instance class instead of absolute values. Understanding what you can and cannot change is the prerequisite to optimizing anything.

This guide covers the RDS-specific settings and features that have the largest impact on PostgreSQL performance — parameter groups, connection management, storage configuration, replication, and monitoring. For platform-agnostic PostgreSQL tuning, see the PostgreSQL performance tuning guide.

Parameter groups — where the real work begins

How parameter groups work

RDS uses parameter groups instead of direct postgresql.conf access. A parameter group is a named collection of PostgreSQL configuration settings that you attach to an RDS instance.

Every RDS instance starts with the default parameter group, which is read-only. You cannot modify the default parameter group. To change any setting, you must:

  1. Create a custom parameter group (based on the appropriate PostgreSQL version family, e.g., postgres16).
  2. Modify the settings you want to change.
  3. Associate the custom parameter group with your RDS instance.
  4. Reboot the instance (for static parameters) or wait for the apply cycle (for dynamic parameters).

Parameters are classified as either static (require a reboot to take effect) or dynamic (take effect immediately or after the next transaction). The RDS console and CLI indicate which type each parameter is.
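The four steps above map onto the AWS CLI roughly as follows — a sketch assuming a hypothetical group name `pg16-custom` and instance identifier `mydb`:

```shell
# 1. Create a custom parameter group for the postgres16 family
aws rds create-db-parameter-group \
  --db-parameter-group-name pg16-custom \
  --db-parameter-group-family postgres16 \
  --description "Custom tuning for mydb"

# 2. Modify a setting. ApplyMethod is "immediate" for dynamic
#    parameters, "pending-reboot" for static ones.
aws rds modify-db-parameter-group \
  --db-parameter-group-name pg16-custom \
  --parameters "ParameterName=random_page_cost,ParameterValue=1.1,ApplyMethod=immediate"

# 3. Associate the group with the instance
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --db-parameter-group-name pg16-custom \
  --apply-immediately

# 4. Reboot if any changed parameter is static
aws rds reboot-db-instance --db-instance-identifier mydb
```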

If you'll permit me a brief caution: a common early misstep is modifying what you believe is the parameter group, discovering it is the default (read-only), creating a custom one, and then forgetting to associate it with the instance. Always verify the active parameter group after making changes:

Verify parameter group settings
SHOW shared_buffers;
SHOW work_mem;
SHOW random_page_cost;

If the values do not match your custom parameter group, the association or reboot step was missed.
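pg_settings can also reveal whether a changed static parameter is still waiting for its reboot:

```sql
-- source shows where each value came from; pending_restart is true
-- when a static parameter has been changed but not yet applied
SELECT name, setting, source, pending_restart
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'random_page_cost');
```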

shared_buffers

shared_buffers is PostgreSQL's main memory cache for table and index data. Pages read from disk are cached here, and subsequent reads of the same page are served from memory.

RDS default: {DBInstanceClassMemory/32768} — approximately 25% of instance memory. For a db.r6g.large with 16 GB RAM, this is roughly 4 GB.
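The formula is less cryptic than it looks: DBInstanceClassMemory is in bytes, and shared_buffers is measured in 8 KB pages, so dividing by 32768 yields one quarter of instance memory. A quick check:

```python
# Sketch of the RDS default formula {DBInstanceClassMemory/32768}
# for a db.r6g.large with 16 GB of RAM.
instance_memory_bytes = 16 * 1024**3       # 16 GiB
pages = instance_memory_bytes // 32768     # value RDS plugs into shared_buffers
cache_bytes = pages * 8192                 # each buffer page is 8 KB

print(pages)                               # 524288 pages
print(cache_bytes / 1024**3)               # 4.0 GiB -> 25% of instance memory
```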

Recommendation: The RDS default formula is well-calibrated for most workloads — and I am pleased to report that this is one of the rare cases where the defaults deserve no criticism. Unlike self-hosted PostgreSQL, where 25% is the community-standard starting point, RDS already applies this formula. Adjusting shared_buffers on RDS is rarely necessary.

When to consider changes: If your working set (the data pages actively used by your queries) is significantly smaller or larger than 25% of instance memory. A database with a 500 MB working set on a 64 GB instance wastes memory in shared_buffers that the OS page cache could use more effectively. Conversely, a database where the working set is 60% of instance memory may benefit from a modest increase.

Verification:

Check buffer cache hit ratio
-- Check buffer cache hit ratio
SELECT
  sum(blks_hit) AS hits,
  sum(blks_read) AS reads,
  round(sum(blks_hit)::numeric / nullif(sum(blks_hit) + sum(blks_read), 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();

A hit ratio above 0.99 (99%) indicates shared_buffers is adequately sized. Below 0.95 (95%) suggests either shared_buffers is too small or the working set exceeds available memory regardless of configuration.

work_mem

work_mem controls the memory available for internal sort operations (ORDER BY, DISTINCT, merge joins) and hash tables (hash joins, hash aggregations) before PostgreSQL spills to disk.

RDS default: 4 MB — conservative and appropriate as a global default.

The detail that warrants your attention: work_mem is allocated per operation, not per connection. A single complex query with three sort operations and two hash joins can allocate work_mem five times. A global setting of 256 MB with 200 connections creates a theoretical maximum allocation of 256 MB x 5 operations x 200 connections = 256 GB — far exceeding any instance's memory. I have seen this arithmetic go unnoticed more often than I would like.
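The arithmetic is worth making concrete — a sketch of the theoretical worst case described above, with the same hypothetical numbers:

```python
# Theoretical maximum work_mem allocation under an aggressive global setting.
work_mem_mb = 256        # hypothetical global setting
ops_per_query = 5        # e.g. three sorts + two hash joins per query
connections = 200

worst_case_mb = work_mem_mb * ops_per_query * connections
print(worst_case_mb)             # 256000 MB
print(worst_case_mb / 1024)      # 250.0 (GiB; ~256 GB in decimal units)
```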

Recommended approach:

Conservative global default with targeted overrides
-- Keep the global default conservative
-- (set via parameter group: 4MB-8MB)

-- For specific reporting or analytical queries that need more:
SET LOCAL work_mem = '64MB';
SELECT ... ORDER BY ... GROUP BY ...;
-- SET LOCAL automatically resets at the end of the transaction

This pattern keeps the household in order — a conservative global setting that protects the infrastructure, with targeted allowances for the queries that genuinely need more room. The SET LOCAL scope ensures the elevated setting does not persist beyond the transaction.

Detecting work_mem exhaustion:

Detect disk spills
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ORDER BY large_column ...;

Look for Sort Method: external merge Disk: in the output. This means PostgreSQL exhausted work_mem and spilled the sort to disk — significantly slower than an in-memory sort. If the spill is modest (a few MB), increasing work_mem for that query will eliminate it. For guidance on reading EXPLAIN output, see the EXPLAIN ANALYZE guide.

effective_cache_size

effective_cache_size is not a memory allocation. It is an estimate that tells the query planner how much memory is available for caching data — combining both shared_buffers and the OS page cache.

RDS default: {DBInstanceClassMemory/16384} — approximately 50% of instance memory.

Recommendation: Set to approximately 75% of instance memory. PostgreSQL can cache data in both shared_buffers (managed by PostgreSQL) and the OS page cache (managed by the Linux kernel). On a 16 GB instance, a reasonable setting is 12 GB.

Why this matters: If effective_cache_size is set too low, the planner assumes most data reads will hit disk and favors sequential scans. If set appropriately, the planner recognizes that random page reads are likely cached in memory and favors index scans — which are faster when the data is in cache.

Getting effective_cache_size wrong does not waste memory or cause errors. It produces suboptimal query plans. This makes it one of those rare adjustments that is low-risk and potentially high-impact — the sort of change I am always glad to recommend.

Configure effective_cache_size
-- Check current setting
SHOW effective_cache_size;

-- Set via parameter group (value in 8KB pages)
-- For 12 GB: 12 * 1024 * 1024 / 8 = 1572864
-- Or use the formula: {DBInstanceClassMemory * 3/4 / 8192}
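The page arithmetic in the comments above can be checked directly:

```python
# effective_cache_size is expressed in 8 KB pages in the parameter group.
target_gb = 12
target_kb = target_gb * 1024 * 1024   # GB -> KB
pages = target_kb // 8                # KB -> 8 KB pages
print(pages)                          # 1572864
```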

maintenance_work_mem

maintenance_work_mem is the memory allocated for maintenance operations: VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY, and similar DDL commands.

RDS default: Scales with instance class but is often conservative — typically 64 MB to 256 MB depending on the instance size.

Recommendation: Raise to 256 MB-1 GB on instances with sufficient memory. This directly speeds up VACUUM operations and index creation.

Impact of maintenance_work_mem on index creation
-- Check current setting
SHOW maintenance_work_mem;

-- The impact is visible during index creation
CREATE INDEX CONCURRENTLY idx_large_table_col ON large_table (column);
-- With 64MB maintenance_work_mem: 45 seconds
-- With 512MB maintenance_work_mem: 12 seconds (typical improvement for large tables)

The risk of over-allocation is lower than with work_mem because only one maintenance operation uses this allocation at a time per autovacuum worker. Even with autovacuum_max_workers = 3 (the default), the maximum concurrent allocation is 3 x maintenance_work_mem — manageable on most instance classes.
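A quick sanity check on that ceiling, using the hypothetical 512 MB setting from the example above:

```python
# Worst-case concurrent maintenance_work_mem allocation.
maintenance_work_mem_mb = 512
autovacuum_max_workers = 3            # PostgreSQL default

ceiling_mb = maintenance_work_mem_mb * autovacuum_max_workers
print(ceiling_mb)                     # 1536 MB -> 1.5 GiB, manageable on most classes
```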

For a comprehensive guide to autovacuum tuning, including how maintenance_work_mem interacts with vacuum performance, see the autovacuum tuning guide.

max_connections

RDS default: Based on instance class memory. The formula is generous:

| Instance Class | Default max_connections | Typical Recommended |
| --- | --- | --- |
| db.t3.micro | ~112 | 50-80 |
| db.t3.medium | ~415 | 100-150 |
| db.r6g.large | ~1,600 | 100-200 |
| db.r6g.xlarge | ~3,300 | 150-300 |
| db.r6g.2xlarge | ~5,000 | 200-400 |

I'm afraid these default limits are achievable, not advisable. Running a db.r6g.large at 1,600 connections would consume most of the instance's memory in per-connection overhead before any queries execute.

Recommendation: Keep actual backend connections to 50-200 for most workloads. Use a connection pooler — RDS Proxy, PgBouncer, or application-level pooling — to multiplex client connections across a smaller number of database connections.
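If you opt for PgBouncer rather than RDS Proxy, a minimal transaction-pooling configuration looks roughly like this — the endpoint, database name, and pool sizes are illustrative assumptions, not recommendations for your workload:

```ini
; pgbouncer.ini -- illustrative sketch
[databases]
appdb = host=mydb.abc123.us-east-1.rds.amazonaws.com port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction        ; multiplex clients per transaction
max_client_conn = 1000         ; client connections PgBouncer will accept
default_pool_size = 50         ; backend connections per database/user pair
```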

Monitor current connection usage:

Monitor connection usage
-- Active connections by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count DESC;

-- Current connections vs. the configured maximum
SELECT max_conn, used,
       round(used * 100.0 / max_conn, 1) AS pct_used
FROM (
  SELECT setting::int AS max_conn FROM pg_settings WHERE name = 'max_connections'
) AS mc,
(
  SELECT count(*) AS used FROM pg_stat_activity
) AS ua;

For a deep dive on connection pooling strategies, see the PostgreSQL connection pooling guide.

random_page_cost

random_page_cost tells the query planner how expensive a random page read is relative to a sequential page read. It directly influences whether the planner chooses an index scan or a sequential scan.

RDS default: 4.0 — calibrated for spinning disk drives where a random seek is roughly 4x more expensive than a sequential read.

The problem — and it is a real one: RDS has not used spinning disks in years. All RDS storage types (gp3, io1, io2) are SSD-based. On SSDs, the cost difference between random and sequential reads is minimal. The default persists from an era that has passed.

Recommendation:

Set random_page_cost for SSD storage
-- For gp3 storage (most common):
-- Set random_page_cost to 1.1 in your custom parameter group

-- For Provisioned IOPS (io1/io2):
-- Set random_page_cost to 1.0-1.1

-- Verify the change
SHOW random_page_cost;

Allow me to be direct: this is the highest-impact change in this entire guide. With random_page_cost = 4.0, the planner significantly penalizes index scans because it assumes random reads are expensive. Queries that should use an index scan instead perform sequential scans — reading entire tables instead of looking up specific rows via an index. Changing this single setting to 1.1 can transform dozens of query plans across your database. One setting. Dozens of plans. That is the kind of return I find deeply satisfying.
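Before committing the change to the parameter group, you can trial it in a single session and compare plans — a sketch using a hypothetical query:

```sql
-- Plan with the current setting
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Lower the cost for this session only and compare
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- A Seq Scan flipping to an Index Scan here confirms the
-- parameter-group change will pay off
```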

After changing random_page_cost, run ANALYZE on your most critical tables to ensure the planner has fresh statistics to combine with the new cost parameter:

Refresh planner statistics
ANALYZE orders;
ANALYZE users;
ANALYZE transactions;

Connection management — keeping the household in order

RDS Proxy

RDS Proxy is AWS's managed connection pooler. It sits between your application and the RDS instance, multiplexing many client connections across a smaller pool of database connections.

How it works: RDS Proxy maintains a pool of persistent connections to your RDS instance. When a client connects to the proxy, it assigns an available backend connection for the duration of a transaction (in transaction pooling mode), then returns it to the pool.

When to use RDS Proxy:

  • Lambda and serverless backends. Each Lambda invocation creates a new database connection. Without pooling, 100 concurrent Lambda invocations open 100 database connections. With RDS Proxy, those 100 invocations share a smaller pool of backend connections.
  • Applications with connection churn. Web applications that open and close connections frequently benefit from the proxy maintaining persistent backend connections.
  • Multi-tenant workloads where many application instances connect to the same database.

Limitations:

  • Latency. RDS Proxy adds 1-5ms per query. For applications where single-digit millisecond latency matters, this overhead is significant.
  • Feature restrictions. Some PostgreSQL features do not work through the proxy in transaction pooling mode: LISTEN/NOTIFY, session-level advisory locks, SET commands that must persist across transactions, and temporary tables created in one transaction and accessed in another.
  • Cost. RDS Proxy is a separate billed service. It charges per vCPU of the associated RDS instance per hour.

When to skip RDS Proxy: Low-connection-count applications, latency-sensitive workloads where 1-5ms overhead per query is unacceptable, and workloads that rely on session-level PostgreSQL features. For these cases, a self-managed PgBouncer on an EC2 instance provides more control, lower latency, and no per-vCPU proxy charges. See the PgBouncer comparison for a detailed feature comparison.

Connection limits by instance class

Each RDS instance class has a default max_connections calculated from available memory. The default values are high — far higher than most applications should use:

Monitor connection usage by state
-- Check your instance's current limit
SHOW max_connections;

-- Check current usage
SELECT count(*) FROM pg_stat_activity;

-- Check active (actually executing) vs idle connections
SELECT
  count(*) FILTER (WHERE state = 'active') AS active,
  count(*) FILTER (WHERE state = 'idle') AS idle,
  count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_transaction,
  count(*) AS total
FROM pg_stat_activity
WHERE backend_type = 'client backend';

The practical guideline: keep active database connections below 200-400, regardless of instance class. Performance degrades well before the theoretical limit due to memory pressure, lock contention, and context switching.

If idle in transaction connections are consistently high (more than 10-20% of total), I would recommend investigating application code for transactions that are opened but not promptly committed or rolled back. These connections hold resources — memory, locks — without doing useful work. They are, if you'll forgive me, guests who have been shown to their room but refuse to either unpack or leave.
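To identify the offenders — and how long they have been idling — query pg_stat_activity directly:

```sql
-- Sessions idle inside an open transaction, longest-idle first
SELECT pid, usename, application_name,
       now() - xact_start AS transaction_age,
       now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;
```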

Storage — IOPS, throughput, and the gp3 question

gp3 vs io1 vs io2

RDS offers three SSD storage types. The choice affects both performance and cost:

gp3 (General Purpose SSD): Baseline 3,000 IOPS and 125 MB/s throughput included at no extra cost. Provisionable up to 16,000 IOPS and 1,000 MB/s throughput for additional cost. Best for most workloads.

io1 (Provisioned IOPS SSD): Up to 64,000 IOPS. For workloads that consistently exceed gp3's 16,000 IOPS ceiling.

io2 (Provisioned IOPS SSD, higher durability): Up to 256,000 IOPS with 99.999% durability. For workloads requiring both extreme IOPS and the highest storage durability.

Recommendation: Start with gp3. Provision additional IOPS only after monitoring confirms your workload needs them. Most PostgreSQL workloads — including busy OLTP applications — run well on gp3 with 6,000-12,000 provisioned IOPS. Measure first, spend second.
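When monitoring does confirm the need, gp3 IOPS and throughput can be raised in place without changing storage type — a sketch assuming a hypothetical instance identifier:

```shell
# Raise gp3 provisioned IOPS and throughput on an existing instance
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --iops 12000 \
  --storage-throughput 500 \
  --apply-immediately
```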

Monitoring storage performance:

Key CloudWatch storage metrics
-- These metrics are available in CloudWatch, not SQL.
-- Key CloudWatch metrics for RDS storage:
-- ReadIOPS / WriteIOPS — actual IOPS consumed
-- ReadLatency / WriteLatency — per-operation latency
-- DiskQueueDepth — number of pending I/O operations

If ReadLatency consistently exceeds 1ms or DiskQueueDepth is regularly above 1, the storage is under pressure. Options: increase provisioned IOPS (on gp3), switch to io1/io2, or reduce I/O demand through query optimization and better indexing.

Storage autoscaling

RDS can automatically increase storage when free space drops below 10% of allocated storage:

  • Enable it. Running out of storage causes an outage — the database becomes read-only. Storage autoscaling prevents this with zero downtime. There is no good reason to leave it off.
  • Set a maximum storage threshold. Without a ceiling, a bug that generates unbounded data can scale storage until costs become alarming. Set the max to 2-3x your expected maximum legitimate size.
  • Understand the ratchet — this is important. RDS storage can only grow, never shrink. If autoscaling increases your storage from 100 GB to 500 GB due to a temporary spike, you pay for 500 GB permanently (or until you migrate to a new instance). Plan initial allocation thoughtfully.
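Both the feature and its ceiling hang on a single flag — a sketch with illustrative numbers:

```shell
# Enable storage autoscaling with a 500 GB ceiling
# (--max-allocated-storage both enables autoscaling and sets the cap)
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --max-allocated-storage 500 \
  --apply-immediately
```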

Read replicas vs Multi-AZ — a distinction worth understanding

Multi-AZ — availability, not performance

Multi-AZ deployment creates a synchronous standby instance in a different availability zone. Its purpose is automatic failover: if the primary instance fails, RDS promotes the standby within 60-120 seconds.

Key characteristics:

  • The standby is not accessible for reads. Unlike Aurora, standard RDS Multi-AZ standby instances do not accept read connections. The standby exists exclusively for failover.
  • Writes are slower. Every write must be synchronously replicated to the standby before being acknowledged to the client. This adds 1-3ms of write latency.
  • Use for production. The failover protection is worth the write latency for any production workload where an outage is unacceptable.

I should be clear: do not deploy Multi-AZ expecting a performance improvement. It is an availability feature that adds write latency. Its purpose is insurance, and insurance is valuable precisely because you hope never to use it.

Read replicas — scaling reads

Read replicas use asynchronous replication to maintain copies of the primary database that can serve read queries:

  • Up to 5 read replicas per RDS PostgreSQL instance (up to 15 for Aurora).
  • Replication lag is typically under 1 second but can spike during heavy write loads or if the replica is under-provisioned.
  • Independent instance class. Replicas can be a different instance class than the primary. A common pattern: the primary handles writes on a db.r6g.xlarge, while reporting queries run on a db.r6g.2xlarge replica with more memory.

Good use cases: Reporting and analytics queries that scan large datasets, read-heavy API endpoints where the application can tolerate slight staleness, and geographic read distribution.

Poor use cases: Queries that require absolute consistency (financial balances, inventory counts) and write-heavy workloads (read replicas do not help with write scaling).

Monitor replication lag:

Monitor replication lag
-- On the primary:
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;
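On the replica itself, lag can be expressed in seconds rather than bytes:

```sql
-- On the replica: time since the last replayed transaction.
-- Note: this grows during write-idle periods on the primary
-- even when there is no real lag.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;
```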

In CloudWatch, the ReplicaLag metric shows lag in seconds. Sustained lag above 5 seconds warrants investigation. For a comprehensive guide to PostgreSQL replication strategies, see the replication guide.

Monitoring — if you'll permit me, the most important section

Performance Insights

Performance Insights is the most valuable monitoring feature on RDS. Enable it on every production instance. I cannot state this more plainly.

What it provides:

  • Database Load chart: Shows active sessions over time, grouped by wait event type (CPU, I/O, lock, client, etc.). This is the single most useful view for understanding what your database is doing.
  • Top SQL: Identifies which queries contribute most to database load — similar to pg_stat_statements but with a visual timeline.
  • Wait event analysis: Shows whether performance is bottlenecked by CPU computation, disk I/O, lock contention, or client communication delays.

Interpreting the Database Load chart: The y-axis shows "average active sessions." Each unit represents one session actively doing work. The horizontal reference line represents the instance's vCPU count.

  • Load below vCPU line: The database has CPU headroom. Performance issues, if any, are not CPU-related.
  • Load consistently at vCPU line: The database is CPU-saturated. Either optimize the top CPU-consuming queries or scale up.
  • Load above vCPU line: Sessions are queuing for CPU. This is the most common sign the instance is undersized.
  • Load dominated by I/O waits: Queries are waiting for disk reads. Increase IOPS, add more memory for shared_buffers, or optimize queries.
  • Load dominated by lock waits: Transactions are blocking each other. Investigate with pg_stat_activity and pg_locks.
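When the load chart is dominated by lock waits, pg_blocking_pids() identifies who is blocking whom:

```sql
-- Blocked sessions and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```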

Enhanced Monitoring

Enhanced Monitoring provides OS-level metrics at up to 1-second granularity — more granular than CloudWatch's default 1-minute resolution.

Key metrics to watch:

  • Free memory: If free memory approaches zero, the instance is swapping or will be soon.
  • Swap usage: Any swap on an RDS instance is a problem. If the instance is swapping, it is under-provisioned.
  • CPU steal time: On burstable instance classes (db.t3, db.t4g), CPU steal indicates exhausted CPU credits. For production workloads consistently using more than 20% CPU on burstable instances, switch to non-burstable classes (db.r6g, db.m6g).
  • I/O utilization: Correlate with CloudWatch's ReadLatency and WriteLatency.

Enhanced Monitoring and Performance Insights complement each other: Performance Insights shows database-level causes, while Enhanced Monitoring shows infrastructure-level causes. Together, they answer the question that matters: is this a query problem or a hardware problem?

pg_stat_statements on RDS

pg_stat_statements is enabled by default on most RDS PostgreSQL instances and provides cumulative query performance statistics.

Enable and configure pg_stat_statements
-- Verify it is loaded
SELECT * FROM pg_available_extensions WHERE name = 'pg_stat_statements';

-- Create the extension (if not already present)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Key parameter group settings:
-- pg_stat_statements.max = 10000 (default: 5000)
-- pg_stat_statements.track = 'all' (default: 'top')

The pg_stat_statements.track = 'all' setting captures queries executed inside functions and procedures, not just top-level statements. This is essential for applications that use stored procedures heavily.
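Once the extension is in place, the workhorse query ranks statements by cumulative execution time (column names as of PostgreSQL 13 and later):

```sql
-- Top 10 statements by total execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```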

For the complete pg_stat_statements playbook, see the pg_stat_statements guide. For a broader treatment of combining pg_stat_statements with auto_explain and pg_stat_monitor, see the PostgreSQL monitoring stack guide.

Extensions available on RDS

RDS supports a curated list of PostgreSQL extensions. Not every community extension is available, but the most widely used ones are:

| Extension | Purpose | Requires shared_preload_libraries |
| --- | --- | --- |
| pg_stat_statements | Query performance statistics | Yes |
| auto_explain | Automatic EXPLAIN for slow queries | Yes |
| pg_hint_plan | Manual query plan hints | Yes |
| pgAudit | Audit logging | Yes |
| pgvector | Vector similarity search | No |
| PostGIS | Geospatial queries | No |
| pg_trgm | Trigram text similarity | No |
| pgcrypto | Cryptographic functions | No |
| pg_partman | Partition management | No |
| hstore | Key-value storage | No |
| citext | Case-insensitive text | No |

Extensions requiring shared_preload_libraries: Add them via your custom parameter group and reboot the instance:

Parameter group configuration
# In the custom parameter group:
shared_preload_libraries = pg_stat_statements, auto_explain, pg_hint_plan

After rebooting, create the extensions:

Create extensions
CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION auto_explain;
CREATE EXTENSION pg_hint_plan;

For a guide to configuring auto_explain for production use, see the auto_explain production setup guide. For pgAudit configuration for compliance requirements, see the pgAudit setup guide.

A caveat on pg_repack: pg_repack (table bloat removal without long locks) is on the supported extension list for recent RDS PostgreSQL versions, but since RDS provides no shell access, the pg_repack client must run from a separate host such as an EC2 instance. On versions where it is unavailable, use RDS Blue/Green deployments or VACUUM FULL during maintenance windows for bloat remediation. Always check the RDS documentation for the current extension list for your PostgreSQL version.

What RDS cannot do — and I should be honest about this

RDS makes deliberate trade-offs between operational convenience and configuration control. Understanding the boundaries prevents wasted effort trying to optimize what RDS does not expose. A butler who overstates his case is no butler at all, so allow me to be direct:

OS-level tuning is off-limits. You cannot configure huge_pages, vm.swappiness, I/O schedulers, kernel semaphores, or filesystem mount options. RDS manages the OS. For most workloads, AWS's defaults are reasonable. For workloads that need specific kernel tuning, self-hosted PostgreSQL on EC2 is the alternative.

Some PostgreSQL features are restricted. Custom background workers, certain WAL-level configurations, and direct file system access are not available.

RDS Proxy adds convenience at the cost of latency. For latency-sensitive workloads, self-hosted PgBouncer on an EC2 instance provides lower latency and more configuration control.

Burstable instances have hidden limits. The db.t3 and db.t4g instance classes offer CPU credits that allow short bursts above baseline performance. When credits are exhausted, performance drops to 20-40% of peak. Production workloads that consistently use more than 20% CPU should use non-burstable instances (db.r6g, db.m6g, db.r7g).

The right framing: RDS trades fine-grained control for operational simplicity. You give up kernel tuning, extension flexibility, and some configuration parameters. In return, you get automated backups, zero-downtime patching, push-button failover, and managed replication. For most teams, this is the right trade-off. For teams that need every configuration knob, self-hosted PostgreSQL on EC2 provides full control at the cost of managing everything yourself.

For a broader framework on when managed services, self-hosted infrastructure, or scaling strategies are the right approach, see the scaling decision framework.

Frequently asked questions