pgAudit Configuration for SOC 2 and HIPAA Compliance
Everything in order before the auditor reaches the front door.
What pgAudit captures that standard PostgreSQL logging cannot
I regret to inform you that your auditor will be arriving shortly, and they will have questions. Specifically, they will ask: can you prove who accessed what data and when? Standard PostgreSQL logging cannot answer this question with the precision that SOC 2 and HIPAA require. pgAudit closes that gap.
This guide maps the actual requirements of SOC 2 Trust Service Criteria and HIPAA Security Rule provisions to specific pgAudit configuration settings. It covers what to log, how to configure object-level auditing for sensitive tables, how to manage log retention and storage costs, what auditors actually ask for in practice, and how to mitigate the performance impact of audit logging. If you'll permit me, I'd like to have everything in order before the auditor reaches the front door.
Allow me to draw an important distinction. Standard PostgreSQL logging via log_statement records SQL text as it arrives at the server. Setting log_statement = 'all' logs every statement, but the output is a flat stream of SQL text with no semantic structure:
2026-03-28 14:30:01.234 UTC [12345] LOG: statement: SELECT * FROM patients WHERE id = 42;
This tells you a query ran. It does not, however, tell you what type of operation it was, which specific tables and columns were accessed, whether the access was on a table that contains sensitive data, or the query parameters separately from the statement template.
pgAudit provides structured audit records with semantic classification:
2026-03-28 14:30:01.234 UTC [12345] LOG: AUDIT: SESSION,1,1,READ,SELECT,TABLE,public.patients,"SELECT * FROM patients WHERE id = $1",42
This entry contains the audit type (SESSION), statement class (READ), command (SELECT), object type (TABLE), object name (public.patients), statement text, and parameters. For compliance, this distinction is the one that matters. Auditors want evidence that you can answer "who accessed the patients table between March 1 and March 15" — not merely "what SQL ran on the server."
pgAudit integrates with PostgreSQL's standard logging infrastructure. Output goes to the same log files, same log destinations, and same log rotation configured in postgresql.conf. This means your existing log pipeline — CloudWatch Logs, Cloud Logging, Datadog, ELK, Splunk — works without modification.
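Because the payload after the `AUDIT:` marker is ordinary CSV, a pipeline stage can lift pgAudit entries into structured fields with nothing more than a CSV parser. A minimal Python sketch of this parsing step (the field names follow the field order in the pgAudit documentation; the function name is illustrative, and a real shipper would add error handling):

```python
import csv
import io

# Field order per the pgAudit documentation:
# AUDIT_TYPE, STATEMENT_ID, SUBSTATEMENT_ID, CLASS, COMMAND,
# OBJECT_TYPE, OBJECT_NAME, STATEMENT, PARAMETER
FIELDS = [
    "audit_type", "statement_id", "substatement_id", "class",
    "command", "object_type", "object_name", "statement", "parameter",
]

def parse_pgaudit_entry(log_line: str) -> dict:
    """Extract the structured audit fields from a pgAudit log line.

    Expects a line containing 'AUDIT: ' followed by the CSV payload.
    The payload is standard CSV, so quoted SQL text containing commas
    parses correctly.
    """
    _, _, payload = log_line.partition("AUDIT: ")
    if not payload:
        raise ValueError("not a pgAudit entry")
    row = next(csv.reader(io.StringIO(payload)))
    return dict(zip(FIELDS, row))
```

Feeding the session-log example above through this function yields a record whose `class` is `READ` and whose `object_name` is `public.patients`, which is exactly the shape an "who accessed which table" query needs.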
SOC 2 audit requirements mapped to pgAudit
SOC 2 Trust Service Criteria — what applies to database audit logging
SOC 2 is organized around Trust Service Criteria (TSC) defined by the AICPA. I should like to map the specific criteria to what they mean for your database, as the standards documentation is not always forthcoming about practical implications. The criteria most relevant to database audit logging fall under the Common Criteria (CC) category:
| Criterion | Requirement | Database Audit Relevance |
|---|---|---|
| CC6.1 | Logical access controls | Log who connects, authentication failures, privilege changes |
| CC6.2 | Authentication mechanisms | Log authentication events, credential changes |
| CC6.3 | Access to data | Log who reads and modifies sensitive data |
| CC7.2 | Monitoring for anomalies | Log sufficient detail to detect unusual access patterns |
| CC8.1 | Change management | Log schema changes (DDL) as part of change tracking |
SOC 2 does not prescribe specific tools, configuration values, or log formats — which is both a freedom and a burden. It requires that you demonstrate controls exist and function. pgAudit provides the technical evidence; your documentation, monitoring, and review processes provide the operational evidence.
pgAudit configuration for SOC 2
Minimum configuration (postgresql.conf):
# Load pgAudit
shared_preload_libraries = 'pgaudit'
# Statement classes to audit
pgaudit.log = 'ddl, role, write'
# Include query parameters (not just $1, $2 placeholders)
pgaudit.log_parameter = on
# Exclude system catalog queries (pg_class, pg_stat_*, etc.)
pgaudit.log_catalog = off
# Log the statement text only once for multi-line entries
pgaudit.log_statement_once = on
# Use standard log level for pipeline compatibility
pgaudit.log_level = log
What each statement class captures:
| pgaudit.log class | SQL Commands Captured | SOC 2 Criterion |
|---|---|---|
| ddl | CREATE, ALTER, DROP (tables, indexes, functions, etc.) | CC8.1 — Change management |
| role | GRANT, REVOKE, CREATE ROLE, ALTER ROLE, DROP ROLE | CC6.1 — Access controls |
| write | INSERT, UPDATE, DELETE, TRUNCATE | CC6.3 — Data modification tracking |
| read | SELECT, COPY TO | CC6.3 — Data access tracking |
| function | Function calls and DO blocks | Varies by use case |
| misc | DISCARD, FETCH, CHECKPOINT, VACUUM | Rarely needed for compliance |
Adding read to the audit scope:
pgaudit.log = 'ddl, role, write, read'
Include read if your SOC 2 scope requires monitoring who reads sensitive data, not just who modifies it. Be aware that adding read significantly increases log volume. See the performance impact section for mitigation strategies.
Additional recommended settings:
# Include the specific table name in audit entries
pgaudit.log_relation = on
# Log statements in functions and procedures
pgaudit.log_function = on
Object-level auditing for sensitive tables
Statement-level auditing (pgaudit.log) applies globally to all tables and all users. For SOC 2 scopes that include specific sensitive data (PII, financial records, access credentials), object-level auditing provides targeted logging without the volume of global read auditing.
Step 1: Create a non-login audit role.
CREATE ROLE pgaudit_role NOLOGIN;
Step 2: Tell pgAudit to watch this role (postgresql.conf):
pgaudit.role = 'pgaudit_role'
Step 3: Grant permissions on sensitive tables to the audit role.
-- Audit all SELECT and DML on the customers table (contains PII)
GRANT SELECT, INSERT, UPDATE, DELETE ON customers TO pgaudit_role;
-- Audit only SELECTs on the financial_reports table
GRANT SELECT ON financial_reports TO pgaudit_role;
-- Audit all operations on the user_credentials table
GRANT ALL ON user_credentials TO pgaudit_role;
pgAudit now logs any operation that matches the permissions granted to pgaudit_role — regardless of which actual user executes the statement. The result: full read and write auditing on sensitive tables, write-only auditing (via pgaudit.log = 'write') on everything else. Precision rather than volume, which is how a well-run household manages its affairs.
Step 4: Document the mapping.
Maintain a table that maps sensitive data classifications to audit configurations. Auditors will ask for this:
| Table | Data Classification | Audit Scope | Rationale |
|---|---|---|---|
| customers | PII (name, email, address) | SELECT, INSERT, UPDATE, DELETE | SOC 2 CC6.3 — access to personal data |
| financial_reports | Financial | SELECT | SOC 2 CC6.3 — read access to financial data |
| user_credentials | Authentication | ALL | SOC 2 CC6.1, CC6.2 — credential access |
| orders | Business data | Write only (via pgaudit.log) | Standard write auditing |
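One way to keep this documentation table and the actual Step 3 grants from drifting apart is to generate the GRANT statements from the mapping itself. An illustrative Python sketch (the AUDIT_MAP dictionary and helper function are hypothetical conveniences, not part of pgAudit):

```python
# Hypothetical mapping mirroring the documentation table above;
# table names and audit scopes are illustrative.
AUDIT_MAP = {
    "customers": ["SELECT", "INSERT", "UPDATE", "DELETE"],
    "financial_reports": ["SELECT"],
    "user_credentials": ["ALL"],
}

def grant_statements(audit_map: dict, role: str = "pgaudit_role") -> list:
    """Render the GRANT statements that wire each table to the audit role."""
    stmts = []
    for table, privileges in audit_map.items():
        privs = ", ".join(privileges)
        stmts.append(f"GRANT {privs} ON {table} TO {role};")
    return stmts
```

Running the generated statements against the database, and handing the source mapping to the auditor, gives you one artifact serving both purposes.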
HIPAA requirements mapped to pgAudit
HIPAA Security Rule — database-relevant provisions
HIPAA is more exacting than SOC 2 in its specificity, and rightly so — the data under its protection demands nothing less. The provisions most relevant to database audit logging:
- Audit controls (164.312(b)): Implement mechanisms to record and examine access to ePHI. Log all reads and writes to ePHI tables.
- Access controls (164.312(a)(1)): Unique user identification, emergency access, auto logoff. Trace access to individual users, not shared accounts.
- Integrity controls (164.312(c)(1)): Mechanisms to authenticate ePHI, detect unauthorized alteration. Log data modifications with before/after context.
- Activity review (164.308(a)(1)(ii)(D)): Regular review of audit logs, access reports, security incidents. Logs must be searchable and reviewed on a schedule.
- Documentation (164.530(j)): Retain policies and documentation for 6 years. Audit logs retained for 6 years.
HIPAA is more prescriptive than SOC 2 about what must be logged. Specifically: access to ePHI must be traceable to individual users, and the audit trail must include both read and write access.
pgAudit configuration for HIPAA
Minimum configuration (postgresql.conf):
# Load pgAudit
shared_preload_libraries = 'pgaudit'
# HIPAA requires logging reads AND writes to ePHI
pgaudit.log = 'read, write, ddl, role'
# Include actual parameter values — essential for HIPAA forensics
pgaudit.log_parameter = on
# Include table names in audit entries
pgaudit.log_relation = on
# Exclude system catalog queries
pgaudit.log_catalog = off
# Reduce multi-line duplication
pgaudit.log_statement_once = on
# Object-level auditing role
pgaudit.role = 'pgaudit_role'
# Standard log level
pgaudit.log_level = log
Object-level auditing on ePHI tables:
-- Create the audit role
CREATE ROLE pgaudit_role NOLOGIN;
-- Grant monitoring on all ePHI tables
GRANT SELECT, INSERT, UPDATE, DELETE ON patients TO pgaudit_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON medical_records TO pgaudit_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON prescriptions TO pgaudit_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON insurance_claims TO pgaudit_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON lab_results TO pgaudit_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON appointments TO pgaudit_role;
If using object-level auditing for ePHI tables, you can reduce the global pgaudit.log to exclude read and rely on the object-level audit for read coverage on sensitive tables:
# Global: log writes and DDL/role changes everywhere
pgaudit.log = 'write, ddl, role'
# Object-level: log reads AND writes on ePHI tables (via pgaudit_role grants)
pgaudit.role = 'pgaudit_role'
The user identification requirement
This is where many teams discover an uncomfortable gap. HIPAA requires tracing access to individual users. This creates a genuine challenge when your application connects to PostgreSQL through a single shared database account.
If every query arrives as app_user, the audit log shows:
AUDIT: SESSION,1,1,READ,SELECT,TABLE,public.patients,"SELECT * FROM patients WHERE id = $1",42
-- Role: app_user (but which human user initiated this?)
This does not satisfy HIPAA's unique user identification requirement. Solutions:
Option 1: Set application_name per request.
-- At the start of each request, set the application name to identify the user
SET application_name = 'user:dr.smith:session:abc123';
PostgreSQL includes application_name in log entries when log_line_prefix contains the %a escape, providing user-level traceability.
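Because the identity value is interpolated into a SET statement, it must be quoted defensively, and the server silently truncates application_name at 63 bytes. A hypothetical helper illustrating the string-building concern (in production, prefer a parameterized SELECT set_config('application_name', $1, false) over assembling SQL by hand):

```python
def set_application_name_sql(user_id: str, session_id: str) -> str:
    """Build a SET application_name statement identifying the end user.

    Doubles any single quotes so a hostile username cannot break out
    of the string literal. Keep identifiers short: the server truncates
    application_name at 63 bytes.
    """
    value = f"user:{user_id}:session:{session_id}"
    escaped = value.replace("'", "''")  # SQL string-literal escaping
    return f"SET application_name = '{escaped}';"
```

The helper is a sketch of the escaping rule, not a substitute for your driver's parameter binding.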
Option 2: Use SET ROLE with per-user database roles.
-- Application connects as app_user, then assumes the user's role
SET ROLE dr_smith;
SELECT * FROM patients WHERE id = 42;
RESET ROLE;
pgAudit logs the effective role, not the login role. This provides native PostgreSQL-level user identification.
Option 3: Carry identity in connection parameters. Connection strings can set application_name (or other settings via the options parameter), so a pool that hands out per-user connections propagates identity into the logs.
Option 4: Correlate via application-level audit logs. Log the user identity and database query correlation ID at the application layer, then cross-reference with pgAudit logs during investigation.
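One common implementation of Option 4 is to prepend a marker comment such as /* req:abc123 */ to every query the application issues, so the request ID survives into the pgAudit statement text and can be joined back to application-side logs. A Python sketch of the cross-referencing step, assuming that convention (the marker format and field names are illustrative, not a pgAudit feature):

```python
import re

# Assumed convention: the application prefixes each query with a
# marker comment, e.g. "/* req:abc123 */ SELECT ...", so the request
# ID appears in the pgAudit STATEMENT field.
REQ_MARKER = re.compile(r"/\*\s*req:([A-Za-z0-9-]+)\s*\*/")

def correlate(app_events: list, audit_statements: list) -> dict:
    """Map request IDs found in audit statement text to app-side users."""
    users_by_req = {e["request_id"]: e["user"] for e in app_events}
    matches = {}
    for stmt in audit_statements:
        m = REQ_MARKER.search(stmt)
        if m and m.group(1) in users_by_req:
            matches[m.group(1)] = users_by_req[m.group(1)]
    return matches
```

During an investigation, this join answers "which human was behind this SELECT" without any change to the database's authentication model.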
For HIPAA compliance, the audit record must establish which individual accessed ePHI. Choose one of these approaches and document it. I assure you: auditors will ask how you trace database access to specific users, and "we use a shared account" is not the answer they are looking for.
Log retention and storage
Retention requirements
| Framework | Minimum Retention | Practical Recommendation |
|---|---|---|
| SOC 2 | No explicit minimum; auditors expect 90 days–1 year | 1 year searchable in log platform |
| HIPAA | 6 years (45 CFR 164.530(j)) | 1 year searchable, 6 years in cold storage |
HIPAA's 6-year retention window comes from its documentation provisions (45 CFR 164.316(b)(2)(i) in the Security Rule; 164.530(j) in the Privacy Rule imposes the same period), which are generally read to cover audit logs. This is a long retention window that requires planning for storage costs.
Practical retention architecture:
- Hot tier (0–90 days): Searchable in your log platform (Datadog, ELK, CloudWatch)
- Warm tier (90–365 days): Compressed in lower-cost storage, queryable on demand
- Cold tier (1–6 years): Compressed in archive storage (S3 Glacier, GCS Archive)
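The tiering policy above reduces to a simple age calculation, and encoding it keeps archival jobs and the documented policy from disagreeing. An illustrative sketch (tier boundaries follow the layout above; the function name is hypothetical):

```python
from datetime import date

def retention_tier(log_date: date, today: date) -> str:
    """Classify a log file's storage tier by age, per the layout above."""
    age_days = (today - log_date).days
    if age_days <= 90:
        return "hot"        # searchable in the log platform
    if age_days <= 365:
        return "warm"       # compressed, queryable on demand
    if age_days <= 365 * 6:
        return "cold"       # archive storage for the HIPAA window
    return "expired"        # past 6 years; eligible for deletion
```

A nightly job can apply this function to each day's log prefix and move objects between storage classes accordingly.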
Log volume estimation and cost
| Scenario | Audit Scope | Estimated QPS | Daily Log Volume |
|---|---|---|---|
| Small app, write-only | write, ddl, role | 50 writes/sec | ~2GB/day |
| Medium app, write + ePHI reads | write, ddl, role + object-level reads | 200 mixed/sec | ~8GB/day |
| Large app, global read + write | read, write, ddl, role | 1,000 reads/sec + 200 writes/sec | ~40GB/day |
At 40GB/day, log storage costs become a real infrastructure line item. Over 6 years (HIPAA), that is approximately 88TB of raw audit logs; compression for cold storage brings that down to roughly 9–26TB.
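The arithmetic is worth keeping in runnable form, since auditors ask for capacity planning evidence: 40GB/day over six years is about 87.6TB before compression. A sketch under an assumed 80% compression ratio (the helper name and default ratio are illustrative):

```python
def archive_size_tb(gb_per_day: float, years: int = 6,
                    compression_ratio: float = 0.8) -> tuple:
    """Return (raw_tb, compressed_tb) for the retention window.

    compression_ratio is the fraction removed by compression; 0.7-0.9
    is typical for text audit logs.
    """
    raw_gb = gb_per_day * 365 * years
    return raw_gb / 1000, raw_gb * (1 - compression_ratio) / 1000
```

For the 40GB/day scenario, the raw figure comes out near 87.6TB and the compressed archive near 17.5TB; re-run with your own measured compression ratio.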
Strategies to reduce volume without losing compliance coverage:
- Object-level auditing instead of global read: Audit only the tables that require it.
- pgaudit.log_catalog = off: System catalog queries are not compliance-relevant. Excluding them can reduce log volume by 10–30%.
- pgaudit.log_statement_once = on: For statements that touch multiple tables, log the SQL text only on the first audit entry.
- Role-based exclusion: Exclude high-frequency monitoring service accounts from auditing.
- Log compression in the pipeline: Audit logs compress well (70–90% compression ratios are typical).
Log rotation and shipping
# Rotate logs daily
log_rotation_age = 1d
# Also rotate if a single file exceeds 100MB
log_rotation_size = 100MB
# Use a consistent naming pattern for log shippers
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
# Include timestamps in a parseable format
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a '
Audit logs must be shipped to a centralized, tamper-evident log store. Auditors will ask whether logs can be modified or deleted by database administrators — if the answer is "yes, the DBA has root access to the log server," that is a finding.
What auditors actually ask for
If I may speak from observation: there is a considerable gap between what the compliance standards say and what the auditor sitting across the table actually requests during the audit. This section addresses the latter.
"Show me your pgAudit configuration."
Auditors want to see the actual configuration, not a description of it. Prepare:
# Extract pgAudit settings from the running server
psql -c "SELECT name, setting FROM pg_settings WHERE name LIKE 'pgaudit%';"
Expected output:
name | setting
---------------------------+---------------------
pgaudit.log | ddl, role, write
pgaudit.log_catalog | off
pgaudit.log_level | log
pgaudit.log_parameter | on
pgaudit.log_relation | on
pgaudit.log_statement_once | on
pgaudit.role | pgaudit_role
"Show me a sample audit log entry for a read of sensitive data."
Have a prepared example ready:
-- Test query (run as a known user or with application_name set)
SET application_name = 'audit_demo:user:auditor';
SELECT patient_name, date_of_birth FROM patients WHERE id = 1;
Corresponding pgAudit log entry:
2026-03-28 14:30:01.234 UTC [12345] LOG: AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.patients,
"SELECT patient_name, date_of_birth FROM patients WHERE id = $1",1
Walk the auditor through each field: audit type (OBJECT — object-level), statement class (READ), command (SELECT), object (public.patients), statement text, and parameters.
"How do you review these logs?"
Auditors want evidence of active monitoring, not just passive logging. Prepare dashboard screenshots, alert rules for anomalous access patterns, and evidence of regular review cadence.
Example alert query (for an ELK-based log platform):
{
"query": {
"bool": {
"must": [
{ "match": { "message": "AUDIT" } },
{ "match": { "message": "public.patients" } },
{ "match": { "message": "READ" } }
],
"filter": [
{ "range": { "@timestamp": { "gte": "now-1h" } } }
]
}
},
"aggs": {
"by_user": { "terms": { "field": "postgresql.user.keyword" } }
}
}
"How long do you keep audit logs?"
Provide the documented retention policy and demonstrate that logs from the required period are accessible:
# Demonstrate that logs from 6 months ago are retrievable
aws s3 ls s3://audit-archive/postgresql/2025/09/ --human-readable
# Or query your log platform for a date range
curl -X GET "https://elasticsearch:9200/postgresql-audit-*/_count" \
-H 'Content-Type: application/json' \
-d '{"query":{"range":{"@timestamp":{"gte":"2025-09-01","lte":"2025-09-30"}}}}'
"Who can access or modify the audit logs?"
Demonstrate that the log store has its own access controls separate from database administration: database administrators cannot delete or modify logs, log store access is restricted to the security/compliance team, logs are append-only or stored in tamper-evident storage, and access to the log store is itself logged.
The recurring theme across all these questions: auditors want evidence that the system works and is monitored, not just that it is installed. A pgAudit configuration that logs everything but is never reviewed is a finding, not a control. Installation without oversight is decoration, not security.
Performance impact and mitigation
I would be remiss if I did not address the cost of this diligence. pgAudit adds overhead proportional to the volume of logged statements.
Measured impact
| Audit Scope | Typical Overhead | Primary Cost |
|---|---|---|
| write, ddl, role only | 1–3% | Minimal — writes are already I/O-heavy |
| write, ddl, role + object-level reads on 5 tables | 2–5% | Moderate — depends on read frequency on audited tables |
| Global read, write, ddl, role | 5–15% | Significant — every SELECT generates an entry |
The overhead is primarily I/O: writing audit log entries to disk. CPU overhead for generating the log entry is negligible. See the pgAudit performance benchmarks for detailed measurement methodology.
Mitigation strategies
1. Use object-level auditing instead of global read.
# Instead of this (audits every SELECT on every table):
pgaudit.log = 'read, write, ddl, role'
# Use this (audits writes globally, reads only on specific tables):
pgaudit.log = 'write, ddl, role'
pgaudit.role = 'pgaudit_role'
-- Plus GRANT SELECT on sensitive tables to pgaudit_role
2. Separate log I/O from data I/O.
# postgresql.conf
log_directory = '/var/log/postgresql'  # On a separate volume from PGDATA
3. Use asynchronous log shipping. Ship logs in batches rather than synchronously forwarding each entry.
4. Monitor for log I/O bottlenecks.
-- Check for I/O-related wait events (audit log writes show up as I/O pressure)
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
GROUP BY wait_event_type, wait_event;
5. Exclude high-frequency non-sensitive operations.
-- Exclude a monitoring user from audit logging entirely
ALTER ROLE monitoring_agent SET pgaudit.log = 'none';
-- Exclude a reporting user from read auditing (keep write auditing)
ALTER ROLE reporting_service SET pgaudit.log = 'write, ddl, role';
Honest counterpoint — the limits of pgAudit for compliance
I should be forthcoming about the boundaries of what pgAudit can do, because pretending they do not exist would be a disservice to you and an embarrassment to me.
pgAudit logs database-level operations, not application-level context. pgAudit can tell you: "The app_user role executed SELECT * FROM patients WHERE id = 42 at 14:30:01." pgAudit cannot tell you: "Dr. Smith viewed patient record #8827 via the appointments screen to prepare for a 3:00 PM consultation." The application-level context exists only at the application layer.
pgAudit cannot detect what happens after the data leaves the database. If the application reads ePHI and then writes it to an application log file, sends it to a third-party analytics service, caches it in Redis without access controls, or includes it in an error report — pgAudit sees the SELECT. It does not see what follows.
Compliance frameworks expect defense in depth. pgAudit covers the database layer. Network logs cover connectivity. Application audit logging covers business context. Infrastructure logs cover configuration changes. An auditor who asks "how do you trace end-to-end access to patient data?" expects an answer that spans all four layers.
pgAudit alone does not make you compliant. It provides the database audit component of a broader compliance posture that includes access controls, encryption, organizational policies, training, and incident response. Present it accurately as one layer of a defense-in-depth audit strategy — your auditor will respect the honesty far more than overconfidence.