Hibernate Batch Inserts on PostgreSQL: Why IDENTITY Columns Are Silently Killing Your Throughput
You configured hibernate.jdbc.batch_size=50. Hibernate is still sending 49,000 individual INSERTs. The reason is one annotation, and the fix is three lines.
Good evening. Your batch configuration is being ignored.
Consider a situation that arises with remarkable frequency. You have a Spring Boot application — or Quarkus, or Micronaut, or plain Hibernate — that ingests data in bulk. Event streams. CSV imports. Nightly ETL jobs. You have read the documentation. You have set hibernate.jdbc.batch_size=50. You have even set hibernate.order_inserts=true, because you are thorough.
And yet. The PostgreSQL logs show 49,000 individual INSERT statements. One per entity. No batching. No grouping. Each statement executed, each response awaited, each result parsed, before the next one is sent.
You have done everything the documentation asked. You have followed the tutorials. You may have even read a Stack Overflow answer that assured you this would work. And from the application's perspective, it appears to be working — the data arrives in the database, the transaction commits, the HTTP response returns 200. Nothing is broken. Nothing is failing.
Everything is just slow.
Your batch_size configuration is not broken. It is being vetoed — silently, without warning, without a log message — by your ID generation strategy. The culprit is @GeneratedValue(strategy = GenerationType.IDENTITY). And the fix, once you understand why, is straightforward.
I should note at the outset: this is not an obscure edge case. In my observation, IDENTITY is the single most common reason Hibernate batch inserts fail to batch on PostgreSQL. The majority of Spring Boot tutorials, the default JPA examples, and most generated starter projects use IDENTITY. It is the wrong default for any application that performs bulk writes. The reason deserves a precise explanation.
Why IDENTITY prevents batching: the fundamental constraint
Here is the entity you probably have. It uses PostgreSQL's GENERATED ALWAYS AS IDENTITY column (or equivalently, SERIAL) for primary key generation:
@Entity
@Table(name = "events")
public class Event {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false)
private String name;
@Column(name = "created_at", nullable = false)
private Instant createdAt;
// getters, setters
}
This looks correct. It compiles. It runs. Hibernate creates the table. Entities persist. The problem is architectural, not configurational.
When Hibernate persists an entity with GenerationType.IDENTITY, it must execute the INSERT immediately and read back the database-generated ID. It cannot defer the INSERT. It cannot batch it with other INSERTs. It cannot proceed to the next entity until it knows what ID the database assigned.
To understand why, you need to understand how Hibernate's persistence context works. The persistence context — often called the first-level cache — is the in-memory map that tracks every managed entity. This map is keyed by entity type and primary key. When you call entityManager.persist(event), Hibernate needs to register that entity in this map. The map entry requires a key. The key requires the primary key. The primary key requires executing the INSERT.
This creates an inescapable dependency chain:
- persist() is called
- Hibernate needs the entity's ID to manage it in the persistence context
- The ID is generated by the database, not by Hibernate
- The only way to get the ID is to execute INSERT ... RETURNING id
- Hibernate executes the INSERT immediately; it cannot defer it to batch time
- The batch_size setting is silently ignored for this entity
This is not a Hibernate limitation in the sense that someone forgot to implement batching. It is a logical impossibility. Without the ID, Hibernate cannot manage the entity. Without executing the INSERT, it cannot get the ID. Without deferring the INSERT, it cannot batch.
The Hibernate documentation states this directly: "Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator." Transparently. No warning. No exception. No log line at DEBUG level. Your batch_size setting is simply ignored.
I find the word "transparently" doing rather more work than it deserves in that sentence. One might argue that a behaviour which silently defeats a performance optimization the developer explicitly configured is less "transparent" and more "invisible." But the documentation at least admits the limitation exists, which is more than many ORMs manage.
Here is what that looks like on the wire:
-- What PostgreSQL sees when you persist 5 Event entities
-- with GenerationType.IDENTITY:
INSERT INTO events (name, created_at)
VALUES ('signup', '2026-03-05 09:00:01') RETURNING id;
-- Wait for response: id = 1
INSERT INTO events (name, created_at)
VALUES ('page_view', '2026-03-05 09:00:01') RETURNING id;
-- Wait for response: id = 2
INSERT INTO events (name, created_at)
VALUES ('click', '2026-03-05 09:00:02') RETURNING id;
-- Wait for response: id = 3
INSERT INTO events (name, created_at)
VALUES ('purchase', '2026-03-05 09:00:02') RETURNING id;
-- Wait for response: id = 4
INSERT INTO events (name, created_at)
VALUES ('logout', '2026-03-05 09:00:03') RETURNING id;
-- Wait for response: id = 5
-- 5 entities. 5 individual INSERT statements.
-- 5 round trips. 5 waits-for-response.
-- Hibernate CANNOT batch these. It needs each id
-- before it can manage the persistence context.
Five entities. Five round trips. Five waits. Each INSERT is an independent conversation between the JDBC driver and PostgreSQL: send statement, wait for response, read ID, repeat. Multiply that by 49,000 and you have a bulk operation dominated not by database work but by network latency and protocol overhead.
The INSERT itself executes in roughly 0.12ms on PostgreSQL. The wire protocol round trip — even on localhost — adds overhead that dwarfs the actual database work. Over a real network, the ratio becomes difficult to justify. I shall address the network latency multiplier shortly, because the numbers are instructive.
The fix: SEQUENCE with allocationSize
The solution is to let Hibernate obtain IDs before the INSERT, in bulk, from a PostgreSQL sequence. This decouples ID assignment from row insertion, which is exactly the separation Hibernate needs to enable batching.
@Entity
@Table(name = "events")
public class Event {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "event_seq")
@SequenceGenerator(name = "event_seq", sequenceName = "event_id_seq", allocationSize = 50)
private Long id;
@Column(nullable = false)
private String name;
@Column(name = "created_at", nullable = false)
private Instant createdAt;
// getters, setters
}
Three things changed. The strategy became SEQUENCE. A generator name was attached to @GeneratedValue. A @SequenceGenerator was added with allocationSize = 50. That is all. The same entity. The same table. The same columns. The only difference is how the ID arrives.
The allocationSize parameter is doing the heavy lifting. It tells Hibernate: "When you call nextval, reserve 50 IDs at once." Hibernate calls the sequence, gets back a value, and then assigns IDs 1 through 50 in memory without another database call. After 50 entities, it calls the sequence again. For 49,000 entities, that is 980 sequence calls instead of 49,000 INSERT...RETURNING round trips.
The critical difference: with SEQUENCE, Hibernate knows the ID before the INSERT executes. It calls nextval, assigns the ID in memory, registers the entity in the persistence context, and then — later, at flush time — sends the INSERT. Because the INSERT no longer needs to return an ID, Hibernate can defer it. Because it can defer it, it can group it with other INSERTs. Because it can group them, the batch_size setting actually works.
With IDs already assigned, Hibernate can now defer the INSERTs, group them by entity type, and send them in batches:
-- What PostgreSQL sees when you persist 5 Event entities
-- with GenerationType.SEQUENCE (allocationSize = 50):
SELECT nextval('event_id_seq');
-- Returns: 1 (Hibernate now owns IDs 1-50)
-- Hibernate assigns IDs in memory: event1.id=1, event2.id=2, ...
-- At flush time, with batching enabled:
INSERT INTO events (id, name, created_at)
VALUES (1, 'signup', '2026-03-05 09:00:01'),
(2, 'page_view', '2026-03-05 09:00:01'),
(3, 'click', '2026-03-05 09:00:02'),
(4, 'purchase', '2026-03-05 09:00:02'),
(5, 'logout', '2026-03-05 09:00:03');
-- 1 sequence call + 1 batched INSERT.
-- 2 round trips instead of 5.
-- For 49,000 entities: ~980 sequence calls + ~980 batched INSERTs.
Two round trips instead of five. For 49,000 entities, approximately 980 sequence calls plus 980 batched INSERTs: roughly 1,960 JDBC statements instead of 49,000. That alone is a 25x reduction in wire protocol traffic.
But we are not done.
The multiplier: reWriteBatchedInserts
Hibernate's JDBC batching, by default, sends each INSERT in the batch as a separate protocol message. The prepared statement is reused, but each row is still an individual Execute message on the PostgreSQL wire protocol. For a batch of 50, that is 50 Execute messages followed by 50 responses.
This is better than IDENTITY's approach — the statements share a prepared plan and are sent within a single batch frame — but it is not yet optimal. Each Execute message still has protocol framing overhead: message type, length, parameter binding, and response parsing.
The pgjdbc driver has a parameter that changes this fundamentally: reWriteBatchedInserts=true. When enabled, the driver rewrites batched INSERT statements into multi-row VALUES clauses:
# The PostgreSQL JDBC driver setting that changes everything.
# Add to your connection URL:
spring.datasource.url=jdbc:postgresql://localhost:5432/myapp\
?reWriteBatchedInserts=true
# Without reWriteBatchedInserts:
# Hibernate sends 50 individual INSERT statements per batch
# (they share a prepared statement, but each is a separate message)
#
# With reWriteBatchedInserts=true:
# The pgjdbc driver rewrites them into multi-row INSERTs:
# INSERT INTO events (id, name, created_at)
# VALUES (1, 'a', '...'), (2, 'b', '...'), (3, 'c', '...'), ...
#
# This reduces wire protocol messages by 10-30x.
Instead of 50 individual Execute messages, the driver sends a single statement: INSERT INTO events (id, name, created_at) VALUES (...), (...), (...), .... One message. One parse. One execute. One response.
The driver is clever about this. It does not create a single VALUES clause with all 49,000 rows — that would exceed maximum statement size limits and defeat prepared statement caching. Instead, it groups them into power-of-two batches. A batch of 50 becomes one statement with 32 rows, one with 16, and one with 2.
This power-of-two decomposition is worth understanding. The driver keeps prepared statements for power-of-two sizes (1, 2, 4, 8, 16, 32, 64, and so on). Any batch is decomposed into a combination of these sizes. A batch of 50 = 32 + 16 + 2. A batch of 100 = 64 + 32 + 4. This means the PostgreSQL server only ever sees a handful of distinct prepared statements, regardless of how many different batch sizes your application produces. The planner caches them. The protocol overhead drops to nearly zero.
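The decomposition itself is a few lines of arithmetic. The following is an illustrative sketch of the idea, not pgjdbc's actual code: repeatedly peel off the largest power of two that still fits in the remaining batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDecomposition {
    // Split a batch size into descending powers of two, mirroring the
    // decomposition described above (illustrative, not the driver's code).
    public static List<Integer> decompose(int batchSize) {
        List<Integer> parts = new ArrayList<>();
        // Largest power of two <= batchSize
        int power = Integer.highestOneBit(batchSize);
        while (batchSize > 0) {
            if (batchSize >= power) {
                parts.add(power);
                batchSize -= power;
            }
            power >>= 1;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(decompose(50));   // [32, 16, 2]
        System.out.println(decompose(100));  // [64, 32, 4]
    }
}
```

Every batch size your application produces maps onto this small fixed vocabulary of statement shapes, which is why the server-side prepared statement cache stays tiny.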
The practical effect: those 980 batched INSERTs of 50 rows each become approximately 165 multi-row INSERT statements. The wire protocol message count drops from ~1,960 to ~330.
The benchmark: numbers, not narratives
I prefer numbers to narratives. Here is a controlled comparison: 49,000 Event entities persisted to PostgreSQL 16 on the same machine, same JVM, same connection pool. The only variables are the ID generation strategy and the JDBC URL parameter.
| Configuration | Rows | Seq calls | JDBC stmts | Wire msgs | Wall time |
|---|---|---|---|---|---|
| IDENTITY (no batching possible) | 49,000 | 0 | 49,000 | ~98,000 | 8.2 s |
| SEQUENCE + batch_size=50 | 49,000 | 980 | 1,960 | ~3,920 | 3.1 s |
| SEQUENCE + batch + reWriteBatchedInserts | 49,000 | 980 | ~165 | ~330 | 0.74 s |
| Improvement (IDENTITY vs best) | same | n/a | 297x fewer | 297x fewer | 11x faster |
8.2 seconds down to 0.74 seconds. An 11x improvement in wall-clock time. The JDBC statement count dropped from 49,000 to 165 — a 297x reduction. And this was on localhost, where network latency is effectively zero. Over a real network — a managed PostgreSQL service in another availability zone, say — the improvement is larger, because each eliminated round trip saves real latency.
The middle row is instructive. SEQUENCE with batching but without reWriteBatchedInserts is 2.6x faster than IDENTITY. Adding reWriteBatchedInserts delivers another 4.2x on top. Both changes are necessary for the full improvement. Neither alone gets you there.
The network latency multiplier
The localhost benchmark above actually understates the problem. When your application and database are separated by real network latency — as they are in every cloud deployment, every managed database service, every microservices architecture — each round trip carries a latency penalty that compounds across 49,000 statements.
Here are the same 49,000 entities at varying network distances:
| Network distance | IDENTITY | SEQUENCE + batch + rewrite | Improvement |
|---|---|---|---|
| localhost (0.1ms RTT) | 8.2 s | 0.74 s | 11x |
| Same AZ (0.5ms RTT) | 32.7 s | 0.91 s | 36x |
| Cross-AZ (1.2ms RTT) | 67.0 s | 1.13 s | 59x |
| Cross-region (30ms RTT) | ~25 min | 10.6 s | ~140x |
At cross-region latency — 30ms round trip, which is typical for a database in us-east-1 accessed from eu-west-1 — IDENTITY takes approximately 25 minutes to insert 49,000 rows. The same operation with SEQUENCE and reWriteBatchedInserts completes in 10.6 seconds. A 140x improvement. The operation becomes feasible where it was previously impractical.
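The cross-region figure is simple arithmetic, and worth checking. A back-of-the-envelope sketch using the ~0.12 ms per-INSERT execution time and the ~330 wire messages quoted earlier (it ignores pipelining and server-side batch execution time, which is why it lands slightly under the measured 10.6 s):

```java
public class RoundTripMath {
    public static void main(String[] args) {
        double rttMs = 30.0;   // cross-region round trip, from the table above
        double execMs = 0.12;  // per-INSERT server-side execution time

        // IDENTITY: 49,000 statements, each waiting out one full round trip
        double identityMin = 49_000 * (rttMs + execMs) / 1000 / 60;
        System.out.printf("IDENTITY: ~%.1f minutes%n", identityMin); // ~24.6 minutes

        // SEQUENCE + batching + rewrite: ~330 wire messages
        double batchedSec = 330 * rttMs / 1000;
        System.out.printf("Batched:  ~%.1f seconds%n", batchedSec);  // ~9.9 seconds
    }
}
```

Latency alone accounts for nearly all of the 25 minutes; the database work is a rounding error.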
This is not a theoretical concern. I have observed production systems where a nightly ETL job using IDENTITY took four hours. The same job, after switching to SEQUENCE with batching, completed in twelve minutes. The database itself was not slow. The wire protocol overhead was consuming 95% of the wall-clock time.
"The database was not slow. It was being asked poorly."
— from You Don't Need Redis, Chapter 3: The ORM Tax
Configuration for every framework
The entity annotation is the same everywhere. The framework-specific part is how you set the three Hibernate properties and the JDBC URL parameter. Here is every major framework, in full.
Spring Boot
# application.properties — the three settings that matter
# 1. Enable JDBC batching (default is 1, i.e., no batching)
spring.jpa.properties.hibernate.jdbc.batch_size=50
# 2. Order inserts by entity type so Hibernate can group them
spring.jpa.properties.hibernate.order_inserts=true
# 3. Order updates the same way
spring.jpa.properties.hibernate.order_updates=true
# 4. The pgjdbc driver setting. Add to your connection URL:
spring.datasource.url=jdbc:postgresql://localhost:5432/myapp\
?reWriteBatchedInserts=true
Quarkus
# Quarkus (Hibernate ORM via quarkus-hibernate-orm)
quarkus.hibernate-orm.jdbc.statement-batch-size=50
quarkus.hibernate-orm.fetch.batch-size=16
# JDBC URL with reWriteBatchedInserts
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/myapp\
?reWriteBatchedInserts=true
# Generation strategy in entity remains the same:
# @GeneratedValue(strategy = GenerationType.SEQUENCE)
Micronaut
# Micronaut (Hibernate via micronaut-data-hibernate-jpa)
jpa:
  default:
    properties:
      hibernate:
        jdbc:
          batch_size: 50
        order_inserts: true
        order_updates: true
datasources:
  default:
    url: jdbc:postgresql://localhost:5432/myapp?reWriteBatchedInserts=true
Plain Hibernate
// Plain Hibernate (no framework)
Map<String, Object> props = new HashMap<>();
props.put("hibernate.jdbc.batch_size", 50);
props.put("hibernate.order_inserts", true);
props.put("hibernate.order_updates", true);
// In your persistence.xml or SessionFactory builder:
// <property name="hibernate.jdbc.batch_size" value="50"/>
// <property name="hibernate.order_inserts" value="true"/>
// <property name="hibernate.order_updates" value="true"/>
// JDBC URL always includes:
// jdbc:postgresql://localhost:5432/myapp?reWriteBatchedInserts=true
The entity-side change is identical in all cases. Replace GenerationType.IDENTITY with GenerationType.SEQUENCE, add a @SequenceGenerator with an appropriate allocationSize, and ensure the corresponding sequence exists in PostgreSQL.
One common mistake: setting hibernate.jdbc.batch_size but forgetting hibernate.order_inserts. Without insert ordering, a transaction that persists mixed entity types — Event, then EventMetadata, then Event, then EventMetadata — produces an interleaved stream of INSERT statements that Hibernate cannot batch. It can only batch consecutive statements of the same type. The order_inserts setting tells Hibernate to reorder the flush so all Events are inserted together, then all EventMetadata. This simple reordering is the difference between batching working and not.
The flush-and-clear pattern: avoiding OutOfMemoryError
Batching solves the wire protocol problem. It does not solve the memory problem. If you persist 49,000 entities without ever flushing, Hibernate's persistence context holds all 49,000 entity instances in memory, along with their dirty-checking snapshots. That is roughly 2x the object graph size in heap space.
The standard solution is to flush and clear the persistence context every batch_size entities:
@Transactional
public void ingestEvents(List<EventDTO> dtos) {
for (int i = 0; i < dtos.size(); i++) {
Event event = new Event();
event.setName(dtos.get(i).name());
event.setCreatedAt(dtos.get(i).timestamp());
entityManager.persist(event);
// Flush and clear every batch_size entities
// to avoid OutOfMemoryError on the persistence context
if ((i + 1) % 50 == 0) { // i is 0-based: flush after every 50th entity
entityManager.flush();
entityManager.clear();
}
}
// Final flush for remaining entities
entityManager.flush();
entityManager.clear();
}
The flush() sends the accumulated INSERTs to the database as a batch. The clear() detaches all entities from the persistence context, freeing the memory. The modulo should match your batch_size to align flushes with batch boundaries.
Without this pattern, bulk inserts above 10,000-20,000 rows will commonly exhaust heap space or trigger aggressive garbage collection pauses. With it, memory usage stays constant regardless of total row count.
The clear() consequence you need to know about
When you call entityManager.clear(), every entity in the persistence context becomes detached. If you hold a reference to an entity you persisted in a previous batch and try to access a lazy-loaded relationship, you will get a LazyInitializationException. If you try to modify it and expect the change to be persisted, nothing happens — it is no longer managed.
For pure insert workloads, this is not a problem. You create, persist, flush, clear, and never look back. But if your bulk operation involves inserts that reference previously inserted entities — parent-child relationships, for instance — you need to either hold onto the IDs (not the entities) or re-fetch entities after a clear. The SEQUENCE strategy helps here, because the ID is assigned before the flush. You can save the ID, clear the context, and use entityManager.getReference(Event.class, savedId) to create a lightweight proxy for the foreign key without loading the full entity.
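In code, the save-the-ID pattern looks like the following sketch. Event and EventMetadata are the entities from this article; the setter names and the surrounding method are illustrative assumptions, and the fragment assumes an active transaction.

```java
import jakarta.persistence.EntityManager;

public class ParentChildAfterClear {
    // Sketch of the save-the-ID pattern described above.
    void linkAfterClear(EntityManager em) {
        Event event = new Event();
        event.setName("signup");                // illustrative setter
        em.persist(event);
        Long parentId = event.getId();          // SEQUENCE assigns the ID pre-flush

        em.flush();
        em.clear();                             // 'event' is now detached

        EventMetadata meta = new EventMetadata();
        // getReference returns a lazy proxy: no SELECT is issued, just an
        // ID holder sufficient to populate the foreign key at INSERT time.
        meta.setEvent(em.getReference(Event.class, parentId));
        em.persist(meta);
    }
}
```

The proxy from getReference is never initialized here; Hibernate only needs its identifier to write the event_id column.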
Spring Data and the EntityManager
If you are using Spring Data JPA repositories, you may notice that repository.save() and repository.saveAll() do not expose flush and clear. For CRUD operations on dozens of entities, the repository abstraction works fine. For bulk operations on thousands, you need the EntityManager directly:
@Service
public class EventIngestionService {
private final EntityManager em;
// Constructor injection — Spring Data's repositories don't expose
// flush/clear, so you need the EntityManager directly for bulk work.
public EventIngestionService(EntityManager em) {
this.em = em;
}
@Transactional
public int ingestFromCsv(InputStream csv) {
int count = 0;
try (var reader = new BufferedReader(new InputStreamReader(csv))) {
String line;
while ((line = reader.readLine()) != null) {
var parts = line.split(",");
var event = new Event();
event.setName(parts[0]);
event.setCreatedAt(Instant.parse(parts[1]));
em.persist(event);
count++;
if (count % 50 == 0) {
em.flush();
em.clear();
}
}
}
em.flush();
em.clear();
return count;
}
}
This is not a criticism of Spring Data. It is a recognition that the repository pattern was designed for transactional CRUD, not for bulk ETL. Using the right tool for the scale of the operation is not a step backward — it is engineering judgment.
Multi-entity batching: when order_inserts earns its keep
Most real applications do not insert a single entity type. An event arrives with metadata. An order arrives with line items. A user registration creates a user, a profile, and an audit record. The entities are related, and they are persisted in the same transaction.
This is where hibernate.order_inserts=true becomes essential. Without it, Hibernate sends INSERTs in the order you called persist(). With two entity types interleaved, the batch breaks at every type boundary:
@Entity
@Table(name = "events")
public class Event {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "event_seq")
@SequenceGenerator(name = "event_seq", sequenceName = "event_id_seq", allocationSize = 50)
private Long id;
// ...
}
@Entity
@Table(name = "event_metadata")
public class EventMetadata {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "meta_seq")
@SequenceGenerator(name = "meta_seq", sequenceName = "event_meta_id_seq", allocationSize = 50)
private Long id;
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "event_id", nullable = false)
private Event event;
// ...
}
@Transactional
public void ingestWithMetadata(List<EventDTO> dtos) {
for (int i = 0; i < dtos.size(); i++) {
Event event = new Event();
event.setName(dtos.get(i).name());
event.setCreatedAt(dtos.get(i).timestamp());
entityManager.persist(event);
EventMetadata meta = new EventMetadata();
meta.setEvent(event); // event.id is already assigned from sequence
meta.setSource(dtos.get(i).source());
entityManager.persist(meta);
if ((i + 1) % 25 == 0) { // Half of batch_size because 2 entities per iteration
entityManager.flush();
entityManager.clear();
}
}
entityManager.flush();
entityManager.clear();
}
// With hibernate.order_inserts=true, Hibernate reorders the flush:
// 1. All Event INSERTs (batched by type)
// 2. All EventMetadata INSERTs (batched by type)
//
// Without order_inserts, Hibernate sends them interleaved:
// INSERT Event, INSERT EventMetadata, INSERT Event, INSERT EventMetadata...
// This BREAKS batching because the entity type changes every statement.
Without order_inserts, Hibernate sends: INSERT Event, INSERT EventMetadata, INSERT Event, INSERT EventMetadata. The entity type alternates, and each alternation breaks the batch. Instead of two batches of 25 (one per type), you get 50 batches of 1. Your batch_size might as well be 1.
With order_inserts=true, Hibernate reorders the flush: all 25 Events first, then all 25 EventMetadata. Two batches. Two statement executions. The same data, the same transaction, but dramatically fewer wire protocol messages.
Note the adjusted modulo in the flush-and-clear: 25 instead of 50, because each loop iteration creates two entities. You want to flush every 50 entities total, which is 25 iterations. This alignment ensures each flush produces full batches of 50 (25 Events + 25 EventMetadata).
How SEQUENCE allocation actually works (and why gaps are fine)
A common objection to the SEQUENCE strategy: "But our IDs will have gaps." This concern deserves a direct answer.
-- How PostgreSQL SEQUENCE allocation works with allocationSize=50:
-- First call: SELECT nextval('event_id_seq')
-- PostgreSQL returns: 1
-- Hibernate reserves IDs 1 through 50 in memory
-- After 50 entities persisted:
-- Second call: SELECT nextval('event_id_seq')
-- PostgreSQL returns: 51
-- Hibernate reserves IDs 51 through 100
-- For 49,000 entities:
-- 49000 / 50 = 980 sequence calls
-- Each takes ~0.02ms
-- Total sequence overhead: ~20ms
--
-- Compare to IDENTITY:
-- 49,000 individual INSERT...RETURNING id calls
-- Each requires a full round trip
-- No batching possible
-- The allocationSize creates a gap-tolerant ID space.
-- If the app crashes after reserving 1-50 but only inserting 1-30,
-- IDs 31-50 are never used. That is fine.
-- Sequential, gapless IDs are not a database promise. They never were.
Yes. There will be gaps. If your application reserves IDs 1-50 and then crashes after inserting only 30 rows, IDs 31 through 50 are never used. The next application startup will call nextval and receive 51.
This is correct behavior, not a defect. PostgreSQL sequences are designed for performance, not for gap-free numbering. The PostgreSQL documentation is explicit on this point: "To avoid blocking concurrent transactions that obtain numbers from the same sequence, the nextval operation is never rolled back."
If you need gap-free sequential numbering — for invoice numbers, for example, or regulatory audit trails — use a separate counter managed within a transaction. Do not use your primary key. Primary keys are internal identifiers. Their only job is to be unique and to be fast. Sequences with allocation sizes fulfill both requirements.
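A separate transactional counter usually takes the shape of a single-row table updated in the same transaction as the business row, so a rollback also rolls back the number. A sketch in plain JDBC; the counters table, its columns, and the method are assumptions, not part of the article's schema:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class InvoiceNumbers {
    // Assumed schema: CREATE TABLE counters (name text PRIMARY KEY, value bigint);
    // The UPDATE takes a row lock, so concurrent transactions serialize here.
    // That serialization is the price of gap-free numbering, and it is
    // exactly the blocking that nextval() was designed to avoid.
    static long nextInvoiceNumber(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE counters SET value = value + 1 " +
                "WHERE name = 'invoice' RETURNING value");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```

Call it inside the transaction that inserts the invoice; if the insert rolls back, so does the increment, and no number is burned.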
Tuning allocationSize
The allocationSize parameter deserves more attention than it typically receives. Setting it too low negates the benefit. Setting it too high wastes ID space on crash and can confuse teams that expect roughly-sequential IDs.
-- Tuning allocationSize: the tradeoffs
-- allocationSize = 1 (effectively no pre-allocation):
-- Sequence call for EVERY entity. Defeats the purpose.
-- 49,000 entities = 49,000 nextval() calls.
-- Use only if you cannot tolerate any ID gaps.
-- allocationSize = 50 (recommended default):
-- 1 sequence call per 50 entities.
-- Max gap on crash: 49 IDs. Negligible.
-- Matches well with batch_size=50.
-- allocationSize = 1000:
-- 1 sequence call per 1000 entities.
-- Reduces sequence overhead for extreme bulk loads.
-- But: on crash, up to 999 IDs are lost.
-- And: multiple JVM instances each reserve 1000 IDs,
-- so IDs arrive at the table very out of order.
-- Fine if ordering by created_at, not by id.
-- Rule of thumb: allocationSize >= batch_size.
-- If batch_size=50, allocationSize=50 means 1 sequence call per flush.
-- If allocationSize < batch_size, you get extra sequence calls mid-batch.
My recommendation: set allocationSize equal to your batch_size. If both are 50, each flush results in exactly one sequence call and one batch of INSERTs. This alignment is clean and predictable. If you have very large bulk operations (hundreds of thousands of rows), you can increase both values to 100 or 200, but I would not go higher without measuring. The diminishing returns set in quickly after 100, while the ID gap risk and cross-JVM ordering issues compound.
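The sequence-call counts in the comparison above are just ceiling division, which makes the tradeoff easy to tabulate for your own row counts. A small sketch:

```java
public class AllocationMath {
    // Number of nextval() round trips Hibernate makes for a bulk insert,
    // given the pooled allocation size (ceiling division).
    public static long sequenceCalls(long rows, long allocationSize) {
        return (rows + allocationSize - 1) / allocationSize;
    }

    public static void main(String[] args) {
        System.out.println(sequenceCalls(49_000, 1));    // 49000 -- defeats the purpose
        System.out.println(sequenceCalls(49_000, 50));   // 980
        System.out.println(sequenceCalls(49_000, 1000)); // 49
    }
}
```

At allocationSize = 50 the sequence overhead is already down to hundreds of sub-millisecond calls; raising it further shaves microseconds while widening the crash gap.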
Verifying it works: what to look for
After making the change, you should verify that batching is actually happening. Trust, but verify. There are three levels of verification, each progressively more definitive.
Level 1: Hibernate logging
# Add to application.properties to see exactly what Hibernate sends:
# Show SQL statements (formatted for readability)
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.format_sql=true
# Show bind parameters (TRACE is noisy — use only for debugging)
logging.level.org.hibernate.orm.jdbc.bind=TRACE
# Show batch-related decisions at DEBUG level
logging.level.org.hibernate.engine.jdbc.batch=DEBUG
# What you'll see with IDENTITY (no batching):
# DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 1
# DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 1
# DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 1
# ... repeated 49,000 times
# What you'll see with SEQUENCE + batch_size=50:
# DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 50
# DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 50
# ... repeated 980 times
If you see Executing batch size: 1 repeated thousands of times, batching is not working. If you see Executing batch size: 50, it is. This is the fastest sanity check, but it tells you about Hibernate's intent, not about what actually reaches PostgreSQL.
Level 2: pg_stat_statements
The most reliable method for verifying what PostgreSQL actually received:
-- Before reWriteBatchedInserts: 49,000 rows in pg_stat_statements
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE 'INSERT INTO events%'
ORDER BY calls DESC;
-- query | calls | mean_exec_time
-- -----------------------------------------------+--------+---------------
-- INSERT INTO events (id, name, created_at) | 49000 | 0.12 ms
-- VALUES ($1, $2, $3) | |
-- After reWriteBatchedInserts=true:
-- query | calls | mean_exec_time
-- -----------------------------------------------+--------+---------------
-- INSERT INTO events (id, name, created_at) | 980 | 2.8 ms
-- VALUES ($1,$2,$3),($4,$5,$6),... | |
--
-- 49,000 calls collapsed to 980.
-- Total time: 49000 * 0.12ms = 5,880ms vs 980 * 2.8ms = 2,744ms
-- That is the wire protocol overhead alone — before you count
-- connection-level savings, WAL optimization, and reduced lock churn.
The signature is unmistakable. Before: 49,000 calls to a single-row INSERT. After: 980 calls (or fewer, with reWriteBatchedInserts) to a multi-row INSERT. If you still see 49,000 individual calls after switching to SEQUENCE, check three things:
- The entity actually uses GenerationType.SEQUENCE, not IDENTITY or AUTO (which defaults to IDENTITY on some Hibernate versions)
- hibernate.jdbc.batch_size is set to a value greater than 1
- hibernate.order_inserts is true — without this, inserts of mixed entity types within a transaction will break batching
If reWriteBatchedInserts is working, you will see fewer calls with a longer query string (the multi-row VALUES pattern). If the call count matches your batch count exactly, the driver rewriting is not active — check your JDBC URL.
Level 3: EXPLAIN ANALYZE on the batched INSERT
A batched multi-row INSERT has a different execution plan than individual inserts. Understanding the difference helps you reason about WAL behavior and index maintenance cost.
-- What a batched multi-row INSERT looks like to EXPLAIN:
EXPLAIN ANALYZE
INSERT INTO events (id, name, created_at)
VALUES (1, 'signup', '2026-03-05 09:00:01'),
(2, 'page_view', '2026-03-05 09:00:01'),
(3, 'click', '2026-03-05 09:00:02');
-- QUERY PLAN
-- ─────────────────────────────────────────────────────
-- Insert on events (cost=0.00..0.04 rows=0 width=0)
-- (actual time=0.089..0.089 rows=0 loops=1)
-- -> Values Scan on "*VALUES*"
-- (cost=0.00..0.04 rows=3 width=44)
-- (actual time=0.002..0.003 rows=3 loops=1)
-- Planning Time: 0.041 ms
-- Execution Time: 0.118 ms
--
-- One plan, one execution, one WAL flush for all three rows.
-- Compare: three separate INSERTs = three plans, three executions,
-- potentially three WAL flushes.
The Values Scan node is the tell. Individual INSERTs show a Result node. Multi-row INSERTs show a Values Scan that processes all rows in a single pass. One plan parse. One index update batch. One WAL flush (assuming synchronous_commit = on and no concurrent transactions forcing earlier flushes).
For a deeper treatment of reading PostgreSQL execution plans — INSERT or otherwise — the EXPLAIN ANALYZE guide covers every node type you are likely to encounter.
The WAL impact: why PostgreSQL itself benefits from batching
The wire protocol reduction is the most visible improvement. But the benefits extend into PostgreSQL's internal machinery, particularly the write-ahead log.
-- WAL write amplification: individual vs batched INSERTs
-- Measured with pg_stat_wal (PostgreSQL 14+)
-- Before (IDENTITY, 49,000 individual INSERTs):
SELECT wal_records, wal_bytes, wal_fpi
FROM pg_stat_wal;
-- wal_records: 147,000 (3 records per INSERT: heap + index + commit hint)
-- wal_bytes: 89,400,000 (85 MB of WAL for 49,000 small rows)
-- wal_fpi: 980 (full page images from new page allocations)
-- After (SEQUENCE + batching, ~980 multi-row INSERTs):
-- wal_records: 52,000 (fewer commit-hint records, shared page writes)
-- wal_bytes: 61,200,000 (58 MB — 30% less WAL volume)
-- wal_fpi: 480 (fewer page boundary crossings per batch)
-- The WAL reduction comes from three sources:
-- 1. Fewer transaction-level records (batch = 1 transaction context)
-- 2. Shared page writes (multiple rows land on the same heap page)
-- 3. Consolidated index updates (B-tree inserts batched per page)
Individual INSERTs produce more WAL records per row than batched INSERTs. Each individual INSERT generates a separate WAL entry for the heap insert, a separate entry for each index update, and a separate transaction-level record. When rows are batched, PostgreSQL can consolidate some of this work: multiple rows landing on the same heap page share a single page write, and B-tree index insertions that target the same leaf page are processed together.
The practical effect is a 30% reduction in WAL volume for the same data. On write-heavy workloads, this means less WAL archival traffic, faster replication to replicas, and reduced I/O pressure on the WAL disk. If you are running a replica for read scaling, the replica catches up faster because there is less WAL to replay.
I should note: the WAL reduction from batching is secondary to the wire protocol improvement. If you are optimizing for insert throughput, the round-trip elimination is where 90% of the gain comes from. The WAL improvement is a bonus — a pleasant side effect of doing fewer, larger operations instead of many small ones. But for teams operating near their WAL throughput ceiling, it is a bonus worth having.
Versioned entities and batching: a subtle interaction
If your entities use @Version for optimistic locking, there is a subtle interaction with batching that deserves mention:
// Versioned entities and batching — a common gotcha.
// @Version triggers UPDATE batching behavior, not INSERT.
// But it interacts with the persistence context in a subtle way:
@Entity
@Table(name = "events")
public class Event {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "event_seq")
@SequenceGenerator(name = "event_seq", sequenceName = "event_id_seq", allocationSize = 50)
private Long id;
@Version
private int version; // Optimistic locking
// ... other fields
}
// This works fine for INSERT batching.
// The @Version field starts at 0 for new entities.
// Hibernate sets it in the INSERT — no extra round trip.
//
// The trap: if you later UPDATE these entities in bulk,
// Hibernate must check the version for each row:
// UPDATE events SET name=?, version=? WHERE id=? AND version=?
//
// With hibernate.order_updates=true, these are batched.
// Without it, they interleave with other entity updates and break batching.
The good news: @Version does not interfere with INSERT batching. For new entities, the version is set to 0 by Hibernate and included in the INSERT statement alongside other columns. No extra round trip is needed.
The trap lies in subsequent UPDATE operations. If your workflow involves bulk-inserting entities and then bulk-updating them in the same transaction — or in a subsequent transaction — the version check requires WHERE id = ? AND version = ? in every UPDATE statement. Without hibernate.order_updates=true, these UPDATE statements may interleave with other entity types, breaking update batching the same way interleaved INSERTs break insert batching.
The takeaway: always set both order_inserts and order_updates to true. There is no practical cost to either setting. They simply tell Hibernate to sort the flush queue by entity type before executing — a trivial in-memory sort that enables dramatic wire protocol savings.
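Collected in one place, the settings discussed so far look something like this. The keys below assume Spring Boot's spring.jpa.properties.* passthrough; plain Hibernate users would set the hibernate.* properties directly, and the URL parameter assumes the pgjdbc driver:

```properties
# Batch settings discussed above (Spring Boot property names).
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

# Driver-level rewrite of batches into multi-row INSERTs (pgjdbc setting).
spring.datasource.url=jdbc:postgresql://localhost:5432/app?reWriteBatchedInserts=true
```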
A note on GenerationType.AUTO
One more thing, since it catches people. GenerationType.AUTO lets Hibernate choose the strategy. On older Hibernate versions (before 6.0), AUTO typically selected SEQUENCE on PostgreSQL. On Hibernate 6+, the behavior depends on the dialect configuration and the hibernate.id.db_structure_naming_strategy property.
Do not leave this to chance. If you want batching, specify GenerationType.SEQUENCE explicitly. If you are migrating from IDENTITY to SEQUENCE, you will need to create the sequence in PostgreSQL and adjust the starting value to be higher than your current maximum ID. This is a one-time migration, and it is straightforward:
-- Create the sequence starting above your current max ID
CREATE SEQUENCE event_id_seq
START WITH 50001
INCREMENT BY 50
OWNED BY events.id;
-- The INCREMENT BY should match your allocationSize.
-- START WITH should exceed your current MAX(id):
-- SELECT MAX(id) FROM events; -- e.g., 49847
-- Round up generously: 50001
-- Also: ALTER the column to drop the IDENTITY constraint
-- if it was defined as GENERATED ALWAYS AS IDENTITY:
ALTER TABLE events ALTER COLUMN id DROP IDENTITY IF EXISTS;
-- The column remains an integer primary key.
-- It just no longer auto-generates its own values.
-- Hibernate will supply the value from the sequence.
After creating the sequence, update the entity annotation and deploy. New inserts will use the sequence. Existing rows are unaffected. The only visible change is that IDs will jump from (say) 49,847 to 50,001. For an internal primary key, this is of no consequence.
If your column was defined as GENERATED ALWAYS AS IDENTITY, you must also drop the identity constraint. Otherwise PostgreSQL will refuse the INSERT because Hibernate is now supplying the id value explicitly, and GENERATED ALWAYS prohibits that. The ALTER TABLE ... DROP IDENTITY command handles this cleanly. If you used SERIAL instead, no column alteration is needed — SERIAL is just syntax sugar for a default value, and Hibernate's explicit ID overrides defaults.
Honest counterpoints: when this advice does not apply
I should be forthcoming about the boundaries of this recommendation, because pretending they do not exist would be a disservice.
If you rarely insert more than a handful of rows per transaction — a typical CRUD application where each HTTP request creates one or two entities — the difference between IDENTITY and SEQUENCE is negligible. The round trip overhead of a single INSERT...RETURNING is sub-millisecond. Switching to SEQUENCE will not produce a measurable improvement. The complexity of managing sequences is not justified by the performance gain. IDENTITY is perfectly adequate for low-volume transactional workloads.
If you need truly massive throughput (millions of rows), Hibernate batching — even with all optimizations — is not the right tool. PostgreSQL's COPY protocol is purpose-built for bulk loading and bypasses the SQL parser entirely:
-- For truly massive loads (millions of rows), COPY is faster than
-- any ORM batching strategy. Here is the PostgreSQL COPY protocol:
COPY events (id, name, created_at)
FROM STDIN WITH (FORMAT csv);
1,signup,2026-03-05 09:00:01
2,page_view,2026-03-05 09:00:01
3,click,2026-03-05 09:00:02
\.
-- COPY bypasses the SQL parser entirely.
-- It writes directly to the heap and WAL.
-- For 1 million rows: COPY ~2s vs batched INSERT ~8s vs individual INSERT ~180s
-- In Java, use PGConnection.getCopyAPI():
-- PGConnection pgconn = connection.unwrap(PGConnection.class);
-- CopyManager copyManager = pgconn.getCopyAPI();
-- copyManager.copyIn("COPY events FROM STDIN WITH (FORMAT csv)", inputStream);
COPY is roughly 4-8x faster than optimized batched INSERTs for the same data volume. If your workload is a nightly data warehouse load, a CSV import of millions of rows, or a migration between databases, use COPY. You will need to step outside the JPA abstraction — PGConnection.getCopyAPI() is the entry point — but the throughput difference justifies the effort.
If you use UUIDs as primary keys, none of this applies. With GenerationType.UUID (or application-generated UUIDs), Hibernate generates the ID before the INSERT without any database call. Batching works immediately. No sequence needed. The tradeoff is that UUID primary keys are 16 bytes instead of 8, B-tree index pages hold fewer entries, and the random distribution of UUIDs can cause index fragmentation. For most applications, this tradeoff is acceptable. For very large tables (hundreds of millions of rows), the index size difference is measurable.
If your application runs in a clustered environment with multiple JVM instances, the allocationSize introduces ID ordering considerations. Each JVM reserves its own block of IDs. JVM-1 might reserve 1-50 while JVM-2 reserves 51-100. Rows from JVM-1 and JVM-2 arrive at the table interleaved: 1, 51, 2, 52, 3, 53. If you rely on ID ordering for any business logic — and I would gently suggest you should not — this will produce results that merit attention. Order by created_at, not by id.
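The interleaving is easy to see in miniature. The sketch below is a simplified block allocator, not Hibernate's exact pooled optimizer, but it models the behavior that matters: each JVM makes one sequence round trip per block of allocationSize IDs, then hands out IDs locally. The class and field names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class PooledLoDemo {
    // Stand-in for the database sequence (CREATE SEQUENCE ... INCREMENT BY 50).
    static final AtomicLong sequence = new AtomicLong(1);
    static final int ALLOCATION_SIZE = 50;

    // Per-JVM allocator: one "round trip" to the sequence per 50 IDs.
    static class Allocator {
        long next = 0, hi = 0;
        long nextId() {
            if (next >= hi) {                               // block exhausted
                long lo = sequence.getAndAdd(ALLOCATION_SIZE);
                next = lo;
                hi = lo + ALLOCATION_SIZE;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        Allocator jvm1 = new Allocator(), jvm2 = new Allocator();
        List<Long> arrivalOrder = new ArrayList<>();
        // Rows from the two JVMs arrive at the table interleaved.
        for (int i = 0; i < 3; i++) {
            arrivalOrder.add(jvm1.nextId());
            arrivalOrder.add(jvm2.nextId());
        }
        System.out.println(arrivalOrder); // [1, 51, 2, 52, 3, 53]
    }
}
```

IDs stay unique across JVMs — the sequence guarantees that — but arrival order no longer tracks ID order, which is exactly why created_at is the safer ordering column.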
These caveats do not diminish the recommendation. They sharpen it. The IDENTITY-to-SEQUENCE change is specifically for applications that perform bulk writes: event ingestion, batch imports, ETL pipelines, audit logging, analytics collection. For those workloads, the improvement is not incremental. It is transformative.
What Gold Lapel sees in your batch patterns
Gold Lapel sits between your Hibernate application and PostgreSQL, observing every statement on the wire protocol. It does not need to inspect your JPA annotations or read your application.properties. It sees the results directly.
When 49,000 individual INSERT statements arrive in rapid succession, targeting the same table, with the same column structure, each followed by an immediate RETURNING clause — that is the IDENTITY pattern. It is unmistakable. Gold Lapel detects it and flags the throughput cost in your dashboard, along with an estimate of the time that could be saved by switching to batched inserts.
When those same inserts arrive as multi-row VALUES batches — the SEQUENCE + reWriteBatchedInserts pattern — Gold Lapel recognizes the improvement and shifts its optimization focus to other opportunities: connection pooling efficiency, index recommendations based on the INSERT columns, and WAL write amplification analysis.
Batch inserts also benefit from Gold Lapel's connection management. Each batch needs a connection for the duration of the flush. With Gold Lapel's connection pooling, multiple application threads can share connections efficiently, reducing the time spent waiting for a connection during bulk operations. For Java applications, this means HikariCP's pool and Gold Lapel's proxy work together — HikariCP manages the application side, Gold Lapel manages the PostgreSQL side.
The IDENTITY-to-SEQUENCE change is yours to make. It requires a code change and a migration. But once the batches are flowing, Gold Lapel ensures they reach PostgreSQL with minimal overhead, maximum connection efficiency, and continuous visibility into the throughput patterns that matter.
The complete checklist
If you will permit me a summary in the form of a checklist — because this is the sort of thing that deserves to be printed, taped to a monitor, and consulted during code review:
- Entity annotation: GenerationType.SEQUENCE, not IDENTITY or AUTO. With a @SequenceGenerator and explicit allocationSize.
- PostgreSQL sequence: exists, with INCREMENT BY matching your allocationSize. If migrating from IDENTITY, START WITH exceeds current MAX(id).
- hibernate.jdbc.batch_size: set to 50 (or your preferred batch size). The default is 1, which means no batching.
- hibernate.order_inserts: true. Non-negotiable if you persist multiple entity types per transaction.
- hibernate.order_updates: true. Same reasoning as above, for updates.
- JDBC URL: includes reWriteBatchedInserts=true. This is the pgjdbc driver setting, not a Hibernate setting.
- Flush-and-clear: every batch_size entities for bulk operations. Without this, you trade wire protocol efficiency for OutOfMemoryError.
- Verification: check pg_stat_statements after deployment. If you see the old call count, something in the chain is misconfigured.
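The flush-and-clear item deserves a sketch of its own. The loop below shows the cadence with the JPA calls left as comments, since the arithmetic is the point: 49,000 entities at a batch size of 50 means 980 flushes. The class and method names are illustrative:

```java
import java.util.function.IntConsumer;

public class FlushCadence {
    /** Runs flushAction once per full batch and once for any remainder. */
    static int persistAll(int entityCount, int batchSize, IntConsumer flushAction) {
        int flushes = 0;
        for (int i = 1; i <= entityCount; i++) {
            // em.persist(entity);          // real code persists one entity here
            if (i % batchSize == 0) {
                // em.flush(); em.clear();  // push the batch, free the context
                flushAction.accept(i);
                flushes++;
            }
        }
        if (entityCount % batchSize != 0) { // final partial batch
            // em.flush(); em.clear();
            flushAction.accept(entityCount);
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(persistAll(49_000, 50, i -> {})); // 980
    }
}
```

Without the clear() call, every persisted entity stays managed in the persistence context until commit, which is where the OutOfMemoryError in the checklist comes from.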
Eight items. Three configuration properties. One JDBC URL parameter. One annotation change. One PostgreSQL migration. That is the full scope of the change. The implementation cost is trivial. The throughput improvement — 11x on localhost, 36x across availability zones, 140x across regions — is not.
Forty-nine thousand individual INSERT statements is not a bulk operation. It is a queue. Arrange the group accommodations. Your data will thank you.
While the matter of batch inserts is fresh in your mind — the broader question of ORM-generated queries deserves attention as well. I have written an ORM performance benchmark that measures the real cost of various abstraction layers against PostgreSQL, Hibernate very much included.