chunk() vs cursor() vs lazy() on PostgreSQL: What Actually Happens When You Process a Million Rows
Three methods that sound like they solve the same problem. They solve three different problems, and two of them create new ones.
Good evening. You have a million rows to process.
Perhaps it is an invoice generation job. Perhaps a nightly data export. Perhaps you are backfilling a column on a table that grew larger than anyone expected — a table that was "never going to be more than a hundred thousand rows" approximately eighteen months ago, and has since developed ambitions of its own. The table has a million rows, your Laravel application needs to touch each one, and Order::all() is not an option because Eloquent will hydrate every model into memory simultaneously and your process will be found dead by the OOM killer, with no suicide note and no survivors.
You know this, of course. You are not here because you tried Order::all() on a million rows. You are here because you tried the responsible thing — chunk(), cursor(), or lazy() — and something was still wrong. The job took too long. The memory was too high. PgBouncer complained. Or perhaps nothing broke yet, but you have the good sense to investigate before it does. In my experience, the engineers who read about database internals before production catches fire are the same ones whose production environments rarely do.
Laravel offers three methods that appear to solve the "process large datasets" problem: chunk(), cursor(), and lazy(). The documentation presents them as variations on a theme — different flavors of batch processing, pick whichever suits your taste. They are not variations on a theme. They generate fundamentally different SQL, interact with PostgreSQL in fundamentally different ways, and fail in fundamentally different modes.
Two of them have quiet, serious problems on PostgreSQL that the documentation does not mention. One of them is another method in disguise, generating the same SQL behind a friendlier return type — a fact that the naming does nothing to communicate. And the actual best options — chunkById() and lazyById() — are buried in the docs like afterthoughts, mentioned in a paragraph you have to scroll past three other methods to find.
Each one deserves proper examination.
chunk(): OFFSET pagination in a trench coat
When you call chunk(1000, ...), Laravel paginates through your table using LIMIT and OFFSET. Each batch is a separate query. Each query skips further into the result set. And each skip costs more than the last.
// chunk() — the one everyone reaches for first.
// Processes 1,000 rows at a time. Looks responsible.
Order::query()
->where('status', 'completed')
->orderBy('id')
->chunk(1000, function ($orders) {
foreach ($orders as $order) {
$order->generateInvoice();
}
});
// What Laravel actually sends to PostgreSQL:
-- Batch 1:
SELECT * FROM "orders" WHERE "status" = 'completed'
ORDER BY "id" ASC LIMIT 1000 OFFSET 0;
-- Batch 2:
SELECT * FROM "orders" WHERE "status" = 'completed'
ORDER BY "id" ASC LIMIT 1000 OFFSET 1000;
-- Batch 500:
SELECT * FROM "orders" WHERE "status" = 'completed'
ORDER BY "id" ASC LIMIT 1000 OFFSET 499000;
-- Batch 1000:
SELECT * FROM "orders" WHERE "status" = 'completed'
ORDER BY "id" ASC LIMIT 1000 OFFSET 999000;
This is OFFSET pagination — the mechanism I addressed at length in the keyset pagination guide. Batch 1 is fast because OFFSET 0 skips nothing. Batch 500 must scan 499,000 rows, sort them, discard them, then return 1,000. Batch 1,000 must scan all one million rows. Every batch pays for the work of every batch before it.
The EXPLAIN output tells the story with the precision that stories deserve:
-- EXPLAIN ANALYZE for batch 1 (OFFSET 0):
Limit (cost=0.43..52.18 rows=1000 width=214)
(actual time=0.03..0.84 rows=1000 loops=1)
-> Index Scan using orders_pkey on orders
(actual time=0.02..0.71 rows=1000 loops=1)
Filter: (status = 'completed')
Planning Time: 0.09 ms
Execution Time: 0.98 ms
-- EXPLAIN ANALYZE for batch 500 (OFFSET 499000):
Limit (cost=25842.10..25894.03 rows=1000 width=214)
(actual time=412.38..413.24 rows=1000 loops=1)
-> Index Scan using orders_pkey on orders
(actual time=0.03..348.71 rows=500000 loops=1)
Filter: (status = 'completed')
Planning Time: 0.09 ms
Execution Time: 413.47 ms
-- EXPLAIN ANALYZE for batch 1000 (OFFSET 999000):
Limit (cost=51683.77..51735.70 rows=1000 width=214)
(actual time=841.02..841.93 rows=1000 loops=1)
-> Index Scan using orders_pkey on orders
(actual time=0.03..712.89 rows=1000000 loops=1)
Filter: (status = 'completed')
Planning Time: 0.10 ms
Execution Time: 842.19 ms
Batch 1: 0.98ms. Batch 500: 413ms. Batch 1,000: 842ms. The degradation is perfectly linear — each additional batch costs roughly 0.84ms more than the last, because PostgreSQL must scan one more batch-worth of rows before it can start returning results. I want to be precise about what "scan and discard" means here. PostgreSQL is not merely skipping over these rows. It is reading them from disk or shared buffers, evaluating them against the ORDER BY clause, and then — having done all of that work — discarding them. The I/O, the CPU time, the buffer pool pressure: all real. All paid for. All wasted.
| Batch | Query time | Rows scanned |
|---|---|---|
| Batch 1 (OFFSET 0) | 0.98ms | 1,000 |
| Batch 100 (OFFSET 99000) | 82ms | 100,000 |
| Batch 250 (OFFSET 249000) | 206ms | 250,000 |
| Batch 500 (OFFSET 499000) | 413ms | 500,000 |
| Batch 750 (OFFSET 749000) | 621ms | 750,000 |
| Batch 1000 (OFFSET 999000) | 842ms | 1,000,000 |
The total query time for all 1,000 batches is approximately 421 seconds. Seven minutes of database time to process one million rows, and the dominant cost is throwing away rows PostgreSQL already scanned.
// Why does chunk() take 421 seconds total?
//
// Each batch's cost is proportional to its OFFSET depth.
// Batch N costs roughly: 0.84ms × N
//
// Total = 0.84ms × (1 + 2 + 3 + ... + 1000)
// = 0.84ms × (1000 × 1001 / 2)
// = 0.84ms × 500,500
// = 420.42 seconds
//
// This is the sum of an arithmetic series. The cost grows
// quadratically with the number of batches, not linearly.
// Double the table size (2M rows, 2,000 batches):
//
// Total = 0.84ms × (2000 × 2001 / 2)
// = 0.84ms × 2,001,000
// = 1,680.84 seconds (28 minutes!)
//
// The relationship between table size and total chunk() cost
// is O(n²), not O(n). A table that grows 2x takes 4x longer.
// A table that grows 10x takes 100x longer.
//
// chunkById() remains O(n) regardless.
// Double the table = double the batches = double the time.
// 2M rows: ~1.92 seconds. Not 28 minutes.
This is the detail that often goes unnoticed. The total cost of chunk() is not O(n) — it is O(n²). Double the table size and the total processing time quadruples. A table that grows from one million rows to ten million rows does not take 10x longer to chunk through. It takes 100x longer. The sum of an arithmetic series is a quadratic function, and chunk() is paying that sum on every run.
For a table that grows by a few thousand rows per day, this is a time bomb. The job that takes 7 minutes today takes 28 minutes when the table doubles, and nearly three hours when it hits 5 million rows. I have seen this exact progression in production — a nightly export job that "worked fine" for a year, then quietly exceeded its cron window, started overlapping with the next run, and consumed the database server's I/O budget during morning peak traffic. The fix was a one-line change from chunk() to chunkById(). The diagnosis took three days.
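That progression follows directly from the arithmetic series above. A back-of-envelope check (Python for convenience, using the figures measured earlier in this article: ~0.84ms of extra cost per batch for chunk(), ~0.96ms constant per batch for chunkById()):

```python
# Back-of-envelope: total chunk() time = slope x sum(1..batches),
# vs. chunkById() time = per_query x batches.
SLOPE_MS = 0.84    # extra cost each chunk() batch pays over the last
KEYSET_MS = 0.96   # constant per-batch cost for chunkById()
BATCH = 1000

def chunk_total_seconds(rows: int) -> float:
    n = rows // BATCH                      # number of batches
    return SLOPE_MS * n * (n + 1) / 2 / 1000

def chunk_by_id_total_seconds(rows: int) -> float:
    return KEYSET_MS * (rows // BATCH) / 1000

for rows in (1_000_000, 2_000_000, 5_000_000):
    print(rows, round(chunk_total_seconds(rows)),
          round(chunk_by_id_total_seconds(rows), 2))
# 1M rows: ~420s (7 min)    vs ~0.96s
# 2M rows: ~1681s (28 min)  vs ~1.92s
# 5M rows: ~10502s (~2.9 h) vs ~4.8s
```

The constants are this article's measurements, not universals; your hardware will shift the slope but not the shape of the curve.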
The mutation trap: chunk() can skip your rows
There is a second problem with chunk() that is orthogonal to performance, and arguably more dangerous: if your callback modifies the column used for filtering, chunk() will silently skip rows.
// chunk() has a mutation trap that chunkById() avoids.
//
// If your callback MODIFIES the column used for ordering,
// chunk() can skip rows or process them twice.
// Dangerous with chunk():
Order::where('status', 'pending')
->orderBy('id')
->chunk(1000, function ($orders) {
foreach ($orders as $order) {
$order->update(['status' => 'processed']);
}
});
// What happens:
// Batch 1: SELECT ... WHERE status = 'pending' LIMIT 1000 OFFSET 0
// → Returns orders 1-1000. You update them to 'processed'.
//
// Batch 2: SELECT ... WHERE status = 'pending' LIMIT 1000 OFFSET 1000
// → The first 1000 pending orders are now 'processed'.
// → The result set has shifted. OFFSET 1000 now skips
// orders 1001-2000 (which are still pending).
// → You get orders 2001-3000.
// → Orders 1001-2000 are NEVER PROCESSED.
//
// chunkById() avoids this entirely:
Order::where('status', 'pending')
->chunkById(1000, function ($orders) {
foreach ($orders as $order) {
$order->update(['status' => 'processed']);
}
});
// Batch 1: SELECT ... WHERE status = 'pending' AND id > 0 LIMIT 1000
// → Returns orders 1-1000. You update them.
// Batch 2: SELECT ... WHERE status = 'pending' AND id > 1000 LIMIT 1000
// → Cursor is id > 1000. Not affected by status changes.
// → Orders 1001-2000 are correctly returned.
//
// The cursor (WHERE id > ?) is independent of the filter column.
// Mutations to non-cursor columns cannot shift the pagination.
This is not a theoretical edge case. Processing pending orders, migrating records from one status to another, backfilling a column based on the current value — these are precisely the operations that batch processing exists to perform. And chunk() will silently skip half your data if the filter column changes between batches.
chunkById() is immune to this because its cursor — WHERE id > ? — is independent of the filter column. You can change status, updated_at, processed_flag, or any other column, and the cursor remains anchored to the primary key. The pagination never shifts.
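The skip is easy to reproduce without a database. Below is a toy simulation (Python; a hypothetical 5,000-row table, batch size 1,000) of both strategies, each mutating the filter column as it processes:

```python
# Simulate chunk() (OFFSET over a filtered set the callback mutates)
# vs. chunkById() (keyset cursor on the primary key).
def fresh_table():
    return {i: "pending" for i in range(1, 5001)}  # id -> status

def offset_pagination(rows, size=1000):
    processed, offset = [], 0
    while True:
        # Re-run the filtered, ordered "query" each batch:
        pending = sorted(i for i, s in rows.items() if s == "pending")
        batch = pending[offset:offset + size]
        if not batch:
            return processed
        for i in batch:
            rows[i] = "processed"   # the mutation that shifts the result set
            processed.append(i)
        offset += size

def keyset_pagination(rows, size=1000):
    processed, last_id = [], 0
    while True:
        pending = sorted(i for i, s in rows.items()
                         if s == "pending" and i > last_id)
        batch = pending[:size]
        if not batch:
            return processed
        for i in batch:
            rows[i] = "processed"
            processed.append(i)
        last_id = batch[-1]         # cursor anchored to the primary key

print(len(offset_pagination(fresh_table())))   # 3000: 2,000 rows skipped
print(len(keyset_pagination(fresh_table())))   # 5000: every row processed
```

The OFFSET version silently drops 2,000 of 5,000 rows; the keyset version visits every one. The proportions match the SQL walkthrough above.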
I should note that this is documented in Laravel — a brief mention that chunkById() is "useful when updating database records while chunking." It is rather more than useful. It is the only correct approach when the callback mutates filtered data. The word "useful" undersells the severity by a considerable margin.
cursor(): the name is a lie
The name cursor() suggests PostgreSQL's server-side cursors — a mechanism that lets you DECLARE a cursor, FETCH rows in batches, and stream through a result set without loading it all at once. That would be excellent. That is not what happens.
// cursor() — sounds like it should use PostgreSQL cursors.
// It doesn't. Not by default.
foreach (Order::where('status', 'completed')->cursor() as $order) {
$order->generateInvoice();
}
// What Laravel actually sends:
SELECT * FROM "orders" WHERE "status" = 'completed';
// One query. No LIMIT. No OFFSET. No DECLARE CURSOR.
// The entire result set comes back in a single response.
Laravel's cursor() sends a single unbounded SELECT — no LIMIT, no cursor declaration, no FETCH. PostgreSQL executes the query, serializes the entire result set, and sends it across the wire. PHP's PDO driver, backed by libpq (the PostgreSQL C client library), receives and buffers every row in memory before returning control to your application.
The name is not merely misleading. It describes the opposite of what happens. A PostgreSQL cursor streams rows from the server on demand, using bounded memory. Laravel's cursor() does the opposite: it fetches everything at once, buffers it entirely, and then pretends to stream by yielding from the buffer one row at a time.
// PHP's PDO driver with libpq (the PostgreSQL C library)
// buffers the ENTIRE result set in memory by default.
//
// cursor() tells Laravel to use a PHP Generator — fetching
// one row at a time from PDO. But PDO already has all the
// rows. They are sitting in a C-level buffer in libpq.
//
// Memory timeline for 1 million orders (~200 bytes each):
//
// 1. PostgreSQL sends all 1M rows → PG: ~200MB sent
// 2. libpq receives and buffers them → PHP process: ~200MB
// 3. PDO wraps the libpq result → PHP process: ~200MB (same buffer)
// 4. cursor() yields one Eloquent model → PHP process: ~200MB + 1 model
// 5. cursor() yields the next model → PHP process: ~200MB + 1 model
// ... (the 200MB buffer never shrinks)
//
// cursor() reduces Eloquent model memory.
// It does NOT reduce the result set buffer.
// You can verify this yourself:
Order::where('status', 'completed')->cursor()->each(function ($order) {
if ($order->id % 100000 === 0) {
dump(memory_get_usage(true) / 1024 / 1024 . ' MB');
}
});
// Output:
// "218.5 MB" ← at row 100,000
// "218.5 MB" ← at row 200,000
// "218.5 MB" ← at row 900,000
// The memory is flat — because it was all allocated up front.
The cursor() method uses a PHP Generator to yield one Eloquent model at a time. This is memory-efficient at the Eloquent layer — you never have 1,000 hydrated models in memory simultaneously. But at the PDO layer, all one million rows are sitting in a C-level buffer. The 200MB was allocated the moment the query completed. The Generator does not reduce it. It cannot. The buffer lives in C memory managed by libpq, outside PHP's garbage collector's jurisdiction.
This is a PHP/PDO limitation, not a Laravel bug. PDO's pgsql driver runs queries through libpq's synchronous execution functions — PQexec() and PQexecPrepared() — which buffer the complete result before returning. To get true streaming, you would need PQsendQuery() with PQsetSingleRowMode(), which PDO does not expose. This is a design decision in PHP's PDO layer that dates back to PHP 5.1, and changing it would break backwards compatibility for every PDO application in existence.
Inside cursor(): the Generator illusion
If you will permit me a brief tour of the internals, the mechanism becomes clear.
// What happens inside Laravel's cursor() method:
// File: Illuminate\Database\Eloquent\Builder.php
public function cursor()
{
// Calls the base query builder's cursor:
return $this->applyScopes()->query->cursor();
}
// File: Illuminate\Database\Query\Builder.php
public function cursor()
{
// ...
return $this->connection->cursor(
$this->toSql(), $this->getBindings(), ! $this->useWritePdo
);
}
// File: Illuminate\Database\Connection.php
public function cursor($query, $bindings = [], $useReadPdo = true)
{
$statement = $this->run($query, $bindings, function ($query, $bindings) use ($useReadPdo) {
// ...
$statement = $this->prepared(
$this->getPdoForSelect($useReadPdo)->prepare($query)
);
$statement->execute($this->prepareBindings($bindings));
return $statement;
});
// Here is the Generator:
while ($record = $statement->fetch(PDO::FETCH_OBJ)) {
yield $record; // One row at a time... from an already-complete buffer.
}
}
// The Generator yields rows one by one. But the PDO::execute()
// call above already received the complete result from libpq.
// The yield is iterating over a completed buffer, not streaming
// from the database.
The Generator — the while ($record = $statement->fetch()) { yield $record; } loop — looks like streaming. It has the shape of streaming. The yield keyword suggests that each row is fetched on demand from the database. But $statement->execute() already completed. The query ran. The rows arrived. They are buffered in libpq's memory. The fetch() call is reading from a local buffer, not making a network request to PostgreSQL.
This is the distinction that matters: the Generator controls when Eloquent hydrates each row (turning a raw object into an Eloquent model with all its accessors, mutators, and relationships). It does not control when the data arrives. The data is already there. All of it. You are iterating over a completed buffer with the ceremony of a stream.
I do not fault Laravel for this design. Given PDO's constraints, a Generator is the best way to avoid hydrating a million Eloquent models simultaneously. The memory savings at the model layer are real — perhaps 400MB of Eloquent overhead reduced to a few kilobytes at any given moment. The issue is that the 200MB PDO buffer remains, and for many applications, that is the memory that matters.
What a real server-side cursor looks like
For the sake of completeness — and because the name cursor() invites the comparison — here is what PostgreSQL's actual server-side cursor mechanism looks like.
// What a REAL server-side cursor looks like in PostgreSQL:
//
// This is what you might expect cursor() to do. It doesn't.
-- Server-side cursor (native PostgreSQL):
BEGIN;
DECLARE order_cursor CURSOR FOR
SELECT * FROM orders WHERE status = 'completed';
FETCH 1000 FROM order_cursor; -- Returns 1,000 rows
-- process batch...
FETCH 1000 FROM order_cursor; -- Returns next 1,000 rows
-- process batch...
-- Repeat until FETCH returns 0 rows.
CLOSE order_cursor;
COMMIT;
-- Advantages over cursor():
-- 1. Only 1,000 rows in memory at a time (on BOTH sides)
-- 2. No 200MB PDO buffer — rows are streamed in batches
-- 3. Single query parse/plan — no repeated WHERE clauses
--
-- Disadvantages:
-- 1. Requires a transaction held open for the entire duration
-- 2. PgBouncer transaction mode: same connection-holding problem
-- 3. The cursor holds a snapshot — long-running = potential bloat
-- 4. PDO doesn't expose this natively — you'd need raw SQL
A server-side cursor holds the query's execution state on the PostgreSQL server. Each FETCH retrieves the next batch of rows without re-executing the query or scanning from the beginning. Memory is bounded on both sides: PostgreSQL materializes only the requested batch, and the client receives only that batch.
You can implement this in Laravel using raw PDO, if the situation warrants it:
// If you genuinely need server-side cursors in Laravel,
// you can use raw PDO — but understand the trade-offs.
DB::transaction(function () {
$pdo = DB::connection()->getPdo();
$pdo->exec("DECLARE order_cursor CURSOR FOR
SELECT * FROM orders WHERE status = 'completed'");
while (true) {
$stmt = $pdo->query("FETCH 1000 FROM order_cursor");
$rows = $stmt->fetchAll(PDO::FETCH_OBJ);
if (empty($rows)) {
break;
}
foreach ($rows as $row) {
// Process raw stdClass objects — no Eloquent hydration
// You'd need to manually create models or work with raw data
processOrder($row);
}
}
$pdo->exec("CLOSE order_cursor");
});
// This gives you true streaming with bounded memory.
// But you lose Eloquent hydration, model events, accessors,
// mutators, and relationship loading. You are working with
// raw stdClass objects.
//
// For most applications, lazyById() is the better trade-off:
// you keep Eloquent's full feature set, bounded memory,
// constant-time queries, and PgBouncer compatibility.
// The only cost is multiple queries instead of one.
I present this for completeness, not as a recommendation. Server-side cursors hold a transaction open for the entire iteration, which creates the same PgBouncer connection-holding problem as cursor(). They also maintain a snapshot for the duration, which can contribute to table bloat on long-running operations. And you lose Eloquent entirely — no model events, no accessors, no relationship loading.
For the vast majority of Laravel applications, lazyById() is the better answer. You keep Eloquent, you keep bounded memory, you keep constant-time queries, and you keep PgBouncer compatibility. The only cost is multiple queries instead of one — and as we have seen, each query takes under a millisecond.
lazy(): it's chunk() with a bow on it
// lazy() — introduced in Laravel 8.
// It reads like cursor(). What hits PostgreSQL is chunk().
Order::where('status', 'completed')->lazy()->each(function ($order) {
$order->generateInvoice();
});
// What Laravel actually sends:
-- Page 1:
SELECT * FROM "orders" WHERE "status" = 'completed'
ORDER BY "id" ASC LIMIT 1000 OFFSET 0;
-- Page 2:
SELECT * FROM "orders" WHERE "status" = 'completed'
ORDER BY "id" ASC LIMIT 1000 OFFSET 1000;
-- ...and so on. OFFSET pages, exactly like chunk().
Laravel 8 introduced lazy() as a more ergonomic alternative, and despite the name's kinship with cursor(), the method it actually resembles is chunk(). Under the hood, lazy() pages through the table with forPage(): the same LIMIT/OFFSET queries that chunk() issues, yielded through a LazyCollection instead of handed to a callback. From PostgreSQL's perspective, lazy() and chunk() are identical: the same per-batch degradation, the same quadratic total cost, and the same mutation trap if the consuming code modifies the filter column. Memory, at least, stays bounded: only one page of rows is buffered at a time, unlike cursor()'s all-at-once buffer.
The difference is the return type, and it is a meaningful difference at the application layer, even if it is invisible to the database.
// chunk() pushes each batch into a callback — imperative code,
// nothing to compose:
Order::where('status', 'completed')->chunk(1000, function ($orders) {
// filtering and mapping happen manually inside the callback
});
// lazy() returns a LazyCollection — full collection API:
Order::where('status', 'completed')
->lazy()
->filter(fn ($order) => $order->total > 100)
->map(fn ($order) => $order->generateInvoice())
->each(fn ($invoice) => $invoice->send());
// The LazyCollection wraps a Generator that pulls OFFSET pages
// on demand. It adds ->filter(), ->map(), ->take(), ->chunk(),
// ->groupBy(), ->unique(), and every other Collection method.
// (Since Laravel 6, cursor() also returns a LazyCollection, so
// the API gap between cursor() and lazy() has closed. The gap
// that remains is the SQL underneath.)
// lazyById() returns a LazyCollection too:
Order::where('status', 'completed')
->lazyById(1000)
->filter(fn ($order) => $order->total > 100)
->each(fn ($order) => $order->generateInvoice());
// Same API. Better database behavior.
A LazyCollection's operations are lazy — they compose as pipeline stages rather than materializing intermediate results. ->filter()->map()->take(100) processes only the rows that survive each stage, stopping after 100 results. That is genuinely valuable for pipeline-style code, and it is why lazy() earns its place over a bare chunk() callback.
But do not mistake the nicer API for better database behavior. lazy() inherits chunk()'s OFFSET pagination, with the same linear per-batch degradation and the same quadratic total cost. cursor() buffers the entire result set in PDO's memory. Choosing between them for million-row processing is choosing a failure mode. The distinction between either of them and lazyById() is architectural.
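The pipeline behavior is worth seeing concretely. A LazyCollection is, underneath, a generator pipeline, and the semantics can be sketched with plain generators (Python here purely for illustration; the row counting is the point, not the syntax):

```python
# A generator pipeline: each stage pulls from the previous one
# on demand, so a downstream limit stops upstream work early.
from itertools import islice

pulled = 0

def source(n):
    global pulled
    for i in range(1, n + 1):
        pulled += 1          # count how many rows the pipeline consumes
        yield i

# filter -> map -> take(3), over a "table" of 1,000,000 rows:
evens = (i for i in source(1_000_000) if i % 2 == 0)   # ->filter()
mapped = (i * 10 for i in evens)                       # ->map()
first_three = list(islice(mapped, 3))                  # ->take(3)

print(first_three)  # [20, 40, 60]
print(pulled)       # 6: only six source rows were ever pulled
```

The take(3) at the end meant only six of the million source rows were ever touched. LazyCollection composes the same way; what it cannot do is change which SQL fed the pipeline in the first place.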
"Eloquent will, by default, do exactly what you ask. The difficulty is that what you ask and what you mean are often two different things — and the database pays the difference."
— from You Don't Need Redis, Chapter 11: PHP & Laravel: A Polished Service
The PgBouncer problem with cursor()
If you run PostgreSQL behind PgBouncer in transaction mode — and you should, for any application with more than a handful of connections — cursor() creates a second problem beyond memory. A problem that can cascade from one batch job to your entire application. lazy() is largely spared here: its OFFSET queries are short, and the connection can return to the pool between pages.
// cursor() with PgBouncer in transaction mode: a trap.
//
// cursor() sends one massive SELECT, then reads rows for
// potentially minutes. During this time, the PDO connection
// is holding a PgBouncer backend connection open.
//
// PgBouncer transaction mode releases connections after each
// transaction completes. A bare SELECT releases once the result
// has streamed into libpq. But wrap the iteration in a
// transaction, as queued jobs routinely do, and the release
// waits until every row is consumed.
//
// With 5 workers running cursor() on 1M rows:
// - 5 backend connections held for ~30 seconds each
// - pool_size = 20 → 25% of your pool is locked
// - Other requests queue behind the cursor consumers
//
// chunkById() releases the connection after each batch query.
// Each query takes <1ms. The connection returns to the pool
// between batches. PgBouncer is happy.
// Worse: if cursor() fails mid-stream (timeout, exception),
// the connection may be returned to PgBouncer in a dirty
// state — with an open transaction that was never committed
// or rolled back.
PgBouncer transaction mode works by releasing backend connections back to the pool after each transaction completes. This is what makes it possible for 500 application connections to share 20 PostgreSQL connections. The math works because most web requests hold a connection for only a few milliseconds — the time to execute one or two queries. Between requests, the connection sits idle in PgBouncer's pool, available to any client that needs it.
But cursor() can hold its connection for far longer. The query itself pins a backend connection while PostgreSQL executes and streams 200MB into libpq, which is seconds rather than milliseconds. And when the iteration runs inside a transaction, as queued jobs and data imports routinely do, the connection stays pinned until the final row is consumed, which for a million rows with any meaningful processing per row might be several minutes. During that time, one of PgBouncer's backend connections is locked to a single PHP worker. Not executing queries. Just sitting there while PHP iterates through a local buffer.
// Timeline: cursor() holding a PgBouncer connection
// (iteration wrapped in a transaction, the common job pattern)
//
// T+0.000s PHP worker calls cursor() on 1M rows
// T+0.001s PDO sends: SELECT * FROM orders WHERE status = 'completed'
// T+0.001s PgBouncer assigns backend connection #7
// T+3.200s PostgreSQL finishes executing, starts sending rows
// T+4.800s libpq finishes buffering all 1M rows (~200MB)
// T+4.801s Generator begins yielding — first model hydrated
// T+5.100s ... processing row 1,000 (generateInvoice takes ~0.3ms each)
// T+12.40s ... processing row 25,000
// T+34.80s ... processing row 100,000
// (backend connection #7 STILL held)
// T+298.0s ... processing row 999,000
// T+300.1s Generator exhausted. PDO statement closed.
// T+300.1s PgBouncer releases backend connection #7.
//
// For 5 MINUTES, one of your 20 backend connections was locked
// to a single PHP worker doing row-by-row processing.
//
// Now imagine 5 Horizon workers running this job simultaneously.
// 5 connections × 5 minutes = 25 connection-minutes consumed.
// Your API requests are queueing behind the batch jobs.
Run five Horizon workers with cursor() jobs simultaneously and you have consumed 25% of your connection pool for batch processing that could take minutes. Your API requests — the ones serving actual users — are queueing behind the batch jobs for connection access. This is a resource starvation pattern, and it manifests as elevated p99 latencies across your entire application, not just in the batch jobs.
If a cursor() call fails mid-stream — a timeout, an exception, a deployment — the connection may be returned to PgBouncer with a transaction that was never committed or rolled back. The next client that receives that connection inherits the dirty state. This is not theoretical. It is a production incident waiting for a quiet Tuesday afternoon.
chunkById() and lazyById() release the connection after each batch query. Each query takes under a millisecond. The connection returns to PgBouncer's pool between batches. Other clients can use it. PgBouncer's multiplexing works as designed. The batch job and the API share the pool cooperatively rather than competitively.
chunkById() and lazyById(): the ones that actually work
Laravel provides two methods that use keyset pagination — WHERE id > ? instead of OFFSET. They are called chunkById() and lazyById(), and they are the correct choice for processing large PostgreSQL datasets. Not the best choice. The correct choice.
// lazyById() and chunkById() — Laravel's best options.
// They use WHERE id > ? instead of OFFSET.
Order::where('status', 'completed')
->lazyById(1000)
->each(function ($order) {
$order->generateInvoice();
});
// What Laravel actually sends:
-- Batch 1:
SELECT * FROM "orders" WHERE "status" = 'completed'
AND "id" > 0 ORDER BY "id" ASC LIMIT 1000;
-- Batch 2 (last ID from batch 1 was 1042):
SELECT * FROM "orders" WHERE "status" = 'completed'
AND "id" > 1042 ORDER BY "id" ASC LIMIT 1000;
-- Batch 500 (last ID was 623847):
SELECT * FROM "orders" WHERE "status" = 'completed'
AND "id" > 623847 ORDER BY "id" ASC LIMIT 1000;
-- Every batch: same cost. Index seeks directly to the cursor.
Instead of telling PostgreSQL "skip 499,000 rows and give me the next 1,000," lazyById() says "give me 1,000 rows where id is greater than 623,847." PostgreSQL uses the primary key index to jump directly to that position. No scanning. No discarding. Constant time regardless of how deep into the table you are.
-- EXPLAIN ANALYZE for lazyById() — batch 500 (WHERE id > 623847):
Limit (cost=0.43..52.18 rows=1000 width=214)
(actual time=0.03..0.82 rows=1000 loops=1)
-> Index Scan using orders_pkey on orders
(actual time=0.02..0.69 rows=1000 loops=1)
Index Cond: (id > 623847)
Filter: (status = 'completed')
Planning Time: 0.09 ms
Execution Time: 0.96 ms
-- 0.96ms. At batch 500. At batch 1,000. At batch 5,000.
-- The index jumps directly to id > 623847. No scanning.
-- No discarding. Constant time.
0.96ms at batch 500. The same as batch 1. The same as batch 5,000. The index provides the ordering and the WHERE clause positions the read head. PostgreSQL touches exactly the rows it returns. No row is read and discarded. No row is sorted and thrown away. Every unit of I/O produces a unit of output.
The mechanism is the same one behind keyset pagination in APIs — a technique covered in the keyset pagination guide above. The WHERE id > ? clause acts as a cursor, and the primary key index acts as the navigation structure. The cursor remembers where you left off. The index jumps directly there. There is nothing to degrade.
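A toy model of why the cursor query is depth-independent: let a sorted Python list stand in for the primary-key B-tree, and a binary search stand in for the index descent (illustrative only; this is not how PostgreSQL stores anything):

```python
# WHERE id > ? LIMIT n as a binary search plus a bounded read:
# work per batch is O(log n + batch_size), independent of depth.
import bisect

ids = list(range(1, 1_000_001))  # stand-in for the orders_pkey index

def keyset_batch(last_id, size):
    start = bisect.bisect_right(ids, last_id)  # the "index seek"
    return ids[start:start + size]             # read exactly `size` entries

print(keyset_batch(0, 3))        # [1, 2, 3]
print(keyset_batch(623847, 3))   # [623848, 623849, 623850]
# Both calls cost ~20 comparisons plus the bounded read.
# Depth into the table is irrelevant.
```

Contrast with OFFSET, which in this model would be `ids[offset:offset + size]` over a freshly filtered list: cheap to slice in Python, but in PostgreSQL every skipped row is read and discarded, which is the entire problem.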
chunkById() vs lazyById(): choosing between them
Both methods generate the same SQL. Both use keyset pagination. Both are PgBouncer-safe. The difference is the programming model they expose to your application code.
// chunkById() — callback-based batching:
Order::where('status', 'completed')
->chunkById(1000, function ($orders) {
foreach ($orders as $order) {
$order->generateInvoice();
}
// You can return false to stop early:
if ($someCondition) {
return false;
}
});
// lazyById() — collection-based streaming:
Order::where('status', 'completed')
->lazyById(1000)
->each(function ($order) {
$order->generateInvoice();
});
// Both generate the same SQL. Both use WHERE id > ?.
// The difference is the programming model:
//
// chunkById():
// - Callback receives a Collection of $batchSize models
// - You process the batch, then it fetches the next
// - Return false to stop early
// - Better when you need batch-level operations (bulk insert, etc.)
//
// lazyById():
// - Returns a LazyCollection that yields one model at a time
// - Full collection API: filter(), map(), take(), etc.
// - Better when you process rows individually
// - Better for pipeline-style code
chunkById() gives you a callback that receives a Collection of models — one batch at a time. This is the natural choice when you need batch-level operations: inserting all processed results into another table, dispatching a batch of notifications, or performing a bulk update. The callback receives the batch, processes it, and returns. Returning false stops the iteration early.
lazyById() gives you a LazyCollection that yields one model at a time, with the full collection API. This is the natural choice for pipeline-style processing — filtering, mapping, transforming — where each row is handled individually and the batch boundary is an implementation detail rather than a logical boundary.
For most batch processing jobs — generating invoices, sending emails, updating statuses — either works. I tend to reach for lazyById() by default, because the LazyCollection API is more expressive and the code reads more cleanly. But this is a preference, not a recommendation. The important choice is between these two methods and everything else. Between them, it is a matter of taste.
The full comparison
// Total wall-clock time to process 1 million rows:
//
// chunk(1000) — 1,000 queries, degrading linearly
// Batch 1: 0.98ms
// Batch 500: 413ms
// Batch 1000: 842ms
// Total query time: ~421 seconds (sum of linear degradation)
// Total queries: 1,000
//
// chunkById(1000) — 1,000 queries, constant time
// Every batch: ~0.96ms
// Total query time: ~0.96 seconds
// Total queries: 1,000
//
// cursor() — 1 query, full table scan
// Single query: ~3.2 seconds
// Total queries: 1
// But: 200MB PDO buffer. Connection held for entire duration.
//
// Winner: chunkById(). 440x faster than chunk().
// Uses 1,000 queries but each one is sub-millisecond.

| Method | Queries | Memory | PgBouncer safe | Degradation | Verdict |
|---|---|---|---|---|---|
| chunk() | 1 per batch (OFFSET) | Low (one batch of models) | Yes | Linear — each batch slower | Avoid for large datasets |
| chunkById() | 1 per batch (WHERE id >) | Low (one batch of models) | Yes | None — constant time | Best for batch processing |
| cursor() | 1 (full result set) | High (PDO buffers all rows) | Risky (long-running query) | None — single query | Misleading name |
| lazy() | 1 (full result set) | High (PDO buffers all rows) | Risky (long-running query) | None — single query | Same as cursor() |
| lazyById() | 1 per batch (WHERE id >) | Low (one batch of models) | Yes | None — constant time | Best for streaming |
chunkById() is the best choice when you need to process rows in callback batches. lazyById() is the best choice when you want the LazyCollection API for chaining. Both use keyset pagination. Both are PgBouncer-safe. Both stay sub-millisecond at any depth.
The one requirement: the column you paginate by must have a unique index. The default — id — works because it is the primary key. If you need to paginate by a different column, it must be uniquely indexed, and you pass the column name as the second argument: lazyById(1000, 'order_number').
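The keyset mechanic itself is small enough to sketch without a database. The following is an illustrative in-memory version of the per-batch loop — not Laravel's actual implementation — that shows both why depth costs nothing and why the cursor column must be present in every fetched row:

```php
<?php
// Simulates WHERE id > $lastId ORDER BY id LIMIT $size over an array,
// mirroring the query lazyById()/chunkById() issue per batch.
function fetchBatch(array $table, int $lastId, int $size): array
{
    $matching = array_values(array_filter(
        $table,
        fn (array $row) => $row['id'] > $lastId
    ));
    usort($matching, fn ($a, $b) => $a['id'] <=> $b['id']);
    return array_slice($matching, 0, $size);
}

function keysetIterate(array $table, int $size): Generator
{
    $lastId = 0;
    while (true) {
        $batch = fetchBatch($table, $lastId, $size);
        if ($batch === []) {
            return;
        }
        yield from $batch;
        // The cursor advances from the last row's id — which is exactly
        // why the cursor column must appear in the select list.
        $lastId = end($batch)['id'];
    }
}

$table = array_map(fn (int $i) => ['id' => $i], range(1, 7));
$seen = [];
foreach (keysetIterate($table, 3) as $row) {
    $seen[] = $row['id'];
}
// $seen === [1, 2, 3, 4, 5, 6, 7]
```

In PostgreSQL the `$lastId` filter is an index seek on the primary key rather than an array scan, which is why every batch costs the same regardless of how deep the iteration has gone.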
Tuning the batch size
The batch size parameter — the 1000 in lazyById(1000) — is not arbitrary. It controls the trade-off between network round trips and per-batch memory. If you will permit me a brief examination of the numbers:
// Batch size matters. Here's how to think about it.
//
// lazyById(100) — 10,000 queries for 1M rows
// Per-query: ~0.96ms
// Network RTT: ~0.5ms × 10,000 = 5 seconds overhead
// Total: ~14.6 seconds
// Memory: ~20KB per batch (100 models × ~200 bytes)
//
// lazyById(1000) — 1,000 queries for 1M rows
// Per-query: ~0.96ms
// Network RTT: ~0.5ms × 1,000 = 0.5 seconds overhead
// Total: ~1.46 seconds
// Memory: ~200KB per batch
//
// lazyById(5000) — 200 queries for 1M rows
// Per-query: ~4.1ms (more rows per fetch)
// Network RTT: ~0.5ms × 200 = 0.1 seconds overhead
// Total: ~0.92 seconds
// Memory: ~1MB per batch
//
// lazyById(10000) — 100 queries for 1M rows
// Per-query: ~8.2ms
// Network RTT: ~0.5ms × 100 = 0.05 seconds overhead
// Total: ~0.87 seconds
// Memory: ~2MB per batch (plus Eloquent model overhead)
//
// Sweet spot: 1,000-5,000 for most workloads.
// Below 500: network round-trip overhead dominates.
// Above 10,000: diminishing returns, higher peak memory.
// The query cost is constant per row regardless — the variable
// is how many network round trips you're willing to pay.

Below 500 rows per batch, the network round-trip overhead becomes the dominant cost. Each query takes under a millisecond of database time, but the round trip between PHP and PostgreSQL (through PgBouncer, typically) adds 0.3-0.5ms per query. At a batch size of 100, you are paying 10,000 round trips for a million rows — 5 seconds of pure network overhead.
Above 5,000 rows per batch, you are into diminishing returns. The database time per query grows linearly with batch size (more rows to serialize and transfer), while the round-trip savings flatten because you have already reduced the query count below 200. And your peak memory per batch grows proportionally — 10,000 Eloquent models at 2KB each is 20MB of model overhead per batch, on top of the PDO result buffer.
The sweet spot for most workloads is 1,000 to 5,000. Start with 1,000. If profiling shows that network round trips are a bottleneck — and they rarely are, because the processing time per row usually dwarfs the query time — increase it. If memory is tight, decrease it. The database-side cost per row is constant regardless of batch size. You are only tuning the ratio of network overhead to memory usage.
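The trade-off reduces to simple arithmetic. Here is a throwaway cost model that reproduces the progression above — the 0.5ms round trip and the ~1ms per-query floor are this article's assumed figures, not universal constants, so treat the function as a back-of-the-envelope sketch:

```php
<?php
// Rough model: total seconds = queries × (RTT + per-query DB time).
// Assumed figures: 0.5ms round trip; per-query time has a ~0.96ms
// floor, then grows at roughly 0.82µs per row for larger batches.
function estimateSeconds(int $totalRows, int $batchSize): float
{
    $queries    = (int) ceil($totalRows / $batchSize);
    $rttMs      = 0.5;
    $perQueryMs = max(0.96, $batchSize * 0.00082);
    return $queries * ($rttMs + $perQueryMs) / 1000;
}

$estimates = [];
foreach ([100, 1000, 5000, 10000] as $size) {
    $estimates[$size] = round(estimateSeconds(1_000_000, $size), 2);
}
// [100 => 14.6, 1000 => 1.46, 5000 => 0.92, 10000 => 0.87]
```

Notice how the curve flattens: going from 100 to 1,000 saves thirteen seconds, going from 5,000 to 10,000 saves five hundredths of one. That flattening is why chasing larger batch sizes buys almost nothing while the per-batch memory keeps growing.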
A word about SELECT *
While we are attending to the efficiency of your batch processing, there is a related matter that compounds every issue we have discussed.
// One more thing: SELECT * is doing you no favors here.
//
// If you only need the order ID and total for invoice generation,
// don't fetch every column.
// This fetches all 15 columns including the 4KB notes TEXT field:
Order::where('status', 'completed')
->lazyById(1000)
->each(fn ($order) => $order->generateInvoice());
// This fetches only what you need:
Order::where('status', 'completed')
->select(['id', 'user_id', 'total', 'currency', 'status'])
    ->lazyById(1000, 'id') // cursor column must appear in the select list
->each(fn ($order) => $order->generateInvoice());
// With 1M rows:
// SELECT * : ~200 bytes/row × 1M = 200MB transferred
// SELECT 5 cols: ~45 bytes/row × 1M = 45MB transferred
//
// Less data transferred = faster queries, less network I/O,
// less memory pressure, less shared_buffers churn.
//
// Important: when using select() with lazyById(), you MUST
// include the cursor column (id) in your select list.
// Laravel needs it to build the WHERE id > ? clause.
// If you forget it, you'll get a confusing error.

Every method we have examined — chunk(), cursor(), lazy(), chunkById(), lazyById() — uses SELECT * by default. Eloquent selects every column unless you specify otherwise. For a table with 15 columns including a 4KB TEXT field and a JSONB metadata blob, each row is ~200 bytes of useful data and ~1,600 bytes of columns you never look at.
Over a million rows, the difference between SELECT * and SELECT id, user_id, total, currency, status is the difference between 200MB and 45MB transferred across the wire. Less data means faster serialization on the PostgreSQL side, less network bandwidth, less PDO buffer memory (for cursor()/lazy()), and less time spent by Eloquent hydrating columns you do not use.
One detail to remember: when using select() with lazyById() or chunkById(), you must include the cursor column in your select list. Laravel needs the id value from the last row to build the WHERE id > ? clause for the next batch. If you omit it, Laravel will throw an error — or worse, silently paginate incorrectly. Specify the cursor column explicitly: lazyById(1000, 'id').
Horizon, queues, and connection pressure
Batch processing in Laravel rarely happens in a single process. It happens in Horizon workers — multiple PHP processes consuming jobs from Redis, each with its own PDO connection to PostgreSQL. The choice of batch method does not just affect one job. It affects every other job sharing the same connection pool.
// Laravel Horizon workers and batch processing:
// Each Horizon worker is a separate PHP process with its
// own PDO connection to PostgreSQL (or PgBouncer).
// horizon.php config — be aware of connection impact:
'environments' => [
'production' => [
'supervisor-1' => [
'connection' => 'redis',
'queue' => ['invoices', 'exports', 'backfills'],
'maxProcesses' => 10,
// 10 workers = 10 simultaneous PDO connections
],
],
],
// If all 10 workers run cursor() jobs simultaneously:
// - 10 × 200MB = 2GB of PDO buffers
// - 10 PgBouncer connections held for minutes
// - API requests competing for remaining pool connections
//
// If all 10 workers run lazyById(1000) jobs simultaneously:
// - 10 × 200KB = 2MB of PDO buffers
// - Connections borrowed and returned per batch (<1ms each)
// - PgBouncer pool barely notices
// You can also throttle batch jobs to be kind to the database:
Order::where('status', 'completed')
->lazyById(1000)
->each(function ($order) {
$order->generateInvoice();
// Optional: yield to the event loop between rows
// Prevents a single job from monopolizing a worker
usleep(100); // 0.1ms pause — barely noticeable per row
    });

Ten Horizon workers running cursor() jobs simultaneously consume 2GB of memory in PDO buffers and hold 10 PgBouncer connections for several minutes each. Those same ten workers running lazyById(1000) jobs consume 2MB of memory and borrow connections for less than a millisecond per batch. The difference is not incremental. It is the difference between a connection pool that is half-consumed by batch jobs and one that barely notices them.
This matters especially during deployment. If a cursor() job is holding a connection when a deployment restarts the worker, the connection may be returned to PgBouncer in an unclean state. lazyById() jobs can be interrupted between batches with no connection state concerns — each batch is a self-contained query with no ongoing transaction.
The honest counterpoints
I have made a forceful case for chunkById() and lazyById(). A waiter who overstates his case is no waiter at all, so I should be forthcoming about the limitations and the situations where the other methods have genuine merit.
cursor() and lazy() have their place. If your result set is small — under 10,000 rows, say — the PDO buffer is negligible (a few megabytes) and the single-query approach avoids 10 round trips. For a report that pulls 5,000 rows and transforms them in a pipeline, lazy() is simpler and faster than lazyById(). The database sends one query, the pipeline processes the result, and the simplicity has real value. My criticism of cursor() and lazy() is not that they are bad methods — it is that they are used for million-row batch processing, which is not the problem they solve well.
chunkById() requires a unique, indexed column. The default id column works perfectly. But if you need to process rows in a specific non-unique order — by created_at, by a score column, by a composite business key that is not uniquely indexed — chunkById() cannot help directly. You would need to add a unique index on the cursor column, or use chunk() with the understanding that performance degrades at depth. For tables under 100,000 rows, the degradation may be acceptable. For tables over that, adding the index is the correct investment.
The "1,000 queries vs 1 query" objection. Engineers sometimes look at chunkById() generating 1,000 queries for a million rows and balk at the query count. "Surely one query is better than a thousand?" In general, yes. In this specific case, no. One thousand sub-millisecond queries complete in under a second of total database time. One unbounded query takes 3.2 seconds of database time and holds a connection for the entire processing duration. The thousand queries are individually cheaper, collectively faster, and operationally safer. Query count is not a useful optimization metric here. Total database time and connection hold time are.
chunk() works fine for small tables. If your table has 10,000 rows and you chunk by 1,000, you have 10 batches. The last batch scans 10,000 rows, which takes about 8ms. The total overhead from OFFSET is perhaps 50ms across all batches. This is negligible. The problem is not chunk() on small tables. The problem is that small tables become large tables, and nobody refactors the batch job when they do. If you can use chunkById() from the start — and you almost always can — you eliminate the scaling cliff before it matters.
When GL catches chunk() in the wild
Gold Lapel sits between your Laravel application and PostgreSQL, observing the query stream on the wire. It does not require code changes, SDK installation, or Eloquent modifications. It watches what actually reaches the database.
-- Gold Lapel sees this pattern in your query stream:
-- Query 1: SELECT * FROM orders WHERE status = 'completed'
-- ORDER BY id ASC LIMIT 1000 OFFSET 0;
-- Query 2: SELECT * FROM orders WHERE status = 'completed'
-- ORDER BY id ASC LIMIT 1000 OFFSET 1000;
-- ...
-- Query 47: SELECT * FROM orders WHERE status = 'completed'
-- ORDER BY id ASC LIMIT 1000 OFFSET 46000;
-- GL detects: repeated query template with escalating OFFSET.
-- Advisory: "Deep pagination detected on orders. OFFSET 46000
-- scanned 47,000 rows to return 1,000. Consider keyset
-- pagination (WHERE id > ?) for constant-time batch access."
-- No code change required to get the warning.
-- GL sits between Laravel and PostgreSQL, watching the wire.

When GL sees a repeated query template with escalating OFFSET values, it recognizes the pattern — whether it comes from chunk(), a hand-written paginator, or an API endpoint serving page 847 of search results. The advisory identifies the table, the current OFFSET depth, the number of rows scanned versus returned, and suggests the keyset alternative.
-- GL also detects cursor()-style queries on large tables:
-- Query: SELECT * FROM orders WHERE status = 'completed'
-- (no LIMIT — unbounded result set)
--
-- GL sees: unbounded SELECT returning 1,000,000+ rows.
-- Advisory: "Unbounded result set on orders — 1,042,317 rows
-- returned without LIMIT. If this is batch processing,
-- consider paginated access (LIMIT/OFFSET or keyset) to
-- reduce memory pressure and connection hold time."
--
-- GL doesn't know you're using cursor() vs lazy() vs get().
-- It sees the SQL. An unbounded SELECT on a million-row table
-- is concerning regardless of which Laravel method generated it.

GL also detects the unbounded SELECT pattern that cursor() and lazy() generate. A query that returns a million rows without a LIMIT clause is noteworthy regardless of how the application handles the result. GL flags it because the connection hold time and memory implications affect the entire system, not just the process that issued the query.
GL's detection works at the PostgreSQL wire protocol level. It does not parse PHP. It does not read your chunk() call. It sees the SQL that arrives at PostgreSQL, and it notices when OFFSET values are climbing across repeated queries with the same structure, or when a single query returns an exceptionally large result set. The detection is framework-agnostic — the same advisory fires for Laravel, Django, Rails, or raw SQL.
For teams that cannot immediately refactor from chunk() to chunkById() — because the job is complex, the column is not uniquely indexed, or Tuesday is not the day for it — the advisory provides the evidence needed to prioritize the change. Total rows scanned. Total time spent on discarded work. The linear cost curve that will get worse as the table grows. Evidence, not opinion, is what moves refactoring tickets up the priority queue.
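The detection idea itself is straightforward. Here is a toy detector — hypothetical, not GL's implementation — that normalizes query text into a template by replacing literals, then flags any template whose OFFSET climbs past a threshold across the stream:

```php
<?php
// Toy escalating-OFFSET detection: derive a query template by replacing
// numeric literals, then track the maximum OFFSET seen per template.
function templateOf(string $sql): string
{
    return preg_replace('/\b\d+\b/', '?', strtolower($sql));
}

function detectDeepPagination(array $queries, int $threshold = 10000): array
{
    $maxOffset = [];
    foreach ($queries as $sql) {
        if (preg_match('/offset\s+(\d+)/i', $sql, $m)) {
            $tpl = templateOf($sql);
            $maxOffset[$tpl] = max($maxOffset[$tpl] ?? 0, (int) $m[1]);
        }
    }
    // Flag templates whose OFFSET has climbed past the threshold.
    return array_keys(array_filter($maxOffset, fn ($o) => $o >= $threshold));
}

$stream = [
    "SELECT * FROM orders ORDER BY id LIMIT 1000 OFFSET 0",
    "SELECT * FROM orders ORDER BY id LIMIT 1000 OFFSET 1000",
    "SELECT * FROM orders ORDER BY id LIMIT 1000 OFFSET 46000",
];
$flagged = detectDeepPagination($stream);
// One flagged template: the orders query, whose OFFSET reached 46000.
```

A production observer works on the wire protocol rather than query strings and normalizes far more than numeric literals, but the core signal is the same: one template, many executions, a monotonically growing OFFSET.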
The decision, made simple
I shall dispense with equivocation.
If you are processing more than 10,000 rows in a Laravel application backed by PostgreSQL, use lazyById() or chunkById(). Use lazyById() when you want collection pipeline operations. Use chunkById() when you want batch callbacks. Start with a batch size of 1,000. Select only the columns you need.
Do not use chunk() for large datasets. It generates O(n²) total work and can silently skip rows when the callback modifies filtered data.
Do not use cursor() or lazy() for large datasets. They buffer the entire result set in PDO's memory and hold a PgBouncer connection for the full processing duration.
If your dataset is small enough that cursor() or chunk() work fine today, use lazyById() anyway. It costs you nothing and it will still work fine when your table is ten times larger. The best time to choose the right batch method is at the beginning. The second best time is now.
The database was not slow. It was being asked to scan a million rows, discard 999,000 of them, and return 1,000. One does not blame the household for being untidy when the instructions were unclear.
I would be remiss if I did not mention that the connection behaviour of chunk(), cursor(), and lazy() becomes far more consequential when Horizon workers enter the picture. I have written a companion guide on how Laravel Horizon exhausts PostgreSQL connections — the natural next chapter for anyone processing large volumes in background jobs.