
Doctrine's Identity Map Memory Leak in Symfony Workers: A Matter Requiring Attention

Your Messenger consumer has been running for six hours. It started at 32 MB. It is now at 1.3 GB. Allow me to explain where your memory went.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026
The Identity Map has been collecting entities without permission. It declines to stop.

Good evening. Your worker has a hoarding problem.

In traditional PHP — the kind served by PHP-FPM behind Nginx — memory leaks are, in a meaningful sense, impossible. The process starts, handles one request, and dies. Whatever mess it made dies with it. The architecture was the garbage collector.

That era is ending. Symfony Messenger runs workers that process thousands of messages without restarting. RoadRunner and FrankenPHP keep PHP processes alive across HTTP requests. CLI commands import millions of rows in a single invocation. Long-running PHP is no longer exotic. It is increasingly the default.

And Doctrine's Identity Map was not designed for it.

I should be precise about what I mean by "not designed for it," because the Doctrine maintainers are not fools and their design is not an accident. The Identity Map is a pattern from Martin Fowler's Patterns of Enterprise Application Architecture, and it solves a real problem: ensuring that two queries returning the same database row produce the same PHP object in memory. Without it, you could have two different Order objects representing row 42, modify one, flush the other, and lose data. The Identity Map prevents that class of bug entirely.

The assumption underlying its design is that the EntityManager's lifetime is short. A web request. A single unit of work. Milliseconds to seconds. Within that lifetime, tracking every entity is correct behavior — the memory cost is trivial and the consistency guarantees are essential.

The problem is not the pattern. The problem is that Symfony Messenger, RoadRunner, and FrankenPHP have changed the lifetime from milliseconds to hours, and the Identity Map did not receive the memo.
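To make that guarantee concrete, here is the pattern in miniature: a toy identity map in plain PHP. This is not Doctrine's implementation; the class and names below are invented for illustration. It is just the Fowler pattern the ORM is built on, reduced to its essence.

The pattern in miniature
```php
<?php

// Toy identity map: same (class, id) must always yield the same object.
// NOT Doctrine's code — a minimal sketch of the underlying pattern.

class ToyIdentityMap
{
    /** @var array<string, array<int|string, object>> class => id => instance */
    private array $map = [];

    public function get(string $class, int|string $id, callable $load): object
    {
        // Return the already-tracked instance if we have one...
        if (isset($this->map[$class][$id])) {
            return $this->map[$class][$id];
        }

        // ...otherwise "load" it (the callable stands in for a query)
        // and start tracking it. This stored reference is exactly what
        // keeps the object alive for the map's entire lifetime.
        return $this->map[$class][$id] = $load();
    }
}

class ToyOrder
{
    public function __construct(public int $id, public string $status) {}
}

$map = new ToyIdentityMap();

$a = $map->get(ToyOrder::class, 42, fn () => new ToyOrder(42, 'pending'));
$b = $map->get(ToyOrder::class, 42, fn () => new ToyOrder(42, 'pending'));

var_dump($a === $b);   // bool(true) — one row, one object
$a->status = 'processed';
var_dump($b->status);  // string(9) "processed" — no lost update possible
```

Notice that the map must hold a strong reference to every object it has ever handed out. That reference is the consistency guarantee and, in a long-lived process, the leak.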

A brief inventory of what the Identity Map actually stores

Before I can explain where the memory goes, I should explain what the Identity Map is storing and why the cost is higher than most developers expect. The Identity Map is not a single array. It is several.

Inside the UnitOfWork
<?php

// Doctrine's Identity Map — what it actually stores.
// The UnitOfWork maintains several internal arrays:

// 1. The Identity Map itself: entity class → ID → entity instance
// $identityMap['App\Entity\Order'][42] = <Order object>
// $identityMap['App\Entity\Customer'][17] = <Customer object>

// 2. The original entity data (for dirty checking at flush time):
// $originalEntityData[spl_object_id($order)] = ['status' => 'pending', ...]

// 3. Entity states:
// $entityStates[spl_object_id($order)] = UnitOfWork::STATE_MANAGED

// 4. Entity change sets (computed at flush):
// $entityChangeSets[spl_object_id($order)] = ['status' => ['pending', 'processed']]

// Every find(), every DQL result row, every lazy-loaded association
// adds entries to ALL of these arrays. In a web request that fetches
// 20 entities, this is trivial. In a worker that fetches 50,000,
// each of these arrays grows without bound.

The UnitOfWork maintains at minimum four parallel data structures for every managed entity: the identity map itself (class name and primary key to object instance), the original entity data (a snapshot of every property's value at the time of fetch, used for dirty checking during flush()), the entity states (managed, new, detached, removed), and the entity change sets (computed diffs between original and current values).

This means that each entity's memory footprint in the Identity Map is not just the entity object itself — it is roughly double. The entity, plus a complete copy of its scalar data as an associative array. An Order entity with 15 columns takes perhaps 2 KB as a PHP object. The original data snapshot adds another 1-2 KB. The metadata overhead — hash table entries, spl_object_id mappings, state flags — adds more. For an entity with associations, multiply by every related entity that Doctrine lazy-loaded on your behalf.

This is why 12,000 entities can consume 289 MB. It is not 12,000 objects. It is 12,000 objects, plus 12,000 data snapshots, plus 12,000 state entries, plus every association proxy and collection wrapper Doctrine created along the way.
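The "roughly double" claim is easy to sanity-check in plain PHP. The sketch below is illustrative, not a reproduction of Doctrine internals: FakeOrder stands in for a mapped entity, and get_object_vars() mimics the scalar snapshot the UnitOfWork keeps per entity for dirty checking. Exact byte counts vary by PHP version; the shape of the result is what matters.

Sanity-checking the snapshot overhead
```php
<?php

// Order-of-magnitude check of the "entity plus snapshot" cost.
// FakeOrder is an invented stand-in for a mapped entity; the snapshot
// array mimics what the UnitOfWork stores in $originalEntityData.

class FakeOrder
{
    public int $id = 0;
    public string $status = 'pending';
    public string $reference = 'ORD-0000000000';
    public float $subtotal = 0.0;
    public float $tax = 0.0;
    public float $total = 0.0;
    public string $currency = 'EUR';
    public string $notes = '';
}

$n = 10_000;

$before = memory_get_usage();
$entities = [];
for ($i = 0; $i < $n; $i++) {
    $o = new FakeOrder();
    $o->id = $i;
    $entities[] = $o;
}
$entityBytes = memory_get_usage() - $before;

$before = memory_get_usage();
$snapshots = [];
foreach ($entities as $o) {
    // get_object_vars() produces the same kind of scalar copy the
    // UnitOfWork keeps per managed entity, keyed by spl_object_id().
    $snapshots[spl_object_id($o)] = get_object_vars($o);
}
$snapshotBytes = memory_get_usage() - $before;

printf("entities:  %.1f MB\n", $entityBytes / 1048576);
printf("snapshots: %.1f MB\n", $snapshotBytes / 1048576);
```

On typical PHP 8 builds the snapshot arrays cost as much as, or more than, the objects themselves: hash tables are not cheap.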

How the Identity Map grows in a Messenger worker

Consider a standard Symfony Messenger handler. It receives a message, fetches an entity, does some work, and flushes. Nothing unusual. Nothing a code reviewer would flag.

src/MessageHandler/ProcessOrderHandler.php
<?php

namespace App\MessageHandler;

use App\Entity\Order;
use App\Message\ProcessOrderMessage;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
class ProcessOrderHandler
{
    public function __construct(
        private EntityManagerInterface $em,
    ) {}

    public function __invoke(ProcessOrderMessage $message): void
    {
        $order = $this->em->find(Order::class, $message->getOrderId());

        // Business logic — update status, calculate totals, etc.
        $order->setStatus('processed');
        $order->setProcessedAt(new \DateTimeImmutable());

        $this->em->flush();

        // The Order entity is now tracked in the Identity Map.
        // It will stay there until the EntityManager is cleared.
        // Process 10,000 messages? 10,000 Order entities in memory.
        // Plus every related entity Doctrine lazy-loaded along the way.
    }
}

Each invocation adds at least one entity to the Identity Map. If the Order entity has associations — a Customer, a collection of LineItem entities, a ShippingAddress — and any of those are accessed (even by a serializer or logger), they join the map too.

Here is what happens to memory over 50,000 messages:

Memory growth over time
# Symfony Messenger worker processing orders
# Memory sampled every 1,000 messages

Messages processed:     1,000
  Resident memory:      48 MB
  Identity Map size:    1,012 entities

Messages processed:     5,000
  Resident memory:      142 MB
  Identity Map size:    5,847 entities

Messages processed:    10,000
  Resident memory:      289 MB
  Identity Map size:   12,204 entities

Messages processed:    25,000
  Resident memory:      681 MB
  Identity Map size:   31,558 entities

Messages processed:    50,000
  Resident memory:     1,342 MB   # 1.3 GB for a message handler
  Identity Map size:   63,891 entities

# Linear growth. No plateau. The worker will eventually hit
# the memory_limit and restart — or the OOM killer will visit first.

1.3 GB. For a message handler that processes one order at a time. The growth is linear and unbounded because every entity ever fetched remains in the Identity Map, referenced by Doctrine's UnitOfWork, ineligible for garbage collection.

This is not a bug. It is documented behavior. Doctrine's own documentation states that the Identity Map holds references to all managed entities for the lifetime of the EntityManager. In a web request, that lifetime is milliseconds. In a Messenger worker, it is hours.

The lazy-loading amplifier

The entity count in the Identity Map is almost always higher than the number of explicit find() calls, and the discrepancy can be dramatic. The reason is lazy loading. Every time your code — or code that runs on your behalf, such as a serializer, a Twig template, or an event listener — accesses an association on a Doctrine entity, Doctrine fires a query and loads the related entity into the Identity Map.

How one find() becomes twelve Identity Map entries
<?php

// A single find() can trigger a cascade of Identity Map entries.
// Watch what happens with a moderately complex entity graph:

$order = $this->em->find(Order::class, 42);
// Identity Map: 1 entity (Order #42)

$customer = $order->getCustomer();
// Lazy-load triggered. Identity Map: 2 entities.

$items = $order->getItems()->toArray();
// Lazy-load triggered. If 5 line items:
// Identity Map: 7 entities (1 Order + 1 Customer + 5 LineItem)

foreach ($items as $item) {
    $product = $item->getProduct();
    // Each Product lazy-loads individually (classic N+1)
}
// Identity Map: 12 entities (1 + 1 + 5 + 5 products)

// One find() call. 12 entities in the Identity Map.
// A serializer or API transformer that walks the full graph
// can easily touch 20-50 entities per root object.
// Over 1,000 messages: 20,000-50,000 tracked entities.

The numbers in the memory growth table earlier — 12,204 entities after 10,000 messages — suggest roughly 1.2 entities per message. That is conservative. A handler that touches customer data, shipping addresses, or order line items can easily load 5-10 entities per message. I have observed workers in production where the Identity Map contained 8x the number of messages processed, because each message triggered a cascade of lazy loads through a moderately deep entity graph.

The Identity Map size after processing N messages is not N. It is N multiplied by the average depth of entity graph traversal per message. The deeper your associations, the faster the map grows.

PHP's memory model: why clear() alone does not suffice

To understand why EntityManager::clear() is necessary but insufficient, you need to understand how PHP manages memory. This is not an academic exercise. It is the difference between a worker that leaks 50 MB per hour and one that stays flat.

PHP reference counting and circular references
<?php

// PHP's memory model: why circular references matter so much.
//
// PHP uses reference counting for immediate garbage collection:
// - Every variable assignment increments a refcount
// - Every unset/scope exit decrements a refcount
// - When refcount reaches 0, memory is freed immediately
//
// But circular references break this:

class Order {
    public ?Customer $customer = null;
}
class Customer {
    public array $orders = [];
}

$order = new Order();
$customer = new Customer();
$order->customer = $customer;      // $customer refcount: 2
$customer->orders[] = $order;      // $order refcount: 2

unset($order);                      // $order refcount: 1 (not 0!)
unset($customer);                   // $customer refcount: 1 (not 0!)

// Both objects still have refcount 1 because they reference each other.
// PHP's reference-counting GC cannot free them.
// Only gc_collect_cycles() can detect and break cycles.
//
// Doctrine entities with bidirectional associations are ALWAYS cycles.
// ManyToOne + OneToMany between Order and Customer = cycle.
// ManyToMany with inversedBy/mappedBy = cycle.
// Self-referencing trees (Category parent/children) = cycle.

PHP uses two garbage collection mechanisms. The primary mechanism is reference counting: every PHP value maintains a count of how many variables, properties, and array entries point to it. When the count reaches zero, the memory is freed immediately. This is fast and deterministic — no pause, no GC sweep, no stop-the-world collection.

The second mechanism is the cyclic garbage collector, triggered manually by gc_collect_cycles() or automatically when PHP's GC buffer fills (by default, after 10,000 potential cycle roots accumulate). This collector uses a mark-and-sweep algorithm to detect and free objects trapped in reference cycles.

Doctrine entities with bidirectional associations are always reference cycles. An Order has a $customer property pointing to a Customer. That Customer has an $orders collection pointing back to the Order. When clear() removes both from the Identity Map, their reference counts drop from (let's say) 3 to 2. Not zero. The cycle holds them in memory until the cyclic GC runs.

In a web request, this is irrelevant — the process dies moments later and the kernel reclaims everything. In a Messenger worker that runs for hours, cycles accumulate between GC sweeps and the automatic GC buffer threshold may not trigger often enough to keep up with the production rate. That 50 MB per hour leak? It is almost always circular references surviving clear().
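None of this requires Doctrine to observe. The following stands alone in plain PHP: build a few thousand two-object cycles, drop every external reference, and watch that only gc_collect_cycles() reclaims them. The loop deliberately stays under PHP's 10,000-root buffer so the automatic collector does not fire early. Class names are invented for the demo.

Watching the cyclic GC work
```php
<?php

// 2,000 two-object cycles, all unreachable after the loop.
// Reference counting alone cannot free them; only the cyclic GC can.

class CycleOrder    { public ?CycleCustomer $customer = null; }
class CycleCustomer { public ?CycleOrder $order = null; }

gc_collect_cycles();                   // start from a clean slate
$baseline = memory_get_usage();

for ($i = 0; $i < 2_000; $i++) {
    $order = new CycleOrder();
    $customer = new CycleCustomer();
    $order->customer = $customer;      // refcount($customer) = 2
    $customer->order = $order;         // refcount($order) = 2 (a cycle)
}
unset($order, $customer);              // every cycle is now unreachable...

$leakedBytes = memory_get_usage() - $baseline;

$start = hrtime(true);
$collected = gc_collect_cycles();      // ...but only this frees them
$gcMs = (hrtime(true) - $start) / 1_000_000;

$retainedBytes = memory_get_usage() - $baseline;

printf("unreachable but uncollected: %d bytes\n", $leakedBytes);
printf("collected: %d in %.3f ms\n", $collected, $gcMs);
printf("retained after GC: %d bytes\n", $retainedBytes);
```

The timing line also gives you a feel for the per-call cost of gc_collect_cycles() on your own hardware, which matters when deciding whether to run it after every message or every batch.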

Why EntityManager::clear() is necessary but insufficient

The first thing everyone reaches for is EntityManager::clear(). It empties the Identity Map, detaches all managed entities, and resets the UnitOfWork's change tracking state.

The obvious fix
<?php

// The fix everyone reaches for first:
public function __invoke(ProcessOrderMessage $message): void
{
    $order = $this->em->find(Order::class, $message->getOrderId());
    $order->setStatus('processed');
    $this->em->flush();

    // Clear the Identity Map after each message
    $this->em->clear();
}

// This works. Mostly.
// But there are three scenarios where it does not free memory
// as completely as you would expect.

This works for the common case. But there are three scenarios — all common in production Symfony applications — where clear() does not free memory as completely as you would expect.

Three scenarios where clear() falls short
<?php

// Scenario 1: Circular references prevent garbage collection
// Doctrine entities with bidirectional associations hold references
// to each other. PHP's reference-counting GC cannot collect cycles
// without an explicit gc_collect_cycles() call.

$order = $this->em->find(Order::class, 42);
$customer = $order->getCustomer();    // Customer references back to Order
$items = $order->getItems();          // Each Item references back to Order

$this->em->clear();

// The Identity Map is empty. But $order, $customer, and each $item
// still hold references to each other in a cycle.
// PHP's refcount GC will NOT free them until gc_collect_cycles() runs.

// Scenario 2: Event listeners holding entity references
// If a Doctrine event listener or subscriber stores entity references
// in instance properties, clear() does not touch those.

class AuditListener
{
    private array $processedEntities = [];  // grows forever

    public function postFlush(PostFlushEventArgs $args): void
    {
        // Storing references that survive clear()
        foreach ($this->processedEntities as $entity) {
            // ... audit logic
        }
    }
}

// Scenario 3: Logger or profiler holding query references
// Symfony's profiler and Doctrine's SQLLogger store every query
// object, which can reference parameter objects/entities.
// In production workers, the profiler should be disabled entirely.

The circular reference problem is the most prevalent. Doctrine entities with bidirectional associations — Order has many LineItems, each LineItem belongs to an Order — form reference cycles. PHP's reference-counting garbage collector cannot free cycles. Only the cyclic garbage collector, triggered by gc_collect_cycles(), can break them.

After clear(), the Identity Map is empty, but the objects themselves may persist in memory if your handler still holds a variable pointing into the cycle. Even without explicit variables, PHP internals can retain references in less obvious places — in the call stack, in generator frames, in error handler contexts.
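The generator case is worth seeing once, because it is genuinely surprising the first time. A suspended generator pins every local in its frame, invisible to clear(); WeakReference lets us watch it happen. Plain PHP, no Doctrine; the function name is invented for the demo.

A generator frame pinning an "entity"
```php
<?php

// A suspended generator keeps every local variable in its frame alive.
// clear() cannot touch these references — they live in PHP's call
// machinery, not in the Identity Map.

function processEntity(object $entity): \Generator
{
    yield 1;          // suspend here — $entity is pinned in the frame
    yield $entity;
}

$obj = new stdClass();
$weak = WeakReference::create($obj);

$gen = processEntity($obj);
$gen->current();      // run to the first yield; the frame now holds $obj
unset($obj);          // our own reference is gone...

$pinned = $weak->get() !== null;
var_dump($pinned);    // true — the suspended frame still pins the object

unset($gen);          // destroy the suspended generator, freeing its frame
var_dump($weak->get());   // NULL — finally collectable
```

The same mechanism applies to partially-consumed toIterable() loops and to exception objects whose stack traces capture entity arguments.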

Here is the impact quantified:

Measuring the circular reference gap
<?php

// Demonstrating the gc_collect_cycles() impact on circular references.

$em = $this->getEntityManager();

// Simulate: process 1,000 orders with bidirectional Customer relation
for ($i = 1; $i <= 1000; $i++) {
    $order = $em->find(Order::class, $i);
    $customer = $order->getCustomer();  // bidirectional: creates cycle
    $em->flush();
    $em->clear();
}

echo memory_get_usage(true);  // ~85 MB — cycles still in memory

gc_collect_cycles();

echo memory_get_usage(true);  // ~34 MB — cycles collected

// The 51 MB difference is entirely circular references that clear()
// removed from the Identity Map but could not free from PHP memory.
// In a long-running worker, this difference accumulates every batch.

51 MB. That is the difference between clear() alone and clear() plus gc_collect_cycles(), after just 1,000 entities with one bidirectional association each. In a worker processing 50,000 messages per day with entities averaging three bidirectional associations, the difference is the gap between a 256 MB worker and a 1.5 GB worker.

The doctrine/orm#5929 issue tracks this exact problem. It has been open since 2016. The core team's position is reasonable: circular references are a PHP runtime concern, not an ORM concern. They are correct. But your worker is still leaking memory.

The event listener scenario is more insidious because it is invisible. An AuditListener that stores entity references in an instance property survives clear() entirely — the listener is a Symfony service, its lifetime is the process, and its instance properties are never touched by the EntityManager. If you have custom Doctrine listeners, audit them. Literally.

The profiler scenario is the simplest to fix and the most embarrassing to discover in production. Symfony's profiler and Doctrine's SQL logger store every query executed during the process lifetime. In development, this is fine — the process handles one request. In a production Messenger worker, it is a list that grows with every message. Ensure APP_ENV=prod and APP_DEBUG=0 on your worker processes. I mention this because I have seen it misconfigured more often than I would like to admit.
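If your workers run under systemd, pinning the environment in the unit file removes the guesswork entirely. The paths, unit name, and transport name below are illustrative, not prescriptive:

Pinning the worker environment
```ini
# /etc/systemd/system/messenger-worker.service (illustrative)
[Service]
Environment="APP_ENV=prod"
Environment="APP_DEBUG=0"
ExecStart=/usr/bin/php /srv/app/bin/console messenger:consume async --memory-limit=256M
Restart=always
```

The --memory-limit flag on messenger:consume doubles as the safety net discussed below: the worker exits gracefully before the kernel's OOM killer gets involved, and systemd restarts it.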

The cleanup strategy table

Six approaches, from simplest to most architectural. The right choice depends on your tolerance for complexity and how severely the leak affects your workers.

Strategy: EntityManager::clear()
  When to apply:  After each message / batch
  Advantages:     Simple, built-in
  Disadvantages:  Does not break circular refs; detaches all entities
  GC-friendly?    Partial

Strategy: clear() + gc_collect_cycles()
  When to apply:  After each message / batch
  Advantages:     Handles circular references properly
  Disadvantages:  gc_collect_cycles() has CPU cost (~1-5ms)
  GC-friendly?    Yes

Strategy: Messenger middleware (doctrine_clear_entity_manager)
  When to apply:  Automatic per message
  Advantages:     No manual code; ships with Symfony
  Disadvantages:  No gc_collect_cycles(); no custom cleanup
  GC-friendly?    Partial

Strategy: Custom middleware with GC
  When to apply:  Automatic per message
  Advantages:     Full control; handles cycles; logs metrics
  Disadvantages:  Must maintain custom middleware class
  GC-friendly?    Yes

Strategy: Worker --memory-limit flag
  When to apply:  Worker process level
  Advantages:     Safety net; restarts worker before OOM
  Disadvantages:  Band-aid — does not fix the leak, just limits damage
  GC-friendly?    N/A

Strategy: DBAL instead of ORM
  When to apply:  Architecture level
  Advantages:     Eliminates the problem entirely; faster for writes
  Disadvantages:  Loses ORM abstraction, lifecycle events, domain model
  GC-friendly?    N/A

For most Symfony applications, the recommended combination is: Symfony's built-in doctrine_clear_entity_manager middleware, plus a custom middleware that calls gc_collect_cycles(), plus the --memory-limit flag as a safety net. Belt, suspenders, and a backup parachute.

The DBAL alternative deserves particular attention for workers. If your message handler is performing a status update — set a flag, write a timestamp, increment a counter — you may not need an entity at all. A single UPDATE statement through DBAL achieves the same result without any Identity Map entry, any circular reference, or any memory to leak. I will return to this option shortly.
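As a preview of that path, here is its shape, sketched with plain PDO and SQLite so it runs standalone. In a Symfony service you would inject Doctrine\DBAL\Connection and issue the same UPDATE through executeStatement(); the point is identical either way: one statement, zero Identity Map entries, nothing to leak.

The "no entity at all" path
```php
<?php

// Sketch of a pure status update with no ORM involved.
// PDO + SQLite in-memory here so the example is self-contained;
// with Doctrine DBAL the equivalent is
//   $connection->executeStatement($sql, ['id' => $orderId]);

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)');
$pdo->exec("INSERT INTO orders (id, status) VALUES (42, 'pending')");

// The entire "handler": no find(), no flush(), no tracked entities.
$stmt = $pdo->prepare(
    "UPDATE orders SET status = 'processed' WHERE id = :id AND status = 'pending'"
);
$stmt->execute(['id' => 42]);
$affected = $stmt->rowCount();   // rows updated, zero objects hydrated

echo $affected, PHP_EOL;
```

The WHERE clause doubles as an idempotency guard: a redelivered message updates zero rows instead of double-processing.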

Messenger middleware: the correct integration point

Putting clear() inside every handler is fragile. New handlers forget it. Existing handlers have it in inconsistent places — some before the flush, some after, some not at all. The correct approach is a middleware that runs after every message, regardless of which handler processed it.

Symfony ships one out of the box:

config/packages/messenger.yaml
# Symfony already ships a middleware for this since 5.4:
# doctrine_clear_entity_manager is registered by default when
# DoctrineBundle is installed.
#
# config/packages/messenger.yaml

framework:
    messenger:
        buses:
            messenger.bus.default:
                middleware:
                    - doctrine_ping_connection    # reconnect if dropped
                    - doctrine_close_connection   # release after each message
                    - doctrine_clear_entity_manager  # clear() the IdentityMap

# But — and this is critical — the default middleware only calls
# EntityManager::clear(). It does NOT call gc_collect_cycles().
# For workers processing entities with bidirectional associations,
# you still need the cyclic GC call.

The built-in middleware handles the common case. But for applications with bidirectional Doctrine associations — which is nearly all of them — you want a custom middleware that also triggers PHP's cyclic garbage collector:

src/Middleware/ClearIdentityMapMiddleware.php
<?php

namespace App\Middleware;

use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;

class ClearIdentityMapMiddleware implements MiddlewareInterface
{
    public function __construct(
        private EntityManagerInterface $em,
    ) {}

    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        try {
            return $stack->next()->handle($envelope, $stack);
        } finally {
            $this->em->clear();

            // Force cyclic reference collection
            gc_collect_cycles();
        }
    }
}

# Register in messenger.yaml:
# framework:
#     messenger:
#         buses:
#             messenger.bus.default:
#                 middleware:
#                     - App\Middleware\ClearIdentityMapMiddleware

The try/finally block is important. If the handler throws an exception, the Identity Map still gets cleared. Without it, a failing handler leaks the entities it loaded before the exception, and those entities accumulate across retries.

The gc_collect_cycles() call typically takes 1-5ms. On a worker processing 100 messages per second, that is 100-500ms per second of GC overhead — noticeable but rarely the bottleneck. On a worker processing 10 messages per second, the cost is negligible. Profile your specific workload.

An instrumented middleware for production observability

The basic middleware solves the leak. But in production, you want to see it working. You want evidence, not faith. Here is a version that logs memory metrics every 1,000 messages, giving you a continuous view of your worker's memory behavior:

src/Middleware/InstrumentedClearMiddleware.php
<?php

namespace App\Middleware;

use Doctrine\ORM\EntityManagerInterface;
use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;

class InstrumentedClearMiddleware implements MiddlewareInterface
{
    private int $messagesProcessed = 0;
    private float $totalGcTime = 0.0;

    public function __construct(
        private EntityManagerInterface $em,
        private LoggerInterface $logger,
    ) {}

    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        try {
            return $stack->next()->handle($envelope, $stack);
        } finally {
            $uowSize = $this->em->getUnitOfWork()->size();

            $this->em->clear();

            $gcStart = hrtime(true);
            $collected = gc_collect_cycles();
            $gcMs = (hrtime(true) - $gcStart) / 1_000_000;

            $this->messagesProcessed++;
            $this->totalGcTime += $gcMs;

            // Log every 1000 messages for monitoring
            if ($this->messagesProcessed % 1000 === 0) {
                $this->logger->info('Worker memory report', [
                    'messages' => $this->messagesProcessed,
                    'memory_mb' => round(memory_get_usage(true) / 1024 / 1024, 1),
                    'peak_mb' => round(memory_get_peak_usage(true) / 1024 / 1024, 1),
                    'last_uow_size' => $uowSize,
                    'gc_collected' => $collected,
                    'gc_ms' => round($gcMs, 2),
                    'avg_gc_ms' => round($this->totalGcTime / $this->messagesProcessed, 2),
                ]);
            }
        }
    }
}

The uowSize metric tells you how many entities each handler loads — useful for spotting handlers with unexpectedly deep entity graph traversal. The gc_collected count tells you how many circular references were broken — if this number is consistently high, your entities have many bidirectional associations and the cyclic GC is doing real work. The gc_ms metric tells you the CPU cost of that work.

If gc_collected is consistently zero, you have no circular references and can safely remove the gc_collect_cycles() call. If it is consistently in the hundreds, the call is essential and the 1-5ms cost is buying you significant memory savings. Let the numbers decide.

I should note: hrtime(true) returns nanoseconds and is monotonic — it is the correct function for measuring short durations. microtime(true) would also work but is subject to NTP adjustments, which is irrelevant for 1-5ms measurements but worth mentioning for correctness.

Profiling memory in CLI commands and batch jobs

Messenger workers are the most visible case, but CLI commands that process large datasets hit the same wall. Symfony Console commands that iterate over thousands of entities without clearing the EntityManager will consume memory proportional to the dataset size.

Here is a pattern for profiling exactly where the memory goes:

src/Command/MemoryProfileCommand.php
<?php

namespace App\Command;

use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Doctrine\ORM\EntityManagerInterface;
use App\Entity\Order;

#[AsCommand(name: 'app:memory-profile')]
class MemoryProfileCommand extends Command
{
    public function __construct(
        private EntityManagerInterface $em,
    ) {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $batchSize = 100;
        $processed = 0;
        $repository = $this->em->getRepository(Order::class);

        // Fetch one batch at a time. clear() detaches every managed
        // entity, so we must never keep modifying entities fetched
        // before a clear. Re-querying per batch avoids that trap, and
        // because each batch flips 'pending' to 'processed', the same
        // query naturally pages through the remaining rows.
        while ($orders = $repository->findBy(['status' => 'pending'], limit: $batchSize)) {
            foreach ($orders as $order) {
                $order->setStatus('processed');
                $processed++;
            }

            $this->em->flush();
            $this->em->clear();
            gc_collect_cycles();

            $output->writeln(sprintf(
                'Processed: %d | Memory: %s | Peak: %s | UoW size: %d',
                $processed,
                $this->formatBytes(memory_get_usage(true)),
                $this->formatBytes(memory_get_peak_usage(true)),
                $this->em->getUnitOfWork()->size(),
            ));
        }

        return Command::SUCCESS;
    }

    private function formatBytes(int $bytes): string
    {
        return round($bytes / 1024 / 1024, 1) . ' MB';
    }
}

And the output, demonstrating the difference between clearing and not clearing:

Memory profile results
# Without clear() — memory grows linearly:
Processed: 100   | Memory: 32.0 MB  | Peak: 32.0 MB  | UoW size: 100
Processed: 200   | Memory: 38.0 MB  | Peak: 38.0 MB  | UoW size: 200
Processed: 500   | Memory: 56.0 MB  | Peak: 56.0 MB  | UoW size: 500
Processed: 1000  | Memory: 84.0 MB  | Peak: 84.0 MB  | UoW size: 1000
Processed: 5000  | Memory: 298.0 MB | Peak: 298.0 MB | UoW size: 5000
Processed: 10000 | Memory: 571.0 MB | Peak: 571.0 MB | UoW size: 10000

# With clear() + gc_collect_cycles() every 100 entities:
Processed: 100   | Memory: 32.0 MB  | Peak: 32.0 MB  | UoW size: 0
Processed: 200   | Memory: 32.0 MB  | Peak: 32.0 MB  | UoW size: 0
Processed: 500   | Memory: 32.0 MB  | Peak: 34.0 MB  | UoW size: 0
Processed: 1000  | Memory: 32.0 MB  | Peak: 34.0 MB  | UoW size: 0
Processed: 5000  | Memory: 32.0 MB  | Peak: 34.0 MB  | UoW size: 0
Processed: 10000 | Memory: 32.0 MB  | Peak: 34.0 MB  | UoW size: 0

# Flat. The sawtooth pattern is correct behavior.

The numbers are stark. Without clearing, memory grows to 571 MB processing 10,000 entities. With clearing every 100 entities, memory stays flat at 32 MB. The UoW size: 0 after each clear confirms the Identity Map is being emptied.

The Doctrine documentation covers this pattern explicitly in their batch processing guide. The recommended batch size is typically 20-50 entities for inserts and 100-200 for reads. Smaller batches mean more frequent flushes (more database round-trips) but lower peak memory. Larger batches mean fewer round-trips but higher peak memory. For most workloads, 50-100 is the sweet spot — the flush overhead is amortized and the peak memory stays well under 50 MB.

Canonical batch processing pattern
<?php

// The batch processing pattern — iterate and clear in chunks.
// This is the canonical pattern from the Doctrine documentation.
// https://www.doctrine-project.org/projects/doctrine-orm/en/3.3/reference/batch-processing.html

use App\Entity\Product;
use Doctrine\ORM\EntityManagerInterface;

class ImportService
{
    public function __construct(
        private EntityManagerInterface $em,
    ) {}

    public function importProducts(iterable $records): int
    {
        $batchSize = 50;
        $count = 0;

        foreach ($records as $record) {
            $product = new Product();
            $product->setName($record['name']);
            $product->setPrice($record['price']);
            $product->setSku($record['sku']);

            $this->em->persist($product);
            $count++;

            if ($count % $batchSize === 0) {
                $this->em->flush();
                $this->em->clear();
                // After clear(), all previously persisted Product
                // entities become detached. They are no longer tracked.
                // The Identity Map resets to empty.
            }
        }

        $this->em->flush();  // flush remaining
        $this->em->clear();

        return $count;
    }
}

Memory-efficient iteration with toIterable()

There is a subtlety in the batch patterns above that deserves attention. A findBy() call with a large limit materializes the entire result set as an array of entity objects before the loop begins: ask for 10,000 orders and 10,000 entities are allocated up front, no matter how diligently you clear afterwards. Batch clearing releases them in groups, but that initial allocation can still spike memory.

Doctrine ORM 2.8 introduced toIterable() on query objects, which returns a Generator that fetches rows one at a time from the database cursor. Combined with periodic clearing, this keeps memory flat from the first row:

Using toIterable() for memory-efficient iteration
<?php

// Doctrine ORM 2.8+ / 3.x: toIterable() for memory-efficient iteration.
// Instead of loading all results into memory at once,
// iterate one row at a time with manual clear intervals.

use Doctrine\ORM\EntityManagerInterface;

class ReportGenerator
{
    public function __construct(
        private EntityManagerInterface $em,
    ) {}

    public function generateMonthlyReport(\DateTimeImmutable $month): array
    {
        $query = $this->em->createQuery(
            'SELECT o FROM App\Entity\Order o
             WHERE o.createdAt BETWEEN :start AND :end'
        )
        ->setParameter('start', $month->modify('first day of this month'))
        ->setParameter('end', $month->modify('last day of this month'));

        $totals = ['revenue' => 0.0, 'count' => 0];
        $batch = 0;

        // toIterable() returns a Generator — one entity at a time
        foreach ($query->toIterable() as $order) {
            $totals['revenue'] += $order->getTotal();
            $totals['count']++;
            $batch++;

            if ($batch % 200 === 0) {
                $this->em->clear();
                gc_collect_cycles();
            }
        }

        return $totals;
    }
}

// toIterable() hydrates entities one row at a time instead of
// building the full result array up front. Depending on the
// driver, rows may still be buffered client-side, but combined
// with periodic clear(), PHP-side memory stays flat regardless
// of result set size.

The distinction matters for large datasets. findBy() with 100,000 entities will spike to several hundred MB on the initial load, then drop as batches clear. toIterable() with the same 100,000 entities stays near baseline memory throughout, because at most one clear interval's worth of entities (200, in the example above) exists in PHP memory at any moment.

I should be honest about the trade-off: toIterable() holds a database cursor open for the duration of the iteration. On PostgreSQL, this means the query's snapshot is held, which can delay autovacuum on the affected tables. For a report that takes 30 seconds, this is fine. For a batch job that iterates for 20 minutes, consider chunking with LIMIT/OFFSET or keyset pagination instead, to release the cursor between chunks.
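The keyset-pagination alternative looks like this. The sketch below assumes the same App\Entity\Order entity with an auto-incrementing integer id; each chunk is a fresh, short-lived query, so no cursor is held between chunks:

```php
<?php

// Keyset pagination — a sketch, assuming an App\Entity\Order
// entity with an auto-incrementing integer id. Each chunk is a
// fresh, short-lived query; no cursor survives between chunks.

use Doctrine\ORM\EntityManagerInterface;

class ChunkedReportGenerator
{
    private const CHUNK_SIZE = 500;

    public function __construct(
        private EntityManagerInterface $em,
    ) {}

    public function sumTotals(): float
    {
        $revenue = 0.0;
        $lastId = 0;

        do {
            $orders = $this->em->createQuery(
                'SELECT o FROM App\Entity\Order o
                 WHERE o.id > :lastId
                 ORDER BY o.id ASC'
            )
            ->setParameter('lastId', $lastId)
            ->setMaxResults(self::CHUNK_SIZE)
            ->getResult();

            foreach ($orders as $order) {
                $revenue += $order->getTotal();
                $lastId = $order->getId();
            }

            // Detach the finished chunk before fetching the next
            $this->em->clear();
            gc_collect_cycles();
        } while (count($orders) === self::CHUNK_SIZE);

        return $revenue;
    }
}
```

Unlike LIMIT/OFFSET, the WHERE o.id > :lastId predicate stays fast as you page deeper, because each chunk is an index range scan rather than a scan that skips an ever-growing prefix.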

The detached entity trap

There is a consequence of clear() that catches people the first time, and I would be doing you a disservice not to discuss it frankly. After clearing the EntityManager, every entity you previously fetched becomes detached. It exists in memory as a PHP object, but Doctrine no longer tracks it. If you modify a detached entity and call flush(), Doctrine silently ignores the change; the update is lost without so much as a warning. And if a detached entity is still reachable from a managed one at flush time, Doctrine throws an ORMInvalidArgumentException instead.

The detached entity problem
<?php

// The detached entity trap — clear() makes entities unmanaged.
// Any reference you still hold becomes a ticking bomb.

public function __invoke(ProcessOrderMessage $message): void
{
    $order = $this->em->find(Order::class, $message->getOrderId());
    $customer = $order->getCustomer();

    $this->em->flush();
    $this->em->clear();   // Both $order and $customer are now DETACHED

    // This does NOT throw. It fails silently: $order is detached,
    // the UnitOfWork no longer tracks it, and the change below
    // never reaches the database.
    // (If a detached entity is reachable from a managed one at
    //  flush time, Doctrine throws ORMInvalidArgumentException.)
    $order->setNotes('Follow-up required');
    $this->em->flush();   // No error. No UPDATE. The notes are lost.

    // The correct pattern: re-fetch after clear
    $order = $this->em->find(Order::class, $message->getOrderId());
    $order->setNotes('Follow-up required');
    $this->em->flush();   // Works
}

The pattern is straightforward once you know it: clear, then re-fetch anything you need to modify. But it changes the shape of your code. You cannot hold entity references across a clear() boundary. This is the fundamental tension in Doctrine's design when applied to long-running processes — the Identity Map provides consistency guarantees that assume a short-lived EntityManager, and clearing it voids those guarantees.

For Messenger handlers, this is rarely a problem. Each message is independent. Fetch, process, flush, clear. The next message starts fresh. The clear boundary aligns naturally with the message boundary.

For CLI commands that need to cross-reference entities across batches — "process each order, but also update the customer's running total" — you need to restructure the logic around the clear boundaries. Either accumulate the customer IDs and update them in a separate pass, or use DQL UPDATE statements that bypass the Identity Map entirely. The restructuring is not optional. Attempting to hold entity references across clear boundaries will produce exceptions, data loss, or both.
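The DQL UPDATE route looks like this. The Customer entity and its runningTotal field are hypothetical, standing in for whatever cross-batch aggregate you need to maintain:

```php
<?php

// Bulk DQL UPDATE — a sketch, assuming a hypothetical
// App\Entity\Customer entity with a runningTotal field.
// The statement runs directly against the database and
// bypasses the Identity Map and UnitOfWork entirely.

use Doctrine\ORM\EntityManagerInterface;

function flushCustomerTotals(
    EntityManagerInterface $em,
    array $totalsByCustomerId,  // [customerId => amount]
): void {
    foreach ($totalsByCustomerId as $customerId => $amount) {
        $em->createQuery(
            'UPDATE App\Entity\Customer c
             SET c.runningTotal = c.runningTotal + :amount
             WHERE c.id = :id'
        )
        ->setParameter('amount', $amount)
        ->setParameter('id', $customerId)
        ->execute();
    }
}

// Caveat: because the Identity Map is bypassed, any Customer
// entities already loaded in memory are now stale. Run bulk
// updates after clear(), or re-fetch before reading them.
```

Accumulate the per-customer amounts in a plain array during the batch loop (scalars survive clear() just fine), then flush them in one pass at the end.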

The DBAL alternative: eliminating the problem at its source

I have spent several sections explaining how to manage the Identity Map leak — how to clear it, when to call the cyclic GC, how to instrument the cleanup. Allow me to step back and suggest that for many Messenger handlers, the most elegant solution is to not use the ORM at all.

DBAL — no entities, no Identity Map, no leak
<?php

// When you don't need the Identity Map at all:
// DBAL gives you direct SQL access without entity tracking.

use Doctrine\DBAL\Connection;

class OrderProcessor
{
    public function __construct(
        private Connection $conn,
    ) {}

    public function markProcessed(int $orderId): void
    {
        $this->conn->executeStatement(
            'UPDATE orders SET status = :status, processed_at = NOW()
             WHERE id = :id',
            ['status' => 'processed', 'id' => $orderId]
        );
        // No entity instantiation. No Identity Map entry.
        // No circular references. No memory to leak.
        // The SQL runs and the memory is free.
    }

    public function processInBulk(array $orderIds): int
    {
        // For bulk operations, DBAL is dramatically more efficient.
        // One query instead of N find() + N flush() calls.
        // DBAL expands the :ids placeholder to match the array
        // length when given an array parameter type.
        // (On DBAL 3.6+, prefer ArrayParameterType::INTEGER over
        // the deprecated Connection::PARAM_INT_ARRAY constant.)
        return $this->conn->executeStatement(
            'UPDATE orders SET status = :status, processed_at = NOW()
             WHERE id IN (:ids)',
            [
                'status' => 'processed',
                'ids' => $orderIds,
            ],
            [
                'ids' => Connection::PARAM_INT_ARRAY,
            ]
        );
    }
}

// Trade-off: you lose lifecycle events, validation callbacks,
// and the domain model abstraction. For simple status updates
// in workers, this trade is almost always worth making.

Doctrine's DBAL (Database Abstraction Layer) gives you direct access to SQL without the ORM's entity tracking, Identity Map, or UnitOfWork overhead. For message handlers that perform writes — updating a status, recording a timestamp, incrementing a counter — DBAL is faster, uses less memory, and eliminates the Identity Map problem entirely. One query. No entities. No memory to leak.

The trade-off is real. You lose lifecycle events (prePersist, postUpdate), validation callbacks, and the domain model abstraction. If your handler's business logic depends on entity methods, computed properties, or Doctrine events, DBAL requires you to reimplement that logic in SQL or in the handler itself. For complex business rules, the ORM earns its overhead.

But I observe, in production Symfony applications, that the majority of Messenger handlers perform simple data transformations: read a message, update a status, write a timestamp, maybe send a notification. For these handlers, the ORM's object mapping, change tracking, and Identity Map management are overhead without benefit. DBAL does the job in a single round-trip with no cleanup required.

This is not a criticism of Doctrine. It is a recognition that the ORM and the DBAL solve different problems, and Symfony Messenger handlers often fall on the DBAL side of that boundary. Use the tool that matches the task.

"The ORM is not the enemy. But the gap between what it expresses and what PostgreSQL executes is where performance problems live. For handlers that need speed and simplicity, sometimes the shortest path to the database is the best one."

— from You Don't Need Redis, Chapter 3: The ORM Tax

An honest counterpoint: when the Identity Map is doing exactly what you want

I have described the Identity Map as a leak, as a problem, as something requiring attention. I should be forthcoming about the cases where it is doing precisely what it should do, because pretending those cases do not exist would be dishonest and unhelpful.

If your Messenger handler processes a complex order — loading the order, its customer, its line items, its shipping address, its payment records — and those entities reference each other during the business logic, the Identity Map is the reason your code works correctly. Without it, $lineItem->getOrder() and the original $order variable might be different PHP objects, and modifications to one would not appear in the other. The Identity Map guarantees object identity: there is exactly one PHP object per database row per EntityManager. That guarantee is what makes complex entity graph manipulation safe.

If your handler has complex logic but modest throughput (say one message per second, each leaving 10-15 KB of tracked entities behind), it takes four to five hours to reach 256 MB, and the --memory-limit flag will restart the worker cleanly. For low-throughput workers with complex business logic, the "leak" may be slow enough that periodic restarts are a perfectly acceptable management strategy.

The Identity Map becomes a problem when throughput is high (hundreds of messages per minute), when entity graphs are deep (many associations loaded per message), or when the worker must run for extended periods without restarts (Kubernetes pods with slow rolling deploys, for instance). If none of these apply to your situation, a --memory-limit=256M flag and periodic restarts may be all the management you need. Do not add complexity to solve a problem that does not yet exist in your system.

A note on WeakReference and the future

PHP 8.0 introduced WeakReference and WeakMap, and it is natural to ask whether these could solve the Identity Map problem at the ORM level — a weak identity map that allows the garbage collector to reclaim entities when no application code holds a reference.

WeakReference — promising but not a drop-in solution
<?php

// PHP 8.0+ introduced WeakReference and WeakMap.
// Could Doctrine use WeakReferences for the Identity Map?
//
// In theory: yes. WeakReferences allow the GC to collect
// the referent even while the WeakReference exists.
// A WeakMap-based Identity Map would not prevent GC.
//
// In practice: the Identity Map is not the only reference.
// The UnitOfWork's originalEntityData, entityStates, and
// entityChangeSets arrays also hold strong references.
// Change tracking requires the original data snapshot
// to compute diffs at flush time.
//
// There is an open discussion (doctrine/orm#10864) about
// optional weak identity maps for read-only use cases,
// but it would require fundamental changes to how Doctrine
// tracks entity state.
//
// For now, clear() remains the mechanism. WeakMaps are not
// a drop-in solution — they would change Doctrine's
// consistency guarantees in ways the maintainers (rightly)
// consider breaking.

The short answer is: not yet, and not without fundamental changes to Doctrine's consistency model. The Identity Map is only one of several data structures that hold strong references to entities. The original data snapshots for dirty checking, the entity state tracking, and the change set computation all require the entity to remain in memory and unmodified until flush() is called. A WeakMap-based Identity Map would allow the GC to collect entities mid-UnitOfWork, which would break change tracking in ways that are difficult to reason about.
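The mechanics are easy to demonstrate in plain PHP 8. This is a generic illustration of WeakReference semantics, not Doctrine code:

```php
<?php

// Plain PHP 8 demonstration of WeakReference semantics —
// not Doctrine code, just the primitive itself.

$entity = new stdClass();
$entity->id = 42;

// A weak reference does not keep its referent alive.
$ref = WeakReference::create($entity);

var_dump($ref->get() === $entity);  // bool(true)

// Drop the only strong reference. The object is destroyed
// immediately (refcounting), and the weak reference empties.
unset($entity);

var_dump($ref->get());              // NULL
```

A WeakMap behaves the same way for its keys: when the last strong reference to a key object disappears, its entry vanishes from the map. That is exactly the property a weak identity map would rely on, and exactly the property that breaks dirty checking, which needs the original snapshot to survive until flush().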

There is active discussion in the Doctrine community about opt-in weak identity maps for read-only use cases — scenarios where you fetch entities for display or export but never modify them. This would be genuinely useful for report generation and data export commands. But it is not available today, and adopting it would require explicit opt-in at the query level, not a configuration toggle.

For now, clear() remains the mechanism. It is not elegant, but it is reliable, well-understood, and supported by the core team.

RoadRunner and FrankenPHP: the same problem, different entry point

Symfony Messenger is the most common context for this problem, but it is not the only one. RoadRunner and FrankenPHP represent a broader shift in PHP architecture: persistent application processes that handle multiple requests without restarting.

RoadRunner / FrankenPHP — resetting between requests
<?php

// RoadRunner and FrankenPHP keep a single PHP process alive
// across thousands of HTTP requests. The Doctrine EntityManager
// persists between requests unless explicitly reset.

// In a traditional PHP-FPM setup, the process dies after each request.
// The Identity Map dies with it. Memory leak? Impossible.
// The architecture was the garbage collector.

// RoadRunner/FrankenPHP removes that safety net.

// Symfony Runtime component handles this for HTTP:
// https://github.com/php-runtime/runtime

// But for custom integrations, you need to reset manually:

use Doctrine\Persistence\ManagerRegistry;

class RequestHandler
{
    public function __construct(
        private ManagerRegistry $doctrine,
    ) {}

    public function handle($request): Response
    {
        try {
            // ... handle request using EntityManager
            return $response;
        } finally {
            // clear() empties the Identity Map and resets the
            // UnitOfWork. It does NOT touch the connection; if an
            // EntityManager was closed by an exception, call
            // $this->doctrine->resetManager() to replace it.
            foreach ($this->doctrine->getManagers() as $manager) {
                $manager->clear();
            }
            gc_collect_cycles();
        }
    }
}

Symfony's Runtime component, together with the kernel's service resetter (services tagged kernel.reset, which includes Doctrine's registry), handles the reset for HTTP requests served through these runtimes. But if you have custom workers, background loops, or non-HTTP entry points, the reset is your responsibility.

The mental model shift is significant. In PHP-FPM, you could be sloppy with memory because the process died after every request. In persistent runtimes, every allocation that is not explicitly freed accumulates. The Identity Map is the most visible accumulator, but it is not the only one. Monolog handlers with buffering enabled, event dispatcher listeners with state, cached service instances that grow over time, Twig's template cache — anything that grows over the process lifetime becomes a leak in a persistent process.

If you are migrating from PHP-FPM to RoadRunner or FrankenPHP, audit every Symfony service for state accumulation. The Identity Map is the first thing you will find because Doctrine is the most memory-hungry service in most Symfony applications. It will not be the last.
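The audit has a constructive counterpart. Symfony's service-contracts package provides ResetInterface: implement it on a stateful service and autoconfiguration tags the service kernel.reset, so the framework calls reset() between requests in persistent runtimes. A sketch, with a hypothetical metrics buffer (the class name and fields are illustrative):

```php
<?php

// A hypothetical in-memory metrics buffer made safe for
// persistent runtimes via Symfony's ResetInterface.
// With autoconfiguration enabled, implementing the interface
// is enough to get reset() called between requests.

use Symfony\Contracts\Service\ResetInterface;

class MetricsBuffer implements ResetInterface
{
    /** @var array<string, int> */
    private array $counters = [];

    public function increment(string $metric): void
    {
        $this->counters[$metric] = ($this->counters[$metric] ?? 0) + 1;
    }

    public function flushTo(callable $sink): void
    {
        $sink($this->counters);
        $this->counters = [];
    }

    // Called by the framework between requests. Without it,
    // $counters grows for the lifetime of the worker process.
    public function reset(): void
    {
        $this->counters = [];
    }
}
```

The same pattern applies to any service holding request-scoped state: the reset() method is the contract that makes the service safe to reuse across requests.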

The --memory-limit safety net
# The safety net — always set a memory limit on Messenger workers.
# This does NOT fix the leak. It limits the blast radius.

# In supervisor config:
[program:messenger-worker]
command=php /var/www/bin/console messenger:consume async --memory-limit=256M --time-limit=3600
autostart=true
autorestart=true
numprocs=4
startsecs=0

# --memory-limit=256M  → worker restarts after consuming 256 MB
# --time-limit=3600    → worker restarts after 1 hour regardless
# autorestart=true     → supervisor immediately starts a new worker

# This is a guardrail, not a solution. A worker that restarts
# every 500 messages because it hit 256 MB is wasting time on
# process startup and connection re-establishment.
# Fix the leak AND set the limit.

Always set a memory limit on Messenger workers, even after fixing the Identity Map leak. It is the last line of defense against leaks you have not found yet. A worker that restarts every few hours because it reached 256 MB is healthy. A worker that gets OOM-killed at 4 GB at 3 AM is not.

I should note that the --time-limit flag is equally important. Even a worker with perfectly flat memory should restart periodically. PHP's internal memory allocator can become fragmented over time, and long-running PostgreSQL connections can accumulate session-level state (prepared statements, advisory locks, temporary tables) that benefits from a fresh start. A 1-hour time limit is a reasonable default. Adjust based on your process startup cost — if your worker takes 5 seconds to boot (compiling the container, warming caches), restarting every hour is fine. If it takes 30 seconds, extend the time limit proportionally.

The connection side: what Gold Lapel handles so you don't have to

The Identity Map leak is an application-layer problem. It lives in PHP's memory, in Doctrine's UnitOfWork, and the fix — clear() plus gc_collect_cycles() — is application code. No proxy, no external tool, no database configuration change can fix it for you. That is your responsibility, and now you have the patterns to handle it.

But long-running workers have a second problem that is not your responsibility: connection management.

Connection lifecycle in long-running workers
# The connection side of long-running workers:
#
# When a Symfony worker runs for hours, its PostgreSQL connection
# can go stale — dropped by firewalls, killed by idle timeouts,
# or invalidated by a pg_terminate_backend() call.
#
# The doctrine_ping_connection middleware helps, but it adds
# a round-trip ping before each message.
#
# Gold Lapel handles this at the proxy layer:
# - Connections between GL and PostgreSQL are pooled and recycled
# - If a backend connection drops, GL transparently assigns a new one
# - Your worker's connection to GL stays alive; GL manages the rest
# - No ping overhead, no stale connection errors at 3 AM
#
# The Identity Map leak is your application's responsibility.
# The connection lifecycle is ours.

A Symfony Messenger worker that runs for eight hours maintains a single PostgreSQL connection for the entire duration. That connection can go stale — dropped by network timeouts, killed by idle_in_transaction_session_timeout, terminated by a DBA running maintenance. The doctrine_ping_connection middleware helps by pinging before each message, but it adds a round-trip to every message processed. With four workers processing 50 messages per second, that is 200 extra round-trips per second just to confirm connections are alive.

Gold Lapel handles this at the proxy layer. Your workers connect to Gold Lapel. Gold Lapel maintains a pool of connections to PostgreSQL, recycling them as needed, transparently replacing any that drop. If a backend connection goes stale at 3 AM, Gold Lapel assigns a new one from the pool. Your worker never notices. No ping overhead, no reconnection logic, no Doctrine\DBAL\Exception\ConnectionLost waking up your on-call engineer.

The Identity Map is application memory. Connections are infrastructure. Fix the Identity Map with clear() and gc_collect_cycles(). Let Gold Lapel handle the connections. Each layer solves the problem it is positioned to solve.

A checklist for Symfony workers in production

If you will permit me a summary — not of what was said, but of what to do. A checklist for long-running Symfony processes that use Doctrine:

  1. Register doctrine_clear_entity_manager middleware in your Messenger bus configuration. This is the baseline. It ensures clear() runs after every message without relying on individual handlers to remember.
  2. Add a custom middleware that calls gc_collect_cycles() after clearing. If your entities have bidirectional associations — and if you use Doctrine, they almost certainly do — the cyclic GC is the difference between a slow leak and no leak.
  3. Set --memory-limit on every worker. 256 MB is a reasonable starting point. This is a safety net, not a fix. If your worker hits this limit within minutes, the Identity Map clearing is not working and you need to investigate.
  4. Set --time-limit on every worker. 3600 seconds (1 hour) is sensible for most workloads. This guards against memory fragmentation, connection staleness, and leaks you have not found yet.
  5. Ensure APP_DEBUG=0 on worker processes. Symfony's profiler and Doctrine's SQL logger accumulate data for the lifetime of the process and are genuine memory leaks in long-running workers. Neither should ever be active in production.
  6. Audit your Doctrine event listeners for entity references stored in instance properties. These survive clear() and grow without bound.
  7. Consider DBAL for simple write-only handlers. If the handler's job is "update a status column," an ORM entity is overhead without benefit. One SQL statement through DBAL does the job with zero Identity Map impact.
  8. Use toIterable() for large read operations in CLI commands and batch jobs. It keeps memory flat by fetching rows one at a time from a database cursor, rather than loading the full result set into memory.
  9. Instrument your middleware. Log UoW size, GC collections, and memory usage periodically. You cannot manage what you cannot measure, and memory behavior in production often diverges from what profiling in development suggests.
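Items 2 and 9 of this checklist combine naturally into a single middleware. The sketch below is illustrative (the class name and log fields are my own invention); it assumes only the standard Messenger MiddlewareInterface contract:

```php
<?php

// A sketch combining checklist items 2 and 9: clear the
// EntityManager, run the cyclic GC, and periodically log
// memory behavior. Class name and log fields are illustrative.

use Doctrine\ORM\EntityManagerInterface;
use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;

class GcInstrumentationMiddleware implements MiddlewareInterface
{
    private int $handled = 0;

    public function __construct(
        private EntityManagerInterface $em,
        private LoggerInterface $logger,
    ) {}

    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        try {
            return $stack->next()->handle($envelope, $stack);
        } finally {
            // Measure the UnitOfWork before clearing it
            $uowSize = $this->em->getUnitOfWork()->size();
            $this->em->clear();
            $collected = gc_collect_cycles();

            if (++$this->handled % 100 === 0) {
                $this->logger->info('worker memory checkpoint', [
                    'messages_handled' => $this->handled,
                    'uow_size_before_clear' => $uowSize,
                    'cycles_collected' => $collected,
                    'memory_mb' => round(memory_get_usage(true) / 1048576, 1),
                ]);
            }
        }
    }
}
```

Because this middleware clears the EntityManager itself, it can stand in for doctrine_clear_entity_manager; if you register both, place this one closer to the handler so the UnitOfWork size is measured before the other middleware clears it.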

The Identity Map is not a flaw in Doctrine's design. It is a pattern optimized for the lifecycle PHP has historically provided — short-lived, request-scoped processes. As Symfony Messenger, RoadRunner, and FrankenPHP extend that lifecycle, the application must take responsibility for the cleanup that PHP-FPM used to perform for free. The good news is that the cleanup is straightforward: clear the Identity Map, collect the cycles, set a memory limit. The patterns above will see your workers through whatever the message queue sends their way.

If the connections give you trouble, you know where to find me.


The Symfony Messenger middleware pattern above manages memory. The PostgreSQL connections those workers hold deserve equal attention. I have written a guide to PHP-FPM and PostgreSQL connection exhaustion that covers persistent connections, pool sizing under FPM, and the connection accumulation patterns that long-running workers amplify.