Tom Cannaerts

Is your Symfony/Doctrine import consuming a lot of memory?

If you’ve ever needed to write a mass data import (e.g. a periodic product/stock import for a webshop) in a Symfony project and decided to use your Doctrine-backed objects/services from a Command object, chances are that you have run into high memory usage issues.

php app/console mybundle:products:stock:import

You will see the process consume more and more memory as it advances through the import, resulting in either an out-of-memory error or a performance degradation where your import processes only a fraction of the records per second compared to when it just started. In many cases, this can be traced back to the following two causes.

Doctrine EntityManager

The typical import scenario is to first do a lookup to see if the record already exists. If it does, compare the data and update accordingly. If it doesn’t, create a new record.
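In code, that pattern typically looks something like the sketch below (assuming a hypothetical Product entity with sku and stock fields; $rows holds the parsed import data):

$repository = $this->em->getRepository('MyBundle:Product');

foreach ($rows as $row) {
    // Look up the record by its unique key.
    $product = $repository->findOneBy(array('sku' => $row['sku']));

    if ($product === null) {
        // The record does not exist yet: create it.
        $product = new Product();
        $product->setSku($row['sku']);
        $this->em->persist($product);
    }

    // New or existing: update the data.
    $product->setStock($row['stock']);
}

$this->em->flush();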

In Symfony/Doctrine, these records are objects. The Doctrine EntityManager keeps a reference to all of these objects, preventing PHP from garbage-collecting them. Even when you unset your own reference to an object as you move to the next record, it is still kept in memory by Doctrine.

To overcome this, you can manually clear the EntityManager, causing it to release all references. Keep in mind that you will lose all changes that have not yet been flushed to the underlying datastore, so flush first.

$this->em->flush();
$this->em->clear();

Since flushing is an ‘expensive’ operation, this can affect performance. Flushing in batches will typically be faster than flushing after every record, but making your batches too big can cause problems of its own, as a large transaction might hold locks in your database for an extended period of time. It’s a balancing act between keeping memory usage low enough and performance high enough, so you will have to experiment a bit to find the optimal batch size for your specific situation.
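Put together, a batched import loop might look something like this sketch; the batch size of 100 is only an illustrative starting point to tune:

$batchSize = 100; // illustrative starting point, tune for your situation
$i = 0;

foreach ($rows as $row) {
    // ... look up, update or create a single record as shown earlier ...

    if (++$i % $batchSize === 0) {
        $this->em->flush(); // write pending changes to the database
        $this->em->clear(); // detach all entities so PHP can reclaim them
    }
}

// Flush and clear the final, incomplete batch.
$this->em->flush();
$this->em->clear();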

Logger

The second culprit is the Logger. When not specified, your command will be executed in the dev environment, which has a default log level of ‘debug’. This setting causes Doctrine to log each query to the logger. Since the logger keeps these in memory, your memory usage will also keep growing as the import advances.

The key here is to select the appropriate environment, or to adjust the logging settings for that environment. Selecting the environment is easy: just specify it using the --env option of app/console.

php app/console mybundle:products:stock:import --env=prod

Changing the logging config of your environment can be done by editing the relevant config_xxx.yml file.

# app/config/config_dev.yml
monolog:
    handlers:
        main:
            type: stream
            path: "%kernel.logs_dir%/%kernel.environment%.log"
            level: debug
            channels: [!event]

The two settings that control the logging are level and channels. Level is pretty straightforward: you select the minimum level of messages you want to log, going from debug, info, notice and warning up to error, critical, alert and emergency.

Channels allows you to specify which channels you do or don’t want to be processed by the handler. Adding !doctrine to the list of channels will prevent Doctrine events from being logged. Do note that you might have several handlers defined; you need to make sure the appropriate level/channels are selected for each of them to prevent the events from being processed by the logger.
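For example, extending the default dev config shown above to exclude the doctrine channel from the main handler would look like this:

# app/config/config_dev.yml
monolog:
    handlers:
        main:
            type: stream
            path: "%kernel.logs_dir%/%kernel.environment%.log"
            level: debug
            channels: [!event, !doctrine]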

Hi there! My name is Tom. I’m a Linux System Administrator and PHP Developer. My job is to keep PHP websites running as smoothly as possible. Being on both the ‘dev’ and ‘ops’ side gives me a broad skillset across the entire stack.

One Response

  1. bijsterdee, October 20, 2016 at 16:29

     Great article, just missing the hydration on demand for batch processing part. Without that, the process will still increase in memory size.
