Working Around Memory Leaks in Your Django Application
2019-09-19
Several large Django applications that I’ve worked on ended up with memory leaks at some point. The Python processes slowly increased their memory consumption until crashing. Not fun. Even with automatic restart of the process, there was still some downtime.
Memory leaks in Python typically happen in module-level variables that grow unbounded. This might be an lru_cache with an infinite maxsize, or a simple list accidentally declared in the wrong scope.
Leaks don’t need to happen in your own code to affect you either. For example, see this excellent write-up by Peter Karp at BuzzFeed where he found one inside Python’s standard library (since fixed!).
The workarounds below all restart worker processes after a set number of requests or jobs. This is a simple way to clear out any potentially infinitely-accumulating Python objects. If your web server, queue worker, or similar has this ability but isn’t featured here, let me know and I’ll add it!
Even if you don’t see any memory leaks right now, adding these will increase your application’s resilience.
Gunicorn
If you’re using Gunicorn as your Python web server, you can use the --max-requests setting to periodically restart workers. Pair it with its sibling --max-requests-jitter to prevent all your workers restarting at the same time. This helps reduce the worker startup load.
For example, on a recent project I configured Gunicorn to start with:
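A sketch of such an invocation, with illustrative numbers and assuming a WSGI module named myproject.wsgi (the jitter here is 5% of max-requests):

```shell
gunicorn myproject.wsgi \
    --workers 4 \
    --max-requests 1000 \
    --max-requests-jitter 50
```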
For the project’s level of traffic, number of workers, and number of servers, this would restart workers about every 1.5 hours. The jitter of 5% was enough to de-correlate the restart load.
uWSGI
If you’re using uWSGI, you can use its similar max-requests setting. This also restarts workers after so many requests.
For example, on a previous project I used this setting in the uwsgi.ini file like:
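A sketch of the relevant uwsgi.ini section (the module name and values are illustrative):

```ini
[uwsgi]
module = myproject.wsgi
processes = 4
max-requests = 500
```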
uWSGI also provides the max-requests-delta setting for adding some jitter. But since it’s an absolute number, it’s more annoying to configure than Gunicorn’s jitter. If you change the number of workers or the value of max-requests, you will need to recalculate max-requests-delta to keep your jitter at a certain percentage.
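Per my reading of the uWSGI docs, each worker’s limit becomes max-requests plus worker_id × delta, so a hypothetical helper for that recalculation might look like:

```python
# Hypothetical helper: recompute max-requests-delta so that jitter stays
# near a target fraction of max-requests, assuming uWSGI adds
# (worker_id * delta) to each worker's request limit.
def max_requests_delta(max_requests, workers, target_jitter=0.05):
    # The highest worker ID gets (workers * delta) extra requests, so
    # spread the desired jitter budget across the workers.
    return max(1, int(max_requests * target_jitter / workers))

print(max_requests_delta(500, 4))  # 6 with these illustrative numbers
```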
If you’re using the uWSGI Spooler for background tasks, you’ll also want to set the spooler-max-tasks setting. This restarts a spooler process after it has processed so many background tasks. This is also set in the uwsgi.ini file.
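A sketch, again in uwsgi.ini (the spool directory and value are illustrative):

```ini
[uwsgi]
spooler = /var/spool/myapp
spooler-max-tasks = 100
```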
Celery
Celery provides a couple of different settings for memory leaks.
First, there’s the worker_max_tasks_per_child setting. This restarts worker child processes after they have processed so many tasks. There’s no option for jitter, but Celery tasks tend to have a wide range of run times, so there will be some natural jitter.
Or if you’re using Django settings:
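Assuming the common setup where Celery reads Django settings with a CELERY_ namespace (app.config_from_object("django.conf:settings", namespace="CELERY")), a sketch:

```python
# settings.py
# Restart each Celery worker child process after it has run 100 tasks.
# The CELERY_ prefix assumes config_from_object(..., namespace="CELERY").
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
```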
100 jobs is smaller than I suggested above for web requests. In the past I’ve ended up using smaller values for Celery because I saw more memory consumption in background tasks. (I think I also came upon a memory leak in Celery itself.)
The other setting you could use is worker_max_memory_per_child. This specifies the maximum kilobytes of memory a child process can use before the parent replaces it.
It’s a bit more complicated, so I’ve not used it.
If you do use worker_max_memory_per_child, you should probably calculate it as a percentage of your total memory, divided per child process.
This way if you change the number of child processes, or your servers’ available memory, it automatically scales.
For example (untested):
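A sketch of that calculation, assuming you know your worker concurrency (psutil is a third-party package, and the concurrency number is illustrative):

```python
import psutil

worker_concurrency = 4  # illustrative: your number of Celery child processes

# Give Celery up to 75% of total system memory, split evenly per child.
# worker_max_memory_per_child is measured in kilobytes.
celery_memory_kb = psutil.virtual_memory().total * 0.75 / 1024
worker_max_memory_per_child = int(celery_memory_kb / worker_concurrency)
```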
This uses psutil to find the total system memory. It allocates up to 75% (0.75) of that to Celery, which you’d only want if it’s a dedicated Celery server.
Tracking Down Leaks
Debugging memory leaks in Python isn’t the easiest, since any function could allocate a global object in any module. They might also occur in extension code integrated with the C API.
Some tools I have used:
- The standard library module tracemalloc, which can snapshot and compare Python memory allocations.
- The guppy3 package (among others) pre-dates tracemalloc and tries to do similar things. It’s a bit less user friendly, but I’ve used it successfully before.
- Scout APM which instruments every “span” (request, SQL query, template tag, etc.) with CPython memory allocation counts. Few APM solutions do this. Disclosure: I maintain the Python integration.
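As a minimal example of the tracemalloc approach, you can snapshot allocations around suspect code and print the top allocating lines:

```python
import tracemalloc

tracemalloc.start()

# Exercise the code you suspect of leaking - here, a stand-in allocation.
leaky = [str(i) for i in range(100_000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```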
Some other useful blog posts:
- BuzzFeed Tech’s write-up, a how-to guide on using tracemalloc on a production Python web service.
- Fugue’s write-up, also using tracemalloc.
- Benoit Bernard’s “Freaky Python Memory Leak” post where he uses a variety of tools to track down a C-level leak.
May you leak less,
Working on a Django project? Check out my book Speed Up Your Django Tests which covers loads of best practices so you can write faster, more accurate tests.
Tags: celery, django