Django: fixing a memory “leak” from Python 3.14’s incremental garbage collection

There’s a hole in my migratory bucket…

Back in February, I encountered an out-of-memory error while migrating a client project to Python 3.14. The issue occurred when running Django’s database migration command (migrate) on a limited-resource server, and seemed to be caused by the new incremental garbage collection algorithm in Python 3.14.

At the time, I wrote a workaround and started on this blog post, but other tasks took priority and I never got around to finishing it. But four days ago, Hugo van Kemenade, the Python 3.14 release manager, announced that the new garbage collection algorithm will be reverted in Python 3.14.5, and the next Python 3.15 alpha release, due to reports of increased memory usage.

Here’s the story of my workaround, as extra evidence that reverting incremental garbage collection is a good call.

Python 3.14’s incremental garbage collection

Python (well, CPython) has a garbage collector that runs regularly to clean up unreferenced objects. Most objects are cleaned up immediately when their reference count drops to zero, but some objects can be part of reference cycles, where some set of objects reference each other and thus never reach a reference count of zero. The garbage collector sweeps through all objects to find and clean up these cycles.
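The distinction between reference counting and cycle collection is easy to see in a small sketch:

```python
import gc


class Node:
    """A simple object that can participate in a reference cycle."""

    def __init__(self):
        self.other = None


# Two objects referring to each other form a cycle: neither reference
# count can drop to zero, even once no outside names refer to them.
a = Node()
b = Node()
a.other = b
b.other = a
del a, b

# Only the cyclic garbage collector can reclaim them. gc.collect()
# returns the number of unreachable objects it found, which here
# includes at least our two Node instances.
collected = gc.collect()
print(collected >= 2)  # → True
```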

Python 3.14 changed garbage collection to operate incrementally. Previously, a garbage collection run would sweep through all objects in one go, but this could lead to “stop the world” stalls where your program’s real work could pause for seconds while the garbage collector did its job. The incremental garbage collection algorithm instead does a fraction of the work at a time, spreading out the cost of garbage collection.

Here’s the full release note (historical source):

Incremental garbage collection

The cycle garbage collector is now incremental. This means that maximum pause times are reduced by an order of magnitude or more for larger heaps.

There are now only two generations: young and old. When gc.collect() is not called directly, the GC is invoked a little less frequently. When invoked, it collects the young generation and an increment of the old generation, instead of collecting one or more generations.

The behavior of gc.collect() changes slightly:

  • gc.collect(1): Performs an increment of garbage collection, rather than collecting generation 1.
  • Other calls to gc.collect() are unchanged.

(Contributed by Mark Shannon in gh-108362.)
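The gc.collect() behaviour described in the note can be exercised directly; this sketch assumes nothing beyond the standard gc module and runs on any recent Python version:

```python
import gc

# A full collection: unchanged by the incremental collector, this
# always sweeps everything and returns the number of objects found.
found = gc.collect()

# A targeted call: under the incremental collector, gc.collect(1)
# performs one increment of work rather than collecting "generation 1";
# on other versions it collects generations 0 and 1.
gc.collect(1)

# The allocation thresholds that trigger automatic collection remain
# inspectable and tunable in both schemes.
print(gc.get_threshold())
```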

The problem

I’d been helping one of my clients upgrade to Python 3.14 for a few months, chipping away at compatibility work like upgrading dependencies and fixing deprecations. Tests were finally all passing and everything was working on the local development server. The next stop was to launch a temporary deployment using Python 3.14 via Heroku’s review apps feature.

At the basic tier, Heroku review apps use fairly resource-constrained servers, including just 512MB of RAM, with the ability to temporarily burst up to nearly 1GB (200%). Paying for larger servers is an option, but unfortunately the next step up is pretty expensive.

When I launched a review app for my Python 3.14 branch, I found its release phase failed while running migrate. Inspecting the logs, I found the migrations started fine:

$ heroku logs --app example-python-314-wsgk3w --num 1000 | less
...
app[release.6634]: System check identified no issues (26 silenced).
app[release.6634]: Operations to perform:
app[release.6634]: Apply all migrations: admin, auth, contenttypes, ...
app[release.6634]: Running migrations:

…but partway through, these messages started appearing:

heroku[release.6634]: Process running mem=527M(101.5%)
heroku[release.6634]: Error R14 (Memory quota exceeded)

…ramping up until the 200% mark:

heroku[release.9599]: Process running mem=977M(190.3%)
heroku[release.9599]: Error R14 (Memory quota exceeded)

…and finally the termination of the release process:

heroku[release.9599]: Process running mem=1033M(201.7%)
heroku[release.9599]: Error R15 (Memory quota vastly exceeded)
heroku[release.9599]: Stopping process with SIGKILL

These messages came from Heroku’s process management layer, which terminated the memory-hungry release process with SIGKILL after the hard threshold of 1GB memory usage was breached. Repeat attempts hit the same issue.

I was confused: migrations should not consume much memory. While they create a lot of temporary objects (Django model classes and fields) in order to calculate the SQL to send to the database, such objects are all short-lived and should be garbage-collected fairly swiftly. Additionally, migrations worked fine on the local and CI environments, and they’d never had memory issues on previous Python versions.

It looked like there was a memory leak, and it was time to dig in.

Initial investigation

I first profiled memory usage of migrate locally using Memray, the memory profiler that I covered in my previous post, using:

$ memray run manage.py migrate

The profiles showed that memory usage had increased slightly on Python 3.14 compared to 3.13, but revealed no memory leak (a pattern of continual growth). Still, I made some optimizations to defer some imports, saving about 30% of startup memory usage, and tried again, to no avail.
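Deferring imports is a simple pattern. As a hypothetical sketch (the project’s actual heavy imports were, of course, different, with json standing in here):

```python
def generate_report(data):
    # Moving the import inside the function means its memory cost is
    # only paid if and when this code path actually runs, rather than
    # at process startup.
    import json

    return json.dumps(data)


print(generate_report({"rows": 3}))  # → {"rows": 3}
```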

I then had the idea to profile on a Heroku dyno directly. After hacking the release process to not run migrations, I built a review app and SSH’d into its web server:

$ heroku ps:exec -a example-python-314-rspwtc --dyno web.1 bash
Establishing credentials... done
Connecting to web.1 on ⬢ example-python-314-rspwtc...
~ $

Initially, I tried using Memray’s live mode to profile the migrations as they ran:

$ memray run --live manage.py migrate

While live mode looks great for some situations, it didn’t really work here, especially since it seized up after Heroku terminated the server.

I then tried running the default memray run command:

$ memray run manage.py migrate
Writing profile results into memray-manage.py.724.bin

…then, on my local computer, I repeatedly ran this command to copy down the results file:

$ trash memray-manage.py.724.bin && heroku ps:copy -a example-python-314-rspwtc --dyno web.1 memray-manage.py.724.bin

I was a bit worried here that the Memray binary file might be corrupted due to copying it while memray run was generating it. But with a final truncated copy left over after the server crashed, I asked Memray to generate a flamegraph for it:

$ memray flamegraph memray-manage.py.724.bin

…and it worked! Kudos to the Memray team for making their output format usable even when incomplete.

This more detailed flamegraph revealed that more than 50% of the memory usage was allocated in ModelState.render(), which creates temporary model classes:

class ModelState:
    ...

    def render(self, apps):
        """Create a Model object from our current state into the given apps."""
        ...
        return type(self.name, bases, body)

This information hinted that these temporary model classes were hanging around beyond their expected short lifetime, leading to the memory leak. For example, every model class could have ended up in a list intended for debugging, accidentally extending the lifetime of these temporary classes.
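That failure mode is easy to reproduce in miniature. This sketch (with a hypothetical debug_log list) uses a weak reference to observe whether an object outlives its intended scope:

```python
import gc
import weakref


class TemporaryModel:
    """Stands in for a temporary class created during migrations."""


debug_log = []  # hypothetical list that accidentally holds strong references


def render():
    obj = TemporaryModel()
    debug_log.append(obj)  # the accidental reference
    return weakref.ref(obj)


ref = render()
gc.collect()
print(ref() is not None)  # → True: still alive, kept by debug_log

debug_log.clear()
gc.collect()
print(ref() is None)  # → True: collectable once the reference is gone
```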

I decided to dig a bit deeper using machete-mode debugging, with the below snippet that captures the temporary model classes and logs details about them. I wrote this within the Django settings file, where it was guaranteed to run at Django startup time, before the migrate management command.

import atexit
import gc
import tracemalloc
import weakref
from itertools import islice

from django.db.migrations.state import ModelState

tracemalloc.start(2)

orig_render = ModelState.render

rendered_classes = weakref.WeakSet()


def wrapped_render(*args, **kwargs):
    cls = orig_render(*args, **kwargs)
    rendered_classes.add(cls)
    return cls


ModelState.render = wrapped_render


@atexit.register
def show_referrers():
    print(f"🎯 {len(rendered_classes)} classes referred to.\n")

    for cls in islice(rendered_classes, 2):
        print(f"🎁🎁🎁 {cls!r} 🎁🎁🎁")
        for i, referrer in enumerate(gc.get_referrers(cls), start=1):
            print(f"🍌 Referrer #{i}: {referrer!r}")
            if tb := tracemalloc.get_object_traceback(referrer):
                print("\n".join(tb.format(most_recent_first=True)))
            print()
        print()
        print()

Note:

  1. tracemalloc.start() starts Python’s built-in memory allocation tracking.
  2. The ModelState.render() method was monkeypatched with a wrapper that stores every temporary model class in a WeakSet.
  3. The @atexit.register-decorated function runs at the end of the program, and logs two things.
  4. The first piece of logging is the number of temporary model classes still alive at the end of the program, which should be close to zero. (Some may stick around from the final migration state.)
  5. The second piece of logging iterates over the first two live temporary model classes and logs their name and their referring objects, discovered via gc.get_referrers(). For each referring object, it also logs the traceback of where that object was allocated, using tracemalloc.get_object_traceback() (which is why tracemalloc.start() was needed at the beginning).
  6. The emojis are a bit of fun to make the log messages easier to skim through. I have no idea why I picked 🎁 and 🍌!!
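The WeakSet is what lets the hook track every rendered class without itself extending their lifetimes, which would defeat the measurement. A minimal demonstration of that property:

```python
import gc
import weakref


class Tracked:
    pass


live = weakref.WeakSet()

obj = Tracked()
live.add(obj)
print(len(live))  # → 1: tracked while something else keeps it alive

del obj
gc.collect()
print(len(live))  # → 0: the WeakSet did not keep it alive
```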

The output from this hook was voluminous, even with the limit to the first two live classes. For example, here’s the output for a temporary ContentType model class:

🎁🎁🎁 <class '__fake__.ContentType'> 🎁🎁🎁
🍌 Referrer #1: <generator object WeakSet.__iter__ at 0x1234ef300>
  File "/.../example/core/apps.py", line 45
    for cls in islice(rendered_classes, 2):

...

🍌 Referrer #11: {'name': 'model', ..., 'model': <class '__fake__.ContentType'>}
  File "/.../.venv/lib/python3.14/site-packages/django/utils/functional.py", line 47
    res = instance.__dict__[self.name] = self.func(instance)
  File "/.../.venv/lib/python3.14/site-packages/django/db/models/fields/__init__.py", line 1210
    self.validators.append(validators.MaxLengthValidator(self.max_length))

I checked the live referrers for a few classes, and they all seemed to be expected. However, it did reveal just how many cycles exist between ORM objects. For example, model classes refer to their field objects, which in turn refer back to their model classes, thanks to Django’s Field.contribute_to_class() creating this reference:

def contribute_to_class(self, cls, name, private_only=False):
    ...
    self.model = cls
    ...

Anyway, from comparing the output between Python 3.13 and 3.14, I could see that no new references were being created on Python 3.14. It seemed likely that the incremental garbage collection algorithm was the culprit.

The workaround

Given the investigation, I wanted to work around the issue by forcing a full garbage collection sweep with gc.collect() after each migration file ran. I came up with the below code, saved as management/commands/migrate.py in one of the project’s Django apps. It extends the default migrate command to run gc.collect() after each successful migration (where “apply” is forwards and “unapply” is backwards).

import gc

from django.core.management.commands.migrate import Command as BaseCommand


class Command(BaseCommand):
    """Extended 'migrate' command."""

    def migration_progress_callback(self, action, migration=None, fake=False):
        """
        Extend Django’s migration progress reporting to force garbage
        collection after each migration. This is a workaround to keep memory
        usage low, especially because we have a low limit on Heroku. It seems
        the incremental garbage collector introduced in Python 3.14 cannot
        keep up with the migration process’s tendency to create many cyclical
        objects, so our best fallback is to force collection of everything
        after each migration is applied or unapplied.

        https://adamj.eu/tech/2026/04/20/django-python-3.14-incremental-gc/
        """
        super().migration_progress_callback(action, migration=migration, fake=fake)
        if action in ("apply_success", "unapply_success"):
            gc.collect()

It felt a bit hacky, but it did the trick! The review app launched successfully, showing a flat memory profile as before.

We then continued to deploy to staging and production without any issues, and the team have been happily using Python 3.14 for over a month now.

Fin

Well, that’s where the tale ends right now. After the incremental garbage collection algorithm is reverted in Python 3.14.5, I guess I’ll be able to remove this workaround.

While it would be nice to have incremental garbage collection work well, it’s clear that the current implementation has some issues. I think the core team is making the right call reverting it, but hopefully there will be energy to improve the feature for the future.

May your garbage be collected efficiently and without fuss,

—Adam

