Backporting a Django ORM Feature with Database Instrumentation2020-07-29
Last week I covered Django’s database instrumentation, and making a wrapper that’s always installed. Here’s a different use case that I encountered last year on a project.
I was helping optimize query patterns for a model - let’s call it
An import function used a loop to import new model instances based on a unique
for title in titles: Book.objects.get_or_create(title=title, defaults=...)
This function took several seconds as it inserted several hundred records, required for the project to work. This was particularly noticeable at the start of tests.
(Surprise, this is also a test optimization story!)
We can speed up such an import with the
QuerySet.bulk_create() method, passing
This method inserts a list of instances into the database in as few queries as possible (as little as one).
ignore_conflicts=True option skips any instance that conflicts with a unique or primary key constraint:
Book.objects.bulk_create( [Book(title=title, ...) for title in titles], ignore_conflicts=True, )
This is a great solution, but unfortunately it wasn’t available.
The client’s project was on Django 2.0 at the time and the
ignore_conflicts argument to
bulk_create() was added in 2.2.
Upgrading Django would be a rather large yak shave.
Instead, I decided to backport the behaviour of
Normal backporting takes a lot of work:
- Fork the Django repository
- Merge the commit from the later Django version back to your version’s branch (dealing with any merge conflicts)
- Install Django from the forked repository
- Keep your fork up to date with every new minor version.
Instead of doing that, I opted to try backport the behaviour in a different way.
I used database instrumentation to edit the SQL generated by the ORM to match what
I implemented this in a context manager:
from django.db import connection with make_bulk_create_ignore_conflicts(connection): Book.objects.bulk_create( [Book(title=title, ...) for title in titles], ignore_conflicts=True, )
The context manager was a little more involved than solely setting up the database instrumentation wrapper:
from contextlib import contextmanager from unittest import mock import django if django.VERSION >= (2, 2): raise AssertionError( "On Django 2.2+, replace make_bulk_create_ignore_conflicts with use" + " of bulk_create's new argument ignore_conflicts=True" ) @contextmanager def make_bulk_create_ignore_conflicts(connection): # Temporarily disable the backend feature flag to pretend to Django that # PostgreSQL doesn't support "RETURNING". This prevents it fetching the # ID's back, which is a bit different to normal bulk_create, but it # generates SQL which we can easily append to patch_features = mock.patch.object( connection.features, "can_return_ids_from_bulk_insert", False ) # Rewrite SQL to add the same SQL suffix on INSERT statements that Django # 2.2+ uses for ignore_conflict=True # https://github.com/django/django/blob/stable/2.2.x/django/db/models/sql/compiler.py#L1316 def make_inserts_conflict_do_nothing(execute, sql, params, many, context): if sql.startswith("INSERT INTO"): sql += " ON CONFLICT DO NOTHING" return execute(sql, params, many, context) apply_wrapper = connection.execute_wrapper(make_inserts_conflict_do_nothing) with patch_features, apply_wrapper: yield
Let’s look through the code one step at a time.
The first step is the Django version check. This is important so that after upgrading Django the project switches to the official feature. Without such checks, projects tend to accumulate workarounds. It raises an exception at import time to ensure it’s spotted early on when upgrading, regardless of test coverage.
The second step is the definition of the context manager itself.
It uses the
contextlib.contextmanager decorator for simplicity.
I made the name excessively long so it’s noticeable as doing something unusual.
The third step is the patching of
connection.features is part of Django’s undocumented (but mostly stable) database backend API.
The attributes on
features control SQL generation so the output is suitable for the given database.
The patch temporarily disables addition of the
RETURNING clause on
INSERT queries, to allow easy addition of the
ON CONFLICT clause.
Generally, patching global objects is not thread-safe. But it’s okay here because Django creates separate database connection objects per thread.
The fourth step is the database instrumentation wrapper function,
This wrapper inspects each query on its way to the database.
If the SQL is an
INSERT query, it appends the correct
" ON CONFLICT DO NOTHING" suffix, as per the PostgreSQL INSERT documentation.
The fifth step is the
with statement that applies the patch and adds the database instrumentation temporarily.
yield then pauses the
@contextmanager function until the user of our context manager exits its
This was a fun exercise in combining several Django features. I’ve since upgraded the client project to Django 2.2, which triggered the assertion message and allowed me to remove of this backport. But it worked well whilst it lasted.
Thanks to Tom Forbes for the initial implementation of
bulk_create(ignore_conflicts=True) back in Ticket #28668.
I hope this helps you speed up any data imports, or write ORM backports,
🎉 My book Speed Up Your Django Tests is now up to date for Django 3.2. 🎉
Buy now on Gumroad
One summary email a week, no spam, I pinky promise.
- How to Make Always-Installed Django Database Instrumentation
- The Fast Way to Test Django transaction.on_commit() Callbacks
- Disable Instrumentation Packages during Tests
© 2020 All rights reserved.