How to Add Database Modifications Beyond Migrations to Your Django Project

2019-08-07

On several Django projects I’ve worked on, there has been a requirement for performing database modifications beyond Django migrations. For example:

Managing stored procedures
Managing check constraints, which weren’t supported before Django 2.2
Importing static data from a file
Recording migration operations in a log

Let’s look at three approaches to extending Django to do this as neatly as possible.

1. Use Django migrations

Often I find developers have only been taught how to use Django migrations for model operations. They might know some SQL, but since they haven’t used it within Django, they assume migrations can’t use SQL directly. They can! Many of the uses of custom migration-style SQL code that I’ve seen could be better implemented within migrations.

Let’s take a look at an example we’ll use for the rest of the article. Imagine you’re running a Django version before 2.2 that doesn’t support database check constraints. You might add one to a database table with this SQL:

ALTER TABLE
  myapp_book
ADD
  CONSTRAINT percent_lovers_haters_sum CHECK (
    (percent_lovers + percent_haters) = 100
  );

This constraint will make the database raise an error for any rows added or updated in the myapp_book table that have a percentage of lovers and haters not equal to 100. Neat!
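If you’d like to see a check constraint reject a row without setting up a project, here’s a minimal sketch using Python’s built-in sqlite3 module. It’s a stand-in, not the Django setup: SQLite’s ALTER TABLE has historically not supported ADD CONSTRAINT, so the constraint is declared when the table is created, and the function name demo_check_constraint is made up for illustration.

```python
import sqlite3


def demo_check_constraint():
    """Return (rejected, row_count) after one valid and one invalid insert."""
    conn = sqlite3.connect(":memory:")
    # Stand-in table: the constraint is declared up front rather than added
    # later with ALTER TABLE as in the article's SQL.
    conn.execute(
        """
        CREATE TABLE myapp_book (
            id INTEGER PRIMARY KEY,
            percent_lovers INTEGER NOT NULL,
            percent_haters INTEGER NOT NULL,
            CONSTRAINT percent_lovers_haters_sum CHECK (
                (percent_lovers + percent_haters) = 100
            )
        )
        """
    )
    insert = "INSERT INTO myapp_book (percent_lovers, percent_haters) VALUES (?, ?)"
    conn.execute(insert, (60, 40))  # sums to 100, so it is accepted
    try:
        conn.execute(insert, (60, 50))  # sums to 110, so the database refuses it
    except sqlite3.IntegrityError:
        rejected = True
    else:
        rejected = False
    row_count = conn.execute("SELECT COUNT(*) FROM myapp_book").fetchone()[0]
    conn.close()
    return rejected, row_count
```

Only the valid row ends up in the table. The bad one never gets in, whichever code path tried to write it.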

With Django’s default table naming, the table myapp_book would be created for a model called Book inside the app myapp. We’ll use those names in this article too.

The SQL above could be run using django.db.connection:

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute(
        """
        ALTER TABLE
          myapp_book
        ADD
          CONSTRAINT percent_lovers_haters_sum CHECK (
            (percent_lovers + percent_haters) = 100
          );
        """
    )

This works, but it’s a bit ad-hoc. You might run it using manage.py shell or, better, in a custom management command (more on those later!).

Instead of doing those though, we can turn it into a migration.

First, you can create a new migration with:

$ python manage.py makemigrations --empty myapp
Migrations for 'myapp':
  myapp/migrations/0101_auto_20190715_1057.py

You can then edit that migration file to use the RunSQL operation:

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ("myapp", "0100_add_book"),
    ]

    operations = [
        migrations.RunSQL(
            """
            ALTER TABLE
              myapp_book
            ADD
              CONSTRAINT percent_lovers_haters_sum CHECK (
                (percent_lovers + percent_haters) = 100
              );
            """
        )
    ]

A little bonus step is to rename the migration file to something descriptive. For example, instead of 0101_auto_20190715_1057.py we could call this 0101_add_book_percentage_sum_constraint.py. This helps a lot in the long term.

To improve the migration beyond our initial SQL, we can add the reverse_sql argument. This tells Django how to reverse the migration. You’ll rarely need to reverse a migration, but when you do… you really do! Be prepared, as I learned in the Scouts.

For our example, we can expand the migration operation to:

migrations.RunSQL(
    sql="""
        ALTER TABLE
          myapp_book
        ADD
          CONSTRAINT percent_lovers_haters_sum CHECK (
            (percent_lovers + percent_haters) = 100
          );
    """,
    reverse_sql="ALTER TABLE myapp_book DROP CONSTRAINT percent_lovers_haters_sum",
)

This will remove the check constraint when reversing the operation.

If the code you want to run is more complex, for example you want to run something on every model/table, you might use the RunPython operation. With it you can do pretty much anything - the power of Python!

It also supports a reverse option, the reverse_code argument, which is worth adding - check out the docs.

Evaluation

This approach fits many use-cases, and it also helps provide a lifetime for any objects created by your custom SQL. For example, if we wanted to drop our constraint at a later date, we could create a similar migration with DROP CONSTRAINT as the forwards SQL. Beautiful symmetry!

One drawback is that a migration only runs once per database. This is a poor fit for something that needs running frequently, as you’d have to add a new migration operation each time.

You might consider writing a custom migration operation or hooking into the pre_migrate signal. However in my experience, the following two approaches would be easier.

2. Override the ‘migrate’ management command

This feels like a bit of a secret feature, given how few projects I’ve seen use it. However, I have found it handy on several occasions.

Django allows you to override management commands by adding another with the same name. You can override commands from Django core, or those in other apps. This is documented in “custom management commands”.

When running a command, Django searches through all the apps in INSTALLED_APPS in order, and then the core. The first place it finds a management command with the given name wins.

Thus, your apps can override any built-ins.
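To make the precedence concrete, here’s a toy model of the lookup in plain Python. It’s a simplified sketch of the rule described above, not Django’s actual code, and resolve_command and its arguments are invented for the example.

```python
def resolve_command(name, installed_apps, app_commands, core_commands):
    """Return the label of whichever place provides the command `name`."""
    # Start from the core commands...
    providers = {command: "django.core" for command in core_commands}
    # ...then let each app overwrite entries. Walking the list in reverse
    # means apps listed earlier are applied last, so they win.
    for app in reversed(installed_apps):
        for command in app_commands.get(app, ()):
            providers[command] = app
    return providers.get(name)
```

With installed_apps set to ["myapp", "otherapp"] and both apps shipping a migrate command, the lookup for "migrate" lands on myapp, while a command that only the core provides still resolves to the core.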

To override migrate and add your custom behaviour, you’ll want to create myapp/management/commands/migrate.py (replacing myapp with the name of one of your apps). Inside that file you can then subclass the built-in migrate command and add your own behaviour. For example:

from django.core.management.commands.migrate import Command as CoreMigrateCommand
from myapp.db import create_constraints


class Command(CoreMigrateCommand):
    def handle(self, *args, **options):
        # Do normal migrate
        super().handle(*args, **options)

        # Then our custom extras
        create_constraints()

(The custom management commands documentation is likely helpful for writing your own command.)
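The create_constraints function imported above isn’t part of Django, it’s something you’d write yourself in myapp/db.py. Since it runs after every migrate, it needs to be safe to repeat. Here’s one possible sketch, assuming PostgreSQL (for DROP CONSTRAINT IF EXISTS) and that the myapp_book table already exists. The helper apply_constraints and the using parameter are my own additions.

```python
CONSTRAINT_NAME = "percent_lovers_haters_sum"


def apply_constraints(cursor):
    """Recreate the check constraint; safe to call after every migrate."""
    # Assumes PostgreSQL: dropping with IF EXISTS first makes the pair of
    # statements repeatable.
    cursor.execute(
        f"ALTER TABLE myapp_book DROP CONSTRAINT IF EXISTS {CONSTRAINT_NAME}"
    )
    cursor.execute(
        f"""
        ALTER TABLE myapp_book
        ADD CONSTRAINT {CONSTRAINT_NAME} CHECK (
            (percent_lovers + percent_haters) = 100
        )
        """
    )


def create_constraints(using="default"):
    # Imported inside the function so apply_constraints() can be exercised
    # with any cursor-like object, without Django being configured.
    from django.db import connections

    with connections[using].cursor() as cursor:
        apply_constraints(cursor)
```

Splitting out apply_constraints keeps the SQL testable with a fake cursor, while create_constraints() can still be called with no arguments as in the command above.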

Overriding works because all throughout Django’s code, migrate is called as a command rather than imported directly. This is done with the call_command function. So, if you’ve overridden migrate, your new version is called instead of the one from core.

We can see this in Django’s test framework code. Its setup_databases function calls the create_test_db method for each connection. These in turn run call_command('migrate') like so (as of version 2.2.3):

# We report migrate messages at one level lower than that requested.
# This ensures we don't get flooded with messages during testing
# (unless you really ask to be flooded).
call_command(
    "migrate",
    verbosity=max(verbosity - 1, 0),
    interactive=False,
    database=self.connection.alias,
    run_syncdb=True,
)

You can override any built-in command.

Evaluation

This approach is more flexible than using RunSQL in migrations. We can add any code we want before or after migrate runs - or even “during” with a context manager.
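As an illustration of the “during” idea, here’s a small context manager that logs when a run starts and how long it took, which would cover the “recording migration operations in a log” use case from the introduction. The name log_run and the logger name are assumptions. It only uses the standard library, and the comment at the end shows where it could sit in the overridden command.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("myapp.migrate")


@contextmanager
def log_run(label):
    """Log when the wrapped block starts and how long it took to finish."""
    start = time.monotonic()
    logger.info("%s started", label)
    try:
        yield
    finally:
        # Runs on success and on failure, so failed runs are recorded too.
        logger.info("%s finished in %.2fs", label, time.monotonic() - start)


# In the overridden command, handle() could wrap the core behaviour:
#
#     def handle(self, *args, **options):
#         with log_run("migrate"):
#             super().handle(*args, **options)
```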

The major drawback here is we can only override once, in a single app, so it could feel a bit clumsy if we have several app-specific extensions. However, for most projects, I’d recommend keeping it simple.

Having a single “project app” can work really well - I endorse the recommendation in Kristian Glass’ Unofficial FAQ. If you already have multiple apps, you can make one the “core,” have it contain your custom migrate, and then import code from the others. This will be just fine.

That said, sometimes we want looser coupling, for example when creating third party packages. So let’s look at a final approach.

3. Adding a ‘post_migrate’ signal handler

This approach is slightly more advanced again. It uses Django’s signals, which have a mixed reputation due to their “action at a distance.”

Django sends the post_migrate signal at the very end of migration operations. You can see this in the migrate source code.

To run some extra code at that point, write it as a signal handler. Registering a signal handler is best done in an AppConfig.ready() method, which Django will call at initialization time.

For an example, let’s look at Django’s contenttypes framework. This is included as django.contrib.contenttypes. It uses a post_migrate signal handler to create one ContentType model instance for each model.

The create_contenttypes handler is registered in its AppConfig.ready() like so:

class ContentTypesConfig(AppConfig):
    name = "django.contrib.contenttypes"
    verbose_name = _("Content Types")

    def ready(self):
        pre_migrate.connect(inject_rename_contenttypes_operations, sender=self)
        post_migrate.connect(create_contenttypes)
        checks.register(check_generic_foreign_keys, checks.Tags.models)
        checks.register(check_model_name_lengths, checks.Tags.models)

The handler is defined in the app’s management/__init__.py. This is not the most descriptive filename to contain a signal handler - I’d normally use a handlers.py within the app. The contenttypes framework has it there for historical reasons.

In Django 2.2.3, create_contenttypes is defined like so:

def create_contenttypes(
    app_config,
    verbosity=2,
    interactive=True,
    using=DEFAULT_DB_ALIAS,
    apps=global_apps,
    **kwargs
):
    """
    Create content types for models in the given app.
    """
    if not app_config.models_module:
        return

    app_label = app_config.label
    try:
        app_config = apps.get_app_config(app_label)
        ContentType = apps.get_model("contenttypes", "ContentType")
    except LookupError:
        return

    content_types, app_models = get_contenttypes_and_models(
        app_config, using, ContentType
    )

    if not app_models:
        return

    cts = [
        ContentType(
            app_label=app_label,
            model=model_name,
        )
        for (model_name, model) in app_models.items()
        if model_name not in content_types
    ]
    ContentType.objects.using(using).bulk_create(cts)
    if verbosity >= 2:
        for ct in cts:
            print("Adding content type '%s | %s'" % (ct.app_label, ct.model))

There is quite a lot of logic here. We can ignore most of it right now though, as it’s use-case specific, but you can get the gist by reading it.

The interesting thing to look at is the function signature.

Signal handlers are only called with keyword arguments. For forwards compatibility, they should accept any extras at the end in **kwargs, so that unrecognized arguments added by the sender don’t break the handler - more loose coupling.
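The forwards compatibility point is easy to see with a toy dispatcher. This is plain Python, not Django’s Signal class, and send, tolerant_handler, and strict_handler are invented names. The only thing it borrows from Django is the idea of calling receivers with keyword arguments.

```python
def send(receivers, sender, **named):
    """Call every receiver with keyword arguments only, like a signal send."""
    return [receiver(sender=sender, **named) for receiver in receivers]


def tolerant_handler(sender, using="default", **kwargs):
    # Extra keyword arguments land in **kwargs and are simply ignored.
    return f"migrated {sender} on {using}"


def strict_handler(sender, using="default"):
    # No **kwargs: any argument the sender adds later raises TypeError.
    return f"migrated {sender} on {using}"
```

If the sender starts passing an extra argument, say plan, the tolerant handler carries on working, while the strict one fails with a TypeError the moment the signal is sent.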

The arguments in the create_contenttypes signature are all as per the post_migrate documentation:

app_config is the current AppConfig - the signal is sent once for each app.
verbosity is the current logging level.
interactive tells us if it’s safe to ask the user for input.
using is the database connection alias, which will vary from DEFAULT_DB_ALIAS when using multiple databases.
apps is an application registry containing the specific state of all models after the migrations have run. Because the user might not have run every migration available, this should be used to access model classes, instead of direct imports.

For writing most custom handlers, I think two of these are the most useful.

First, app_config can be used to restrict your handler to only run for a specific app or set of apps. You can also do this with the sender argument to Signal.connect().

Second, using is worth passing through for any database operations you use, even if only for future proofing. Your project might use a single database now, but it might not in the future, so you should make sure you operate on the same connection that was migrated.

Evaluation

As we’ve discussed, the benefit of this approach is the looser coupling. If you’re writing a third party package, this is probably the way to go, as it reduces the number of things that users need to install. Signals do require caution, but since Django itself uses this approach in a contrib app, it’s a sanctioned use.

Fin

I hope this article helps you find the right approach for your project,

—Adam

Tags: django, python
