Django: Modernize index definitions with zero downtime

2020-07-27 Book Index

Update (2024-06-30): My tool django-upgrade now automatically rewrites most index_together declarations, so there’s no need to do this by hand.

Update (2022-11-16): Meta.index_together will be properly deprecated in Django 4.2, so there’s more reason to modernize your index definitions now!

If you’ve read the Django documentation for Model.Meta.index_together recently, you may have noticed this note:

Use the indexes option instead. The newer indexes option provides more functionality than index_together. index_together may be deprecated in the future.

Django historically provided index control for a single field with Field(db_index=True), and for multiple fields in Meta.index_together. These are good for specifying indexes for one or more fields, but they don’t give you access to the full power of database indexes.

The Meta.indexes option was added in Django 1.11 (2017) to allow use of more index features through the Index() class. Initially Index() added support for indexes with descending ordering. It now supports db_tablespace to control storage, opclasses to use PostgreSQL’s various operator classes for indexes, and condition to create partial indexes that don’t contain every row.

“Upgrading”

So, how do you “upgrade” from Field(db_index=True) or Meta.index_together to Meta.indexes?

Well first, this isn’t necessary. Neither feature is actually deprecated, and they’re not likely to be either. If you have an old project using either Field(db_index=True) or Meta.index_together, you’re best leaving it in place and using indexes for new indexes.

But this change is a good example of how to make a “zero downtime” migration, with low risk. It can be a good be a nice exercise for learning more about Django’s migrations.

Let’s take this model:

from django.db import models


class Status(models.TextChoices):
    UNPUBLISHED = "UN", "Unpublished"
    PUBLISHED = "PB", "Published"


class Book(models.Model):
    status = models.CharField(
        max_length=2,
        choices=Status.choices,
        default=Status.UNPUBLISHED,
    )
    title = models.CharField(max_length=200)

    class Meta:
        index_together = [["status", "title"]]

(N.B. the Status class is using Django 3.0’s new enumeration types.)

Our model uses index_together, which we’ll change to use indexes. The process should be similar to change Field(db_index=True) to use indexes.

We’ll look at two methods. The first uses a rebuild of the index, which can take some time to run on large tables. The second retains the existing index for “zero downtime.”

Note we’ll not be changing the definition of the index at all. If you want to upgrade an index to use any of the extra features of Index(), such as condition, databases typically cannot change the index in-place. You’ll need to add a new index in one migration, then remove the original index in a second migration.

Rebuilding Method

To rebuild, we’d need only to drop index_together and add indexes with an equivalent Index() defined:

from django.db import models


class Book(models.Model):
    status = models.CharField(
        max_length=2,
        choices=Status.choices,
        default=Status.UNPUBLISHED,
    )
    title = models.CharField(max_length=200)

    class Meta:
        indexes = [
            models.Index(
                name="core_book_status_title_idx",
                fields=["status", "title"],
            )
        ]

When we run makemigrations, we’ll end up with a migration file like this:

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ("core", "0001_initial"),
    ]

    operations = [
        migrations.AlterIndexTogether(name="book", index_together=set()),
        migrations.AddIndex(
            model_name="book",
            index=models.Index(
                fields=["status", "title"], name="core_book_status_title_idx"
            ),
        ),
    ]

This is functional. But if we run sqlmigrate, we’ll see that it does DROP INDEX followed by CREATE INDEX:

$ python manage.py sqlmigrate core 0002
BEGIN;
--
-- Alter index_together for book (0 constraint(s))
--
DROP INDEX "core_book_status_title_6099efdb_idx";
--
-- Create index core_book_status_title_idx on field(s) status, title of model book
--
CREATE INDEX "core_book_status_title_idx" ON "core_book" ("status", "title");
COMMIT;

This isn’t great for large tables, where creating an index might take hours. Additionally PostgreSQL and SQLite will lock the table for writes whilst they make the new index. On PostgreSQL, we could swap AddIndex for AddIndexConcurrently (Django 3.0+) to prevent the lock.

Let’s look at the second method that avoids the work of recreating the index.

Zero Downtime Method

To achieve zero down, we need to add the new Index() definition using the existing index name, and then write a migration that tells Django nothing needs to change in the database.

The first thing we need is the name that Django auto-generated for the index. This combines the table name, included field names, and a hash. The hashing algorithm has changed a couple of times in Django’s history versions, so to be safe we’ll retrieve the index name from the database. We can do this with a little SQL in dbshell.

For example, on SQLite, we can run the .indexes command to list the indexes on our model’s table, and pick ours from the list:

python manage.py dbshell
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .indexes core_book
...
core_book_status_title_6099efdb_idx
...
sqlite>

On MariaDB/MySQL, the query to run is:

SHOW INDEXES FROM core_book;

On PostgreSQL, the query to run is:

SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public' AND tablename = 'core_book'
ORDER BY tablename, indexname;

It’s worth checking the index name is identical across all your environments (development, staging, production). The name might differ between environments if their databases were initially created with different Django versions, and thus different hashing algorithms. If the names do differ, we’d probably want to rename the index on all environments to match production.

Second, we want to move this into an Index() definition, inside Meta.indexes, using the found name:

from django.db import models


class Book(models.Model):
    status = models.CharField(
        max_length=2,
        choices=Status.choices,
        default=Status.UNPUBLISHED,
    )
    title = models.CharField(max_length=200)

    class Meta:
        indexes = [
            models.Index(
                name="core_book_status_title_6099efdb_idx",
                fields=["status", "title"],
            )
        ]

If we run the check command at this point, we’ll see an error:

$ python manage.py check
SystemCheckError: System check identified some issues:

ERRORS:
core.Book: (models.E034) The index name 'core_book_status_title_6099efdb_idx' cannot be longer than 30 characters.

System check identified 1 issue (0 silenced).

The new Index() restricts its names to 30 characters to be compatible with Oracle. This is fair enough, and especially applicable to Django core and third party packages which should be compatible with all database backends. If you’re using Oracle, the old index_together name should be < 30 characters.

For other backends we have more characters to work with:

SQLite - 1,000,000
PostgreSQL - 63
MariaDB/MySQL - 64

In this case, we can safely disable the check. Do this by adding the check ID to the SILENCED_SYSTEM_CHECKS setting:

SILENCED_SYSTEM_CHECKS = [
    # Allow index names >30 characters, because we aren’t using Oracle
    "models.E034",
]

This is a little bit dangerous as it removes the check for every index. However tests should discover if any future index has an overly long index name, because the database should raise an error during migrations.

(N.B. there’s an open ticket to allow more granular system check silencing.)

Rerunning check will show it is now silenced:

$ python manage.py check
System check identified no issues (1 silenced).

We should then run makemigrations with flags to make a new migration:

$ python manage.py makemigrations core --name book_indexes
Migrations for 'core':
  index_change/core/migrations/0002_book_indexes.py
    - Alter index_together for book (0 constraint(s))
    - Create index core_book_status_title_6099efdb_idx on field(s) status, title of model book

(We passed --name to avoid the automatic migration name.)

Our new migration is identical to the previous downtime-inducing one. We need to modify it to allow Django’s migrations to consider these changes as applied, without running any SQL.

Enter SeparateDatabaseAndState.

This operation class takes two lists of migration operations. database_operations is compiled to SQL and run on the database. state_operations is applied to the in-memory version of models. This separation allows us to perform some operations in the database, and tell Django “another” thing happened to the actual model classes. It’s useful for changes that can’t be auto-detected correctly, for example Changing a ManyToManyField to use a through model.

In our case, we want to do nothing to the database, so we provide database_operations=[]. In the state layer, we want to tell Django that we’ve “removed” index_together and “added” the new index in the database. To do this, we can move the auto-generated AlterIndexTogether and AddIndex operations into our state_operations list.

Our migration ends up like this:

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ("core", "0001_initial"),
    ]

    operations = [
        migrations.SeparateDatabaseAndState(
            database_operations=[],
            state_operations=[
                migrations.AlterIndexTogether(
                    name="book",
                    index_together=set(),
                ),
                migrations.AddIndex(
                    model_name="book",
                    index=models.Index(
                        fields=["status", "title"],
                        name="core_book_status_title_6099efdb_idx",
                    ),
                ),
            ],
        ),
    ]

We can verify both sets of operations before running the migration.

First, we can check database_operations really does nothing with sqlmigrate:

$ python manage.py sqlmigrate core 0002
BEGIN;
--
-- Custom state/database change combination
--
COMMIT;

Great - no SQL statements there, except the normal BEGIN and COMMIT.

Second, we can check that state_operations does tell Django our migrations match the latest definition of our models. We do this by running makemigrations --dry-run to ensure the autodetector doesn’t find anything to change:

$ python manage.py makemigrations core --dry-run
No changes detected in app 'core'

Great!

This should now be ready to deploy, after our normal test suite passes :)

Fin

I hope this helps you understand this improved way of defining indexes, and how to write zero downtime migrations,

—Adam

😸😸😸 Check out my new book on using GitHub effectively, Boost Your GitHub DX! 😸😸😸

One summary email a week, no spam, I pinky promise.

Related posts:

Django’s Field Choices Don’t Constrain Your Data

Tags: django