How to Modernize a Django Index Definition with Zero Downtime
If you’ve read the Django documentation for
Model.Meta.index_together recently, you may have noticed this note:
indexesoption instead. The newer
indexesoption provides more functionality than
index_togethermay be deprecated in the future.
Django historically provided index control for a single field with
Field(db_index=True), and for multiple fields in
Meta.index_together. These are good for specifying indexes for one or more fields, but they don’t give you access to the full power of database indexes.
Meta.indexes option was added in Django 1.11 (2017) to allow use of more index features through the
Index() class. Initially
Index() added support for indexes with descending ordering. It now supports
db_tablespace to control storage,
opclasses to use PostgreSQL’s various operator classes for indexes, and
condition to create partial indexes that don’t contain every row.
So, how do you “upgrade” from
Well first, this isn’t necessary. Neither feature is actually deprecated, and they’re not likely to be either. If you have an old project using either
Meta.index_together, you’re best leaving it in place and using
indexes for new indexes.
But this change is a good example of how to make a “zero downtime” migration, with low risk. It can be a good be a nice exercise for learning more about Django’s migrations.
Let’s take this model:
from django.db import models class Status(models.TextChoices): UNPUBLISHED = "UN", "Unpublished" PUBLISHED = "PB", "Published" class Book(models.Model): status = models.CharField( max_length=2, choices=Status.choices, default=Status.UNPUBLISHED, ) title = models.CharField(max_length=200) class Meta: index_together = [["status", "title"]]
Status class is using Django 3.0’s new enumeration types.)
Our model uses
index_together, which we’ll change to use
indexes. The process should be similar to change
Field(db_index=True) to use
We’ll look at two methods. The first uses a rebuild of the index, which can take some time to run on large tables. The second retains the existing index for “zero downtime.”
Note we’ll not be changing the definition of the index at all. If you want to upgrade an index to use any of the extra features of
Index(), such as
condition, databases typically cannot change the index in-place. You’ll need to add a new index in one migration, then remove the original index in a second migration.
To rebuild, we’d need only to drop
index_together and add
indexes with an equivalent
from django.db import models class Book(models.Model): status = models.CharField( max_length=2, choices=Status.choices, default=Status.UNPUBLISHED, ) title = models.CharField(max_length=200) class Meta: indexes = [ models.Index( name="core_book_status_title_idx", fields=["status", "title"], ) ]
When we run
makemigrations, we’ll end up with a migration file like this:
from django.db import migrations, models class Migration(migrations.Migration): dependencies = [ ("core", "0001_initial"), ] operations = [ migrations.AlterIndexTogether(name="book", index_together=set()), migrations.AddIndex( model_name="book", index=models.Index( fields=["status", "title"], name="core_book_status_title_idx" ), ), ]
This is functional. But if we run
sqlmigrate, we’ll see that it does
DROP INDEX followed by
$ python manage.py sqlmigrate core 0002 BEGIN; -- -- Alter index_together for book (0 constraint(s)) -- DROP INDEX "core_book_status_title_6099efdb_idx"; -- -- Create index core_book_status_title_idx on field(s) status, title of model book -- CREATE INDEX "core_book_status_title_idx" ON "core_book" ("status", "title"); COMMIT;
This isn’t great for large tables, where creating an index might take hours. Additionally PostgreSQL and SQLite will lock the table for writes whilst they make the new index. On PostgreSQL, we could swap
AddIndexConcurrently (Django 3.0+) to prevent the lock.
Let’s look at the second method that avoids the work of recreating the index.
To achieve zero down, we need to add the new
Index() definition using the existing index name, and then write a migration that tells Django nothing needs to change in the database.
The first thing we need is the name that Django auto-generated for the index. This combines the table name, included field names, and a hash. The hashing algorithm has changed a couple of times in Django’s history versions, so to be safe we’ll retrieve the index name from the database. We can do this with a little SQL in
For example, on SQLite, we can run the
.indexes command to list the indexes on our model’s table, and pick ours from the list:
python manage.py dbshell SQLite version 3.24.0 2018-06-04 14:10:15 Enter ".help" for usage hints. sqlite> .indexes core_book ... core_book_status_title_6099efdb_idx ... sqlite>
On MariaDB/MySQL, the query to run is:
SHOW INDEXES FROM core_book;
On PostgreSQL, the query to run is:
SELECT tablename, indexname, indexdef FROM pg_indexes WHERE schemaname = 'public' AND tablename = 'core_book' ORDER BY tablename, indexname;
It’s worth checking the index name is identical across all your environments (development, staging, production). The name might differ between environments if their databases were initially created with different Django versions, and thus different hashing algorithms. If the names do differ, we’d probably want to rename the index on all environments to match production.
Second, we want to move this into an
Index() definition, inside
Meta.indexes, using the found name:
from django.db import models class Book(models.Model): status = models.CharField( max_length=2, choices=Status.choices, default=Status.UNPUBLISHED, ) title = models.CharField(max_length=200) class Meta: indexes = [ models.Index( name="core_book_status_title_6099efdb_idx", fields=["status", "title"], ) ]
If we run the
check command at this point, we’ll see an error:
$ python manage.py check SystemCheckError: System check identified some issues: ERRORS: core.Book: (models.E034) The index name 'core_book_status_title_6099efdb_idx' cannot be longer than 30 characters. System check identified 1 issue (0 silenced).
Index() restricts its names to 30 characters to be compatible with Oracle. This is fair enough, and especially applicable to Django core and third party packages which should be compatible with all database backends. If you’re using Oracle, the old
index_together name should be < 30 characters.
For other backends we have more characters to work with:
- SQLite - 1,000,000
- PostgreSQL - 63
- MariaDB/MySQL - 64
In this case, we can safely disable the check. Do this by adding the check ID to the
SILENCED_SYSTEM_CHECKS = [ # Allow index names >30 characters, because we aren’t using Oracle "models.E034", ]
This is a little bit dangerous as it removes the check for every index. However tests should discover if any future index has an overly long index name, because the database should raise an error during migrations.
(N.B. there’s an open ticket to allow more granular system check silencing.)
check will show it is now silenced:
$ python manage.py check System check identified no issues (1 silenced).
We should then run
makemigrations with flags to make a new migration:
$ python manage.py makemigrations core --name book_indexes Migrations for 'core': index_change/core/migrations/0002_book_indexes.py - Alter index_together for book (0 constraint(s)) - Create index core_book_status_title_6099efdb_idx on field(s) status, title of model book
--name to avoid the automatic migration name.)
Our new migration is identical to the previous downtime-inducing one. We need to modify it to allow Django’s migrations to consider these changes as applied, without running any SQL.
This operation class takes two lists of migration operations.
database_operations is compiled to SQL and run on the database.
state_operations is applied to the in-memory version of models. This separation allows us to perform some operations in the database, and tell Django “another” thing happened to the actual model classes. It’s useful for changes that can’t be auto-detected correctly, for example Changing a ManyToManyField to use a through model.
In our case, we want to do nothing to the database, so we provide
database_operations=. In the state layer, we want to tell Django that we’ve “removed”
index_together and “added” the new index in the database. To do this, we can move the auto-generated
AddIndex operations into our
Our migration ends up like this:
from django.db import migrations, models class Migration(migrations.Migration): dependencies = [ ("core", "0001_initial"), ] operations = [ migrations.SeparateDatabaseAndState( database_operations=, state_operations=[ migrations.AlterIndexTogether( name="book", index_together=set(), ), migrations.AddIndex( model_name="book", index=models.Index( fields=["status", "title"], name="core_book_status_title_6099efdb_idx", ), ), ], ), ]
We can verify both sets of operations before running the migration.
First, we can check
database_operations really does nothing with
$ python manage.py sqlmigrate core 0002 BEGIN; -- -- Custom state/database change combination -- COMMIT;
Great - no SQL statements there, except the normal
Second, we can check that
state_operations does tell Django our migrations match the latest defition of our models. We do this by running
makemigrations --dry-run to ensure the autodetector doesn’t find anything to change:
$ python manage.py makemigrations core --dry-run No changes detected in app 'core'
This should now be ready to deploy, after our normal test suite passes :)
I hope this helps you understand this improved way of defining indexes, and how to write zero downtime migrations,
If your Django project’s long test runs bore you, I wrote a book that can help.
One summary email a week, no spam, I pinky promise.