Synchronizing Django model definitions

Cloned crystals

This is about a small problem we faced with the models used for customers in YPlan, now Time Out Checkout.

Customers are stored in two models: Customer for active customers, and RemovedCustomer for inactive customers. When a customer closes their account, a subset of the fields are copied to RemovedCustomer, to comply with data retention policies, and then the original Customer is wiped. The two models are defined something like this:

class Customer(models.Model):
    name = models.CharField(max_length=128, blank=True)
    email = models.CharField(max_length=128, null=True, unique=True)
    # etc.

class RemovedCustomer(models.Model):
    name = models.CharField(max_length=128, blank=True)
    email = models.CharField(max_length=128, null=True)
    # etc.

name and email are two of the fields copied on closure. Their definitions are nearly identical, except that email on RemovedCustomer is not unique, because the same email address could be used for multiple accounts that get removed one after another.

The problem we faced was keeping the definitions of these fields synchronized, intentional differences like unique aside. Initially the two model classes were declared in the usual way, as above, with the field definitions copy-pasted. This meant that changes to one model needed copying to the other. Unfortunately this got forgotten when a field on Customer had its max_length extended: the change wasn't copied to RemovedCustomer, and the account close function broke for customers whose data used the new, longer length, as it couldn't be copied into RemovedCustomer.

The solution was obvious: we wanted a way to declare that this field should be the same as that field, allowing for overrides like unique for email. Django doesn’t have any built-in function to do this, but it’s not hard to make your own, given a few nice things about Python classes and Django models.

Firstly, there is nothing special about constructing a field in a Django model’s class body. Python class bodies are code contexts like any other, populating a dict that goes on to become the class. Any model ‘magic’ from Django happens after the class body finishes executing, when its Model metaclass rearranges the fields in the class dict and does other processing. Therefore we don’t need to use field classes to create field objects - we can use a function that returns one instead, for example:

class RemovedCustomer(models.Model):
    name = plz_clone_field(Customer, 'name')
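(plz_clone_field is just a placeholder name at this point - the thing to notice is that any plain function call can supply a class attribute. A minimal, Django-free sketch of that idea:)

```python
# A class body is ordinary code: any expression can supply an attribute,
# including a function call, before any metaclass processing happens.
def make_attr():
    return "hello"

class Demo:
    greeting = make_attr()  # runs at class-definition time
```

Django's Model metaclass only sees the resulting class namespace, so a field returned by a function is indistinguishable from one constructed inline.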

Secondly, Django fields are fairly easy to clone. They can’t be copied with copy.deepcopy(), because they get ‘attached’ to the model class by the model meta ‘magic’. However, they do have a handy method called deconstruct(), used for serializing in migrations, which returns a 4-tuple that describes how to reconstruct the field object. Using this we can create a fresh clone of a field, doing something like:

from django.utils.module_loading import import_string

name, klass_path, fargs, fkwargs = field.deconstruct()
field_class = import_string(klass_path)
new_field = field_class(*fargs, **fkwargs)

In our code we created a simple function clone_field based on this snippet. Given a model class, the name of a field to clone from it, and any keyword-arg overrides, it returns a clone of that field. Using it for our models above, it looks like:

def clone_field(model_class, name, **kwargs):
    name, klass_path, fargs, fkwargs = model_class._meta.get_field(name).deconstruct()
    fkwargs.update(kwargs)  # apply any overrides, e.g. unique=False
    field_class = import_string(klass_path)
    return field_class(*fargs, **fkwargs)

class RemovedCustomer(models.Model):
    name = clone_field(Customer, 'name')
    email = clone_field(Customer, 'email', unique=False)

This elegantly declares what to copy, along with any differences. Because this happens at class definition time, it can't affect any of the model meta 'magic', as the fields 'look' as if they were normally constructed. And this prevents the bug we saw - a change to, e.g., max_length on Customer.name would be synchronized to RemovedCustomer.name automatically, and Django migrations would detect it for both models equally.

Pretty neat!

Tags: django