Migrate PostgreSQL IDs from serial to identity after upgrading to Django 4.1

Let me regale you with the ballad of the serial to identity column migration…

The Django 4.1 release notes feature this short, innocent-looking note:

On PostgreSQL, AutoField, BigAutoField, and SmallAutoField are now created as identity columns rather than serial columns with sequences.

In this post, we’ll expand on what this means, why you might want to update existing columns from serial types to identity columns, and a management command to update them.

Serial versus identity

What are “identity” and “serial” columns? Well, basically they’re PostgreSQL’s two different ways to create auto-incrementing ID columns.

Originally PostgreSQL only had serial types, used like:

CREATE TABLE example (
    id serial NOT NULL PRIMARY KEY
);

These serial types are not true data types. Instead, each is a shorthand that creates an integer column plus a sequence to provide its default values.
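Per the PostgreSQL documentation on serial types, writing id serial is roughly equivalent to this longer form:

```sql
-- Roughly what "id serial" expands to behind the scenes:
CREATE SEQUENCE example_id_seq AS integer;

CREATE TABLE example (
    id integer NOT NULL DEFAULT nextval('example_id_seq') PRIMARY KEY
);

-- Tie the sequence's lifetime to the column, so dropping
-- the column drops the sequence too.
ALTER SEQUENCE example_id_seq OWNED BY example.id;
```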

PostgreSQL 10 (Oct 2017) added support for SQL-standard identity columns, used like:

CREATE TABLE example (
    id integer NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
);

(Documented under CREATE TABLE, search for “identity” on the page.)

As the syntax shows, the column has a data type, integer, and then “identity” behaviour on top of that. There is an associated sequence for generating values, but PostgreSQL manages this internally.
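Despite its name, pg_get_serial_sequence() reports the backing sequence for identity columns as well as serial ones, so you can still look the sequence up when you need to:

```sql
SELECT pg_get_serial_sequence('example', 'id');
-- Returns the (schema-qualified) name of the internal sequence.
```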

For regular usage, there is not much difference between the two types. INSERTed rows get incrementing integer primary keys, guaranteed unique.

But identity columns do have a number of advantages: they use SQL-standard syntax, their sequences are managed entirely by PostgreSQL (no separate object to grant permissions on, and the sequence is dropped along with the column), and the GENERATED ALWAYS variant can reject manually supplied values.

For more info on identity columns, see the PostgreSQL CREATE TABLE documentation linked above.

What changed in Django 4.1

Ticket #30511 updated Django’s variously sized AutoField classes to use identity columns on PostgreSQL. Initially the ticket discussed adding alternative field classes, but ultimately the decision was made to change the existing ones. This change was released in Django 4.1, with the aforementioned release note.

For example, take this model class:

from django.db import models


class Book(models.Model):
    title = models.TextField()

It has a classic implicit id field, using BigAutoField (the default since Django 3.2).

On version 4.0, Django makes the id column with the bigserial pseudo-type:

$ ./manage.py sqlmigrate core 0001
BEGIN;
--
-- Create model Book
--
CREATE TABLE "core_book" ("id" bigserial NOT NULL PRIMARY KEY, "title" text NOT NULL);
COMMIT;

But from version 4.1, Django uses the identity column syntax:

$ ./manage.py sqlmigrate core 0001
BEGIN;
--
-- Create model Book
--
CREATE TABLE "core_book" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "title" text NOT NULL);
COMMIT;

Nice and modern.

Should you upgrade previously created columns?

The change in Django only affects newly created columns. Django will not migrate any previously created serial columns.

Existing databases will use identity columns for new tables whilst retaining serial columns for pre-existing ones, so your staging and production databases will end up with a mix of types between old and new models. Meanwhile, if you rebuild your local development database occasionally, you’ll end up using only identity columns there.

The differences between the two types aren’t visible in typical Django code. You can carry on doing regular model things, like Model.objects.create(), and new IDs will be assigned the same way with both types.

But you might encounter some of the differences listed above, at some point in the future. For example, you might want to reset ID values (useful in tests), for which the syntax differs between the types. Thus, I think it’s best to update old serial types to identity columns, in all your environments’ databases.
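For example, resetting the ID counter to 1 takes a different statement for each type. Using the table and column names from the earlier Book model:

```sql
-- serial column: restart the backing sequence directly
ALTER SEQUENCE core_book_id_seq RESTART WITH 1;

-- identity column: restart via the table, since the
-- sequence is managed internally
ALTER TABLE core_book ALTER COLUMN id RESTART WITH 1;
```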

Simon Charette commented on the ticket:

I guess we could also include an update script in the release notes for users that want to avoid having a mix of serial and IDENTITY primary keys.

What follows is my attempt at an update script, in the form of a management command.

A management command to migrate from serial to identity

Below is the source for a management command that can update your columns from serial to identity. I’m just including it in this post since I don’t think it’s worth releasing as a package. Copy it into your project, e.g. as example/core/management/commands/migrate_postgres_serial_columns.py.

This code is released under the MIT license, no warranty of any kind, etc. If it breaks, you get to keep the shiny pieces.

The command loops over all serial-looking columns in the current database, checks each column’s related sequence, and, when dry run mode is disabled with --write, updates them. The migration process is adapted from Erwin Brandstetter’s answer to the Stack Overflow question “How to change a table ID from serial to identity?”

Due to its use of nextval(), even dry run mode will advance the sequence values, leaving a “gap”. But you shouldn’t rely on auto-incrementing IDs being gapless anyway. It’s possible to see gaps from other operations, such as promoting a replica, which can create a gap of 32 IDs. (If you need gapless IDs, use django-sequences.)

To be sure that concurrent inserts don’t advance the serial sequence, the update process takes an ACCESS EXCLUSIVE lock on the table. This prevents all concurrent activity on the table, but I think it should be okay even for high-concurrency environments as the update is brief. As a precaution, the command also sets a short statement timeout.

I’ve already run this command on my client Silvr’s production database. We migrated 68 ID columns successfully, with the largest table containing 9 million rows. The whole command completed in less than a second.

Here’s the code:

from __future__ import annotations

import argparse
from collections.abc import Callable
from typing import Any

from django.core.management.base import BaseCommand
from django.db import DEFAULT_DB_ALIAS, connections
from django.db.backends.utils import CursorWrapper
from django.db.transaction import atomic


class Command(BaseCommand):
    help = "Migrate all tables using 'serial' columns to use 'identity' instead."

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument(
            "--database",
            default=DEFAULT_DB_ALIAS,
            help='Which database to update. Defaults to the "default" database.',
        )
        parser.add_argument(
            "--write",
            action="store_true",
            default=False,
            help="Actually edit the database.",
        )
        parser.add_argument(
            "--like",
            default="%",
            help="Filter affected tables with a SQL LIKE clause on name.",
        )

    def handle(
        self, *args: Any, database: str, write: bool, like: str, **kwargs: Any
    ) -> None:
        def output(text: str) -> None:
            self.stdout.write(text)
            self.stdout.flush()

        if not write:
            output("In dry run mode (--write not passed)")

        with connections[database].cursor() as cursor:
            cursor.execute(find_serial_columns, (like,))
            column_specs = cursor.fetchall()
            output(f"Found {len(column_specs)} columns to update")

            cursor.execute("SET statement_timeout='3s'")

            for table_name, column_name in column_specs:
                migrate_serial_to_identity(
                    database, write, output, cursor, table_name, column_name
                )


# Adapted from: https://dba.stackexchange.com/a/90567
find_serial_columns = """\
SELECT
     a.attrelid::regclass::text AS table_name,
     a.attname AS column_name
FROM pg_attribute a
     JOIN pg_class c ON c.oid = a.attrelid
WHERE
    a.attrelid::regclass::text LIKE %s
    AND c.relkind IN ('r', 'p')  /* regular and partitioned tables */
    AND a.attnum > 0
    AND NOT a.attisdropped
    AND a.atttypid = ANY ('{int,int8,int2}'::regtype[])
    AND EXISTS (
        SELECT FROM pg_attrdef ad
        WHERE
            ad.adrelid = a.attrelid
            AND ad.adnum = a.attnum
            AND (
                pg_get_expr(ad.adbin, ad.adrelid)
                =
                'nextval('''
                || (
                    pg_get_serial_sequence(a.attrelid::regclass::text, a.attname)
                )::regclass
                || '''::regclass)'
            )
    )
ORDER BY a.attnum
"""


def migrate_serial_to_identity(
    database: str,
    write: bool,
    output: Callable[[str], None],
    cursor: CursorWrapper,
    table_name: str,
    column_name: str,
) -> None:
    # Adapted from “How to change a table ID from serial to identity?”
    # answer on Stack Overflow:
    # https://stackoverflow.com/a/59233169

    qn = cursor.db.ops.quote_name
    output(f"{qn(table_name)}.{qn(column_name)}")

    # Get sequence name
    cursor.execute(
        "SELECT pg_get_serial_sequence(%s, %s)",
        (table_name, column_name),
    )
    sequence_name = cursor.fetchone()[0]
    output(f"    Sequence: {sequence_name}")

    with atomic(using=database):
        # Prevent concurrent inserts so we know the sequence is fixed
        if write:
            cursor.execute(
                f"LOCK TABLE {qn(table_name)} IN ACCESS EXCLUSIVE MODE",
            )

        # Get next sequence value
        cursor.execute("SELECT nextval(%s)", (sequence_name,))
        next_value = cursor.fetchone()[0]

        output(f"    Next value: {next_value}")

        if write:
            # Drop default, sequence
            cursor.execute(
                f"""\
                ALTER TABLE {qn(table_name)}
                    ALTER COLUMN {qn(column_name)} DROP DEFAULT
                """
            )
            cursor.execute(f"DROP SEQUENCE {sequence_name}")

            # Change column to identity
            cursor.execute(
                f"""\
                ALTER TABLE {qn(table_name)}
                    ALTER {qn(column_name)}
                        ADD GENERATED BY DEFAULT AS IDENTITY (RESTART %s)
                """,
                (next_value,),
            )

            output("    Updated.")

Enjoy.

Update (2022-10-29): Adjusted the query in find_serial_columns to avoid a permissions error with TOAST tables, thanks to a report from Claude Paroz.

Example usage

Here are some runs of the command on an example project that has just the above Book model.

First, running the command identifies migratable columns:

$ ./manage.py migrate_postgres_serial_columns
In dry run mode (--write not passed)
Found 2 columns to update
"django_migrations"."id"
    Sequence: public.django_migrations_id_seq
    Next value: 2
"core_book"."id"
    Sequence: public.core_book_id_seq
    Next value: 1

The unmanaged django_migrations table also appears, sneaky.

If you want to focus on just one or a few tables at a time, you can use --like to filter the table names with a SQL LIKE clause:

$ ./manage.py migrate_postgres_serial_columns --like 'core_%'
In dry run mode (--write not passed)
Found 1 columns to update
"core_book"."id"
    Sequence: public.core_book_id_seq
    Next value: 2
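As an aside, LIKE’s wildcards can trip you up: % matches any run of characters and _ matches exactly one (so core_% would also match, say, corenbook). Here’s a small, illustrative Python sketch of those semantics (not part of the command, and core_author is a made-up table name):

```python
import re


def like_to_regex(pattern: str) -> re.Pattern[str]:
    # Translate a SQL LIKE pattern into a regex:
    # % matches any run of characters, _ matches exactly one.
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts) + r"\Z")


tables = ["core_book", "core_author", "django_migrations"]
matches = [t for t in tables if like_to_regex("core_%").match(t)]
print(matches)  # ['core_book', 'core_author']
```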

Running with --write actually migrates columns:

$ ./manage.py migrate_postgres_serial_columns --write
Found 2 columns to update
"django_migrations"."id"
    Sequence: public.django_migrations_id_seq
    Next value: 3
    Updated.
"core_book"."id"
    Sequence: public.core_book_id_seq
    Next value: 2
    Updated.

Re-running after this finds no more columns to update:

$ ./manage.py migrate_postgres_serial_columns
In dry run mode (--write not passed)
Found 0 columns to update

Awesome.

Because the command only acts on columns using serial, it’s safe to re-run as necessary.

You can verify the effects of the migration with psql, most easily via Django’s dbshell command. Before updating, you can see the "Default" is nextval() on an automatically-created sequence:

$ ./manage.py dbshell
psql (14.5 (Homebrew))
Type "help" for help.

example=# \d core_book
                            Table "public.core_book"
 Column |  Type  | Collation | Nullable |                Default
--------+--------+-----------+----------+---------------------------------------
 id     | bigint |           | not null | nextval('core_book_id_seq'::regclass)
 title  | text   |           | not null |
Indexes:
    "core_book_pkey" PRIMARY KEY, btree (id)

After updating, this changes to generated by default as identity:

$ ./manage.py dbshell
psql (14.5 (Homebrew))
Type "help" for help.

example=# \d core_book
                         Table "public.core_book"
 Column |  Type  | Collation | Nullable |             Default
--------+--------+-----------+----------+----------------------------------
 id     | bigint |           | not null | generated by default as identity
 title  | text   |           | not null |
Indexes:
    "core_book_pkey" PRIMARY KEY, btree (id)

Cool beans.

Fin

Thanks to Florian Apolloner, Michael Christofides, and Simon Charette for reviewing this post. And thanks to Greg Taperro of Silvr for helping test the command in production.

May your identity always be clear,

—Adam

