“Create Table As Select” in Django

A discussion recently came up on django-developers mailing list about how to build the SQL query CREATE TABLE ... AS SELECT (CTAS) with Django’s ORM. This statement and its cousin INSERT ... SELECT are useful for re-shaping data inside your database, using SELECT queries.
They can be useful for building aggregate tables from complicated, slow queries, as a way of imitating materialized views on databases that don’t support them. Django doesn’t support these at current, but I found it’s not hard to implement using some Django ORM internals, and have an example function below. (Support might have changed by the time you’re reading this - check the linked mailing list thread!)
Example Use Case
For example, you might have a widget_summary table for analytics purposes built with a SELECT from your widget table joined with flimflams. You can then regularly rebuild the table with that CTAS query.
We do this in a transaction on databases with transactional DDL (PostgreSQL, SQLite). Start a transaction, drop the table, recreate it with CTAS, and commit.
On databases without transactional DDL, we need a temporary table. Create the table with a new name, hot-swap it in place with the atomic RENAME, then drop the old one.
Let's look at the non-transactional method. The SQL looks like:
/* Drop temporary tables in case the process got cut off before */
DROP TABLE IF EXISTS widget_summary_old, widget_summary_new;
/* Create the new table from the summary query */
CREATE TABLE widget_summary_new AS
SELECT
widget.id,
widget.name,
flimflam.garbles
FROM
products
INNER JOIN
flimflam
ON widget.flimflam_id = flimflam.id;
/* Hot-swap rename */
RENAME TABLE widget_summary TO widget_summary_old,
widget_summary_new TO widget_summary;
/* Drop old table */
DROP TABLE widget_summary_old;
In Django
Here’s a function that allows CTAS in Django today:
from django.db import DEFAULT_DB_ALIAS, connections
def create_table_as(table_name, queryset, using=DEFAULT_DB_ALIAS):
compiler = queryset.query.get_compiler(using=using)
sql, params = compiler.as_sql()
connection = connections[DEFAULT_DB_ALIAS]
sql = "CREATE TABLE " + connection.ops.quote_name(table_name) + " AS " + sql
with connection.cursor() as cursor:
cursor.execute(sql, params)
I’ve tested it on Django 2.2 but it should work with older versions.
It works by:
- Getting the connection and creating a cursor for the database named by
using. - Using the
QuerySet’s internalqueryproperty to get itscompiler. This is part of Django’s internal API, meaning it’s undocumented and may change between Django versions without warning. In practice, the SQL compilation is quite stable, and I don’t believe these have changed much in a while. For an internal example, seeModelIterable.__iter__. - Building SQL with the compiler, but not executing it. This is the main job of the compiler. See the full code if you dare!
- Modifying the SQL with the CTAS prefix. We use the
quote_namemethod to ensure the table name is escaped on the database backend. This is something the Django ORM always does to avoid accidental self-SQL injection. Table names can contain spaces! - Executing the SQL with its parameters using the standard Python DB-API.
Modifying this code to do INSERT ... SELECT should need changes only to the sql = line.
With this, we can recreate the above widgets example like so:
with connection.cursor() as cursor:
# /Drop temporary tables in case the process got cut off before
cursor.execute("DROP TABLE IF EXISTS widget_summary_old, widget_summary_new")
# Create the new table from the summary query
create_table_as(
"widget_summary_new",
Widget.objects.values_list("id", "name", "flimflam__garbles"),
)
# Hot-swap rename
cursor.execute(
"""\
RENAME TABLE widget_summary TO widget_summary_old,
widget_summary_new TO widget_summary
"""
)
# Drop old table
cursor.execute("DROP TABLE widget_summary_old")
The QuerySet here is quite simple, but you can use all the ORM’s features to extend it to dizzying heights.
😸😸😸 Check out my new book on using GitHub effectively, Boost Your GitHub DX! 😸😸😸
One summary email a week, no spam, I pinky promise.
Related posts:
- Getting a Django Application to 100% Test Coverage
- Working Around Memory Leaks in Your Django Application
- Synchronizing Django model definitions
Tags: django