Django: launch pdb when a given SQL query runs

2024-12-05 Debug twice, fix once.

Here’s another technique for using pdb within Django. I will add this to the new debugging chapter of Boost Your Django DX in a future update.

When debugging Django code, you might need to start with a given SQL query and work backwards to find the code that triggered it. This can be surprisingly hard because the ORM doesn’t neatly map to SQL, and the laziness of QuerySets means queries can run far from where they were created.

One way to go from a query to the triggering code is to use database instrumentation. This allows you to write middleware-like functions for SQL queries. In this case, we’ll use that capability to detect when specific queries run and open the debugger.

(I’ve blogged about database instrumentation before, on backporting a Django ORM feature and always-installed instrumentation.)

Below is a context manager that temporarily installs a database instrumentation function. That function opens the debugger with breakpoint() for any query that looks like a SELECT on the example_book table.

from contextlib import contextmanager

from django.db import connection


@contextmanager
def breakpoint_on_target_query():
    def instrument(execute, sql, params, many, context):
        if sql.startswith("SELECT") and 'FROM "example_book"' in sql:
            breakpoint()
        return execute(sql, params, many, context)

    with connection.execute_wrapper(instrument):
        yield

Adjust the if clause as needed to match the query you’re interested in.

Use it by wrapping the code that triggers that query:

from example.tasks import update_authors

with breakpoint_on_target_query():
    update_authors()

When any matching query is executed, the program will pause at the breakpoint() line:

> /.../example/debugging.py(13)instrument()
-> return execute(sql, params, many, context)
(Pdb)

Now the debugger is open, you can inspect the query in the sql variable and the parameters in params:

(Pdb) sql
'SELECT "example_book"."id", "example_book"."title" FROM "example_book" WHERE "example_book"."id" = %s LIMIT 21'
(Pdb) params
(7,)

Then, carry on with your debugging to find the source of the query. For example, use the w (where) command to display a stack trace. Or use the u (up) command to travel up the stack, through Django’s internals, to your code. (Refer to my book or the pdb documentation for more details on using pdb.)

A worked example

I developed this technique when a client asked me to investigate a mysterious query. Loading any deferred field on a particular model triggered an unexpected extra query beyond the load of the deferred field.

I checked this in a Django shell session. First, I loaded a model instance with one field deferred:

In [1]: from django.db import connection, reset_queries

In [2]: from example.models import Book

In [3]: book = Book.objects.defer("description")

Then, I accessed the deferred field, using connection.queries from debug mode to see the executed queries:

In [4]: reset_queries()

In [5]: len(book.description)
Out[5]: 1842

In [6]: connection.queries
Out[6]:
[{'sql': 'SELECT "example_book"."id", "example_book"."description" FROM "example_book" WHERE "example_book"."id" = 1 LIMIT 21',
  'time': '0.030'},
{'sql': 'SELECT "example_book"."id", "example_book"."cover_image" FROM "example_book" WHERE "example_book"."id" = 1 LIMIT 21',
  'time': '0.001'}]

The first query that selected the id and description columns was expected. That is Django loading the deferred field. But the second one was unexpected. It selected the cover_image field, which wasn’t even being touched.

I added database instrumentation as above and adjusted the logic to match the second query:

from contextlib import contextmanager

from django.db import connection


@contextmanager
def breakpoint_on_target_query():
    def instrument(execute, sql, params, many, context):
        if sql.startswith(
            'SELECT "example_book"."id", "example_book"."cover_image" FROM "example_book"'
        ):
            breakpoint()
        return execute(sql, params, many, context)

    with connection.execute_wrapper(instrument):
        yield

Then, I re-ran the example under breakpoint_on_target_query(), opening the debugger when the query ran:

In [6]: with breakpoint_on_target_query():
  ...:      book.description
  ...:
> /.../example.py(13)instrument()
-> return execute(sql, params, many, context)
(Pdb)

The query was indeed the unexpected selection of cover_image:

(Pdb) sql
'SELECT "example_book"."id", "example_book"."cover_image" FROM "example_book" WHERE "example_book"."id" = %s LIMIT 21'

I ran w (where) and got a reasonably large 44-level stack trace. Here are the first and last two frames of it:

(Pdb) w
  /.../manage.py(62)<module>()
-> main()
  /.../manage.py(24)main()
-> execute_from_command_line(sys.argv)
...
-> return executor(sql, params, many, context)
  /.../.../django_read_only/__init__.py(86)blocker()
-> return execute(sql, params, many, context)
> /.../example.py(13)instrument()
-> return execute(sql, params, many, context)

(Nice, the penultimate one is from my library django-read-only, which also uses database instrumentation.)

After some looking and jumping around with u (up), I spotted that refresh_from_db() was called twice. Here are the relevant parts of the stack trace:

...
-> exec(code_obj, self.user_global_ns, self.user_ns)
  <ipython-input-6-899469e2f719>(2)<module>()
-> book.description
  /.../.../django/contrib/gis/db/models/proxy.py(37)__get__()
...
-> instance.refresh_from_db(fields=[field_name])
  /.../.../django/db/models/base.py(724)refresh_from_db()
-> db_instance = db_instance_qs.get()
  /.../.../django/db/models/query.py(645)get()
...
-> instance.refresh_from_db(fields=[field_name])
  /.../.../django/db/models/base.py(724)refresh_from_db()
-> db_instance = db_instance_qs.get()

It turns out that deferred fields call refresh_from_db() to load their missing data. That method constructs a QuerySet that selects only the missing field and calls get() on it, which creates a temporary model instance.

It turned out that the __init__ method of this particular model class read the cover_image field. Something like:

class Book(models.Model):
    ...

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._original_cover_image_name = self.cover_image.name

When the deferred call created an instance with only the id and description fields loaded, accessing cover_image required loading its deferred value. This triggered another call to refresh_from_db() and the second query.

The fix was to change the __init__ method to avoid reading the cover_image field when deferred.

Fin

May you dive freely into code and come to understand all that you encounter,

—Adam

😸😸😸 Check out my new book on using GitHub effectively, Boost Your GitHub DX! 😸😸😸

One summary email a week, no spam, I pinky promise.

Related posts:

Tags: django