Django and the N+1 Queries Problem

Snakey snake

This is a cross-post from the Scout APM blog, where I occasionally write. I also maintain the Scout Python integration.

The N+1 Queries Problem is a perennial database performance issue. It affects many ORM’s and custom SQL code, and Django’s ORM is not immune either.

In this post, we’ll examine what the N+1 Queries Problem looks like in Django, some tools for fixing it, and most importantly some tools for detecting it. Naturally Scout is one of those tools, with its built-in N+1 Insights tool.

We’ll cover:

What Is the N+1 Queries Problem?

In a nutshell: code that loops over a list of results from one query, and then performs another query per result.

Let’s look at a basic example and a couple of its natural extensions.

N+1 Queries

Say we had this code:

books = Book.objects.order_by("title")
for book in books:
    print(book.title, "by", book.author.name)

This code uses print() for simplicity, but the problem exists in all other ways data might be accessed, such as in templates or views returning JSON responses.

Let’s trace the queries our code executes:

  • One line 1, we create the Book QuerySet. Because QuerySets are lazy, this doesn’t execute any query.
  • On line 2, we iterate over the Book QuerySet. This causes the QuerySet to evaluate and fetch its results, which takes one query.
  • One line 3, we access a couple of attributes from each Book. title is a field on the Book itself, so Django fetched it in the first query. But the author attribute is a foreign key, so Django didn’t fetch it. Therefore Django executes another query on this line to fetch the given Author corresponding to book.author_id. The Author instance then includes its name field.

So we execute one query to fetch all the Books, and N queries to fetch the authors, where N is the number of books. This can become a performance problem because each query involves the overhead of communicating with the database server, so the total runtime can be high.

The problem is sometimes also called The 1+N Queries Problem to match the order of operations. N+1 seems to be a more popular name though, perhaps because it highlights problematic N queries.

For another perspective, we can look at the actual queries our code executes. The first query would look like:

SELECT
  id,
  title,
  author_id,
  ...
FROM book
ORDER BY title

The following N author queries would look like:

SELECT
  id,
  name,
  ...
FROM author
WHERE id = %s

(With %s replaced by the each author ID value.)

The repetition of queries like SELECT ... WHERE id = %s is a key signature of the problem. You can spot it with a tool that logs your application’s SQL queries (more these later).

2N+1 Queries

If we access multiple foreign key attributes in the loop, we’ll increase the number of queries. Say we changed the code to display our authors’ home countries as well:

books = Book.objects.order_by("title")
for book in books:
    print(
        book.title,
        "by",
        book.author.name,
        "from",
        book.author.country.name,
    )

The access to author.country foreign key executes another query per loop. So we now have two times the N queries we had before.

NM+N+1 Queries

If we nest loops, we can execute multiple queries per N query, compounding the problem.

Imagine our models changed to allow multiple authors per book. We would adjust our code like so:

books = Book.objects.order_by("title")
for book in books:
    print(book.title, "by: ", end="")
    names = []
    for author in book.authors.all():
        names.append(f"{author.name} from {author.country.name}")
    print(", ".join(names))

Line 5 executes each N query, fetching the list of authors per book. Line 6 then accesses the country foreign key per author. If there are an average of M authors per book, there will be N times M country queries.

Often either N or M is small, so the total number of queries will be fairly controlled. But it’s possible to see really large numbers of queries if both are bigger. If we added more nested loops, there would be further multiplication of queries.

Often such nested loops will not be so easy to spot. They can be obscured by intermediate layers of function calls or template includes.

Tools to Fix the N+1 Queries Problem

Django provides two QuerySet methods that can turn the N queries back into one query, solving the performance issue. These methods are select_related() (docs) and prefetch_related() (docs).

They work similarly - both fetch the related model alongside the original query. The difference is that select_related() fetches the related instances in the same query, whilst prefetch_related() uses a second query. Let’s look at them in turn, then a bonus tool that can automatically use prefetch_related().

django-auto-prefetch

Here I’d like to highlight a special project, django-auto-prefetch. Using this for your models changes Django’s behaviour when accessing unfetched foreign keys.

Recall our first example:

books = Book.objects.order_by("title")
for book in books:
    print(book.title, "by", book.author.name)

On line 3 Django fetches the single author relating to the current book. This happens on each of the N iterations of the loop.

django-auto-prefetch changes this behaviour to fetch all the authors relating to all the fetched books, in one query. It essentially assumes that since one book.author was required, they all will be.

This is equivalent to using prefetch_related() in the original Book QuerySet, hence the library’s name. It has the big advantage that we don’t need to know ahead of time which fields are required.

Installation of django-auto-prefetch is slightly involved, as it requires swapping the base class for all your models. But it’s worth it on larger projects.

The library was created by my friend Gordon Wrigley. We hope to merge it into Django at some point, so the behaviour would be more accessible. It might even eradicate the N+1 Queries Problem for Django applications.

Tools for Finding N+1 Query Problems

N+1 Query Problems are the most frequent Django ORM performance problem. It’s worth having some tooling that can help you find them. Let’s look at three tools that can help you.

django-debug-toolbar

django-debug-toolbar is one of the most popular third party Django packages. I don’t start a project without it. Once installed, it provides a debug panel on each view that shows us several panels of information, including the executed SQL queries.

For example if we had our first example inside a view, the SQL panel would show our queries:

django-debug-toolbar SQL panel showing one query for books and 3 for authors

The panel highlights the N Author queries with the text:

3 similar queries. Duplicated 2 times.

We can expand the queries to see their full SQL and stack traces. This allows us to spot the problematic lines in our code.

django-toolbar is very useful, and the SQL panel is only one of its features. But it has some limitations for finding N+1 Queries.

First, the toolbar can only appear on views that return HTML. This means you can’t use it to analyze views that return JSON, management commands, background tasks, etc. It can work with Django REST Framework’s Browsable API though, since such views return HTML.

Second, using the toolbar requires you to know which pages are slow. It doesn’t provide any proactive protection - we need to remember to open and read the SQL panel on each page during development.

Finally, the toolbar is for development only. We can’t use it in production, which means we can’t inspect the performance issues that are affecting real users. Often the N is small in development and larger in production, so performance issues can appear only in production.

nplusone

The nplusone package logs warnings for potential N+1 Queries. The package works but seems slightly unmaintained, with no commits for two years.

After installing it, we can see several log messages for our example, interlaced with our print() statements:

Potential n+1 query detected on `Book.author`
The Hound of the Baskervilles by Arthur Conan Doyle
Potential n+1 query detected on `Book.author`
The Hundred and One Dalmatians by Dodie Smith
Potential n+1 query detected on `Book.author`
The Lost World by Arthur Conan Doyle
[30/Jul/2020 06:23:46] "GET / HTTP/1.1" 200 9495

These “Potential n+1 query detected” log messages can help us track down the problem.

nplusone is more focussed than django-debug-toolbar, but is also usable with other ORM’s, such as SQL Alchemy. It has many of the same limitations though.

We can use it for all views. But like django-debug-toolbar, it can’t help with management commands, background tasks, and other code paths.

It’s always active, but requires us to manually inspect the logs.

Unfortunately the log message provides only a small amount of information. The log message provides no information about which line of our code is executing the N+1 Queries. We could get stack traces by configuring logging to transform the messages into exceptions, perhaps in development only. Though this may trigger false positives.

We can use nplusone in production. To receive notifications of problematic pages, we could configure our logging provider to alert on the string “Potential n+1 query detected”.

Scout APM

Scout APM automatically detects N+1 Queries in your application, and is suitable for both development and production. It can trace any view, background task, or other code path, and its traces provide rich information for debugging where performance issues occur.

Other APM (Application Performance Monitoring) products can also help with debugging N+1 Queries, but most don’t automatically highlight them.

After following the three step Django installation guide for our example application and loading the problematic view, the N+1 Queries is highlighted on the dashboard:

Scout APM dashboard showing n+1 insights panel

The top graph provides some insights into what our application is currently doing, and is followed by some useful summary statistics.

The “N+1 Insights” tab is what we’re here to look at. This tab lists views in which Scout detected N+1 Query execution. It uses a 150 millisecond threshold so we can focus on problems that really affect users.

If we click on the view name, we’re taken to the relevant trace:

Scout APM trace showing N+1 Queries

The trace shows a timeline of the view’s execution. The problematic repeated query has the “N+1” tag and shows the full SQL. We can click on its “backtrace” button to reveal the full stack trace, allowing us to pinpoint the problematic code.

Fin

I hope this guide has equipped you to find and solve your N+1 Query Problems,

—Adam


Learn how to make your tests run quickly in my book Speed Up Your Django Tests.


Subscribe via RSS, Twitter, Mastodon, or email:

One summary email a week, no spam, I pinky promise.

Related posts:

Tags: