Extending Django's QuerySet to return approximate COUNTs2014-07-16
I was looking through the
for YPlan and discovered that there were a lot of
SELECT COUNT(*) queries going on, which take a long time because they require
a full table scan. These were coming from the
Django admin, which
displays the total count on every page.
SELECT COUNT(*) such a slow query?” you might think, “surely MySQL
could just keep a number in the table metadata and update it on INSERT/DELETE.”
Aha! You are totally right - for the MyISAM storage engine. But Innodb, which
you shoudl be using, provides transacational support and other niceties, at the
cost of making such a metadata count impossible. Each transaction must be
isolated from the others until it commits or rolls back, so a single ‘accurate’
COUNT(*) value per table is impossible. It would also be a point of
contention from locking, which MyISAM doesn’t care about anyway because it
locks the whole table for any write. Hence, Innodb
COUNT(*) = table scan.
Of coures, I wasn’t the only one with this problem, and a quick google found me this great blog post by ‘Avian’ in 2011. This post is my update on that for newer Django versions, with some extra knowledge I’ve gained reading up on MySQL.
Here’s my take on the code:
A few quick things to point out:
If the approximate count is not larger than 1000, the exact count will be obtained anyway via the super call.
EXPLAINto get MySQL to return the approximate count. This returns a tabular analysis of the query execution plan - including an estimate of the number of rows that will be searched (it may be quite a bit off, I’ve seen +-50% in the wild). More on
The query has a comment in it - this is good practice for any custom SQL as it makes the
slow_logand other query digests easy to link back to the part in the app that made the query.
The exact count will still be obtained with the super call if the query is any more complex than
MyModel.objects.count(). This code is just the first stage of optimization to remove the worst
SELECT COUNT(*)queries - most other queries the ORM generates should be able to narrow the count down using an index if you’ve gotten your index design right.
EXPLAIN and not
SHOW TABLE STATUS as in Avian’s blog? Because
EXPLAIN can be used to extract more detailed counts using index reads as well
- an extension left for the reader (i.e. I haven’t needed this yet). But here’s
an example MySQL session showing that
EXPLAINing a simple WHERE clause on an indexed column can extract an approximate count too. It’s only 100% accurate in this case because this table is small (from the example sakila database):
My second step in this fix this was to make sure this only applied in Django’s
admin area, and couldn’t affect any logic in other parts of the app which might
rely on exact counts. This is easy enough if you’re using a custom admin base
class for every
Admin class in your app:
The final step was a cosmetic fix to make sure the approximate nature of the
counts is reported to admin users - we wouldn’t want anyone misquoting ballpark
numbers as exact, would we? A quick hack can make sure that if the approximate
count is used, the templates all say ‘Approximately N’. Replacing lines at the
And here’s what the final product looks like at the top of an admin page:
Are your Django project's tests slow? Read Speed Up Your Django Tests now!
One summary email a week, no spam, I pinky promise.
© 2020 All rights reserved.