My Talk "Django at scale" at Django London Meetup


On Tuesday I gave a talk on Django at the London Django Meetup Group, titled “Django at Scale.” The slides are hosted on GitHub.

Here are a few brief summaries of topics I covered, which I hope to cover in more depth with a blog post on each in the future:

1. Wrap Django’s Classes For Your Project

Subclassing Django’s base classes, e.g. the Admin classes, allows you to make codebase-wide changes easily. It’s not much effort to implement and lets you solve many problems in one place. It’s also an argument for class-based views. I’ve given a complete example of doing this in my post on extending querysets and the admin to return approximate counts.

2. Background-fill Your Cache

The most common cache pattern in Django is to check whether the item is in the cache and, if it’s not there, to run the slow function to generate it. Often this is a user-dependent value, e.g. the count of all the comments they’ve made. Once you reach a certain scale, though, you’re sure to see a user for whom this slow function takes far too long no matter what. You can spot this by monitoring, say, the 95th or 99th percentile of request times.

I briefly covered how to fill the cache in the background using an asynchronous task queue such as Celery, so that request time varies much less across the range of users.
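To make the idea concrete, here is a minimal sketch of the pattern using only the standard library. It is not the code from the talk: the dict `cache` stands in for Django’s cache, the `queue.Queue` plus worker thread stands in for Celery, and `slow_comment_count` is a made-up placeholder for whatever slow function you have. The point is just that the request path never runs the slow function itself.

```python
import queue
import threading

cache = {}                      # stand-in for Django's cache backend (assumption)
refresh_queue = queue.Queue()   # stand-in for a Celery task queue (assumption)


def slow_comment_count(user_id):
    """Hypothetical slow function, e.g. a COUNT(*) over a huge table."""
    return user_id * 10  # placeholder result for the demo


def refresh_worker():
    """Background worker: computes values and writes them into the cache."""
    while True:
        user_id = refresh_queue.get()
        if user_id is None:  # sentinel value: shut the worker down
            refresh_queue.task_done()
            break
        cache[f"comment_count:{user_id}"] = slow_comment_count(user_id)
        refresh_queue.task_done()


def get_comment_count(user_id):
    """Request path: reads the cache and never calls the slow function."""
    value = cache.get(f"comment_count:{user_id}")
    if value is None:
        refresh_queue.put(user_id)  # schedule the fill and return immediately
    return value  # None means "not ready yet"; the view decides what to show


worker = threading.Thread(target=refresh_worker, daemon=True)
worker.start()

first = get_comment_count(7)    # cache miss: returns None, refresh is queued
refresh_queue.join()            # demo only: wait for the background fill
second = get_comment_count(7)   # now served straight from the cache

refresh_queue.put(None)         # stop the worker
worker.join()
```

In a real project the worker would be a Celery task, and you would probably also refresh entries before they expire so that most users never see the “not ready yet” state at all.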

One thing I’ve seen a lot: when an app has many related models to show at once, select_related is used to grab them all in a single query. This rests on the presumption that fewer queries are always better, which is a good heuristic but, for many reasons, not always correct. Thankfully Django provides the lesser-known prefetch_related (docs) as a one-word replacement, which runs a separate query per relation and does the JOIN operation in Python instead. Often swapping it in is an instant performance win.
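If “the JOIN in Python” sounds mysterious, the sketch below shows roughly the shape of it, using sqlite3 from the standard library instead of the ORM. The `author` and `comment` tables are invented for the example, and Django’s real implementation is more involved; this only illustrates the two-queries-then-group idea.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and data, just for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
    INSERT INTO author VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comment VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

# Query 1: the parent rows.
authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()

# Query 2: every related row for those parents, fetched in one go.
author_ids = [author_id for author_id, _name in authors]
placeholders = ", ".join("?" for _ in author_ids)
comments = conn.execute(
    "SELECT author_id, body FROM comment "
    f"WHERE author_id IN ({placeholders}) ORDER BY id",
    author_ids,
).fetchall()

# The "JOIN" happens here, in Python: group the children by parent id.
comments_by_author = defaultdict(list)
for author_id, body in comments:
    comments_by_author[author_id].append(body)

result = {name: comments_by_author[author_id] for author_id, name in authors}
```

With the ORM, and assuming a hypothetical `Comment` model with a foreign key to `Author`, the equivalent would be something like `Author.objects.prefetch_related("comment_set")`. The database gets two simple queries rather than one wide joined result set.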

4. Database Hint: Run a Query Killer

Databases are often the contended resource in a web app: throwing up more webservers is often easier than throwing up more databases. A query killer watches the DB and kills queries that have run too long, for example by connecting every 10 seconds and finding all queries issued under the app’s username that have been running for more than 20 seconds. It will save you a lot of trouble when, say, the Django admin generates a horrendous query that causes the whole server to lock up writing reams of temporary tables.

I’ve not used it, but PostgreSQL has a statement_timeout setting, so you don’t need to run a killer process. It looks like something similar will be coming to MySQL in 5.7.
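The decision logic of a killer is small enough to sketch. The example below assumes MySQL, where the rows could come from `information_schema.PROCESSLIST` and each statement could be stopped with `KILL QUERY <id>`. The `webapp` username, the `Process` tuple and the sample rows are all made up, and the actual database connection and polling loop are left out so the sketch stays runnable on its own.

```python
from typing import NamedTuple

APP_USER = "webapp"   # hypothetical DB username the app connects as
MAX_SECONDS = 20      # kill anything running longer than this
POLL_INTERVAL = 10    # seconds between checks (used by the outer loop, not shown)


class Process(NamedTuple):
    """One row of the process list (a subset of MySQL's columns)."""
    id: int
    user: str
    command: str
    time: int   # seconds the current statement has been running
    info: str   # the SQL text


def queries_to_kill(processes, user=APP_USER, max_seconds=MAX_SECONDS):
    """Pick the app's own running queries that have exceeded the limit."""
    return [
        p.id
        for p in processes
        if p.user == user and p.command == "Query" and p.time > max_seconds
    ]


def kill_statements(process_ids):
    """Build the statements the killer would send to the server."""
    return [f"KILL QUERY {int(pid)}" for pid in process_ids]


# Made-up snapshot of the process list.
sample = [
    Process(11, "webapp", "Query", 45, "SELECT ..."),   # too slow: kill
    Process(12, "webapp", "Query", 3, "SELECT ..."),    # fine
    Process(13, "backup", "Query", 900, "SELECT ..."),  # other user: leave alone
    Process(14, "webapp", "Sleep", 300, ""),            # idle connection: ignore
]
statements = kill_statements(queries_to_kill(sample))
```

Filtering on the app’s username matters: it keeps the killer away from backups, migrations and anything else that is allowed to run for a long time.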
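For what it’s worth, one way the PostgreSQL timeout is commonly set from Django is through the connection options. This is an untested sketch rather than a recommendation: the database name and user are placeholders, and the 20000 simply mirrors the 20 second limit above (the setting is in milliseconds).

```python
# settings.py fragment (sketch). NAME and USER are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myproject",
        "USER": "webapp",
        "OPTIONS": {
            # Passed through to the PostgreSQL client library.
            # statement_timeout is in milliseconds, so 20000 aborts any
            # statement that runs for longer than 20 seconds.
            "options": "-c statement_timeout=20000",
        },
    }
}
```

Because this applies to every statement on the connection, long-running jobs such as migrations or reports would want their own database alias or a higher limit.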

Hope you’ve learnt a little something useful. I had a great time and look forward to the next meetup!


Tags: celery, django