Building a better DatabaseCache for Django on MySQL

Figure 1 - MySQLCache in the flesh

I recently released version 0.1.10 of my library django-mysql, for which the main new feature was a backend for Django’s cache framework called MySQLCache. This post covers some of the inspiration and improvements it has, as well as a basic benchmark against Django’s built-in DatabaseCache.

TL;DR - it’s better than DatabaseCache, and if you’re using MySQL, please try it out by following the instructions linked at the end.

Why bother?

Django’s cache framework provides a generic API for key-value storage, and gets used for a variety of caching tasks in applications. It ships with multiple backends for popular technologies, including Redis and Memcached, as well as a basic cross-RDBMS DatabaseCache. The DatabaseCache is recommended only for smaller environments, and due to its supporting every RDBMS that Django does, it is not optimized for speed. Redis and Memcached are the most popular cache technologies to use, being specifically designed to do key-value storage; you could even say Django’s cache framework is specifically designed to fit them.

If they work so well, why would anyone bother using DatabaseCache, and why would I care about improving on it? Well, I have a few reasons:

MySQL Inspiration

I’ve also seen a few success stories for using MySQL as a key-value store:

Development

When we first set up the Whoosh storage in cache, we used DatabaseCache since we knew we wanted the persistence. Unfortunately when I read the source I realized it wouldn’t be very efficient or fast, due to its design which caters to every database backend.

I created MySQLCache to correct as many problems as I could see by using MySQL-specific syntax and features. My full list of improvements includes:

Benchmark

I’ve created and run a simple benchmark app, available on GitHub. It tests the configured backends with 1000 repeats of operations such as set(), set_many(), and get() with randomly sized values. It’s not a perfect load simulator, and it doesn’t use the network, but it gives an idea of the relative speeds achievable by the different cache backends.

It runs MySQLCache with MySQL’s two most popular storage engines - the non-transactional MyISAM and the default, transactional InnoDB, as well as DatabaseCache and Django’s two Memcached backends, MemcachedCache and PyLibMCCache.

I ran it locally with MariaDB (a MySQL fork) version 10.0.17 and memcached version 1.4.22 on my Macbook Pro. The results are, in seconds per 1000-operation benchmark:

Cache Alias set benchmark set_many benchmark get benchmark get_many benchmark delete benchmark incr benchmark
DatabaseCache 1.023 42.193 0.237 8.621 6.216 1.084
MySQLCache_MyISAM 0.329 6.080 0.311 1.949 0.639 0.195
MySQLCache_Innodb 0.629 6.392 0.213 1.168 0.619 0.344
MemcachedCache 0.102 1.460 0.089 1.308 1.068 0.082
PyLibMCCache 0.075 2.218 0.061 0.488 1.725 0.057

MySQLCache clearly isn’t as fast as MemcachedCache, but to be fair, it is writing to disk. Thankfully it has come out quite a bit faster than DatabaseCache, especially on the *_many operations. I suspect the performance gap gets much bigger as the table grows to tens of thousands, or hundreds of thousands of rows, owing to the SELECT COUNT(*) behaviour - I just didn’t have the patience to keep running the benchmark.

As for the storage engines, MyISAM is faster than InnoDB for set() but slower for get() - not a great trade-off for caching where reads are much more common than writes. Also, if most of your application uses InnoDB and you mix it with MyISAM’s non-transactionality, it can lead to developer confusion as to what to expect.

I would have liked to benchmark the MEMORY storage engine, which would avoid writing to disk and be a fairer comparison with Memcached, but it can’t store BLOB columns (large binary data). There is a patch in the MySQL fork Percona Server to enable this, but I didn’t have time to try it and I guess most projects wouldn’t switch MySQL version so easily to use a cache. Of course, as stated, the permanence may be the very reason you are using a database cache backend, so any potential speed improvements it would give may not be worth worrying about :)

Future Work

MySQL 5.6 ships with a memcached interface plugin for InnoDB, avoiding MySQL protocol overhead and SQL parsing whilst remaining persistent. I haven’t tried it out yet, but it can probably be used directly with Django’s PyLibMCCache or MemcachedCache backends, although with a little setup to create the table and configure the plugin. Maybe django-mysql will provide helpers for the setup process in the future, and I can benchmark it against MySQLCache.

In MySQLCache I’ve also considered adding operations beyond Django’s basic API that can be efficiently supported due to the rows being stored in a table. For example, since keys are stored in order in the primary key BTree index, it is quite efficient to search for keys with a given prefix. Additionally, with MariaDB’s dynamic columns, hashes, lists, or sets could potentially be supported, making it a bit like Redis!

If you think of something, open an issue on GitHub and maybe send a pull request!

To use it…

If MySQLCache sounds appetising and is suitable for your project, you can pip install django-mysql, follow the library installation instructions, and follow the cache setup guide and you’ll be on your way.

Thanks!


Tags: django, mysql