I was recently tasked with keeping the various repeating jobs running for our data scientists at YPlan. They have a number of nightly or weekly jobs to be run, such as creating summary tables of the day’s various activity logs, pulling in data from third-party services, and so on.
Last Thursday (24th July 2014) I went to the DevOps Exchange London Meetup on Continuous Delivery; here is my quick review of the talks and what I took away.
Here are some great little Python libraries that have made my life (well, at least the coding part) a little bit nicer and easier. They mostly add neat syntax and a few things that you always wanted to do, but never knew you could.
I was asked by some of our Data Scientists to get a few R packages onto their server, which I configure with Ansible. R seems to be a bit funny compared to other programming languages because its package installation happens inside R code, rather than with a dedicated command-line utility.
I was looking through the MySQL slow_log for YPlan and discovered that there were a lot of SELECT COUNT(*) queries going on, which take a long time because they require a full table scan. These were coming from the Django admin, which displays the total count on every page.
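
To make the cost concrete, here is a minimal sketch of one possible way to avoid the exact count: asking MySQL for its own row estimate from `information_schema.TABLES` instead of running `SELECT COUNT(*)`. The helper name `estimated_row_count` is hypothetical, the cursor is assumed to be any DB-API cursor connected to MySQL, and this is not necessarily the approach taken in the full post. For InnoDB tables the `TABLE_ROWS` value is only an approximation.

```python
def estimated_row_count(cursor, table_name):
    """Return MySQL's estimated row count for a table, or None if unknown.

    Hypothetical helper: `cursor` is assumed to be a DB-API cursor on a
    MySQL connection using the %s parameter style (e.g. MySQLdb, PyMySQL).
    """
    cursor.execute(
        "SELECT TABLE_ROWS FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = %s",
        [table_name],
    )
    row = cursor.fetchone()
    # No matching table, or the storage engine reports NULL for TABLE_ROWS.
    if row is None or row[0] is None:
        return None
    return int(row[0])
```

An estimate like this could stand in for the exact total wherever a rough figure is good enough, such as a page count in an admin listing, at the cost of the number drifting from the true value.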