Installing and Removing R Packages With Ansible

I was asked by some of our Data Scientists to get a few R packages onto their server, which I configure with Ansible. R seems to be a bit funny compared to other programming languages because its package installation happens inside R code, rather than with a dedicated command-line utility.

A quick Google search turned up a blog post by Svend Vanderveken on how to install R packages with Ansible, but as the author points out, it reports ‘changed’ even when the package is already installed. This is not great - we all know that an alarm that is always going off is quickly ignored. Thankfully, it’s not hard to fix that, using a couple of Ansible features, changed_when and failed_when, and a bit of extra R code:

{% raw %}
- name: r - packages
  command: >-
    Rscript --slave --no-save --no-restore-history -e "
    if (! ('{{ item }}' %in% installed.packages()[,'Package'])) {
      install.packages(pkgs='{{ item }}', repos=c('http://ftp.heanet.ie/mirrors/cran.r-project.org/'));
      print('Added');
    } else {
      print('Already installed');
    }"
  register: r_result
  failed_when: "r_result.rc != 0 or 'had non-zero exit status' in r_result.stderr"
  changed_when: "'Added' in r_result.stdout"
  with_items:
    - getopt
{% endraw %}

Since it's hard to read as a one-liner, here's that R code expanded:

{% raw %}
if (! ('{{ item }}' %in% installed.packages()[,'Package'])) {
    install.packages(pkgs='{{ item }}', repos=c('http://ftp.heanet.ie/mirrors/cran.r-project.org/'));
    print('Added');
} else {
    print('Already installed');
}
{% endraw %}

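The R code prints the status of the package, and the Ansible task then checks the output so that it only reports changed if the package was not already installed :) The failed_when condition also looks for 'had non-zero exit status' in stderr, because install.packages() reports a failed install as a warning rather than an error, so Rscript can still exit with 0.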

While we’re at it, here’s code for removing R packages:

{% raw %}
- name: r - remove packages
  command: >-
    /usr/bin/Rscript --slave --no-save --no-restore-history -e "
    if (! ('{{ item }}' %in% installed.packages()[,'Package'])) {
      print('Not installed');
    } else {
      remove.packages(pkgs='{{ item }}'); print('Removed');
    }"
  register: r_result
  failed_when: r_result.rc != 0
  changed_when: '"Removed" in r_result.stdout'
  with_items:
    - getopt
{% endraw %}

Expanded again for readability:

{% raw %}
if (! ('{{ item }}' %in% installed.packages()[,'Package'])) {
    print('Not installed');
} else {
    remove.packages(pkgs='{{ item }}');
    print('Removed');
}
{% endraw %}

Of course, what we’d really like to see is an R package module, but I’m not very experienced with R so I don’t know how easy it would be to make this cross-platform compatible.

One last hint! Don’t use R packages if you don’t have to. In general, OS packages will be faster to install, so you’ll get your provisioning done much quicker - this certainly holds for numpy in Python. So, while in this example I’ve used getopt, I’ve actually ended up using the apt action to install r-cran-getopt.
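
For comparison, the OS package route might look something like the task below. This is only a sketch: it assumes a Debian or Ubuntu host whose repositories ship r-cran-getopt, and the task name and `become: true` are my own assumptions rather than anything taken from the playbook above.

```yaml
# Sketch only: assumes a Debian/Ubuntu host that provides r-cran-getopt.
# The task name and become setting are illustrative assumptions.
- name: r - getopt from OS package
  apt:
    name: r-cran-getopt
    state: present
  become: true
```

Because the apt module tracks package state itself, this reports changed only when the package actually gets installed, so no changed_when or failed_when overrides are needed.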

