1. Introduction

A common request we get is to install some software package on either Gordon or Trestles, sometimes with the implicit assumption that it should be a simple matter of running sudo apt-get install somepackage. Unfortunately, installing a new piece of software on a shared resource like Gordon or Trestles is not that easy because:

  • we need to make sure the software will not break another library or program on the system
  • we will typically have to install it on all of the compute nodes too, not just the login nodes
  • we then have to support that software (and any/all users’ questions about it) since we officially provide it
  • our system engineers, who can actually deploy packages, are not the same applications experts who compile the packages

The net result is that installing a new software package system-wide can take weeks or months. When I get requests to install software, I invariably respond that it would be easier and faster for the user (that’s you) to install the package in their own home directory, and I provide step-by-step instructions on exactly how to do that.

For the sake of anyone who wants to know how to install his or her own software applications on SDSC Trestles or Gordon (or any other Linux or UNIX machine, for that matter), here are some generic guidelines on how to do this.

2. Python Modules

There are several ways Python will let you manage your own set of libraries, and I find the virtualenv package to be the easiest. It creates what amounts to an installation of Python that is local to your home directory, which means that any libraries you install using that special personalized Python will also install into your home directory.

First, download virtualenv from the project’s website, e.g.,

$ wget --no-check-certificate "https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.2.tar.gz"

Then be sure to load the python module that you wish to clone. This is important because the python that you will run if you don’t explicitly load a Python module is the system-wide default, Python 2.6.6. I recommend using the Python 2.7.x module we provide on both machines, so load that module before trying to install virtualenv:

$ module load python
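
Before going further, it can be worth confirming which interpreter your shell will actually run. The following is only a minimal sketch: the function name report_python is made up for illustration, and the path and version it prints depend on which module (if any) is loaded.

```shell
# Hypothetical helper (not part of the module system): report which
# "python" is first in PATH and what version it claims to be.
report_python() {
    if command -v python >/dev/null 2>&1; then
        echo "python found at: $(command -v python)"
        # Python 2 prints its version to stderr, hence the redirect.
        python -V 2>&1
    else
        echo "no python in PATH; did 'module load python' succeed?"
    fi
}

report_python
```

If the version shown is still the system default rather than 2.7.x, the module load probably did not take effect in this shell.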

You must now decide the location of this custom Python you want to install using virtualenv. ~/python27-gordon is a good choice, assuming you are using Python 2.7 as previously discussed. Then, unpacking and installing virtualenv is a snap:

$ tar zxvf virtualenv-1.11.2.tar.gz
virtualenv-1.11.2/
virtualenv-1.11.2/AUTHORS.txt
...
virtualenv-1.11.2/virtualenv_support/pip-1.5.2-py2.py3-none-any.whl
virtualenv-1.11.2/virtualenv_support/setuptools-2.1-py2.py3-none-any.whl

$ python virtualenv-1.11.2/virtualenv.py ~/python27-gordon
New python executable in /home/username/python27-gordon/bin/python
Installing setuptools, pip...done.

And that’s all you have to do. Now whenever you want to use your custom Python installation, you will have to issue this command:

$ source ~/python27-gordon/bin/activate
(python27-gordon)$

As you may notice, it modifies your prompt to show that you are in this custom Python’s “virtual environment.” If you always plan on using this custom Python, you can go ahead and add the following lines to your ~/.bashrc:

module load python
VIRTUAL_ENV_DISABLE_PROMPT=1 source ~/python27-gordon/bin/activate

Note the VIRTUAL_ENV_DISABLE_PROMPT=1 preceding the “source” command: this option suppresses the prompt prefix that virtualenv would otherwise add every time you log in.
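
If you share one ~/.bashrc between machines where the virtualenv may not exist, a slightly more defensive variant might look like the sketch below. The variable name VENV_DIR and the function name activate_venv_if_present are my own inventions for illustration; the path assumes the ~/python27-gordon location used above, and module load python would still need to come first.

```shell
# Sketch for ~/.bashrc: activate the virtualenv only if it exists.
# VENV_DIR is an assumed variable name; the default matches the example above.
VENV_DIR="${VENV_DIR:-$HOME/python27-gordon}"

activate_venv_if_present() {
    if [ -f "$1/bin/activate" ]; then
        # Setting this before sourcing suppresses the prompt prefix.
        VIRTUAL_ENV_DISABLE_PROMPT=1
        . "$1/bin/activate"
    else
        echo "virtualenv not found at $1; skipping activation" >&2
        return 1
    fi
}

# "|| true" keeps a missing virtualenv from aborting the rest of the file.
activate_venv_if_present "$VENV_DIR" || true
```

The only design choice here is failing quietly: a login shell should still start normally on a machine where the virtualenv was never created.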

Once you’ve got your virtualenv activated, installing new libraries is easy using pip:

(python27-gordon)$ pip install cutadapt
Downloading/unpacking cutadapt
  Downloading cutadapt-1.3.tar.gz (149kB): 149kB downloaded
  Running setup.py (path:/home/username/python27-gordon/build/cutadapt/setup.py) egg_info for package cutadapt
...
Successfully installed cutadapt
Cleaning up...

(python27-gordon)$ pip install pyvcf
Downloading/unpacking pyvcf
  Downloading PyVCF-0.6.4.tar.gz
...
Successfully installed pyvcf distribute setuptools
Cleaning up...

As you can see, pip automatically downloads and installs dependencies for you, making the task of managing Python libraries under your own user account on our supercomputers pretty easy.
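
If a pip install ever fails with a permissions error, the usual culprit is that the virtualenv is not active and you are running the system-wide pip instead. Here is a rough sketch of a check, relying on the fact that the activate script sets the VIRTUAL_ENV variable; the function name pip_is_from_venv is hypothetical.

```shell
# Hypothetical helper: succeed only if the "pip" first in PATH lives inside
# the active virtualenv (the activate script sets VIRTUAL_ENV).
pip_is_from_venv() {
    [ -n "${VIRTUAL_ENV:-}" ] || return 1
    pip_path=$(command -v pip) || return 1
    case "$pip_path" in
        "$VIRTUAL_ENV"/bin/*) return 0 ;;
        *) return 1 ;;
    esac
}

if pip_is_from_venv; then
    echo "pip will install into $VIRTUAL_ENV"
else
    echo "pip is not from an active virtualenv"
fi
```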

3. Perl Modules

One of the standard ways of maintaining your own Perl libraries installed into your home directory is using the local::lib module which, like Python’s virtualenv, lets you emulate having Perl installed locally.

To get started with local::lib you’ve first got to download and unpack it:

$ wget http://search.cpan.org/CPAN/authors/id/E/ET/ETHER/local-lib-1.008010.tar.gz
$ tar zxvf local-lib-1.008010.tar.gz
local-lib-1.008010/
local-lib-1.008010/Changes
local-lib-1.008010/inc/
...
$ cd local-lib-1.008010

Unlike with Python, we do not have a separate Perl module that needs to be loaded. Once you’re in that local-lib-1.008010 directory, you can initiate the bootstrap process by which local::lib creates your custom Perl installation and installs itself. Let’s assume that we want to install our custom Perl into ~/perl5-gordon (note: use $HOME instead of ~, because the shell does not expand ~ in the middle of an argument such as --bootstrap=~/perl5-gordon):

$ perl Makefile.PL --bootstrap=$HOME/perl5-gordon
Attempting to create directory /home/username/perl5-gordon
...
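
The reason for preferring $HOME over ~ in that command is easy to see for yourself with echo. This little demonstration uses the same path as above and makes no assumptions beyond a Bourne-style shell such as bash:

```shell
# The shell expands "~" only at the start of a word (or in a variable
# assignment), so after "--bootstrap=" it is passed through literally.
echo --bootstrap=~/perl5-gordon

# $HOME is an ordinary variable and expands anywhere in the word.
echo "--bootstrap=$HOME/perl5-gordon"
```

The first line prints the tilde unchanged, which is what Makefile.PL would have received; the second prints your full home directory path.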

If you don’t specify a path after the --bootstrap flag, your local::lib installation will be in ~/perl5. This bootstrapping process may take a very long time, as CPAN needs to first configure itself and then install all of the libraries that local::lib needs to work. After a lot of text scrolls by (much of which looks like errors, which isn’t necessarily bad), hopefully you wind up at

...
Checking if your kit is complete...
Looks good
Generating a GNU-style Makefile
Writing Makefile for local::lib
Writing MYMETA.yml and MYMETA.json

Then test and install local::lib:

$ make test
...
t/subroutine-in-inc.t .. ok
All tests successful.
Files=8, Tests=35,  0 wallclock secs ( 0.04 usr  0.03 sys +  0.23 cusr  0.07 csys =  0.37 CPU)
Result: PASS

$ make install
Installing /home/username/perl5-gordon/lib/perl5/POD2/PT_BR/local/lib.pod
...
Appending installation info to /home/username/perl5-gordon/lib/perl5/x86_64-linux-thread-multi/perllocal.pod

Now we need to put a few new lines in our ~/.bashrc to effectively do what that activate script does for Python’s virtualenv. The following command prints the necessary export lines, and the tee -a at the end appends them to your ~/.bashrc:

$ perl -I$HOME/perl5-gordon/lib/perl5 -Mlocal::lib=$HOME/perl5-gordon | tee -a ~/.bashrc
export PERL_LOCAL_LIB_ROOT="$PERL_LOCAL_LIB_ROOT:/home/username/perl5-gordon";
export PERL_MB_OPT="--install_base /home/username/perl5-gordon";
export PERL_MM_OPT="INSTALL_BASE=/home/username/perl5-gordon";
export PERL5LIB="/home/username/perl5-gordon/lib/perl5:$PERL5LIB";
export PATH="/home/username/perl5-gordon/bin:$PATH";

You should then either log out and log back in, or paste those export lines into your current terminal session to put them into effect. Following that, you should be able to install Perl libraries into your home directory:

$ perl -MCPAN -e 'install(Time::Piece)'
Reading '/home/username/.cpan/Metadata'
  Database was generated on Mon, 16 Sep 2013 19:53:02 GMT
Running install for module 'Time::Piece'
...
  RJBS/Time-Piece-1.23.tar.gz
  /usr/bin/make install  -- OK
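
If CPAN instead tries to install into a system directory, the export lines probably are not in effect in your current session. A quick way to check, sketched under the assumption that you used the ~/perl5-gordon location from above (the names list_contains and LOCAL_LIB are made up for this example):

```shell
# Hypothetical helper: is directory $2 an entry in the colon-separated list $1?
list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumes the install location used above.
LOCAL_LIB="${LOCAL_LIB:-$HOME/perl5-gordon}"

if list_contains "${PERL5LIB:-}" "$LOCAL_LIB/lib/perl5"; then
    echo "PERL5LIB includes $LOCAL_LIB/lib/perl5"
else
    echo "PERL5LIB is missing $LOCAL_LIB/lib/perl5; re-source ~/.bashrc?"
fi
```

The same helper works for PATH, e.g. list_contains "$PATH" "$LOCAL_LIB/bin".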

4. R Libraries

Users cannot install R libraries globally on our machines, but R makes it very easy for users to install libraries in their home directories. To do this, fire up R and, when presented with the > prompt, use the install.packages() function to install things:

> install.packages('doSNOW')
Installing package(s) into '/opt/R/local/lib'
(as 'lib' is unspecified)
Warning in install.packages("doSNOW") :
  'lib = "/opt/R/local/lib"' is not writable
Would you like to use a personal library instead?  (y/n)

This warning comes up because you can’t install libraries system-wide as a non-root user. Say y and accept the default, which should be something similar to ~/R/x86_64-unknown-linux-gnu-library/3.0. Pick a mirror and let her rip. If you want to install multiple packages at once, you can just do something like

> install.packages(c('foreach','doMC'))
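
If you would rather choose the personal library location yourself than accept the default at the prompt, R also honors the R_LIBS_USER environment variable, provided the directory already exists when R starts. The sketch below shows one way to set that up from the shell before launching R; the function name use_personal_r_library and the example path ~/R/library are assumptions of mine, not anything our systems require.

```shell
# Hypothetical helper: create a personal R library directory and point
# R_LIBS_USER at it. R ignores R_LIBS_USER if the directory does not exist,
# which is why the mkdir comes first.
use_personal_r_library() {
    mkdir -p "$1" || return 1
    R_LIBS_USER=$1
    export R_LIBS_USER
}
```

You would call it as use_personal_r_library "$HOME/R/library" before starting R; running .libPaths() at the R prompt should then list that directory.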

For most packages, this is all you will have to do. However, sometimes R packages depend on other system libraries, and those system libraries might not be in the default search path for the R package installer. When that happens, you might get an error that looks something like this:

>  install.packages('rjags');
* installing *source* package 'rjags' ...
** package 'rjags' successfully unpacked and MD5 sums checked
checking for prefix by checking for jags... no
configure: error: "Location of JAGS headers not defined. Use configure arg '--with-jags-include' or environment variable 'JAGS_INCLUDE'"
ERROR: configuration failed for package 'rjags'
* removing '/home/glock/R/x86_64-unknown-linux-gnu-library/3.0/rjags'

The downloaded source packages are in
    '/tmp/RtmpdE6UYF/downloaded_packages'
Warning message:
In install.packages("rjags") :
  installation of package ‘rjags’ had non-zero exit status

The relevant part of the error log is the line beginning with configure: error; the library could not install because it depends on a system library (as opposed to an R library) that the install.packages() command could not find.

While fixing this error can be tricky since each R library can have a different installation procedure, you can pass extra hints to the install.packages() command to suggest where it can find some of these system libraries. For example, the jags library is already installed on Gordon and Trestles, and it can be loaded using module load jags. After doing this, you can do something like

> install.packages('rjags',
   configure.args=c(rjags='--with-jags-include=$JAGSHOME/include/JAGS
                           --with-jags-lib=$JAGSHOME/lib'))
* installing *source* package 'rjags' ...
** package 'rjags' successfully unpacked and MD5 sums checked
checking for prefix by checking for jags... /opt/jags/bin/jags
checking whether the C++ compiler works... yes
...
** testing if installed package can be loaded
* DONE (rjags)

The configure.args argument looks gnarly, but it actually tells R that

  • the contents of configure.args should be passed to the underlying library’s installer. configure.args is a named character vector (or list) containing special configuration parameters, where the name of each value corresponds to the package to which the special parameters apply.
  • the --with-jags-include and --with-jags-lib options tell the rjags installer where your JAGS library’s include and lib directories are located
  • $JAGSHOME is a variable that gets defined when you load the jags module on Gordon and Trestles. On other systems, you would specify the full path to your jags installation directory instead.
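
Since those paths are only as good as the $JAGSHOME variable behind them, it can save a failed build to check them from the shell before starting R. This is a rough sketch that assumes the directory layout implied by the configure arguments above ($JAGSHOME/include/JAGS and $JAGSHOME/lib); the function name check_jags_dirs is hypothetical.

```shell
# Hypothetical pre-flight check, assuming the layout implied by the
# configure arguments above: $JAGSHOME/include/JAGS and $JAGSHOME/lib.
check_jags_dirs() {
    if [ -z "${JAGSHOME:-}" ]; then
        echo "JAGSHOME is not set; run 'module load jags' first" >&2
        return 1
    fi
    for d in "$JAGSHOME/include/JAGS" "$JAGSHOME/lib"; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d" >&2
            return 1
        fi
    done
    echo "JAGS headers and libraries found under $JAGSHOME"
}

# "|| true" so a failed check only prints a message.
check_jags_dirs || true
```

On a system without a jags module, you could set JAGSHOME by hand to your JAGS installation directory and run the same check.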

Actually knowing what configure.args to use when generic package installations fail requires some amount of intuition. If all else fails, contact your help desk!