R & bioconductor

June 2023: The proxy is no longer available.
To download data from external sites, please do so from a batch job or use an interactive session on a backend node by running qrsh -l short. You DO NOT then need to load the proxy modulefiles. Please see the qrsh notes for more information on interactive use.

Overview

R is a free software environment for statistical computing and graphics.

The following versions are installed on the CSF:

Standard open source R:

  • R 4.4.1
  • R 4.4.0
  • R 4.3.1
  • R 4.2.2
  • R 4.1.2
  • R 4.1.0
  • R 4.0.2
  • R 3.6.2
  • R 3.6.1
  • R 3.6.0
  • R 3.5.2
  • R 3.4.2

Note that Bioconductor is available via a separate modulefile – see below.

You may also want to try the Microsoft R Open version installed on the CSF – this version provides automatic parallelism of various maths / matrix routines in R.

You can also install packages to your own home directory using the Adding Packages instructions below (in conjunction with information from the Bioconductor website). Alternatively, we may be able to add them to the central install; please contact its-ri-team@manchester.ac.uk.

Restrictions on use

There are no restrictions on access to R as it is a free piece of software released under a GNU license. All users should familiarise themselves with the licensing information available via the R website.

All R jobs, aside from very short test jobs (e.g. those lasting less than one minute), must be submitted to the batch system, SGE.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

Standard open source R:

module load apps/gcc/R/4.4.1   # Includes gcc 13.3.0 (helps package installs)
                               # For BioConductor see the notes below

module load apps/gcc/R/4.4.0   # Includes gcc 12.2.0 (helps package installs)
                               # For BioConductor see the notes below

module load apps/gcc/R/4.3.1   # Includes BioConductor 3.16, gcc 9.3 (helps package installs)  
module load apps/gcc/R/4.2.2   # Includes BioConductor 3.16, gcc 9.3 (helps package installs)  
module load apps/gcc/R/4.1.2   # Includes BioConductor 3.14, gcc 9.3 (helps package installs)
module load apps/gcc/R/4.1.0   # Includes BioConductor 3.13, gcc 8.2 (helps package installs)
module load apps/gcc/R/4.0.2   # Includes BioConductor 3.11, gcc 8.2 (helps package installs)
module load apps/gcc/R/3.6.2   # Includes BioConductor 3.10, gcc 8.2 (helps package installs)
module load apps/gcc/R/3.6.1   # Includes BioConductor 3.9, gcc 8.2 (helps package installs)
module load apps/gcc/R/3.6.0   # Includes BioConductor 3.9, gcc 8.2

module load apps/R/3.5.2    # Does not include BioConductor - see modulefile below
module load apps/R/3.4.2    # Does not include BioConductor - see modulefile below

BioConductor Installation

The central installations of R 4.4.0 and later do not include BioConductor.

Users can install BioConductor themselves by running the following commands in an interactive session:

# On the CSF login node, start an interactive session
qrsh -l short

# You'll now be on a compute node. You can run your commands directly, and R
# will be able to download packages from the outside world:
module load apps/gcc/R/4.4.1      # Choose your required version
R
install.packages("BiocManager")
q();

# To return to the login node, exit from the interactive (qrsh) session
exit

Batch jobs can now use your local installation of BioConductor. For more information on using BioConductor, please see the BioConductor installation documentation.

Older versions of R (3.5.2 and 3.4.2) require a separate BioConductor modulefile. To use BioConductor, load:

# This is NOT needed for R 3.6 and newer!
module load libs/bioconductor/3.4

This will load the R modulefile if not already loaded.

Running the application

Note that using R CMD BATCH, as below, may save and restore your workspace, which may not be what you want. Using Rscript instead avoids that.

Serial Batch job

Write a submission script, for example:

#!/bin/bash --login
#$ -cwd               # Run job from current directory

## We now recommend loading the modulefile in the jobscript. Change the version as needed.
module load apps/R/3.4.2


R CMD BATCH --no-restore my_test.R  my_test.R.o$JOB_ID
   #                #                   #
   #                #                   # The final argument, "my_test.R.o$JOB_ID", tells R to send
   #                #                   #  output to a file with this name unique to the current job.
   #                #
   #                # Do not restore any previously saved objects. Ensures you don't load in possibly
   #                # large objects from previous runs of R. If jobs are failing due to lack of memory
   #                # please add this flag or alternatively use --vanilla which applies the following:
   #                # --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ
   #
   # R must be called with both the "CMD" and "BATCH" options which tell it
   # to run an R program, in this case my_test.R, instead of presenting
   # an interactive prompt

Submit the job using

qsub runmyRjob.qsub

where runmyRjob.qsub is the name of your job script.

By default, graphical output from batch jobs is sent to a file called Rplots.pdf. See below for more info on plotting in to an image file.
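
As noted at the start of this section, Rscript avoids saving and restoring the workspace. The following is a minimal sketch of the serial jobscript using Rscript instead of R CMD BATCH, written as shell commands that create the jobscript file. The file name runmyRjob_rscript.qsub is only an example, and the R version should be changed as needed. Output is redirected by hand because Rscript prints to standard output rather than to a named file.

```shell
# Create an example jobscript (the file name is illustrative)
cat > runmyRjob_rscript.qsub <<'EOF'
#!/bin/bash --login
#$ -cwd               # Run job from current directory

module load apps/gcc/R/4.4.1

# Rscript does not restore a saved workspace and does not save one on exit.
# Send its output to a file with a name unique to the current job.
Rscript my_test.R > my_test.R.o$JOB_ID 2>&1
EOF

# Submit it in the usual way:
#   qsub runmyRjob_rscript.qsub
```

The quoted 'EOF' stops the shell from expanding $JOB_ID while the file is being written, so the variable is only filled in when the job runs.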

Parallel Batch Job (single node, multi-core)

Please note that your R code must be parallelised (usually with the ‘parallel’ library) before you submit to more than 1 core. Asking for more than 1 core does not mean your code will automatically use them.

#!/bin/bash --login
#$ -cwd               # Run job from current directory
#$ -pe smp.pe 12      # Number of cores to use. Can be between 2 and 32.

module load apps/R/3.4.2

R CMD BATCH --no-restore my_test.R my_test.R.o$JOB_ID
               #
               # See the serial jobscript example above for a description
               # of the command-line flags.

Then submit your job to the batch system:

qsub runmyRjob.qsub

where runmyRjob.qsub is the name of your job script.

The various libraries for performing parallel computation in R each have their own way of setting the number of cores to use within R. This will sometimes default to the total number of cores on the node. You need to make sure that your code is using no more than the number of cores you’ve requested in your job script, otherwise your job is liable to be killed without warning.

You can return the number of cores you requested in your jobscript as a variable, using the code:

numCoresAllowed <- as.integer(Sys.getenv("NSLOTS", unset = "1"))

(Sys.getenv returns text, so as.integer converts it to a number. If you’re running the job interactively or on your local machine, the value specified in “unset” will be used.)

You should use this value when you set the number of cores. For example, if you’re using the “doMC” package, you’d use:

library(doMC)
registerDoMC(cores = numCoresAllowed)

Some libraries, e.g. the “parallel” library, take the number of cores to use directly from an environment variable (e.g. MC_CORES). You can set the environment variable in your job script:

export MC_CORES=$NSLOTS

Add this to your jobscript before the R CMD BATCH ... line.
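
If the same commands are also run outside a batch job (for example on your local machine), NSLOTS will not be set. One possible variation, assuming you want to fall back to a single core in that case, is sketched below. The default of 1 is our assumption, not a CSF requirement.

```shell
# NSLOTS is set by the batch system inside a job.
# Outside a job it is unset, so fall back to 1 core (an assumed default; choose your own).
export MC_CORES="${NSLOTS:-1}"
echo "R will use at most ${MC_CORES} core(s)"
```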

Running R interactively

It is expected that most use of R on the CSF will be in batch mode, i.e., computational jobs will be submitted to the batch system and there will be no subsequent user interaction. However, if required, R can be run on the CSF using either the R command line, or GUI.

Do not simply log in to the CSF and start R: your jobs will be killed by the system administrator! The only exception to this is when checking for or adding packages (see below), and downloading packages from a mirror now requires an interactive session on a compute node.

To run R jobs interactively on the CSF, make use of the qrsh facility, which literally queues interactive jobs. To start the R command line type:

# Load the modulefile on the login node (use your required version)
# If you loaded other modulefiles to do a package installation (e.g., nlopt)
# you should also load them here.
module load apps/R/3.4.2

# Start R in an interactive session on a compute node
qrsh -cwd -V -l short R --vanilla --interactive
                          #          #
                          #          # Needed if you will be starting the R GUI (Rcmdr)
                          #
                          # Can be: "--save", "--no-save" or "--vanilla"...

Note:

  • Use one of --save, --no-save and --vanilla
  • If you want to use the GUI, ensure you type --interactive, otherwise the GUI will not start and you will see an error message like:
    The Commander GUI is launched only in interactive sessions
    

    To start the GUI, enter library(Rcmdr) at the R command line:

    library(Rcmdr)
    Loading required package: tcltk
    Loading Tcl/Tk interface ... done
    

Adding packages

You may wish to use a particular package (library) in your code. The central installations of R may already have that package installed. If not, you can install it yourself (it will go in to a folder in your home directory).

Check if a package is already installed

To determine if a package is already installed, simply try loading it in R. For example, on the login node:

R
> library(thing)
Error in library(thing) : there is no package called ‘thing’

# (if you get no output it usually means the library is already installed!)

This tells us we need to install a package/library named ‘thing’. See below for how to do that. Installing BioConductor packages is also possible and this is also covered below.

Note: For the purposes of adding packages you can run R on the login node. But this is the only time you should run R on the login node. All data processing, development and testing must be run in batch jobs or in an interactive session on a compute node (see above for how to run R).

Install a package by Automatically Downloading from CRAN (the default repo)

To add packages to your personal R package directory (~/R/platform/version), downloading from CRAN:

The web-proxy is no longer available. So installations should be done interactively on a compute node so that R can download packages from the outside world. You cannot do that from the login nodes.

To start an interactive session:

# From the login node, start an interactive session:
qrsh -l short

# Once logged in to a compute node, load your required version of R (see above)
module load apps/gcc/R/4.3.1
  #
  # Note: you may need to load other modulefiles to complete a package installation.
  # If your install fails, look at the errors. You can exit from R, load some more
  # modulefiles, then run R again and try the install. Commonly needed modulefiles are
  # nlopt and cmake - see the sections below for more details.

# Note: you may have old proxy settings in an ~/.Renviron file. You'll need to remove these:
cat ~/.Renviron
  #
  # If you see the following, you do not need to do anything!
  cat: .Renviron: No such file or directory

  # If you see some lines containing
  http_proxy=http://proxy.man.ac.uk:3128
  https_proxy=https://proxy.man.ac.uk:3128
    #
    # Delete these lines or place a # at the start of each line.

  # If your ~/.Renviron file contains only the above proxy lines
  # you can delete the file
  rm ~/.Renviron
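
The manual edit described above can also be scripted. The following is a minimal sketch, not an official tool: comment_out_proxy_lines is a hypothetical helper name, and it assumes the proxy settings look like the http_proxy= and https_proxy= lines shown above. A backup copy with a .bak suffix is kept next to the file.

```shell
# Hypothetical helper: comment out http_proxy/https_proxy lines in an Renviron file.
# Usage: comment_out_proxy_lines /path/to/.Renviron
comment_out_proxy_lines() {
    # Nothing to do if the file does not exist
    [ -f "$1" ] || return 0
    # Prefix matching lines with "# " and keep the original as FILE.bak
    sed -i.bak -E 's/^([[:space:]]*https?_proxy=)/# \1/' "$1"
}

# Apply it to your own file (does nothing if ~/.Renviron is absent)
comment_out_proxy_lines "$HOME/.Renviron"
```

Any other settings in the file are left untouched, so this is safer than deleting the file when it contains more than the proxy lines.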

Now start R in the usual way:

R

Now ask R to install the required package and answer y when asked if you wish to create a personal library:

> install.packages("thing")
Warning in install.packages("thing") :
'lib = "/opt/apps/apps/gcc/R/3.6.1/lib64/R/library"' is not writable
Would you like to use a personal library instead?  (y/n) y      # Answer 'y'
Would you like to create a personal library                     # (if first ever package!)
~/R/x86_64-pc-linux-gnu-library/3.6
to install packages into?  (y/n) y                              # Answer 'y'

Select a UK mirror when prompted (e.g., UK Bristol, which is near the bottom of the list).

Once the package is installed, you can check it has installed correctly by loading the library:

library(thing)
  #
  # No output (or some library-specific info) means it is installed correctly.

You can now exit R and then exit from your interactive session, or install more libraries by repeating the above steps.

q()
# Now exit your interactive session to return to the login node
exit

Please remember that your usual R usage (running scripts and processing data) must be done in batch or via qrsh (interactively) on a compute node (see above). Do not continue to run computational work on the login node!

In the above instructions replace the module load command with the one appropriate to the R version you wish to use.

If you wish to specify a mirror in the install.packages command instead of selecting it from a menu, try:

install.packages('thing', repos='http://www.stats.bris.ac.uk/R')

Installing a Library from a source package

If you’ve downloaded an R library source file you can add it to your personal library using the following commands (which assume the source package is in your home directory on the CSF):

Load the R modulefile (choose the version of R you require), then run R CMD INSTALL on the source file:

module load apps/gcc/R/3.6.1
R CMD INSTALL thing.x.y.z.tar.gz

* installing to library ‘/mnt/iusers01/support/mabcxyz1/R/x86_64-unknown-linux-gnu-library/3.6’
* installing *source* package ‘thing’ ...
** package ‘thing’ successfully unpacked and MD5 sums checked
** R
** data
** demo
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (thing)

Using an installed package

Now run R and test that the library can be loaded:

R
> library(thing)

The compiled library files will be saved in a directory named R in your home directory. It contains subdirectories for each version of R so if you want to use the library in different versions of R you will have to repeat the above commands for each version.

nloptr dependency

Some packages fail to install because they depend on the nloptr R package. Trying to install that specific package often fails due to a dependency on the nlopt library, which R fails to compile. So we have provided this as a separate modulefile. For example:

# This will fail due to a failure to compile nloptr
module load apps/gcc/R/4.3.1
R
install.packages("nloptr")           # Other R packages that depend on this one will also fail
q()

# The solution is to load an extra modulefile:
module load apps/gcc/R/4.3.1
module load libs/gcc/nlopt/2.6.2
R
install.packages("nloptr")

Note: you will also need to load the nlopt modulefile in your jobscript when submitting jobs to the batch system.

cmake dependency

If your package requires cmake to complete its installation, you can load the cmake modulefile before running R, then R will be able to find it:

module load apps/gcc/R/4.3.1
module load tools/gcc/cmake/3.25.1      # Other versions of cmake are available
R
install.packages("mice")

Note: you will likely NOT need to load the cmake modulefile in your jobscript when submitting jobs to the batch system. cmake is usually only used during the installation, not when you run R.

Please see the cmake page for available versions.

Adding BioConductor Packages – R 3.6.0 and newer

Note: This is NOT the method used for older versions of R (3.5 and older). See below for that.

The ‘manager’ for bioconductor has changed in version 3.6.0. Details are given here on how to install BioConductor packages in R 3.6.0 (and up).

# Check the BiocManager version
BiocManager::version()

# List the packages that are available to install
BiocManager::available()

# Install a package to your home directory
BiocManager::install(c("esATAC"))
     ## In this case esATAC (replace that with the package you are interested in)
     ## You will be prompted to install to a local (your home) directory as below

Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6.0 (2019-04-26)
Installing package(s) 'esATAC'
Warning in install.packages(pkgs = doing, lib = lib, repos = repos, ...) :
  'lib = "/opt/apps/apps/gcc/R/3.6.0/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) y   # Answer 'y'
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/3.6’
to install packages into? (yes/No/cancel) y                           # Answer 'y'

Adding BioConductor Packages – R 3.5.2 and R 3.4.2

Note: This is NOT the method used for newer versions of R (3.6 and newer). See above for that.

BioConductor packages can be installed in to your local R library (in your home directory) as follows:

# This will automatically load the R modulefile as well
module load libs/bioconductor/3.4
R
source("https://bioconductor.org/biocLite.R")
biocLite("packagename")             # Give a BioConductor package name, e.g. "S4Vectors"

# You will see some output then be asked to install locally:

  'lib = "/opt/gridware/apps/R/3.4.2/lib64/R/library"' is not writable
Would you like to use a personal library instead?  (y/n) y      # Answer 'y'
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into?  (y/n) y                              # Answer 'y'

The package will be downloaded and installed in to your local R library.

Using BioConductor Packages

BioConductor packages have to be loaded like any other package if you’ve previously installed them. For example, assuming you have installed a BioConductor package named bioThing, to use it in your code use:

# Load/use a BioConductor package named 'bioThing' previously installed
library(bioThing)

Adding rjags and related packages

rjags is a popular package for working with Bayesian graphical models using MCMC. It is also used by other packages such as JMbayes. The rjags package relies on a library named JAGS. This is already installed on the CSF so you can make it available to R by loading its modulefile. This will allow you to then install rjags and related packages such as JMbayes. If you are using R 3.6.2 or later you must load the JAGS modulefile that is compatible with the GCC 8.2.0 compiler (which was used to install R 3.6.2). Here is a complete example of installing JMbayes, which will install rjags in your local R directory in your home directory:

# On the login node, start an interactive session
qrsh -l short

# On the interactive compute node
module load apps/gcc/R/3.6.2                # Uses GCC 8.2.0
module load apps/gcc/jags/4.3.0-gcc-8.2.0   # Use the gcc-8.2.0 compatible version
R
install.packages("JMbayes")
library(JMbayes)

You will be asked to select a mirror site from which to download the JMbayes packages (we typically use the Bristol UK mirror).

Once the package has been installed, the library(JMbayes) command should be used each time you wish to use the package. You will also need to load the jags modulefile, as well as the R modulefile, in your jobscripts.

Listing Packages

To list the installed packages run:

installed.packages();

To list loaded packages run:

(.packages())

Removing Packages

Should you need to delete an installed package:

module load apps/gcc/R/version
R
remove.packages("thing")

This will remove it from your local library of packages, for the version of R you are currently using. If you’ve used several versions of R over time and have installed the package with each one, you would need to load the modulefile for each version and remove the package from each one in turn.
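
Because each R version keeps its own personal library, it can help to see which versions hold a copy of a package before removing it. The following is a small sketch, assuming the default personal library layout shown earlier (~/R/platform-library/version). The name find_package_copies is a hypothetical helper, not a CSF command.

```shell
# Hypothetical helper: list every personal library that contains a given package.
# Usage: find_package_copies PACKAGE [LIBRARY_ROOT]   (LIBRARY_ROOT defaults to ~/R)
find_package_copies() {
    pkg="$1"
    root="${2:-$HOME/R}"
    # Personal libraries are laid out as ROOT/<platform>-library/<R version>/<package>
    for dir in "$root"/*-library/*/"$pkg"; do
        [ -d "$dir" ] && echo "$dir"
    done
    return 0
}

# Example: where is the package 'thing' installed?
find_package_copies thing
```

Each path printed ends in the R version directory followed by the package name, which shows which R modulefile to load before running remove.packages.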

Plotting

If you wish to plot graphs, for example, to image files, you will need to use the cairo plotting device.

The following example generates a histogram and plots it to a .png file and a .jpg file. The jobscript is:

#!/bin/bash --login
#$ -cwd
#$ -l short
module load apps/gcc/R/4.4.0
R CMD BATCH --no-restore plot.R

The R-code is:

# R script to demonstrate plotting to image files on the CSF

# Enable cairo device (needed to prevent 'X11 not available' errors)
options(bitmapType='cairo')

# Initialize some data to plot
x = rnorm(100)

# Save a png plot
png(file="hist.png")
hist(x)
rug(x,side=1)
dev.off()

# How about jpg
jpeg(file="hist.jpg")
hist(x)
rug(x,side=1)
dev.off()

# R 3.6.1 (and later) can also do tiff
tiff(file="hist.tif")
hist(x)
rug(x,side=1)
dev.off()

Now, to view your images while on the CSF, use the Eye of GNOME (eog) image viewer:

# List the image files created by the above example
ls hist.*
hist.jpg  hist.png  hist.tif

# Use the image viewer named 'eog' (Eye of GNOME) on the CSF login node
eog hist.png

If you need other image file formats you can convert your PNG file using the convert command-line tool, which is available on the login node and can also be run in your jobscript (note that convert is a Linux command-line program, not an R function):

# Using the hist.png example file from the above R script, convert it to another format:
convert hist.png hist.tif      # R 3.6.1 can write tif files directly (see above) but older versions can't

# How about a .pdf
convert hist.png hist.pdf

# Now view a .pdf on the login node
evince hist.pdf

Further Info

Updates

3.6.1 was installed October 2019.
3.6.0 was installed June 2019.
3.5.2 was installed Feb 2019.

Last modified on November 1, 2024 at 12:34 pm by George Leaver