R & Bioconductor

Overview

R is a free software environment for statistical computing and graphics.

See modulefile section below for list of available versions.

Bioconductor can be installed by you – see below.

You can install packages to your own home directory using the Adding Packages instructions below. We also recommend using Renv (below) to manage separate R projects (similar to python/conda “environments”).

Restrictions on use

There are no restrictions on access to R as it is a free piece of software released under a GNU license. All users should familiarise themselves with the licensing information available via the R website.

All R jobs, aside from very short test jobs (e.g. those lasting less than one minute), must be submitted to the batch system.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this.

Load one of the following modulefiles:

Standard open source R:

# Note: we no longer pre-install Bioconductor. See below for how to install it.
module load apps/gcc/R/4.5.0   # Uses default gcc (currently 11.5.0)
module load apps/gcc/R/4.4.2   # Includes gcc 14.2.0
module load apps/gcc/R/4.4.1   # Includes gcc 13.3.0
module load apps/gcc/R/4.4.0   # Includes gcc 12.2.0

# Older versions had BioConductor pre-installed.
module load apps/gcc/R/4.3.1   # Includes BioConductor 3.16, gcc 9.3 (helps package installs)
module load apps/gcc/R/4.2.2-gcc14.2   # Includes gcc 14.2 (helps package installs)
module load apps/gcc/R/4.1.2   # Includes BioConductor 3.14, gcc 9.3 (helps package installs)
module load apps/gcc/R/4.1.0   # Includes BioConductor 3.13, gcc 8.2 (helps package installs)
module load apps/gcc/R/4.0.2   # Includes BioConductor 3.11, gcc 8.2 (helps package installs)
module load apps/gcc/R/3.6.2   # Includes BioConductor 3.10, gcc 8.2 (helps package installs)
module load apps/gcc/R/3.6.1   # Includes BioConductor 3.9, gcc 8.2 (helps package installs)
module load apps/gcc/R/3.6.0   # Includes BioConductor 3.9, gcc 8.2

# Very old versions may no longer work on CSF3 (Slurm)
module load apps/R/3.5.2    # Does not include BioConductor - see modulefile below
module load apps/R/3.4.2    # Does not include BioConductor - see modulefile below

BioConductor Installation

The central installations of R version 4.4.0 and later do not include BioConductor.

Users can install BioConductor themselves by running the following commands in an interactive session:

# On the CSF login node, start an interactive session
srun -p interactive -t 0-1 --pty bash

# You'll now be on a compute node. You can run your commands directly, and R
# will be able to download packages from the outside world:
module load apps/gcc/R/4.4.1      # Choose your required version
R
install.packages("BiocManager")
q();

# To return to the login node, exit from the interactive (srun) session
exit

Batch jobs can now use your local installation of BioConductor. For more information on using BioConductor, please see the BioConductor installation documentation.

Older versions of R require a separate BioConductor modulefile. To use BioConductor, load:

# This is NOT needed for R 3.6 and newer!
module load libs/bioconductor/3.4

This will load the R modulefile if not already loaded.

Running the application

Note that using R CMD BATCH, as below, may save and restore your workspace, which may not be what you want. Using Rscript instead avoids that.

Serial Batch job

Write a submission script, for example:

#!/bin/bash --login
#SBATCH -p serial        # (or --partition=) Run on the nodes dedicated to 1-core jobs
#SBATCH -t 2-0           # Wallclock time limit (2-0 is 2 days, max permitted is 7-0)

## We now recommend loading the modulefile in the jobscript. Change the version as needed.
module purge
module load apps/R/3.4.2

R CMD BATCH --no-restore my_test.R  my_test.R.$SLURM_JOB_ID
   #                #                   #
   #                #                   # The final argument, "my_test.R.$SLURM_JOB_ID", tells R to send
   #                #                   #  output to a file with this name unique to the current job.
   #                #
   #                # Do not restore any previously saved objects. Ensures you don't load in possibly
   #                # large objects from previous runs of R. If jobs are failing due to lack of memory
   #                # please add this flag or alternatively use --vanilla which applies the following:
   #                # --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ
   #
   # R must be called with both the "CMD" and "BATCH" options which tell it
   # to run an R program, in this case my_test.R, instead of presenting
   # an interactive prompt

Submit the job using

sbatch runmyRjob.slurm

where runmyRjob.slurm is the name of your job script.

By default, graphical output from batch jobs is sent to a file called Rplots.pdf. See below for more info on plotting in to an image file.
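The note at the start of this section mentions Rscript as an alternative to R CMD BATCH. The following is a minimal sketch of creating an Rscript-based serial jobscript from the command line. The names runmyRjob.slurm, my_test.R and my_test.out are placeholders, the one-hour time limit is an arbitrary choice, and the R version is only an example.

```shell
# Write a serial jobscript that uses Rscript instead of R CMD BATCH.
# The quoted 'EOF' stops the shell expanding $SLURM_JOB_ID now; it is
# expanded later, when the job runs.
cat > runmyRjob.slurm <<'EOF'
#!/bin/bash --login
#SBATCH -p serial        # Run on the nodes dedicated to 1-core jobs
#SBATCH -t 0-1           # Wallclock time limit (0-1 is 1 hour)

module purge
module load apps/gcc/R/4.4.2   # Example version; change as needed

# Rscript does not restore or save the workspace by default.
# It writes to stdout/stderr, so redirect both to a per-job file.
Rscript my_test.R > my_test.out.$SLURM_JOB_ID 2>&1
EOF

# Submit it on the CSF with:
#   sbatch runmyRjob.slurm
```

Unlike R CMD BATCH, Rscript does not create an output file itself, which is why the sketch redirects the output explicitly.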

Parallel Batch Job (single node, multi-core)

Please note that your R code must be parallelised (usually with the ‘parallel’ library) before you submit to more than 1 core. Asking for more than 1 core does not mean your code will automatically use them.

#!/bin/bash --login
#SBATCH -p multicore     # (or --partition=) Run on the nodes dedicated to multi-core jobs
#SBATCH -n 8             # (or --ntasks=) Number of cores
#SBATCH -t 2-0           # Wallclock time limit (2-0 is 2 days, max permitted is 7-0)

module purge
module load apps/R/3.4.2

R CMD BATCH --no-restore my_test.R my_test.R.$SLURM_JOB_ID
               #
               # See the serial jobscript example above for a description
               # of the command-line flags.

Then submit your job to the batch system

sbatch runmyRjob.slurm

where runmyRjob.slurm is the name of your job script.

The various libraries for performing parallel computation in R each have their own way of setting the number of cores to use within R. This will sometimes default to the total number of cores on the node. You need to make sure that your code is using no more than the number of cores you’ve requested in your job script, otherwise your job is liable to be killed without warning.

You can return the number of cores you requested in your jobscript as a variable, using the code:

numCoresAllowed <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))

(If you’re running the job interactively or on your local machine, the value specified in “unset” will be returned)

You should use this value when you set the number of cores. For example, if you’re using the “doMC” package, you’d use:

registerDoMC(cores = numCoresAllowed)

Some libraries, e.g. the “parallel” library, will take the number of cores to use directly from an environment variable (e.g. MC_CORES). You can set the environment variable in your job script:

export MC_CORES=$SLURM_NTASKS

Add this to your jobscript before the R CMD BATCH ... line.
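If the same jobscript lines are also run outside a batch job (for example while testing in an interactive shell), SLURM_NTASKS may not be set. The sketch below falls back to 1 core in that case. The function name allowed_cores is a hypothetical helper, not something provided on the CSF.

```shell
# Print the number of cores R is allowed to use: the Slurm allocation
# inside a batch job, or 1 when SLURM_NTASKS is unset or empty.
allowed_cores() {
    echo "${SLURM_NTASKS:-1}"
}

# Export the value for libraries that read MC_CORES (e.g. 'parallel').
MC_CORES="$(allowed_cores)"
export MC_CORES
echo "R may use up to ${MC_CORES} core(s)"
```

Place these lines before the R CMD BATCH line so that the variable is set when R starts.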

Running R interactively

It is expected that most use of R on the CSF will be in batch mode, i.e., computational jobs will be submitted to the batch system and there will be no subsequent user interaction. However, if required, R can be run on the CSF using either the R command line, or GUI.

Do not simply login to the CSF and start R — your jobs will be killed by the system administrator! The only exception to this is when installing a package from a mirror in R (see below).

To run R interactively on the CSF, use srun, which queues interactive jobs for a compute node. To start the R command line, type:

# Load the modulefile on the login node (use your required version)
# If you loaded other modulefiles to do a package installation (e.g., nlopt)
# you should also load them here.
module purge
module load apps/gcc/R/4.4.2

# Start R in an interactive session on a compute node (text-mode only)
srun -p interactive -t 0-1 --pty R --vanilla
                                      # 
                                      # 
                                      #
                                      # Can be: "--save", "--no-save" or "--vanilla"...

Adding packages

You may wish to use a particular package (library) in your code. The central installations of R may already have that package installed. If not, you can install it yourself (it will go in to a folder in your home directory).

In this section we provide details of the standard method of installing packages in R, using the install.packages() command. This will install packages within your home directory area, but does not allow you to install packages on a project-by-project basis.

If you prefer a more project-oriented installation method (similar to python’s virtual environments), where the packages you install for one project can be kept separate to those of another, please see the Renv method described further down. We recommend this method.

Check if a package is already installed

To determine if a package is already installed, simply try loading it in R. For example, on the login node:

R
> library(thing)
Error in library(thing) : there is no package called ‘thing’

# (if you get no output it usually means the library is already installed!)

This tells us we need to install a package/library named ‘thing’. See below for how to do that. Installing BioConductor packages is also possible and this is also covered below.

Note: For the purposes of adding packages you can run R on the login node. But this is the only time you should run R on the login node. All data processing, development and testing must be run in batch jobs or in an interactive session on a compute node (see above for how to run R).
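The same check can be made without starting an interactive R prompt. The following is a minimal sketch, assuming an R modulefile is already loaded so that Rscript is on your PATH. The function name r_pkg_installed is hypothetical, and it uses requireNamespace() rather than library(), so the package is loaded but not attached.

```shell
# Return success (exit status 0) if the named R package can be loaded by
# the currently loaded version of R, and failure (non-zero) otherwise.
r_pkg_installed() {
    local pkg="$1"
    Rscript \
        -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]' \
        -e 'quit(status = if (requireNamespace(pkg, quietly = TRUE)) 0L else 1L)' \
        "$pkg"
}

# Usage (after loading an R modulefile):
#   if r_pkg_installed thing; then echo "thing is installed"; else echo "thing is missing"; fi
```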

Install a package by Automatically Downloading from CRAN (the default repo)

To add packages to your personal R package directory (~/R/platform/version), downloading from CRAN:

Note: You can do R package installations on the login node.

module purge
module load apps/gcc/R/4.4.2
  #
  # Note: you may need to load other modulefiles to complete a package installation.
  # If your install fails, look at the errors. You can exit from R, load some more
  # modulefiles, then run R again and try the install. Common packages are nlopt and
  # cmake - see sections below for more details.

# Note: you may have old proxy settings in an ~/.Renviron file. You'll need to remove these:
cat ~/.Renviron
  #
  # If you see the following, you do not need to do anything!
  cat: .Renviron: No such file or directory

  # If you see some lines containing
  http_proxy=http://proxy.man.ac.uk:3128
  https_proxy=https://proxy.man.ac.uk:3128
    #
    # Delete these lines or place a # at the start of each line.

  # If your ~/.Renviron file contains only the above proxy lines
  # you can delete the file
  rm ~/.Renviron
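If you would rather not edit the file by hand, the proxy lines can be commented out with sed. This is a minimal sketch, not an official CSF tool: the function name disable_renviron_proxy is hypothetical, it only matches the lowercase http_proxy and https_proxy lines shown above, and it keeps a backup copy with a .bak suffix.

```shell
# Comment out any http_proxy/https_proxy lines in an Renviron-style file,
# keeping a backup copy alongside it with a .bak suffix.
disable_renviron_proxy() {
    local file="${1:-$HOME/.Renviron}"
    if [ ! -f "$file" ]; then
        echo "No $file found: nothing to do"
        return 0
    fi
    # Lines that are already commented out start with '#' and are not matched.
    sed -i.bak -E 's/^(https?_proxy=)/#\1/' "$file"
}

# Usage:
#   disable_renviron_proxy              # acts on ~/.Renviron
#   disable_renviron_proxy some_file    # acts on the named file
```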

Now start R in the usual way:

R

Now ask R to install the required package and answer y when asked if you wish to create a personal library:

> install.packages("thing")
Warning in install.packages("thing") :
'lib = "/opt/apps/apps/gcc/R/4.4.2/lib64/R/library"' is not writeable
Would you like to use a personal library instead?  (y/n) y      # Answer 'y'
Would you like to create a personal library                     # (if first ever package!)
~/R/x86_64-pc-linux-gnu-library/4.4
to install packages into?  (y/n) y                              # Answer 'y'

Select a UK mirror when prompted (e.g., UK Bristol which is near the bottom of the list.)

Once the package is installed, you can now check it has installed correctly by loading the library:

library(thing)
  #
  # No output (or some library-specific info) means it is installed correctly.

You can now exit R and then exit from your interactive sessions or install more libraries by repeating the above steps.

q()

Please remember that your usual R usage, to run scripts and process data, must be done in batch or via srun (interactively) on a compute node (see above). Do not continue to run computational work on the login node!

In the above instructions replace the module load command with the one appropriate to the R version you wish to use.

If you wish to specify a mirror in the install.packages command instead of selecting it from a menu, try:

install.packages('thing', repos='http://www.stats.bris.ac.uk/R')

Installing a Library from a source package

If you’ve downloaded an R library source file you can add it to your local workspace using the following commands (which assume the source package is in your home directory on the CSF):

Start R with extra command-line args (choose the version of R you require):

module load apps/gcc/R/4.4.2
R CMD INSTALL thing.x.y.z.tar.gz

* installing to library ‘/mnt/iusers01/support/mabcxyz1/R/x86_64-unknown-linux-gnu-library/4.4’
* installing *source* package ‘thing’ ...
** package ‘thing’ successfully unpacked and MD5 sums checked
** R
** data
** demo
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (thing)

Using an installed package

Now run R and test that the library can be loaded:

R
> library(thing)

The compiled library files will be saved in a directory named R in your home directory. It contains subdirectories for each version of R so if you want to use the library in different versions of R you will have to repeat the above commands for each version.
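To see which per-version libraries you already have, you can list the subdirectories of the R directory. The sketch below assumes the ~/R/platform-library/version layout shown in the install prompts on this page. The function name list_r_libraries is hypothetical.

```shell
# List each per-version personal R library under a base directory
# (default ~/R) with the number of packages it contains.
list_r_libraries() {
    local base="${1:-$HOME/R}"
    local lib count
    for lib in "$base"/*-library/*/; do
        [ -d "$lib" ] || continue      # Skip the unexpanded pattern when nothing matches
        # Each installed package is one subdirectory of the library.
        count=$(( $(find "$lib" -mindepth 1 -maxdepth 1 -type d | wc -l) ))
        printf '%s: %d package(s)\n' "${lib%/}" "$count"
    done
}

# Usage:
#   list_r_libraries
```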

nloptr dependency

Some packages fail to install because they depend on the nloptr R package. Trying to install that specific package often fails due to a dependency on the nlopt library, which R fails to compile. So we have provided this as a separate modulefile. For example:

# This will fail due to a failure to compile nloptr
module load apps/gcc/R/4.3.2
R
install.packages("nloptr")           # Other R packages that depend on this one will also fail
q()

# The solution is to load an extra modulefile:
module load apps/gcc/R/4.3.2
module load libs/gcc/nlopt/2.6.2
R
install.packages("nloptr")

Note: you will also need to load the nlopt modulefile in your jobscript when submitting jobs to the batch system.
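As an illustration of that note, here is a minimal sketch of a serial jobscript that loads the nlopt modulefile alongside R. The file names run_nloptr_job.slurm and my_test.R are placeholders and the one-hour time limit is an arbitrary choice. The modulefile versions are the ones used in the example above.

```shell
# Write a jobscript that loads the same modulefiles used to install nloptr.
# The quoted 'EOF' keeps $SLURM_JOB_ID unexpanded until the job runs.
cat > run_nloptr_job.slurm <<'EOF'
#!/bin/bash --login
#SBATCH -p serial        # Run on the nodes dedicated to 1-core jobs
#SBATCH -t 0-1           # Wallclock time limit (0-1 is 1 hour)

module purge
module load apps/gcc/R/4.3.2
module load libs/gcc/nlopt/2.6.2    # Needed at run time as well as install time

R CMD BATCH --no-restore my_test.R my_test.R.$SLURM_JOB_ID
EOF

# Submit it on the CSF with:
#   sbatch run_nloptr_job.slurm
```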

cmake dependency

If your package requires cmake to complete its installation, you can load the cmake modulefile before running R, then R will be able to find it:

module load apps/gcc/R/4.3.2
module load tools/gcc/cmake/3.25.1      # Other versions of cmake are available
R
install.packages("mice")

Note: you will likely NOT need to load the cmake modulefile in your jobscript when submitting jobs to the batch system. cmake is usually only used during the installation, not when you run R.

Please see the cmake page for available versions.

Adding BioConductor Packages – R 3.6.0 and newer

Note: This is NOT the method used for older versions of R (3.5 and older). See below for that.

The ‘manager’ for bioconductor has changed in version 3.6.0. Details are given here on how to install BioConductor packages in R 3.6.0 (and up).

# Check the BiocManager version
BiocManager::version()

# See which packages are available to install
BiocManager::available()

# Install a package to your home directory
BiocManager::install(c("esATAC"))
     ## In this case esATAC (replace that with the package you are interested in)
     ## You will be prompted to install to a local (your home) directory as below

Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6.0 (2019-04-26)
Installing package(s) 'esATAC'
Warning in install.packages(pkgs = doing, lib = lib, repos = repos, ...) :
  'lib = "/opt/apps/apps/gcc/R/3.6.0/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) y   # Answer 'y'
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/3.6’
to install packages into? (yes/No/cancel) y                           # Answer 'y'

Adding BioConductor Packages – R 3.5.2 and R 3.4.2

Note: This is NOT the method used for newer versions of R (3.6 and newer). See above for that.

BioConductor packages can be installed in to your local R library (in your home directory) as follows:

# This will automatically load the R modulefile as well
module load libs/bioconductor/3.4
R
source("https://bioconductor.org/biocLite.R")
biocLite("packagename")             # Give a biocLite package name: EG: "S4Vectors"

# You will see some output then be asked to install locally:

  'lib = "/opt/gridware/apps/R/3.4.2/lib64/R/library"' is not writable
Would you like to use a personal library instead?  (y/n) y      # Answer 'y'
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into?  (y/n) y                              # Answer 'y'

The package will be downloaded and installed in to your local R library.

Using BioConductor Packages

BioConductor packages have to be loaded like any other package if you’ve previously installed them. For example, assuming you have installed a BioConductor package named bioThing, to use it in your code use:

# Load/use a BioConductor package named 'bioThing' previously installed
library(bioThing)

Adding rjags and related packages

rjags is a popular package for working with Bayesian graphical models using MCMC. It is also used by other packages such as JMbayes. The rjags package relies on a library named JAGS. This is already installed on the CSF so you can make it available to R by loading its modulefile. This will allow you to then install rjags and related packages such as JMbayes. If you are using R 3.6.2 or later you must load the JAGS modulefile that is compatible with the GCC 8.2.0 compiler (which was used to install R 3.6.2). Here is a complete example of installing JMbayes, which will install rjags in your local R directory in your home directory:

# Packages can be installed while you are on the login node
module purge
module load apps/gcc/R/3.6.2                # Uses GCC 8.2.0
module load apps/gcc/jags/4.3.0-gcc-8.2.0   # Use the gcc-8.2.0 compatible version
R
install.packages("JMbayes")
library(JMbayes)

You will be asked to select a mirror site from which to download the JMbayes packages (we typically use the Bristol UK mirror).

Once the package has been installed, the library(JMbayes) command should be used each time you wish to use the package. You will also need to load the jags modulefile, as well as the R modulefile, in your jobscripts.

Adding RStan

RStan is the R interface to Stan, a popular package for Bayesian data analysis. To install RStan you need to load a couple of additional libraries as modules. You also have to load these modules every time you run RStan, e.g. in a batch job.
Here is an example of installing RStan in your local R directory in your home directory:

# Request an interactive job on a compute node for 1hr.
srun -p interactive -t 0-1 --pty bash


# When srun finds a node for you, load the required modules
# You need to load these modules every time you use RStan (e.g. in batch jobs)
module purge
module load apps/gcc/R/4.5.0
module load libs/gcc/flexiblas/3.4.5
module load libs/gcc/glpk/5.0

# Now run R at the CSF command-prompt:
R
# Type below commands inside the R shell
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")), dependencies = TRUE)
  #
  # choose a mirror that is close to you, e.g. 64: Bristol
  # answer 'yes' if asked: Would you like to use a personal library instead?
  # Then 'yes' again to confirm the personal library path.
  # It should take around 30 mins to download and compile the packages
  # When done quit and restart R
q()
n
### Now run R again at the CSF command-prompt:
R
# check that rstan is installed
library("rstan")
# Run the example in the rstan documentation to test (optional)
> example(stan_model, package = "rstan", run.dontrun = TRUE)

Listing Packages

To list the installed packages run:

installed.packages();

To list loaded packages run:

(.packages())

Removing Packages

Should you need to delete an installed package:

module load apps/gcc/R/version
R
remove.packages('thing')

This will remove it from your local library of packages, for the version of R you are currently using. If you’ve used several versions of R over time and have installed the package with each one, you would need to load the modulefile for each version and remove the package from each one in turn.
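To find out which R versions have a copy of a package before removing it, you can look for its directory in each per-version library. This is a minimal sketch, assuming the ~/R/platform-library/version layout shown in the install prompts on this page. The function name r_versions_with_pkg is hypothetical. The versions printed are the major.minor directory names (e.g. 4.4), not full modulefile versions.

```shell
# Print the R versions whose personal library (under ~/R by default)
# contains the named package.
r_versions_with_pkg() {
    local pkg="$1" base="${2:-$HOME/R}"
    local dir
    for dir in "$base"/*-library/*/"$pkg"; do
        [ -d "$dir" ] || continue          # Skip the unexpanded pattern when nothing matches
        basename "$(dirname "$dir")"       # The parent directory name is the R version
    done
}

# Usage:
#   r_versions_with_pkg thing
```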

R Project Environments

We advise using renv to install the R packages that you need. The renv package helps you create conflict-free, reproducible environments for your R projects. Using renv you can maintain separate project folders, each with its own set of R packages, which will not conflict with the R packages in other renv project folders or R modules. This is somewhat similar to using conda virtual environments. The benefits of using renv include:

Isolation: Installing a new or updated package for one pipeline will not break your other projects/pipelines, and vice versa. That’s because renv gives each project its own private library. You can have a separate, isolated project directory for each of your projects/pipelines, with its own set of packages.

Portability: You can easily transport your projects from one cluster/computer to another, even across different platforms. renv makes it easy to install the packages your project depends on in the new environment.

Reproducibility: renv records the exact package versions you have installed in a project. This helps in ensuring those exact versions are installed wherever you want to move your work to.

The main steps involved are:

Create a separate directory for each of your projects/pipelines and move (cd) to that directory.

*If you are accessing multiple servers/clusters from the same home directory and you are using different R modules for them, it is important that you use renv for your projects/pipelines to avoid conflicts.

Load appropriate R module as per your requirement.

*It is advisable to use the latest R module available for your system as packages of older versions are not always maintained in the CRAN repositories.

Start R

Install ‘renv’ package.

Initialize renv.

This will set up a project library. The following files and folder, which record the packages and the metadata needed to reinstall them, are created in that directory at initialization: renv.lock, .Rprofile and renv.
These files and folders should not be altered. You can see them when you exit R and run the command ls -al from that directory later.

Quit/Exit R.

*This is IMPORTANT, you need to quit/exit R after you have initialized renv for the first time after installation.

Start R again from the same directory.

Install required R package(s).

Create a snapshot of the installation.

Inside the project folder, create a text file recording information such as the platform/server on which the project was created and the R and other modules that were used, for your future reference.

If needed, repeat the same steps in a separate directory for a different project/pipeline requiring a different set of packages.

To run a job using a package installed within a specific project created like this please see the sample jobscripts below:

Here are the commands needed to perform the steps described above.
In this example we will install only the ‘BiocManager’ package in this renv R Project folder, but you can install as many packages as you need in a project.

mkdir ~/MyFirstRProject
cd ~/MyFirstRProject
module load apps/gcc/R/4.4.2
R
install.packages("renv")
# Select the preferred CRAN mirror from the presented list
renv::init()
q()
n
ls -al
R
install.packages("BiocManager")
# Run commands to install additional R packages if needed
Y
renv::snapshot()
q()
n

cat README.txt
------------------------------------------------------------
| Project Platform: CSF3_EL9                               |
| Project directory location: ~/MyFirstRProject            |
| Modules used: apps/gcc/R/4.4.2                           |
| Packages Installed: BiocManager                          |
------------------------------------------------------------
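A README like the one above can also be generated from the command line. The following is a minimal sketch. The function name write_project_readme is hypothetical, the modulefiles are passed in by hand rather than detected, and the installed packages are left out on the assumption that renv.lock already records them.

```shell
# Write a README.txt in the current directory recording where and how the
# renv project was created. Arguments: platform name, then the modulefiles used.
write_project_readme() {
    local platform="$1"
    shift
    {
        echo "Project Platform: $platform"
        echo "Project directory location: $PWD"
        echo "Modules used: $*"
    } > README.txt
}

# Usage, from inside the project directory:
#   write_project_readme CSF3_EL9 apps/gcc/R/4.4.2
```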

Sample Jobscripts to run jobs using packages installed within a specific project created using ‘renv’

#!/bin/bash --login
#SBATCH -p serial   # Partition name is required (serial will default to 1 core)
#SBATCH -t 4-0      # Job "wallclock" limit is required. Max permitted is 7 days (7-0)

module load apps/gcc/R/4.4.2
cd ~/MyFirstRProject      # You need to cd to the renv project folder first.
R CMD BATCH --no-restore input.R

For more information on ‘renv’ please see the renv documentation.

Plotting

If you wish to plot graphs, for example, to image files, you will need to use the cairo plotting device.

The following example generates a histogram and plots it to a .png file and a .jpg file. The jobscript is:

#!/bin/bash --login
#SBATCH -p serial        # Run on the nodes dedicated to 1-core jobs
#SBATCH -t 0-1           # Wallclock time limit (0-1 is 1 hour)
module load apps/gcc/R/4.4.0
R CMD BATCH --no-restore plot.R

The R-code is:

# R script to demonstrate plotting to image files on the CSF

# Enable cairo device (needed to prevent 'X11 not available' errors)
options(bitmapType='cairo')

# Initialize some data to plot
x = rnorm(100)

# Save a png plot
png(file="hist.png")
hist(x)
rug(x,side=1)
dev.off()

# How about jpg
jpeg(file="hist.jpg")
hist(x)
rug(x,side=1)
dev.off()

# R 3.6.1 (and later) can also do tiff
tiff(file="hist.tif")
hist(x)
rug(x,side=1)
dev.off()

Now to view your images while on the CSF, use the eye of gnome (eog) image viewer:

# List the image files created by the above example
ls hist.*
hist.jpg  hist.png  hist.tif

# Use the image viewer named 'eog' (Eye of Gnome) on the CSF login node
eog hist.png

If you need other image file formats you can convert your PNG file using the convert command-line tool, which is available on the login node and can also be run in your jobscript (note that convert is a Linux command-line program, not an R function):

# Using the hist.png example file from the above R script, convert it to another format:
convert hist.png hist.tif      # R 3.6.1 can write tif files directly (see above) but older versions can't

# How about a .pdf
convert hist.png hist.pdf

# Now view a .pdf on the login node
evince hist.pdf

Installing a package from source

The following example shows how to install an R package from source. This allows us to modify the source to resolve a C++ problem with the std::isnan method. Here we install the igraph package.

module purge
module load apps/gcc/R/4.4.2   # Use which ever version you require
mkdir -p ~/software
cd ~/software
# Download igraph v2.1.4
wget https://www.stats.bris.ac.uk/R/src/contrib/igraph_2.1.4.tar.gz
tar xzf igraph_2.1.4.tar.gz
# Modify the source to remove the "using std::isnan" declaration.
grep -lr '^using std::isnan' igraph | xargs sed -i 's@^using std::isnan@//\0@'
# Install the package from source
R -e "install.packages('$PWD/igraph/', repos=NULL, type='source')"

You can then use the library in the usual manner

R
library(igraph)

Error in socketAccept

If you get an error message like the following while running an R package that you have installed:

 
Error in socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
  all 128 connections are in use
....
Execution halted

this may indicate that part of the code in your installed package is trying to run across all CPU cores instead of the number of CPU cores that you requested for the job. In that case, refer to the package’s documentation to find out how to set the number of CPU cores it uses, then submit your job requesting that many CPU cores.

Further Info

R website
Bioconductor website
There is a University R user group and an external Manchester R group.
R, Open Research, and Reproducibility by Andrew Stewart course materials.

Updates

3.6.1 was installed October 2019.
3.6.0 was installed June 2019.
3.5.2 was installed Feb 2019.

Last modified on April 28, 2026 at 2:45 pm by George Leaver
