cuda-torch

Overview

Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.

Version 7 is installed on the CSF using CUDA 7.5. It has been installed inside a singularity image to help with dependency installation. The image is based on a docker recipe at: https://github.com/Kaixhin/dockerfiles/blob/master/cuda-torch/cuda_v7.5/Dockerfile. If you require a more generic Torch installation please contact us.

Restrictions on use

There are no restrictions on accessing this software on the CSF. It is released under a permissive open source license and all usage must adhere to that license.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load the following modulefile:

module load apps/singularity/cuda-torch/7.5

Running the application

Please do not run torch on the login node. Jobs should be submitted to the compute nodes via the batch system.

After loading the modulefile there are three scripts available for you to run. See below for examples of how to run these scripts.

# Can be run from a jobscript or an interactive session
# Run the torch 'th' interpreter. Pass in the name of a .lua file on the command-line.
# Without any .lua file it will run interactively in the shell (used with qrsh)
cuda-torch-singularity-th filename.lua

# Must be run from an interactive session
# Run the singularity image interactively. You will be able to run any commands that have been
# installed in the image. Your scratch and home directories are visible. Used with qrsh.
cuda-torch-singularity-inter

# Must be run from the login node - it will submit a job for you
# Runs a jupyter notebook using the python installation in the singularity image.
# Follow the instructions displayed in your login window. Add -g N where N is the
# number of GPUs to use.
cuda-torch-singularity-jnotebook [-g N]
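
As an illustration of what you might pass to cuda-torch-singularity-th, the following sketch of a .lua file performs a simple GPU sanity check. The filename gpu-check.lua is just an example, and it assumes the cutorch package is available in the image (the Docker recipe the image is based on installs the standard Torch distribution):

-- gpu-check.lua: example only - a minimal GPU sanity check (assumes cutorch is in the image)
require 'torch'
require 'cutorch'

-- Report how many GPUs the job can see and their names
local ngpu = cutorch.getDeviceCount()
print(string.format('Visible GPUs: %d', ngpu))
for i = 1, ngpu do
  local props = cutorch.getDeviceProperties(i)
  print(string.format('  GPU %d: %s', i, props.name))
end

-- A small computation on the default GPU
local a = torch.rand(1000, 1000):cuda()
local b = torch.rand(1000, 1000):cuda()
local c = torch.mm(a, b)
print('Checksum of a*b: ' .. c:sum())

It could then be run in a jobscript or an interactive session with:

cuda-torch-singularity-th gpu-check.lua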

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/singularity/cuda-torch/7.5

# Run the 'th' command on a .lua file
cuda-torch-singularity-th filename.lua

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
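
For illustration, the filename.lua run by the jobscript above can contain any Torch code. The following is a minimal sketch (not a supplied example) that builds a small neural network and runs a forward pass on the GPU; it assumes the nn and cunn packages are available in the image:

-- Sketch only: forward pass of a small network on the GPU
require 'torch'
require 'nn'
require 'cunn'   -- CUDA backend for nn (assumed to be installed in the image)

-- Build a small multi-layer perceptron and move it to the GPU
local model = nn.Sequential()
model:add(nn.Linear(100, 50))
model:add(nn.ReLU())
model:add(nn.Linear(50, 10))
model = model:cuda()

-- A random batch of 32 inputs, also on the GPU
local input = torch.rand(32, 100):cuda()
local output = model:forward(input)
print('Output size: ' .. output:size(1) .. ' x ' .. output:size(2))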

Parallel batch job submission

Add the following two lines to the above jobscript, before the cuda-torch-singularity-th command, to make the job a parallel (multi-core) job:

#$ -pe smp.pe 8                    # Number of cores, can be 2--32.
export OMP_NUM_THREADS=$NSLOTS     # Inform torch how many cores it can use
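
Note that the extra cores benefit torch's CPU tensor operations (the GPU parts of your code are unaffected). Inside your .lua file you can check, and if necessary set, the thread count yourself; a minimal sketch, assuming the batch system has set the NSLOTS variable as above:

-- Sketch only: make torch's CPU operations use the cores granted to the job
require 'torch'

local nslots = tonumber(os.getenv('NSLOTS')) or 1
torch.setnumthreads(nslots)
print('torch is using ' .. torch.getnumthreads() .. ' CPU threads')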

Interactive Usage

The following example shows how to start an interactive session on a GPU node and then start the th torch app. This is useful when developing your torch code.

# Wait for a free GPU node and log in to it. Here we ask for 2 GPUs:
qrsh -l v100=2 bash

# Now on the GPU node, set up to use cuda-torch
module load apps/singularity/cuda-torch/7.5

# Now run the 'th' app
cuda-torch-singularity-th
  ______             __   |  Torch7 
 /_  __/__  ________/ /   |  Scientific computing for Lua. 
  / / / _ \/ __/ __/ _ \  |  Type ? for help 
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch 
                          |  http://torch.ch 
	
th> Enter torch commands here

th> exit

# Now return to the login node
exit
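
As an illustration of what can be typed at the th> prompt in such a session, the following commands check the two GPUs requested above and run a small calculation on each (this assumes the cutorch package is available in the image):

-- Example commands for the th> prompt
require 'cutorch'
cutorch.getDeviceCount()           -- should report the 2 GPUs requested above
cutorch.setDevice(1)
x = torch.rand(500, 500):cuda()    -- tensor allocated on GPU 1
print(x:sum())
cutorch.setDevice(2)
y = torch.rand(500, 500):cuda()    -- tensor allocated on GPU 2
print(y:sum())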

Singularity Image Recipe

For details of how the singularity image was built (i.e., which packages were installed in the image) you can access the recipe file using:

module load apps/singularity/cuda-torch/7.5
cd $CUDA_TORCH_HOME/build
ls
cat build.sh
cat Cuda-torch.v2

If you wish to build your own singularity image you should copy the above files to your local PC and run singularity there. For security reasons only the sysadmins can build images on the CSF.

Running the Jupyter Notebook

Running a jupyter notebook is a little more complex on the CSF. You must submit a job to start a notebook server and then wait for it to run. Once running, you must create an ssh tunnel from your local PC to the notebook server running on a compute node. Then you can connect a web-browser running on your local PC to the notebook server via the ssh tunnel.

The script we provide helps with this: it submits the batch job and then tells you how to set up the ssh tunnel and which web address to use.

Run the following command on the login node and then follow the instructions displayed in your login window:

# Add -g N to use N GPUs. For example
cuda-torch-singularity-jnotebook -g 2

You must read the information displayed in your login window carefully!

Further info

Updates

None.
