cuda-torch
Overview
Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.
Version 7 is installed on the CSF using CUDA 7.5. It has been installed inside a Singularity image to simplify the installation of its dependencies. The image is based on the Docker recipe at: https://github.com/Kaixhin/dockerfiles/blob/master/cuda-torch/cuda_v7.5/Dockerfile. If you require a more general Torch installation please contact us.
Restrictions on use
There are no restrictions on accessing this software on the CSF. It is released under a permissive open source license and all usage must adhere to that license.
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.
Load one of the following modulefiles:
module load apps/singularity/cuda-torch/7.5
Running the application
Please do not run torch on the login node. Jobs should be submitted to the compute nodes via the batch system.
After loading the modulefile, three wrapper scripts are available for you to run. See below for examples of how to run each of them.
# Can be run from a jobscript or an interactive session
# Run the torch 'th' interpreter. Pass in the name of a .lua file on the command-line.
# Without any .lua file it will run interactively in the shell (used with qrsh).
cuda-torch-singularity-th filename.lua

# Must be run from an interactive session
# Run the singularity image interactively. You will be able to run any commands that have been
# installed in the image. Your scratch and home directories are visible. Used with qrsh.
cuda-torch-singularity-inter

# Must be run from the login node - it will submit a job for you
# Runs a jupyter notebook using the python installation in the singularity image.
# Follow the instructions displayed in your login window. Add -g N where N is the
# number of GPUs to use.
cuda-torch-singularity-jnotebook [-g N]
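If you do not yet have a .lua file, the following sketch creates a minimal one and runs it through the container's th interpreter. The file name example.lua is arbitrary, and the use of the cutorch package is an assumption based on this being a CUDA build of Torch; adapt the contents to your own code.

# Create a small test file (example.lua is just an illustrative name)
cat > example.lua <<'EOF'
-- Minimal Torch7 check. The cutorch package is assumed to be provided by
-- the CUDA-enabled image; remove the GPU lines for a CPU-only test.
require 'torch'
require 'cutorch'
print('GPUs visible: ' .. cutorch.getDeviceCount())
local a = torch.CudaTensor(3, 3):fill(1)   -- a 3x3 tensor held on the GPU
print(a:sum())                             -- should print 9
EOF

# Run it through the container (from a jobscript, or a qrsh session on a GPU node)
cuda-torch-singularity-th example.lua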
Serial batch job submission
Create a batch submission script which loads the modulefile within it, for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/singularity/cuda-torch/7.5

# Run the 'th' command on a .lua file
cuda-torch-singularity-th filename.lua
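Since this is a CUDA build of Torch, most jobs will also want a GPU. The following is a sketch only, assuming that the v100 resource request used in the interactive example later on this page can also be given as a #$ -l option in batch jobs; check the CSF GPU documentation for the exact resource names and limits:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Assumed single-GPU request - check the CSF GPU docs
                    # NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/singularity/cuda-torch/7.5

# Run the 'th' command on a .lua file
cuda-torch-singularity-th filename.lua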
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
Add the following two lines to the above jobscript, before the cuda-torch-singularity-th command, to make the job a parallel (multi-core) job:
#$ -pe smp.pe 8                  # Number of cores, can be 2--32
export OMP_NUM_THREADS=$NSLOTS   # Inform torch how many cores it can use
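A complete multi-core jobscript combining these lines with the serial example above might look like the following sketch (8 cores and filename.lua are placeholders):

#!/bin/bash --login
#$ -cwd                          # Job will run from the current directory
#$ -pe smp.pe 8                  # Number of cores, can be 2--32
                                 # NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/singularity/cuda-torch/7.5

# Inform torch how many cores it can use
export OMP_NUM_THREADS=$NSLOTS

# Run the 'th' command on a .lua file
cuda-torch-singularity-th filename.lua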
Interactive Usage
The following example shows how to start an interactive session on a GPU node and then start the th torch interpreter. This is useful when developing your torch code.
# Wait for a free GPU node and log in to it. Here we ask for 2 GPUs:
qrsh -l v100=2 bash

# Now on the GPU node, set up to use cuda-torch
module load apps/singularity/cuda-torch/7.5

# Now run the 'th' app
cuda-torch-singularity-th

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |  Type ? for help
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

th> Enter torch commands here
th> exit

# Now return to the login node
exit
Singularity Image Recipe
For details of how the singularity image was built (i.e., which packages were installed in the image) you can access the recipe file using:
module load apps/singularity/cuda-torch/7.5
cd $CUDA_TORCH_HOME/build
ls
cat build.sh
cat Cuda-torch.v2
If you wish to build your own singularity image you should copy the above files to your local PC and run singularity there. For security reasons only the sysadmins can build images on the CSF.
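For illustration only, building a copy of the image on your own Linux PC (where you have root access and Singularity installed) typically amounts to a command of the following form; the exact command used when the image was created is recorded in the build.sh script above, and the output image name here is just a placeholder:

# On your own Linux PC (requires root) - NOT on the CSF
sudo singularity build cuda-torch.simg Cuda-torch.v2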
Running the Jupyter Notebook
Running a jupyter notebook is a little more complex on the CSF. You must submit a job to start a notebook server and then wait for it to run. Once running, you must create an ssh tunnel from your local PC to the notebook server running on a compute node. Then you can connect a web-browser running on your local PC to the notebook server via the ssh tunnel.
The script we provide helps with the above: it submits the batch job, then instructs you how to set up the ssh tunnel and which web address to use.
Run the following command on the login node and then follow the instructions displayed in your login window:
# Add -g N to use N GPUs. For example:
cuda-torch-singularity-jnotebook -g 2
You must carefully read the information displayed in your login window!
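For orientation only, the ssh tunnel step typically looks something like the following. The actual compute node name, port number and web address are printed in your login window by the script and may differ from these placeholders:

# Run this on your LOCAL PC, not on the CSF. Replace the placeholders with
# the values printed in your login window when the notebook job starts.
ssh -L 8888:<compute-node>:8888 <username>@<csf-login-address>

# Then point a web browser on your local PC at the address given in your
# login window, typically of the form http://localhost:8888/?token=<token>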
Further info
Updates
None.