The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
Nvidia GPUs and CUDA
Overview – current GPGPUs
June 2017: There are a total of 3 Nvidia GPGPUs in production in the CSF.
We hope to purchase some more GPUs in late 2017/early 2018 – please get in touch (its-ri-team@manchester.ac.uk) if you would like to be involved in that procurement.
- Three K20s (two hosted in one 12-core compute node, one hosted in another 12-core compute node)
Retired GPGPUs
The Nvidia M2050s and M2070s have all been retired due to hardware faults. The retired hardware comprised:
- Seven blade servers, each hosting one Nvidia card: two M2070 cards and five M2050 cards.
- 16 Nvidia M2050 GPUs, two hosted on each of eight Intel compute nodes. The eight M2050 hosts were connected by Infiniband, so were ideal for computational jobs based on both MPI and CUDA.
Hardware and Software Versions
- Driver: 384.81
- CUDA Driver 9.0 / Runtime 9.0
- CUDA toolkit 9.0.176 (earlier versions also available via modulefiles)
- CUDA Capability Major/Minor version number: 3.5
- OpenCL Device 1.2 / OpenCL C 1.2
Restrictions on who can use these GPUs
Access to the GPGPUs is more restrictive than that for standard compute nodes. Please email its-ri-team@manchester.ac.uk before attempting to use these resources with brief details of what you wish to use them for.
The K20s are usually only accessible by a specific group from MACE.
Set up procedure
Once you have emailed its-ri-team@manchester.ac.uk and been granted access, set up your environment by loading the appropriate module from the following:
# Load one of the following modulefiles:
module load libs/cuda/9.0.176
module load libs/cuda/8.0.44
module load libs/cuda/7.5.18
module load libs/cuda/6.5.14

# These are very old versions
module load libs/cuda/5.5.22
module load libs/cuda/4.2.9
module load libs/cuda/4.1.28
module load libs/cuda/4.0.17
module load libs/cuda/3.2.16
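As a quick sanity check that the modulefile has set up your environment, the standard module and CUDA toolkit commands below can be run on the login node (shown purely as an illustration):

# Confirm which modulefiles are currently loaded
module list

# Confirm the nvcc compiler from the chosen toolkit is on your PATH
which nvcc
nvcc --version

# The modulefile also sets $CUDA_HOME, used in the compilation examples below
echo $CUDA_HOME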
Other Libraries
The Nvidia cuDNN libraries are also available via the following modulefiles. Before you load these modulefiles you must load one of the cuda modulefiles from above – the list below indicates which versions of cuda can be used with the different cuDNN versions:
module load libs/cuDNN/7.0.3     # Load cuda 8.0.44 or 9.0.176 first
module load libs/cuDNN/6.0.21    # Load cuda 7.5.18 or 8.0.44 first
module load libs/cuDNN/5.1.5     # Load cuda 7.5.18 or 8.0.44 first
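To check that the cuDNN environment is usable, a minimal test such as the following sketch could be compiled and then run on a GPU node via the batch system (the file name is illustrative; this assumes the modulefiles add the cuDNN header and library directories to the compiler search paths, otherwise add the appropriate -I and -L flags as for CUDA below):

// cudnn_version.cu - minimal sketch: print the cuDNN versions seen at compile and link time
#include <cudnn.h>
#include <cstdio>

int main( void )
{
    // cudnnGetVersion() reports the version of the cuDNN library actually linked
    printf( "cuDNN library version: %zu\n", (size_t) cudnnGetVersion() );
    printf( "cuDNN header version:  %d\n", CUDNN_VERSION );
    return 0;
}

A possible compile line would be: nvcc -o cudnn_version cudnn_version.cu -lcudnn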
Compiling GPU Code
The following sections describe how to compile CUDA and OpenCL code on CSF.
CUDA
CUDA code can be compiled on the login node provided you are using the CUDA runtime library, and not the CUDA driver library. The runtime library is used when you allow CUDA to automatically set up the device. That is, your CUDA code uses the style where you assume CUDA will be set up on the first CUDA function call. For example:
#include <cuda_runtime.h>

int main( void )
{
    // We assume CUDA will set up the GPU device automatically
    cudaMalloc( ... );
    cudaMemcpy( ... );
    myKernel<<<...>>>( ... );
    cudaMemcpy( ... );
    cudaFree( ... );

    return 0;
}
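For reference, a complete (if trivial) program written in this runtime-API style is sketched below. The vector-add kernel, array size and file name are purely illustrative, and error checking of the CUDA calls is omitted for brevity:

// vecadd.cu - a minimal, self-contained runtime-API sketch (illustrative only)
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void vecAdd( const float *a, const float *b, float *c, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n ) c[i] = a[i] + b[i];
}

int main( void )
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Host arrays
    float *h_a = (float*) malloc( bytes );
    float *h_b = (float*) malloc( bytes );
    float *h_c = (float*) malloc( bytes );
    for ( int i = 0; i < n; i++ ) { h_a[i] = (float) i; h_b[i] = 2.0f * i; }

    // The first runtime call below implicitly initialises the GPU device
    float *d_a, *d_b, *d_c;
    cudaMalloc( (void**) &d_a, bytes );
    cudaMalloc( (void**) &d_b, bytes );
    cudaMalloc( (void**) &d_c, bytes );

    cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice );
    cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice );

    // Launch enough 256-thread blocks to cover all n elements
    vecAdd<<< (n + 255) / 256, 256 >>>( d_a, d_b, d_c, n );

    cudaMemcpy( h_c, d_c, bytes, cudaMemcpyDeviceToHost );
    printf( "c[10] = %f (expected 30.0)\n", h_c[10] );

    cudaFree( d_a );  cudaFree( d_b );  cudaFree( d_c );
    free( h_a );  free( h_b );  free( h_c );
    return 0;
}

Because it only uses the runtime library, this can be compiled on the login node with the nvcc command given further down, but it must be run on a GPU node via the batch system.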
The CUDA driver library allows much more low-level control of the GPU device (and makes CUDA set up more like OpenCL). In that case you must compile on a GPU node because the CUDA driver library is only available on the backend GPU nodes. Driver code will contain something like the following:
#include <cuda.h>

int main( void )
{
    // Low-level device setup using the driver API
    cuDeviceGetCount( ... );
    cuDeviceGet( ... );
    cuDeviceGetName( ... );
    cuDeviceComputeCapability( ... );
    ...

    return 0;
}
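As a concrete (illustrative) sketch of this style, the following program initialises the driver API and reports the name and compute capability of device 0; error checking is again omitted for brevity. Remember it can only be compiled and run on a GPU node, linking against the driver library with -lcuda:

// drvquery.cu - a minimal driver-API sketch (illustrative only)
// Compile on a GPU node, e.g.:  nvcc -o drvquery drvquery.cu -lcuda
#include <cuda.h>
#include <cstdio>

int main( void )
{
    // Unlike the runtime API, the driver API must be initialised explicitly
    cuInit( 0 );

    int count = 0;
    cuDeviceGetCount( &count );
    printf( "Found %d CUDA device(s)\n", count );

    if ( count > 0 )
    {
        CUdevice dev;
        char name[256];
        int major = 0, minor = 0;

        cuDeviceGet( &dev, 0 );
        cuDeviceGetName( name, sizeof(name), dev );
        cuDeviceComputeCapability( &major, &minor, dev );
        printf( "Device 0: %s (compute capability %d.%d)\n", name, major, minor );
    }
    return 0;
}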
Wherever you compile your code, you cannot run your CUDA code on the login node because the login node does not contain any GPUs (see the next section for running your code).
The CUDA libraries and header files are available in the following directories once you have loaded the CUDA module:
# All nodes
$CUDA_HOME/lib64      # CUDA runtime library, CUBlas, CURand etc
$CUDA_HOME/include

# On a GPU node only
/usr/lib64            # CUDA driver library
It is beyond the scope of this page to give a tutorial on CUDA compilation (there are many possible flags for the nvcc compiler). The CUDA GPU Programming SDK, available on CSF in $CUDA_SDK, gives many examples of CUDA programs and how to compile them. However, a simple compile line to run on the command line would be as follows
nvcc -o myapp myapp.cu -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart
To use the above line in a Makefile, enclose the variable names in brackets as follows
# Simple CUDA Makefile
CC = nvcc

all: myapp

myapp: myapp.cu
	$(CC) -o myapp myapp.cu -I$(CUDA_HOME)/include -L$(CUDA_HOME)/lib64 -lcudart
# note: the preceding line must start with a TAB, not 8 spaces. 'make' requires a TAB!
The above two compilation methods use the CUDA runtime library (libcudart) and so can be used to compile on the login node.
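The K20s listed above have CUDA compute capability 3.5, so you may also wish to tell nvcc which architecture to target. The lines below are an illustrative sketch using the standard -arch option; the cuBLAS example and the file name myblasapp.cu are hypothetical and only relevant if your code calls cuBLAS:

# Optionally target the K20's compute capability 3.5
nvcc -o myapp myapp.cu -arch=sm_35 -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart

# If the code also calls the cuBLAS library (provided in $CUDA_HOME/lib64), link it too
nvcc -o myblasapp myblasapp.cu -arch=sm_35 -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcublas -lcudart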
OpenCL
Please see OpenCL programming on CSF for compiling OpenCL code.
Running the application
All work on the Nvidia GPUs must be run via the batch system. There are two types of environment which can be used. The first is batch, for non-interactive computational work; this should be used where possible. The second is an interactive environment for debugging and other necessarily-interactive work.
Resource Limits
K20 GPUs
Maximum job runtime is 14 days. Currently most users are restricted to one job running at any one time. This is due to the small number of GPUs available and the high demand for those GPUs.
Example Job Submission Scripts and Commands
As stated above, all jobs must be submitted to the batch system, whether for non-interactive (possibly long) computational runs or for short interactive runs. Jobs should be submitted to the batch system ensuring that the appropriate GPU resources are requested. Examples of jobscripts and commands to access the GPU resources are given below. In all cases ensure you have the appropriate module loaded (see above).
Serial batch job submission to K20 GPUs
Ensure you have the appropriate CUDA module loaded (see above), then use the following jobscript (note the use of the nvidia_k20 resource)
#!/bin/bash
#$ -cwd
#$ -V
#$ -l nvidia_k20

./my_gpu_prog arg1 arg2
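If you prefer not to rely on -V to export your login environment to the job, a variant that loads the CUDA modulefile inside the jobscript might look like the following sketch (this assumes the module command is available in batch jobs; the version shown is just an example):

#!/bin/bash
#$ -cwd
#$ -l nvidia_k20

# Load the CUDA environment within the job itself (choose the version you need)
module load libs/cuda/9.0.176

./my_gpu_prog arg1 arg2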
Submit the job in the usual way
qsub gpujob.sh
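Once submitted, the job can be monitored with the standard batch-system command qstat. By default the job's standard output and error are written to files in the submission directory named after the jobscript and job ID, for example (the job ID here is illustrative):

qstat

# Output files written in the submission directory:
#   gpujob.sh.o123456   (standard output)
#   gpujob.sh.e123456   (standard error)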
Interactive use of the K20 GPUs with X11
If you are familiar with the use of X11 (X-Windows), load the appropriate environment module, then enter
qrsh -cwd -V -l inter -l nvidia_k20 xterm
Within the xterm, for example
./my_gpu_prog
CUDA and OpenCL SDK Examples (e.g., deviceQuery)
The CUDA SDK contains many example CUDA and OpenCL programs which can be compiled and run. A useful one is deviceQuery (and oclDeviceQuery), which gives you lots of information about the Nvidia GPU hardware.
Version 5.5.22 and later
In CUDA 5.5 and up there is no separate SDK installation directory. Instead the CUDA toolkit (which provides the nvcc compiler, profiler and numerical libraries) also contains a Samples directory. The examples have already been compiled, but you may also take a copy of the samples so that you can modify them. You can access the samples by loading the CUDA modulefile and then going into the directory:
cd $CUDA_SAMPLES
The compiled samples are available using
cd $CUDA_SAMPLES/bin/x86_64/linux/release/
As always, running the samples on the login node won’t work – there’s no GPU there!
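For example, to run the pre-built deviceQuery sample on a K20 node via the batch system, a jobscript along the lines of the earlier example could be used (this simply combines the jobscript shown above with the samples path given here):

#!/bin/bash
#$ -cwd
#$ -V
#$ -l nvidia_k20

# Run the pre-built deviceQuery sample from the CUDA toolkit's samples directory
$CUDA_SAMPLES/bin/x86_64/linux/release/deviceQuery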
Version 4.2.9 and earlier
In CUDA 4.2.9 the CUDA SDK provides the sample files and is separate from the CUDA toolkit (which provides the nvcc compiler, profiler and numerical libraries). You'll need to copy the entire SDK to your home (or scratch) area. Compile the SDK on a GPU node, not the login node, because some of the examples use the CUDA driver library (e.g., see $CUDA_SDK/C/src/vectorAddDrv/) and the OpenCL examples can only be compiled on a GPU node. For example:
# First start an interactive session on a GPU node
qrsh -l inter -l nvidia

# Once the interactive session starts:
module load libs/cuda/4.2.9
export CUDA_INSTALL_PATH=$CUDA_HOME     # Needs adding to the modulefile?
mkdir ~/cuda-sdk
cd ~/cuda-sdk
cp -r $CUDA_SDK .                       # notice the '.' at the end of this command!
cd 4.2.9
make -k

# Run one of the examples (deviceQuery) while still on the GPU node
./C/bin/linux/release/deviceQuery
./OpenCL/bin/linux/release/oclDeviceQuery

# End your interactive session
exit

# You are now back on the login node
The CUDA and OpenCL example programs are just like any other GPU code so please see the instructions earlier on running code either in batch or interactively on a GPU node.
Further info
Applications and compilers which can use the Nvidia GPUs are being installed on the CSF. Links to the appropriate documentation will be provided here and will include: