The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
Nvidia GPUs and CUDA
Overview – current GPGPUs
June 2017: There are a total of 3 Nvidia GPGPUs in production in the CSF.
We hope to purchase some more GPUs in late 2017/early 2018 – please get in touch (its-ri-team@manchester.ac.uk) if you would like to be involved in that procurement.
- Three K20s (two hosted in one 12-core compute node, one hosted in another 12-core compute node)
Retired GPGPUs
The Nvidia M2050s and M2070s have all been retired due to hardware faults. The retired hardware comprised:
- Seven blade servers, each hosting one Nvidia card: two M2070 cards and five M2050 cards.
- 16 Nvidia M2050 GPUs, two hosted on each of eight Intel compute nodes. The eight M2050 hosts were connected by Infiniband, so were ideal for computational jobs based on both MPI and CUDA.
Hardware and Software Versions
- Driver: 384.81
- CUDA Driver 9.0 / Runtime 9.0
- CUDA toolkit 9.0.176 (earlier versions also available via modulefiles)
- CUDA Capability Major/Minor version number: 3.5
- OpenCL Device 1.2 / OpenCL C 1.2
Restrictions on who can use these GPUs
Access to the GPGPUs is more restrictive than that for standard compute nodes. Please email its-ri-team@manchester.ac.uk before attempting to use these resources with brief details of what you wish to use them for.
The K20s are usually only accessible by a specific group from MACE.
Set up procedure
Once you have emailed its-ri-team@manchester.ac.uk and been granted access, set up your environment by loading the appropriate module from the following:
# Load one of the following modulefiles:
module load libs/cuda/9.0.176
module load libs/cuda/8.0.44
module load libs/cuda/7.5.18
module load libs/cuda/6.5.14

# These are very old versions
module load libs/cuda/5.5.22
module load libs/cuda/4.2.9
module load libs/cuda/4.1.28
module load libs/cuda/4.0.17
module load libs/cuda/3.2.16
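As a quick sanity check that the modulefile has set up your environment, the standard module and CUDA toolkit commands below can be run on the login node (shown purely as an illustration):

# Confirm which modulefiles are currently loaded
module list

# Confirm the nvcc compiler from the chosen toolkit is on your PATH
which nvcc
nvcc --version

# The modulefile also sets $CUDA_HOME, used in the compilation examples below
echo $CUDA_HOME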
Other Libraries
The Nvidia cuDNN libraries are also available via the following modulefiles. Before you load these modulefiles you must load one of the cuda modulefiles from above – the list below indicates which versions of cuda can be used with the different cuDNN versions:
module load libs/cuDNN/7.0.3     # Load cuda 8.0.44 or 9.0.176 first
module load libs/cuDNN/6.0.21    # Load cuda 7.5.18 or 8.0.44 first
module load libs/cuDNN/5.1.5     # Load cuda 7.5.18 or 8.0.44 first
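To check that the cuDNN environment is usable, a minimal test such as the following sketch could be compiled and then run on a GPU node via the batch system (the file name is illustrative; this assumes the modulefiles add the cuDNN header and library directories to the compiler search paths, otherwise add the appropriate -I and -L flags as for CUDA below):

// cudnn_version.cu - minimal sketch: print the cuDNN versions seen at compile and link time
#include <cudnn.h>
#include <cstdio>

int main( void )
{
    // cudnnGetVersion() reports the version of the cuDNN library actually linked
    printf( "cuDNN library version: %zu\n", (size_t) cudnnGetVersion() );
    printf( "cuDNN header version:  %d\n", CUDNN_VERSION );
    return 0;
}

A possible compile line would be: nvcc -o cudnn_version cudnn_version.cu -lcudnn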
Compiling GPU Code
The following sections describe how to compile CUDA and OpenCL code on CSF.
CUDA
CUDA code can be compiled on the login node provided you are using the CUDA runtime library, and not the CUDA driver library. The runtime library is used when you allow CUDA to automatically set up the device. That is, your CUDA code uses the style where you assume CUDA will be set up on the first CUDA function call. For example:
#include <cuda_runtime.h>

int main( void )
{
    // We assume CUDA will set up the GPU device automatically
    cudaMalloc( ... );
    cudaMemcpy( ... );
    myKernel<<<...>>>( ... );
    cudaMemcpy( ... );
    cudaFree( ... );

    return 0;
}
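For reference, a complete (if trivial) program written in this runtime-API style is sketched below. The vector-add kernel, array size and file name are purely illustrative, and error checking of the CUDA calls is omitted for brevity:

// vecadd.cu - a minimal, self-contained runtime-API sketch (illustrative only)
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void vecAdd( const float *a, const float *b, float *c, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n ) c[i] = a[i] + b[i];
}

int main( void )
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Host arrays
    float *h_a = (float*) malloc( bytes );
    float *h_b = (float*) malloc( bytes );
    float *h_c = (float*) malloc( bytes );
    for ( int i = 0; i < n; i++ ) { h_a[i] = (float) i; h_b[i] = 2.0f * i; }

    // The first runtime call below implicitly initialises the GPU device
    float *d_a, *d_b, *d_c;
    cudaMalloc( (void**) &d_a, bytes );
    cudaMalloc( (void**) &d_b, bytes );
    cudaMalloc( (void**) &d_c, bytes );

    cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice );
    cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice );

    // Launch enough 256-thread blocks to cover all n elements
    vecAdd<<< (n + 255) / 256, 256 >>>( d_a, d_b, d_c, n );

    cudaMemcpy( h_c, d_c, bytes, cudaMemcpyDeviceToHost );
    printf( "c[10] = %f (expected 30.0)\n", h_c[10] );

    cudaFree( d_a );  cudaFree( d_b );  cudaFree( d_c );
    free( h_a );  free( h_b );  free( h_c );
    return 0;
}

Because it only uses the runtime library, this can be compiled on the login node with the nvcc command given further down, but it must be run on a GPU node via the batch system.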
The CUDA driver library allows much more low-level control of the GPU device (and makes CUDA set up more like OpenCL). In that case you must compile on a GPU node because the CUDA driver library is only available on the backend GPU nodes. Driver code will contain something like the following:
#include <cuda.h>

int main( void )
{
    // Low-level device setup using the driver API
    cuDeviceGetCount( ... );
    cuDeviceGet( ... );
    cuDeviceGetName( ... );
    cuDeviceComputeCapability( ... );
    ...

    return 0;
}
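As a concrete (illustrative) sketch of this style, the following program initialises the driver API and reports the name and compute capability of device 0; error checking is again omitted for brevity. Remember it can only be compiled and run on a GPU node, linking against the driver library with -lcuda:

// drvquery.cu - a minimal driver-API sketch (illustrative only)
// Compile on a GPU node, e.g.:  nvcc -o drvquery drvquery.cu -lcuda
#include <cuda.h>
#include <cstdio>

int main( void )
{
    // Unlike the runtime API, the driver API must be initialised explicitly
    cuInit( 0 );

    int count = 0;
    cuDeviceGetCount( &count );
    printf( "Found %d CUDA device(s)\n", count );

    if ( count > 0 )
    {
        CUdevice dev;
        char name[256];
        int major = 0, minor = 0;

        cuDeviceGet( &dev, 0 );
        cuDeviceGetName( name, sizeof(name), dev );
        cuDeviceComputeCapability( &major, &minor, dev );
        printf( "Device 0: %s (compute capability %d.%d)\n", name, major, minor );
    }
    return 0;
}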
Wherever you compile your code, you cannot run your CUDA code on the login node because the login node does not contain any GPUs (see the next section for running your code).
The CUDA libraries and header files are available in the following directories once you have loaded the CUDA module:
# All nodes
$CUDA_HOME/lib64      # CUDA runtime library, CUBlas, CURand etc
$CUDA_HOME/include

# On a GPU node only
/usr/lib64            # CUDA driver library
It is beyond the scope of this page to give a tutorial on CUDA compilation (there are many possible flags for the nvcc compiler). The CUDA GPU Programming SDK, available on CSF in $CUDA_SDK, gives many examples of CUDA programs and how to compile them. However, a simple compile line to run on the command line would be as follows
nvcc -o myapp myapp.cu -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart
To use the above line in a Makefile, enclose the variable names in brackets as follows
# Simple CUDA Makefile
CC = nvcc

all: myapp

myapp: myapp.cu
	$(CC) -o myapp myapp.cu -I$(CUDA_HOME)/include -L$(CUDA_HOME)/lib64 -lcudart
# note: the preceding line must start with a TAB, not 8 spaces. 'make' requires a TAB!
The above two compilation methods use the CUDA runtime library (libcudart) and so can be used to compile on the login node.
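The K20s listed above have CUDA compute capability 3.5, so you may also wish to tell nvcc which architecture to target. The lines below are an illustrative sketch using the standard -arch option; the cuBLAS example and the file name myblasapp.cu are hypothetical and only relevant if your code calls cuBLAS:

# Optionally target the K20's compute capability 3.5
nvcc -o myapp myapp.cu -arch=sm_35 -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart

# If the code also calls the cuBLAS library (provided in $CUDA_HOME/lib64), link it too
nvcc -o myblasapp myblasapp.cu -arch=sm_35 -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcublas -lcudart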
OpenCL
Please see OpenCL programming on CSF for compiling OpenCL code.
Running the application
All work on the Nvidia GPUs must be run via the batch system. There are two types of environment which can be used. The first is batch, for non-interactive computational work; this should be used where possible. The second is an interactive environment for debugging and other necessarily-interactive work.
Resource Limits
K20 GPUs
Maximum job runtime is 14 days. Currently most users are restricted to one job running at any one time. This is due to the small number of GPUs available and the high demand for those GPUs.
Example Job Submission Scripts and Commands
As stated above, all jobs must be submitted to the batch system, whether for non-interactive (possibly long) computational runs or for short interactive runs. Jobs should be submitted to the batch system ensuring that the appropriate GPU resources are requested. Examples of jobscripts and commands to access the GPU resources are given below. In all cases ensure you have the appropriate module loaded (see above).
Serial batch job submission to K20 GPUs
Ensure you have the appropriate CUDA module loaded (see above), then use the following jobscript (note the use of the nvidia_k20 resource)
#!/bin/bash
#$ -cwd
#$ -V
#$ -l nvidia_k20

./my_gpu_prog arg1 arg2
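If you prefer not to rely on -V to export your login environment to the job, a variant that loads the CUDA modulefile inside the jobscript might look like the following sketch (this assumes the module command is available in batch jobs; the version shown is just an example):

#!/bin/bash
#$ -cwd
#$ -l nvidia_k20

# Load the CUDA environment within the job itself (choose the version you need)
module load libs/cuda/9.0.176

./my_gpu_prog arg1 arg2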
Submit the job in the usual way
qsub gpujob.sh
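Once submitted, the job can be monitored with the standard batch-system command qstat. By default the job's standard output and error are written to files in the submission directory named after the jobscript and job ID, for example (the job ID here is illustrative):

qstat

# Output files written in the submission directory:
#   gpujob.sh.o123456   (standard output)
#   gpujob.sh.e123456   (standard error)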
Interactive use of the K20 GPUs with X11
If you are familiar with the use of X11 (X-Windows), load the appropriate environment module, then enter
qrsh -cwd -V -l inter -l nvidia_k20 xterm
Within the xterm, for example
./my_gpu_prog
CUDA and OpenCL SDK Examples (e.g., deviceQuery)
The CUDA SDK contains many example CUDA and OpenCL programs which can be compiled and run. A useful one is deviceQuery (and oclDeviceQuery), which gives you lots of information about the Nvidia GPU hardware.
Version 5.5.22 and later
In CUDA 5.5 and up there is no separate SDK installation directory. Instead the CUDA toolkit (which provides the nvcc compiler, profiler and numerical libraries) also contains a Samples directory. The examples have already been compiled, but you may also take a copy of the samples so that you can modify them. You can access the samples by loading the CUDA modulefile and then going into the directory:
cd $CUDA_SAMPLES
The compiled samples are available using
cd $CUDA_SAMPLES/bin/x86_64/linux/release/
As always, running the samples on the login node won’t work – there’s no GPU there!
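For example, to run the pre-built deviceQuery sample on a K20 node via the batch system, a jobscript along the lines of the earlier example could be used (this simply combines the jobscript shown above with the samples path given here):

#!/bin/bash
#$ -cwd
#$ -V
#$ -l nvidia_k20

# Run the pre-built deviceQuery sample from the CUDA toolkit's samples directory
$CUDA_SAMPLES/bin/x86_64/linux/release/deviceQuery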
Version 4.2.9 and earlier
In CUDA 4.2.9 the CUDA SDK provides the sample files and is separate from the CUDA toolkit (which provides the nvcc compiler, profiler and numerical libraries). You'll need to copy the entire SDK to your home (or scratch) area. Compile the SDK on a GPU node, not the login node, because some of the examples use the CUDA driver library (e.g., see $CUDA_SDK/C/src/vectorAddDrv/) and the OpenCL examples can only be compiled on a GPU node. For example:
# First start an interactive session on a GPU node
qrsh -l inter -l nvidia

# Once the interactive session starts:
module load libs/cuda/4.2.9
export CUDA_INSTALL_PATH=$CUDA_HOME     # Needs adding to the modulefile?
mkdir ~/cuda-sdk
cd ~/cuda-sdk
cp -r $CUDA_SDK .                       # notice the '.' at the end of this command!
cd 4.2.9
make -k

# Run one of the examples (deviceQuery) while still on the GPU node
./C/bin/linux/release/deviceQuery
./OpenCL/bin/linux/release/oclDeviceQuery

# End your interactive session
exit

# You are now back on the login node
The CUDA and OpenCL example programs are just like any other GPU code so please see the instructions earlier on running code either in batch or interactively on a GPU node.
Further info
Applications and compilers which can use the Nvidia GPUs are being installed on the CSF. Links to the appropriate documentation will be provided here and will include: