Besso
Specification
Besso comprises one server hosting two Nvidia GPUs. The host:
- Two six-core Intel Sandy Bridge CPUs
- 190 GB RAM
Nvidia GPUs:
- 2 x Tesla K40c, each with 12 GB GPU memory, 2880 CUDA cores, CUDA compute capability 3.5
For full specifications please run deviceQuery after logging in to a k40 node (see below for how to do this correctly) using the commands:
module load libs/cuda   # Load most recent CUDA modulefile (other versions available)
deviceQuery
The CUDA driver is v390.46.
To assist with running Amber on the GPUs, both GPUs have been configured to provide:
- Persistence (nvidia-smi -pm 1)
- Compute Exclusive Mode (nvidia-smi -c 3)
- Boosted clocks (nvidia-smi -ac 2600,758)
These settings are applied at boot (see /etc/rc.local on besso).
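For reference, the boot-time configuration amounts to commands along the following lines (a sketch only; the exact contents of /etc/rc.local on besso may differ):
nvidia-smi -pm 1          # enable persistence mode
nvidia-smi -c 3           # set compute-exclusive (EXCLUSIVE_PROCESS) mode
nvidia-smi -ac 2600,758   # boost application clocks (memory MHz, graphics MHz)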
Getting Access to Besso
To gain access, please email the ITS RI team at its-ri-team@manchester.ac.uk.
Restrictions on Access
Priority is given to those who funded the system, but other University of Manchester academics and computational researchers may gain access for evaluation and pump-priming purposes.
Accessing the Host Node
For interactive use
From the Zrek login node, use qrsh to log in to the k40 node. This will give you a command line on besso (the k40 node), where you can then run GUI apps or non-GUI compute apps:
- To reserve one (of the two) k40 GPUs in the host:
qrsh -l k40 bash
- To reserve both k40 GPUs in the host:
qrsh -l k40duo bash
Reminder: run the above commands on the Zrek login node! No password will be required when connecting from the Zrek login node.
Once you have logged in to the k40 node you should load any modulefiles (see below) required for your applications.
You can also open more terminals (command-line windows) on that node by running:
xterm &
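Putting this together, a typical interactive session might look like the following sketch (the modulefile shown is the CUDA one from above; load whatever your own application needs):
# On the Zrek login node:
qrsh -l k40 bash

# Now on besso (the k40 node):
module load libs/cuda
echo $CUDA_VISIBLE_DEVICES   # shows which GPU you have been allocated, e.g. 0
deviceQuery                  # optional: report the properties of that GPU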
For traditional batch jobs
From the Zrek login node, batch jobs (non-interactive) can be submitted using qsub jobscript, and the jobscript should contain the following line to run on a single GPU:
#$ -l k40
or, to run on both GPUs in the same host:
#$ -l k40duo
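For reference, a complete minimal jobscript might look like the following sketch (the -cwd option, modulefile and application line are illustrative placeholders; substitute your own):
#!/bin/bash
#$ -cwd                 # run the job from the directory it was submitted from (illustrative)
#$ -l k40               # request one K40 GPU (use k40duo to request both)

module load libs/cuda   # load whatever modulefiles your application needs
./my_gpu_app            # placeholder: replace with your own GPU application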
Once you have submitted the batch job you can even log out of Zrek; the job remains in the system and Zrek will run it when a suitable GPU node becomes free.
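For example, assuming the jobscript sketch above were saved as myjob.sh (a hypothetical name) and that the standard Grid Engine tools are available on the Zrek login node, you could submit and monitor it with:
qsub myjob.sh   # submit from the Zrek login node
qstat           # check whether the job is queued or running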
Using the GPUs
Once you have been allocated a GPU by either qrsh or qsub you will have exclusive access to that GPU. The environment variable CUDA_VISIBLE_DEVICES will be set to 0, 1 or 0,1 to indicate which GPU(s) in the host you have access to.
Most CUDA-capable applications will use this variable to determine which GPU to use at runtime (e.g., pmemd.cuda and MATLAB will honour this setting). You should NOT assume you have been given the first GPU in the system – another user may be using that. Hence if you have the option of specifying a fixed GPU id in your software you should generally not do so – let the CUDA library use the above environment variable instead.
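As a short illustration, suppose you were allocated only the second physical GPU (so CUDA_VISIBLE_DEVICES is set to 1):
deviceQuery | grep ^Device
Device 0: "Tesla K40c"
# Although the physical GPU is number 1, your application enumerates it as
# device 0, so letting the CUDA runtime pick the default device is always safe.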
You can determine which GPU you have been allocated as follows:
echo $CUDA_VISIBLE_DEVICES
It will show 0, 1 or 0,1. If you don’t see any output then you have logged in to the node incorrectly. Log out (use exit) and log back in again using the method above.
Note: if you want to open more terminals (command-line windows) to run other programs on the node, simply run
xterm &
to get a new window.
The CUDA toolkit contains an application named deviceQuery to report device properties. It will display properties for the device(s) you have access to. For example:
module load libs/cuda

# Report what we have been allocated (in this example we requested 'k40duo'):
echo $CUDA_VISIBLE_DEVICES
0,1

deviceQuery | grep ^De
Detected 2 CUDA Capable device(s)
Device 0: "Tesla K40c"
Device 1: "Tesla K40c"
It is possible to check whether a GPU is free by running:
nvidia-smi
This will show stats about the two GPUs. For example, the following output shows that GPU 0 is busy running pmemd.cuda:
[mxyzabc1@besso ~]$ nvidia-smi
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 0000:02:00.0     Off |                    0 |
| 23%   37C    P8    20W / 235W |     37MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          On   | 0000:03:00.0     Off |                    0 |
| 23%   32C    P8    21W / 235W |     23MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2505     G   pmemd.cuda                                    630MiB |
+-----------------------------------------------------------------------------+
The GPUs are set up to run in compute-exclusive mode, so if you try to use a GPU that is already in use your application will fail.