Besso

Specification

Besso comprises one server hosting two Nvidia GPUs. The host contains:

  • Two six-core Intel Sandy Bridge CPUs
  • 190 GB RAM

Nvidia GPUs:

  • Two x Tesla K40c, each with 12 GB GPU memory, 2880 CUDA cores and CUDA compute capability 3.5

For full specifications please run deviceQuery after logging in to a k40 node (see below for how to do this correctly) using the commands:

module load libs/cuda            # Load most recent CUDA modulefile (other versions available)
deviceQuery

The CUDA driver is v390.46.

To assist with running Amber on the GPUs, both GPUs have been configured to provide:

  • Persistence (nvidia-smi -pm 1)
  • Compute Exclusive Mode (nvidia-smi -c 3)
  • Boosted clocks (nvidia-smi -ac 2600,758)

These settings are applied at boot (see /etc/rc.local on besso).

Getting Access to Besso

To gain access, please email the ITS RI team at its-ri-team@manchester.ac.uk.

Restrictions on Access

Priority is given to those who funded the system, but other University of Manchester academics and computational researchers may gain access for evaluation and pump-priming purposes.

Accessing the Host Node

For interactive use

From the Zrek login node, use qrsh to log in to the k40 node. This will give you a command line on besso (the k40 node), from which you can run GUI apps or non-GUI compute apps:

  • To reserve one (of the two) k40 GPUs in one of the hosts:
    qrsh -l k40 bash
  • To reserve both k40 GPUs in one of the hosts:
    qrsh -l k40duo bash

Reminder: run the above commands on the Zrek login node! No password will be required when connecting from the Zrek login node.

Once you have logged in to the k40 node, load any modulefiles (see below) required for your applications.
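
As a sketch, a typical interactive session might look like this (the two checks at the end are described in more detail under "Using the GPUs" below):

# On the Zrek login node: reserve one k40 GPU and start a shell on besso
qrsh -l k40 bash

# Now on besso: load the CUDA modulefile and confirm which GPU you have been given
module load libs/cuda
echo $CUDA_VISIBLE_DEVICES      # prints 0 or 1
deviceQuery | grep ^De          # should report one Tesla K40c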

You can also open more terminals (command-line windows) on that node by running:

xterm &

For traditional batch jobs

From the Zrek login node, batch jobs (non-interactive) can be submitted using qsub jobscript. The jobscript should contain the following line to run on a single GPU:

#$ -l k40

or, to run on both GPUs in the same host:

#$ -l k40duo
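
For example, a minimal jobscript might look like the following sketch (the -cwd option and the final application line are illustrative placeholders; adapt them to your own work):

#!/bin/bash
#$ -cwd                   # Run from the directory the job was submitted from (illustrative option)
#$ -l k40                 # Request a single k40 GPU (use k40duo for both GPUs in the host)

module load libs/cuda     # Load any modulefiles your application needs

./my_gpu_app              # Hypothetical application; replace with your own command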

Once you have submitted the batch job you can even log out of Zrek; the job will remain in the system and will run when a suitable GPU node becomes free.

Using the GPUs

Once you have been allocated a GPU by either qrsh or qsub you will have exclusive access to that GPU. The environment variable

CUDA_VISIBLE_DEVICES

will be set to either 0 or 1, or 0,1 to indicate which GPU(s) in the host you have access to.

Most CUDA-capable applications will use this variable to determine which GPU to use at runtime (e.g., pmemd.cuda and MATLAB will honour this setting). You should NOT assume you have been given the first GPU in the system – another user may be using that. Hence if you have the option of specifying a fixed GPU id in your software you should generally not do so – let the CUDA library use the above environment variable instead.

You can determine which GPU you have been allocated as follows:

echo $CUDA_VISIBLE_DEVICES

It will show 0, 1 or 0,1. If you don’t see any output then you have logged in to the node incorrectly. Log out (use exit) and log back in again using the method above.

Note: if you want to open more terminals (command-line windows) to run other programs on the node, simply run

xterm &

to get a new window.

The CUDA toolkit contains an application named deviceQuery to report device properties. It will display properties for the device(s) you have access to. For example:

module load libs/cuda

# Report what we have been allocated (in this example we requested  'k40duo'):
echo $CUDA_VISIBLE_DEVICES
0,1
deviceQuery | grep ^De
  
   Detected 2 CUDA Capable device(s)
   Device 0: "Tesla K40c"
   Device 1: "Tesla K40c"
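
If instead you reserved a single k40 GPU and happened to be allocated the second physical GPU, deviceQuery still reports it as device 0: CUDA renumbers the visible devices from zero, which is why you should not hard-code a GPU id in your software. An illustrative session (your allocation may differ):

echo $CUDA_VISIBLE_DEVICES
1
deviceQuery | grep ^De

   Detected 1 CUDA Capable device(s)
   Device 0: "Tesla K40c"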

It is possible to check whether a GPU is free by running:

nvidia-smi

This will show stats about the two GPUs. For example, the following output shows that GPU 0 is busy running pmemd.cuda:

[mxyzabc1@besso ~]$ nvidia-smi

+------------------------------------------------------+                       
| NVIDIA-SMI 352.39     Driver Version: 352.39         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 0000:02:00.0     Off |                    0 |
| 23%   37C    P8    20W / 235W |     37MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          On   | 0000:03:00.0     Off |                    0 |
| 23%   32C    P8    21W / 235W |     23MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2505    G   pmemd.cuda                                     630MiB |
+-----------------------------------------------------------------------------+

The GPUs are set up to run in compute-exclusive mode, so if you try to use a GPU that is already in use your application will fail.

Last modified on May 1, 2018 at 8:35 am by George Leaver