Nvidia H200 GPUs

General Information

The Nvidia H200 GPUs are a significant investment by several contributor projects and the Faculty of Science and Engineering (FSE).

The H200 resource is aimed at high-memory machine learning/AI applications and other highly GPU-optimised computations.

Before requesting H200 access, please consider whether our general access GPUs (L40S and A100) may be sufficient for your task as they are more numerous and may have similar performance depending on the application.

Details on who can apply for access (and how) are given below.

You must already have CSF3 access to apply to use the H200 GPUs. If you do not yet have a CSF3 account, request CSF3 access first and familiarise yourself with the CSF3 environment before requesting H200 access.

H200 hardware

4x GPU compute nodes

  • Dell PowerEdge XE9680 6U chassis
  • 8x NVIDIA H200 GPUs, each with 141 GB VRAM (HGX)
  • 2x Intel Xeon Platinum 8562Y+ 2.8 GHz, 32-core CPUs, HT disabled (Emerald Rapids)
  • 1.5 TB RAM
  • 7.5 TB NVMe storage (localscratch)
  • 8x 100 Gbit Ethernet ports

Additional scratch storage

  • Dell PowerStore 3200Q, plus expansion enclosure
  • 230 TB of allocatable storage
  • Low-level data deduplication for efficiency
  • 4x 100 Gbit Ethernet ports

Networking

  • Two 100GbE switches connecting the H200 nodes and storage to each other, and to the wider CSF3 network.

Access policy

Only members of H200 contributing projects, and staff or PGDR members of FSE, with an evidenced need for H200 GPUs, may use the H200 partitions.

Members of contributing projects must be approved by the project PI.

Members of FSE who are not part of a contributing project must go through a lightweight application process (see below).

There is no provision for general access by members of faculties other than FSE.

Undergraduates and postgraduate taught students (e.g. Masters) are not eligible for access.

Requesting Access

Provided you meet the access criteria (see above), please raise a Connect Portal request of type: ‘Request access to HPC/HTC’ and enter ‘H200’ in the ‘Service Required’ box.

  • If you are a member of an H200 contributing project, provide the name of the project in the “Additional information” field and ensure the project PI is named.
  • If you are a member of staff or a PGDR in FSE, but not part of a contributing project, please provide your PI/line manager/supervisor and state “Applying as a member of FSE” in the “Additional information” field. You will then be contacted via the ticket to initiate the lightweight application process.

Running H200 jobs

There are two Slurm partitions:

  • gpuH – for batch jobs
  • gpuH_short – for short-wallclock batch and interactive jobs

In either partition, each GPU can be requested with 1-8 CPU-cores (the default is 1 CPU-core per GPU if unspecified) and your job will be allocated 24 GB of host RAM per CPU-core. For example:

  • 1 x GPU job with 8 CPU-cores will have 141 GB VRAM (GPU mem) and 192 GB RAM (host mem)
  • 8 x GPU job with 64 CPU-cores will have 1.1 TB of VRAM (GPU mem) and 1.5 TB of RAM (host mem)
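
As a minimal sketch, the first example above (1 GPU with 8 CPU-cores, giving 192 GB of host RAM) corresponds to Slurm directives like the following; the account name is a placeholder and is explained in the next section:

#SBATCH -p gpuH           # H200 partition
#SBATCH -G 1              # 1 GPU, 141 GB VRAM
#SBATCH -n 1              # 1 task
#SBATCH -c 8              # 8 CPU-cores, so 8 x 24 GB = 192 GB host RAM
#SBATCH -A account-name   # placeholder account code (see below)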

Using the account code

In addition to specifying the partition, all H200 jobs require a Slurm account code.

If your access is approved, you will be sent an account code, which is used in your job scripts like:

#SBATCH --account the-account-name            # or use the -A short-form flag

If you work in more than one project involving H200, you may have more than one H200 account code.

It is your responsibility to use the correct account code for each job.
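
If you are unsure which account codes have been set up for you, Slurm's sacctmgr command can usually list your account associations (a sketch; the exact output and fields depend on the site configuration):

# List the Slurm accounts associated with your username
sacctmgr show associations user=$USER format=Account,Partition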

There may be financial consequences for the University if a funder audits a project and the accounting is incorrect, e.g. the project shows no H200 use because you have mistakenly been running its jobs under another account.

If you run batch jobs on the A100 or L40S GPU partitions, you should comment out or remove the H200 account code.

Job Priority and resource limits

The Slurm job priority (fairshare) is set in proportion to the investment made, but may be calibrated to maximise utilisation of the GPUs.

Default simultaneous use limits for both gpuH and gpuH_short partitions:

Account type          | Max H200 GPUs in use per user | Max running jobs per user | Max H200 GPUs per account
Contributing project  | 8                             | 4                         | 16
FSE general           | 1                             | 1                         | 16

Note 1: For contributing projects, the above limits mean it is not possible for one user to have eight single-GPU jobs running. The H200 hardware is aimed at multi-GPU work, and so the limits ensure more resource is made available to multi-GPU jobs. Hence, it is possible to have four 2-GPU jobs running or one 8-GPU job running.

Note 2: There are no limits on the number of jobs you can submit. The batch system will run your jobs within the above limits, but you can queue up as many jobs as you like.
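
To check your own jobs against these limits, a standard squeue query restricted to the H200 partitions can be used, for example:

# Show your running and pending jobs in the H200 partitions
squeue -u $USER -p gpuH,gpuH_short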

Software

All GPU-enabled software that is available on CSF3 is also available on the H200 nodes – please see the general GPU documentation for more detail.

If you manage your own software environment, e.g. with Conda, check that the versions of CUDA toolkit etc. you are using are built to take advantage of the H200 architecture.
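
The H200 is an NVIDIA Hopper-generation GPU (compute capability 9.0), so libraries should be built against a sufficiently recent CUDA toolkit (CUDA 12.x). A quick check from within an H200 session might look like the following sketch (the compute_cap query field requires a reasonably recent NVIDIA driver):

# Report the model and compute capability of the assigned GPU(s)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# Report the CUDA toolkit version currently on your PATH (from a loaded module or your Conda environment)
nvcc --version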

H200 additional scratch storage

This scratch storage location is provided to maximise file I/O for H200 jobs and is available in addition to the CSF3-wide scratch filesystem on ~/scratch.

The additional H200 storage is mounted on the H200 GPU nodes and login nodes at:

/mnt/h200-scratch/${USER}

and as a symbolic link in your home directory:

~/h200-scratch

Each user has a 1 TB default limit.

Any data stored on the H200 additional scratch storage should be considered at risk – it is not backed up, there is no recycle bin and it is not possible to recover deleted or corrupted files.

As such, the H200 additional scratch storage should not be used for long-term project storage.

If you need long-term, backed-up storage, please consider the RDS; all research projects with an R-code are entitled to some RDS storage without charge.

For very fast, but small, local-to-node storage, only available while your job is running, please use the directory given by the Slurm job environment variable $TMPDIR. This gives the name of a temporary directory, created by Slurm, on the NVMe storage in each compute node. Slurm will delete the directory at the end of your job.
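
A common pattern, sketched below for a batch job, is to stage input data into $TMPDIR at the start of the job and copy results back before it finishes (the dataset, application and results names are placeholders):

# Stage input data onto the node-local NVMe storage (placeholder paths)
cp -r ~/h200-scratch/my-dataset "$TMPDIR"
cd "$TMPDIR"

# Run from the fast local copy (placeholder application)
./my_application my-dataset

# Copy results back before the job ends - Slurm deletes $TMPDIR afterwards
cp -r results ~/h200-scratch/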

gpuH partition

This partition is for batch jobs only and has a 4-day wallclock limit.

Example sbatch for a multithreaded job (no MPI):

#!/bin/bash --login
##
#SBATCH -p gpuH           # H200 141GB GPUs
#SBATCH -G 1              # Number of GPUs
#SBATCH -n 1              # number of tasks
#SBATCH -c 8              # number of CPUs per task
#SBATCH -t 4-0            # Wallclock timelimit, 4 days (0 hours) in this example
##
#SBATCH -A account-name   # mandatory account name for H200
module purge
module load libs/cuda/12.8.1

## Example - print stats of assigned GPU(s)
nvidia-smi -q

Example sbatch for an MPI job; note the CPUs (cores) are specified as tasks:

#!/bin/bash --login
##
#SBATCH -p gpuH           # H200 141GB GPUs
#SBATCH -G 1              # Number of GPUs
#SBATCH -n 8              # number of tasks (MPI processes)
#SBATCH -c 1              # number of CPUs per task (cores per MPI process)
#SBATCH -t 4-0            # Wallclock timelimit, 4 days (0 hours) in this example
##
#SBATCH -A account-name   # mandatory account name for H200
module purge
module load libs/cuda/12.8.1

## Example - print stats of assigned GPU(s)
nvidia-smi -q
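
In a real MPI job you would launch the application itself in place of the nvidia-smi example above, typically with srun so that Slurm starts one process per requested task (the application name is a placeholder):

# Launch 8 MPI processes, one per requested task (placeholder application)
srun ./my_mpi_application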

Submit with: sbatch filename, where filename is your sbatch file.

To monitor your job, use squeue to confirm your jobid and connect to it with:

srun --jobid=1234567 --pty bash

substituting your specific jobid for 1234567.
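
Once connected in this way you can, for example, run nvidia-smi in the resulting shell to check the GPU utilisation and memory use of your running job:

# Inside the shell attached to your job
nvidia-smi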

gpuH_short partition

This partition is for interactive or batch jobs and has a 1-day wallclock limit.

Example interactive session:

srun -p gpuH_short -G 1 -c 8 -t 0-08 -A account-name --pty bash

The CPU request in srun has particular characteristics:

  • -n is the number of tasks (default if unspecified = 1)
  • -c is the number of CPUs per task (default if unspecified = 1)

srun launches the interactive session inside a single task; if you set -n 8 and leave -c at its default of 1, your application will only see 1 CPU. For further information on Slurm options please see the Slurm documentation.
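
As a quick sanity check inside an interactive session started as above, the following sketch confirms how many CPUs your processes can actually see:

# Number of CPUs Slurm allocated per task (set when -c is used)
echo $SLURM_CPUS_PER_TASK

# Number of CPUs visible to processes in this session (should match the above)
nproc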

Example sbatch file for a multithreaded job (not MPI):

#!/bin/bash --login
##
#SBATCH -p gpuH_short     # H200 141GB GPUs
#SBATCH -G 1              # Number of GPUs
#SBATCH -n 1              # number of tasks
#SBATCH -c 8              # number of CPUs per task
#SBATCH -t 1-0            # Wallclock timelimit, 1 day (0 hours) in this example
##
#SBATCH -A account-name   # mandatory account name for H200
module purge
module load libs/cuda/12.8.1

## Example - print stats of assigned GPU(s)
nvidia-smi -q

Example sbatch file for a parallel job with MPI:

#!/bin/bash --login
##
#SBATCH -p gpuH_short     # H200 141GB GPUs
#SBATCH -G 1              # Number of GPUs
#SBATCH -n 8              # number of tasks (MPI processes)
#SBATCH -c 1              # number of CPUs per task (cores per MPI process)
#SBATCH -t 1-0            # Wallclock timelimit, 1 day (0 hours) in this example
##
#SBATCH -A account-name   # mandatory account name for H200
module purge
module load libs/cuda/12.8.1

## Example - print stats of assigned GPU(s)
nvidia-smi -q

Submit with: sbatch filename
