PyTorch

Overview

PyTorch is an open-source Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.

Versions available are detailed below.

Note that the GPU version of PyTorch usually requires a specific version of the Nvidia CUDA libraries. Whilst installing PyTorch inside a conda environment (or with pip) will also install the required CUDA version, a newer version of CUDA may require us to install a new CUDA driver on the GPU nodes. This requires the GPU nodes to be removed from service temporarily, so we try not to do this too often, or when there is a lot of demand for GPUs. Hence we may not be able to install new versions of PyTorch until we can schedule the installation of new CUDA drivers.

Restrictions on use

There are no access restrictions on the CSF. All usage must adhere to the PyTorch License.

Set up procedure

To access the software you must first load one of the following modulefiles:

GPUs

# PyTorch 2.3.0 using Python 3.11 for GPUs: (uses CUDA 12.1, Anaconda3 2023.09)
# Works on v100 AND A100 GPUs
module load apps/binapps/pytorch/2.3.0-311-gpu-cu121

# PyTorch 1.11.0 using Python 3.9 for GPUs: (uses CUDA 11.3, Anaconda3 2021.11)
# Works on v100 AND A100 GPUs
module load apps/binapps/pytorch/1.11.0-39-gpu-cu113

# Python 3.9 for GPUs: (uses CUDA 11.2.0, Anaconda3 2021.11)
# Works on v100 GPUs but NOT A100 GPUs
module load apps/binapps/pytorch/1.11.0-39-gpu

# Python 3.9 for GPUs: (uses CUDA 11.2.0, Anaconda3 2021.11)
module load apps/binapps/pytorch/1.8.1-39-gpu
     ## 01.04.2022: The install we had of 1.8.2-39-gpu had an issue and has been removed.
     ## Apologies for any inconvenience caused.

# Python 3.7 for GPUs: (uses CUDA 10.1.168, Anaconda3 2019.07)
module load apps/binapps/pytorch/1.3.1-37-gpu

# Python 3.6 for GPUs: (uses CUDA 9.2.148, Anaconda3 5.2.0)
module load apps/binapps/pytorch/1.0.1-36-gpu
module load apps/binapps/pytorch/0.4.1-36-gpu

CPUs

# PyTorch 2.3.0 using Python 3.11 for CPUs: (uses Anaconda3 2023.09)
module load apps/binapps/pytorch/2.3.0-311-cpu

# Python 3.9 for CPUs: (uses Anaconda3 2021.11)
module load apps/binapps/pytorch/1.11.0-39-cpu

# Python 3.9 for CPUs: (uses Anaconda3 2021.11)
module load apps/binapps/pytorch/1.8.1-39-cpu
    ## 01.04.2022: The install we had of 1.8.2-39-cpu had an issue and has been removed.
    ## Apologies for any inconvenience caused.

# Python 3.7 for CPUs: (uses Anaconda3 2019.07)
module load apps/binapps/pytorch/1.3.1-37-cpu

# Python 3.6 for CPUs: (uses Anaconda3 5.2.0)
module load apps/binapps/pytorch/1.0.1-36-cpu
module load apps/binapps/pytorch/0.4.1-36-cpu

The above modulefiles will load any necessary dependency modulefiles for you. Note that you cannot run the GPU version of PyTorch on a CPU-only node (it must be run on a GPU node).
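
After loading a modulefile, it can be useful to confirm which PyTorch build Python actually sees, and which CUDA version that build expects. The following is a minimal sketch rather than part of the CSF installation: describe_pytorch_build is a hypothetical helper name, and it relies only on torch.__version__ and torch.version.cuda (the latter is None for CPU-only builds). If PyTorch cannot be found at all, which usually means no modulefile has been loaded, both values are reported as None.

```python
import importlib.util


def describe_pytorch_build():
    """Describe the PyTorch build visible to this Python interpreter.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when the package is not importable,
    # e.g. because no PyTorch modulefile has been loaded.
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda_build": None}

    import torch

    return {
        "torch": str(torch.__version__),
        # CUDA version the build was compiled against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
    }


print(describe_pytorch_build())
```

A CPU modulefile would be expected to report cuda_build as None, whereas a GPU modulefile should report the CUDA version named in its description above.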

Check GPU Detection

To check whether PyTorch can see the GPU:

# From the CSF login node:
module load apps/binapps/pytorch/1.11.0-39-gpu-cu113
qrsh -l v100 -V 'python -c "import torch; print(torch.cuda.is_available())"'
True

qrsh -l a100 -V 'python -c "import torch; print(torch.cuda.is_available())"'
True
  #
  # If you see 'False' here, check you've loaded an A100-compatible version of PyTorch. See above.
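
The one-line check above only reports True or False. A slightly fuller check, sketched below, lists the name of each GPU that PyTorch can see. visible_gpus is a hypothetical helper name; torch.cuda.device_count and torch.cuda.get_device_name are standard PyTorch calls. The script would need to be saved to a file and run on a GPU node (for example via qrsh as above) with a GPU modulefile loaded.

```python
def visible_gpus():
    """Return the names of the GPUs PyTorch can use (empty list if none).

    Hypothetical helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not importable: probably no modulefile is loaded.
        return []

    if not torch.cuda.is_available():
        # CPU-only build, CPU-only node, or an incompatible GPU/driver.
        return []

    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


names = visible_gpus()
print(f"{len(names)} GPU(s) visible")
for index, name in enumerate(names):
    print(f"  cuda:{index} -> {name}")
```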

Running the application on a GPU node

Please do not run PyTorch on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.

Example PyTorch GPU python script

Example PyTorch python scripts are available from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html. We reproduce one of them here. Please see the link for full details of what the code is actually doing!

Create the following PyTorch example script for use on a GPU node (e.g., my-gpu-script.py):

import torch

dtype = torch.float

# Run on the GPU
device = torch.device("cuda:0")

# Uncomment this to run on CPU
#device = torch.device("cpu") 

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Manually update weights using gradient descent.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

You can now run the above script interactively on a GPU node or in batch.

Interactive use on a GPU node

Once you have been granted access to the Nvidia v100 nodes, start an interactive session as follows:

qrsh -l v100=1 bash

# Wait until you are logged in to a backend compute node, then:
module load apps/binapps/pytorch/1.0.1-36-gpu

# Run the above script
python my-gpu-script.py

# Alternatively enter the above script in a python shell:
python
   # Enter each line of the script above - it will execute immediately
   import torch
   ...
   # When finished, exit python
   Ctrl-D

# When finished with your interactive session, return to the login node
exit

Batch usage on a GPU node

Once you have been granted access to the Nvidia v100 nodes, create a jobscript as follows:

#!/bin/bash --login
#$ -cwd                   # Run job from directory where submitted

# If running on a GPU, add:
#$ -l v100=1

#$ -pe smp.pe 8          # Number of cores on a single compute node. GPU jobs can
                         # use up to 8 cores per GPU.

# We now recommend loading the modulefile in the jobscript
module load apps/binapps/pytorch/1.0.1-36-gpu

# $NSLOTS is automatically set to the number of cores requested on the pe line.
# Inform some of the python libraries how many cores we can use.
export OMP_NUM_THREADS=$NSLOTS

python my-gpu-script.py

Submit the jobscript using

qsub jobscript

where jobscript is the name of your jobscript file (not your python script file!)

Running the application on a CPU node

Please do not run PyTorch on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.

Example PyTorch CPU python script

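Modify the above example script for use on a CPU node (e.g., save it as my-cpu-script.py): comment out the device = torch.device("cuda:0") line and uncomment the device = torch.device("cpu") line.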
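
Rather than keeping separate GPU and CPU copies of the script, the device could be chosen at run time. The sketch below is one possible approach, not part of the original tutorial code: pick_device_name is a hypothetical helper, and the commented usage lines assume the device variable from the example script.

```python
def pick_device_name(cuda_available, index=0):
    """Return a torch device string: a CUDA device if available, else the CPU.

    Hypothetical helper for illustration only.
    """
    return f"cuda:{index}" if cuda_available else "cpu"


# Usage inside the example script (requires a PyTorch modulefile to be loaded):
#   import torch
#   device = torch.device(pick_device_name(torch.cuda.is_available()))
#
# The rest of the script is unchanged, because every tensor is already
# created with device=device.
```

With this change the same file can be submitted to either a GPU node or a CPU-only node, provided a suitable modulefile is loaded in each case.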

Interactive use on a Backend CPU-only Node

To request an interactive session on a backend compute node run:

qrsh -l short

# Wait until you are logged in to a backend compute node, then:
module load apps/binapps/pytorch/1.0.1-36-cpu

# Run the above python script, e.g.:
python my-cpu-script.py

# Alternatively enter the above script in a python shell:
python
   # Enter each line of the script above - it will execute immediately
   import torch
   ...
   # When finished, exit python
   Ctrl-D

# When finished with your interactive session, return to the login node
exit

Batch usage on a CPU node

Create a jobscript as follows:

#!/bin/bash --login
#$ -cwd                   # Run job from directory where submitted
#$ -pe smp.pe 16          # Number of cores on a single compute node. Can be 2-32 for CPU jobs.
                          # Remove the -pe line completely to run a serial (1-core) job.

# We now recommend loading the modulefile in the jobscript
module load apps/binapps/pytorch/1.0.1-36-cpu

# $NSLOTS is automatically set to the number of cores requested on the pe line.
# Inform some of the python libraries how many cores we can use.
export OMP_NUM_THREADS=$NSLOTS

python my-cpu-script.py

Submit the jobscript using

qsub jobscript

where jobscript is the name of your jobscript file (not your python script file!)
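
The jobscript above passes the core count to libraries through OMP_NUM_THREADS. A Python script could additionally read $NSLOTS itself and hand the value to PyTorch. The following is a minimal sketch under that assumption: cores_available is a hypothetical helper, while torch.set_num_threads is a standard PyTorch call. Whether setting it explicitly helps will depend on your workload.

```python
import os


def cores_available(default=1):
    """Return the core count granted by the batch system via $NSLOTS.

    Falls back to `default` when NSLOTS is unset or not a positive integer,
    for example when the script is run outside a batch job.
    Hypothetical helper for illustration only.
    """
    try:
        slots = int(os.environ.get("NSLOTS", ""))
    except ValueError:
        return default
    return slots if slots > 0 else default


# Usage near the top of my-cpu-script.py (requires a PyTorch modulefile):
#   import torch
#   torch.set_num_threads(cores_available())
```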

Further info

Updates

None.

Last modified on August 19, 2024 at 5:14 pm by George Leaver