PyTorch
Overview
PyTorch is an open source Python package that provides two high-level features: Tensor computation (like NumPy) with strong GPU acceleration and Deep neural networks built on a tape-based autograd system.
Versions available are detailed below.
Note that the GPU version of PyTorch usually requires a specific version of the Nvidia CUDA libraries. Whilst installing PyTorch inside a conda environment (or with pip) will also install the required CUDA version, a newer version of CUDA may require us to install a new CUDA driver on the GPU nodes. This requires the GPU nodes to be removed from service temporarily, so we try not to do it too often, or when there is a lot of demand for GPUs. Hence we may not be able to install new versions of PyTorch until we can schedule the installation of new CUDA drivers.
Restrictions on use
There are no access restrictions on the CSF. All usage must adhere to the PyTorch License.
Set up procedure
To access the software you must first load one of the following modulefiles:
GPUs
# PyTorch 2.3.0 using Python 3.11 for GPUs: (uses CUDA 12.1, Anaconda3 2023.09)
# Works on v100 AND A100 GPUs
module load apps/binapps/pytorch/2.3.0-311-gpu-cu121

# PyTorch 1.11.0 using Python 3.9 for GPUs: (uses CUDA 11.3, Anaconda3 2021.11)
# Works on v100 AND A100 GPUs
module load apps/binapps/pytorch/1.11.0-39-gpu-cu113

# Python 3.9 for GPUs: (uses CUDA 11.2.0, Anaconda3 2021.11)
# Works on v100 GPUs but NOT A100 GPUs
module load apps/binapps/pytorch/1.11.0-39-gpu

# Python 3.9 for GPUs: (uses CUDA 11.2.0, Anaconda3 2021.11)
module load apps/binapps/pytorch/1.8.1-39-gpu

## 01.04.2022: The install we had of 1.8.2-39-gpu had an issue and has been removed.
## Apologies for any inconvenience caused.

# Python 3.7 for GPUs: (uses CUDA 10.1.168, Anaconda3 2019.07)
module load apps/binapps/pytorch/1.3.1-37-gpu

# Python 3.6 for GPUs: (uses CUDA 9.2.148, Anaconda3 5.2.0)
module load apps/binapps/pytorch/1.0.1-36-gpu
module load apps/binapps/pytorch/0.4.1-36-gpu
CPUs
# PyTorch 2.3.0 using Python 3.11 for CPUs: (uses Anaconda3 2023.09)
module load apps/binapps/pytorch/2.3.0-311-cpu

# Python 3.9 for CPUs: (uses Anaconda3 2021.11)
module load apps/binapps/pytorch/1.11.0-39-cpu

# Python 3.9 for CPUs: (uses Anaconda3 2021.11)
module load apps/binapps/pytorch/1.8.1-39-cpu

## 01.04.2022: The install we had of 1.8.2-39-cpu had an issue and has been removed.
## Apologies for any inconvenience caused.

# Python 3.7 for CPUs: (uses Anaconda3 2019.07)
module load apps/binapps/pytorch/1.3.1-37-cpu

# Python 3.6 for CPUs: (uses Anaconda3 5.2.0)
module load apps/binapps/pytorch/1.0.1-36-cpu
module load apps/binapps/pytorch/0.4.1-36-cpu
The above modulefiles will load any necessary dependency modulefiles for you. Note that you cannot run the GPU version of PyTorch on a CPU-only node (it must be run on a GPU node).
Check GPU Detection
To check whether PyTorch can see the GPU:
# From the CSF login node:
module load apps/binapps/pytorch/1.11.0-39-gpu-cu113
qrsh -l v100 -V 'python -c "import torch; print(torch.cuda.is_available())"'
True
qrsh -l a100 -V 'python -c "import torch; print(torch.cuda.is_available())"'
True
#
# If you see 'False' here, check you've loaded an A100-compatible version of PyTorch. See above.
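For a little more detail than True/False, the following is a minimal sketch of a script you could run on a GPU node in the same way. The helper name describe_gpus is purely illustrative and not part of PyTorch or the CSF installation; the torch calls it uses (torch.cuda.is_available, torch.cuda.device_count, torch.cuda.get_device_name and torch.version.cuda) are standard PyTorch functions and attributes.

```python
import torch


def describe_gpus():
    """Return a list of (index, name) pairs for the GPUs PyTorch can see."""
    if not torch.cuda.is_available():
        return []
    return [(i, torch.cuda.get_device_name(i))
            for i in range(torch.cuda.device_count())]


print("PyTorch version:", torch.__version__)
# torch.version.cuda is None when a CPU-only build of PyTorch is loaded.
print("Built against CUDA:", torch.version.cuda)

gpus = describe_gpus()
if gpus:
    for index, name in gpus:
        print("cuda:{} -> {}".format(index, name))
else:
    print("No GPU visible: check the modulefile and that this is a GPU node.")
```

An empty list here corresponds to the 'False' case above: either a CPU-only modulefile is loaded, the job is not on a GPU node, or the loaded version does not support that GPU type.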
Running the application on a GPU node
Please do not run PyTorch on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.
Example PyTorch GPU python script
Example PyTorch python scripts are available from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html. We reproduce one of them here. Please see the link for full details of what the code is actually doing!
Create the following PyTorch example script for use on a GPU node (e.g., my-gpu-script.py):
import torch

dtype = torch.float

# Run on the GPU
device = torch.device("cuda:0")

# Uncomment this to run on CPU
#device = torch.device("cpu")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Manually update weights using gradient descent.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
You can now run the above script interactively on a GPU node or in batch.
Interactive use on a GPU node
Once you have been granted access to the Nvidia v100 nodes, start an interactive session as follows:
qrsh -l nvidia_v100=1 bash

# Wait until you are logged in to a backend compute node, then:
module load apps/binapps/pytorch/1.0.1-36-gpu

# Run the above script
python my-gpu-script.py

# Alternatively enter the above script in a python shell:
python
# Enter each line of the script above - it will execute immediately
import torch
...

# When finished, exit python
Ctrl-D

# When finished with your interactive session, return to the login node
exit
Batch usage on a GPU node
Once you have been granted access to the Nvidia v100 nodes, create a jobscript as follows:
#!/bin/bash --login
#$ -cwd                # Run job from directory where submitted

# If running on a GPU, add:
#$ -l v100=1

#$ -pe smp.pe 8        # Number of cores on a single compute node. GPU jobs can
                       # use up to 8 cores per GPU.

# We now recommend loading the modulefile in the jobscript
module load apps/binapps/pytorch/1.0.1-36-gpu

# $NSLOTS is automatically set to the number of cores requested on the pe line.
# Inform some of the python libraries how many cores we can use.
export OMP_NUM_THREADS=$NSLOTS

python my-gpu-script.py
Submit the jobscript using:
qsub jobscript
where jobscript is the name of your jobscript file (not your Python script file!).
Running the application on a CPU node
Please do not run PyTorch on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.
Example PyTorch CPU python script
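Modify the above example script for use on a CPU node by commenting out the device = torch.device("cuda:0") line and uncommenting the device = torch.device("cpu") line, then save it under a new name (e.g., my-cpu-script.py).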
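As an alternative to keeping two copies of the script, the device could be chosen at run time. The following is a minimal sketch under that assumption; the helper name pick_device is illustrative only and does not appear in the PyTorch tutorial the example is taken from.

```python
import torch


def pick_device():
    """Use the first GPU when PyTorch can see one, otherwise the CPU."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


device = pick_device()

# Tensors are then created exactly as in the example script.
x = torch.randn(64, 1000, device=device, dtype=torch.float)
print("Running on", x.device)
```

With this approach the same file runs under either a GPU or a CPU modulefile, at the cost of silently falling back to the CPU if no GPU is detected, so it is worth checking the printed device.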
Interactive use on a Backend CPU-only Node
To request an interactive session on a backend compute node run:
qrsh -l short

# Wait until you are logged in to a backend compute node, then:
module load apps/binapps/pytorch/1.0.1-36-cpu

# Run the above python script, e.g.:
python my-cpu-script.py

# Alternatively enter the above script in a python shell:
python
# Enter each line of the script above - it will execute immediately
import torch
...

# When finished, exit python
Ctrl-D

# When finished with your interactive session, return to the login node
exit
Batch usage on a CPU node
Create a jobscript as follows:
#!/bin/bash --login
#$ -cwd                # Run job from directory where submitted
#$ -pe smp.pe 16       # Number of cores on a single compute node. Can be 2-32 for CPU jobs.
                       # Remove the -pe line completely to run a serial (1-core) job.

# We now recommend loading the modulefile in the jobscript
module load apps/binapps/pytorch/1.0.1-36-cpu

# $NSLOTS is automatically set to the number of cores requested on the pe line.
# Inform some of the python libraries how many cores we can use.
export OMP_NUM_THREADS=$NSLOTS

python my-cpu-script.py
Submit the jobscript using:
qsub jobscript
where jobscript is the name of your jobscript file (not your Python script file!).
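The jobscripts above pass the core count to the Python libraries through OMP_NUM_THREADS. If you also want the Python script itself to know how many cores the job was given, one possible approach is to read $NSLOTS directly. This is a minimal sketch: the function name scheduler_cores and the fallback to a single core when NSLOTS is unset (for example, outside a batch job) are our assumptions, not CSF policy.

```python
import os


def scheduler_cores(default=1):
    """Return the core count from $NSLOTS, or `default` if unset or invalid."""
    try:
        cores = int(os.environ.get("NSLOTS", ""))
    except ValueError:
        return default
    return cores if cores > 0 else default


n_threads = scheduler_cores()
print("Using", n_threads, "CPU thread(s)")
# In a PyTorch script this value could then be passed to
# torch.set_num_threads(n_threads) to match the cores requested.
```

Keeping the thread count tied to $NSLOTS means the jobscript's -pe line remains the only place where the number of cores has to be changed.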
Further info
Updates
None.