{"id":481,"date":"2018-09-29T14:45:19","date_gmt":"2018-09-29T13:45:19","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=481"},"modified":"2024-08-19T17:14:54","modified_gmt":"2024-08-19T16:14:54","slug":"pytorch","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/pytorch\/","title":{"rendered":"PyTorch"},"content":{"rendered":"<h2>Overview<\/h2>\n<p><a href=\"https:\/\/pytorch.org\/\">PyTorch<\/a> is an open source Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.<\/p>\n<p>Versions available are detailed below.<\/p>\n<p>Note that the GPU version of PyTorch usually requires a specific version of the Nvidia CUDA libraries. Whilst installing PyTorch inside a conda environment (or with pip) will also install the required CUDA version, a newer version of CUDA may require us to install a new CUDA driver on the GPU nodes. This requires the GPU nodes to be removed from service temporarily, so we try not to do this too often, and we avoid doing it when there is a lot of demand for GPUs. Hence we may not be able to install new versions of PyTorch until we can schedule the installation of new CUDA drivers.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>There are no access restrictions on the CSF. 
All usage must adhere to the <a href=\"https:\/\/github.com\/pytorch\/pytorch\/blob\/main\/LICENSE\">PyTorch License<\/a>.<\/p>\n<h2>Set up procedure<\/h2>\n<p>To access the software you must first load <em>one<\/em> of the following modulefiles:<\/p>\n<h3>GPUs<\/h3>\n<pre>\r\n# PyTorch 2.3.0 using Python 3.11 for GPUs: (uses CUDA 12.1, Anaconda3 2023.09)\r\n# Works on v100 <strong>AND A100 GPUs<\/strong>\r\nmodule load apps\/binapps\/pytorch\/2.3.0-311-gpu-cu121\r\n\r\n# PyTorch 1.11.0 using Python 3.9 for GPUs: (uses CUDA 11.3, Anaconda3 2021.11)\r\n# Works on v100 <strong>AND A100 GPUs<\/strong>\r\nmodule load apps\/binapps\/pytorch\/1.11.0-39-gpu-cu113\r\n\r\n# PyTorch 1.11.0 using Python 3.9 for GPUs: (uses CUDA 11.2.0, Anaconda3 2021.11)\r\n# Works on v100 GPUs but <strong>NOT A100 GPUs<\/strong>\r\nmodule load apps\/binapps\/pytorch\/1.11.0-39-gpu\r\n# PyTorch 1.8.1 using Python 3.9 for GPUs: (uses CUDA 11.2.0, Anaconda3 2021.11)\r\nmodule load apps\/binapps\/pytorch\/1.8.1-39-gpu\r\n     ## 01.04.2022: The install we had of 1.8.2-39-gpu had an issue and has been removed. \r\n     ## Apologies for any inconvenience caused.\r\n# PyTorch 1.3.1 using Python 3.7 for GPUs: (uses CUDA 10.1.168, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/pytorch\/1.3.1-37-gpu\r\n# PyTorch 1.0.1 and 0.4.1 using Python 3.6 for GPUs: (uses CUDA 9.2.148, Anaconda3 5.2.0)\r\nmodule load apps\/binapps\/pytorch\/1.0.1-36-gpu\r\nmodule load apps\/binapps\/pytorch\/0.4.1-36-gpu\r\n<\/pre>\n<h3>CPUs<\/h3>\n<pre>\r\n# PyTorch 2.3.0 using Python 3.11 for CPUs: (uses Anaconda3 2023.09)\r\nmodule load apps\/binapps\/pytorch\/2.3.0-311-cpu\r\n\r\n# PyTorch 1.11.0 using Python 3.9 for CPUs: (uses Anaconda3 2021.11)\r\nmodule load apps\/binapps\/pytorch\/1.11.0-39-cpu\r\n# PyTorch 1.8.1 using Python 3.9 for CPUs: (uses Anaconda3 2021.11)\r\nmodule load apps\/binapps\/pytorch\/1.8.1-39-cpu\r\n    ## 01.04.2022: The install we had of 1.8.2-39-cpu had an issue and has been removed. 
\r\n    ## Apologies for any inconvenience caused.\r\n# PyTorch 1.3.1 using Python 3.7 for CPUs: (uses Anaconda3 2019.07)\r\nmodule load apps\/binapps\/pytorch\/1.3.1-37-cpu\r\n# PyTorch 1.0.1 and 0.4.1 using Python 3.6 for CPUs: (uses Anaconda3 5.2.0)\r\nmodule load apps\/binapps\/pytorch\/1.0.1-36-cpu\r\nmodule load apps\/binapps\/pytorch\/0.4.1-36-cpu\r\n<\/pre>\n<p>The above modulefiles will load any necessary dependency modulefiles for you. Note that you cannot run the GPU version of PyTorch on a CPU-only node (it must be run on a GPU node).<\/p>\n<h3>Check GPU Detection<\/h3>\n<p>To check whether PyTorch can see the GPU:<\/p>\n<pre>\r\n# From the CSF login node:\r\nmodule load apps\/binapps\/pytorch\/1.11.0-39-gpu-cu113\r\nqrsh -l v100 -V 'python -c \"import torch; print(torch.cuda.is_available())\"'\r\nTrue\r\n\r\nqrsh -l a100 -V 'python -c \"import torch; print(torch.cuda.is_available())\"'\r\nTrue\r\n  #\r\n  # If you see 'False' here, check you've loaded an A100-compatible version of PyTorch. See above.\r\n<\/pre>\n<h2>Running the application on a GPU node<\/h2>\n<p>Please do not run PyTorch on the login node. Jobs should be run interactively on the backend nodes (via <code>qrsh<\/code>) or submitted to the compute nodes via batch.<\/p>\n<h3>Example PyTorch GPU python script<\/h3>\n<p>Example PyTorch python scripts are available from <a href=\"https:\/\/pytorch.org\/tutorials\/beginner\/pytorch_with_examples.html\">https:\/\/pytorch.org\/tutorials\/beginner\/pytorch_with_examples.html<\/a>. We reproduce one of them here. 
Please see the link for full details of what the code is actually doing!<\/p>\n<p>Create the following PyTorch example script for use on a GPU node (e.g., <code>my-gpu-script.py<\/code>):<\/p>\n<pre>import torch\r\n\r\ndtype = torch.float\r\n\r\n<strong># Run on the GPU<\/strong>\r\ndevice = torch.device(\"cuda:0\")\r\n\r\n<strong># Uncomment this to run on CPU<\/strong>\r\n#device = torch.device(\"cpu\") \r\n\r\n# N is batch size; D_in is input dimension;\r\n# H is hidden dimension; D_out is output dimension.\r\nN, D_in, H, D_out = 64, 1000, 100, 10\r\n\r\n# Create random Tensors to hold input and outputs.\r\nx = torch.randn(N, D_in, device=device, dtype=dtype)\r\ny = torch.randn(N, D_out, device=device, dtype=dtype)\r\n\r\n# Create random Tensors for weights.\r\nw1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)\r\nw2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)\r\n\r\nlearning_rate = 1e-6\r\nfor t in range(500):\r\n    # Forward pass: compute predicted y using operations on Tensors\r\n    y_pred = x.mm(w1).clamp(min=0).mm(w2)\r\n\r\n    # Compute and print loss using operations on Tensors.\r\n    loss = (y_pred - y).pow(2).sum()\r\n    print(t, loss.item())\r\n\r\n    # Use autograd to compute the backward pass.\r\n    loss.backward()\r\n\r\n    # Manually update weights using gradient descent.\r\n    with torch.no_grad():\r\n        w1 -= learning_rate * w1.grad\r\n        w2 -= learning_rate * w2.grad\r\n\r\n        # Manually zero the gradients after updating weights\r\n        w1.grad.zero_()\r\n        w2.grad.zero_()\r\n<\/pre>\n<p>You can now run the above script interactively on a GPU node or in batch.<\/p>\n<h3>Interactive use on a GPU node<\/h3>\n<p>Once you have been granted access to the <a href=\"\/csf3\/batch\/gpu-jobs\/\">Nvidia v100 nodes<\/a>, start an interactive session as follows:<\/p>\n<pre>qrsh -l nvidia_v100=1 bash\r\n\r\n# Wait until you are logged in to a backend compute node, 
then:\r\nmodule load apps\/binapps\/pytorch\/1.0.1-36-gpu\r\n\r\n# Run the above script\r\npython my-gpu-script.py\r\n\r\n# Alternatively enter the above script in a python shell:\r\npython\r\n   # Enter each line of the script above - it will execute immediately\r\n   import torch\r\n   ...\r\n   # When finished, exit python\r\n   Ctrl-D\r\n\r\n# When finished with your interactive session, return to the login node\r\nexit\r\n<\/pre>\n<h3>Batch usage on a GPU node<\/h3>\n<p>Once you have been granted access to the <a href=\"\/csf3\/batch\/gpu-jobs\/\">Nvidia v100 nodes<\/a>, create a jobscript as follows:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd                   # Run job from directory where submitted\r\n\r\n# If running on a GPU, add:\r\n#$ -l v100=1\r\n\r\n#$ -pe smp.pe 8          # Number of cores on a single compute node. GPU jobs can\r\n                         # use up to 8 cores <em>per<\/em> GPU.\r\n\r\n# We now recommend loading the modulefile in the jobscript\r\nmodule load apps\/binapps\/pytorch\/1.0.1-36-gpu\r\n\r\n# $NSLOTS is automatically set to the number of cores requested on the pe line.\r\n# Inform some of the python libraries how many cores we can use.\r\nexport OMP_NUM_THREADS=$NSLOTS\r\n\r\npython my-gpu-script.py\r\n<\/pre>\n<p>Submit the jobscript using<\/p>\n<pre>qsub <em>jobscript<\/em>\r\n<\/pre>\n<p>where <code><em>jobscript<\/em><\/code> is the name of your jobscript file (not your python script file!)<\/p>\n<h2>Running the application on a CPU node<\/h2>\n<p>Please do not run PyTorch on the login node. 
Jobs should be run interactively on the backend nodes (via <code>qrsh<\/code>) or submitted to the compute nodes via batch.<\/p>\n<h3>Example PyTorch CPU python script<\/h3>\n<p>Modify the above example script for use on a CPU node by changing the device line to <code>device = torch.device(\"cpu\")<\/code>, and save it as, e.g., <code>my-cpu-script.py<\/code>.<\/p>\n<h3>Interactive use on a Backend CPU-only Node<\/h3>\n<p>To request an interactive session on a backend compute node run:<\/p>\n<pre>qrsh -l short\r\n\r\n# Wait until you are logged in to a backend compute node, then:\r\nmodule load apps\/binapps\/pytorch\/1.0.1-36-cpu\r\n\r\n# Run the above python script, e.g.:\r\npython my-cpu-script.py\r\n\r\n# Alternatively enter the above script in a python shell:\r\npython\r\n   # Enter each line of the script above - it will execute immediately\r\n   import torch\r\n   ...\r\n   # When finished, exit python\r\n   Ctrl-D\r\n\r\n# When finished with your interactive session, return to the login node\r\nexit\r\n<\/pre>\n<h3>Batch usage on a CPU node<\/h3>\n<p>Create a jobscript as follows:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd                   # Run job from directory where submitted\r\n#$ -pe smp.pe 16          # Number of cores on a single compute node. 
Can be 2-32 for CPU jobs.\r\n                          # Remove the -pe line completely to run a serial (1-core) job.\r\n\r\n# We now recommend loading the modulefile in the jobscript\r\nmodule load apps\/binapps\/pytorch\/1.0.1-36-cpu\r\n\r\n# $NSLOTS is automatically set to the number of cores requested on the pe line.\r\n# Inform some of the python libraries how many cores we can use.\r\nexport OMP_NUM_THREADS=$NSLOTS\r\n\r\npython my-cpu-script.py\r\n<\/pre>\n<p>Submit the jobscript using<\/p>\n<pre>qsub <em>jobscript<\/em>\r\n<\/pre>\n<p>where <code><em>jobscript<\/em><\/code> is the name of your jobscript file (not your python script file!)<\/p>\n<h2>Further info<\/h2>\n<ul>\n<li><a href=\"https:\/\/pytorch.org\/\">PyTorch website<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview PyTorch is an open source Python package that provides two high-level features: Tensor computation (like NumPy) with strong GPU acceleration and Deep neural networks built on a tape-based autograd system. Versions available are detailed below. Note that the GPU version of PyTorch usually requires a specific version of the Nvidia CUDA libraries. Whilst installing PyTorch inside a conda environment (or with pip) will also install the required CUDA version, newer version of CUDA may.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/pytorch\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":86,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-481","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/481","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=481"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/481\/revisions"}],"predecessor-version":[{"id":7911,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/481\/revisions\/7911"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/86"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=481"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}