{"id":457,"date":"2018-09-28T11:59:23","date_gmt":"2018-09-28T10:59:23","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=457"},"modified":"2025-06-19T14:40:24","modified_gmt":"2025-06-19T13:40:24","slug":"tensorflow","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/tensorflow\/","title":{"rendered":"Tensorflow"},"content":{"rendered":"<h2>Overview<\/h2>\n<p><a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a> is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs.<\/p>\n<p>See the modulefiles below for available versions.<\/p>\n<p>As of Tensorflow version 2.4, <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/keras\/\">Keras<\/a> is packaged within the Tensorflow install as <code>tensorflow.keras<\/code>.<\/p>\n<p>The recommended method is to install tensorflow in your own <a href=\"..\/anaconda-python\">conda environment<\/a>. This is currently the only way to obtain a newer version than the ones we&#8217;ve installed centrally. See <a href=\"#conda\">below<\/a> for a complete example of installing TF 2.19 (the latest version as of June 2025).<\/p>\n<h2>Restrictions on use<\/h2>\n<p>There are no access restrictions on the CSF.<\/p>\n<h2>Set up procedure<\/h2>\n<p>We now recommend installing a newer version of tensorflow, with newer CUDA libraries, in a <em>conda environment<\/em>, which you can create in your home directory. 
Please follow the <a href=\"#conda\">step-by-step instructions below<\/a> for a complete example.<\/p>\n<p>To access the older, centrally installed versions, you must first load <em>one<\/em> of the following modulefiles:<\/p>\n<pre># <strong>These are now fairly old versions of Tensorflow. See the end of this page<\/strong>\r\n# <strong>for how to install your own newer version in a conda environment.<\/strong>\r\n\r\n# TF 2.8.0, Python 3.9 for GPUs: (uses CUDA 11.2.0, cuDNN 8.1.0, Anaconda3 2021.11)\r\nmodule load apps\/binapps\/tensorflow\/2.8.0-39-gpu\r\n\r\n# TF 2.7.0, Python 3.7 for GPUs: (uses CUDA 11.0.3, cuDNN 8.0.4, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.7.0-37-gpu\r\n\r\n# TF 2.4.0, Python 3.7 for GPUs: (uses CUDA 11.0.3, cuDNN 8.0.4, NCCL 2.5.6, TensorRT 6.0.1, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.4.0-37-gpu\r\n\r\n# TF 2.3.1, Python 3.7 for GPUs: (uses CUDA 10.1.243, cuDNN 7.6.5, NCCL 2.5.6, TensorRT 6.0.1, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.3.1-37-gpu\r\n\r\n# TF 2.2.0, Python 3.7 for GPUs: (uses CUDA 10.1.243, cuDNN 7.6.5, NCCL 2.5.6, TensorRT 6.0.1, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.2.0-37-gpu\r\n\r\n# TF 2.1.0, Python 3.7 for GPUs: (uses CUDA 10.1.243, cuDNN 7.6.5, NCCL 2.5.6, TensorRT 6.0.1, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.1.0-37-gpu\r\n\r\n# TF 2.0.0, Python 3.7 for GPUs: (uses CUDA 10.0.130, cuDNN 7.6.2, NCCL 2.4.7, Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.0.0-37-gpu\r\n\r\n# TF 1.14.0, Python 3.6 for GPUs: (uses CUDA 10.0.130, cuDNN 7.6.2, NCCL 2.2.13, Anaconda3 5.2.0)\r\nmodule load apps\/binapps\/tensorflow\/1.14.0-36-gpu\r\n\r\n# TF 1.11.0 and 1.10.1, Python 3.6 for GPUs: (uses CUDA 9.0.176, cuDNN 7.3.0, NCCL 2.2.13, Anaconda3 5.2.0)\r\nmodule load apps\/binapps\/tensorflow\/1.11.0-36-gpu\r\nmodule load apps\/binapps\/tensorflow\/1.10.1-36-gpu\r\n<\/pre>\n<p>There are also CPU-only versions 
available:<\/p>\n<pre># Python 3.9 for <strong>CPUs only<\/strong>: (uses Anaconda3 2021.11)\r\nmodule load apps\/binapps\/tensorflow\/2.8.0-39-cpu\r\n\r\n# Python 3.7 for <strong>CPUs only<\/strong>: (uses Anaconda3 2019.07)\r\nmodule load apps\/binapps\/tensorflow\/2.7.0-37-cpu\r\nmodule load apps\/binapps\/tensorflow\/2.4.0-37-cpu\r\nmodule load apps\/binapps\/tensorflow\/2.3.1-37-cpu\r\nmodule load apps\/binapps\/tensorflow\/2.2.0-37-cpu\r\nmodule load apps\/binapps\/tensorflow\/2.1.0-37-cpu\r\nmodule load apps\/binapps\/tensorflow\/2.0.0-37-cpu\r\n\r\n# Python 3.6 for <strong>CPUs only<\/strong>: (uses Anaconda3 5.2.0)\r\nmodule load apps\/binapps\/tensorflow\/1.14.0-36-cpu\r\nmodule load apps\/binapps\/tensorflow\/1.11.0-36-cpu\r\nmodule load apps\/binapps\/tensorflow\/1.10.1-36-cpu\r\n<\/pre>\n<p>The above modulefiles will load any necessary dependency modulefiles for you. Note that you cannot run the GPU version of tensorflow on a CPU-only node (it must be run on a GPU node).<\/p>\n<p>The tensorflow website has a <a href=\"https:\/\/www.tensorflow.org\/install\/source#tested_build_configurations\">table of tested build configurations<\/a>, listing the Python, CUDA and cuDNN versions supported by each tensorflow version.<\/p>\n<h2>Running the application on a GPU node<\/h2>\n<p>Please do not run Tensorflow on the login node. 
Jobs should be run interactively on the backend nodes (via <code>srun<\/code>) or submitted to the compute nodes as batch jobs (via <code>sbatch<\/code>).<\/p>\n<h3>Example Tensorflow v2.8 GPU python script<\/h3>\n<p>Note that the following example will not work with Tensorflow 1.x due to significant changes in the Tensorflow API.<\/p>\n<p>See <a href=\"https:\/\/www.tensorflow.org\/tutorials\/quickstart\/beginner\">https:\/\/www.tensorflow.org\/tutorials\/quickstart\/beginner<\/a> for more information on this code.<\/p>\n<p><a name=\"tfkexample\"><\/a><br \/>\nCreate the following tensorflow example script for use on a GPU node (e.g., <code>my-gpu-script.py<\/code>):<\/p>\n<pre># Tensorflow example on a GPU\r\nimport tensorflow as tf\r\nprint(\"TensorFlow version:\", tf.__version__)\r\nprint(\"List of GPUs:\", tf.config.list_physical_devices('GPU'))\r\n\r\n# Load the MNIST dataset (downloaded to ~\/.keras\/datasets on first use)\r\nmnist = tf.keras.datasets.mnist\r\n(x_train, y_train), (x_test, y_test) = mnist.load_data()\r\n# Scale the pixel values from 0-255 to 0.0-1.0\r\nx_train, x_test = x_train \/ 255.0, x_test \/ 255.0\r\n\r\n# A simple fully-connected classifier for the 10 digit classes\r\nmodel = tf.keras.models.Sequential([\r\n  tf.keras.layers.Flatten(input_shape=(28, 28)),\r\n  tf.keras.layers.Dense(128, activation='relu'),\r\n  tf.keras.layers.Dropout(0.2),\r\n  tf.keras.layers.Dense(10, activation='softmax')\r\n])\r\n\r\nmodel.compile(optimizer='adam',\r\n              loss='sparse_categorical_crossentropy',\r\n              metrics=['accuracy'])\r\n\r\n# Train for 5 epochs (verbose=0 suppresses the per-epoch progress output)\r\nmodel.fit(x_train, y_train, epochs=5, verbose=0)\r\n# Evaluate on the test set: this should report an accuracy of about 98%\r\nmodel.evaluate(x_test, y_test, verbose=2)\r\n<\/pre>\n<p>You can now run the above script interactively on a GPU node or in batch.<\/p>\n<h3>Interactive use on a GPU node<\/h3>\n<p>Start an <a href=\"\/csf3\/batch-slurm\/gpu-jobs-slurm\/#Interactive_Jobs\">interactive session on the GPU<\/a> nodes as follows:<\/p>\n<pre class='slurm'>\r\nsrun -p gpuV -G 1 -t 0-1 --pty bash           # use 1xV100 GPU for up to 1hr\r\n\r\n# Wait until you are logged in to a backend compute node, then:\r\nmodule purge\r\nmodule load 
apps\/binapps\/tensorflow\/2.8.0-39-gpu\r\n\r\n# Run the above script\r\npython my-gpu-script.py\r\n\r\n# Alternatively, enter the above script in a python shell:\r\npython\r\n   # Enter each line of the script above - it will execute immediately\r\n   import tensorflow as tf\r\n   ...\r\n   # When finished, exit python\r\n   Ctrl-D\r\n\r\n# When finished with your interactive session, return to the login node\r\nexit\r\n<\/pre>\n<h3>Batch usage on a GPU node<\/h3>\n<p>Create a jobscript as follows:<\/p>\n<pre class='slurm'>\r\n#!\/bin\/bash --login\r\n#SBATCH -p gpuV         # V100 GPU(s) partition chosen\r\n#SBATCH -G 1            # number of GPUs required. Here only 1\r\n#SBATCH -n 8            # number of CPU cores required\r\n#SBATCH -t 0-1          # Wallclock timelimit (0-1 is 1hr, 4-0 is 4 days, the max permitted)\r\n\r\n# We recommend loading the modulefile in the jobscript\r\nmodule purge\r\nmodule load apps\/binapps\/tensorflow\/2.8.0-39-gpu\r\n\r\n# $SLURM_NTASKS is automatically set to the number of cores requested on the -n line\r\n# and can be read by your python code.\r\npython my-gpu-script.py\r\n<\/pre>\n<p>Submit the jobscript using<\/p>\n<pre class='slurm'>sbatch <em>jobscript<\/em>\r\n<\/pre>\n<p>where <code><em>jobscript<\/em><\/code> is the name of your jobscript file (not your python script file!)<\/p>\n<h2>Running the application on a CPU node<\/h2>\n<p>Please do not run Tensorflow on the login node. Jobs should be run interactively on the backend nodes (via <code>srun<\/code>) or submitted to the compute nodes as batch jobs (via <code>sbatch<\/code>).<\/p>\n<h3>Example Tensorflow v2.x CPU python script<\/h3>\n<p>Create the following tensorflow example script for use on a CPU node (e.g., <code>my-cpu-script.py<\/code>). 
The script determines how many CPU cores have been reserved for the job and instructs Tensorflow to use only that many threads.<\/p>\n<pre class='slurm'># Tensorflow example on a CPU only (not a GPU)\r\nimport os\r\nimport tensorflow as tf\r\n\r\n# Get the number of cores reserved by Slurm (SLURM_NTASKS is set automatically\r\n# in a Slurm job) or use 1 if it is not set\r\nNUMCORES = int(os.getenv(\"SLURM_NTASKS\", 1))\r\nprint(\"Using\", NUMCORES, \"core(s)\")\r\n\r\n# Thread settings must be applied before Tensorflow runs any operations\r\ntf.config.threading.set_inter_op_parallelism_threads(NUMCORES)\r\ntf.config.threading.set_intra_op_parallelism_threads(NUMCORES)\r\n# If an op cannot run on the requested device, let Tensorflow pick another one\r\ntf.config.set_soft_device_placement(True)\r\n\r\n# Now multiply two small matrices and print the result\r\na = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')\r\nb = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')\r\nc = tf.linalg.matmul(a, b)\r\n# Expected result: [[22. 28.], [49. 64.]]\r\nprint(c)\r\n<\/pre>\n<h3>Interactive use on a Backend CPU-only Node<\/h3>\n<p>To request an interactive session on a backend compute node, run:<\/p>\n<pre class='slurm'>\r\nsrun -p interactive -n 4 -t 0-1 --pty bash\r\n\r\n# Wait until you are logged in to a backend compute node, then:\r\nmodule purge\r\nmodule load apps\/binapps\/tensorflow\/2.8.0-39-cpu\r\n\r\n# Run the above python script, e.g.:\r\npython my-cpu-script.py\r\n\r\n# Alternatively, enter the above script in a python shell:\r\npython\r\n   # Enter each line of the script above - it will execute immediately\r\n   import tensorflow as tf\r\n   ...\r\n   # When finished, exit python\r\n   Ctrl-D\r\n\r\n# When finished with your interactive session, return to the login node\r\nexit\r\n<\/pre>\n<h3>Batch usage on a CPU node<\/h3>\n<p>Create a jobscript as follows:<\/p>\n<pre class='slurm'>\r\n#!\/bin\/bash --login\r\n#SBATCH -p multicore    # choose the multicore partition -> Runs on AMD Genoa nodes\r\n#SBATCH -n 16           # Number of cores on a single compute node. 
Can be 2-168 for CPU jobs.\r\n#SBATCH -t 1-0          # Wallclock timelimit (1-0 is one day, 4-0 is the max permitted)\r\n\r\n# We now recommend loading the modulefile in the jobscript\r\nmodule purge\r\nmodule load apps\/binapps\/tensorflow\/2.8.0-39-cpu\r\n\r\n# $SLURM_NTASKS is automatically set to the number of cores requested on the -n line\r\n# and can be read by your python code (see example above).\r\npython my-cpu-script.py\r\n<\/pre>\n<p>Submit the jobscript using<\/p>\n<pre class='slurm'>sbatch <em>jobscript<\/em>\r\n<\/pre>\n<p>where <code><em>jobscript<\/em><\/code> is the name of your jobscript file (not your python script file!)<br \/>\n<a name=\"conda\"><\/a><\/p>\n<h2>Using Tensorflow in Conda Environments<\/h2>\n<p>Conda environments let you install all of the python packages you need for a project into a directory in your home directory. You can create separate conda environments for other projects. This keeps each project separate, so the packages for one project do not break those of another. We recommend reading the <a href=\"..\/anaconda-python\">Anaconda Python CSF page<\/a> for more info on using conda environments.<\/p>\n<table class=\"warning-wide\">\n<tr>\n<td>June 2023: The proxy is <strong>no longer available<\/strong>.<br \/>To download packages from external sites (e.g., when creating a <em>conda env<\/em>), please do so from a batch job or use an <em>interactive session<\/em> on a backend node by running <code>srun -p interactive -t 0-1 --pty bash<\/code> or <code>srun -p gpuV -G 1 -n 1 -t 0-1 --pty bash<\/code>. You DO NOT then need to load the proxy modulefiles. 
Please see the <a href=\"\/csf3\/batch-slurm\/srun\/\">srun notes<\/a> for more information on interactive use.<\/td>\n<\/tr>\n<\/table>\n<h3>Example 1 &#8211; Installing Tensorflow<\/h3>\n<p>The following notes have been updated in June 2025 for tensorflow 2.19.0, installed in an anaconda 2024.10 environment with python 3.12.<\/p>\n<pre class='slurm'>\r\n# From the login node, start an interactive session\r\nsrun -p gpuV -G 1 -n 1 -t 0-1 --pty bash\r\n\r\n# <strong>Now on the GPU node<\/strong> - quite a few steps to install, but it is then easy to use in your jobscripts\r\n\r\n# Note that you must <strong>NOT<\/strong> have any existing conda environments active.\r\n# If your command prompt looks something like:\r\n(<strong>base<\/strong>) [<em>username<\/em>@node800 [csf3] ~]$\r\n  #\r\n  # Any (<em>name<\/em>) here is the name of the active conda env.\r\n  #\r\n  # then you need to <strong>deactivate<\/strong> the <code>base<\/code> conda env (or whatever name is showing)\r\n  # using the following 'source deactivate' command:\r\n\r\nsource deactivate\r\n\r\n# Now the prompt shows <em>no<\/em> conda env. That's correct. We are going to create a new env for tensorflow.\r\n[<em>username<\/em>@node800 [csf3] ~]$\r\n<\/pre>\n<p>Continue with the tensorflow installation &#8211; the following commands were run on a CSF GPU node (e.g., <code>node800<\/code> in our example):<\/p>\n<pre>\r\nmodule purge\r\n# Use python 3.12, which is supported by tensorflow 2.19.0 (the latest version at the time of writing)\r\nmodule load apps\/binapps\/anaconda3\/2024.10\r\n\r\n# Create a conda env with some basic packages needed to install other packages.\r\n# We're using anaconda3 v2024.10 which provides python 3.12 (use \"python --version\").\r\n# We use 'tf' as the env name in this example (you can change this if you want).\r\npython --version\r\nconda create -n tf python=<strong>3.12<\/strong>\r\nProceed ([y]\/n)? 
y                          # &lt;--- Press y [return] to proceed\r\n\r\n# Now activate the env so we can install other packages into the env\r\nsource activate tf\r\n\r\n# Now install the latest tensorflow (2.19.0 at the time of writing) and its CUDA libraries inside the env\r\npip install --isolated --log pip.tf.log tensorflow[and-cuda]\r\npip install --isolated --log pip.rt.log --extra-index-url https:\/\/pypi.nvidia.com tensorrt\r\n\r\n# Now run a quick test:\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\n\r\n# Return to the login node\r\nexit\r\n<\/pre>\n<p>Now, from the login node, test everything from the beginning, this time without any install steps.<\/p>\n<pre class='slurm'>\r\n# Start a new interactive session\r\nsrun -p gpuV -G 1 -n 1 -t 0-1 --pty bash\r\nmodule purge\r\nmodule load apps\/binapps\/anaconda3\/2024.10\r\nsource activate tf\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\nsource deactivate\r\nexit\r\n<\/pre>\n<p>Let&#8217;s also test a batch job. First write a jobscript, e.g., using the command: <code>xnedit tf.sbatch<\/code><\/p>\n<pre class='slurm'>\r\n#!\/bin\/bash --login\r\n#SBATCH -p gpuV         # V100 GPU(s) partition chosen\r\n#SBATCH -G 1            # number of GPUs required. 
Here only 1\r\n#SBATCH -n 8            # number of CPU cores required\r\n#SBATCH -t 0-1          # Wallclock timelimit (0-1 is 1hr, 4-0 is 4 days, the max permitted)\r\n\r\nmodule purge\r\nmodule load apps\/binapps\/anaconda3\/2024.10\r\nsource activate tf\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\n<\/pre>\n<p>Now submit the job:<\/p>\n<pre class='slurm'>\r\nsbatch tf.sbatch\r\n<\/pre>\n<p>You should see the following output in your <code>slurm-<em>123456<\/em>.out<\/code> output file:<\/p>\n<pre>\r\n[PhysicalDevice(name='\/physical_device:GPU:0', device_type='GPU')]\r\n<\/pre>\n<p>Note that the <code>slurm-<em>123456<\/em>.out<\/code> output file will also contain some warnings:<\/p>\n<pre>\r\n2025-06-16 13:08:09.872373: E external\/local_xla\/xla\/stream_executor\/cuda\/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\r\nWARNING: All log messages before absl::InitializeLog() is called are written to STDERR\r\nE0000 00:00:1750075690.307379 2950388 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\r\nE0000 00:00:1750075690.430066 2950388 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\r\nW0000 00:00:1750075691.911909 2950388 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\r\nW0000 00:00:1750075691.912009 2950388 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\r\nW0000 00:00:1750075691.912021 2950388 computation_placer.cc:177] computation placer already registered. 
Please check linkage and avoid linking the same target more than once.\r\nW0000 00:00:1750075691.912032 2950388 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\r\n2025-06-16 13:08:12.028140: I tensorflow\/core\/platform\/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\r\nTo enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\r\n<\/pre>\n<p>These warnings can be ignored.<\/p>\n<p><!-- Below examples no longer relevant, removed as they were of older versions of Tensorflow\n     Kept here as hidden reference if some of the old fixes are needed as reference.\n\n\n\n<h3>Example 1 - Installing Tensorflow<\/h3>\n\n\nThe following notes have been updated in Feb 2024 for tensorflow 2.15.0 and include a fix for the \"TensorRT not found\" warning.\n\n\n<pre class='slurm'>\r\n# From the login node, start an interactive session\r\nsrun -p gpuV -G 1 -n 1 -t 0-1 --pty bash\r\n\r\n# <strong>Now on the GPU node<\/strong> - quite a few steps to install, but it is then easy to use in your jobscripts\r\n\r\n# Note that you must <strong>NOT<\/strong> have any existing conda environments active.\r\n# If your command prompt looks something like:\r\n(<strong>base<\/strong>) [<em>username<\/em>@node800 [csf3] ~]$\r\n  #\r\n  # Any (<em>name<\/em>) here is the name of the active conda env.\r\n  #\r\n  # then you need to <strong>deactivate<\/strong> the <code>base<\/code> conda env (or whatever name is showing)\r\n  # using the following 'source deactivate' command:\r\n\r\nsource deactivate\r\n\r\n# Now the prompt shows <em>no<\/em> conda env. That's correct. 
We are going to create a new env for tensorflow.\r\n[<em>username<\/em>@node800 [csf3] ~]$\r\n<\/pre>\n\n\nContinue with the tensorflow installation - the following commands were run on a CSF GPU node (e.g., <code>node800<\/code> in our example):\n\n\n<pre>\r\nmodule purge\r\n# Use python 3.9 as required by tensorflow (2.15.0 at the time of writing)\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\n\r\n# Create a conda env with some basic packages needed to install other packages.\r\n# We're using anaconda3 v2022.10 which provides python 3.9.13 (use \"python --version\").\r\n# We use 'tf' as the env name in this example (you can change this if you want).\r\npython --version\r\nconda create -n tf python=<strong>3.9.13<\/strong>\r\nProceed ([y]\/n)? y                          # &lt;--- Press y [return] to proceed\r\n\r\n# Now activate the env so we can install other packages in to the env\r\nsource activate tf\r\n\r\n# Now install latest tensorflow (2.15.0 at time of writing) and cuda inside the env\r\npip install --isolated --log pip.tf.log tensorflow[and-cuda]\r\npip install --isolated --log pip.rt.log --extra-index-url https:\/\/pypi.nvidia.com tensorrt\r\n\r\n# Now fix a warning message that complains about tensorrt.\r\n# I'm not sure that tensorrt is really used but the warning is annoying.\r\n# Based on https:\/\/github.com\/tensorflow\/tensorflow\/issues\/61468\r\n\r\nmkdir -p $CONDA_PREFIX\/etc\/conda\/activate.d\r\n# Note: No single quotes in this command but there are two dirname commands!\r\necho TENSORRT_PATH=$(dirname $(dirname $(python -c \"import tensorrt;print(tensorrt.__file__)\")))\/tensorrt_libs >> $CONDA_PREFIX\/etc\/conda\/activate.d\/env_vars.sh\r\n\r\n# But we <em>do<\/em> want the single quotes in this command\r\necho 'export LD_LIBRARY_PATH=$TENSORRT_PATH:$LD_LIBRARY_PATH' >> $CONDA_PREFIX\/etc\/conda\/activate.d\/env_vars.sh\r\n\r\n# Now while we're installing, apply the above settings manually.\r\n# In future, when you \"source activate 
tf\" to activate the env,\r\n# the settings will be applied automatically:\r\nsource $CONDA_PREFIX\/etc\/conda\/activate.d\/env_vars.sh\r\n\r\n# Fix some missing symlinks\r\npushd $TENSORRT_PATH\r\n# Will need to update these commands in future to match the tensorrt version\r\nln -s libnvinfer.so.8 libnvinfer.so.8.6.1\r\nln -s libnvinfer_plugin.so.8 libnvinfer_plugin.so.8.6.1\r\npopd\r\n\r\n# Now run a quick test:\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\n\r\n# Return to the login node\r\nexit\r\n<\/pre>\n\n\n\nLet's go back to the login node and do a test of everything from the beginning, without any install steps.\n\n\n<pre>\r\n# Start a new interactive session\r\nsrun -p gpuV -G 1 -n 1 -t 0-1 --pty bash\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\nsource activate tf\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\nsource deactivate\r\nexit\r\n<\/pre>\n\n\n\nLet's also test a batch job. First write a jobscript, e.g., using the command: <code>gedit tf.sbatch<\/code>\n\n\n\n<pre class='slurm'>\r\n#!\/bin\/bash --login\r\n#SBATCH -p gpuV\r\n#SBATCH --gpus=1\r\n#SBATCH -n 8\r\n#SBATCH -t 0-1\r\nmodule purge\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\nsource activate tf\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\n<\/pre>\n\n\nNow subimt job:\n\n\n<pre>\r\nsbatch tf.sbatch\r\n<\/pre>\n\n\nYou should see the following output in your <code>slurm<em>123456<\/em>.out<\/code> output file:\n\n\n<pre>\r\n[PhysicalDevice(name='\/physical_device:GPU:0', device_type='GPU')]\r\n<\/pre>\n\n\nNote that the <code>tf.qsub.e<em>123456<\/em><\/code> error file will contain some warnings:\n\n\n<pre>\r\n2024-02-19 19:45:15.604408: E external\/local_xla\/xla\/stream_executor\/cuda\/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\r\n2024-02-19 
19:45:15.604463: E external\/local_xla\/xla\/stream_executor\/cuda\/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\r\n2024-02-19 19:45:15.606101: E external\/local_xla\/xla\/stream_executor\/cuda\/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\r\n2024-02-19 19:45:15.615548: I tensorflow\/core\/platform\/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\r\nTo enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\r\n<\/pre>\n\n\nThey can be ignored.\n\n\n\n\n\n<h3>Example 2 - Older notes for Tensorflow 2.12<\/h3>\n\n\nThe following creates a conda env and installs Tensorflow 2.12. There may be some setting which are needed in future so we keep this here for now. 
\n\n\n\n<pre>\r\n# Start an interactive session\r\nqrsh -l nvidia_v100 bash\r\n\r\n# Note that you must NOT have any existing conda environments active.\r\n# If your command prompt looks something like:\r\n(<strong>base<\/strong>) [<em>username<\/em>@node800 [csf3] ~]$\r\n  #\r\n  # Any (<em>name<\/em>) here is the name of the active conda env.\r\n<\/pre>\n\n\nthen you need to deactivate the <code>base<\/code> conda env (or whatever name is showing) using the following command:\n\n\n<pre>source deactivate\r\n\r\n# Now the prompt shows no conda env\r\n[<em>username<\/em>@node800 [csf3] ~]$\r\n<\/pre>\n\n\nThe following commands were run on a CSF compute node:\n\n\n<pre>\r\nmodule purge\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\n\r\n# Create a conda env with some basic packages needed to install other packages.\r\n# We're using anaconda3 v2022.10 which provides python 3.9.13 (use \"python --version\").\r\n# We use 'tensorflow' as the env name in this example (you can change this if you want).\r\npython --version\r\nconda create -n tensorflow python=3.9.13\r\nProceed ([y]\/n)? y                          # &lt;--- Press y [return] to proceed\r\n\r\n# Now activate the env so we can install other packages in to the env\r\nsource activate tensorflow\r\n\r\n# Now install CUDA. The tensorflow website tells you which version is required.\r\n# If you install the wrong version, tensorflow will complain it can't find a GPU.\r\nconda install -c conda-forge cudatoolkit=11.8.0\r\nProceed ([y]\/n)? y                          # &lt;--- Press y [return] to proceed\r\n\r\n# now install tensorflow using pip. We use a slightly different command to that\r\n# shown on the tensorflow website to ensure the packages are installed inside our\r\n# conda env:\r\npip install --isolated nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*\r\n\r\n# Extra steps to fix bug in tensorflow 2.11 and 2.12. 
Hopefully not needed in TF 2.13!!\r\nconda install -c nvidia cuda-nvcc --yes\r\nmkdir -p $CONDA_PREFIX\/lib\/nvvm\/libdevice\/\r\ncp -p $CONDA_PREFIX\/lib\/libdevice.10.bc $CONDA_PREFIX\/lib\/nvvm\/libdevice\/\r\n\r\n# Finally, deactivate the conda env on the login node\r\nsource deactivate\r\n\r\n# End out interactive session and return to the login node\r\nexit\r\n<\/pre>\n\n\nWe can now submit a test job. There are a couple of extra lines needed in the jobscript to set up the environment so that python can find your local installation of tensorflow. Create a jobscript as follows:\n\n\n<pre>#!\/bin\/bash --login\r\n#$ -pe smp.pe 8          # Use 8 CPU cores per GPU\r\n#$ -l v100=1             # Use one v100 GPU\r\n\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\n# Activate the conda env. In a jobscript you must use \"source activate\" not \"conda activate\"\r\nsource activate tensorflow\r\n\r\n# Some extra setup needed by tensorflow to get the location of our conda env\r\nCUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))\r\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX\/lib:$CUDNN_PATH\/lib\r\n\r\n# Extra setting needed to fix TF 2.11 and 2.12. Hopefully not needed in TF 2.13!\r\nexport XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX\/lib\r\n\r\n# Now run some python code. This is just a simple TF test.\r\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\r\n\r\n# Can deactivate the env at the end of the job\r\nsource deactivate\r\n<\/pre>\n\n\nSubmit the jobs to the GPU nodes using\n\n\n<pre>\r\n# You should be back on the login node at this point. Submit the batch job:\r\nqsub <em>jobscript<\/em>\r\n<\/pre>\n\n\nwhere <code><em>jobscript<\/em><\/code> is the name of your jobscript. 
The output should be:\n\n\n<pre># Look in the .oNNNNNN file for the output\r\ncat <em>jobscript<\/em>.o<em>NNNNNN<\/em>\r\n[PhysicalDevice(name='\/physical_device:GPU:0', device_type='GPU')]\r\n<\/pre>\n\n\nThis shows that tensorflow ran and found a GPU. You should also be able to use the <a href=\"#tfkexample\">Tensorflow Keras example<\/a> from earlier.\n--><\/p>\n<h3>Example 2 &#8211; Installing PyPaz<\/h3>\n<p>The following example creates a conda env in which to install a package named <a href=\"https:\/\/github.com\/oarriaga\/paz\">PyPAZ<\/a>, an image processing package that uses tensorflow. It shows how to use pip to install the package within a conda env.<\/p>\n<pre class='slurm'>\r\n# The following commands were run on the CSF login node\r\nsrun -p gpuV -G 1 -n 1 -t 0-1 --pty bash\r\n\r\n# Wait for your interactive session to be scheduled....\r\n\r\n# Now on the GPU node:\r\n\r\n# The following was the most recent anaconda tested working with pypaz\r\nmodule purge\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\n\r\n# Create a conda env that will use the same version of python as the central anaconda install.\r\n# Otherwise conda will install the very latest version of python, which takes longer.\r\npython --version\r\n  # Python 3.9.13\r\nconda create -n paz python=3.9.13\r\n\r\n# Activate the env\r\nsource activate paz\r\n\r\n# Now install CUDA. The tensorflow website tells you which version is required.\r\n# If you install the wrong version, tensorflow will complain it can't find a GPU.\r\nconda install -c conda-forge cudatoolkit=11.8.0\r\nProceed ([y]\/n)? y                          # &lt;--- Press y [return] to proceed\r\n\r\n# now install tensorflow using pip. 
We use a slightly different command to that\r\n# shown on the tensorflow website to ensure the packages are installed inside our\r\n# conda env:\r\npip install --isolated nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*\r\n\r\n# Install the paz package, telling pip to ignore any local config.\r\n# This ensures the packages are installed inside the current conda env.\r\npip install --isolated --log ~\/pypaz.log pypaz\r\n\r\n# Extra steps to fix bug in tensorflow 2.11 and 2.12. Hopefully not needed in TF 2.13!!\r\nconda install -c nvidia cuda-nvcc --yes\r\nmkdir -p $CONDA_PREFIX\/lib\/nvvm\/libdevice\/\r\ncp -p $CONDA_PREFIX\/lib\/libdevice.10.bc $CONDA_PREFIX\/lib\/nvvm\/libdevice\/\r\n\r\n# Check what has been installed\r\nconda list\r\n  #\r\n  # Note that tensorflow has been installed, so you do not need\r\n  # to use our central installation on the CSF - you now have your\r\n  # own version in the 'paz' conda env\r\n\r\n# Let's deactivate the env now we've installed it. We recommend only activating\r\n# conda environments when you want to install packages in them or when running\r\n# jobs for that project.\r\nsource deactivate\r\n\r\n\r\n# We'll now use the GPU node to test the installation\r\n# We recommend you terminate your previous session and start a new one:\r\nexit\r\nsrun -p gpuV -G 1 -n 1 -t 0-1 --pty bash\r\n\r\n# Load the modulefile needed to use anaconda (also do this if you submit a jobscript for paz!)\r\nmodule purge\r\nmodule load apps\/binapps\/anaconda3\/2022.10\r\n\r\n# Activate the conda env (must use \"source activate\" not \"conda activate\" in a jobscript)\r\nsource activate paz\r\n\r\n# Some extra setup needed by tensorflow to get the location of our conda env\r\nCUDNN_PATH=$(dirname $(python -c \"import nvidia.cudnn;print(nvidia.cudnn.__file__)\"))\r\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX\/lib:$CUDNN_PATH\/lib\r\n\r\n# Extra setting needed to fix TF 2.11 and 2.12\r\nexport 
XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX\/lib\r\n\r\n# Start python (we'll type python code directly at the prompt)\r\npython\r\n\r\n  # Now using the python commands shown on the PAZ GitHub page:\r\n  from paz.applications import SSD512COCO\r\n  detect = SSD512COCO()\r\n  exit()\r\n\r\n# Return to the login node\r\nexit\r\n<\/pre>\n<h2>Further info<\/h2>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow website<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs. See the modulefiles below for available versions. As of Tensorflow version 2.4, Keras is now packaged within the Tensorflow install as tensorflow.keras. The recommended method.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/tensorflow\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":86,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-457","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/457","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=457"}],"version-history":[{"count":23,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/457\/revisions"}],"predecessor-version":[{"id":10375,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/457\/revisions\/10375"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/86"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}