The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
TensorFlow
Overview
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs.
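As a minimal illustration of this model (a sketch using the TensorFlow 1.x API installed on the CSF; the worked examples later on this page follow the same pattern):

import tensorflow as tf

# Each operation is a node in the graph; the tensors flowing
# between the operations are the edges.
x = tf.constant(2.0, name='x')
y = tf.constant(3.0, name='y')
z = tf.add(x, y, name='z')

# Nothing is computed until the graph is run inside a session.
with tf.Session() as sess:
    print(sess.run(z))   # prints 5.0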
The following versions have been installed on the CSF:
- 1.10.1 and 1.11.0 (using Python 3.6) for CPUs and GPUs
- 1.8.0 (using Python 3.5) for GPUs
- 0.11.0, 0.12.1, 1.0.0, 1.2.1 and 1.8.0 (using Python 3.5) for CPUs
- 0.10.0 (using Python 3.4) for CPUs
- 0.8.0, 0.9.0rc0, 1.0.0 and 1.2.1 (using Python 2.7) for CPUs
Restrictions on use
There are no access restrictions on the CSF.
Set up procedure
Note that as of TensorFlow 1.10.1 it is not possible to run on a Westmere CPU because the TensorFlow libraries require support for the Intel AVX instruction set. This means you must run on Sandybridge, Ivybridge, Haswell or Broadwell CPUs. So if you are submitting a batch CPU job, please add one of the following flags to your jobscript:
# Use only one of the following flags
-l sandybridge
-l ivybridge
-l haswell
-l broadwell
The CSF Nvidia K20 GPU nodes are Sandybridge nodes so you should NOT add any of these flags if using a GPU node.
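For example, a minimal sketch of a CPU jobscript using one of these flags (the script name and the choice of haswell are illustrative):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd              # Run job from directory where submitted
#$ -V                # Inherit environment (modulefile) settings
#$ -l haswell        # Restrict the job to Haswell CPUs (which support AVX)

python my-script.py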
It is not currently possible to run interactive CPU jobs using version 1.10.1 or later because there are no Sandybridge (or better) interactive nodes in the CSF.
To access the software you must first load one of the following modulefiles:
# Python 3.6 for GPUs (uses CUDA 9.0.176, cuDNN 7.3.0, Anaconda3 5.2.0, not Westmere CPUs):
apps/gcc/tensorflow/1.11.0-py36-gpu
apps/gcc/tensorflow/1.10.1-py36-gpu

# Python 3.5 for GPUs (uses CUDA 9.0.176, cuDNN 7.0.3, Anaconda3 4.2.0):
apps/gcc/tensorflow/1.8.0-py35-gpu

# Python 3.6 for CPUs (uses Anaconda3 5.2.0, not Westmere CPUs):
apps/gcc/tensorflow/1.11.0-py36-cpu
apps/gcc/tensorflow/1.10.1-py36-cpu

# Python 3.5 for CPUs (uses Anaconda3 4.2.0):
apps/gcc/tensorflow/1.8.0-py35-cpu
apps/gcc/tensorflow/1.2.1-py35-cpu
apps/gcc/tensorflow/1.0.0-py35-cpu
apps/gcc/tensorflow/0.12.1-py35-cpu
apps/gcc/tensorflow/0.11.0-py35-cpu

# Python 3.4 for CPUs (new versions not being installed unless requested):
apps/gcc/tensorflow/0.10.0-py34-cpu
apps/gcc/tensorflow/0.9.0rc0-py34-cpu
apps/gcc/tensorflow/0.8.0-py34-cpu

# Python 2.7 for CPUs:
apps/gcc/tensorflow/1.2.1-py27-cpu   # New
apps/gcc/tensorflow/1.0.0-py27-cpu
apps/gcc/tensorflow/0.9.0rc0-py27-cpu
apps/gcc/tensorflow/0.8.0-py27-cpu
The above modulefiles will load the following modulefiles automatically:
- One of the following Anaconda python modulefiles:
  apps/binapps/anaconda/3/4.2.0 (python 3.5.2)
  apps/binapps/anaconda/3/2.3.0 (python 3.4.3)
  apps/binapps/anaconda/2.5.0 (python 2.7.11)
- compilers/gcc/4.8.2 (C++11 compatible compiler)
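For example (an illustrative session), after loading one of the TensorFlow modulefiles, module list should show these dependencies alongside it:

module load apps/gcc/tensorflow/1.8.0-py35-cpu
module list     # the anaconda and gcc modulefiles above should appear in the list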
Running the application
Please do not run TensorFlow on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.
The following instructions describe interactive use on a backend node and batch jobs from the login node.
Technical Note (you are not required to do anything – this is for information only)
- We use a modified python executable (a shell script) named python to start the usual Anaconda python interpreter. It actually runs the following:
  LD_PRELOAD=/usr/lib64/librt.so:$TFDIR/fixes/stubs/mylibc.so:$GCCDIR/lib64/libstdc++.so.6 python
- The LD_PRELOAD is needed to load a few libraries that replace system libraries. The pre-compiled TensorFlow installation supplied by Google requires a newer version of GLIBC than is available on the CSF. We have modified the TensorFlow library _pywrap_tensorflow.so to be less strict about the version of GLIBC present, and we then supply some functions required by TensorFlow that are missing from our older GLIBC library.
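If you are curious, you can confirm that this wrapper, rather than the plain Anaconda interpreter, is the python found first on your PATH after loading a modulefile (the exact path reported depends on the installation):

module load apps/gcc/tensorflow/1.8.0-py35-cpu
which python    # expect a path inside the TensorFlow installation area, not the Anaconda one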
Interactive use on a Backend GPU Node
June 2018: Currently only a couple of Nvidia K20 GPUs are available. To request access to these nodes please email its-ri-team@manchester.ac.uk.
Once you have been granted access to the Nvidia K20 nodes, start an interactive session as follows:
qrsh -l inter -l nvidia_k20

# Wait until you are logged in to a backend compute node, then:
module load apps/gcc/tensorflow/1.8.0-py35-gpu
python
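Before running a full example you may wish to check that TensorFlow can see the GPU from within python. A quick test (illustrative, using the standard TensorFlow 1.x device_lib utility):

from tensorflow.python.client import device_lib

# List the devices TensorFlow has detected; the K20 should appear
# as a device of type 'GPU' alongside the CPU.
print(device_lib.list_local_devices())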
An example TensorFlow GPU script is as follows:
# Start python (if you have not already done so above) then enter the commands
python

# Now enter the following python commands:

# Load the tensorflow library (using a short name for convenience)
import tensorflow as tf
# You should see:
#   successfully opened CUDA library libcudnn.so locally
#   (and other GPU details)...

# Create a graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Turn on device placement reporting so we can see where the graph runs
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# You should see:
#   Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4332 MB memory) -> physical GPU (device: 0, name: Tesla K20m, pci bus id: 0000:03:00.0, compute capability: 3.5)
#   Device mapping:
#   /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:03:00.0, compute capability: 3.5

# Run the graph. It will report the GPU used to do so.
sess.run(c)
# You should see:
#   MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
#   b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
#   a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
#
#   array([[ 22.,  28.],
#          [ 49.,  64.]], dtype=float32)

# Exit the python shell
Ctrl-D
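If you want to control placement explicitly rather than relying on the defaults shown above, the TensorFlow 1.x tf.device context manager can pin operations to a device. A minimal sketch:

import tensorflow as tf

# Pin the graph construction to the first GPU explicitly.
with tf.device('/device:GPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name='a')
    b = tf.constant([5.0, 6.0, 7.0, 8.0], shape=[2, 2], name='b')
    c = tf.matmul(a, b)

# allow_soft_placement lets TensorFlow fall back to the CPU if the
# requested device is unavailable.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
print(sess.run(c))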
Interactive use on a Backend CPU-only Node
To request an interactive session on a backend compute node run:
qrsh -l inter -l short

# Wait until you are logged in to a backend compute node, then:
module load apps/gcc/tensorflow/1.2.1-py35-cpu
python
An example TensorFlow session is given below.
If there are no free interactive resources the qrsh command will ask you to try again later. Please do not run TensorFlow on the login node. Any jobs running there will be killed without warning.
Single CPU Example
A simple TensorFlow test is as follows:
# Assuming you are at the CSF login node:

# 1. Log in to a backend node
qrsh -l inter -l short

# 2. Load the modulefile on the backend node
module load apps/gcc/tensorflow/1.2.1-py35-cpu

# 3. Start python then enter the commands
python

# Enter the following program
import tensorflow as tf

# Create a graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Create the TensorFlow session and restrict threads to the number of cores we can use.
# An interactive 'qrsh' session can only use one core.
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
                                        intra_op_parallelism_threads=1))

# Run the graph. It will report the CPU used to do so.
sess.run(c)

# You should see the following reported:
#   MatMul: /job:localhost/replica:0/task:0/cpu:0
#   b: /job:localhost/replica:0/task:0/cpu:0
#   a: /job:localhost/replica:0/task:0/cpu:0
#
#   array([[ 22.,  28.],
#          [ 49.,  64.]], dtype=float32)

# Exit the python shell
Ctrl-D
Serial batch job submission
Ensure you have loaded the correct modulefile on the login node. Create a python script (e.g., my-script.py) as follows. It will detect how many cores it can use:
import tensorflow as tf
import os

# Get number of cores reserved by the batch system
# ($NSLOTS is automatically set, or use 1 if not)
NUMCORES = int(os.getenv("NSLOTS", 1))
print("Using", NUMCORES, "core(s)")

# Create TF session using the correct number of cores
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=NUMCORES,
                                        intra_op_parallelism_threads=NUMCORES))

# Now create a TF graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Run the graph and print the result
print(sess.run(c))
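If you want to try the script outside the batch system first (for example in a qrsh session, where you only have one core), you can set NSLOTS by hand; an illustrative one-liner:

NSLOTS=1 python my-script.py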
Now create a jobscript similar to the following:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd              # Run job from directory where submitted
#$ -V                # Inherit environment (modulefile) settings

# $NSLOTS is automatically set to 1. The python script uses this (see above).
python my-script.py
Submit your jobscript using qsub jobscript, where jobscript is the name of your jobscript.
Parallel batch job submission
Ensure you have loaded the correct modulefile and then create a jobscript similar to the following:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd              # Run job from directory where submitted
#$ -V                # Inherit environment (modulefile) settings
#$ -pe smp.pe 16     # Number of cores on a single compute node. Can be 2-24.

# $NSLOTS is automatically set to the number of cores requested on the pe line
# and can be read by your python code (see example above).
python my-script.py
The above my-script.py example will get the number of cores to use from the $NSLOTS environment variable.
Submit your jobscript using qsub jobscript, where jobscript is the name of your jobscript.
Further info
Updates
None.