The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
TensorFlow
Overview
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs.
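As a minimal illustration of this model (a sketch using the TensorFlow 1.x API installed on the CSF; the worked examples later on this page follow the same pattern):

import tensorflow as tf

# Each operation is a node in the graph; the tensors flowing
# between the operations are the edges.
x = tf.constant(2.0, name='x')
y = tf.constant(3.0, name='y')
z = tf.add(x, y, name='z')

# Nothing is computed until the graph is run inside a session.
with tf.Session() as sess:
    print(sess.run(z))   # prints 5.0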
The following versions have been installed on the CSF:
- 1.10.1 and 1.11.0 (using Python 3.6) for CPUs and GPUs
- 1.8.0 (using Python 3.5) for GPUs
- 0.11.0, 0.12.1, 1.0.0, 1.2.1 and 1.8.0 (using Python 3.5) for CPUs
- 0.10.0 (using Python 3.4) for CPUs
- 0.8.0, 0.9.0rc0, 1.0.0 and 1.2.1 (using Python 2.7) for CPUs
Restrictions on use
There are no access restrictions on the CSF.
Set up procedure
Note that as of TensorFlow 1.10.1 it is not possible to run on a Westmere CPU because the TensorFlow libraries require support for the Intel AVX instruction set. This means you must run on Sandybridge, Ivybridge, Haswell or Broadwell CPUs. So if you are submitting a batch CPU job, please add one of the following flags to your jobscript:
# Use only one of the following flags
-l sandybridge
-l ivybridge
-l haswell
-l broadwell
The CSF Nvidia K20 GPU nodes are Sandybridge nodes so you should NOT add any of these flags if using a GPU node.
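For example, a minimal sketch of a CPU jobscript using one of these flags (the script name and the choice of haswell are illustrative):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd              # Run job from directory where submitted
#$ -V                # Inherit environment (modulefile) settings
#$ -l haswell        # Restrict the job to Haswell CPUs (which support AVX)

python my-script.py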
It is not currently possible to run interactive CPU jobs using version 1.10.1 or later because there are no Sandybridge (or better) interactive nodes in the CSF.
To access the software you must first load one of the following modulefiles:
# Python 3.6 for GPUs (uses CUDA 9.0.176, cuDNN 7.3.0, Anaconda3 5.2.0, not Westmere CPUs):
apps/gcc/tensorflow/1.11.0-py36-gpu
apps/gcc/tensorflow/1.10.1-py36-gpu

# Python 3.5 for GPUs (uses CUDA 9.0.176, cuDNN 7.0.3, Anaconda3 4.2.0):
apps/gcc/tensorflow/1.8.0-py35-gpu

# Python 3.6 for CPUs (uses Anaconda3 5.2.0, not Westmere CPUs):
apps/gcc/tensorflow/1.11.0-py36-cpu
apps/gcc/tensorflow/1.10.1-py36-cpu

# Python 3.5 for CPUs (uses Anaconda3 4.2.0):
apps/gcc/tensorflow/1.8.0-py35-cpu
apps/gcc/tensorflow/1.2.1-py35-cpu
apps/gcc/tensorflow/1.0.0-py35-cpu
apps/gcc/tensorflow/0.12.1-py35-cpu
apps/gcc/tensorflow/0.11.0-py35-cpu

# Python 3.4 for CPUs (new versions not being installed unless requested):
apps/gcc/tensorflow/0.10.0-py34-cpu
apps/gcc/tensorflow/0.9.0rc0-py34-cpu
apps/gcc/tensorflow/0.8.0-py34-cpu

# Python 2.7 for CPUs:
apps/gcc/tensorflow/1.2.1-py27-cpu   # New
apps/gcc/tensorflow/1.0.0-py27-cpu
apps/gcc/tensorflow/0.9.0rc0-py27-cpu
apps/gcc/tensorflow/0.8.0-py27-cpu
The above modulefiles will load the following modulefiles automatically:
- One of the following Anaconda python modulefiles:
  apps/binapps/anaconda/3/4.2.0 (python 3.5.2)
  apps/binapps/anaconda/3/2.3.0 (python 3.4.3)
  apps/binapps/anaconda/2.5.0 (python 2.7.11)
- compilers/gcc/4.8.2 (C++11 compatible compiler)
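For example (an illustrative session), after loading one of the TensorFlow modulefiles, module list should show these dependencies alongside it:

module load apps/gcc/tensorflow/1.8.0-py35-cpu
module list     # the anaconda and gcc modulefiles above should appear in the list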
Running the application
Please do not run TensorFlow on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.
The following instructions describe interactive use on a backend node and batch jobs from the login node.
Technical Note (you are not required to do anything – this is for information only)
- We use a modified python executable (a shell script) named python to start the usual Anaconda python interpreter. It actually runs the following:
  LD_PRELOAD=/usr/lib64/librt.so:$TFDIR/fixes/stubs/mylibc.so:$GCCDIR/lib64/libstdc++.so.6 python
- The LD_PRELOAD is needed to load a few libraries that replace system libraries. The pre-compiled TensorFlow installation supplied by Google requires a newer version of GLIBC than is available on the CSF. We have modified the TensorFlow library _pywrap_tensorflow.so to be less strict about the version of GLIBC present, and we then supply some functions required by TensorFlow that are missing from our older GLIBC library.
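If you are curious, you can confirm that this wrapper, rather than the plain Anaconda interpreter, is the python found first on your PATH after loading a modulefile (the exact path reported depends on the installation):

module load apps/gcc/tensorflow/1.8.0-py35-cpu
which python    # expect a path inside the TensorFlow installation area, not the Anaconda one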
Interactive use on a Backend GPU Node
June 2018: Currently only a couple of Nvidia K20 GPUs are available. To request access to these nodes please email its-ri-team@manchester.ac.uk.
Once you have been granted access to the Nvidia K20 nodes, start an interactive session as follows:
qrsh -l inter -l nvidia_k20

# Wait until you are logged in to a backend compute node, then:
module load apps/gcc/tensorflow/1.8.0-py35-gpu
python
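Before running a full example you may wish to check that TensorFlow can see the GPU from within python. A quick test (illustrative, using the standard TensorFlow 1.x device_lib utility):

from tensorflow.python.client import device_lib

# List the devices TensorFlow has detected; the K20 should appear
# as a device of type 'GPU' alongside the CPU.
print(device_lib.list_local_devices())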
An example TensorFlow GPU script is as follows:
# Start python (if you have not already done so above) then enter the commands
python

# Now enter the following python commands:

# Load the tensorflow library (using a short name for convenience)
import tensorflow as tf
# You should see:
#   successfully opened CUDA library libcudnn.so locally
#   (and other GPU details)...

# Create a graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Turn on device placement reporting so we can see where the graph runs
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# You should see:
#   Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4332 MB memory) -> physical GPU (device: 0, name: Tesla K20m, pci bus id: 0000:03:00.0, compute capability: 3.5)
#   Device mapping:
#   /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:03:00.0, compute capability: 3.5

# Run the graph. It will report the GPU used to do so.
sess.run(c)
# You should see:
#   MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
#   b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
#   a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
#
#   array([[ 22.,  28.],
#          [ 49.,  64.]], dtype=float32)

# Exit the python shell
Ctrl-D
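If you want to control placement explicitly rather than relying on the defaults shown above, the TensorFlow 1.x tf.device context manager can pin operations to a device. A minimal sketch:

import tensorflow as tf

# Pin the graph construction to the first GPU explicitly.
with tf.device('/device:GPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name='a')
    b = tf.constant([5.0, 6.0, 7.0, 8.0], shape=[2, 2], name='b')
    c = tf.matmul(a, b)

# allow_soft_placement lets TensorFlow fall back to the CPU if the
# requested device is unavailable.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
print(sess.run(c))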
Interactive use on a Backend CPU-only Node
To request an interactive session on a backend compute node run:
qrsh -l inter -l short

# Wait until you are logged in to a backend compute node, then:
module load apps/gcc/tensorflow/1.2.1-py35-cpu
python
An example TensorFlow session is given below.
If there are no free interactive resources the qrsh command will ask you to try again later. Please do not run TensorFlow on the login node. Any jobs running there will be killed without warning.
Single CPU Example
A simple TensorFlow test is as follows:
# Assuming you are at the CSF login node:

# 1. Log in to a backend node
qrsh -l inter -l short

# 2. Load the modulefile on the backend node
module load apps/gcc/tensorflow/1.2.1-py35-cpu

# 3. Start python then enter the commands
python

# Enter the following program
import tensorflow as tf

# Create a graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Create the TensorFlow session and restrict threads to the number of cores we can use.
# An interactive 'qrsh' session can only use one core.
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
                                        intra_op_parallelism_threads=1))

# Run the graph. It will report the CPU used to do so.
sess.run(c)

# You should see the following reported:
#   MatMul: /job:localhost/replica:0/task:0/cpu:0
#   b: /job:localhost/replica:0/task:0/cpu:0
#   a: /job:localhost/replica:0/task:0/cpu:0
#
#   array([[ 22.,  28.],
#          [ 49.,  64.]], dtype=float32)

# Exit the python shell
Ctrl-D
Serial batch job submission
Ensure you have loaded the correct modulefile on the login node. Create a python script (e.g., my-script.py) as follows. It will detect how many cores it can use:
import tensorflow as tf
import os

# Get number of cores reserved by the batch system
# ($NSLOTS is automatically set, or use 1 if not)
NUMCORES = int(os.getenv("NSLOTS", 1))
print("Using", NUMCORES, "core(s)")

# Create TF session using the correct number of cores
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=NUMCORES,
                                        intra_op_parallelism_threads=NUMCORES))

# Now create a TF graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Run the graph and print the result
print(sess.run(c))
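If you want to try the script outside the batch system first (for example in a qrsh session, where you only have one core), you can set NSLOTS by hand; an illustrative one-liner:

NSLOTS=1 python my-script.py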
Now create a jobscript similar to the following:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd              # Run job from directory where submitted
#$ -V                # Inherit environment (modulefile) settings

# $NSLOTS is automatically set to 1. The python script uses this (see above).
python my-script.py
Submit your jobscript using qsub jobscript, where jobscript is the name of your jobscript.
Parallel batch job submission
Ensure you have loaded the correct modulefile and then create a jobscript similar to the following:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd              # Run job from directory where submitted
#$ -V                # Inherit environment (modulefile) settings
#$ -pe smp.pe 16     # Number of cores on a single compute node. Can be 2-24.

# $NSLOTS is automatically set to the number of cores requested on the pe line
# and can be read by your python code (see example above).
python my-script.py
The above my-script.py example will get the number of cores to use from the $NSLOTS environment variable.
Submit your jobscript using qsub jobscript, where jobscript is the name of your jobscript.
Further info
Updates
None.