Chainer

Overview

Chainer is a deep-learning framework capable of running on GPUs and CPUs. It supports various network architectures including feed-forward nets, convnets, recurrent nets and recursive nets. Thanks to its define-by-run design it also supports per-batch architectures, where the network structure can change from one minibatch to the next.
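
For illustration, a minimal sketch of a small feed-forward network written in Chainer's define-by-run style (illustrative only, not the CSF example code):

import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    """A two-layer feed-forward network (multi-layer perceptron)."""
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            # Passing None lets Chainer infer the input size on first call
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_out)

    def __call__(self, x):
        h = F.relu(self.l1(x))
        return self.l2(h)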

Versions 5.0.0 and 5.4.0, both built for Python 3.6 and CUDA 9.2 (with the Nvidia cuDNN and NCCL libraries), are installed on the CSF.

Before you can run Chainer on GPUs you must request access by asking to be added to the relevant group.

Restrictions on use

There are no restrictions on accessing this software on the CSF. All use must adhere to the Chainer License.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

module load apps/binapps/chainer/5.4.0
module load apps/binapps/chainer/5.0.0

The above modulefiles will load the necessary Anaconda Python and CUDA modulefiles for you. You may still run Chainer on CPUs only; loading the CUDA modulefile does not force the use of a GPU.

Checking the App Capabilities

To see what has been compiled into the application, run the following commands on the login node:

# Choose your required version
module load apps/binapps/chainer/5.4.0

# Check the CPU version (ignore any "FutureWarning: Conversion ..." warning)
qrsh -l short -V 'python -c "import chainer; chainer.print_runtime_info();"'

   Chainer: 5.4.0
   NumPy: 1.16.2
   CuPy: Not Available
   iDeep: Not Available

# Check the GPU version (if you have GPU access)
qrsh -l v100 -V 'python -c "import chainer; chainer.print_runtime_info();"'

   Chainer: 5.4.0
   NumPy: 1.16.2
   CuPy:
     CuPy Version          : 5.4.0
     CUDA Root             : /opt/apps/libs/nvidia-cuda/toolkit/9.2.148
     CUDA Build Version    : 9020
     CUDA Driver Version   : 9020
     CUDA Runtime Version  : 9020
     cuDNN Build Version   : 7402
     cuDNN Version         : 7402
     NCCL Build Version    : 2402
     NCCL Runtime Version  : 2402
   iDeep: Not Available
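
A script can also choose its device at runtime, so the same code works in both CPU and GPU jobs. A minimal sketch (the variable names are our own):

import chainer

# chainer.backends.cuda reports whether the CuPy GPU backend is usable
if chainer.backends.cuda.available:
    device = 0    # first GPU assigned to the job (always logical ID 0)
else:
    device = -1   # a negative ID means run on the CPU
print('Using device:', device)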

Running the application

Please do not run Chainer on the login node. Jobs should be submitted to the compute nodes via the batch system. In these notes we use the Chainer MNIST example code to demonstrate how to run the application. For the source code of that application, please see the files in the directory:

$CHAINER_HOME/examples/mnist/

Serial CPU (not GPU) batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the modulefile for the version you require!
module load apps/binapps/chainer/5.4.0

# Only use the requested number of CPU cores. For serial jobs $NSLOTS is set to 1.
export OMP_NUM_THREADS=$NSLOTS

# Run your Chainer code in Python
python my_chainer_code.py

# Example: to run the MNIST code on a CPU (a negative GPU ID selects the CPU)
python $CHAINER_HOME/examples/mnist/train_mnist.py -g -1

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
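
The jobscript above refers to a hypothetical my_chainer_code.py. As an illustration only (not the CSF example code), a minimal CPU-only training script might look like the sketch below; note that get_mnist() downloads the dataset on first use.

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import iterators, optimizers, training
from chainer.training import extensions

# MNIST is downloaded to your home directory on first use
train, test = chainer.datasets.get_mnist()
train_iter = iterators.SerialIterator(train, batch_size=100)

# A small classifier built from a two-layer network
model = L.Classifier(chainer.Sequential(
    L.Linear(None, 1000), F.relu,
    L.Linear(None, 10)))

optimizer = optimizers.Adam()
optimizer.setup(model)

# device=-1 runs on the CPU, matching the -g -1 flag used above
updater = training.updaters.StandardUpdater(train_iter, optimizer, device=-1)
trainer = training.Trainer(updater, (5, 'epoch'), out='result')
trainer.extend(extensions.LogReport())
trainer.run()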

Parallel CPU (not GPU) batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 16    # Number of cores (can be 2 -- 32)

# Load the modulefile for the version you require!
module load apps/binapps/chainer/5.4.0

# Only use the requested number of CPU cores. $NSLOTS is set to the number above.
export OMP_NUM_THREADS=$NSLOTS

# Run your Chainer code in Python
python my_chainer_code.py

# Example: to run the MNIST code on a CPU (a negative GPU ID selects the CPU)
python $CHAINER_HOME/examples/mnist/train_mnist.py -g -1

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
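
Chainer itself does not read $NSLOTS, but your own script can, for example to size a multi-process data loader to the number of granted cores. A minimal sketch (our own variable names, illustrative only):

import os
import chainer
from chainer import iterators

# NSLOTS is set by the batch system to the number of cores requested
nslots = int(os.environ.get('NSLOTS', '1'))

train, test = chainer.datasets.get_mnist()

# Use one worker process per granted core to load and batch the data
train_iter = iterators.MultiprocessIterator(train, batch_size=100,
                                            n_processes=nslots)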

Single GPU batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Single Nvidia GPU
#$ -pe smp.pe 8     # Can request (2--8) CPUs for each GPU we use

# Load the modulefile for the version you require!
module load apps/binapps/chainer/5.4.0

# Only use the requested number of CPU cores. $NSLOTS is set to the number requested above.
export OMP_NUM_THREADS=$NSLOTS

# Run your Chainer code in Python
python my_chainer_code.py

# Example: to run the MNIST code on a GPU (use GPU ID 0).
# The physical ID of the GPU assigned to the job may be higher than 0, but
# you should always use 0 to select the first GPU assigned to your job.
python $CHAINER_HOME/examples/mnist/train_mnist.py -g 0

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
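
Within your own code, the job's GPU is selected by moving the model and its input data to logical device 0. A minimal sketch (illustrative only, not the CSF example code):

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

gpu_id = 0   # always 0: the first GPU assigned to our job

model = chainer.Sequential(L.Linear(None, 1000), F.relu,
                           L.Linear(None, 10))
model.to_gpu(gpu_id)   # copy the model parameters to the GPU

# Input arrays must be moved to the same device before the forward pass
x = chainer.backends.cuda.to_gpu(np.zeros((1, 784), dtype=np.float32), gpu_id)
y = model(x)
print(y.shape)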

Multi GPU batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=2        # Multiple Nvidia GPUs (not all users have access to more than 1 GPU)
#$ -pe smp.pe 16    # Can request (2--8) CPUs for each GPU we use

# Load the modulefile for the version you require!
module load apps/binapps/chainer/5.4.0

# Only use the requested number of CPU cores. $NSLOTS is set to the number requested above.
export OMP_NUM_THREADS=$NSLOTS

# Run your Chainer code in Python
python my_chainer_code.py

# Example: to run the model-parallel MNIST code on two GPUs. Always use
# 0 and 1 to mean the first and second GPUs assigned to your job, even if
# the physical IDs are higher.
python $CHAINER_HOME/examples/mnist/train_mnist_model_parallel.py -g 0 -G 1

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
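
The example above demonstrates model parallelism (the network is split across the two GPUs). If instead you want data parallelism, where each batch is split across the GPUs, Chainer provides ParallelUpdater. A minimal sketch (illustrative only):

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import iterators, optimizers, training

train, test = chainer.datasets.get_mnist()
train_iter = iterators.SerialIterator(train, batch_size=100)

model = L.Classifier(chainer.Sequential(
    L.Linear(None, 1000), F.relu,
    L.Linear(None, 10)))

optimizer = optimizers.Adam()
optimizer.setup(model)

# 'main' and 'second' are the two logical GPUs assigned to our job;
# ParallelUpdater copies the model to the second GPU and splits each batch
updater = training.updaters.ParallelUpdater(
    train_iter, optimizer, devices={'main': 0, 'second': 1})
trainer = training.Trainer(updater, (5, 'epoch'), out='result')
trainer.run()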

Further info

Chainer website: https://chainer.org
Chainer documentation: https://docs.chainer.org

Updates

None.
