{"id":3211,"date":"2019-04-18T19:29:34","date_gmt":"2019-04-18T18:29:34","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=3211"},"modified":"2019-04-23T12:36:28","modified_gmt":"2019-04-23T11:36:28","slug":"chainer","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/chainer\/","title":{"rendered":"Chainer"},"content":{"rendered":"<h2>Overview<\/h2>\n<p><a href=\"https:\/\/chainer.org\/\">Chainer<\/a> is a deep-learning framework capable of running on GPUs and CPUs. It supports various network architectures, including feed-forward nets, convnets, recurrent nets and recursive nets. It also supports per-batch architectures. <\/p>\n<p>Versions 5.0.0 and 5.4.0, both using Python 3.6 and CUDA 9.2 (with the Nvidia cuDNN and NCCL libraries), are installed on the CSF.<\/p>\n<p><strong>You need to request to be added to the relevant group for access to <a href=\"\/csf3\/batch\/gpu-jobs\/\">GPUs<\/a> before you can run Chainer on them.<\/strong><\/p>\n<h2>Restrictions on use<\/h2>\n<p>There are no restrictions on accessing this software on the CSF. All use must adhere to the <a href=\"https:\/\/docs.chainer.org\/en\/stable\/license.html\">Chainer License<\/a>.<\/p>\n<h2>Set up procedure<\/h2>\n<p>We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job <abbr title=\"add '#$ -V' to your jobscript\">inherit these settings<\/abbr>.<\/p>\n<p>Load one of the following modulefiles:<\/p>\n<pre>\r\nmodule load apps\/binapps\/chainer\/5.4.0\r\nmodule load apps\/binapps\/chainer\/5.0.0\r\n<\/pre>\n<p>The above modulefiles will load the necessary Anaconda Python and CUDA modulefiles for you. You may still run Chainer on CPUs only; the CUDA modulefile does not force use of a GPU.
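<\/p>
<p>The MNIST examples below select a device with the -g flag, where a negative GPU ID means run on the CPU. As a minimal, hypothetical sketch (the variable names are our own, and the CuPy GPU library may or may not be importable in your environment), a script can fall back to the CPU like this:<\/p>
<pre>
# Hypothetical sketch: pick a GPU ID, falling back to the CPU (ID -1)
# when the CuPy GPU library cannot be imported.
try:
    import cupy  # GPU array library used by Chainer
    gpu_id = 0   # always use 0: the first GPU assigned to the job
except ImportError:
    gpu_id = -1  # a negative ID selects the CPU
print('Using GPU' if gpu_id >= 0 else 'Using CPU')
<\/pre>
<p>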
<\/p>\n<h2>Checking the App Capabilities<\/h2>\n<p>To see what has been compiled into the application, run the following commands on the login node:<\/p>\n<pre>\r\n# Choose your required version\r\nmodule load apps\/binapps\/chainer\/5.4.0\r\n\r\n# Check the CPU version (ignore any FutureWarning about conversion)\r\nqrsh -l short -V 'python -c \"import chainer; chainer.print_runtime_info();\"'\r\n\r\n   Chainer: 5.4.0\r\n   NumPy: 1.16.2\r\n   CuPy: Not Available\r\n   iDeep: Not Available\r\n\r\n# Check the GPU version (if you have GPU access)\r\nqrsh -l v100 -V 'python -c \"import chainer; chainer.print_runtime_info();\"'\r\n\r\n   Chainer: 5.4.0\r\n   NumPy: 1.16.2\r\n   CuPy:\r\n     CuPy Version          : 5.4.0\r\n     CUDA Root             : \/opt\/apps\/libs\/nvidia-cuda\/toolkit\/9.2.148\r\n     CUDA Build Version    : 9020\r\n     CUDA Driver Version   : 9020\r\n     CUDA Runtime Version  : 9020\r\n     cuDNN Build Version   : 7402\r\n     cuDNN Version         : 7402\r\n     NCCL Build Version    : 2402\r\n     NCCL Runtime Version  : 2402\r\n   iDeep: Not Available\r\n<\/pre>\n<h2>Running the application<\/h2>\n<p>Please do not run Chainer on the login node. Jobs should be submitted to the compute nodes via the batch system. In these notes we use the Chainer MNIST example code to demonstrate how to run the application. For the source code of that application, please see the files in the directory:<\/p>\n<pre>\r\n$CHAINER_HOME\/examples\/mnist\/\r\n<\/pre>\n<h3>Serial CPU (not GPU) batch job submission<\/h3>\n<p>Create a batch submission script (which will load the modulefile in the jobscript), for example:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#$ -cwd             # Job will run from the current directory\r\n                    # NO -V line - we load modulefiles in the jobscript\r\n\r\n# Load the modulefile for the version you require!\r\nmodule load apps\/binapps\/chainer\/5.4.0\r\n\r\n# Only use the requested number of CPU cores. 
For serial jobs $NSLOTS is set to 1.\r\nexport OMP_NUM_THREADS=$NSLOTS\r\n\r\n# Run your Chainer code in Python\r\npython my_chainer_code.py\r\n\r\n# Example: To run the MNIST code on a CPU (use a -ve GPU ID)\r\n$CHAINER_HOME\/examples\/mnist\/train_mnist.py -g -1\r\n<\/pre>\n<p>Submit the jobscript using: <\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h3>Parallel CPU (not GPU) batch job submission<\/h3>\n<p>Create a batch submission script (which will load the modulefile in the jobscript), for example:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#$ -cwd             # Job will run from the current directory\r\n#$ -pe smp.pe 16    # Number of cores (can be 2 -- 32)\r\n\r\n# Load the modulefile for the version you require!\r\nmodule load apps\/binapps\/chainer\/5.4.0\r\n\r\n# Only use the requested number of CPU cores. $NSLOTS is set to the number above.\r\nexport OMP_NUM_THREADS=$NSLOTS\r\n\r\n# Run your Chainer code in Python\r\npython my_chainer_code.py\r\n\r\n# Example: To run the MNIST code on a CPU (use a -ve GPU ID)\r\n$CHAINER_HOME\/examples\/mnist\/train_mnist.py -g -1\r\n<\/pre>\n<p>Submit the jobscript using: <\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h3>Single GPU batch job submission<\/h3>\n<p>Create a batch submission script (which will load the modulefile in the jobscript), for example:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#$ -cwd             # Job will run from the current directory\r\n#$ -l v100=1        # Single Nvidia GPU\r\n#$ -pe smp.pe 8     # Can request (2--8) CPUs for each GPU we use\r\n\r\n# Load the modulefile for the version you require!\r\nmodule load apps\/binapps\/chainer\/5.4.0\r\n\r\n# Only use the requested number of CPU cores. 
$NSLOTS is set to the number requested above.\r\nexport OMP_NUM_THREADS=$NSLOTS\r\n\r\n# Run your Chainer code in Python\r\npython my_chainer_code.py\r\n\r\n# Example: To run the MNIST code on a GPU (use GPU ID 0)\r\n$CHAINER_HOME\/examples\/mnist\/train_mnist.py -g 0\r\n                                               #\r\n                                               # Use the GPU assigned to our job.\r\n                                               # The physical ID may be higher than 0\r\n                                               # but you should always use 0 to select\r\n                                               # the <em>first<\/em> GPU assigned to our job.\r\n<\/pre>\n<p>Submit the jobscript using: <\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h3>Multi GPU batch job submission<\/h3>\n<p>Create a batch submission script (which will load the modulefile in the jobscript), for example:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#$ -cwd             # Job will run from the current directory\r\n#$ -l v100=2        # Multiple Nvidia GPUs (not all users have access to more than 1 GPU)\r\n#$ -pe smp.pe 16    # Can request (2--8) CPUs for each GPU we use\r\n\r\n# Load the modulefile for the version you require!\r\nmodule load apps\/binapps\/chainer\/5.4.0\r\n\r\n# Only use the requested number of CPU cores. $NSLOTS is set to the number above.\r\nexport OMP_NUM_THREADS=$NSLOTS\r\n\r\n# Run your Chainer code in Python\r\npython my_chainer_code.py\r\n\r\n# Example: run the MNIST code on two GPUs\r\n$CHAINER_HOME\/examples\/mnist\/train_mnist_model_parallel.py -g 0 -G 1\r\n                                                              #    #\r\n                                                              #    #\r\n                                                              #    # ID of second GPU in this\r\n                                                              #    # example. 
Always use 1 to\r\n                                                              #    # mean the second GPU\r\n                                                              #    # assigned to our job.\r\n                                                              # \r\n                                                              # Use the GPU assigned to our job.\r\n                                                              # The physical ID may be higher\r\n                                                              # than 0 but you should always\r\n                                                              # use 0 to select the <em>first<\/em> GPU\r\n                                                              # assigned to our job.\r\n<\/pre>\n<p>Submit the jobscript using: <\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h2>Further info<\/h2>\n<ul>\n<li><a href=\"https:\/\/chainer.org\/\">Chainer website<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Chainer is a deep-learning framework capable of running on GPUs and CPUs. It supports various network architectures including feed-forward nets, convnets, recurrent nets and recursive nets. It also supports per-batch architectures. Versions 5.0.0 and 5.4.0 both using Python 3.6 and CUDA 9.2 (with Nvidia cuDNN and NCCL libraries) are installed on the CSF. You need to request being added to the relevant group to access GPUs before you can run Chainer on them. Restrictions.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/chainer\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":86,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-3211","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/3211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=3211"}],"version-history":[{"count":14,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/3211\/revisions"}],"predecessor-version":[{"id":3227,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/3211\/revisions\/3227"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/86"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=3211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}