{"id":108,"date":"2013-04-19T18:07:39","date_gmt":"2013-04-19T18:07:39","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/?page_id=108"},"modified":"2018-07-10T13:01:06","modified_gmt":"2018-07-10T13:01:06","slug":"nvidiagpu","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/nvidiagpu\/","title":{"rendered":"Nvidia GPUs and CUDA"},"content":{"rendered":"<h2>Overview &#8211; current GPGPUs<\/h2>\n<p>June 2017: There are a total of 3 Nvidia GPGPUs in production in the CSF.<\/p>\n<p>We hope to purchase some more GPUs in late 2017\/early 2018 &#8211; please get in touch (<a href=\"&#x6d;a&#x69;&#108;t&#x6f;&#58;&#x69;&#x74;s&#x2d;&#114;i&#x2d;&#116;&#x65;&#97;m&#x40;&#109;a&#x6e;&#99;&#x68;&#101;s&#x74;&#101;&#x72;&#x2e;a&#x63;&#46;u&#x6b;\">&#x69;&#x74;&#x73;&#x2d;&#x72;&#x69;&#45;&#116;&#101;&#97;m&#64;ma&#x6e;&#x63;&#x68;&#x65;&#x73;&#x74;&#101;&#114;&#46;&#97;&#99;&#46;uk<\/a>) if you would like to be involved in that procurement.<\/p>\n<ul>\n<li>Three K20s (two hosted in one 12-core compute node, one hosted in another 12-core compute node)<\/li>\n<\/ul>\n<h2>Retired GPGPUs<\/h2>\n<p>The Nvidia M2050s and M2070s have all been retired due to hardware faults.<\/p>\n<ul>\n<li><strike>Seven blade servers each hosting one Nvidia card, two of which are M2070 cards and five are M2050 cards.<\/strike><\/li>\n<li><strike>16 Nvidia M2050 GPUs, two hosted on each of eight Intel compute nodes.<\/strike><\/li>\n<li><strike>The eight M2050 hosts are connected by Infiniband, so are ideal for computational jobs based on <em>both<\/em> MPI and CUDA.<\/strike><\/li>\n<\/ul>\n<h2>Hardware and Software Versions<\/h2>\n<ul>\n<li>Driver: 384.81<\/li>\n<li>CUDA Driver 9.0 \/ Runtime 9.0<\/li>\n<li>CUDA toolkit 9.0.176 (earlier versions also available via modulefiles)<\/li>\n<li>CUDA Capability Major\/Minor version number: 3.5<\/li>\n<li>OpenCL Device 1.2 \/ OpenCL C 1.2<\/li>\n<\/ul>\n<h2>Restrictions on who 
can use these GPUs<\/h2>\n<p>Access to the GPGPUs is more restrictive than that for standard compute nodes.  <em>Please email <a href=\"mailto:&#x69;&#x74;&#x73;&#x2d;&#x72;&#x69;&#x2d;&#x74;&#x65;&#x61;&#x6d;&#x40;&#x6d;&#x61;&#x6e;&#x63;&#x68;&#x65;&#x73;&#x74;&#x65;&#x72;&#x2e;&#x61;&#x63;&#x2e;&#x75;&#x6b;\">&#105;&#x74;&#115;&#x2d;r&#105;&#x2d;&#116;&#x65;a&#x6d;&#64;&#109;&#x61;&#110;&#x63;h&#x65;&#x73;&#116;&#x65;r&#x2e;a&#99;&#x2e;&#117;&#x6b;<\/a> before attempting to use these resources<\/em> with brief details of what you wish to use them for.<\/p>\n<p>The K20s are usually only accessible by a specific group from MACE.<br \/>\n<a name=\"module\"><\/a><\/p>\n<h2>Set up procedure<\/h2>\n<p>Once you have emailed <a href=\"&#x6d;&#x61;&#105;&#108;to&#x3a;&#x69;&#x74;&#115;-r&#x69;&#x2d;&#x74;&#101;&#97;m&#x40;&#x6d;&#x61;&#110;&#99;h&#x65;&#x73;&#x74;&#101;&#114;&#46;a&#x63;&#x2e;&#x75;&#107;\">&#105;t&#115;-&#114;i&#x2d;t&#x65;a&#x6d;&#64;&#x6d;a&#x6e;c&#x68;e&#x73;t&#x65;r&#x2e;a&#x63;&#46;&#x75;&#107;<\/a> and been granted access, set up your environment by loading the appropriate module from the following:<\/p>\n<pre>\r\n# Load <strong>one<\/strong> of the following modulefiles:\r\nmodule load libs\/cuda\/9.0.176\r\nmodule load libs\/cuda\/8.0.44\r\nmodule load libs\/cuda\/7.5.18\r\nmodule load libs\/cuda\/6.5.14\r\n\r\n# These are very old versions\r\nmodule load libs\/cuda\/5.5.22\r\nmodule load libs\/cuda\/4.2.9\r\nmodule load libs\/cuda\/4.1.28\r\nmodule load libs\/cuda\/4.0.17\r\nmodule load libs\/cuda\/3.2.16\r\n<\/pre>\n<h2>Other Libraries<\/h2>\n<p>The Nvidia cuDNN libraries are also available via the following modulefiles. 
Before you load these modulefiles you <em>must<\/em> load one of the cuda modulefiles from above &#8211; the list below indicates which versions of cuda can be used with the different cuDNN versions:<\/p>\n<pre>\r\nmodule load libs\/cuDNN\/7.0.3         # Load cuda 8.0.44 or 9.0.176 first\r\nmodule load libs\/cuDNN\/6.0.21        # Load cuda 7.5.18 or 8.0.44 first\r\nmodule load libs\/cuDNN\/5.1.5         # Load cuda 7.5.18 or 8.0.44 first\r\n<\/pre>\n<h2>Compiling GPU Code<\/h2>\n<p>The following sections describe how to compile CUDA and OpenCL code on CSF.<\/p>\n<h3>CUDA<\/h3>\n<p>CUDA code can be <strong>compiled<\/strong> on the login node provided you are using the CUDA runtime library, and not the CUDA driver library. The runtime library is used when you allow CUDA to automatically set up the device. That is, your CUDA code uses the style where you assume CUDA will be set up on the first CUDA function call. For example:<\/p>\n<pre>\r\n#include &lt;cuda_runtime.h&gt;\r\n\r\nint main( void ) {\r\n\r\n   \/\/ We assume CUDA will set up the GPU device automatically\r\n   cudaMalloc( ... );\r\n   cudaMemcpy( ... );\r\n   myKernel<<<...>>>( ... );\r\n   cudaMemcpy( ... );\r\n   cudaFree( ... );\r\n   return 0;\r\n}\r\n<\/pre>\n<p>The CUDA driver library allows much more low-level control of the GPU device (and makes CUDA set up more like OpenCL). In that case you must compile on a <strong>GPU node<\/strong> because the CUDA driver library is only available on the backend GPU nodes. Driver code will contain something like the following:<\/p>\n<pre>\r\n#include &lt;cuda.h&gt;\r\n\r\nint main( void ) {\r\n\r\n  \/\/ Low-level device setup using the driver API\r\n  cuDeviceGetCount( ... );\r\n  cuDeviceGet( ... );\r\n  cuDeviceGetName( ... );\r\n  cuDeviceComputeCapability( ... 
);\r\n  ...\r\n\r\n  return 0;\r\n}\r\n<\/pre>\n<p>No matter where you compile your code, you <strong>cannot run<\/strong> your CUDA code on the login node because it does not contain any GPUs (see the next section for running your code). <\/p>\n<p>The CUDA libraries and header files are available in the following directories once you have loaded the CUDA module:<\/p>\n<pre>\r\n# All nodes\r\n$CUDA_HOME\/lib64     # CUDA runtime library, CUBlas, CURand etc\r\n$CUDA_HOME\/include\r\n\r\n# On a GPU node only\r\n\/usr\/lib64           # CUDA driver library\r\n<\/pre>\n<p>It is beyond the scope of this page to give a tutorial on CUDA compilation (there are many possible flags for the nvcc compiler). The CUDA GPU Programming SDK available on CSF in <code>$CUDA_SDK<\/code> gives many examples of CUDA programs and how to compile them. However, a simple compile line to run on the command line would be as follows<\/p>\n<pre>nvcc -o myapp myapp.cu -I$CUDA_HOME\/include -L$CUDA_HOME\/lib64 -lcudart<\/pre>\n<p>To use the above line in a Makefile, enclose the variable names in brackets as follows<\/p>\n<pre>\r\n# Simple CUDA Makefile\r\nCC = nvcc\r\n\r\nall: myapp\r\n\r\nmyapp: myapp.cu\r\n        $(CC) -o myapp myapp.cu -I$(CUDA_HOME)\/include -L$(CUDA_HOME)\/lib64 -lcudart\r\n# note: the preceding line must start with a TAB, not 8 spaces. 'make' requires a TAB!\r\n<\/pre>\n<p>The above two compilation methods use the CUDA runtime library (libcudart) and so can be used to compile on the login node.<\/p>\n<h3>OpenCL<\/h3>\n<p>Please see <a href=\"\/csf-apps\/software\/applications\/opencl\">OpenCL programming on CSF<\/a> for compiling OpenCL code.<\/p>\n<h2>Running the application<\/h2>\n<p>All work on the Nvidia GPUs must be via the batch system.  There are two types of environments which can be used.  First, batch, for non-interactive computational work;  this should be used where possible.  
Secondly, an interactive environment for debugging and other necessarily-interactive work.<\/p>\n<h2>Resource Limits<\/h2>\n<h3>K20 GPUs<\/h3>\n<p>Maximum job runtime is 14 days. Currently most users are restricted to one job running at any one time. This is due to the small number of GPUs available and the high demand for them.<\/p>\n<h2>Example Job Submission Scripts and Commands<\/h2>\n<p>As stated above, all jobs must be submitted to the batch system, whether for non-interactive (possibly long) computational runs or for short interactive runs. Jobs should be submitted to the batch system ensuring that the appropriate GPU resources are requested. Examples of jobscripts and commands to access the GPU resources are given below. In all cases ensure you have the appropriate module loaded (<a href=\"#module\">see above<\/a>).<\/p>\n<h3>Serial batch job submission to K20 GPUs<\/h3>\n<p>Ensure you have the appropriate CUDA module loaded (see above), then use the following jobscript (note the use of the <code>nvidia_k20<\/code> resource)<\/p>\n<pre>\r\n#!\/bin\/bash\r\n#$ -cwd\r\n#$ -V\r\n#$ -l nvidia_k20\r\n\r\n.\/my_gpu_prog arg1 arg2\r\n<\/pre>\n<p>Submit the job in the usual way<\/p>\n<pre>qsub gpujob.sh<\/pre>\n<h3>Interactive use of the K20 GPUs with X11<\/h3>\n<p>If you are familiar with the use of X11 (X-Windows), load the appropriate environment module, then enter<\/p>\n<pre>\r\nqrsh -cwd -V -l inter -l nvidia_k20 xterm\r\n<\/pre>\n<p>Within the xterm, for example<\/p>\n<pre>\r\n.\/my_gpu_prog\r\n<\/pre>\n<p><a name=\"cudasdk\"><\/a><\/p>\n<h2>CUDA and OpenCL SDK Examples (e.g., deviceQuery)<\/h2>\n<p>The CUDA SDK contains many example CUDA <strong>and<\/strong> OpenCL programs which can be compiled and run. A useful one is <code>deviceQuery<\/code> (and <code>oclDeviceQuery<\/code>) which gives you lots of information about the Nvidia GPU hardware. 
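<\/p>\n<p>As a rough sketch of what <code>deviceQuery<\/code> reports at its core (a minimal illustration using the CUDA runtime API, not the actual SDK source; error checking omitted):<\/p>\n<pre>\r\n#include &lt;cstdio&gt;\r\n#include &lt;cuda_runtime.h&gt;\r\n\r\nint main( void ) {\r\n\r\n   \/\/ List each visible GPU with its name and compute capability\r\n   int count = 0;\r\n   cudaGetDeviceCount( &amp;count );\r\n   for ( int i = 0; i &lt; count; i++ ) {\r\n      cudaDeviceProp prop;\r\n      cudaGetDeviceProperties( &amp;prop, i );\r\n      printf( \"Device %d: %s (compute capability %d.%d)\\n\",\r\n              i, prop.name, prop.major, prop.minor );\r\n   }\r\n   return 0;\r\n}\r\n<\/pre>\n<p>Like any other runtime-API program this can be compiled on the login node with <code>nvcc<\/code>, but it must be run on a GPU node.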
<\/p>\n<h3>Version 5.5.22 and later<\/h3>\n<p>In CUDA 5.5 and up there is no separate SDK installation directory. Instead, the CUDA toolkit (which provides the <code>nvcc<\/code> compiler, profiler and numerical libraries) also contains a <em>Samples<\/em> directory. The examples have already been compiled but you may also take a copy of the samples so that you can modify them. You can access the samples by loading the CUDA modulefile and then going into the directory:<\/p>\n<pre>\r\ncd $CUDA_SAMPLES\r\n<\/pre>\n<p>The compiled samples are available using<\/p>\n<pre>\r\ncd $CUDA_SAMPLES\/bin\/x86_64\/linux\/release\/\r\n<\/pre>\n<p>As always, running the samples on the login node won&#8217;t work &#8211; there&#8217;s no GPU there!<\/p>\n<h3>Version 4.2.9 and earlier<\/h3>\n<p>In CUDA 4.2.9 the CUDA SDK provides the sample files and is separate from the CUDA toolkit (which provides the <code>nvcc<\/code> compiler, profiler and numerical libraries). You&#8217;ll need to copy the entire SDK to your home (or scratch) area. Compile the SDK on a <strong>GPU node<\/strong>, not the login node, because some of the examples use the CUDA driver library (e.g., see <code>$CUDA_SDK\/C\/src\/vectorAddDrv\/<\/code>) and OpenCL examples can only be compiled on a GPU node. For example:<\/p>\n<pre>\r\n# First start an interactive session on a GPU node\r\nqrsh -l inter -l nvidia\r\n\r\n# Once the interactive session starts:\r\nmodule load libs\/cuda\/4.2.9\r\nexport CUDA_INSTALL_PATH=$CUDA_HOME      # Needs adding to the modulefile?\r\nmkdir ~\/cuda-sdk\r\ncd ~\/cuda-sdk\r\ncp -r $CUDA_SDK .                        # notice the '.' 
at the end of this command!\r\ncd 4.2.9\r\nmake -k\r\n\r\n# Run one of the examples (deviceQuery) while still on the GPU node\r\n.\/C\/bin\/linux\/release\/deviceQuery\r\n.\/OpenCL\/bin\/linux\/release\/oclDeviceQuery\r\n\r\n# End your interactive session\r\nexit\r\n\r\n# You are now back on the login node\r\n<\/pre>\n<p>The CUDA and OpenCL example programs are just like any other GPU code, so please see the instructions earlier on running code either in batch or interactively on a GPU node.<\/p>\n<h2>Further info<\/h2>\n<p>Applications and compilers which can use the Nvidia GPUs are being installed on the CSF.  Links to the appropriate documentation will be provided here and will include:<\/p>\n<ul>\n<li><a href=\"\/csf-apps\/software\/applications\/opencl\">OpenCL programming on CSF<\/a><\/li>\n<li><a href=\"\/csf-apps\/software\/applications\/pgi\">PGI Accelerator compilers on CSF<\/a><\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuda-toolkit\">Nvidia&#8217;s CUDA toolkit pages<\/a><\/li>\n<li><a href=\"http:\/\/www.nvidia.co.uk\/cuda\">Nvidia&#8217;s CUDA pages<\/a><\/li>\n<li><a href=\"GPU \">University GPU Club<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Overview &#8211; current GPGPUs June 2017: There are a total of 3 Nvidia GPGPUs in production in the CSF. We hope to purchase some more GPUs in late 2017\/early 2018 &#8211; please get in touch (i&#116;&#x73;&#x2d;&#x72;i&#45;&#x74;&#x65;&#x61;m&#64;&#x6d;&#x61;&#x6e;c&#104;&#x65;&#x73;&#x74;e&#114;&#x2e;&#x61;&#x63;&#46;&#117;&#x6b;) if you would like to be involved in that procurement. Three K20s (two hosted in one 12-core compute node, one hosted in another 12-core compute node) Retired GPGPUs The Nvidia M2050s and M2070s have all been retired.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/nvidiagpu\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-108","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/108","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/comments?post=108"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/108\/revisions"}],"predecessor-version":[{"id":4798,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/108\/revisions\/4798"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/media?parent=108"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}