{"id":91,"date":"2013-04-19T16:15:42","date_gmt":"2013-04-19T16:15:42","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/?page_id=91"},"modified":"2014-11-28T12:00:04","modified_gmt":"2014-11-28T12:00:04","slug":"compilersamd","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/compilersamd\/","title":{"rendered":"CompilersAMD"},"content":{"rendered":"<p>This page describes how best to compile and run jobs on the <strong>AMD Bulldozer<\/strong> architecture compute nodes on the CSF, i.e., how to get the best performance out of these nodes.<\/p>\n<p>If you are compiling for the older AMD Magny-Cours architecture, the Intel compilers support that architecture perfectly well.<\/p>\n<h2>Overview<\/h2>\n<ul class=\"gaplist\">\n<li>The CSF AMD Bulldozer nodes each have 64 CPU cores, with 2 GB RAM per core; all nodes are connected via InfiniBand.<\/li>\n<li>Intel compilers do not fully support this architecture.<\/li>\n<li>AMD recommend the use of the AMD Open64 compiler with the <a href=\"\/csf-apps\/software\/applications\/acml\">AMD Core Mathematics Library (ACML)<\/a> for maximum performance. The ACML is an implementation of BLAS and LAPACK optimised especially for AMD processors. The library contains other routines too, for example FFT. See the above link for more information on using ACML on the CSF.<\/li>\n<li>Compilation and linking of binaries for these nodes should be performed on a dedicated Bulldozer node by using <code>qrsh<\/code> as described below.<\/li>\n<li>Job size must be a multiple of 64.<\/li>\n<li>The maximum runtime for a job is 4 days.<\/li>\n<li>Binaries compiled for the AMD Bulldozer compute nodes will not run on other nodes. 
Attempting to run such a binary on other nodes, for example the Intel nodes, will fail with the error <code>Illegal instruction<\/code> and the program will not run.<\/li>\n<\/ul>\n<p>Installed compiler versions:<\/p>\n<ul>\n<li>4.5.2.1<\/li>\n<li>4.5.2<\/li>\n<\/ul>\n<h2>Restrictions on Use<\/h2>\n<p>None.<\/p>\n<h2>AMD Bulldozer CPU architecture<\/h2>\n<p>The Open64 compiler is installed on the CSF because AMD recommend it for getting the best performance out of the new AMD Bulldozer CPU architecture. (The Intel compilers do not fully support the Bulldozer architecture.)<\/p>\n<p>The CSF Bulldozer compute nodes are accessible via the following parallel environments:<\/p>\n<ul>\n<li><code>smp-64bd.pe<\/code> parallel environment (for single-node OpenMP jobs)<\/li>\n<li><code>orte-64bd-ib.pe<\/code> parallel environment (for multi-node MPI jobs)<\/li>\n<\/ul>\n<h2>Set up procedure<\/h2>\n<p>To access the compilers, load the appropriate module:<\/p>\n<pre>\r\nmodule load compilers\/amd\/4.5.2.1\r\n\r\n# Or the slightly older version\r\nmodule load compilers\/amd\/4.5.2\r\n<\/pre>\n<h2>Using the compiler<\/h2>\n<h3>Logging onto a Bulldozer host<\/h3>\n<p>The principal reason for using the Open64 compiler on the CSF is to compile code in such a way that it is optimised for the Bulldozer compute nodes. 
To do this, you should first log in to one of those nodes:<\/p>\n<pre>\r\nqrsh -l bulldozer -l short\r\n   #\r\n   # -l short is now required to access the Bulldozer nodes.\r\n   # This gives a maximum of 2 hours for compilation and short tests.\r\n   #\r\n<\/pre>\n<p>Please note:<\/p>\n<ol>\n<li>Attempting to execute programs compiled for the Bulldozer architecture on compute nodes with other CPU architectures, for example the Intel nodes, will result in an error: <code>Illegal instruction<\/code>.<\/li>\n<li>The <code>opencc<\/code> and <code>openf90<\/code> compilers will cross-compile &#8211; i.e., you can use <code>opencc -march=bdver1<\/code> on the CSF login node to produce an executable to be run on the Bulldozer nodes. However, you cannot run that executable on the login node, so logging in to a Bulldozer node, as described above, is the better way of working.<\/li>\n<li>The MPI wrappers (<code>mpicc<\/code>, <code>mpif90<\/code>) will <em>not<\/em> cross-compile &#8211; you must use these on a Bulldozer node.<\/li>\n<\/ol>\n<h3>Example Fortran compilation<\/h3>\n<pre>\r\nqrsh -l bulldozer -l short\r\nmodule load compilers\/amd\/4.5.2.1\r\n\r\nopenf90 -march=bdver1 hello.f90 -o hellof90\r\n    #\r\n    # ...generates a binary called \"hellof90\" from the source file \"hello.f90\"...\r\n<\/pre>\n<h3>Example C\/C++ compilation<\/h3>\n<pre>\r\nqrsh -l bulldozer -l short\r\nmodule load compilers\/amd\/4.5.2.1\r\n\r\nopencc -march=bdver1 hello.c -o helloc\r\n    #\r\n    # ...generates a binary called \"helloc\" from the source file \"hello.c\"...\r\n<\/pre>\n<p>For C++ compilation use the <code>openCC<\/code> command.<\/p>\n<h3>Serial job submission<\/h3>\n<p><strong>Please note:<\/strong> this is for short testing of your code to ensure it has compiled correctly. Do not run computational work in serial. 
To submit a serial (single-core) batch job to SGE:<\/p>\n<ul>\n<li>Ensure you are on the login node, not the Bulldozer architecture compute node used for compilation.<\/li>\n<li>Make sure you have the compiler environment module loaded (see above).<\/li>\n<li>Create a submission script, for example <code>open64.qsub<\/code>, similar to this:\n<pre>\r\n#!\/bin\/bash\r\n#$ -S bash\r\n#$ -cwd\r\n#$ -V\r\n#$ -l bulldozer -l short\r\n    # ...both required: tells SGE to select a Bulldozer compute node for max 2 hours...\r\n\r\n.\/helloc\r\n    #\r\n    # ...the binary built in the C example above; or .\/hellof90 for the Fortran example...\r\n<\/pre>\n<\/li>\n<li>Submit the script:\n<pre>\r\nqsub open64.qsub\r\n<\/pre>\n<\/li>\n<\/ul>\n<h3>Parallel job submission<\/h3>\n<p>Your code, and thus the resulting executable, will usually use OpenMP and\/or MPI in order to run in parallel. Please follow these links to find out how to compile code and submit batch jobs of these types to SGE:<\/p>\n<ul>\n<li><a href=\"\/csf-apps\/software\/applications\/openmp\">CSF OpenMP information<\/a><\/li>\n<li><a href=\"\/csf-apps\/software\/applications\/compilersamd\/openmpibd\/\">CSF MPI on Bulldozer information<\/a><\/li>\n<\/ul>\n<h2>Further information and help<\/h2>\n<p>See also:<\/p>\n<ul>\n<li><a href=\"http:\/\/developer.amd.com\/wordpress\/media\/2012\/10\/51803A_OpteronLinuxTuningGuide_SCREEN.pdf\">Opteron Linux Tuning Guide<\/a> (the Bulldozer nodes use Opteron 6276 processors).<\/li>\n<li><a href=\"http:\/\/developer.amd.com\/wordpress\/media\/2012\/10\/CompilerOptQuickRef-62004200.pdf\">Compiler Opt Quick Reference<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>The <code>-l short<\/code> resource flag is now required to access the Bulldozer nodes for compilation and testing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This page describes how best to compile and run jobs on the AMD Bulldozer architecture compute nodes on the CSF, i.e., how to get the best performance out of these nodes. 
If compiling for the older AMD Magny-Cours architecture then the Intel Compilers support that architecture perfectly well. Overview The CSF AMD Bulldozer nodes each have 64 CPU cores, with 2 GB RAM per core; all nodes are connected via Infiniband. Intel compilers do not.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/compilersamd\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-91","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/91","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/comments?post=91"}],"version-history":[{"count":12,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/91\/revisions"}],"predecessor-version":[{"id":2059,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/91\/revisions\/2059"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/media?parent=91"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}