{"id":105,"date":"2013-04-19T18:02:56","date_gmt":"2013-04-19T18:02:56","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/?page_id=105"},"modified":"2016-04-18T14:17:24","modified_gmt":"2016-04-18T14:17:24","slug":"pgi","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/pgi\/","title":{"rendered":"PGI"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>The Portland Compiler Suite contains Fortran 95, Fortran 90, Fortran 77, C, C++, HPF and CUDA Accelerator compilers for machines running 64bit Linux.<\/p>\n<p>Version 13.6 is the most up to date installed on the CSF<\/p>\n<p>Note: the PGI CUDA Accelerator supports OpenACC compiler directives. This is available via the PGI C compiler (<code>pgcc<\/code>) from version 12.10 and the PGI C++ and fortran compilers (<code>pgcpp<\/code> and <code>pgf90<\/code>) from version 13.6.<\/p>\n<p>The PGI compiler can be used to compile for Intel (Westmere, Sandybridge), AMD (Magny-Cours, Bulldozer) and Nvidia CUDA GPU architectures.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>There are only two network licenses available site wide for each of Fortran and C\/C++.  If you get a license related error, it is almost certainly because all of our licenses are in use and you should try again at a later time.  
For further licensing information see the following webpage:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.applications.itservices.manchester.ac.uk\/show_product.php?id=321\">http:\/\/www.applications.itservices.manchester.ac.uk\/show_product.php?id=321<\/a><\/li>\n<\/ul>\n<h2>Set up procedure<\/h2>\n<p>To gain access to these compilers, run <strong>one<\/strong> of the following commands after logging into the login node:<\/p>\n<pre>\r\nmodule load compilers\/PGI\/13.6\r\nmodule load compilers\/PGI\/12.10\r\nmodule load compilers\/PGI\/12.8\r\n<\/pre>\n<p>If your code uses the AMD Core Math Libraries (ACML) for routines such as BLAS, LAPACK and FFT, and you are compiling for AMD Bulldozer nodes, then a version of the compiler containing an ACML build optimized to use the Bulldozer Fused Multiply-Add (FMA4) instruction is available by loading one of:<\/p>\n<pre>\r\nmodule load compilers\/PGI\/14.10-acml-fma4\r\nmodule load compilers\/PGI\/13.6-acml-fma4\r\n<\/pre>\n<h2>Basic compilation commands<\/h2>\n<p>Basic compilation commands are as follows:<\/p>\n<ul>\n<li>C<\/li>\n<\/ul>\n<pre>\r\npgcc hello.c -o hello\r\n<\/pre>\n<ul>\n<li>C++<\/li>\n<\/ul>\n<pre>\r\npgCC hello.cpp -o hello\r\npgcpp hello.cpp -o hello\r\n<\/pre>\n<p>Note: use <code>pgc++<\/code> if link compatibility with the GNU C++ compiler is required (see <code>man pgc++<\/code>).<\/p>\n<ul>\n<li>Fortran<\/li>\n<\/ul>\n<pre>\r\npgf90 hello.f90 -o hello\r\npgf95 hello.f90 -o hello\r\npgfortran hello.f90 -o hello\r\n<\/pre>\n<ul>\n<li>High Performance Fortran<\/li>\n<\/ul>\n<pre>\r\npghpf hello.hpf -o hello\r\n<\/pre>\n<h3>Target Architectures<\/h3>\n<p>To compile specifically for architectures such as Intel Sandybridge or AMD Bulldozer, use the <code>-tp=<\/code><em>architecture<\/em> flag. For example:<\/p>\n<pre>\r\npgcc -tp=sandybridge hello.c -o hello_sb\r\npgcc -tp=bulldozer hello.c -o hello_bd\r\n<\/pre>\n<p>Further optimization flags may be appropriate for different architectures.
For example, on AMD Bulldozer it is recommended to use the following:<\/p>\n<pre>\r\npgcc -tp=bulldozer -O3 -fast\r\n<\/pre>\n<p>See <a href=\"#nvidia\">below<\/a> for compiling for Nvidia GPUs (which uses the <code>-ta=<\/code><em>accelerator<\/em> flag to specify <code>nvidia<\/code> accelerators).<\/p>\n<h2>Sample code<\/h2>\n<p>Sample code can be found in three subfolders of <code>$PGI\/linux86-64\/$PGIVER\/etc\/samples<\/code>:<\/p>\n<ul>\n<li><code>\/accel<\/code><\/li>\n<li><code>\/cudafor<\/code><\/li>\n<li><code>\/openacc<\/code><\/li>\n<\/ul>\n<p>Each subfolder contains makefiles to compile the sample code. If you copy these folders and see an error like <code>\/usr\/bin\/ld: cannot open output file c1.exe: Permission denied<\/code> when compiling, ensure you have the necessary permissions to write files to your folders.<\/p>\n<p><a name=\"nvidia\"><\/a><\/p>\n<h2>Compiling and submitting a test CUDA accelerator application in C<\/h2>\n<p>Note: this method uses PGI-specific directives, not OpenACC directives. We recommend using OpenACC for portability (<a href=\"#openacc\">see below<\/a>).<\/p>\n<p>It is not necessary to be logged into a node hosting a GPU in order to compile CUDA executables.
Copy one of the samples to your current working directory:<\/p>\n<pre>\r\ncp $PGI\/linux86-64\/$PGIVER\/etc\/samples\/accel\/c1.c .\r\n<\/pre>\n<p>Compile it for NVIDIA CUDA either using the makefile provided in <code>\/accel<\/code> or use the following command:<\/p>\n<pre>\r\npgcc -ta=nvidia -Minfo=accel -fast c1.c -o c1.exe\r\n<\/pre>\n<p>The following SGE script, <code>pgi.sge<\/code>, is suitable for submitting the above executable:<\/p>\n<pre>\r\n#!\/bin\/bash\r\n#$-cwd\r\n#$-S \/bin\/bash\r\n#$-l nvidia\r\n.\/c1.exe > out.txt\r\n<\/pre>\n<p>Submit this as a batch job using:<\/p>\n<pre>\r\nqsub pgi.sge\r\n<\/pre>\n<p>If successful, the output file <strong>out.txt<\/strong> will contain the following:<\/p>\n<pre>\r\n100000 iterations completed\r\n<\/pre>\n<h3>Source code for an example application<\/h3>\n<p>Here is the source code for the example referred to above:<\/p>\n<pre>\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n#include &lt;assert.h&gt;\r\n\r\nint main( int argc, char* argv[] )\r\n{\r\n    int n;      \/* size of the vector *\/\r\n    float *restrict a;  \/* the vector *\/\r\n    float *restrict r;  \/* the results *\/\r\n    float *restrict e;  \/* expected results *\/\r\n    int i;\r\n    if( argc > 1 )\r\n        n = atoi( argv[1] );\r\n    else\r\n        n = 100000;\r\n    if( n <= 0 ) n = 100000;\r\n\r\n    a = (float*)malloc(n*sizeof(float));\r\n    r = (float*)malloc(n*sizeof(float));\r\n    e = (float*)malloc(n*sizeof(float));\r\n    for( i = 0; i < n; ++i ) a[i] = (float)(i+1);\r\n\r\n    #pragma acc region\r\n    {\r\n        for( i = 0; i < n; ++i ) r[i] = a[i]*2.0f;\r\n    }\r\n    \/* compute on the host to compare *\/\r\n        for( i = 0; i < n; ++i ) e[i] = a[i]*2.0f;\r\n    \/* check the results *\/\r\n    for( i = 0; i < n; ++i )\r\n        assert( r[i] == e[i] );\r\n    printf( \"%d iterations completed\\n\", n );\r\n    return 0;\r\n}\r\n<\/pre>\n<p>Note that all that is required in order to run a loop on the GPU 
is the line:<\/p>\n<pre>\r\n#pragma acc region\r\n<\/pre>\n<p><a name=\"openacc\"><\/a><\/p>\n<h2>Compiling and submitting OpenACC C code<\/h2>\n<p>Copy one of the samples to your current working directory:<\/p>\n<pre>\r\ncp $PGI\/linux86-64\/$PGIVER\/etc\/samples\/openacc\/acc_c1.c .\r\n<\/pre>\n<p>Compile it either using the makefile provided in <code>\/openacc<\/code> or with the following command:<\/p>\n<pre>\r\npgcc -o acc_c1.exe acc_c1.c <strong>-acc<\/strong> -Minfo=accel -fast             # notice the additional -acc flag\r\n<\/pre>\n<p>The code in this example is very similar to that given above. However, the OpenACC <em>pragma<\/em> is slightly different: it introduces the <em>kernels<\/em> keyword. Examine the <code>acc_c1.c<\/code> file carefully.<\/p>\n<pre>\r\n#pragma acc kernels loop\r\n<\/pre>\n<p>The following SGE script, <code>pgi.sge<\/code>, is suitable for submitting the above executable:<\/p>\n<pre>\r\n#!\/bin\/bash\r\n#$-cwd\r\n#$-S \/bin\/bash\r\n#$-l nvidia\r\n.\/acc_c1.exe > out.txt\r\n<\/pre>\n<p>Submit this as a batch job using:<\/p>\n<pre>\r\nqsub pgi.sge\r\n<\/pre>\n<p>If successful, the output file <strong>out.txt<\/strong> will contain the following:<\/p>\n<pre>\r\n100000 iterations completed\r\n<\/pre>\n<h2>Further information<\/h2>\n<p>Once you have loaded the PGI module (see <strong>set up procedure<\/strong>), basic documentation is available as a manpage:<\/p>\n<pre>\r\nman pgcc\r\n     #\r\n     # replace pgcc with the names of the other compilers as required\r\n     #\r\n<\/pre>\n<p>Documentation in PDF files is available in the directory <code>$PGI\/linux86-64\/$PGIVER\/doc<\/code> and can be viewed using evince, e.g.
<\/p>\n<pre>\r\nevince $PGI\/linux86-64\/$PGIVER\/doc\/pgi12ug.pdf\r\n<\/pre>\n<p>or<\/p>\n<pre>\r\nevince $PGI\/linux86-64\/$PGIVER\/doc\/pgi12ref.pdf\r\n<\/pre>\n<p>The following links may be useful:<\/p>\n<ul>\n<li><a href=\"\/csf-apps\/software\/applications\/nvidiagpu\">Overview of GPUs on the CSF<\/a><\/li>\n<li><a href=\"http:\/\/wiki.rcs.manchester.ac.uk\/community\/GPU\/programming\/directives\">GPU Programming Using Directives<\/a> (University GPU Club)<\/li>\n<li><a href=\"http:\/\/www.pgroup.com\/resources\/accel.htm\">An overview of the PGI Accelerator model<\/a> (Portland Group)<\/li>\n<li><a href=\"http:\/\/www.applications.itservices.manchester.ac.uk\/show_product.php?id=321\">http:\/\/www.applications.itservices.manchester.ac.uk\/show_product.php?id=321<\/a> - Extra information about the PGI compiler from IT Services, including how to obtain it for installation on your own machine.<\/li>\n<li><a href=\"http:\/\/www.pgroup.com\/userforum\/viewforum.php?f=12\">PGI Accelerator Programming Forum<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Overview The Portland Compiler Suite contains Fortran 95, Fortran 90, Fortran 77, C, C++, HPF and CUDA Accelerator compilers for machines running 64bit Linux. Version 13.6 is the most up to date installed on the CSF Note: the PGI CUDA Accelerator supports OpenACC compiler directives. This is available via the PGI C compiler (pgcc) from version 12.10 and the PGI C++ and fortran compilers (pgcpp and pgf90) from version 13.6. The PGI compiler can be..
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/pgi\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-105","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/105","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/comments?post=105"}],"version-history":[{"count":13,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/105\/revisions"}],"predecessor-version":[{"id":2947,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/105\/revisions\/2947"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/media?parent=105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}