The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead. To display this old CSF2 page click here. |
Intel Compilers
Overview
The Intel Compilers are highly optimising compilers suitable for both Intel processors (Westmere, Sandybridge) and AMD processors (mainly Magny-Cours – see open64 compiler for Bulldozer). They include features such as OpenMP and automatic parallelization which allows programmers to easily take advantage of multi-core processors.
Installed versions – we recommend using the latest version when compiling software for the first time.
- Fortran (ifort)
- 11.1.064 with MKL 10.2 Update 3
- 12.0.5 with MKL 10.3 Update 5 (also known as Intel Fortran Composer XE 2011 update 5 or 2011.5.220)
- 14.0.3 with MKL 11.1 update 3 (also known as Intel Fortran Composer XE 2013 update 3 or 2013.3.174)
- 15.0.3 with MKL 11.2 update 3 (also known as Intel Fortran Composer XE 2015 update 3)
- 17.0.2 with MKL 2017.0 update 2 (also known as Intel Fortran Composer XE 2017 update 2)
- C/C++ (icc/icpc)
- 11.1.075 with MKL 10.2 Update 7
- 12.0.5 with MKL 10.3 Update 5 (also known as Intel C++ Composer XE 2011 update 5 or 2011.5.220)
- 14.0.3 with MKL 11.1 update 3 (also known as Intel C++ Composer XE 2013 update 3 or 2013.3.174)
- 15.0.3 with MKL 11.2 update 3 (also known as Intel C++ Composer XE 2015 update 3)
- 17.0.2 with MKL 2017.0 update 2 (also known as Intel C++ Composer XE 2017 update 2)
Programming in Fortran or C is currently beyond the scope of this webpage. IT Services for Research run training courses which may be of interest.
Restrictions on use
The CSF uses the University license. The Intel compilers are available to any student or member of staff of the University of Manchester for the purpose of the normal business of the University. Such use includes research, personal development and administration and management of the university. The software may not be used for research, consultancy or services leading to commercial exploitation of the software.
Code may be compiled on the login node, but aside from very short test runs (e.g., one minute on fewer than 4 cores), executables must always be run by submitting to the batch system, SGE.
Set up procedure
You will need to load these modulefiles to compile your code and whenever you want to run the executables you compiled. To access the compilers load the appropriate module. We recommend using the latest version if compiling new code.
Note, as of 12.0.5 you only need to load one modulefile (either fortran or c) – these are historically separate modulefiles but loading one (fortran or c) will give you access to the fortran and C/C++ compilers.
- Fortran:
module load compilers/intel/fortran/17.0.2 module load compilers/intel/fortran/15.0.3 module load compilers/intel/fortran/14.0.3 module load compilers/intel/fortran/12.0.5 module load compilers/intel/fortran/11.1.064
- C/C++:
module load compilers/intel/c/17.0.2 module load compilers/intel/c/15.0.3 module load compilers/intel/c/14.0.3 module load compilers/intel/c/12.0.5 module load compilers/intel/c/11.1.064
Note that for version 12.0.5, 14.0.3, 15.0.3 and 17.0.2, loading either Fortran or C modulefiles will make both ifort
and icc
/icpc
available so you really only need to load one of the modulefiles.
The Math Kernel Library (MKL) contains optimised routines (eg BLAS) that you are encouraged to use for performance gains if not using OpenBLAS (which provides equivalent level 3 BLAS performance on the CSF hardware). Information about MKL.
Running the application – Compiling
We give example firstly of how to compile and then how to run the compiled executable.
Example fortran compilation
Make sure you have loaded the modulefile first (see above).
ifort hello.f90 -o hello # # ...generates a binary executable "hello" from source code # file "hello.f90"... # # ...which can then be run sequn
Example C compilation
Make sure you have loaded the modulefile first (see above).
icc hello.c -o hello # # ...generates a binary executable "hello" from source code # file "hello.c"... #
Example C++ compilation
Make sure you have loaded the modulefile first (see above).
icpc hello.cpp -o hello # # ...generates a binary executable "hello" from source code # file "hello.cpp"... #
Note that is it perfectly acceptable to run the compiler as a batch job. This is recommended if you have a large compilation to perform – one that has a lot of source files and libraries to compile.
Running your Compiled Code
Once you have compiled your application you will run that as a batch job as normal on the CSF. For example, your compiled application will do some data processing.
You must ensure the same compiler modulefile used to compile your application is loaded when you run your application as a batch job. This is because there are system libraries (and other libraries such as the MKL) that are specific to the compiler version. If you compile your code with Intel compiler 17.0.2, say, then you must run your application with that compiler’s modulefile loaded.
Serial Job submission
To submit a single core batch job to SGE:
- Make sure that you have the same fortran or C/C++ module loaded that was used to compile the application.
- An example submission script:
#!/bin/bash #$ -cwd #$ -V ./hello
- To submit a serial job you do not need to include any ‘-pe’ information. The simplest example is therefore:
qsub jobscript
where jobscript is replaced with the name of your submission script.
Parallel Job submission
Your code, and thus the resulting executable, must use either OpenMP (for single-node multicore jobs) and/or MPI (for multi-node multicore jobs) in order to run in parallel. Please follow these links to find out how to submit batch jobs of these types to SGE:
Useful Compiler Flags
Please see the relevant man
pages. Note that ifort
, icc
and icpc
generally have the same set of flags but will differ for language-specific features.
We suggest a few commonly used flags below but this is by no means an exhaustive list. Please see the man pages.
Optimising and Parallelizing your Code
-O0
,-O1
,-O2
,-O3
,-fast
(see fortran note): this give you zero (ie no) optimisation (for testing numerics of the code) through to maximum optimisation (which may be processor specific).-vec
(which is on by default) vectorizes your code where possible.-openmp
,-parallel
: for producing hand-coded OpenMP or automatically threaded executables, respectively, to run on smp.pe. Note that-parallel
requires-O2
or-O3
and that using-O3
will enable-opt-matmul
. Please consult the man page for full details.-opt-report
,-vec-report
,-par-report
: informs the compiler to produce a report detailing what’s been optimised, vectorized or parallelised. Can increase verbosity and detail by adding optional=N
where N is0
,1
,2
… (0
to turn off – see man page for range of possible values and what they report). E.g.:-par-report=2
.
Debugging your Code
Compile your code for use with a debugger.
-g
,-traceback
,-C
: flags for including debug symbols, producing a stacktrace upon runtime errors and for checking array bounds
Profiling your Code
Profiling of loops and function calls can be enabled using the following flags (for both icc and ifort)
-profile-functions -profile-loops=all -profile-loops-report=2
For example, to generate a profiling report of prog.exe
and display the results in a graphical tool supplied with the Intel compiler, use the following example
# Ensure you have loaded the required compiler modulefile. Then: # Compile with profiling options. icc -profile-functions -profile-loops=all -profile-loops-report=2 prog.c -o prog.exe # # Or use the 'ifort' fortran compiler # You may also want to add -fno-inline-functions if the profiler # appears not to report calls to some of your functions. # Alternatively add -O0 (which also disables inlining as part of # the optimizations). However, the loop profiler will not work # with -O0. # # Copy executable to scratch and run from there cp prog.exe ~/scratch cd ~/scratch # Run your prog.exe code. In this case it is a serial code hence no PE needed. qsub -l short -cwd -V -b y ./prog.exe # Remove -l short if prog.exe runs for more than 1 hour # You'll now have a .xml file (and some .dump files) generated during profiling. # View the xml file in the Intel Java GUI tool. loopprofileviewer.sh loop_prof_1350468736.xml # The number will be different # # Note: if you get an error from Java, try running the tool again. #
You should see something like:
Fortran Static Linking (-lm error)
Compiling FORTRAN code with -static
or -fast
(which enables -static
) may produce an error:
ld: cannot find -lm
This is because linux is moving away from static linking and so the static version of the maths library libm.a
is installed in a path not searched by default. Use the following compile line to add the path explicitly:
ifort -fast mycode.f90 -o mycode.exe -L/usr/lib/x86_64-redhat-linux5E/lib64/
Sandybridge, Ivybridge, Haswell, Broadwell
A significant number of nodes in the CSF use either the Intel Sandybridge architecture or the Intel Ivybridge architecture, both of which supports the AVX instruction set. This allows your code to make use of the 256-bit wide floating-point vector instructions supported by the Intel hardware. The Intel Compiler can auto-vectorise your code in certain circumstances to take advantage of these architectures.
The Haswell and Broadwell nodes support AVX2 vector instructions. These promote the 128-bit wide integer vector instructions found in AVX hardware to 256-bit wide integer vector instructions.
While a complete discussion of this is beyond the scope of this webpage (and users are encouraged to read the Intel AVX docs), the following compiler flags will allow you to compile code optimized for the Sandy/Ivy Bridge and Haswell architectures:
-xAVX
the compiler may generate processor-specific code for AVX instruction set (Sandybridge / Ivybridge / Haswell / Broadwell architectures). This code will only run on Sandybridge, Ivybridge, Haswell or Broadwell nodes. Hence you must add-l sandybridge
or-l ivybridge
or-l haswell
or-l broadwell
to your jobscript to run this code (without one of these flags the job could land on an Intel Westmere node which cannot run code compiled with-xAVX
.-axAVX
the compiler will generate multiple, processor-specific auto-dispatch code paths if there is a benefit. This code will execute on Sandybridge, Ivybridge, Haswell, Broadwell nodes and other nodes (Westmere, some AMD nodes). The executable may contain multiple versions of sections of code, one of which will be executed depending on whether running on a Sandybridge, Ivybridge, Haswell, Broadwell node or not. The baseline path is determined by the-x
flag if present (which sets a baseline causing the executable to work only on Intel architectures) or-m
flag (which sets a baseline that runs on Intel and AMD architectures) or a default architecture and optimization level if neither-x
or-m
are specified. The default baseline settings are usually very conservative – they ensure code runs on very old Intel and AMD architectures and so the performance of the compiled code is not as optimal as it could be. For example, the following flags will set a good baseline (for Intel and AMD nodes in the CSF) and also optimize for newer Intel Sandybridge and Ivybridge nodes in the CSF:-msse2 -axAVX
(actually, -msse2 is the default and does not need to be specified). Another option would be to compile with an sse2 baseline and include optional sse4.2 and AVX code paths for optimized execution on Intel hardware (SSE4.2 is supported by our Intel Westmere nodes and is faster than SSE2.) Hence we could also use:-msse2 -axSSE4.2,AVX
- For AVX2 hardware (Haswell and Broadwell ndoes) add
CORE-AVX2
to the-ax
or-x
flags. - To compile your code with multiple execution paths to support all possible CSF architectures (AVX2 on Haswell and Broadwell, AVX on Sandybridge and Ivybridge, SSE4.2 on Westmere and a baseline SSE2 that will also work on AMD hardware) use:
-msse2 -axSSE4.2,AVX,CORE-AVX2
Note that the
-msse2
must be lower-case.The compiler will report which functions have been compiled for both AVX and non-AVX architectures with a message such as
../my_solver.c(139): (col. 1) remark: solver_func() has been targeted for automatic cpu dispatch.
The combinations of -x
, -ax
and -m
flags are somewhat complicated and so you are encouraged to read the Intel Compiler manual page (man icc
or man ifort
).
Further info and help
- ITS Applications Intel Fortran Compiler webpage
- Applications Intel C/C++ Compiler webpage]]
- Each compiler has an online manual available from the command line:
man ifort # for the fortran compiler man icc # for the C/C++ compiler
ifort --help
andicc --help
give similar information- Training Courses run by IT Services for Research.
- If you require advice on programming matters, for example how to debug or profile a code, or how to use maths libraries, please email its-ri-team@manchester.ac.uk