The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
AMD Compilers on the CSF
This page describes how best to compile and run jobs on the AMD Bulldozer architecture compute nodes on the CSF, i.e., how to get the best performance out of these nodes.
If you are compiling for the older AMD Magny-Cours architecture, the Intel compilers support that architecture perfectly well.
Overview
- The CSF AMD Bulldozer nodes each have 64 CPU cores, with 2 GB RAM per core; all nodes are connected via Infiniband.
- Intel compilers do not fully support this architecture.
- AMD recommend the use of the AMD Open64 compiler with the AMD Core Mathematics Library (ACML) for maximum performance. The ACML is an implementation of BLAS and LAPACK optimised especially for AMD processors. The library contains other routines too, for example FFT. See the above link for more information on using ACML on the CSF.
- Compilation and linking of binaries for these nodes should be performed on a dedicated Bulldozer node by using qrsh, as described below.
- Job size must be a multiple of 64.
- The maximum runtime for a job is 4 days.
- Binaries compiled for the AMD Bulldozer compute nodes will not run on other nodes. Attempting to run such a binary on other nodes, for example the Intel nodes, will yield the error "Illegal instruction" and the programme will not run.
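The job-size rule in the overview (a multiple of 64, matching the 64 cores per node) can be checked before a job is submitted. The following is a minimal sketch, not part of the CSF tooling; the function name is_valid_bd_slots is hypothetical.

```shell
# Hypothetical helper (not a CSF command): succeeds only if the
# requested slot count is a positive multiple of 64.
is_valid_bd_slots() {
    case "$1" in
        # Reject empty input, zero or leading zeros, and anything non-numeric.
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    [ $(( $1 % 64 )) -eq 0 ]
}
```

For example, `is_valid_bd_slots 128 || echo "job size must be a multiple of 64"` prints nothing, whereas a request for 100 slots triggers the message.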
Installed compiler versions:
- 4.5.2.1
- 4.5.2
Restrictions on Use
None.
AMD Bulldozer CPU architecture
This compiler is installed on the CSF as it is recommended by AMD to get the best performance out of the new AMD Bulldozer CPU architecture. (The Intel compilers do not fully support the Bulldozer architecture.)
The CSF Bulldozer compute nodes are accessible via the following parallel environments:
- smp-64bd.pe parallel environment (for single-node OpenMP jobs)
- orte-64bd-ib.pe parallel environment (for multi-node MPI jobs)
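As an illustration of how these two parallel environments might be requested, the sketch below prints a jobscript line using the standard SGE form `-pe <environment> <slots>`. The function name bd_pe_directive and the job-type labels "openmp" and "mpi" are hypothetical; only the environment names come from this page.

```shell
# Hypothetical helper: print the SGE "-pe" jobscript line for a Bulldozer job.
#   $1 = job type ("openmp" or "mpi", labels chosen for this sketch)
#   $2 = number of slots (cores) requested
bd_pe_directive() {
    case "$1" in
        openmp) echo "#\$ -pe smp-64bd.pe $2" ;;      # single-node OpenMP
        mpi)    echo "#\$ -pe orte-64bd-ib.pe $2" ;;  # multi-node MPI
        *)      echo "unknown job type: $1" >&2; return 1 ;;
    esac
}
```

Calling `bd_pe_directive mpi 128` prints a line that could be pasted into a jobscript. The slot count should still respect the multiple-of-64 rule stated in the overview.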
Set up procedure
To access the compilers load the appropriate module:
module load compilers/amd/4.5.2.1
# Or the slightly older version
module load compilers/amd/4.5.2
Using the compiler
Logging onto a Bulldozer host
The principal reason for using the Open64 compiler on the CSF is to compile code in such a way that it is optimised for the Bulldozer compute nodes. To do this, you should first log in to one of those nodes:
qrsh -l bulldozer -l short
#
# -l short is now required to access the bulldozer nodes.
# This gives a maximum of 2 hours for compilation and short tests.
#
Please note
- Attempting to execute programmes compiled for the Bulldozer architecture on compute nodes with other CPU architectures, for example the Intel nodes, will result in an error: Illegal instruction
- The opencc and openf90 compilers will cross compile, i.e., you can use opencc -march=bdver1 on the CSF login node to produce an executable to be run on the bulldozer nodes. But you can't run the executable on the login node. Hence logging in to the bulldozer node, as described above, is a better way of working.
- The MPI wrappers (mpicc, mpif90) will not cross compile; you must use these on a bulldozer node.
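Because a Bulldozer binary fails with "Illegal instruction" elsewhere, it can help to confirm which kind of host you are on before running it. The sketch below is a heuristic only: it reads /proc/cpuinfo on Linux and looks for an AMD CPU of family 21 (hexadecimal 15h, the Bulldozer family, which also includes later cores based on it). The function name is_bulldozer_host is hypothetical.

```shell
# Hypothetical helper: heuristic check for an AMD family 15h (Bulldozer) CPU.
#   $1 = optional path to a cpuinfo-format file (defaults to /proc/cpuinfo)
# Returns 0 on a match, 1 on no match, 2 if the file cannot be read.
is_bulldozer_host() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    [ -r "$cpuinfo_file" ] || return 2
    # Vendor must be AMD and the CPU family must be 21 (decimal for 15h).
    grep -q '^vendor_id[[:space:]]*:[[:space:]]*AuthenticAMD' "$cpuinfo_file" &&
        grep -Eq '^cpu family[[:space:]]*:[[:space:]]*21[[:space:]]*$' "$cpuinfo_file"
}
```

A guarded run could then look like `is_bulldozer_host && ./helloc`, which skips the binary on the login node or an Intel node instead of crashing.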
Example Fortran compilation
qrsh -l bulldozer -l short
module load compilers/amd/4.5.2.1
openf90 -march=bdver1 hello.f90 -o hellof90
#
# ...generates a binary called "hellof90" from the source file "hello.f90"...
Example C/C++ compilation
qrsh -l bulldozer -l short
module load compilers/amd/4.5.2.1
opencc -march=bdver1 hello.c -o helloc
#
# ...generates a binary called "helloc" from the source file "hello.c"...
For C++ compilation use the openCC command.
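The three compiler drivers named on this page (opencc, openCC, openf90) can be selected by source-file extension. The sketch below only prints the command it would run. The function name bd_compile_cmd, the extension mapping, and the choice to name the output after the source file are all assumptions made for illustration; passing -march=bdver1 to openCC mirrors the opencc example rather than anything stated here.

```shell
# Hypothetical helper: print (do not run) an Open64 compile command
# targeting Bulldozer, chosen by the source file's extension.
bd_compile_cmd() {
    src="$1"
    out="${src%.*}"   # assumed output name: source name without its extension
    case "$src" in
        *.c)            echo "opencc -march=bdver1 $src -o $out" ;;
        *.cpp|*.cc|*.C) echo "openCC -march=bdver1 $src -o $out" ;;
        *.f90)          echo "openf90 -march=bdver1 $src -o $out" ;;
        *)              echo "unsupported source file: $src" >&2; return 1 ;;
    esac
}
```

Printing the command rather than executing it keeps the sketch safe to try on the login node, where the resulting binary could not be run anyway.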
Serial job submission
Please note: this is for short testing of your code to ensure it has compiled correctly. Do not run computational work in serial. To submit a serial (single-core) batch job to SGE:
- Ensure you are on the login node, not the Bulldozer architecture compute node used for compilation.
- Make sure you have the compiler environment module loaded (see above).
- Create a submission script (for example, open64.qsub) similar to this:
#!/bin/bash
#$ -S bash
#$ -cwd
#$ -V
#$ -l bulldozer -l short
# ...both required: tells SGE to select a Bulldozer compute node for max 2 hours...
./helloc
#
# ...or ./hellof90 (for example)...
- Submit the script:
qsub open64.qsub
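If a test job ends up running a binary on the wrong CPU type, the shell reports the "Illegal instruction" crash as exit status 132 (128 plus signal number 4, SIGILL, on Linux). The wrapper below is a small sketch that could be used inside a jobscript to make that failure easier to recognise in the job output; the name run_checked is hypothetical.

```shell
# Hypothetical wrapper: run a command and add a hint if it died from SIGILL.
run_checked() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 132 ]; then
        # 132 = 128 + 4 (SIGILL): the CPU rejected an instruction in the binary.
        echo "Exit status 132 (Illegal instruction): was this binary built for a different CPU architecture?" >&2
    fi
    return "$rc"
}
```

In a jobscript the final line would then read `run_checked ./helloc` rather than `./helloc`; the original exit status is passed through unchanged.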
Parallel job submission
Your code, and thus the resulting executable, will usually use OpenMP and/or MPI in order to run in parallel. Please follow these links to find out how to compile code of these types and submit it to SGE as batch jobs:
Further information and help
See also
- Opteron Linux Tuning Guide (the Bulldozer nodes use Opteron 6276 processors).
- Compiler Opt Quick Reference
Updates
The -l short resource flag is now required to access the bulldozer nodes for compilation and testing.