The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
Mark 23 Fortran using Intel Compiler with Parallel SMP support (FSL6I23DCL)
Overview
This is the parallel (multicore) version of the NAG Fortran library. Run the following module commands to load the environment for the Intel Compiler and the relevant NAG library:
module load compilers/intel/c/12.0.5
module load libs/intel/nag/fortran_smp_mark23_intel
This will set the following environment variable for easy access to the libraries, header files and example scripts:
NAG_HOME_FSL6I23DCL=/opt/gridware/libs/intel/nag/fsl6i23dcl
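As a quick sanity check after loading the modules, the sketch below confirms that the variable is set before anything relies on it. The function name check_nag_env is hypothetical and is not part of the NAG or CSF tooling.

```shell
# Sketch: report whether the NAG module has set its environment variable.
# check_nag_env is a hypothetical helper name, not a NAG or CSF command.
check_nag_env() {
    if [ -n "${NAG_HOME_FSL6I23DCL:-}" ]; then
        echo "NAG library root: $NAG_HOME_FSL6I23DCL"
    else
        echo "NAG_HOME_FSL6I23DCL is not set; load the NAG module first" >&2
        return 1
    fi
}

# Run the check; "|| true" keeps a calling script going if it fails.
check_nag_env || true
```

If the check fails, rerun the two module load commands in the same shell session and try again.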
Compiling the example programs
The NAG library ships with example programs for every routine available. The directory
$NAG_HOME_FSL6I23DCL/scripts
contains two scripts:
nagsmp_example
nagsmp_example_shar
which provide easy-to-use interfaces to compile, link and run each of these examples. This scripts directory is added to your PATH environment variable for convenience. The scripts differ as follows:
- nagsmp_example links with the NAG static library libnagsmp.a and the supplied MKL libraries
- nagsmp_example_shar links with the NAG shareable library libnagsmp.so and the supplied MKL libraries
These scripts show how to compile and link a program against the NAG library according to your needs. For example, to compile the example program for e04nrf, link it statically and run it on four cores, you would run:
nagsmp_example e04nrf 4
The output from this script will be something like:
Copying e04nrfe.f90 to current directory
cp /software/libs/intel/nag/fsl6i23dcl/examples/source/e04nrfe.f90 .
Compiling and linking e04nrfe.f90 to produce executable e04nrfe.exe
ifort -openmp -I/software/libs/intel/nag/fsl6i23dcl/nag_interface_blocks e04nrfe.f90 \
  /software/libs/intel/nag/fsl6i23dcl/lib/libnagsmp.a -Wl,--start-group \
  /software/libs/intel/nag/fsl6i23dcl/mkl10.3/libmkl_intel_lp64.a \
  /software/libs/intel/nag/fsl6i23dcl/mkl10.3/libmkl_intel_thread.a \
  /software/libs/intel/nag/fsl6i23dcl/mkl10.3/libmkl_core.a \
  -Wl,--end-group -o e04nrfe.exe
LD_LIBRARY_PATH=/software/libs/intel/nag/fsl6i23dcl/mkl10.3:/opt/gridware/compilers/intel/2011.5.220/composerxe-2011.5.220/compiler/lib/intel64:/opt/gridware/libs/intel/nag/fsl6i23dcl/lib ; export LD_LIBRARY_PATH
OMP_NUM_THREADS=4 ; export OMP_NUM_THREADS
Copying e04nrfe.d to current directory
cp /software/libs/intel/nag/fsl6i23dcl/examples/data/e04nrfe.d .
Copying e04nrfe.opt to current directory
cp /software/libs/intel/nag/fsl6i23dcl/examples/data/e04nrfe.opt .
Running e04nrfe.exe with data from e04nrfe.d and options from e04nrfe.opt
./e04nrfe.exe < e04nrfe.d > e04nrfe.r
Among other things, the above shows you the form of the ifort command that NAG recommend for this use case. The output from running the example is written to the file e04nrfe.r.
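To reuse that static-link form for your own source file, a small helper can print the equivalent command line. This is only a sketch: nag_static_cmd and myprog.f90 are hypothetical names, and it assumes the directory layout under $NAG_HOME_FSL6I23DCL mirrors the paths printed by nagsmp_example. It prints the command rather than running it, so you can inspect it first.

```shell
# Sketch: print an ifort command line for your own source file, following
# the static-link form in the nagsmp_example output.
# Assumptions: nag_static_cmd and myprog.f90 are hypothetical names, and the
# library tree is assumed to sit under $NAG_HOME_FSL6I23DCL.
nag_static_cmd() {
    src=$1
    root=${NAG_HOME_FSL6I23DCL:-/opt/gridware/libs/intel/nag/fsl6i23dcl}
    mkl=$root/mkl10.3
    # The MKL archives sit inside a linker group, as in the NAG script output.
    echo "ifort -openmp -I$root/nag_interface_blocks $src" \
         "$root/lib/libnagsmp.a -Wl,--start-group" \
         "$mkl/libmkl_intel_lp64.a $mkl/libmkl_intel_thread.a $mkl/libmkl_core.a" \
         "-Wl,--end-group -o ${src%.f90}.exe"
}

# Print the command for a hypothetical source file.
nag_static_cmd myprog.f90
```

Once the printed command looks right, it can be pasted into the shell or run with eval "$(nag_static_cmd myprog.f90)".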
Controlling the number of threads
Set the environment variable OMP_NUM_THREADS to the number of threads required, up to the maximum available on a single node (usually 12 on the CSF).
For example, type:
export OMP_NUM_THREADS=N
where N is the number of threads required. OMP_NUM_THREADS may be reset between executions of the program, as desired.
You cannot spawn threads across multiple nodes.
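Because the variable is read each time the program starts, the same executable can be rerun with different thread counts without recompiling. The sketch below illustrates this with the e04nrfe.exe example built earlier. The function name run_with_threads and the per-run output names e04nrfe_<n>.r are hypothetical, and the run is skipped if the executable is not present in the current directory.

```shell
# Sketch: rerun one executable with several thread counts.
# Assumptions: run_with_threads and the e04nrfe_<n>.r output names are
# hypothetical; e04nrfe.exe, e04nrfe.d and e04nrfe.opt are expected to be
# the files that nagsmp_example left in the current directory.
run_with_threads() {
    exe=$1
    shift
    for n in "$@"; do
        export OMP_NUM_THREADS=$n
        echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
        # Skip the run if the executable has not been built yet.
        if [ -x "$exe" ]; then
            "$exe" < e04nrfe.d > "e04nrfe_${n}.r"
        fi
    done
}

run_with_threads ./e04nrfe.exe 1 2 4
```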
Submitting an example job
This example will use a modified version of the NAG example file for g02aaf (nearest correlation matrix computation) – benchg02aaf.f90. Upload this to your home directory on the CSF.
Compile with the commands:
NPATH=/software/libs/intel/nag/fortran_smp_mark23_intel
ifort -openmp ./benchg02aaf.f90 -I$NPATH/nag_interface_blocks/ $NPATH/lib/libnagsmp.a \
  -Wl,--start-group $NPATH/mkl10.3/libmkl_intel_lp64.a $NPATH/mkl10.3/libmkl_intel_thread.a \
  $NPATH/mkl10.3/libmkl_core.a -Wl,--end-group -o ./benchg02aaf.exe
The program benchg02aaf.exe creates an N x N random matrix and then finds its nearest correlation matrix. It takes two input arguments. The first is an integer, N, which sets the size of the random matrix. The second is a boolean, IO, which determines whether the output is shown. For example, to find the nearest correlation matrix to a 1000 by 1000 random matrix without printing the result, you would run:
./benchg02aaf.exe 1000 .f.
We can use this program to determine how the execution speed of this NAG routine scales with the number of cores.
To run on 4 cores, create a submission file called nag_submit.sh containing the following:
#$ -S /bin/bash
#$ -pe smp.pe 4
#$ -N NAGbench
#$ -cwd
#$ -o outputfile.log
#$ -j y
#$ -V
## The variable NSLOTS sets OMP_NUM_THREADS to match the cores requested on the pe line
export OMP_NUM_THREADS=$NSLOTS
time ./benchg02aaf.exe 5000 .f.
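The job script is submitted to the batch system with qsub. For a scaling study, one possible approach is to submit the same script several times with a different core count each time, as sketched below. The names submit_scaling and DRY_RUN, and the per-run job and log names, are hypothetical. The sketch also assumes that options given on the qsub command line take precedence over the matching #$ lines inside nag_submit.sh. By default it only prints the commands so they can be checked before anything is queued.

```shell
# Sketch: queue the same job script at several core counts.
# Assumptions: submit_scaling, DRY_RUN and the per-run job/log names are
# hypothetical, and qsub command-line options are assumed to override the
# matching "#$" lines in nag_submit.sh.
submit_scaling() {
    for cores in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run (the default): show the command without submitting.
            echo "qsub -pe smp.pe $cores -N NAGbench_$cores -o outputfile_$cores.log nag_submit.sh"
        else
            qsub -pe smp.pe "$cores" -N "NAGbench_$cores" \
                 -o "outputfile_$cores.log" nag_submit.sh
        fi
    done
}

# Print the submissions for a range of core counts on one node.
submit_scaling 1 2 4 8 12
```

Setting DRY_RUN=0 before calling the function submits the jobs for real. Since the script sets OMP_NUM_THREADS from NSLOTS, each run should use as many threads as the cores it was given, and the timings end up in the separate log files.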
Which NAG routines make use of multiple cores?
Only a subset of NAG routines make use of multiple cores. The complete list is available in the SMP-Tuned section of the NAG Mark 23 Fortran Library documentation.