Parallel Jobs

Current Configuration and Parallel Environments

For jobs that require two or more CPU cores, the appropriate SGE parallel environment (PE) should be selected from the tables below. Please also consult the software page specific to the code/application you are running for advice on the most suitable PE.

A parallel job script takes the form:

#!/bin/bash --login
#$ -cwd                       # Job will run in the current directory (where you ran qsub)
#$ -pe pename numcores        # Choose a PE name from the tables below and a number of cores

# Load any required modulefiles
module load apps/some/example/1.2.3

# Now the commands to be run by the job. You MUST tell your app how many cores to use! There are usually
# three ways to do this. Note: $NSLOTS is automatically set to the number of cores requested above.

  • OpenMP applications (multicore but all in a single compute node):
    export OMP_NUM_THREADS=$NSLOTS
    the_openmp_app
    
  • MPI applications (small jobs on a single node or larger jobs across multiple compute nodes):
    mpirun -n $NSLOTS the_mpi_app 
    
  • Other multicore apps that use their own command-line flags (you must check the app’s documentation for how to do this correctly). For example:
    the_bioinfo_app --numthreads $NSLOTS         # This is an example - check your app's docs!
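
Putting these pieces together, here is a minimal sketch of a complete jobscript for an OpenMP application, using the single-node smp.pe parallel environment described below. The modulefile and application names are placeholders – substitute your own:

#!/bin/bash --login
#$ -cwd                        # Job will run in the current directory (where you ran qsub)
#$ -pe smp.pe 8                # Single-node PE (described below), 8 cores

# Load the modulefile for your application (placeholder name)
module load apps/some/example/1.2.3

# Use exactly the number of cores allocated by the batch system
export OMP_NUM_THREADS=$NSLOTS
the_openmp_app

Submit the jobscript with qsub as usual.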
    

The available parallel environments are described below. Use the name of a parallel environment on the #$ -pe line of your jobscript.

Intel parallel environments

Single Node Multi-core (SMP) and small MPI Jobs

-l short: Max 24 cores, 4GB/core, 1 hour runtime (which usually means queuing times are also very short). Currently two 24-core Haswell nodes. This option is for test jobs and interactive use only – DO NOT use it for production runs, as that is unfair on those who need it for testing/interactive work.

PE name: smp.pe

  • For jobs of 2 to 32 cores.
  • Jobs will use a single compute node. Use for OpenMP (or other multicore/SMP jobs) and small MPI jobs.
  • 4GB or 6GB per core, depending on where the job runs.
  • 7 day runtime limit.
  • Currently, jobs may be placed on either Haswell (max 24 cores), Broadwell (max 28 cores) or Skylake (max 32 cores) CPUs. (11.08.21 – Sandybridge 12-core nodes removed from service; 01.11.2022 – Ivybridge standard-memory 16-core nodes removed from service.) See the optional resources below to control this; the system will choose if not specified.
  • We recommend you do not specify a type of CPU unless absolutely necessary for your application/situation as doing so reduces the pool of nodes available to you and can lead to an increased wait in the queue.
  • Large pool of cores.
  • The optional resource flags below can be used to modify your job. You should specify only one flag (if using any such flags) unless indicated in the table.
  • Note: Choosing a node type is not recommended as it can mean a much longer wait in the queue.
Optional Resources | Max cores per job, RAM per core | Additional usage guidance
-l mem256 | Max 16 cores, 16GB/core (Haswell nodes only) | High-memory nodes. Jobs must genuinely need extra memory.
-l mem512 | Max 16 cores, 32GB/core (system chooses Ivybridge or Haswell) | High-memory nodes. Jobs must genuinely need extra memory.
-l mem512 -l ivybridge | Max 16 cores, 32GB/core (Ivybridge nodes) | High-memory nodes. Jobs must genuinely need extra memory.
-l mem512 -l haswell | Max 16 cores, 32GB/core (Haswell nodes) | High-memory nodes. Jobs must genuinely need extra memory.
-l haswell | Max 24 cores, 5GB/core | Use only Haswell cores.
-l broadwell | Max 28 cores, 5GB/core | Use only Broadwell cores.
-l skylake | Max 32 cores, 6GB/core | Use only Skylake cores.
-l avx | Limits depend on the node type the system chooses and on any memory options | System will choose one of Haswell, Broadwell or Skylake CPUs.
-l avx2 | Limits depend on the node type the system chooses and on any memory options | System will choose one of Haswell, Broadwell or Skylake CPUs.
-l avx512 | Max 32 cores, 6GB/core | Use only Skylake CPUs.
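
For example, a sketch of an smp.pe jobscript combining the PE with one of the optional high-memory resources above. The modulefile and application names are placeholders:

#!/bin/bash --login
#$ -cwd
#$ -pe smp.pe 16               # Max 16 cores when requesting mem512 (see table above)
#$ -l mem512                   # High-memory nodes (32GB/core) - only if genuinely needed

module load apps/some/example/1.2.3    # Placeholder - load your app's modulefile

export OMP_NUM_THREADS=$NSLOTS         # Use only the cores we were allocated
the_openmp_app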

Multi-node large MPI Jobs

PE name: mpi-24-ib.pe (note: the CSF2 name “orte-24-ib.pe” can also be used so you don’t need to edit existing CSF2 jobscripts)

  • For MPI jobs of 48 or more cores, in multiples of 24, up to a maximum of 120.
  • On CSF3 there are no separate InfiniBand modulefiles – applications will automatically use the fast InfiniBand network.
  • 24-core jobs are not permitted: they fit on one compute node and so do not utilise the InfiniBand network (see smp.pe for 24–32 core jobs).
  • 5GB RAM per core.
  • 7 day runtime limit.
  • Haswell nodes only.
  • Small pool of cores.
  • Jobs submitted here by ‘free at the point of use’ users will not run, as the maximum ‘free’ job size is 32 cores.
There are no optional resources for this PE.
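
For example, a minimal sketch of a 48-core job in this PE, which will span two 24-core Haswell nodes. The modulefile and application names are placeholders:

#!/bin/bash --login
#$ -cwd
#$ -pe mpi-24-ib.pe 48         # Must be a multiple of 24: minimum 48, maximum 120

module load apps/some/example/1.2.3    # Placeholder - load your app's modulefile

# mpirun starts one MPI process per allocated core; the fast InfiniBand
# network is used automatically on CSF3 (no separate modulefile needed).
mpirun -n $NSLOTS the_mpi_app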

Basic parallel batch SGE job submission

When submitting parallel jobs to the batch system you usually specify the number of cores required in two places:

  1. The -pe option tells the batch system how many cores you are requesting. It will only run your job when the correct resources become available. Add the following to your jobscript:
    #$ -pe pename numcores
    

    replacing pename with one of the PE names described above and numcores with the number of cores to use (satisfying the rules in the PE description above).

  2. Your application will also need to be informed how many cores you have requested from the batch system. There is usually a command-line flag or environment variable that you must give to your application so that it uses no more cores than you requested from the batch system. Be careful here. Some software will try to use all cores in a node if you don’t specifically tell it how many you actually requested. You might end up using cores that haven’t been allocated to you, which could adversely affect other users’ jobs. You must ensure you tell your application how many cores to use.

The batch system automatically sets the environment variable $NSLOTS to the number of cores requested on the #$ -pe line (the number you replaced numcores with). You can then use this environment variable to tell your application how many cores to use. See above for examples of this when running MPI and OpenMP jobs.

If you use some other parallel method (e.g., Java threads or Boost library threads) then you should check the application’s documentation for how to specify the number of cores to use. In particular, if running Java applications please see our instructions for running Java applications to ensure you only use the cores which you have reserved in the batch system. Another example is the Gaussian application – this requires you to put the number of cores to use in your data input file!
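
As an illustration of the input-file case, here is a hedged sketch for Gaussian, which reads its core count from a %NProcShared Link 0 line in the input file. The modulefile name and input filename are placeholders – check the Gaussian software page for the correct details:

#!/bin/bash --login
#$ -cwd
#$ -pe smp.pe 8

module load apps/gaussian/g09          # Placeholder modulefile name - check the Gaussian software page

# Rewrite the %NProcShared line so Gaussian uses exactly the allocated cores.
# Assumes input.gjf already contains a line of the form %NProcShared=N
sed -i "s/^%NProcShared=.*/%NProcShared=$NSLOTS/" input.gjf

g09 < input.gjf > output.log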
