Parallel Jobs

Current Configuration and Parallel Partitions

For jobs that require two or more CPU cores, the appropriate SLURM partition should be selected from the table below. Please also consult the software page specific to the code / application you are running for advice on the most suitable partition.
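If you want to confirm which parallel partitions exist on the system before writing a jobscript, the standard SLURM sinfo command can be used. A brief sketch (the partition names multicore and multinode are those described later on this page):

# List the parallel partitions described on this page
sinfo -p multicore,multinode

# Or show just the partition name, time limit and node count for each
sinfo -p multicore,multinode -o "%P %l %D"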

Parallel jobscripts using a single compute node

These jobscripts will use CPU cores on a single compute node. There are a couple of ways that you can run your app in the jobscript, depending on which parallel methods the application supports.

Multicore parallel OpenMP and small MPI jobs

For jobs requiring between 2 and 40 cores, which will fit on a single compute node. This could be a multi-core (OpenMP) app or a small MPI job. This is the recommended jobscript for single-node parallel jobs. The jobscript will require the following lines:

#!/bin/bash --login
# Runs in current dir by default
#SBATCH -p multicore  # (or --partition=) 
#SBATCH -n numtasks   # (or --ntasks=) Number of MPI procs or CPU cores. 2--40.
                      # The $SLURM_NTASKS variable will be set to this value.

# Can load modulefiles
module load appname/x.y.x

# For an OpenMP (multicore) app, inform the app how many cores to use, then run the app
export OMP_NUM_THREADS=$SLURM_NTASKS
openmpapp arg1 arg2 ...

# For an MPI app SLURM knows to run numtasks MPI processes (from -n above)
mpirun mpiapp arg1 arg2 ...
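Once written, the jobscript is submitted with the standard sbatch command. A brief sketch, assuming the jobscript has been saved in a file named jobscript (the filename is just an example):

# Submit the jobscript to the batch system
sbatch jobscript

# Check the state of your queued and running jobs
squeue -u $USER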

Multicore parallel for OpenMP jobs only

For jobs requiring between 2 and 40 cores, which will fit on a single compute node. This method is for multicore applications (usually OpenMP) and should NOT be used for MPI applications: because no task count is given, SLURM will start only one MPI process, which is probably not what you want for an MPI job. The jobscript will require the following lines:

#!/bin/bash --login
# Runs in current dir by default
#SBATCH -p multicore    # (or --partition=) 
#SBATCH -c corespertask # (or --cpus-per-task=) Number of cores to use for OpenMP (2--40)
                        # The $SLURM_CPUS_PER_TASK variable will be set to this value.
# Can load modulefiles
module load appname/x.y.x

# For an OpenMP (multicore) app say how many cores to use then run the app
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
openmpapp arg1 arg2 ...

# If you start an MPI app SLURM will only start 1 MPI process!!
# This is because the $SLURM_NTASKS variable is NOT defined (no -n, --ntasks flag above)
# mpirun mpiapp arg1 arg2 ...
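Some multicore applications expect the number of threads as a command-line argument rather than (or as well as) reading OMP_NUM_THREADS. A sketch of that case – threadedapp and its --threads flag are hypothetical placeholders, so check your application's documentation for the real option:

# Hypothetical app that takes its thread count as an argument
threadedapp --threads $SLURM_CPUS_PER_TASK arg1 arg2 ...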

Parallel jobscripts using multiple compute nodes

These jobscripts will use CPU cores on multiple compute nodes. There are a couple of ways you can run your job, depending on which parallel methods the application supports.

Multinode parallel large MPI jobs

These jobs will use at least two 40-core compute nodes (but you can request more than two nodes) and will use all of the cores on each compute node. Your job has exclusive use of the compute nodes (no other jobs will be running on them). There are two working methods of specifying such jobs – specifying the number of nodes only, or specifying both the number of nodes and the total number of tasks (recommended). An older method that specified only the total number of cores no longer works (see below).

Method 1: specify the total number of cores (tasks) – DO NOT USE

June 2024: THIS METHOD NO LONGER WORKS – YOU MUST SPECIFY THE NUMBER OF NODES (-N) AND OPTIONALLY THE NUMBER OF TASKS (-n)

#!/bin/bash --login
# Runs in current dir by default
#SBATCH -p multinode  # (or --partition=) 
#SBATCH -n numtasks   # (or --ntasks=) 80 or more in multiples of 40. 
# This old method does NOT use the -N (--nodes) flag, which is now required (see Methods 2 and 3).

# Can load modulefiles
module load appname/x.y.x

# For an MPI app SLURM knows how many cores to run
mpirun mpiapp arg1 arg2 ...

Remember: specifying only the total number of cores (e.g., -n 80) will NO LONGER WORK.

Method 2: specify the number of nodes

THIS METHOD WORKS (but $SLURM_NTASKS will not be set – see below if you need that environment variable)

#!/bin/bash --login
# Runs in current dir by default
#SBATCH -p multinode  # (or --partition=) 
#SBATCH -N numnodes   # (or --nodes=) 2 or more. The job uses all 40 cores on each node.
                      # Note: $SLURM_NTASKS is NOT set if you use only the -N (--nodes) flag.
                      # To use $SLURM_NTASKS in your jobscript, add -n (see below)!!

# Can load modulefiles
module load appname/x.y.x

# For an MPI app SLURM knows to run 40 MPI tasks on each compute node
mpirun mpiapp arg1 arg2 ...

See the mixed-mode example below for a more complex multi-node job.
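If your jobscript needs a total task count but you are only using the -N flag (so $SLURM_NTASKS is not set), one option is to derive it from the $SLURM_JOB_NUM_NODES variable, which SLURM does set. A minimal sketch, assuming the 40-core nodes described above:

# $SLURM_JOB_NUM_NODES is set by SLURM when -N (--nodes) is used
NUMTASKS=$(( SLURM_JOB_NUM_NODES * 40 ))
echo "Job will use $NUMTASKS cores across $SLURM_JOB_NUM_NODES nodes"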

Method 3: specify the number of nodes AND number of cores (tasks) – RECOMMENDED

THIS METHOD WORKS (and $SLURM_NTASKS will be set)

#!/bin/bash --login
# Runs in current dir by default
#SBATCH -p multinode  # (or --partition=) 
#SBATCH -N numnodes   # (or --nodes=) 2 or more. The job uses all 40 cores on each node.
#SBATCH -n numtasks   # (or --ntasks=) 80 or more - the TOTAL number of tasks in your job.

# Can load modulefiles
module load appname/x.y.x

# For an MPI app SLURM knows to run 40 MPI tasks on each compute node
mpirun mpiapp arg1 arg2 ...

See the mixed-mode example below for a more complex multi-node job.
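If you want to confirm how SLURM has spread the tasks across the nodes, a quick check can be added to the jobscript before the mpirun line. A sketch using the standard srun and hostname commands:

# Print how many tasks have been placed on each node (one hostname per task)
srun hostname | sort | uniq -c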

Multinode parallel large mixed-mode (MPI+OpenMP)

Mixed-mode jobs run a smaller number of MPI processes which themselves use OpenMP (multicore) to provide some of the parallelism. If your application supports mixed-mode parallelism, this method can often provide performance benefits over entirely MPI parallel jobs, by reducing the amount of MPI communication between the nodes.

These jobs will use at least two 40-core compute nodes (but you can request more than two nodes) and will use all of the cores on each compute node. Your job has exclusive use of the compute nodes (no other jobs will be running on them). The jobscript will require the following lines (see below for specific examples):

#!/bin/bash --login
# Runs in current dir by default
#SBATCH -p multinode    # (or --partition=)
#SBATCH -N numnodes     # (or --nodes=) Use all cores on this many compute nodes (2 or more.)
#SBATCH -n numtasks     # (or --ntasks=) Number of MPI processes to run in total. They will be
                        #                spread across the requested number of nodes.
#SBATCH -c corespertask # (or --cpus-per-task=) Number of cores to use for OpenMP in each MPI process.

# Can load modulefiles
module load appname/x.y.x

# Inform each MPI process how many OpenMP cores to use
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# For an MPI+OpenMP app SLURM knows to run numtasks MPI procs across numnodes nodes
mpirun --map-by ppr:N:RES:pe=$OMP_NUM_THREADS mpiapp arg1 arg2 ...
                   #        #
                   #        # pe=$OMP_NUM_THREADS gives each MPI process access to
                   #        # the cores needed for its OpenMP threads (see below).
                   #
                   # ppr is 'processes per resource'. It means we are about to specify
                   # how MPI processes and OpenMP threads should be placed on the nodes.
                   # N is a number of MPI processes (you should use a number here.) 
                   # RES is the resource unit (for example 'node' or 'socket'.)
                   # You will have N MPI processes placed on each RES.
                   # See below for a real example.

For example, to run a mixed-mode application using three compute nodes, where each compute node runs two MPI processes (each placed on a socket – aka a CPU, because our nodes have two CPUs in them) and each MPI process runs 20 OpenMP threads:

+-- Compute node 1 --+     +-- Compute node 2 --+     +-- Compute node 3 --+
|Socket (CPU0)       |     |Socket (CPU0)       |     |Socket (CPU0)       |
|   MPI Proc         |     |   MPI Proc         |     |   MPI Proc         |
|     20 OpenMP cores|     |     20 OpenMP cores|     |     20 OpenMP cores|
+- - - - - - - - - - +     +- - - - - - - - - - +     +- - - - - - - - - - +
|Socket (CPU1)       |     |Socket (CPU1)       |     |Socket (CPU1)       |
|   MPI Proc         |     |   MPI Proc         |     |   MPI Proc         | 
|     20 OpenMP cores|     |     20 OpenMP cores|     |     20 OpenMP cores|
+--------------------+     +--------------------+     +--------------------+
The MPI processes will communicate with each other, using the InfiniBand network between the nodes.

The jobscript is as follows:

#!/bin/bash --login
#SBATCH -p multinode
#SBATCH -N 3          # 3 whole compute nodes (we use all 40 cores on each compute node, 120 in total)
#SBATCH -n 6          # 6 MPI processes in total (2 per compute node)
#SBATCH -c 20         # 20 cores to be used by each MPI process

# Load any modulefiles
module load appname/x.y.x

# Inform each MPI process how many OpenMP threads to use (20 in this example)
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Run the MPI processes (SLURM knows to use 3 compute nodes, each running 2 processes in this example).
mpirun --map-by ppr:1:socket:pe=$OMP_NUM_THREADS mpiapp arg1 arg2 ...
                 #             #
                 #             # pe=$OMP_NUM_THREADS gives each MPI process access to
                 #             # the (20) cores of the socket on which it is running.
                 #
                 # ppr is 'processes per resource'. It describes how MPI processes should
                 # be placed on the node. In this case 1 process per socket is specified.
                 # So the two MPI processes will each run on their own socket in the node
                 # because our compute nodes have two sockets (CPUs) in them.

Note that the above jobscript can be adapted to run single-node MPI+OpenMP mixed-mode jobs by using the multicore partition and requesting only a single compute node (-N 1) with two tasks (-n 2).
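A minimal sketch of such a single-node adaptation is given below (two MPI processes, one per socket, each running 20 OpenMP threads; the application and modulefile names are placeholders as in the examples above):

#!/bin/bash --login
#SBATCH -p multicore  # Single-node partition
#SBATCH -N 1          # One compute node (all 40 cores)
#SBATCH -n 2          # 2 MPI processes in total
#SBATCH -c 20         # 20 cores to be used by each MPI process

# Load any modulefiles
module load appname/x.y.x

# Inform each MPI process how many OpenMP threads to use (20 in this example)
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# One MPI process per socket, each given access to the cores for its OpenMP threads
mpirun --map-by ppr:1:socket:pe=$OMP_NUM_THREADS mpiapp arg1 arg2 ...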

Partitions Summary

Currently two parallel partitions are available. Details of each are given below, with corresponding job limits.

Single Node Multi-core(SMP) and small MPI Jobs

Partition name: multicore

  • For jobs of 2 to 40 cores (40 is a new maximum since Intel Cascade Lake nodes were installed)
  • Jobs will use a single compute node. Use for OpenMP (or other multicore/SMP jobs) and small MPI jobs.
  • 4GB of memory per core.
  • 7 day runtime limit.
  • Currently, jobs may be placed on Cascade Lake (max 40 cores) CPUs.
  • Large pool of cores.
Optional Resources: NONE
Max cores per job, RAM per core: NONE
Additional usage guidance: NONE
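To check the current limits of a partition (for example its maximum runtime) directly from the command line, the standard scontrol command can be used. A brief sketch:

# Show the configured limits of the multicore partition (e.g., MaxTime, MaxNodes)
scontrol show partition multicore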

Multi-node large MPI Jobs

Partition name: multinode

  • For MPI jobs of 80 or more cores, in multiples of 40, up to a maximum of 200.
  • 40-core jobs not permitted as they fit on one compute node so do not utilise the InfiniBand network (see multicore for 2–40 core jobs).
  • 4GB RAM per core.
  • 7 day runtime limit.
  • Currently, jobs may be placed on Cascade Lake (max 40 cores) CPUs.
  • Large pool of cores.
Optional Resources: NONE
Max cores per job, RAM per core: NONE
Additional usage guidance: NONE

Last modified on September 20, 2024 at 12:26 pm by George Leaver