SGE to SLURM

The use of SLURM on CSF4 represents a significant change for CSF3 users, who are accustomed to the SGE batch system.

While SGE has served us well, SLURM has been widely adopted by many other HPC sites, is under active development and has features and flexibility that we need as we introduce new platforms for the research community at the University.

This page shows the SLURM commands and jobscript options next to their SGE counterparts to help you move from SGE to SLURM.

Jobscript Special Lines – SGE (#$) vs SLURM (#SBATCH)

The use of the SLURM batch system means your CSF3 jobscripts will no longer work on CSF4.

This is because the CSF3 jobscript special lines beginning with #$ will be ignored by SLURM. Instead, you should use lines beginning with #SBATCH and will need to change the options you use on those lines.

Note that it is #SBATCH (short for SLURM BATCH) and NOT #$BATCH. This is an easy mistake to make when you begin to modify your SGE jobscripts. Do not use a $ (dollar) symbol in the SLURM special lines.

It is possible to have both SGE and SLURM lines in your jobscripts – they will each ignore the other’s special lines. However, CSF4 uses different modulefile names and there are some differences in the way multi-core and multi-node jobs are run, so we advise writing new jobscripts for use on CSF4.
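
For example, the following (purely illustrative) jobscript contains special lines for both systems – SGE acts on the #$ lines and ignores the #SBATCH lines, and SLURM does the opposite:

#!/bin/bash --login
#$ -cwd                # Acted on by SGE, ignored by SLURM
#$ -pe smp.pe 4        # Acted on by SGE, ignored by SLURM
#SBATCH -p multicore   # Acted on by SLURM, ignored by SGE
#SBATCH -n 4           # Acted on by SLURM, ignored by SGE

# Modulefile names and core-count variables still differ between
# CSF3 and CSF4, which is why we advise writing separate jobscripts.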

Examples of CSF3 jobscripts and their equivalent CSF4 jobscripts are given below. One suggestion is to name your CSF4 jobscripts jobscript.sbatch and your CSF3 jobscripts jobscript.qsub, but you can, of course, use any naming scheme you like.

The commands used to submit jobs and check on the queue have also changed. See below for the equivalent commands.

Command-line tools – SGE (qsub, …) vs SLURM (sbatch, …)

SGE Commands (CSF3):

# Batch job submission
qsub jobscript
qsub jobscript arg1 arg2 ...
qsub options -b y executable arg1 ...

# Job queue status
qstat                # Show your jobs (if any)
qstat -u "*"         # Show all jobs
qstat -u username

# Cancel (delete) a job
qdel jobid
qdel jobname
qdel jobid -t taskid
qdel "*"             # Delete all my jobs

# Interactive job
qrsh -l short

# Completed job stats
qacct -j jobid

SLURM Commands (CSF4):

# Batch job submission
sbatch jobscript
sbatch jobscript arg1 arg2 ...
sbatch options --wrap="executable arg1 ..."

# Job queue status
squeue      # An alias for "squeue --me"
\squeue     # Unaliased squeue shows all jobs
squeue -u username

# Cancel (delete) a job
scancel jobid
scancel -n jobname
scancel jobid_taskid
scancel -u $USER       # Delete all my jobs

# Interactive job
srun --pty bash

# Completed job stats
sacct -j jobid
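
For example, a typical sequence of CSF4 commands might look like the following (the job ID is illustrative):

# Submit a jobscript, passing it two arguments
sbatch jobscript.sbatch input.dat results.dat
# sbatch prints: Submitted batch job 123456

# Check your jobs in the queue (squeue is aliased to "squeue --me")
squeue

# Cancel the job using its job ID
scancel 123456

# Once the job has finished, show its accounting stats
sacct -j 123456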

Job Output Files (stdout and stderr)

SGE job output files, not merged by default (CSF3):

# Individual (non-array) jobs
jobscriptname.oJOBID
jobscriptname.eJOBID

# Array jobs
jobscriptname.oJOBID.TASKID
jobscriptname.eJOBID.TASKID

SLURM job output files, merged by default (CSF4):

# Individual (non-array) jobs
slurm-JOBID.out

# Array jobs (see later for more details)
slurm-ARRAYJOBID_TASKID.out

The SLURM files contain the normal and error output that SGE splits into two files.

The naming and merging of the files can be changed using jobscript options (see below) but for now, in the basic jobscripts shown next, we’ll just accept these default names to keep the jobscripts short.

Jobscripts

You will need to rewrite your SGE (CSF3) jobscripts. You could name them somename.slurm if you like, to make it obvious that they are SLURM jobscripts.

Put #SBATCH lines in one block

Please note: all SLURM special lines beginning with #SBATCH must come before ordinary lines that run Linux commands or your application. Any #SBATCH lines appearing after the first non-#SBATCH line will be ignored. For example:

#!/bin/bash --login

# You may put comment lines before and after #SBATCH lines
#SBATCH -p serial
#SBATCH -n 4

# Now the first 'ordinary' line. So no more #SBATCH lines allowed after here
export MY_DATA=~/scratch/data
module load myapp/1.2.3

# Any SBATCH lines here will be ignored!
#SBATCH --job-name  new_job_name

./my_app dataset1.dat

Basic Serial (1-core) Jobscript

Note that in SLURM you should explicitly request one core to be safe – some jobscripts will need the $SLURM_NTASKS environment variable (the equivalent of SGE's $NSLOTS variable), and SLURM only sets that variable if you explicitly request the number of cores with the -n option. The need to do this may change in our config in future.

SGE Jobscript (CSF3):

#!/bin/bash --login
#$ -cwd           # Run in current directory

# Default in SGE is to use 1 core

# Modules have a different name format
module load apps/gcc/appname/x.y.z

serialapp.exe in.dat out.dat

SLURM Jobscript (CSF4):

#!/bin/bash --login
# Default in SLURM: run in current dir

# OPTIONAL LINE: default partition is serial
#SBATCH -p serial # (or --partition=serial)

# OPTIONAL LINE: default is 1 core in serial
#SBATCH -n 1      # (or --ntasks=1) use 1 core
                  # $SLURM_NTASKS will be set.

# Modules have a different name format
module load appname/x.y.z

serialapp.exe in.dat out.dat

Basic Multi-core (single compute node) Parallel Jobscript

Note that requesting a 1-core multicore job is not possible – the job will be rejected. The minimum number of cores is 2.

SGE Jobscript (CSF3):

#!/bin/bash --login
#$ -cwd           # Run in current directory

# Multi-core on a single node (2--32 cores)
#$ -pe smp.pe 4   # Single-node, 4 cores


# Modules have a different name format
module load apps/gcc/appname/x.y.z

# If running an OpenMP app, use:
export OMP_NUM_THREADS=$NSLOTS
openmpapp.exe in.dat out.dat

# Or an app may have its own flag. EG:
multicoreapp.exe -n $NSLOTS in.dat out.dat

SLURM Jobscript (CSF4):

#!/bin/bash --login
# Default in SLURM

# Multi-core on a single node (2--40 cores)
#SBATCH -p multicore # (or --partition=multicore)  
#SBATCH -n 4         # (or --ntasks=4) 4 cores

# Modules have a different name format
module load appname/x.y.z

# If running an OpenMP app, use:
export OMP_NUM_THREADS=$SLURM_NTASKS
openmpapp.exe in.dat out.dat

# Or an app may have its own flag. EG:
multicoreapp.exe -n $SLURM_NTASKS in.dat out.dat

Basic Multi-node Parallel Jobscript

Note that, at the moment, in SLURM you must specify the number of compute nodes to be safe – this ensures your job is allocated the required number of whole compute nodes. The need to do this may change in our config in future.

SGE Jobscript (CSF3):

#!/bin/bash --login
#$ -cwd           # Run in current directory

# Multi-node (all 24 cores in use on each node)
#$ -pe mpi-24-ib.pe 48   # 2 x 24-core nodes

# Modules have a different name format
module load apps/gcc/appname/x.y.z

# Use $NSLOTS to say how many cores to use
mpirun -n $NSLOTS multinodeapp.exe in.dat out.dat

SLURM Jobscript (CSF4):

#!/bin/bash --login
# Default in SLURM

# Multi-node (all 40 cores in use on each node)
#SBATCH -p multinode # (or --partition=multinode)
# The number of nodes is now mandatory!
#SBATCH -N 2         # (or --nodes=2)  2x40 cores
# Can optionally also give total number of cores
#SBATCH -n 80        # (or --ntasks=80)  80 cores

# Modules have a different name format
module load appname/x.y.z

# SLURM knows how many cores to use for mpirun
mpirun multinodeapp.exe in.dat out.dat

For an example of a multi-node mixed-mode (OpenMP+MPI) jobscript, please see the parallel jobs page.
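
As a rough sketch only (the application name, modulefile and core counts below are placeholders – the parallel jobs page has the recommended CSF4 settings), a mixed-mode job usually combines a number of MPI processes per node with a number of OpenMP threads per process:

#!/bin/bash --login
#SBATCH -p multinode          # Multi-node partition
#SBATCH -N 2                  # 2 whole nodes (illustrative)
#SBATCH --ntasks-per-node=4   # 4 MPI processes per node (illustrative)
#SBATCH --cpus-per-task=10    # 10 OpenMP threads per MPI process (illustrative)

module load appname/x.y.z     # Placeholder modulefile name

# Give each MPI process the number of OpenMP threads requested with -c above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun hybridapp.exe in.dat out.dat   # hybridapp.exe is a placeholder name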

Basic Job Array Jobscript

Note that Job Arrays in SLURM have some subtle differences in the way the unique JOBID is handled. Also, if you are renaming the default SLURM output (.out) file then you need to use different wildcards for job arrays.

SGE Jobscript (CSF3):

#!/bin/bash --login
#$ -cwd           # Run in current directory

# Run 100 tasks numbered 1,2,...,100
# (cannot start at zero!!)
# Max permitted array size: 75000
#$ -t 1-100

# Default in SGE is to use 1 core

# Modules have a different name format
module load apps/gcc/appname/x.y.z

# EG: input files are named data.1, data.2, ...
# and output files result.1, result.2, ...
app.exe -in data.$SGE_TASK_ID \
        -out result.$SGE_TASK_ID

SLURM Jobscript (CSF4):

#!/bin/bash --login
# Default in SLURM

# Run 100 tasks numbered 1,2,...,100
# (can start at zero, e.g.: 0-99)
# Max permitted array size: 10000
#SBATCH -a 1-100    # (or --array=1-100)

# Note: This is the number of cores to use
# for each jobarray task, not the number of 
# tasks in the job array (see above).
#SBATCH -n 1      # (or --ntasks=1) use 1 core

# OPTIONAL LINE: default partition is serial
#SBATCH -p serial # (or --partition=serial)

# Modules have a different name format
module load appname/x.y.z

# EG: input files are named data.1, data.2, ...
# and output files result.1, result.2, ...
app.exe -in data.$SLURM_ARRAY_TASK_ID \
        -out result.$SLURM_ARRAY_TASK_ID

More Jobscript Special Lines – SGE vs SLURM

Here are some more example jobscript special lines for achieving common tasks in SGE and SLURM.

Renaming a job and the output .o and .e files

SGE Jobscript:

#!/bin/bash --login
...
# Naming the job is optional.
# Default is name of jobscript
# DOES rename .o and .e output files.
#$ -N jobname

# Naming the output files is optional.
# Default is separate .o and .e files:
# jobname.oJOBID and jobname.eJOBID
# Use of '-N jobname' DOES affect those defaults
#$ -o myjob.out
#$ -e myjob.err

# To join .o and .e into a single file
# similar to Slurm's default behaviour:
#$ -j y

SLURM Jobscript:

#!/bin/bash --login
...
# Naming the job is optional.
# Default is name of jobscript
# Does NOT rename .out file.
#SBATCH -J jobname

# Naming the output files is optional.
# Default is a single file for .o and .e:
# slurm-JOBID.out
# Use of '-J jobname' does NOT affect the default
#SBATCH -o myjob.out
#SBATCH -e myjob.err

# Use wildcards to recreate the SGE names
#SBATCH -o %x.o%j      # %x = SLURM_JOB_NAME
#SBATCH -e %x.e%j      # %j = SLURM_JOB_ID

The $SLURM_JOB_NAME variable will tell you the name of your jobscript, unless the -J jobname flag is used to rename your job, in which case the variable is set to the value of jobname.

If you wanted to use $SLURM_JOB_NAME to always give you the name of the jobscript from within your job, you would have to remove the -J flag. However, the following command run inside your jobscript will give you the name of the jobscript regardless of whether you use the -J flag or not:

scontrol show jobid $SLURM_JOB_ID | grep Command= | awk -F/ '{print $NF}'
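
For example, with the following (illustrative) lines in a jobscript:

#!/bin/bash --login
#SBATCH -J myrun       # Rename the job (the name is illustrative)

echo "$SLURM_JOB_NAME"     # Prints "myrun", not the jobscript filename
scontrol show jobid $SLURM_JOB_ID | grep Command= | awk -F/ '{print $NF}'
                           # Prints the jobscript filename regardless of -J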

Renaming an array job output .o and .e files

An array job uses slurm-ARRAYJOBID_TASKID.out as the default output file for each task in the array job. This can be renamed but you need to use the %A and %a wildcards (not %j).

SGE Jobscript:

#!/bin/bash --login
...
# An array job (cannot start at 0)
#$ -t 1-1000

# Naming the job is optional.
# Default is name of jobscript
#$ -N jobname

# Naming the output files is optional.
# Default is separate .o and .e files:
# jobname.oJOBID and jobname.eJOBID
# Use of '-N jobname' DOES affect those defaults

# To join .o and .e into a single file
# similar to Slurm's default behaviour:
#$ -j y

SLURM Jobscript:

#!/bin/bash --login
...
# An array job (CAN start at 0)
#SBATCH -a 0-999     # (or --array=0-999)

# Naming the job is optional.
# Default is name of jobscript
#SBATCH -J jobname

# Naming the output files is optional.
# Default is a single file for .o and .e:
# slurm-ARRAYJOBID_TASKID.out
# Use of '-J jobname' does NOT affect the default

# Use wildcards to recreate the SGE names
#SBATCH -o %x.o%A.%a   # %x = SLURM_JOB_NAME
#SBATCH -e %x.e%A.%a   # %A = SLURM_ARRAY_JOB_ID
                       # %a = SLURM_ARRAY_TASK_ID

Emailing from a job

SLURM can email you when your job begins, ends or fails.

SGE Jobscript:

#!/bin/bash --login
...
# Mail events: begin, end, abort
#$ -m bea
#$ -M emailaddr@manchester.ac.uk

SLURM Jobscript:

#!/bin/bash --login
...
# Mail events: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-type=ALL
#SBATCH --mail-user=emailaddr@manchester.ac.uk

Note that in SLURM, array jobs only send one email, not an email per job-array task as happens in SGE. If you want an email from every job-array task, add ARRAY_TASKS to the --mail-type flag:

#SBATCH --mail-type=ALL,ARRAY_TASKS
                            #
                            # DO NOT USE IF YOUR ARRAY JOB CONTAINS MORE THAN
                            # 20 TASKS!! THE UoM MAIL ROUTERS WILL BLOCK THE CSF!

But please be aware that you will receive A LOT of email if you run a large job array with this flag enabled.

Job Environment Variables

A number of environment variables are available for use in your jobscripts – these are sometimes useful when creating your own log files, for informing applications how many cores they are allowed to use (we’ve already seen $SLURM_NTASKS in the examples above), and for reading sequentially numbered data files in job arrays.

SGE Environment Variables:

$NSLOTS             # Num cores reserved

$JOB_ID             # Unique jobid number
$JOB_NAME           # Name of job

# For array jobs
$JOB_ID             # Same for all tasks
                    # (e.g., 20173)

$SGE_TASK_ID        # Job array task number
                    # (e.g., 1,2,3,...)
$SGE_TASK_FIRST     # First task id
$SGE_TASK_LAST      # Last task id
$SGE_TASK_STEPSIZE  # Taskid increment: default 1


# You will be unlikely to use these:
$PE_HOSTFILE        # Multi-node job host list
$NHOSTS             # Number of nodes in use
$SGE_O_WORKDIR      # Submit directory

SLURM Environment Variables:

$SLURM_NTASKS         # Num cores from -n flag
$SLURM_CPUS_PER_TASK  # Num cores from -c flag
$SLURM_JOB_ID         # Unique job id number
$SLURM_JOB_NAME       # Name of job

# For array jobs
$SLURM_JOB_ID         # DIFFERENT FOR ALL TASKS 
                      # (e.g., 20173, 20174, 20175, ...)
$SLURM_ARRAY_JOB_ID   # SAME for all tasks
                      # (e.g., 20173)
$SLURM_ARRAY_TASK_ID  # Job array task number
                      # (e.g., 1,2,3,...)
$SLURM_ARRAY_TASK_MIN # First task id
$SLURM_ARRAY_TASK_MAX # Last task id
$SLURM_ARRAY_TASK_STEP  # Increment: default 1
$SLURM_ARRAY_TASK_COUNT # Number of tasks

# You will be unlikely to use these:
$SLURM_JOB_NODELIST   # Multi-node job host list
$SLURM_JOB_NUM_NODES  # Number of nodes in use
$SLURM_SUBMIT_DIR     # Submit directory
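
For example, a job-array task could use several of these variables to write its own log file and read its numbered input file (a minimal sketch – the application and file names are placeholders):

# Inside a job-array jobscript
echo "Task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT in array job $SLURM_ARRAY_JOB_ID using $SLURM_NTASKS cores" \
     >> mylog.${SLURM_ARRAY_JOB_ID}.${SLURM_ARRAY_TASK_ID}
app.exe -n $SLURM_NTASKS -in data.$SLURM_ARRAY_TASK_ID -out result.$SLURM_ARRAY_TASK_ID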

Many more environment variables are available for use in your jobscript. The Slurm sbatch manual (also available on the CSF login node by running man sbatch) documents Input and Output environment variables. The input variables can be set by you before submitting a job to set job options (although we recommend not doing this – it is better to put all options in your jobscript so that you have a permanent record of how you ran the job). The output variables can be used inside your jobscript to get information about the job (e.g., number of cores, job name and so on – we have documented several of these above.)
