SGE to SLURM
The use of SLURM on CSF4 represents a significant change for CSF3 users who are used to using the SGE batch system.
While SGE has served us well, SLURM has been widely adopted by many other HPC sites, is under active development and has features and flexibility that we need as we introduce new platforms for the research community at the University.
This page shows the SLURM commands and jobscript options next to their SGE counterparts to help you move from SGE to SLURM.
Jobscript Special Lines – SGE (#$) vs SLURM (#SBATCH)
The use of the SLURM batch system means your CSF3 jobscripts will no longer work on CSF4.
This is because the CSF3 jobscript special lines beginning with #$
will be ignored by SLURM. Instead, you should use lines beginning with #SBATCH
and you will need to change the options you use on those lines.
Note: the SLURM special lines begin with #SBATCH
(short for SLURM BATCH) and NOT #$BATCH.
This is an easy mistake to make when you begin to modify your SGE jobscripts. Do not use a $ (dollar) symbol in the SLURM special lines.

It is possible to have both SGE and SLURM lines in your jobscripts – they will each ignore the other’s special lines. However, CSF4 uses different modulefile names and there are some differences in the way multi-core and multi-node jobs are run, so we advise writing new jobscripts for use on CSF4.
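For illustration only, here is a minimal sketch of a jobscript containing both sets of special lines (each scheduler treats the other's lines as ordinary comments); the partition, core count and application name are placeholders:

```bash
#!/bin/bash --login
#$ -cwd                  # SGE special line (treated as a comment by SLURM)
#$ -pe smp.pe 4          # SGE: 4 cores on one node
#SBATCH -p multicore     # SLURM special line (treated as a comment by SGE)
#SBATCH -n 4             # SLURM: 4 cores on one node

# The modulefile names differ between CSF3 and CSF4, which is one reason
# we advise writing separate jobscripts rather than a combined one.
./my_app in.dat out.dat
```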
Examples of CSF3 jobscripts and their equivalent CSF4 jobscripts are given below. One suggestion is to name your CSF4 jobscripts jobscript.sbatch
and your CSF3 jobscripts jobscript.qsub
, but you can, of course, use any naming scheme you like.
The commands used to submit jobs and check on the queue have also changed. See below for the equivalent commands.
Command-line tools – SGE (qsub, …) vs SLURM (sbatch, …)
SGE Commands (CSF3):

```bash
# Batch job submission
qsub jobscript
qsub jobscript arg1 arg2 ...
qsub options -b y executable arg1 ...

# Job queue status
qstat                 # Show your jobs (if any)
qstat -u "*"          # Show all jobs
qstat -u username

# Cancel (delete) a job
qdel jobid
qdel jobname
qdel jobid -t taskid
qdel "*"              # Delete all my jobs

# Interactive job
qrsh -l short

# Completed job stats
qacct -j jobid
```

SLURM Commands (CSF4):

```bash
# Batch job submission
sbatch jobscript
sbatch jobscript arg1 arg2 ...
sbatch options --wrap="executable arg1 ..."

# Job queue status
squeue                # An alias for "squeue --me"
\squeue               # Unaliased squeue shows all jobs
squeue -u username

# Cancel (delete) a job
scancel jobid
scancel -n jobname
scancel jobid_taskid
scancel -u $USER      # Delete all my jobs

# Interactive job
srun --pty bash

# Completed job stats
sacct -j jobid
```
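For example, the SGE habit of submitting an executable directly with qsub -b y becomes sbatch's --wrap option on CSF4 (the executable name and argument here are placeholders):

```bash
# SGE (CSF3): submit a binary without a jobscript
qsub -b y ./myprog arg1

# SLURM (CSF4): wrap the command in a generated jobscript
sbatch --wrap="./myprog arg1"
```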
Job Output Files (stdout and stderr)
SGE job output files, not merged by default (CSF3):

```bash
# Individual (non-array) jobs
jobscriptname.oJOBID
jobscriptname.eJOBID

# Array jobs
jobscriptname.oJOBID.TASKID
jobscriptname.eJOBID.TASKID
```

SLURM job output files, merged by default (CSF4):

```bash
# Individual (non-array) jobs
slurm-JOBID.out

# Array jobs (see later for more details)
slurm-ARRAYJOBID_TASKID.out
```
The SLURM files contain both the normal and error output that SGE splits into two separate files.
The naming and merging of the files can be changed using jobscript options (see below) but for now, in the basic jobscripts shown next, we’ll just accept these default names to keep the jobscripts short.
Jobscripts
You will need to rewrite your SGE (CSF3) jobscripts. You could name them somename.slurm
if you like, to make it obvious that they are SLURM jobscripts.
Put #SBATCH lines in one block
Please note: all SLURM special lines beginning with #SBATCH
must come before ordinary lines that run Linux commands or your application. Any #SBATCH
lines appearing after the first non-#SBATCH
line will be ignored. For example:
```bash
#!/bin/bash --login
# You may put comment lines before and after #SBATCH lines
#SBATCH -p serial
#SBATCH -n 4

# Now the first 'ordinary' line. So no more #SBATCH lines allowed after here
export MY_DATA=~/scratch/data
module load myapp/1.2.3

# Any SBATCH lines here will be ignored!
#SBATCH --job-name new_job_name

./my_app dataset1.dat
```
Basic Serial (1-core) Jobscript
Note that in SLURM you should explicitly request one core to be safe – some jobscripts will need the $SLURM_NTASKS
environment variable (the equivalent of SGE’s $NSLOTS
variable) and SLURM only sets it if you explicitly request a number of cores with the -n option. The need to do this may change in our config in future.
SGE Jobscript (CSF3):

```bash
#!/bin/bash --login
#$ -cwd             # Run in current directory

# Default in SGE is to use 1 core

# Modules have a different name format
module load apps/gcc/appname/x.y.z

serialapp.exe in.dat out.dat
```

SLURM Jobscript (CSF4):

```bash
#!/bin/bash --login
# Default in SLURM: run in current dir

# OPTIONAL LINE: default partition is serial
#SBATCH -p serial   # (or --partition=serial)

# OPTIONAL LINE: default is 1 core in serial
#SBATCH -n 1        # (or --ntasks=1) use 1 core
                    # $SLURM_NTASKS will be set.

# Modules have a different name format
module load appname/x.y.z

serialapp.exe in.dat out.dat
```
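As a usage sketch, submitting the CSF4 serial jobscript above (assuming you saved it as jobscript.sbatch) and reading its output might look like this; the job id is a placeholder:

```bash
sbatch jobscript.sbatch   # Submit the job; prints "Submitted batch job <jobid>"
squeue                    # Check progress (only your jobs are listed)
cat slurm-20173.out       # Read the output once the job has finished
                          # (20173 is a placeholder job id)
```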
Basic Multi-core (single compute node) Parallel Jobscript
Note that requesting a 1-core job in the multicore partition is not possible – the job will be rejected. The minimum number of cores is 2.
SGE Jobscript (CSF3):

```bash
#!/bin/bash --login
#$ -cwd             # Run in current directory

# Multi-core on a single node (2--32 cores)
#$ -pe smp.pe 4     # Single-node, 4 cores

# Modules have a different name format
module load apps/gcc/appname/x.y.z

# If running an OpenMP app, use:
export OMP_NUM_THREADS=$NSLOTS
openmpapp.exe in.dat out.dat

# Or an app may have its own flag. EG:
multicoreapp.exe -n $NSLOTS in.dat out.dat
```

SLURM Jobscript (CSF4):

```bash
#!/bin/bash --login
# Default in SLURM

# Multi-core on a single node (2--40 cores)
#SBATCH -p multicore # (or --partition=multicore)
#SBATCH -n 4         # (or --ntasks=4) 4 cores

# Modules have a different name format
module load appname/x.y.z

# If running an OpenMP app, use:
export OMP_NUM_THREADS=$SLURM_NTASKS
openmpapp.exe in.dat out.dat

# Or an app may have its own flag. EG:
multicoreapp.exe -n $SLURM_NTASKS in.dat out.dat
```
Basic Multi-node Parallel Jobscript
Note that at the moment in SLURM you must specify the number of compute nodes to be safe – this ensures your cores are allocated as whole compute nodes. The need to do this may change in our config in future.
SGE Jobscript (CSF3):

```bash
#!/bin/bash --login
#$ -cwd                 # Run in current directory

# Multi-node (all 24 cores in use on each node)
#$ -pe mpi-24-ib.pe 48  # 2 x 24-core nodes

# Modules have a different name format
module load apps/gcc/appname/x.y.z

# Use $NSLOTS to say how many cores to use
mpirun -n $NSLOTS multinodeapp.exe in.dat out.dat
```

SLURM Jobscript (CSF4):

```bash
#!/bin/bash --login
# Default in SLURM

# Multi-node (all 40 cores in use on each node)
#SBATCH -p multinode # (or --partition=multinode)

# The number of nodes is now mandatory!
#SBATCH -N 2         # (or --nodes=2) 2x40 cores

# Can optionally also give total number of cores
#SBATCH -n 80        # (or --ntasks=80) 80 cores

# Modules have a different name format
module load appname/x.y.z

# SLURM knows how many cores to use for mpirun
mpirun multinodeapp.exe in.dat out.dat
```
For an example of a multi-node mixed-mode (OpenMP+MPI) jobscript, please see the parallel jobs page.
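As a rough sketch only (the partition name and 40-core nodes are taken from the example above; the modulefile and application names are placeholders; the parallel jobs page is the definitive reference), a mixed-mode jobscript typically combines --ntasks-per-node and --cpus-per-task:

```bash
#!/bin/bash --login
#SBATCH -p multinode          # Partition name as used in the example above
#SBATCH -N 2                  # 2 compute nodes
#SBATCH --ntasks-per-node=4   # 4 MPI processes per node
#SBATCH --cpus-per-task=10    # 10 OpenMP threads per MPI process (4 x 10 = 40 cores per node)

module load appname/x.y.z     # Hypothetical modulefile name

# Each MPI process runs this many OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Assumes the MPI library's SLURM integration starts one process per task
mpirun mixedmodeapp.exe in.dat out.dat
```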
Basic Job Array Jobscript
Note that Job Arrays in SLURM have some subtle differences in the way the unique JOBID is handled. Also, if you are renaming the default SLURM output (.out) file then you need to use different wildcards for job arrays.
SGE Jobscript (CSF3):

```bash
#!/bin/bash --login
#$ -cwd             # Run in current directory

# Run 100 tasks numbered 1,2,...,100
# (cannot start at zero!!)
# Max permitted array size: 75000
#$ -t 1-100

# Default in SGE is to use 1 core

# Modules have a different name format
module load apps/gcc/appname/x.y.z

# EG: input files are named data.1, data.2, ...
# and output files result.1, result.2, ...
app.exe -in data.$SGE_TASK_ID \
        -out result.$SGE_TASK_ID
```

SLURM Jobscript (CSF4):

```bash
#!/bin/bash --login
# Default in SLURM

# Run 100 tasks numbered 1,2,...,100
# (can start at zero, e.g.: 0-99)
# Max permitted array size: 10000
#SBATCH -a 1-100    # (or --array=1-100)

# Note: This is the number of cores to use
# for each jobarray task, not the number of
# tasks in the job array (see above).
#SBATCH -n 1        # (or --ntasks=1) use 1 core

# OPTIONAL LINE: default partition is serial
#SBATCH -p serial   # (or --partition=serial)

# Modules have a different name format
module load appname/x.y.z

# EG: input files are named data.1, data.2, ...
# and output files result.1, result.2, ...
app.exe -in data.$SLURM_ARRAY_TASK_ID \
        -out result.$SLURM_ARRAY_TASK_ID
```
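To see the difference in job ids mentioned above, you could add lines like the following to a CSF4 array jobscript (the values shown in the comments are examples only):

```bash
echo "SLURM_JOB_ID        = $SLURM_JOB_ID"         # Different for every task (e.g., 20174, 20175, ...)
echo "SLURM_ARRAY_JOB_ID  = $SLURM_ARRAY_JOB_ID"   # The same for every task (e.g., 20173)
echo "SLURM_ARRAY_TASK_ID = $SLURM_ARRAY_TASK_ID"  # The task number (e.g., 1, 2, 3, ...)
```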
More Jobscript special lines – SGE vs SLURM
Here are some more examples of jobscript special lines for achieving common tasks in SGE and SLURM.
Renaming a job and the output .o and .e files
SGE Jobscript:

```bash
#!/bin/bash --login
...
# Naming the job is optional.
# Default is name of jobscript
# DOES rename .o and .e output files.
#$ -N jobname

# Naming the output files is optional.
# Default is separate .o and .e files:
#   jobname.oJOBID and jobname.eJOBID
# Use of '-N jobname' DOES affect those defaults
#$ -o myjob.out
#$ -e myjob.err

# To join .o and .e into a single file
# similar to Slurm's default behaviour:
#$ -j y
```

SLURM Jobscript:

```bash
#!/bin/bash --login
...
# Naming the job is optional.
# Default is name of jobscript
# Does NOT rename .out file.
#SBATCH -J jobname

# Naming the output files is optional.
# Default is a single file for .o and .e:
#   slurm-JOBID.out
# Use of '-J jobname' does NOT affect the default
#SBATCH -o myjob.out
#SBATCH -e myjob.err

# Use wildcards to recreate the SGE names
#SBATCH -o %x.o%j   # %x = SLURM_JOB_NAME
#SBATCH -e %x.e%j   # %j = SLURM_JOB_ID
```
The $SLURM_JOB_NAME
variable will tell you the name of your jobscript, unless the -J jobname
flag is used to rename your job, in which case the environment variable is set to the value of jobname.
If you wanted $SLURM_JOB_NAME
to always give you the name of the jobscript from within your job, you would have to remove the -J
flag. However, the following command, run inside your jobscript, will give you the name of the jobscript regardless of whether you use the -J
flag:
```bash
scontrol show jobid $SLURM_JOB_ID | grep Command= | awk -F/ '{print $NF}'
```
Renaming an array job output .o and .e files
An array job uses slurm-ARRAYJOBID_TASKID.out
as the default output file for each task in the array job. This can be renamed but you need to use the %A
and %a
wildcards (not %j
).
SGE Jobscript:

```bash
#!/bin/bash --login
...
# An array job (cannot start at 0)
#$ -t 1-1000

# Naming the job is optional.
# Default is name of jobscript
#$ -N jobname

# Naming the output files is optional.
# Default is separate .o and .e files per task:
#   jobname.oJOBID.TASKID and jobname.eJOBID.TASKID
# Use of '-N jobname' DOES affect those defaults

# To join .o and .e into a single file
# similar to Slurm's default behaviour:
#$ -j y
```

SLURM Jobscript:

```bash
#!/bin/bash --login
...
# An array job (CAN start at 0)
#SBATCH -a 0-999    # (or --array=0-999)

# Naming the job is optional.
# Default is name of jobscript
#SBATCH -J jobname

# Naming the output files is optional.
# Default is a single file for .o and .e:
#   slurm-ARRAYJOBID_TASKID.out
# Use of '-J jobname' does NOT affect the default

# Use wildcards to recreate the SGE names
#SBATCH -o %x.o%A.%a  # %x = SLURM_JOB_NAME
#SBATCH -e %x.e%A.%a  # %A = SLURM_ARRAY_JOB_ID
                      # %a = SLURM_ARRAY_TASK_ID
```
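For example, with the job name jobname used above and a placeholder array job id of 20173, those wildcards would produce per-task files such as:

```bash
jobname.o20173.0    # stdout of task 0
jobname.e20173.0    # stderr of task 0
jobname.o20173.1    # stdout of task 1, and so on
```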
Emailing from a job
SLURM can email you when your job begins, ends or fails.
SGE Jobscript:

```bash
#!/bin/bash --login
...
# Mail events: begin, end, abort
#$ -m bea
#$ -M emailaddr@manchester.ac.uk
```

SLURM Jobscript:

```bash
#!/bin/bash --login
...
# Mail events: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-type=ALL
#SBATCH --mail-user=emailaddr@manchester.ac.uk
```
Note that in SLURM, array jobs only send one email, not an email per job-array task as happens in SGE. If you want an email from every job-array task, add ARRAY_TASKS
to the --mail-type
option:
```bash
#SBATCH --mail-type=ALL,ARRAY_TASKS
#
# DO NOT USE IF YOUR ARRAY JOB CONTAINS MORE THAN
# 20 TASKS!! THE UoM MAIL ROUTERS WILL BLOCK THE CSF!
```
But please be aware that you will receive A LOT of email if you run a large job array with this flag enabled.
Job Environment Variables
A number of environment variables are available for use in your jobscripts – these are sometimes useful when creating your own log files, for informing applications how many cores they are allowed to use (we’ve already seen $SLURM_NTASKS
in the examples above), and for reading sequentially numbered data files in job arrays.
SGE Environment Variables:

```bash
$NSLOTS             # Num cores reserved
$JOB_ID             # Unique jobid number
$JOB_NAME           # Name of job

# For array jobs
$JOB_ID             # Same for all tasks
                    # (e.g., 20173)
$SGE_TASK_ID        # Job array task number
                    # (e.g., 1,2,3,...)
$SGE_TASK_FIRST     # First task id
$SGE_TASK_LAST      # Last task id
$SGE_TASK_STEPSIZE  # Taskid increment: default 1

# You will be unlikely to use these:
$PE_HOSTFILE        # Multi-node job host list
$NHOSTS             # Number of nodes in use
$SGE_O_WORKDIR      # Submit directory
```

SLURM Environment Variables:

```bash
$SLURM_NTASKS            # Num cores from -n flag
$SLURM_CPUS_PER_TASK     # Num cores from -c flag
$SLURM_JOB_ID            # Unique job id number
$SLURM_JOB_NAME          # Name of job

# For array jobs
$SLURM_JOB_ID            # DIFFERENT FOR ALL TASKS
                         # (e.g., 20173, 20174, 20175, ...)
$SLURM_ARRAY_JOB_ID      # SAME for all tasks
                         # (e.g., 20173)
$SLURM_ARRAY_TASK_ID     # Job array task number
                         # (e.g., 1,2,3,...)
$SLURM_ARRAY_TASK_MIN    # First task id
$SLURM_ARRAY_TASK_MAX    # Last task id
$SLURM_ARRAY_TASK_STEP   # Increment: default 1
$SLURM_ARRAY_TASK_COUNT  # Number of tasks

# You will be unlikely to use these:
$SLURM_JOB_NODELIST      # Multi-node job host list
$SLURM_JOB_NUM_NODES     # Number of nodes in use
$SLURM_SUBMIT_DIR        # Submit directory
```
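As a small sketch of how these variables might be used (the application, modulefile and file names are hypothetical), a CSF4 jobscript could build its own log file name from the job id:

```bash
#!/bin/bash --login
#SBATCH -p multicore
#SBATCH -n 4

module load appname/x.y.z        # Hypothetical modulefile

# Build a log file name that is unique to this job
LOGFILE=myapp.$SLURM_JOB_ID.log
echo "Job $SLURM_JOB_NAME ($SLURM_JOB_ID) is using $SLURM_NTASKS cores" > $LOGFILE

multicoreapp.exe -n $SLURM_NTASKS in.dat out.dat >> $LOGFILE 2>&1
```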
Many more environment variables are available for use in your jobscript. The Slurm sbatch manual (also available on the CSF login node by running man sbatch
) documents Input and Output environment variables. The input variables can be set before submitting a job to control job options (although we recommend not doing this – it is better to put all options in your jobscript so that you have a permanent record of how you ran the job). The output variables can be used inside your jobscript to get information about the job (e.g., number of cores, job name and so on – we have documented several of these above).