Slurm Batch Commands (sbatch, squeue, scancel, sacct)
Audience
The instructions in this section are for users who have been asked to test applications in the Slurm batch system on the upgraded CSF3, thereby helping us to ensure the upgraded system is running as expected.
If you have not been asked to test, please do not request access to the upgraded system yet (answering these requests will slow down our upgrade work!).
Batch Commands
Your applications should be run in the batch system. You’ll need a jobscript (a plain text file) describing your job – its CPU, memory and possibly GPU requirements, and also the commands you actually want the job to run.
Further details on how to write jobscripts are in the sections on serial jobs, parallel jobs, job-arrays and GPU jobs (coming soon).
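As a rough sketch, a minimal serial jobscript might look like the following (the partition name, core count, time limit and application command are placeholders, so check the jobscript sections referenced above for the correct values to use on CSF3):

#!/bin/bash
#SBATCH -p serial        # partition to run the job in (placeholder name)
#SBATCH -n 1             # number of CPU cores
#SBATCH -t 1:00:00       # maximum "wallclock" runtime (here, 1 hour)

# The commands you actually want the job to run go here, for example:
./my_application input.dat

Save this in a plain text file (for example myjobscript) and submit it with the sbatch command described below.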
You’ll then use one or more of the following batch system commands to submit your job to the system and check on its status. These commands should be run from the CSF’s login nodes:
sbatch jobscript
- Submit a job to the batch system, usually by submitting a jobscript. Alternatively you can specify job options on the sbatch command-line. We recommend using a jobscript because this allows you to easily reuse your jobscript every time you want to run the job. Remembering the command-line options you used (possibly months ago) is much more difficult.
The sbatch command will return a unique job-ID number if it accepts the job. You can use this in other commands (see below) and, when requesting support about a job, you should include this number in the details you send in. For example, when submitting a job you will see a message similar to:
[mabcxyz1@login1[csf3] ~]$ sbatch myjobscript
Submitted batch job 373
For scripting purposes, you may prefer just to receive the jobid number from the sbatch command. Add the --parsable flag to achieve this:
sbatch --parsable myjobscript
12345
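For example, a small wrapper script might capture the job ID for later use (a sketch only; myjobscript and the follow-up commands are placeholders):

#!/bin/bash
# Submit the job and capture just the job ID (printed because of --parsable)
JOBID=$(sbatch --parsable myjobscript)
echo "Submitted job ${JOBID}"

# The captured ID can then be passed to the other batch commands, e.g.
squeue -j "${JOBID}"       # check the job's status
# scancel "${JOBID}"       # or remove the job from the queue if needed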
When submitting a job, if you see the following errors, something is wrong:
- sbatch: error: Batch job submission failed: No partition specified or system default partition
You must specify a partition, even for serial jobs. Add to your jobscript:
#SBATCH -p partitionname
- sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)
You must specify a “wallclock” time limit for your job. The maximum permitted is usually 7 days (or 4 days for GPU and HPC Pool jobs). Add to your jobscript (example time formats are shown after this list):
#SBATCH -t timelimit
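As an illustration, Slurm accepts the time limit in several formats; the values below are placeholders, not site defaults:

#SBATCH -t 30               # 30 minutes
#SBATCH -t 4:00:00          # 4 hours (hours:minutes:seconds)
#SBATCH -t 2-00:00:00       # 2 days (days-hours:minutes:seconds)

Only one -t line should appear in a jobscript; pick whichever format you find clearest.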
squeue
- Report the current status of your jobs in the batch system (queued/waiting, running, in error, finished). Note that if you see no jobs listed when you run squeue it means you have no jobs in the system – they have all finished or you haven’t submitted any! Some examples:
In this example squeue returns no output, which means you have no jobs in the queue, either running or waiting:
[mabcxyz1@login1[csf3] ~]$ squeue
[mabcxyz1@login1[csf3] ~]$
In this example squeue shows we have two jobs running (one using 1 core, the other using 8 cores) and one job waiting (it will use 4 cores when it runs):
[mabcxyz1@login1[csf3] ~]$ squeue
JOBID PRIORITY  PARTITION NAME        USER     ACCOUNT ST SUBMIT_TIME    START_TIME     TIME NODES CPUS NODELIST(REASON)
372   0.0000005 multicore mymulticore mabcxyz1 xy01    R  08/03/25 13:02 08/03/25 13:32 2:04 1     8    node1260
371   0.0000005 serial    simple.x    mabcxyz1 xy01    R  09/03/25 14:58 09/03/25 15:02 8:22 1     1    node603
403   0.0000003 himem     mypythonjob mabcxyz1 xy01    PD 11/03/25 09:25 N/A            0:00 1     4    (Resources)
Notes on the columns:
JOBID – every job is given a unique job ID number (please tell us this number if requesting support).
NAME – usually the name of your jobscript.
ST – the job state: R means the job is running, PD means the job is queued waiting, CG means completing (contact us, this may indicate an error).
START_TIME – if running, the date and time the job started; if waiting, N/A.
CPUS – the number of CPU cores the job uses.
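If you have many jobs, the following squeue variations may also be useful (a sketch assuming standard Slurm options; 12345 is a placeholder job ID):

squeue -j 12345        # show only the named job
squeue -u $USER        # show only your own jobs
squeue --start         # show estimated start times for your waiting jobs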
For more information about monitoring jobs, including how to monitor GPU jobs, please see the job monitoring page.
The following commands are used less frequently but can still be run if you need to:
scancel jobid
- To remove your job from the batch system early, either to terminate a running job before it finishes or simply to remove a queued job before it has started running.
Also use this if your job goes into an error state or you decide you don’t want a job to run.
Note that if your job is in the CG state, please leave it in the queue if requesting support. It is easier for us to diagnose the error if we can see the job. We may ask you to scancel the job once we have looked at it – there is usually no way to fix an existing job.
For example, maybe you realise you’ve given a job the wrong input parameters causing it to produce junk results. You don’t need to leave it running to completion (which might be hours or days). Instead you can kill the job using scancel. You need to know the job-ID number of the job:
[mabcxyz1@login1[csf3] ~]$ scancel 12345
The job will eventually be deleted (it may take a minute or two for this to happen). Use squeue to check your list of jobs.
Please also see the Deleting Job Arrays notes.
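If you need to remove several jobs at once, scancel also accepts a user filter and multiple job IDs (a sketch assuming standard Slurm options; the job IDs are placeholders):

scancel -u $USER              # cancel ALL of your jobs, running and queued
scancel 12345 12346 12347     # cancel several specific jobs in one command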
sacct -j jobid
- Advanced users. Once your job has finished you can use this command to get a summary of the job’s wall-clock time, maximum memory consumption and exit status, amongst many other statistics. This is useful for diagnosing why a job failed.
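For example, the following sketch asks sacct for some of the most commonly useful fields (12345 is a placeholder job ID; the field names are standard sacct format fields):

sacct -j 12345 --format=JobID,JobName,Partition,Elapsed,MaxRSS,State,ExitCode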
Further Information
Our own documentation throughout this site provides lots of examples of writing jobscripts and how to submit jobs. Slurm also comes with a set of comprehensive man pages. Some of the most useful ones are:
man sbatch
man squeue
man scancel
man sacct