Slurm Batch Commands (sbatch, squeue, scancel, sacct)

Audience

The instructions in this section are for users who have been asked to test applications in the Slurm batch system on the upgraded CSF3, thereby helping us to ensure the upgraded system is running as expected.

PLEASE DO NOT SUBMIT A REQUEST ASKING FOR ACCESS TO THE NEW SYSTEM – WE WILL CONTACT YOU AT THE APPROPRIATE TIME!!!
(answering these requests will slow down our upgrade work!)

Batch Commands

Your applications should be run in the batch system. You’ll need a jobscript (a plain text file) describing your job – its CPU, memory and possibly GPU requirements, and also the commands you actually want the job to run.
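As an illustration, a minimal jobscript for a 1-core job might look like the sketch below. The partition name serial is taken from the squeue example further down this page. The one-hour time limit and the commands at the bottom are placeholders, so adjust them for your own job.

```shell
#!/bin/bash
# Partition to run in (assumed here: serial, for 1-core jobs)
#SBATCH -p serial
# Wallclock time limit in days-hours format (assumed here: 0 days, 1 hour)
#SBATCH -t 0-1

# The commands you actually want the job to run (placeholders)
msg="Job running on $(uname -n)"
echo "$msg"
```

Lines beginning #SBATCH are read by the batch system as job options. To the shell they are ordinary comments, so the same file also runs as a normal script.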

Further details on how to write jobscripts are in the sections on serial jobs, parallel jobs, job-arrays and GPU jobs (coming soon).

You’ll then use one or more of the following batch system commands to submit your job to the system and check on its status. These commands should be run from the CSF’s login nodes:

sbatch jobscript
Submit a job to the batch system, usually by submitting a jobscript. Alternatively you can specify job options on the sbatch command-line. We recommend using a jobscript because this allows you to easily reuse your jobscript every time you want to run the job. Remembering the command-line options you used (possibly months ago) is much more difficult.

The sbatch command will return a unique job-ID number if it accepts the job. You can use this in other commands (see below) and, when requesting support about a job, you should include this number in the details you send in.

For example, when submitting a job you will see a message similar to:

[mabcxyz1@login1[csf3] ~]$ sbatch myjobscript
Submitted batch job 373

For scripting purposes, you may prefer just to receive the jobid number from the sbatch command. Add the --parsable flag to achieve this:

sbatch --parsable myjobscript
12345
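A sketch of how the --parsable output might be captured in a script and reused. The helper name parse_jobid is hypothetical. It strips the ";clustername" suffix that sbatch can append to the job ID on multi-cluster setups. Whether that suffix appears on the CSF is an assumption we have not checked.

```shell
# Hypothetical helper: keep only the job ID from `sbatch --parsable` output,
# dropping an optional ";clustername" suffix.
parse_jobid() {
    printf '%s\n' "${1%%;*}"
}

# Only attempt a real submission where sbatch and the jobscript exist
# (for example on a login node).
if command -v sbatch >/dev/null 2>&1 && [ -f myjobscript ]; then
    jobid=$(parse_jobid "$(sbatch --parsable myjobscript)")
    echo "Submitted job $jobid"
    # Show the status of just this job
    squeue -j "$jobid"
fi
```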

If sbatch rejects your job with one of the following errors, your jobscript is missing a required setting:

  • sbatch: error: Batch job submission failed: No partition specified or system default partition

    You must specify a partition, even for serial jobs. Add to your jobscript: #SBATCH -p partitionname.

  • sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)

    You must specify a “wallclock” time limit for your job. The maximum permitted is usually 7 days (or 4 days for GPU and HPC Pool jobs). Add to your jobscript: #SBATCH -t timelimit.
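Both errors can be caught before submitting. The sketch below defines a hypothetical helper, check_jobscript, that looks for a partition line and a time-limit line in a jobscript. It is a simplification: it only recognises an option that comes first on its own #SBATCH line, and it does not check that the values are valid.

```shell
# Hypothetical helper: report which required #SBATCH settings are missing.
# Returns 0 if both are present, 1 otherwise.
check_jobscript() {
    script=$1
    status=0
    if ! grep -Eq '^#SBATCH[[:space:]]+(-p|--partition)' "$script"; then
        echo "$script: no partition set (add: #SBATCH -p partitionname)"
        status=1
    fi
    if ! grep -Eq '^#SBATCH[[:space:]]+(-t|--time)' "$script"; then
        echo "$script: no time limit set (add: #SBATCH -t timelimit)"
        status=1
    fi
    return $status
}
```

It could then be used as: check_jobscript myjobscript && sbatch myjobscript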

squeue
Report the current status of your jobs in the batch system (queued/waiting, running, in error, finished). Note that if you see no jobs listed when you run squeue it means you have no jobs in the system – they have all finished or you haven’t submitted any!

Some examples:

In this example squeue returns no output which means you have no jobs in the queue, either running or waiting:

[mabcxyz1@login1[csf3] ~]$ squeue
[mabcxyz1@login1[csf3] ~]$

In this example squeue shows two running jobs (one using 1 core, the other using 8 cores) and one waiting job (it will use 4 cores when it runs):

[mabcxyz1@login1[csf3] ~]$ squeue
JOBID PRIORITY  PARTITION NAME         USER     ACCOUNT ST SUBMIT_TIME    START_TIME     TIME NODES CPUS NODELIST(REASON)
  372 0.0000005 multicore mymulticore  mabcxyz1 xy01    R  08/03/25 13:02 08/03/25 13:32 2:04     1    8 node1260
  371 0.0000005 serial    simple.x     mabcxyz1 xy01    R  09/03/25 14:58 09/03/25 15:02 8:22     1    1 node603
  403 0.0000003 himem     mypythonjob  mabcxyz1 xy01    PD 11/03/25 09:25 N/A            0:00     1    4 (Resources)
   #                          #                         #                 ###                          #
   #                          #                         #                  #                           # Number of
   #                          #                         #                  #                           # CPU cores
   #                          #                         #                  #
   #                          #                         #                  # If running: date & time the job started
   #                          #                         #                  # If waiting: N/A
   #                          #                         #
   #                          #                         # R   - job is running
   #                          #                         # PD  - job is queued waiting
   #                          #                         # CG  - Completing (contact us, may indicate an error)
   #                          #
   #                          # Usually the name of your jobscript
   #
   # Every job is given a unique job ID number
   # (please tell us this number if requesting support)
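If you have many jobs, a per-state count can be easier to read than the full listing. The sketch below uses the standard squeue options -h (no header), -u (user) and -o '%t' (print only the short state code). The helper name summarise_states is hypothetical. Note also that the default squeue output on the CSF may be customised, so this output will look different.

```shell
# Hypothetical helper: read one state code per line on stdin
# and print "STATE COUNT" pairs, one per line.
summarise_states() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Only query the batch system where squeue is available.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "$USER" -o '%t' | summarise_states
fi
```

For the three jobs in the example above, this would report one PD job and two R jobs.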

For more information about monitoring jobs, including how to monitor GPU jobs, please see the job monitoring page.

The following commands are used less frequently but can still be run if you need to:

scancel jobid
Remove your job from the batch system early, either to terminate a running job before it finishes or to remove a queued job before it has started running.

Also use this if your job goes into an error state or you decide you don’t want a job to run.

However, if your job is in the CG state and you are requesting support, please leave it in the queue. It is easier for us to diagnose the error if we can see the job. We may ask you to scancel the job once we have looked at it – there is usually no way to fix an existing job.

For example, maybe you realise you’ve given a job the wrong input parameters causing it to produce junk results. You don’t need to leave it running to completion (which might be hours or days). Instead you can kill the job using scancel. You need to know the job-ID number of the job:

[mabcxyz1@login1[csf3] ~]$ scancel 12345

The job will eventually be deleted (it may take a minute or two for this to happen). Use squeue to check your list of jobs.
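Because deletion is not instant, a script that cancels a job may want to wait until the job has actually left the queue. A minimal sketch follows. The function name cancel_and_wait and the 10-second polling interval are our own choices, and it accepts plain numeric job IDs only.

```shell
# Hypothetical helper: cancel a job, then poll squeue until it has gone.
cancel_and_wait() {
    jobid=$1
    # Refuse anything that is not a plain number.
    case "$jobid" in
        ''|*[!0-9]*)
            echo "Expected a numeric job ID, got: '$jobid'" >&2
            return 1
            ;;
    esac
    scancel "$jobid" || return 1
    # squeue prints nothing for this job once it has left the queue.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep 10
    done
    echo "Job $jobid is no longer in the queue"
}
```

For example: cancel_and_wait 12345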

Please also see the Deleting Job Arrays notes.

sacct -j jobid
Advanced users. Once your job has finished you can use this command to get a summary of information for wall-clock time, max memory consumption and exit status amongst many other statistics about the job. This is useful for diagnosing why a job failed.
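A sketch of a typical query. The field names passed to --format (JobID, JobName, State, ExitCode, Elapsed, MaxRSS) are standard sacct fields, but this particular selection is only a suggestion. sacct reports ExitCode as a pair "exitcode:signal", and the hypothetical helper describe_exit turns that pair into a short description.

```shell
# Hypothetical helper: describe sacct's "exitcode:signal" pair in words.
describe_exit() {
    code=${1%%:*}
    signal=${1##*:}
    if [ "$signal" != "0" ]; then
        echo "terminated by signal $signal"
    elif [ "$code" != "0" ]; then
        echo "exited with status $code"
    else
        echo "exited normally"
    fi
}

# Pass the job ID as the first argument when running this as a script.
jobid=${1:-}
if command -v sacct >/dev/null 2>&1 && [ -n "$jobid" ]; then
    # Summary table covering the job and its steps
    sacct -j "$jobid" --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS
    # -n: no header, -P: unpadded output, -X: the job itself, not its steps
    describe_exit "$(sacct -n -P -X -j "$jobid" --format=ExitCode)"
fi
```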

Further Information

Our own documentation throughout this site provides lots of examples of writing jobscripts and how to submit jobs. Slurm also comes with a set of comprehensive man pages. Some of the most useful ones are:

  • man sbatch
  • man squeue
  • man scancel
  • man sacct

Last modified on March 12, 2025 at 1:53 pm by George Leaver