{"id":8988,"date":"2025-03-11T15:22:52","date_gmt":"2025-03-11T15:22:52","guid":{"rendered":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=8988"},"modified":"2026-04-15T11:16:12","modified_gmt":"2026-04-15T10:16:12","slug":"s-commands","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/batch-slurm\/s-commands\/","title":{"rendered":"Slurm Batch Commands (sbatch, squeue, scancel, sacct)"},"content":{"rendered":"<h2>Batch Commands<\/h2>\n<p>Your applications should be run in the batch system. You\u2019ll need a <em>jobscript<\/em> (a plain text file) describing your job &#8211; its CPU, memory and possibly GPU requirements, and also the commands you actually want the job to run.<\/p>\n<p>Further details on how to write jobscripts are in the sections on <a href=\"\/csf3\/batch-slurm\/serial-jobs-slurm\">serial jobs<\/a>, <a href=\"\/csf3\/batch-slurm\/parallel-jobs-slurm\">parallel jobs<\/a>, <a href=\"\/csf3\/batch\/job-arrays-slurm\/\">job-arrays<\/a> and <a href=\"\/csf3\/batch\/gpu-jobs-slurm\/\">GPU jobs<\/a>.<\/p>\n<p>You\u2019ll then use one or more of the following batch system commands to <em>submit<\/em> your job to the system and check on its status. These commands should be run from the CSF&#8217;s login nodes.<\/p>\n<h2>Job submission using sbatch<\/h2>\n<dl>\n<dt id=\"sbatch\"><code>sbatch <em>jobscript<\/em><\/code><\/dt>\n<dd>Submit a job to the batch system, usually by submitting a <em>jobscript<\/em>. Alternatively you can specify job options on the <code>sbatch<\/code> command-line. We recommend using a jobscript because this allows you to easily reuse your jobscript every time you want to run the job. Remembering the command-line options you used (possibly months ago) is much more difficult.<\/p>\n<p>The <code>sbatch<\/code> command will return a unique job-ID number if it accepts the job. 
You can use this in other commands (see below) and, when requesting support about a job, you should include this number in the details you send in.<\/p>\n<p>For example, when submitting a job you will see a message similar to:<\/p>\n<pre>\r\n[mabcxyz1@login1[csf3] ~]$ <strong>sbatch <em>myjobscript<\/em><\/strong>\r\nSubmitted batch job 373\r\n<\/pre>\n<p>Adding flags to the command-line will override the same flag in the jobscript, if present, or add a setting if not currently specified in the jobscript. This is a convenient way to submit the job again but with a different setting to what you&#8217;ve put in the jobscript. Command-line flags must come <em>before<\/em> the name of the jobscript. This example submits a job with a 2-day wallclock, which might be different to what you&#8217;ve specified in the jobscript.<\/p>\n<pre>\r\n[mabcxyz1@login1[csf3] ~]$ <strong>sbatch -t 2-0 <em>myjobscript<\/em><\/strong>\r\nSubmitted batch job 374\r\n<\/pre>\n<\/dd>\n<\/dl>\n<h3>sbatch flags<\/h3>\n<p>The <code>sbatch<\/code> command has an <a href=\"https:\/\/slurm.schedmd.com\/sbatch.html\">extensive list of command-line flags<\/a>. In most cases you can also specify these in your jobscripts using one of:<\/p>\n<pre>\r\n#SBATCH <em>-f value<\/em>             # short form flag (e.g., -p)\r\n#SBATCH <em>--flag=value<\/em>         # long form flag (e.g., --partition)\r\n<\/pre>\n<p>We provide details of a few flags here:<\/p>\n<dl>\n<dt><code>-p, --partition=<em>partition<\/em><\/code><\/dt>\n<dd>Slurm partition (think of this as a queue) that contains a certain type of hardware. Your job will run in this partition. For example, a GPU partition or the multicore AMD job partition or the high memory partition. See the <a href=\"\/csf3\/batch-slurm\/partitions\/\">CSF3 partitions<\/a> page.<\/dd>\n<dt><code>-n,--ntasks=<em>NUM<\/em><\/code><\/dt>\n<dd>Number of tasks, which can usually be thought of as the number of cores to be used by your job. 
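For example, a minimal sketch of a 4-task MPI jobscript (the program name and settings here are illustrative, not required values):<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#SBATCH -p multicore   # partition (see -p above)\r\n#SBATCH -n 4           # 4 tasks (cores)\r\n#SBATCH -t 1-0         # 1 day wallclock\r\nmpirun -np $SLURM_NTASKS .\/my_mpi_app\r\n<\/pre>\n<p>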
You should use this flag if running MPI parallel code, but it can also be used for other parallel jobs. See the <a href=\"\/csf3\/batch-slurm\/parallel-jobs-slurm\/\">CSF parallel jobs<\/a> page.<\/dd>\n<dt><code>-c,--cpus-per-task=<em>NUM<\/em><\/code><\/dt>\n<dd>Number of cores (which Slurm refers to as CPUs) <em>per task<\/em> (see <code>-n<\/code> above). Used to specify the number of cores that each MPI rank should use, so you should use this when running mixed-mode MPI+OpenMP applications. It can also be used when running OpenMP parallel applications. See the <a href=\"\/csf3\/batch-slurm\/parallel-jobs-slurm\/\">CSF parallel jobs<\/a> page.<\/dd>\n<dt><code>-G, --gpus=<em>NUM<\/em><\/code><\/dt>\n<dd>Number of GPUs. Note that the GPU type (A100, L40S, and so on) is determined by the GPU partition you submit to. See the <a href=\"\/csf3\/batch-slurm\/gpu-jobs-slurm\/\">CSF3 GPU jobs<\/a> page.<\/dd>\n<dt><code>-t,--time=<em>wallclock<\/em><\/code><\/dt>\n<dd>Wallclock time limit for your job. See the <a href=\"\/csf3\/batch-slurm\/timelimits-slurm\/\">CSF3 time limits<\/a> page.<\/dd>\n<dt><code>-a,--array=<em>start<\/em>-<em>end<\/em><\/code><\/dt>\n<dd>Job array to run multiple copies of the same job. See the <a href=\"\/csf3\/batch-slurm\/job-arrays-slurm\/\">CSF3 job arrays<\/a> page for further options and examples.<\/dd>\n<dt id=\"jobemail\"><code>--mail-type=<em>flag<\/em><\/code><\/dt>\n<dd>Specify when you want to receive an email about the job. The flag can be one or more (a comma-separated list) of <code>NONE<\/code>, <code>BEGIN<\/code>, <code>END<\/code>, <code>FAIL<\/code>, <code>REQUEUE<\/code>, <code>ALL<\/code>. Please consider how many jobs you are submitting and how many emails you might receive based on your choice of flags.<\/dd>\n<dt><code>--mail-user=<em>emailaddress<\/em><\/code><\/dt>\n<dd>Specify the email address to receive emails sent from Slurm. 
This can be your Manchester address or an external address.<\/p>\n<p>To avoid needing to specify this flag on every job, you can place your email address in a file named <code>.forward<\/code> (yes, with a &#8220;dot&#8221; at the start of the name). For example, to create this file, run the following command on the login node:<\/p>\n<pre>\r\necho <em>my.name<\/em>@manchester.ac.uk > ~\/.forward\r\n        #\r\n        # Change my.name to YOUR own correct email address!\r\n        # This can be a comma-separated list of addresses.\r\n<\/pre>\n<p>Then you can submit jobs using only the <code>--mail-type<\/code> flag (see above).<\/p>\n<\/dd>\n<dt><code>-d, --dependency=<em>dependencylist<\/em><\/code><\/dt>\n<dd>See the <a href=\"\/csf3\/batch-slurm\/job-dependencies\/\">Job Dependencies<\/a> section.<\/dd>\n<dt><code>--parsable<\/code><\/dt>\n<dd>For scripting purposes, you may prefer just to receive the jobid number from the <code>sbatch<\/code> command. Add the <code>--parsable<\/code> flag to achieve this:<\/p>\n<pre>sbatch <strong>--parsable<\/strong> <em>myjobscript<\/em>\r\n12345\r\n<\/pre>\n<\/dd>\n<\/dl>\n<h3>Error messages<\/h3>\n<p>When submitting a job, if you see one of the following errors, something is wrong:<\/p>\n<pre>sbatch: error: Batch job submission failed: No partition specified or system default partition\r\n<\/pre>\n<p>You <strong>must<\/strong> specify a partition, even for serial jobs. Add to your jobscript: <code>#SBATCH -p <em>partitionname<\/em><\/code>.<\/p>\n<pre>sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)\r\n<\/pre>\n<p>You <strong>must<\/strong> specify a &#8220;wallclock&#8221; time limit for your job. The maximum permitted is usually 7 days (or 4 days for GPU and HPC Pool jobs). 
Add to your jobscript: <code>#SBATCH -t <em>timelimit<\/em><\/code> (e.g., <code>-t 2-0<\/code> for 2 days).<\/p>\n<h2>Job Status using squeue<\/h2>\n<dl>\n<dt id=\"squeue\"><code>squeue<\/code><\/dt>\n<dd>Report the current status of <em>your<\/em> jobs in the batch system (queued\/waiting, running, in error, finished). Note that if you see no jobs listed when you run <code>squeue<\/code> it means you have no jobs in the system &#8211; they have all finished or you haven&#8217;t submitted any!<\/dd>\n<\/dl>\n<h3>Examples<\/h3>\n<p>In this example <code>squeue<\/code> returns <em>no output<\/em>, which means you have <em>no jobs<\/em> in the queue, either running or waiting:<\/p>\n<pre>[mabcxyz1@login1[csf3] ~]$ <strong>squeue<\/strong>\r\n[mabcxyz1@login1[csf3] ~]$\r\n<\/pre>\n<p>In this example <code>squeue<\/code> shows we have two jobs running (one using 1 core, the other using 8 cores) and one job waiting (it will use 4 cores when it runs):<\/p>\n<pre>[mabcxyz1@login1[csf3] ~]$ <strong>squeue<\/strong>\r\nJOBID PRIORITY  PARTITION NAME         USER     ST SUBMIT_TIME    START_TIME     TIME NODES CPUS NODELIST(REASON)\r\n  372 0.0000005 multicore mymulticore  mabcxyz1 R  08\/03\/25 13:02 08\/03\/25 13:32 2:04     1    8 node1260\r\n  371 0.0000005 serial    simple.x     mabcxyz1 R  09\/03\/25 14:58 09\/03\/25 15:02 8:22     1    1 node603\r\n  403 0.0000003 himem     mypythonjob  mabcxyz1 PD 11\/03\/25 09:25 N\/A            0:00     1    4 (Resources)\r\n   #                          #                 #                 ###                          #\r\n   #                          #                 #                  #                           # Number of\r\n   #                          #                 #                  #                           # CPU cores\r\n   #                          #                 #                  #\r\n   #                          #                 #                  # If running: date &amp; time the job started\r\n   #                          #                 #                  # If waiting: N\/A\r\n   #                          #                 #\r\n   #                          #                 # R   - job is running\r\n   #                          #                 # PD  - job is queued waiting\r\n   #                          #                 # CG  - Completing (contact us, may indicate an error)\r\n   #                          #\r\n   #                          # Usually the name of your jobscript\r\n   #\r\n   # Every job is given a unique job ID number\r\n   # (<strong>please tell us this number if requesting support<\/strong>)\r\n<\/pre>\n<h3 id=\"reasons\">Reasons for Pending Jobs<\/h3>\n<p>If your job is queued, you might see one of the following reasons in the <code>squeue<\/code> output:<\/p>\n<dl>\n<dt><code>(QOSMaxCpuPerUserLimit)<\/code><\/dt>\n<dd>You have reached a global limit for the type of CPU resources you need. Free-at-the-point-of-use (f@pou) users may see this more frequently than members of contributing groups, due to the max-cores-in-use limit applied to all f@pou users.<\/dd>\n<dt><code>(AssocGrpGRES)<\/code><\/dt>\n<dd>Your group has reached the maximum number of GPUs it can use.<\/dd>\n<dt><code>(Resources)<\/code><\/dt>\n<dd>This usually means all of the requested resources (e.g., CPUs, GPUs) are in use by other jobs &#8211; the CSF is simply very busy!<\/dd>\n<dt><code>(Priority)<\/code><\/dt>\n<dd>Similar to <code>(Resources)<\/code>. This usually indicates there is high demand for a particular resource and it will take some time to be released; e.g., a 168-core job requires a whole AMD node to run.<\/dd>\n<dt><code>(MaxCpuPerAccount)<\/code><\/dt>\n<dd>The group you are a part of has reached a global limit.<\/dd>\n<dt><code>(DependencyNeverSatisfied)<\/code><\/dt>\n<dd>The job was waiting for an earlier job to reach some state &#8211; e.g., to finish successfully. If that job failed, the dependency could not be met, so the current job was not allowed to start. 
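A dependency like this is typically created at submission time, along these lines (the jobscript names are illustrative):<\/p>\n<pre>\r\njid=$(sbatch --parsable first_jobscript)\r\nsbatch -d afterok:$jid second_jobscript\r\n<\/pre>\n<p>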
To check what happened to the earlier job, have a look at the current job&#8217;s dependency info:<\/p>\n<pre>\r\nsqueue\r\nJOBID PRIORITY  PARTITION NAME         USER     ST SUBMIT_TIME    START_TIME TIME  NODES CPUS NODELIST(REASON)\r\n  <strong>372<\/strong> 0.0000005 multicore mymulticore  mabcxyz1 <strong>PD<\/strong> 08\/03\/25 13:02 N\/A        0:00  1     8    (<strong>DependencyNeverSatisfied<\/strong>)\r\n\r\nscontrol show job <strong>372<\/strong> | grep Depend\r\n  JobState=PENDING Reason=DependencyNeverSatisfied Dependency=<strong>afterok:361(failed)<\/strong>\r\n<\/pre>\n<p>The <code>afterok:361(failed)<\/code> tells you that the current job (372) was given a dependency on job 361 and that job 361 needed to finish successfully (<code>afterok<\/code>) before 372 would be allowed to run. But instead job 361 <code>failed<\/code>. So job 372&#8217;s dependency will never be satisfied, and it will never run.<\/p>\n<p>You should remove this job from the queue (see <code>scancel<\/code> below).<\/p>\n<\/dd>\n<dt><code>(ReqNodeNotAvail, Reserved for maintenance)<\/code><\/dt>\n<dd>The resources you have requested have been flagged for maintenance and as such are temporarily unavailable for your job. Your job will queue until the resource becomes available again. For significant\/lengthy maintenance work we will always advise all users in advance by email.<\/dd>\n<\/dl>\n<h3>Changing the squeue output format<\/h3>\n<p>You can modify the list of fields (columns) output by the <code>squeue<\/code> command by setting the <code>$SQUEUE_FORMAT<\/code> or <code>$SQUEUE_FORMAT2<\/code> environment variables. In fact, the default set of columns you see is given by the first variable &#8211; it has a default value when you log in to the CSF. 
To see the value, run:<\/p>\n<pre>echo $SQUEUE_FORMAT\r\n%.15i %9p %9P %15j %8u %2t %14V %14S %10M %.6D %.5C %R\r\n<\/pre>\n<p>For more information on the two <code>SQUEUE_FORMAT<\/code> env vars and the column codes you can use, run <code>man squeue<\/code>.<\/p>\n<h3>GPU job status using gpusqueue \/ gpustat<\/h3>\n<p>The Research Infrastructure team provide the following extra commands:<\/p>\n<dl>\n<dt id=\"gpusqueue\"><code>gpusqueue<\/code> \/ <code>gpustat<\/code><\/dt>\n<dd>To see a list of GPU jobs you have in the system, you can use the custom <code>gpusqueue<\/code> command. This runs <code>squeue<\/code> but only shows GPU jobs and adds some extra columns to show the types of GPUs requested:<\/p>\n<pre>\r\n# Show GPU job information\r\ngpusqueue\r\n<\/pre>\n<\/dd>\n<\/dl>\n<p>To monitor running GPU jobs, please see the <a href=\"\/csf3\/software\/tools\/nvitop\/\">nvitop<\/a> command.<\/p>\n<h2>Delete a Job using scancel<\/h2>\n<dl>\n<dt id=\"scancel\"><code>scancel <em>jobid<\/em><\/code><\/dt>\n<dd>To remove <em>your<\/em> job from the batch system early, either to terminate a running job before it finishes or to simply remove a queued job before it has started running.<\/dd>\n<\/dl>\n<p>Also use this if your job goes into an error state or you decide you don&#8217;t want a job to run.<\/p>\n<p>Note that if your job is in the <code>CG<\/code> state, <strong>please leave it in the queue if requesting support<\/strong>. It is easier for us to diagnose the error if we can see the job. We may ask you to <code>scancel<\/code> the job once we have looked at it &#8211; there is usually no way to fix an existing job.<\/p>\n<p>For example, maybe you realise you&#8217;ve given a job the wrong input parameters causing it to produce junk results. You don&#8217;t need to leave it running to completion (which might be hours or days). Instead you can <em>kill<\/em> the job using <code>scancel<\/code>. 
You need to know the job-ID number of the job:<\/p>\n<pre>[mabcxyz1@login1[csf3] ~]$ <strong>scancel <em>12345<\/em><\/strong>\r\n<\/pre>\n<p>The job will eventually be deleted (it may take a minute or two for this to happen). Use <code>squeue<\/code> to check your list of jobs.<\/p>\n<p>To delete <em>all<\/em> of your jobs:<\/p>\n<pre># Delete all of your jobs. Use with CAUTION!\r\n[mabcxyz1@login1[csf3] ~]$ <strong>scancel -u $USER<\/strong>\r\n<\/pre>\n<p>Please also see the <a href=\"\/csf3\/batch\/job-arrays\/#Deleting_Job_Arrays\">Deleting Job Arrays<\/a> notes.<\/p>\n<h3>Get Info About Finished Jobs<\/h3>\n<dl>\n<dt id=\"sacct\"><code>sacct -j <em>jobid<\/em><\/code><\/dt>\n<dd>Once your job has finished, you can use this command to get a summary of information about the job, including wall-clock time, maximum memory consumption and exit status, amongst many other statistics. This is useful for diagnosing why a job failed.<\/p>\n<p>A lot of information about a job is available using this command. To see a list of every possible field, run <code>sacct -e<\/code>. To have a &#8220;long&#8221; list of fields automatically displayed when querying a job, use <code>sacct --long -j <em>jobid<\/em><\/code>. Run <code>man sacct<\/code> for further info about this command.<\/p>\n<p>For info about the <code>extern<\/code> step that now appears in the <code>sacct<\/code> output, please see <a href=\"\/csf3\/faqs\/user-faq\/#ufac0220\">this FAQ answer<\/a>.<\/p>\n<\/dd>\n<\/dl>\n<h2>Further Information<\/h2>\n<p>Our own documentation throughout this site provides lots of examples of writing jobscripts and how to submit jobs. Slurm also comes with a set of comprehensive <em>man pages<\/em>. 
Some of the most useful ones are:<\/p>\n<ul>\n<li><code>man sbatch<\/code><\/li>\n<li><code>man squeue<\/code><\/li>\n<li><code>man scancel<\/code><\/li>\n<li><code>man sacct<\/code><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Batch Commands Your applications should be run in the batch system. You\u2019ll need a jobscript (a plain text file) describing your job &#8211; its CPU, memory and possibly GPU requirements, and also the commands you actually want the job to run. Further details on how to write jobscripts are in the sections on serial jobs, parallel jobs, job-arrays and GPU jobs. You\u2019ll then use one or more of the following batch system commands to submit.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/batch-slurm\/s-commands\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":9105,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-8988","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/8988","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=8988"}],"version-history":[{"count":21,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/8988\/revisions"}],"predecessor-version":[{"id":12268,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/8988\/revisions\/12268"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/9
105"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=8988"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}