{"id":774,"date":"2018-11-08T12:12:39","date_gmt":"2018-11-08T12:12:39","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=774"},"modified":"2026-04-02T16:27:14","modified_gmt":"2026-04-02T15:27:14","slug":"tutorial","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/getting-started\/tutorial\/","title":{"rendered":"Batch System 10 Minute Tutorial (Slurm)"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p>This page offers new CSF users a tutorial that covers usage of the <em>Slurm batch system<\/em> to run a simple job on the CSF.<\/p>\n<p>The tutorial also provides some information about the storage areas on the CSF, and also some common Linux commands used to manage your files.<\/p>\n<p>After doing the tutorial you&#8217;ll be able to use the CSF. A further tutorial is also available for running more complicated parallel jobs.<\/p>\n<p>If you are interested in attending the 1-day Intro to CSF training course which runs a couple of times each semester, please take a look at the <a href=\"http:\/\/app.manchester.ac.uk\/rhpc\">course booking page<\/a> for details of the schedule and availability.<\/p>\n<p>Before we begin the tutorial we&#8217;ll explain what the <em>batch system<\/em> is and why we need to use it.<\/p>\n<h2>Background: What is a batch system and why use it?<\/h2>\n<p>Click on each header below to expand the section:<\/p>\n<details>\n<summary class=\"h3\">Join the queue&#8230;<\/summary>\n<p>Initially a batch system can be thought of as a job queue. 
You submit jobs to the queue and the system will pick them out of the queue to run them.<\/p>\n<p>The jobs will do whatever commands you ask them to do (for example run an app such as a chemistry app, or a bioinformatics app or whatever application is appropriate to your work).<\/p>\n<p>When the jobs finish, you should have some new files containing the results.<\/p>\n<p>At this point you might be thinking you don&#8217;t like the idea of your work (jobs) waiting in a queue. How long will it queue for? Why can&#8217;t it just run immediately? Read on to find out more.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Ask for extra memory, cores or a GPU?<\/summary>\n<p>The applications you&#8217;ll be running on the CSF usually need different amounts of memory, number of CPU cores or even GPUs. <\/p>\n<p>You can request these specific resources for your job. Need a GPU? Simply request it in your job. Need a lot of memory to process a huge dataset? Simply ask for it.<\/p>\n<p>The <em>batch system<\/em> ensures your job only runs when <em>all<\/em> of the required resources are available. It then allocates those resources to your job (so that it runs correctly) and makes sure no other jobs can grab <em>your<\/em> resources.<\/p>\n<p>But don&#8217;t worry if you&#8217;re not sure what resources you&#8217;ll need &#8211; there are sensible defaults. After trying the defaults, you might find your app needs more memory to process your data, or that it can use more CPU cores to make it run faster.<\/p>\n<p>So you might find that your first few attempts at running jobs don&#8217;t actually complete successfully. Maybe you&#8217;ll need to run the jobs again but request more memory. Don&#8217;t worry &#8211; failed jobs don&#8217;t do any harm. 
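<\/p>\n<p>For example, asking for more memory is usually just a one-line change in the jobscript. This is a sketch &#8211; the exact memory options and limits available on the CSF are described in the batch system documentation:<\/p>\n<pre class="slurm">\r\n#SBATCH --mem=8G      # Request 8GB of memory instead of the default\r\n<\/pre>\n<p>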
You can simply delete the output files from these failed jobs (if there are any), modify your <em>jobscript<\/em> to ask for more resources (more memory, CPUs, &#8230;) and then resubmit your jobs.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Fair usage<\/summary>\n<p>The batch system also ensures fair usage for you and others &#8211; there are <em>many<\/em> users and jobs on the system, all making different demands of the resources (memory, CPU cores, GPUs) and so allowing the batch system to choose exactly when to run your job is the only sensible way of running the system. <\/p>\n<p>The fact that jobs are starting and finishing all the time means you rarely have to wait very long for your requested resource to become free so that your jobs can start.<\/p>\n<p>There are other factors which control when jobs run (and how many of your jobs can run at the same time) but the use of a job queue should <em>not<\/em> put you off using the system.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Let the CSF get on with it<\/summary>\n<p>An added bonus of a batch system is that once you&#8217;ve <em>submitted<\/em> your jobs to the system, you don&#8217;t actually need to remain logged in. You can log off, go home or go to a meeting or do something else with your PC\/laptop.<\/p>\n<p>Meanwhile the batch system will run your jobs. It can even email you when a job has finished.<\/p>\n<p><em>Without<\/em> a batch system, you would need to remain logged in to the CSF <em>until the job had finished<\/em>, which could be a problem for a simulation that takes several days to complete.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">No GUI<\/summary>\n<p>Something to note about batch jobs is that you never see an application&#8217;s graphical user interface (GUI), if it has one. 
Batch jobs run without any interaction &#8211; all options \/ flags \/ input files etc will be specified on the command-line in a <em>jobscript<\/em> (more on those later).<\/p>\n<p>When the app is running, all output will be saved to files. This will be a new way of working if you are used to running an app in a desktop environment (e.g., on Windows).<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Can I just run my app on the Login Node?<\/summary>\n<p>Running your code or application directly on the login nodes is <em><strong>not<\/strong><\/em> permitted.<\/p>\n<p>The login nodes are for other tasks (transferring files on and off the system, editing jobscripts, submitting jobs to the system, checking results.) They don&#8217;t have a lot of memory, nor many cores, so trying to run your apps there is inefficient and may also adversely affect other users.<\/p>\n<p><strong>Applications found running on the login nodes may be killed by the sysadmins without warning.<\/strong><\/p>\n<div class=\"hint\">\nPlease do take the time to learn about the batch system. 
While it may be an unfamiliar way of working initially, particularly if you are used to simply running your apps immediately on a desktop PC, there are actually a lot of benefits to using the batch system &#8211; you&#8217;ll see it is a very powerful way of working as you begin to do your real work.<\/p>\n<p>In this tutorial you can try out the sample job below &#8211; it shouldn&#8217;t take more than 10 minutes to work through the instructions on this page.\n<\/p><\/div>\n<\/details>\n<details>\n<summary class=\"h3\">Which batch system does the CSF use?<\/summary>\n<p>The CSF3 now runs the Slurm batch system.<\/p>\n<p>The three main Slurm commands you use are <code>sbatch<\/code>, <code>squeue<\/code> and possibly <code>scancel<\/code>.<\/p>\n<\/details>\n<h2>10 Minute Tutorial: Submitting a First Job to the Batch System<\/h2>\n<p>This tutorial assumes you are <em>already<\/em> logged in to the CSF &#8211; please see the <a href=\"\/csf3\/getting-started\/connecting\/\">login instructions<\/a> for more information.<\/p>\n<p>Here we describe in detail how to submit a simple, first job to the batch system. Please read all of the text, don&#8217;t just look for the commands to type, as it will explain <em>why<\/em> you need to run the commands.<\/p>\n<h3>What type of job will we run?<\/h3>\n<p>We will run a <em>serial<\/em> job &#8211; i.e., it uses only one CPU core. We&#8217;ll see later that many of the real applications on the CSF can use more than one CPU core (a <em>multi-core<\/em> job) to speed up their processing, giving you the results sooner.<\/p>\n<p>You could also request more memory than the default 5-6GB of RAM, or request a GPU.<\/p>\n<p>But initially a simple 1-core (<em>serial<\/em>) job will help you become familiar with the principles of the batch system. 
These jobs are very common &#8211; you may well want to use this type of job in your real work after the tutorial.<\/p>\n<p>Please remember: <strong>Do not<\/strong> simply run jobs on the login node &#8211; use the batch system as described below.<\/p>\n<h3>Step 0: Create a Folder for the Job Files<\/h3>\n<p>In the following steps we will be creating a <em>jobscript<\/em> file. We will explain more about the file in the next step. The job will also create some files (any output generated by the job is saved to files).<\/p>\n<p>Hence we first create a directory (folder) for the job to keep all of the files together in one place. This is important &#8211; you will likely run a lot of jobs on the CSF so it will be easier to manage all of your work if you keep your files tidy.<\/p>\n<p>When you log in to the CSF you are placed in your <em>home directory<\/em>. This area of storage is private to you and, importantly, is backed-up (not all storage areas on the CSF are backed-up). It is <strong>strongly recommended<\/strong> that you keep important files in your <em>home directory<\/em> for safe keeping &#8211; and this includes your jobscripts.<\/p>\n<p>Once you have <a href=\"\/csf3\/getting-started\/connecting\/\">logged in<\/a> you&#8217;ll be at the <em>command-line prompt<\/em>:<\/p>\n<pre class=\"slurm\">\r\n<strong>[<em>mxyzabc1<\/em>@login1<span class=\"csf3el9promptname\">[csf3]<\/span> ~]$<\/strong>   <em>you will type your commands here, \"at the prompt\"<\/em>\r\n  ^            ^   ^   ^\r\n  |            |   |   | \r\n  |            |   |   +--- The directory (folder) you are currently <em>in<\/em>.\r\n  |            |   |        ~ means your <em>home<\/em> folder, which is your private folder.\r\n  |            |   |\r\n  |            |   +--- Name of the system\r\n  |            |\r\n  |            +--- Name of the login node (some systems have more than one login node)\r\n  |\r\n  +--- Your username appears here\r\n<\/pre>\n<p>Now create a directory 
(usually referred to as a <em>folder<\/em> in Windows or macOS) in your CSF <em>home<\/em> storage area, for our first test job, by running the following commands at the prompt:<\/p>\n<pre>\r\n# All of these commands are run on the CSF login node at the <em>prompt<\/em>\r\nmkdir ~\/first-job            # Make (create) the directory (folder)\r\ncd ~\/first-job               # Change to (go into) the directory (folder)\r\n<\/pre>\n<p>Notice that the <em>prompt<\/em> has changed to indicate you have moved into the <code>first-job<\/code> folder:<\/p>\n<pre>\r\n[<em>mxyzabc1<\/em>@login1<span class=\"csf3el9promptname\">[csf3]<\/span> <strong>first-job<\/strong>]$   \r\n                           ^\r\n                           |\r\n                           +--- The prompt shows we are now in the first-job folder\r\n<\/pre>\n<h3>Step 1: Create a &#8220;Jobscript&#8221; &#8211; a job description file<\/h3>\n<p>The <em>jobscript<\/em> file is the thing you submit to the batch system (i.e., the queue of jobs). It is just a simple plain-text file. It serves two main purposes:<\/p>\n<ol>\n<li>It specifies the number of CPU cores, memory, maximum time the job is allowed to run for, and other resources you need to run your application.<\/li>\n<li>It specifies the actual command(s) needed to run your application and anything else your job will do (e.g., copy files).<\/li>\n<\/ol>\n<div class=\"hint\">A key benefit of the jobscript is that it documents exactly what you did to run your job &#8211; no need to remember what you did 6 months ago as it is all there in the jobscript. If you ever need to run a job again, or run similar jobs, having the jobscript available is very useful.<\/div>\n<p>Hence jobscripts should be considered part of your work that needs to be kept securely in your <em>home directory<\/em>. They are a record of how you ran a simulation or analysis, for example, or how you processed a particular dataset. 
Jobscripts are therefore part of your research methods.<\/p>\n<p>We now use <a href=\"\/csf3\/software\/tools\/gedit\/\"><code>gedit<\/code><\/a> or <a href=\"\/csf3\/software\/tools\/xnedit\/\"><code>xnedit<\/code><\/a> or another editor, on the CSF login node (running text editors on the login node <em>is<\/em> permitted) to create a file with exactly the following content (<a href=\"#jobscript\">see below<\/a>). You can name the file anything you like, as long as there are no spaces in the name &#8211; in this example we use <code>first-job.txt<\/code> but Linux doesn&#8217;t care what <em>extension<\/em> you use &#8211; <code>.txt<\/code> or <code>.sbatch<\/code> or <code>.jobscript<\/code> for example:<\/p>\n<pre>\r\n# Run this command on the CSF login node at the <em>prompt<\/em>\r\ngedit first-job.txt\r\n  #\r\n  # If gedit opens, you can IGNORE any warnings \/ messages that appear in the terminal from gedit.\r\n  # For example: (gedit:5246): dconf-WARNING **: .........\r\n  #\r\n  # Can't see gedit? Check your \"dock\" - it might have opened but the window is behind others on\r\n  # your desktop.\r\n<\/pre>\n<p>If you see an error similar to<\/p>\n<pre>\r\n(gedit:1639570): Gtk-WARNING **: 16:21:05.503: <strong>cannot open display<\/strong>:\r\n<\/pre>\n<p>please ensure you have logged in using the method that allows GUI apps to be used &#8211; this mostly affects Mac users, so please take a look at the <a href=\"\/csf3\/getting-started\/connecting\/linux-mac\/\">Mac login instructions<\/a>.<\/p>\n<ul>\n<li><strong>Note for Windows users<\/strong>: You can create the jobscript below in <code>Notepad<\/code> and then transfer the file to CSF, although we don&#8217;t actually recommend this method. 
The file can have any name (we&#8217;re using <code>first-job.txt<\/code> but anything will be OK &#8211; you&#8217;ll find that Notepad names files with <code>.txt<\/code> at the end anyway).\n<p>However, you <strong>must run the following command<\/strong> on the login node to convert the file from Windows format to Linux format otherwise the job will report an error when you submit it to the batch system (this is only needed for jobscripts, <strong>not<\/strong> any other file)<\/p>\n<pre>\r\n# Run this command on the CSF login node at the <em>prompt<\/em> if jobscript was written in notepad\r\ndos2unix first-job.txt\r\n           #\r\n           # or whatever filename you used (we assume notepad adds .txt)\r\n<\/pre>\n<p>But we recommend that Windows users <a href=\"http:\/\/ri.itservices.manchester.ac.uk\/userdocs\/windows-users\/mobaxterm\/\">install MobaXterm<\/a> to log in to the CSF. You can then run <code>gedit<\/code> on the CSF login node and you&#8217;ll get a Linux text-editor very similar to Notepad. 
The file you write will be saved directly on the CSF and will <em>not<\/em> need converting with <code>dos2unix<\/code> because it is already in the correct format.\n<\/li>\n<\/ul>\n<p><strong>Here&#8217;s the jobscript content &#8211; put this in the text file you are creating<\/strong> either in gedit (run on the CSF login node) or notepad (run on your Windows PC):<br \/>\n<a name=\"jobscript\"><\/a><\/p>\n<pre class=\"slurm\">\r\n#!\/bin\/bash --login\r\n\r\n# Slurm options are those that begin with #SBATCH\r\n#SBATCH -p serial     # Run in the \"serial\" partition (compute nodes dedicated to 1-core jobs)\r\n#SBATCH -t 5          # Allow a maximum wallclock time limit of 5 minutes\r\n                      # Our simple job actually only runs for about 2 minutes\r\n                      # but we always set the wallclock limit a little longer.\r\n                      # (Other time formats can be used for days and hours.)\r\n\r\n# Now the example commands to be executed (programs to be run) on a compute node:\r\n# In your real work, you'll run apps such as a chemistry app, or a bio-inf app.\r\n\/bin\/date\r\n\/bin\/hostname\r\n\/bin\/sleep 120\r\n\/bin\/date\r\n<\/pre>\n<p><strong>Note: lines must NOT be indented in your text file &#8211; there should NOT be any spaces at the start of the lines.<\/strong> Copy-and-paste from this web page works correctly in most browsers &#8211; it won&#8217;t copy any leading spaces.<\/p>\n<p>This BASH <em>script<\/em> has the following parts:<\/p>\n<ol class=\"gaplist\">\n<li>The first line, <code>#!\/bin\/bash --login<\/code>, means that the file you create is treated as a BASH script.\n<p>Linux provides several <em>scripting<\/em> languages but BASH is the one you use at the command-line once you&#8217;ve logged in. So we usually use it for jobscripts too. 
This means that any commands you would normally type at the command-line can also go into your jobscript to be run as part of a batch job.<\/li>\n<li>The lines beginning with <code>#SBATCH<\/code> are commands to the batch system &#8211; they provide information about your job.\n<p>In this simple jobscript the lines are:<\/p>\n<ul>\n<li><code>#SBATCH -p serial<\/code> runs your job in the &#8220;serial&#8221; job area (partition). This is a dedicated set of compute nodes used to run serial (1-core) jobs.<\/li>\n<li><code>#SBATCH -t 5<\/code> says that your job is allowed to run for <em>no more<\/em> than 5 minutes, once it starts. It is perfectly fine if your job completes its work in <em>less<\/em> time (and in fact our simple job will complete in about 2 minutes.) But if a job is still running when the wallclock time limit is reached, the batch system will kill it. So we always give a little extra time on the wallclock limit, just for safety.<\/li>\n<\/ul>\n<\/li>\n<li>Note that the job, when it runs, will be run from the folder (directory) from which you submitted the job. This will be where any output files are written. If the job needed to read input data files (our job doesn&#8217;t) then they would be read from the submit directory too.<\/li>\n<li>The remaining lines comprise our computational job &#8211; the applications we actually want to run.\n<p>In this example we have a trivial job which runs simple Linux commands to output the date and time, followed by the name of the compute node on which the job runs, then waits for two minutes and finally outputs the date and time again. 
In a real jobscript you would do something more interesting and useful &#8211; e.g., run MATLAB or Abaqus or a chemistry program.\n<\/li>\n<\/ol>\n<h3>Step 2: Copy to scratch area<\/h3>\n<p>We now copy the jobscript to your <em>scratch<\/em> area.<\/p>\n<div class=\"hint\">We recommend you run jobs from the <em>scratch<\/em> filesystem: it is another area of storage on the CSF that is faster and larger. Your <em>home<\/em> directory is in an area that has a quota to be shared amongst everyone in your group &#8211; if your job fills up that area you will prevent your colleagues from working! Running jobs in the <em>scratch<\/em> area avoids this problem.<\/div>\n<p><strong>PLEASE NOTE:<\/strong> the scratch area is a <em>temporary<\/em> area &#8211; <strong>files unused in the last 3 months can be deleted by the system to free up space<\/strong>. You should always have a copy of important files in your <em>home<\/em> area (or other research data storage visible on the CSF that your research group may have access to). Think of <em>scratch<\/em> as fast, <em>temporary<\/em> storage &#8211; if your job reads and writes large files it will be faster if run from scratch.<\/p>\n<p>A good way of working is to create your important files in the <em>home<\/em> area, then copy them to scratch when you need to use them in your jobs. That way you always have a safe copy in your home area.<\/p>\n<p>So let&#8217;s <em>copy<\/em> our jobscript to the <em>scratch<\/em> area (we keep the original in our <em>home<\/em> area for safe keeping):<\/p>\n<pre>\r\ncp first-job.txt ~\/scratch\r\n<\/pre>\n<p>We can now <em>go into<\/em> the scratch area:<\/p>\n<pre>\r\ncd ~\/scratch\r\n<\/pre>\n<p>Our scratch directory is now our <em>current working directory<\/em>. 
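<\/p>\n<p>If you are ever unsure which directory you are currently in, the standard Linux <code>pwd<\/code> (print working directory) command will tell you:<\/p>\n<pre>\r\npwd       # Prints the full path of your current directory\r\n<\/pre>\n<p>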
When we submit the job to the batch queue (see next step) it will run in the scratch area &#8211; remember, the job runs from whichever directory you are <em>in<\/em> when you submit the job.<\/p>\n<p>Any files that the job generates will also be written to the scratch area and if your job wants to read input data files (ours doesn&#8217;t in this example) then it would try to read them from the scratch area.<\/p>\n<p>You will notice the prompt on the command-line will change to indicate where you are currently located:<\/p>\n<pre>\r\n[<em>mxyzabc1<\/em>@login2[csf3] scratch]$ \r\n                           #\r\n                           # The prompt shows your current directory\r\n<\/pre>\n<h3>Step 3: Submit the Job to the Batch System<\/h3>\n<p>Recap: So far we have created a directory for the jobscript in our <em>home<\/em> area, written a jobscript text file there (where it is stored safely on backed-up storage), then copied it to the fast temporary <em>scratch<\/em> storage and <em>changed directory<\/em> to our scratch area where we&#8217;ll run the job from.<\/p>\n<p>The next step is to actually submit the job to the batch system.  Suppose the above script is saved in a file called <code>first-job.txt<\/code>.  Then the following command will submit your job to the batch system:<\/p>\n<pre>\r\nsbatch first-job.txt\r\n\r\n# You'll see a message:\r\nSubmitted batch job 195501\r\n<\/pre>\n<p>The job id <code>195501<\/code> is a unique number identifying your job (obviously you will receive a different number). 
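<\/p>\n<p>For example, the job id can be given to the <code>scancel<\/code> command to remove the job from the queue (or stop it if it is already running) &#8211; useful if you spot a mistake in a jobscript you have just submitted:<\/p>\n<pre>\r\nscancel 195501      # Use your own job id here\r\n<\/pre>\n<p>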
You may use this in other commands later.<\/p>\n<h3>Step 4: Check Job Status<\/h3>\n<p>To confirm that your job is queued, or perhaps already running, enter the command<\/p>\n<pre>\r\nsqueue\r\n<\/pre>\n<p>If the job is still <strong>pending<\/strong> (waiting to run) the output from <code>squeue<\/code> will look like the following &#8211; notice the ST column (short for &#8220;State&#8221;) &#8211; it shows &#8220;PD&#8221; for pending:<\/p>\n<pre>\r\n                                                                                       NODELIST\r\n JOBID PRIORITY PARTITION NAME     USER     ST SUBMIT_TIME  START_TIME TIME NODES CPUS (REASON)\r\n195501 0.019104 serial    first-jo mxyzabc1 <strong>PD<\/strong> 21\/05\/25 9:51 N\/A        0:00     1    1 (None)\r\n<\/pre>\n<p>If your job is already <strong>running<\/strong>, the output will look like the following &#8211; notice the ST column shows &#8220;R&#8221; for running and the NODELIST column shows the name of the <em>compute node<\/em> where your job is currently running. This shows that your job is <em>not<\/em> using the login node, even though that&#8217;s where you are running commands, but is instead using one of the more powerful servers in the CSF.<\/p>\n<pre>\r\n                                                                                       NODELIST\r\n JOBID PRIORITY PARTITION NAME     USER     ST SUBMIT_TIME  START_TIME TIME NODES CPUS (REASON)\r\n195501 0.019104 serial    first-jo mxyzabc1 <strong>R<\/strong>  21\/05\/25 9:55 ... 
9:55   0:05     1    1 node003\r\n<\/pre>\n<p>If your jobs have finished, <code>squeue<\/code> will show no output &#8211; meaning you have no jobs in the queue, either running or waiting.<\/p>\n<pre>\r\n[<em>mxyzabc1<\/em>@login2[csf3] scratch]$ squeue\r\n JOBID PRIORITY PARTITION NAME     USER     ST SUBMIT_TIME  START_TIME TIME NODES CPUS (REASON)\r\n  #\r\n  # No jobs listed means you have no jobs waiting or running (all jobs have finished)\r\n<\/pre>\n<p>If something is wrong with your jobscript you&#8217;ll see <strong>F<\/strong> or some other code. There might also be a <code>REASON<\/code> to help diagnose the problem. Please contact us via the <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/hpc-help\/\">Connect Portal HPC Help form<\/a>, stating your job-ID and the system you are logged in to and we&#8217;ll let you know what has gone wrong.<\/p>\n<p>HINT: the <strong>most common error<\/strong> is making a typo in your jobscript. Errors don&#8217;t do any harm to the CSF, they just mean your job won&#8217;t do what you want it to do. Check the contents of your jobscript (see above for the original text.) <\/p>\n<p>A typo on the first line of the jobscript is often the cause. Please type it carefully: <code>#!\/bin\/bash --login<\/code> (<strong>no<\/strong> spaces at the start of the line.)<\/p>\n<p>If there is no output from the <code>squeue<\/code> command, your job has finished.<\/p>\n<h3>Step 5: Review Job Results\/Output<\/h3>\n<p>Each job will output at least one file, containing any output that would normally have been printed to screen. 
This can include normal information from your app and also error messages, if any occurred.<\/p>\n<p>Let&#8217;s <em>list<\/em> the files in the current directory using the Linux <code>ls<\/code> command:<\/p>\n<pre>\r\nls\r\nfirst-job.txt  slurm-<em>195501<\/em>.out\r\n<\/pre>\n<p>We can see our original jobscript <code>first-job.txt<\/code> and a new file <code>slurm-<em>195501<\/em>.out<\/code> that has been generated by the job (remember, the job ID number <code><em>195501<\/em><\/code> will be different for <em>your<\/em> job.)<\/p>\n<p>To look at the contents of the output file:<\/p>\n<pre>\r\ncat slurm-<em>195501<\/em>.out\r\n<\/pre>\n<p>In this example the output file contains:<\/p>\n<pre>\r\nWed May 21 09:55:49 BST 2025\r\nnode003\r\nWed May 21 09:57:49 BST 2025\r\n<\/pre>\n<p>This shows the date twice, with a difference of 120 seconds (2 minutes), and the name of the <em>compute node<\/em> on which the job ran, as expected (refer back to the commands we ran in our first <a href=\"#jobscript\">jobscript<\/a>).<\/p>\n<p>Note that the name of the output file is always, by default, <code>slurm-<em>JOBID<\/em>.out<\/code>. It is easier to keep track of which job produced which output file if you make the output file use a similar name to that of your jobscript. You can change the name of the output file by adding the following line to your jobscript:<\/p>\n<pre>\r\n#SBATCH -o %x.o%j      # %x will be replaced by the jobscript name\r\n                       # %j will be replaced by the JOBID number\r\n<\/pre>\n<p>This would generate an output file named <code>first-job.txt.o195501<\/code> (which will be familiar to users of the SGE batch system, which we used to use on the CSF.)<\/p>\n<p>You&#8217;ve now successfully run a job on the CSF. It was a simple <em>1-core<\/em> job (it used only one CPU core) to run some basic Linux commands. 
The output of the commands was captured into the <code>slurm-<em>195501<\/em>.out<\/code> file. By changing the Linux commands to something more useful (e.g., to run your favourite chemistry application) you can get lots of real work done on the CSF.<\/p>\n<h3>Step 6: Copy Results back to &#8220;home&#8221;<\/h3>\n<p>Earlier we said that the <em>scratch<\/em> storage area is temporary (but fast). Hence if we want to keep the results from this job then we should copy them back to the <em>home<\/em> storage area. Let&#8217;s assume we DO want to keep the output from this job. Apart from the usual <code>slurm-<em>195501<\/em>.out<\/code> file, it didn&#8217;t generate any other files. So we&#8217;ll just copy the <code>.out<\/code> file back to <em>home<\/em>:<\/p>\n<pre>\r\n# Copy from the current scratch dir to the job's directory in home\r\ncp slurm-<em>195501<\/em>.out ~\/first-job\/\r\n          #\r\n          # This number will be different for <em>your<\/em> job\r\n<\/pre>\n<p>That&#8217;s it &#8211; the output file is now stored in our backed-up home area. We could delete the file from scratch, although sometimes you may wish to leave your files there while you check their contents and possibly use them in future jobs. Remember though, the scratch filesystem will tidy up old files automatically, so at some point they will be deleted.<\/p>\n<p>When you run a real app (e.g., a chemistry app or OpenFOAM) then your jobs may well generate other files (lots of them, possibly large files.) You&#8217;ll need to consider more carefully which files you want to keep.<\/p>\n<h3>Summary<\/h3>\n<p>Points to remember:<\/p>\n<ul class=\"gaplist\">\n<li>Do not simply run your apps on the login node. Write a jobscript and submit it to the batch system. 
Your app will run on a more powerful node and won&#8217;t upset the login node (and the sysadmins!)<\/li>\n<li>You can write your jobscript on the login node using <code>gedit<\/code> or <code>xnedit<\/code>.<\/li>\n<li>Alternatively, if you use <code>notepad<\/code> on MS Windows, ensure you run <code>dos2unix<\/code> on the jobscript once you&#8217;ve transferred it to the CSF.<\/li>\n<li>Keep your important files in your <em>home<\/em> area but copy them to the <em>scratch<\/em> area and run your jobs from there. Don&#8217;t forget to copy important results back to <em>home<\/em>.<\/li>\n<li>Submit the job using <code>sbatch<\/code><\/li>\n<li>Check on the job using <code>squeue<\/code><\/li>\n<li>Look in the <code>slurm-<em>195501<\/em>.out<\/code> file generated by the job for output and errors.<\/li>\n<li>If you have any questions please contact us via the <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/hpc-help\/\">Connect Portal HPC Help form<\/a> &#8211; we&#8217;re here to help.<\/li>\n<\/ul>\n<h2>More on Using the Batch System (parallel jobs, GPUs, high-mem)<\/h2>\n<p>The batch system has a great deal more functionality than described above &#8211; by adding more <code>#SBATCH<\/code> special lines to your jobscript your jobs can make more use of the CSF capabilities. A list of features is given below with links to documentation. 
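<\/p>\n<p>As a taste of what these extra <code>#SBATCH<\/code> lines look like, a multi-core (OpenMP) job adds a request for more cores. The lines below are illustrative &#8211; check the documentation linked below for the partition names and core counts to use on the CSF:<\/p>\n<pre class=\"slurm\">\r\n#SBATCH -p multicore   # An example partition name for multi-core jobs\r\n#SBATCH -c 4           # Request 4 CPU cores for a multi-threaded app\r\n<\/pre>\n<p>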
You may wish to try the <a href=\"\/csf3\/getting-started\/tutorial-parallel-job\">Parallel Job Tutorial<\/a> once you are familiar with running serial (1-core) jobs on the CSF.<\/p>\n<p>Other features include:<\/p>\n<ul>\n<li>Running <a href=\"\/csf3\/batch\/parallel-jobs-slurm\/\">parallel multi-core\/SMP jobs<\/a> (e.g., using OpenMP)<\/li>\n<li>Running <a href=\"\/csf3\/batch\/parallel-jobs-slurm\/\">parallel multi-host jobs<\/a> (e.g., using MPI)<\/li>\n<li>Running <a href=\"\/csf3\/batch\/job-arrays-slurm\/\">job arrays<\/a> &mdash; submitting 100s, 1000s of similar jobs by means of just <em>one<\/em> sbatch script\/command<\/li>\n<li>Running <a href=\"\/csf3\/batch\/gpu-jobs-slurm\/\">GPU jobs<\/a><\/li>\n<li>Selecting <a href=\"\/csf3\/batch\/high-memory-jobs-slurm\/\">high-memory hardware<\/a><\/li>\n<\/ul>\n<p>These features are fully documented (with example job scripts) in the <a href=\"\/csf3\/batch-slurm\/\">CSF Slurm documentation<\/a>.<\/p>\n<h3>Application Software<\/h3>\n<p>Now that you&#8217;ve run a test job you might want to have a look to see whether the application software you intend to use is already installed on the CSF &#8211; a lot of apps <em>are<\/em> already installed.<\/p>\n<p>Each centrally installed application has its own <a href=\"\/csf3\/software\/\">application webpage<\/a> where you&#8217;ll find examples of how to submit a job for that specific piece of software and any other information relevant to running it in batch, such as extra settings that may be required for it to work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This page offers new CSF users a tutorial that covers usage of the Slurm batch system to run a simple job on the CSF. The tutorial also provides some information about the storage areas on the CSF, and also some common Linux commands used to manage your files. After doing the tutorial you&#8217;ll be able to use the CSF. 
A further tutorial is also available for running more complicated parallel jobs. If you are.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/getting-started\/tutorial\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":12,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-774","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/774","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=774"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/774\/revisions"}],"predecessor-version":[{"id":12238,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/774\/revisions\/12238"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/12"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=774"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}