{"id":680,"date":"2021-08-18T09:28:47","date_gmt":"2021-08-18T08:28:47","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf4\/?page_id=680"},"modified":"2025-07-14T16:01:49","modified_gmt":"2025-07-14T15:01:49","slug":"tutorial","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/getting-started\/tutorial\/","title":{"rendered":"Batch System 10 Minute Tutorial"},"content":{"rendered":"<p><script type=\"text\/javascript\">\nfunction showHide(id){\n    var div = document.getElementById(id);\n    if (div.className === \"divhidden\") {\n        div.className = \"divhidden  active\";\n    } else {\n        div.className = \"divhidden\";\n    }\n}\n<\/script><\/p>\n<h2>Introduction<\/h2>\n<p>This page offers new CSF users a tutorial that covers usage of the <em>batch system<\/em> to run a simple job on the CSF.<\/p>\n<p>The tutorial also provides some information about the storage areas on the CSF and some common Linux commands used to manage your files.<\/p>\n<p>After doing the tutorial you&#8217;ll be able to use the CSF.<\/p>\n<p>If you&#8217;re a CSF3 user moving to CSF4 this tutorial will allow you to practice writing a jobscript for the Slurm batch system and using the Slurm commands to submit and check on your job.<\/p>\n<p>Before we begin the tutorial we&#8217;ll explain what the <em>batch system<\/em> is and why we need to use it.<\/p>\n<h2>Background: What is a batch system and why use it?<\/h2>\n<p>Click on each header below to expand the section:<\/p>\n<details>\n<summary class=\"h3\">Join the queue&#8230;<\/summary>\n<p>Initially a batch system can be thought of as a job queue. 
You submit jobs to the queue and the system will pick them out of the queue to run them.<\/p>\n<p>The jobs will run whatever commands you ask them to run (for example, an app such as a chemistry app, a bioinformatics app or whatever application is appropriate to your work).<\/p>\n<p>When the jobs finish, you should have some new files containing the results!<\/p>\n<p>At this point you might be thinking you don&#8217;t like the idea of your work (jobs) waiting in a queue. How long will it queue for? Why can&#8217;t it just run immediately? Read on to find out more.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Ask for extra memory, nodes, cores?<\/summary>\n<p>The applications you&#8217;ll be running on the CSF usually need different amounts of memory and different numbers of CPU cores, possibly spanning several compute nodes. <\/p>\n<p>You can request these specific resources for your job. Need lots of cores? Simply request them in your job. Need a lot of memory to process a huge dataset? Simply ask for it.<\/p>\n<p>The <em>batch system<\/em> ensures your job only runs when <em>all<\/em> of the required resources are available. It then allocates those resources to your job (so that it runs correctly) and makes sure no other jobs can grab <em>your<\/em> resources.<\/p>\n<p>But don&#8217;t worry if you&#8217;re not sure what resources you&#8217;ll need &#8211; there are sensible defaults. After trying the defaults, you might find your app needs more memory to process your data, or that it can use more CPU cores to make it run faster.<\/p>\n<p>So you might find that your first few attempts at running jobs don&#8217;t actually complete successfully! Maybe you&#8217;ll need to run the jobs again but request more memory. Don&#8217;t worry &#8211; failed jobs don&#8217;t do any harm. 
You can simply delete the output files from these failed jobs (if there are any), modify your <em>jobscript<\/em> to ask for more resources (more memory, CPUs, &#8230;) and then resubmit your jobs.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Fair usage<\/summary>\n<p>The batch system also ensures fair usage for you and others &#8211; there are <em>many<\/em> users and jobs on the system, all making different demands of the resources (memory, CPU cores, nodes) and so allowing the batch system to choose exactly when to run your job is the only sensible way of running the system. <\/p>\n<p>The fact that jobs are starting and finishing all the time means you rarely have to wait very long for your requested resource to become free so that your jobs can start.<\/p>\n<p>There are other factors which control when jobs run (and how many of your jobs can run at the same time) but the use of a job queue should <em>not<\/em> put you off using the system!<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Let the CSF get on with it<\/summary>\n<p>An added bonus of a batch system is that once you&#8217;ve <em>submitted<\/em> your jobs to the system, you don&#8217;t actually need to remain logged in. You can log off, go home or go to a meeting or do something else with your PC\/laptop.<\/p>\n<p>Meanwhile the batch system will run your jobs. It can even email you when a job has finished.<\/p>\n<p><em>Without<\/em> a batch system, you would need to remain logged in to the CSF <em>until the job had finished<\/em>, which could be a problem for a simulation that takes several days to complete.<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">No GUI<\/summary>\n<p>Something to note about batch jobs is that you never see an application&#8217;s graphical user interface (GUI), if it has one. 
Batch jobs run without any interaction &#8211; all options \/ flags \/ input files etc will be specified on the command-line in a <em>jobscript<\/em> (more on those later).<\/p>\n<p>When the app is running, all output will be saved to files. This will be a new way of working if you are used to running an app in a desktop environment (e.g., on Windows).<br \/>\n<\/details>\n<details>\n<summary class=\"h3\">Can I just run my app on the Login Node?<\/summary>\n<p>Running your code or application directly on the login nodes is <em><strong>not<\/strong><\/em> permitted.<\/p>\n<p>The login nodes are for other tasks (transferring files on and off the system, editing jobscripts, submitting jobs to the system, checking results.) They don&#8217;t have a lot of memory, nor many cores, so trying to run your apps there is inefficient and may also adversely affect other users.<\/p>\n<p><strong>Applications found running on the login nodes may be killed by the sysadmins without warning.<\/strong><\/p>\n<div class=\"hint\">\nPlease do take the time to learn about the batch system. 
While it may be an unfamiliar way of working initially, particularly if you are used to simply running your apps immediately on a desktop PC, there are actually a lot of benefits to using the batch system &#8211; you&#8217;ll see it is a very powerful way of working as you begin to do your real work.<\/p>\n<p>In this tutorial you can try out the sample job below &#8211; it shouldn&#8217;t take more than 10 minutes to work through the instructions on this page.<\/p><\/div>\n<\/details>\n<details>\n<summary class=\"h3\">Which batch system does the CSF4 use?<\/summary>\n<p>Currently, CSF4 runs Slurm.<\/p>\n<p>The three main commands you use are <code>sbatch<\/code> (to submit jobs), <code>squeue<\/code> (to check on jobs) and possibly <code>scancel<\/code> (to delete jobs.)<br \/>\n<\/details>\n<h2>10 Minute Tutorial: Submitting a First Job to the Batch System<\/h2>\n<p>This tutorial assumes you are <em>already<\/em> logged in to the CSF &#8211; please see the <a href=\"\/csf4\/getting-started\/connecting\/\">login instructions<\/a> for more information.<\/p>\n<p>Here we describe in detail how to submit a simple, first job to the batch system (we use a batch system called <abbr title=\"SLURM Workload Manager\">Slurm<\/abbr>.) Please read all of the text, don&#8217;t just look for the commands to type, as it will explain <em>why<\/em> you need to run the commands.<\/p>\n<h3>What type of job will we run?<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec2a'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec2a\" class=\"divhidden\">\nWe will run a <em>serial<\/em> job &#8211; i.e., it uses only one CPU core. 
We&#8217;ll see later that many of the real applications on the CSF can use more than one CPU core (a <em>multi-core<\/em> job) to speed up their processing, giving you the results sooner!<\/p>\n<p>You could also request more memory than the default 4GB of RAM.<\/p>\n<p>But initially a simple 1-core (<em>serial<\/em>) job will help you become familiar with the principles of the batch system. These jobs are very common &#8211; you may well want to use this type of job in your real work after the tutorial.<\/p>\n<p>Please remember: <strong>Do not<\/strong> simply run jobs on the login node &#8211; use the batch system as described below.\n<\/div>\n<h3>Step 0: Create a Folder for the Job Files<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec20'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec20\" class=\"divhidden\">\nIn the following steps we will be creating a <em>jobscript<\/em> file. We will explain more about the file in the next step. The job will also create some files (any output generated by the job is saved to files).<\/p>\n<p>Hence we will create a directory (folder) for the job to keep all of the files together in one place. This is important &#8211; you will likely run a lot of jobs on the CSF so it will make things easier for you to manage if you keep your files tidy.<\/p>\n<p>When you log in to the CSF you are placed in your <em>home directory<\/em>. This area of storage is private to you and, importantly, is backed-up (not all storage areas on the CSF are backed-up). 
It is <strong>strongly recommended<\/strong> that you keep important files in your <em>home directory<\/em> for safe keeping &#8211; and this includes your jobscripts!<\/p>\n<p>Once you&#8217;ve <a href=\"\/csf4\/getting-started\/connecting\/\">logged in<\/a> you will be at the <em>command-line prompt<\/em>:<\/p>\n<pre>\r\n<strong>[<em>mxyzabc1<\/em>@login02 [CSF4] ~]$<\/strong>   <em>you will type your commands here, \"at the prompt\"<\/em>\r\n   ^            ^   ^    ^\r\n   |            |   |    | \r\n   |            |   |    +--- The directory (folder) you are currently <em>in<\/em>.\r\n   |            |   |         ~ means your <em>home<\/em> folder which is your private folder.\r\n   |            |   |\r\n   |            |   +--- Name of the system\r\n   |            |\r\n   |            +--- Name of the login node (some systems have more than one login node)\r\n   |\r\n   +--- Your username appears here\r\n<\/pre>\n<p>Now create a directory (usually referred to as a <em>folder<\/em> in Windows or MacOS) in your CSF <em>home<\/em> storage area, for our first test job, by running the following commands at the prompt:<\/p>\n<pre>\r\n# All of these commands are run on the CSF login node at the <em>prompt<\/em>\r\nmkdir ~\/first-job-csf4            # Make (create) the directory (folder)\r\ncd ~\/first-job-csf4               # Change to (go into) the directory (folder)\r\n<\/pre>\n<p>Notice that the <em>prompt<\/em> has changed to indicate you&#8217;ve moved in to the <code>first-job-csf4<\/code> folder:<\/p>\n<pre>\r\n[<em>mxyzabc1<\/em>@login02 [CSF4] <strong>~\/first-job-csf4<\/strong>]$   \r\n                            ^\r\n                            |\r\n                            +--- The prompt shows we are now in the first-job-csf4 folder\r\n<\/pre>\n<\/div>\n<h3>Step 1: Create a &#8220;Jobscript&#8221; &#8211; a job description file<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec21'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec21\" 
class=\"divhidden\">\nThe <em>jobscript<\/em> file is the thing you submit to the batch system (i.e, the queue of jobs.) It is just a simple plain-text file. It serves two main purposes:<\/p>\n<ol>\n<li>It specifies the number of CPU cores, memory and other resources you need to run your application.<\/li>\n<li>It specifies the actual command(s) needed to run your application and anything else your job will do (e.g., copy files).<\/li>\n<\/ol>\n<table class=\"hint-wide\">\n<tr>\n<td>A key benefit of the jobscript is that it documents exactly what you did to run your job &#8211; no need to remember what you did 6 months ago as it is all there in the jobscript. If you ever need to run a job again, or run similar jobs, having the jobscript available is very useful!<\/td>\n<\/tr>\n<\/table>\n<p>Hence jobscripts should be considered part of your work that needs to be kept securely in your <em>home directory<\/em>. They are a record of how you ran a simulation or analysis, for example, or how you processed a particular dataset. Jobscripts are therefore part of your research methods.<\/p>\n<p>We now use <a href=\"\/csf4\/software\/applications\/gedit\/\">gedit<\/a>, or another editor, on the CSF login node (running text editors on the login node <em>is<\/em> permitted) to create a file with exactly the following content (<a href=\"#jobscript\">see below<\/a>). 
You can name the file anything you like, as long as there are no spaces in the name &#8211; in this example we use <code>first-job.txt<\/code> but Linux doesn&#8217;t care what <em>extension<\/em> you use &#8211; <code>.txt<\/code> or <code>.sbatch<\/code> or <code>.jobscript<\/code> for example:<\/p>\n<pre>\r\n# Run this command on the CSF login node at the <em>prompt<\/em>\r\ngedit first-job.txt\r\n  #\r\n  # Please IGNORE any warnings \/ messages that appear in the terminal from gedit.\r\n  # For example: (gedit:5246): dconf-WARNING **: .........\r\n<\/pre>\n<ul>\n<li><strong>Note for Windows users<\/strong>: You can create the jobscript below in <code>Notepad<\/code> and then transfer the file to the CSF, although we don&#8217;t actually recommend this method. The file can have any name (we&#8217;re using <code>first-job.txt<\/code> but anything will be OK &#8211; you&#8217;ll find that Notepad names files with <code>.txt<\/code> at the end anyway). However, you <strong>must run the following command<\/strong> on the login node to convert the file from Windows format to Linux format, otherwise the batch system will reject the job when you try to submit it:\n<pre>\r\n# Run this command on the CSF login node at the <em>prompt<\/em> if the jobscript was written in Notepad\r\ndos2unix first-job.txt\r\n           #\r\n           # or whatever filename you used (we assume Notepad adds .txt)\r\n<\/pre>\n<p>But we recommend that Windows users <a href=\"http:\/\/ri.itservices.manchester.ac.uk\/userdocs\/windows-users\/mobaxterm\/\">install MobaXterm<\/a> to log in to the CSF. You can then run <code>gedit<\/code> on the CSF login node and you&#8217;ll get a Linux editor very similar to Notepad. 
The file you write will be saved directly on the CSF and will <em>not<\/em> need converting with <code>dos2unix<\/code> because it is already in the correct format.\n<\/li>\n<\/ul>\n<p><strong>Here&#8217;s the jobscript content &#8211; put this in the text file you are creating<\/strong> either in gedit (run on the CSF login node) or Notepad (run on your Windows PC):<br \/>\n<a name=\"jobscript\"><\/a><\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n\r\n# SLURM options (whose lines must begin with #SBATCH)\r\n\r\n# OPTIONAL LINE: default partition is serial\r\n#SBATCH -p serial   # (or --partition=serial)\r\n\r\n# OPTIONAL LINE: default is 1 core in serial\r\n#SBATCH -n 1        # (or --ntasks=1) use 1 core\r\n\r\n# Now the example commands to be executed (programs to be run) on a compute node:\r\n\r\n\/bin\/date\r\n\/bin\/hostname\r\n\/bin\/sleep 120\r\n\/bin\/date\r\n<\/pre>\n<p><strong>Note: lines must NOT be indented in your text file &#8211; there should NOT be any spaces at the start of the lines.<\/strong> Copying and pasting from this web page will work correctly in most browsers in that it won&#8217;t copy any leading space.<\/p>\n<p>This BASH <em>script<\/em> has three parts:<\/p>\n<ol class=\"gaplist\">\n<li>The first line, <code>#!\/bin\/bash --login<\/code>, means that the file you create is treated as a BASH script. Linux provides several <em>scripting<\/em> languages but BASH is the one you use at the command-line once you&#8217;ve logged in, and also in jobscripts. This means that any commands you would normally type on the login node can also be used in your jobscript to be run as part of a batch job.<\/li>\n<li>The lines beginning with <code>#SBATCH<\/code> provide information about your job to the batch system (SLURM) &#8211; you use them to request resources (number of cores, memory, etc.) 
They must appear before the normal commands that your job will actually run.\n<ul>\n<li>In this simple jobscript the line <code>#SBATCH -p serial<\/code> indicates the job is a serial job. This is actually optional &#8211; without this line it will be assumed the job is a serial job and only one CPU core will be allocated to the job. Several other <em>partitions<\/em> are available (serial, multicore, multinode) &#8211; these are just areas in the CSF dedicated to running certain types of jobs.<\/li>\n<li>The line <code>#SBATCH -n 1<\/code> indicates that only one CPU core should be allocated to the job. Again, this is actually optional &#8211; jobs running in the <code>serial<\/code> partition always use exactly one CPU core.<\/li>\n<\/ul>\n<\/li>\n<li>The remaining lines comprise our computational job &#8211; the applications we actually want to run. In this example we have a trivial job which runs simple Linux commands to output the date and time, followed by the name of the compute node on which the job runs, then waits for two minutes and finally outputs the date and time again. In a real jobscript you would do something more interesting and useful &#8211; e.g., run MATLAB or Abaqus or a chemistry program.<\/li>\n<\/ol>\n<p>We&#8217;ve now written our first jobscript and it is in our private, backed-up <em>home<\/em> directory. The next section will show how to copy the jobscript to the temporary <em>scratch<\/em> storage so that we can then submit the job from there.\n<\/div>\n<h3>Step 2: Copy to scratch area<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec22'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec22\" class=\"divhidden\">\nWe now copy the jobscript to your <em>scratch<\/em> area.<\/p>\n<table class=\"hint-wide\">\n<tr>\n<td>We recommend you run jobs from the <em>scratch<\/em> filesystem: it is another area of storage on the CSF that is faster and larger. 
Your <em>home<\/em> directory is in an area that has a quota to be shared amongst everyone in your group &#8211; if your job fills up that area you will prevent your colleagues from working! Running jobs in the <em>scratch<\/em> area avoids this problem.<\/td>\n<\/tr>\n<\/table>\n<p><strong>PLEASE NOTE:<\/strong> the scratch area is a <em>temporary<\/em> area &#8211; <strong>files unused in the last 3 months can be deleted by the system to free up space<\/strong>. You should always have a copy of important files in your <em>home<\/em> area (or other research data storage visible on the CSF that your research group may have access to). Think of <em>scratch<\/em> as fast, <em>temporary<\/em> storage &#8211; if your job reads and writes large files it will be faster if run from scratch.<\/p>\n<p>A good way of working is to create your important files in the <em>home<\/em> area, then copy them to scratch when you need to use them in your jobs. That way you always have a safe copy in your home area.<\/p>\n<p>So let&#8217;s <em>copy<\/em> our jobscript to the <em>scratch<\/em> area (we keep the original in our <em>home<\/em> area for safe keeping):<\/p>\n<pre>\r\ncp first-job.txt ~\/scratch\r\n<\/pre>\n<p>We can now <em>go into<\/em> the scratch area:<\/p>\n<pre>\r\ncd ~\/scratch\r\n<\/pre>\n<p>Our scratch directory is now our <em>current working directory<\/em>. When we submit the job to the batch queue (see next step) it will run in the scratch area &#8211; a job always runs from whichever directory you are <em>in<\/em> when you submit the job. 
Any files that the job generates will also be written to there (scratch area in this example) and if your job wants to read input data files (ours doesn&#8217;t in this example) then it would try to read them from that directory.<\/p>\n<p>You will notice the prompt on the command-line will change to indicate where you are currently located:<\/p>\n<pre>\r\n[<em>mxyzabc1<\/em>@login02 [CSF4] ~\/scratch]$\r\n                             #\r\n                             # The prompt shows your current directory\r\n<\/pre>\n<\/div>\n<h3>Step 3: Submit the Job to the Batch System<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec23'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec23\" class=\"divhidden\">\nRecap: So far we have created a directory for the jobscript in our <em>home<\/em> area, written a jobscript text file there (where it is stored safely on backed-up storage), then copied it to the fast temporary <em>scratch<\/em> storage and <em>changed directory<\/em> to our scratch area where we&#8217;ll run the job from.<\/p>\n<p>The next step is to actually submit the job to the batch system.  Suppose the above script is saved in a file called <code>first-job.txt<\/code>.  Then the following command will submit your job to the batch system:<\/p>\n<pre>\r\nsbatch first-job.txt\r\n<\/pre>\n<p>You&#8217;ll see a message printed similar to:<\/p>\n<pre>\r\nSubmitted batch job 226650\r\n<\/pre>\n<p>The job id <code>226650<\/code> is a unique number identifying your job (obviously you will receive a different number). 
You may use this in other commands later.\n<\/div>\n<h3>Step 4: Check Job Status<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec24'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec24\" class=\"divhidden\">\nTo confirm that your job is queued, or perhaps already running, enter the command<\/p>\n<pre>\r\nsqueue\r\n<\/pre>\n<p>If the job is still <strong>pending<\/strong> (waiting to run) the output from <code>squeue<\/code> will look like the following &#8211; notice the ST column:<\/p>\n<pre>\r\n                                                                                               NODELIST\r\n JOBID PRIORITY PARTITION NAME     USER     ACCOUNT ST SUBMIT_TIME  START_TIME TIME NODES CPUS (REASON)\r\n226651 0.019104 serial    first-jo mxyzabc1 <em>group01<\/em> <strong>PD<\/strong> 2\/08\/21 9:51 N\/A        0:00     1    1 (None)\r\n<\/pre>\n<p>If your job is already <strong>running<\/strong>, the output will look like the following &#8211; notice the ST and NODELIST columns:<\/p>\n<pre>\r\n                                                                                               NODELIST\r\n JOBID PRIORITY PARTITION NAME     USER     ACCOUNT ST SUBMIT_TIME  START_TIME TIME NODES CPUS (REASON)\r\n226652 0.019104 serial    first-jo mxyzabc1 <em>group01<\/em> <strong>R<\/strong>  2\/08\/21 9:55 ... 9:55   0:05     1    1 node003\r\n<\/pre>\n<p>If your jobs have finished, <code>squeue<\/code> will show no output &#8211; meaning you have no jobs in the queue, either running or waiting.<\/p>\n<pre>\r\n[<em>mxyzabc1<\/em>@login02 [CSF4] scratch]$ squeue\r\nJOBID PRIORITY PARTITION NAME     USER  ACCOUNT ST SUBMIT_TIME  START_TIME  TIME  NODES  CPUS NODELIST\r\n  #\r\n  # No jobs listed mean you have no jobs waiting or running (all jobs have finished)\r\n<\/pre>\n<p>If something is wrong with your jobscript you&#8217;ll see <strong>F<\/strong> or some other code. There might also be a <code>REASON<\/code> to help diagnose the problem. 
Please contact us via the <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/hpc-help\/\">Connect Portal HPC Help form<\/a>, stating your job-ID and the system you are logged in to and we&#8217;ll let you know what has gone wrong.\n<\/div>\n<h3>Step 5: Review Job Results\/Output<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec25'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec25\" class=\"divhidden\">\nEach job will output at least one file, containing any output that would normally have been printed to the screen. This can include normal information from your app and also error messages, if any occurred.<\/p>\n<p>Let&#8217;s <em>list<\/em> the files in the current directory using the Linux <code>ls<\/code> command:<\/p>\n<pre>\r\nls\r\nfirst-job.txt  slurm-<em>226652<\/em>.out\r\n<\/pre>\n<p>We can see our original jobscript <code>first-job.txt<\/code> and a new file <code>slurm-<em>226652<\/em>.out<\/code> that has been generated by the job (remember, the job ID number <code><em>226652<\/em><\/code> will be different for <em>your<\/em> job!)<\/p>\n<p>To look at the contents of the output file:<\/p>\n<pre>\r\ncat slurm-<em>226652<\/em>.out\r\n<\/pre>\n<p>In this example the output file contains:<\/p>\n<pre>\r\nMon Aug  2 09:55:49 BST 2021\r\nnode003\r\nMon Aug  2 09:57:49 BST 2021\r\n<\/pre>\n<p>This shows the date twice, with a difference of 120 seconds (2 minutes), and the name of the compute node on which the job ran, as expected (refer back to the commands we ran in our first <a href=\"#jobscript\">jobscript<\/a>).<\/p>\n<p>Note that the name of the output file is always, by default, <code>slurm-<em>JOBID<\/em>.out<\/code>. It might be easier to keep track of which job output which file if you make the output file use a similar name to that of your jobscript. 
You can change the name of the output file by adding the following line to your jobscript:<\/p>\n<pre>\r\n#SBATCH -o %x.o%j      # %x will be replaced by the job name (by default, the jobscript name)\r\n                       # %j will be replaced by the JOBID number\r\n<\/pre>\n<p>This would generate an output file named <code>first-job.txt.o226652<\/code> (which will be familiar to CSF3 users.)<\/p>\n<p>You&#8217;ve now successfully run a job on the CSF. It was a simple <em>1-core<\/em> job (it used only one CPU core) to run some basic Linux commands. The output of the commands was captured in the <code>slurm-<em>226652<\/em>.out<\/code> file. By changing the Linux commands to something more useful (e.g., to run your favourite chemistry application) you can get lots of real work done on the CSF.\n<\/div>\n<h3>Step 6: Copy Results back to &#8220;home&#8221;<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec26'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec26\" class=\"divhidden\">\nEarlier we said that the <em>scratch<\/em> storage area is temporary (but fast). Hence if we want to keep the results from this job then we should copy them back to the <em>home<\/em> storage area. Let&#8217;s assume we DO want to keep the output from this job. Apart from the usual <em>slurm-NNNNN.out<\/em> file, it didn&#8217;t generate any other files. So we&#8217;ll just copy that file back to <em>home<\/em>:<\/p>\n<pre>\r\n# Copy from the current scratch dir to the job's directory in home\r\ncp slurm-<em>226652<\/em>.out ~\/first-job-csf4\/\r\n<\/pre>\n<p>That&#8217;s it, the output file is now stored in our backed-up home area. We could delete the file from scratch, although sometimes you may wish to leave your files there while you check their contents and possibly use them in future jobs. 
Remember though, the scratch filesystem will tidy up old files automatically, so at some point they will be deleted.<\/p>\n<p>When you run a real app (e.g., a chemistry app or OpenFOAM) then your jobs may well generate other files (lots of them, possibly large files.) You&#8217;ll need to consider more carefully which files you want to keep.\n<\/p><\/div>\n<h3>Summary<\/h3>\n<p>[<a href=\"#\" onclick=\"showHide('sec27'); return false;\">Show \/ Hide<\/a>]<\/p>\n<div id=\"sec27\" class=\"divhidden\">\nPoints to remember<\/p>\n<ul class=\"gaplist\">\n<li>Do not simply run your apps on the login node. Write a jobscript and submit it to the batch system. Your app will run on a more powerful node and won&#8217;t upset the login node (and the sysadmins!)<\/li>\n<li>You can write your jobscript on the login node using <code>gedit<\/code>.<\/li>\n<li>Alternatively if you use <code>notepad<\/code> on windows ensure you run <code>dos2unix<\/code> on the jobscript once you&#8217;ve transferred it to the CSF.<\/li>\n<li>Keep your important files in your <em>home<\/em> area but copy them to the <em>scratch<\/em> area and run your jobs from there. Don&#8217;t forget to copy important results back to <em>home<\/em>.<\/li>\n<li>Submit the job using <code>sbatch<\/code><\/li>\n<li>Check on the job using <code>squeue<\/code><\/li>\n<li>Look in the <code>slurm-<em>NNNNN<\/em>.out<\/code> file generated by the job for output and errors.<\/li>\n<li>If you have any questions please contact us via the <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/hpc-help\/\">Connect Portal HPC Help form<\/a> &#8211; we&#8217;re here to help!<\/li>\n<\/ul>\n<\/div>\n<h2>More on Using the Batch System (multi-core and multi-node parallel jobs)<\/h2>\n<p>The batch system has a great deal more functionality than described above &#8211; by adding more <code>#SBATCH<\/code> special lines to your jobscript your jobs can make more use of the CSF capabilities. 
A list of features is given below with links to documentation.<\/p>\n<p>Other features include:<\/p>\n<ul>\n<li>Running <a href=\"\/csf4\/batch\/parallel-jobs\/\">parallel multi-core\/SMP jobs<\/a> (e.g., using OpenMP)<\/li>\n<li>Running <a href=\"\/csf4\/batch\/parallel-jobs\/\">parallel multi-host jobs<\/a> (e.g., using MPI)<\/li>\n<li>Running <a href=\"\/csf4\/batch\/job-arrays\/\">job arrays<\/a> &mdash; submitting 100s, 1000s of similar jobs by means of just <em>one<\/em> sbatch script\/command<\/li>\n<\/ul>\n<p>These features are fully documented (with example job scripts) in the <a href=\"\/csf4\/batch\/\">CSF<\/a> SLURM documentation.<\/p>\n<table class=\"hint-wide\">\n<tr>\n<td>Finally, each centrally installed application has its own <a href=\"\/csf4\/software\/\">application webpage<\/a> where you will find examples of how to submit a job for that specific piece of software and any other information relevant to running it in batch such as extra settings that may be required for it to work.<\/td>\n<\/tr>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This page offers new CSF users a tutorial that covers usage of the batch system to run a simple job on the CSF. The tutorial also provides some information about the storage areas on the CSF and some common Linux commands used to manage your files. After doing the tutorial you&#8217;ll be able to use the CSF. If you&#8217;re a CSF3 user moving to CSF4 this tutorial will allow you to practice writing a.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/getting-started\/tutorial\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":19,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-680","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/comments?post=680"}],"version-history":[{"count":21,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/680\/revisions"}],"predecessor-version":[{"id":1429,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/680\/revisions\/1429"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/19"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/media?parent=680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}