{"id":2275,"date":"2019-02-12T16:58:41","date_gmt":"2019-02-12T16:58:41","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=2275"},"modified":"2025-12-02T10:28:45","modified_gmt":"2025-12-02T10:28:45","slug":"tutorial-parallel-jobs","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/getting-started\/tutorial-parallel-jobs\/","title":{"rendered":"Parallel Jobs 10 Minute Tutorial (Slurm)"},"content":{"rendered":"<p>Please note: It is assumed you have already done the <a href=\"\/csf3\/getting-started\/tutorial\">Batch System 10 Minute Tutorial<\/a>. If not, please do so before attempting this tutorial.<\/p>\n<h2>Another Tutorial: Submitting a Parallel Job to the Batch System<\/h2>\n<p>The following tutorial is optional and aimed at users wishing to run <em>parallel<\/em> jobs. You may wish to come back to this tutorial once you are more familiar with the CSF.<\/p>\n<p>A <em>parallel job<\/em> can be used when your application software is <em>known to support parallel processing<\/em>.<\/p>\n<p>These applications use more than one CPU core to improve their performance (i.e., give you the results sooner!) They can also access more memory than a serial (1-core) application and so can usually tackle larger problems (e.g., read in larger input data files, solve more equations, run larger simulations.)<\/p>\n<p>Many of the centrally-installed applications on the CSF support parallel processing.<\/p>\n<p>Parallel applications can use multiple CPU cores <em>within<\/em> a <em>single<\/em> compute node.<\/p>\n<p>Some parallel applications even support running larger parallel jobs across <em>multiple<\/em> compute nodes. <\/p>\n<p>Not all software supports parallel processing. 
If your application does <em>not<\/em> support it then there is no point running a parallel job &#8211; the CSF will not magically make it run on multiple CPU cores.<\/p>\n<p>However, if you have a lot of data files to process, say, or a lot of simulations to run (a parameter sweep) then you may wish to run <em>multiple copies<\/em> of an application to process lots of different datasets at the same time using a type of batch job known as a <a href=\"\/csf3\/batch\/job-arrays-slurm\/\">job array<\/a>. Even if the application is not a <em>parallel<\/em> application, running lots of copies of the app to process lots of datasets can give you your results sooner.<\/p>\n<p>You should check the <a href=\"\/csf3\/software\/\">documentation<\/a> for your particular application to see if it supports parallel processing.<\/p>\n<p>In the following tutorial we will:<\/p>\n<ol class=\"gaplist\">\n<li>Run a simple matrix-multiplication application that multiplies two large square matrices of numbers together. This is a common task in many engineering applications. For the purposes of this tutorial it doesn&#8217;t matter what the task is, but it does demonstrate how to submit a parallel job to the batch system.<\/li>\n<li>Repeat the job with a different number of cores to see how it affects performance.<\/li>\n<li>The tutorial will also show how to use a <em>modulefile<\/em> to access a centrally installed application.<\/li>\n<\/ol>\n<p>The following steps assume you have already logged in to the CSF and have followed the <a href=\"\/csf3\/getting-started\/tutorial\">Batch System 10 Minute Tutorial<\/a> (which explains some of the steps in more detail).<\/p>\n<h3>Step 1: Create a Job Description File (a jobscript)<\/h3>\n<p>As in the previous tutorial, we need a simple text file (the <em>jobscript<\/em>) describing the job we wish to run. 
We will add some extra information to the jobscript to request more than 1 CPU core (which is the default).<\/p>\n<p>Create a directory (usually referred to as a folder in Windows or macOS) in your CSF home storage area, for our second test job, by running the following commands at the prompt:<\/p>\n<pre>\r\n# All of these commands are run on the CSF login node at the <em>prompt<\/em>\r\nmkdir ~\/second-job            # Create the directory (folder)\r\ncd ~\/second-job               # Go into the directory (folder)\r\n<\/pre>\n<p>Now use <code><a href=\"\/csf3\/software\/tools\/gedit\/\">gedit<\/a><\/code>, or another editor, on the CSF login node (running text editors on the login node is permitted) to create a file with exactly the following content (<a href=\"#parjobscript\">see below<\/a>):<\/p>\n<pre>\r\n# Run this command on the CSF login node at the prompt\r\ngedit second-job.txt\r\n<\/pre>\n<p><a name=\"parjobscript\"><\/a><br \/>\n<strong>Here\u2019s the jobscript content \u2013 put this in the text file you are creating<\/strong><\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n<strong>#SBATCH -p multicore<\/strong>  # (or --partition=) Job will use the compute nodes reserved for parallel jobs.\r\n<strong>#SBATCH -n 4<\/strong>          # (or --ntasks=) Number of cores to use.\r\n<strong>#SBATCH -t 0-1<\/strong>        # This is the wallclock time limit. 0-1 is 1 hour. Job will be terminated if\r\n                      # still running after 1 hour.\r\n\r\n# Set up to use the centrally installed tutorial application. The CSF has modulefiles for 100s of apps.\r\nmodule purge\r\nmodule load apps\/intel-17.0\/tutorial\r\n\r\n# Inform the app how many cores we requested for our job. 
The app can use this many cores.\r\n# The special $SLURM_NTASKS keyword is automatically set to the number used on the -n line above.\r\nexport OMP_NUM_THREADS=$SLURM_NTASKS\r\n\r\n# Run the app, which in this tutorial is named 'pmp'\r\npmp\r\n<\/pre>\n<p><strong>Note: lines must NOT be indented in your text file \u2013 there should NOT be any spaces at the start of the lines.<\/strong> Cutting and pasting from this web page will work correctly in most browsers (it won\u2019t copy any leading spaces).<\/p>\n<p>This BASH script has the following parts:<\/p>\n<ol class=\"gaplist\">\n<li>The first line, <code>#!\/bin\/bash --login<\/code>, means that the file you create is treated as a BASH script (scripts in Linux can use several languages; BASH is the one we use for jobscripts). The <code>--login<\/code> flag is needed to make the <code>module<\/code> command work inside the jobscript.<\/li>\n<li>The <code>#SBATCH -p partition<\/code> line is new &#8211; the partition says what type of parallel job will be run. In this case we are running a single compute-node multi-core job. Other types of parallel job are available but we will not cover those here.<\/li>\n<li>The <code>#SBATCH -n 4<\/code> line is new &#8211; this makes the job a parallel job. It asks the batch system to reserve 4 cores (in this example) in the partition.<\/li>\n<li>The <code>#SBATCH -t 0-1<\/code> line sets the maximum time the job is allowed to run to 0 days, 1 hour. It is fine if your job completes sooner than this, but if it is still running after one hour (in this example) then Slurm will kill the job. Our simple parallel program will complete in well under one hour.<\/li>\n<li>The <code>module purge<\/code> line is new &#8211; it ensures your job starts with a clean environment. 
Without this, your job will have inherited any modulefiles you had loaded on the login node.<\/li>\n<li>The <code>module load apps\/intel-17.0\/tutorial<\/code> line is new &#8211; this will load a <em>modulefile<\/em> into the job&#8217;s environment when it runs on a compute node. The modulefile will apply the settings (possibly loading other modulefiles) needed to allow the <code>pmp<\/code> application to run. All of the centrally installed applications have modulefiles to make running the apps as easy as possible.<\/li>\n<li>The <code>export OMP_NUM_THREADS=$SLURM_NTASKS<\/code> line is new &#8211; this is how we inform the <code>pmp<\/code> application how many CPU cores it is allowed to use. The app does <em>not<\/em> know this automatically. We reserved 4 cores in the batch system but we must then inform the application that it can use 4 cores.\n<p>The <code>$SLURM_NTASKS<\/code> variable is automatically set by the batch system to the number of cores requested on the <code>#SBATCH -n<\/code> line, so it is a convenient way of always getting the correct number of cores.<\/p><\/li>\n<li>The <code>pmp<\/code> line is new &#8211; <code>pmp<\/code> is the name of the parallel matrix-multiplication application we are going to run.<\/li>\n<\/ol>\n<h3>Step 2: Copy to scratch area<\/h3>\n<p>We now copy the jobscript to your <em>scratch<\/em> area &#8211; recall that we recommend running jobs in the scratch area because it is faster and permits jobs to write large temporary files without filling up your group&#8217;s home directory quota. But you must remember to copy important results back to the home area for safekeeping.<\/p>\n<pre>cp second-job.txt ~\/scratch<\/pre>\n<p>We can now <em>go into<\/em> the scratch area:<\/p>\n<pre>\r\ncd ~\/scratch\r\n<\/pre>\n<p>Our scratch directory is now our <em>current working directory<\/em>. 
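<\/p>\n<p>As a quick check before submitting, you can try the core-count hand-off from the jobscript at the prompt. A minimal sketch, assuming a plain BASH shell: on the login node <code>SLURM_NTASKS<\/code> is not set (Slurm only sets it inside a running job), so we add a fallback of 1 purely for illustration &#8211; the fallback is not part of the tutorial jobscript:<\/p>

```shell
# Sketch only: inside a real job, SLURM_NTASKS is set by Slurm to the
# value on the '#SBATCH -n' line. At the prompt it is unset, so the
# ':-1' fallback (not in the tutorial jobscript) supplies a default.
unset SLURM_NTASKS
export OMP_NUM_THREADS=${SLURM_NTASKS:-1}
echo "$OMP_NUM_THREADS"    # prints: 1
```

<p>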
When we submit the job to the batch queue (see next step) it will run in the scratch area, outputting any results files there.<\/p>\n<h3>Step 3: Submit the Job to the Batch System<\/h3>\n<p>Assuming your jobscript is called <code>second-job.txt<\/code>, submit your jobscript (the copy that is in your scratch area) to the batch system:<\/p>\n<pre>\r\nsbatch second-job.txt\r\n<\/pre>\n<p>You&#8217;ll see a message printed similar to:<\/p>\n<pre>\r\nSubmitted batch job 195502\r\n<\/pre>\n<p>The job id <code>195502<\/code> is a unique number identifying your job (obviously you will receive a different number). You may use this in other commands later.<\/p>\n<h3>Step 4: Check Job Status<\/h3>\n<p>Use the <code>squeue<\/code> command to check the job status. By looking at the <code>ST<\/code> column (short for &#8220;State&#8221;), you should be able to determine whether it is pending (<code>PD<\/code>), running (<code>R<\/code>), failed (<code>F<\/code>) or finished (<code>squeue<\/code> shows nothing).<\/p>\n<h3>Step 5: Review Job Results\/Output<\/h3>\n<p>The job will have created an output file, <code>slurm-<em>195502<\/em>.out<\/code>, which contains the output from the job. This will include any normal output (e.g., any messages the <code>pmp<\/code> app prints) and also any error messages. 
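<\/p>\n<p>The output filename follows Slurm&#8217;s default pattern of <code>slurm-&lt;jobid&gt;.out<\/code>, so you can construct it from the job id that <code>sbatch<\/code> printed. A small sketch (the job id below is this tutorial&#8217;s example; yours will differ):<\/p>

```shell
# Build the default Slurm output filename from a job id.
# 195502 is the example job id used in this tutorial.
jobid=195502
outfile="slurm-${jobid}.out"
echo "$outfile"    # prints: slurm-195502.out
```

<p>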
Let&#8217;s have a look at the file size by doing a <em>long listing<\/em>, which shows more information about the files:<\/p>\n<pre>\r\n# Run the 'ls' command with a '-ltr' flag added for a <em>long<\/em> listing \r\n# with the most recently updated files listed at the bottom of the listing.\r\n<strong>ls -ltr<\/strong>\r\n-rw------- 1 <em>username<\/em> <em>xy01<\/em>       345 May  4 13:16 second-job.txt\r\n-rw-r--r-- 1 <em>username<\/em> <em>xy01<\/em>       337 May  4 13:18 slurm-<em>195502<\/em>.out\r\n  ^                    ^          ^    |_______|    ^     ^\r\n  |                    |          |        |        |     | Your job id number will be different\r\n  |                    |          |        |        |\r\n  +-File permissions   |          |        |        +- Filenames\r\n                       |          |        |\r\n                       |          |        +- Date and time of last update\r\n                       |          |           (i.e. when something was written to the file)\r\n                       |          |\r\n                       |          +- Filesize in bytes.\r\n                       |\r\n                       +- The <em>group<\/em> you are in. It usually indicates\r\n                          your faculty or supervisor.\r\n<\/pre>\n<p>Examine the contents of the <code>slurm-<em>195502<\/em>.out<\/code> file &#8211; any output printed by the <code>pmp<\/code> app will have been captured into it:<\/p>\n<pre>\r\ncat slurm-<em>195502<\/em>.out\r\n           #\r\n           # Use the jobid number for your job!\r\n<\/pre>\n<p>You will see the number of cores used by <code>pmp<\/code> reported, followed by the 2D matrix size used in the tests, followed by timing information for five runs of the matrix calculation. 
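<\/p>\n<p>Once you have timings from two runs, the parallel speedup is simply the ratio of the two times. A sketch using made-up timings &#8211; substitute the values reported in your own <code>slurm-&lt;jobid&gt;.out<\/code> files:<\/p>

```shell
# Illustrative timings only (seconds); replace these with the values
# from your own 4-core and 8-core runs of pmp.
t4=20.0
t8=10.0
awk -v a="$t4" -v b="$t8" 'BEGIN { printf "speedup: %.1fx\n", a/b }'
# prints: speedup: 2.0x
```

<p>A value close to 2.0x when doubling the core count indicates the application is scaling well for this problem size.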
<\/p>\n<h3>Step 6: Repeat the job with a different number of cores<\/h3>\n<p>To show the effect of using more cores with the <code>pmp<\/code> application, edit your jobscript:<\/p>\n<pre>\r\ngedit second-job.txt\r\n<\/pre>\n<p>then change the number of cores:<\/p>\n<pre>\r\n#SBATCH -n 8         # Use 8 cores instead of the 4 used previously\r\n<\/pre>\n<p>Save the file and resubmit it to the batch system:<\/p>\n<pre>\r\nsbatch second-job.txt\r\nSubmitted batch job <em>195503<\/em>\r\n<\/pre>\n<p>When the job has completed (check using <code>squeue<\/code>), have a look at the timing information for this second run of the job. It should show that the five runs of the calculation were done in approximately half the time:<\/p>\n<pre>\r\ncat slurm-<em>195503<\/em>.out\r\n           #\r\n           # Use the jobid number for your job!\r\n<\/pre>\n<p>If you wish to run the <code>pmp<\/code> application with a different number of cores (up to 168 cores are permitted in the <code>multicore<\/code> partition) then edit the jobscript again and resubmit the job.<\/p>\n<h3>Summary<\/h3>\n<p>You have now run a parallel job: a single-node multi-core job which used multiple CPU cores within a single compute node. The application supports this type of parallel processing and we verified that it ran more quickly with more cores. We used a <em>modulefile<\/em> to give us access to the centrally installed <code>pmp<\/code> application.<\/p>\n<h2>More on Using the Batch System (parallel jobs, GPUs, high-mem)<\/h2>\n<p>The batch system, Slurm, has a great deal more functionality than described above. 
Other features include:<\/p>\n<ul>\n<li>Running <a href=\"\/csf3\/batch\/parallel-jobs-slurm\/\">parallel multi-core\/SMP jobs and the AMD nodes<\/a> (e.g., using OpenMP)<\/li>\n<li>Running <a href=\"\/csf3\/batch\/job-arrays-slurm\/\">job arrays<\/a> &mdash; submitting many similar jobs by means of just one sbatch script\/command<\/li>\n<li>Running <a href=\"\/csf3\/batch\/gpu-jobs-slurm\/\">GPU jobs<\/a><\/li>\n<li>Running <a href=\"\/csf3\/batch\/high-memory-jobs-slurm\/\">high-memory jobs<\/a><\/li>\n<\/ul>\n<p>These are fully documented (with example job scripts) in the <a href=\"\/csf3\/batch\/\">CSF<\/a> Slurm documentation.<\/p>\n<p>Finally, each centrally installed application has its own <a href=\"\/csf3\/software\/\">application webpage<\/a> where you will find examples of how to submit a job for that specific piece of software, plus any other information relevant to running it in batch, such as extra settings that may be required for it to work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Please note: It is assumed you have already done the Batch System 10 Minute Tutorial. If not, please do so before attempting this tutorial. Another Tutorial: Submitting a Parallel Job to the Batch System The following tutorial is optional and aimed at users wishing to run parallel jobs. You may wish to come back to this tutorial once you are more familiar with the CSF. A parallel job can be used when your application software.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/getting-started\/tutorial-parallel-jobs\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":12,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2275","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/2275","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=2275"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/2275\/revisions"}],"predecessor-version":[{"id":11458,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/2275\/revisions\/11458"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/12"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=2275"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}