{"id":261,"date":"2020-08-13T16:37:51","date_gmt":"2020-08-13T15:37:51","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf4\/?page_id=261"},"modified":"2026-01-23T11:08:44","modified_gmt":"2026-01-23T11:08:44","slug":"parallel-jobs","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/batch\/parallel-jobs\/","title":{"rendered":"Parallel Jobs"},"content":{"rendered":"<h2>Current Configuration and Parallel Partitions<\/h2>\n<p>For jobs that require two or more CPU cores, the appropriate SLURM <strong>p<\/strong>artition should be selected from the table below. <em><strong>Please also consult the <a href=\"\/csf4\/software\">software page<\/a> specific to the code \/ application you are running for advice on the most suitable partition<\/strong><\/em>.<\/p>\n<h2>Parallel jobscripts using a single compute node<\/h2>\n<p>These jobscripts will use CPU cores on a single compute node. There are a couple of ways that you can run your app in the jobscript, depending on which parallel methods the application supports.<\/p>\n<h3>Multicore parallel OpenMP and small MPI jobs<\/h3>\n<p>For jobs requiring between 2 and 40 cores, which will fit on a single compute node. This could be a multi-core (OpenMP) app or a small MPI job. This is the recommended jobscript for single-node parallel jobs. The jobscript will require the following lines:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n# Runs in current dir by default\r\n#SBATCH -p multi<strong>core<\/strong>  # (or --partition=) \r\n#SBATCH -n <em>numtasks<\/em>   # (or --ntasks=) Number of MPI procs or CPU cores. 2--40.\r\n                      # The $SLURM_NTASKS variable will be set to this value.\r\n\r\n# Can load modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# For an OpenMP (multicore) app, inform the app how many cores to use, then run the app\r\nexport OMP_NUM_THREADS=$SLURM_NTASKS\r\n<em>openmpapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n\r\n# For an MPI app SLURM knows to run <em>numtasks<\/em> from above\r\nmpirun <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n<\/pre>\n<h3>Multicore parallel for OpenMP jobs only<\/h3>\n<p>For jobs requiring between 2 and 40 cores, which will fit on a single compute node. Note that this method should NOT be used for MPI applications; it is for multicore (usually OpenMP) applications. With this method MPI will not start the expected number of processes &#8211; only one MPI process is started, which is probably not what you want for MPI jobs! The jobscript will require the following lines:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n# Runs in current dir by default\r\n#SBATCH -p multi<strong>core<\/strong>    # (or --partition=) \r\n#SBATCH -c <em>corespertask<\/em> # (or --cpus-per-task=) Number of cores to use for OpenMP (2--40)\r\n                        # The $SLURM_CPUS_PER_TASK variable will be set to this value.\r\n# Can load modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# For an OpenMP (multicore) app, say how many cores to use, then run the app\r\nexport OMP_NUM_THREADS=<strong>$SLURM_CPUS_PER_TASK<\/strong>\r\n<em>openmpapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n\r\n# If you start an MPI app SLURM will only start 1 MPI process!!\r\n# This is because the $SLURM_NTASKS variable is NOT defined (no -n, --ntasks flag above)\r\n# mpirun <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n<\/pre>\n<h2>Parallel jobscripts using multiple compute nodes<\/h2>\n<p>These jobscripts will use CPU cores on multiple compute nodes. 
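<\/p>
<p>As a quick sanity check before starting your application in a multinode jobscript, you can print the allocation SLURM has given you. This is a generic sketch using standard SLURM environment variables and commands, not one of the official CSF4 examples:<\/p>
<pre>\r\n# Optional sanity check: show which nodes the job received and where each task will run\r\necho \"Nodes allocated:  $SLURM_JOB_NODELIST\"\r\necho \"Number of nodes:  $SLURM_JOB_NUM_NODES\"\r\nsrun hostname     # prints one line per task\r\n<\/pre>
<p>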
There are a couple of ways you can run your job, depending on which parallel methods the application supports.<\/p>\n<h3>Multinode parallel large MPI jobs<\/h3>\n<p>These jobs will use at least two 40-core compute nodes (but you can request more than two nodes) and will use all of the cores on each compute node. Your job has exclusive use of the compute nodes (no other jobs will be running on them). There are two working methods of specifying such jobs &#8211; specifying the number of nodes, or specifying the number of nodes AND the total number of cores (the old method of specifying only the total number of cores no longer works).<\/p>\n<h3>Method 1: specify the total number of cores (tasks) &#8211; DO NOT USE<\/h3>\n<div class=\"warning\">\n<strong>June 2024: THIS METHOD NO LONGER WORKS &#8211; YOU MUST SPECIFY THE NUMBER OF NODES (-N) AND OPTIONALLY THE NUMBER OF TASKS (-n)<\/strong><br \/>\nMethod 1: specify the total number of cores\n<pre>\r\n#!\/bin\/bash --login\r\n# Runs in current dir by default\r\n#SBATCH -p multi<strong>node<\/strong>  # (or --partition=) \r\n#SBATCH -n <em>numtasks<\/em>   # (or --ntasks=) 80 or more in multiples of 40. \r\n# <strong>This old method does NOT use the -N (--nodes) flag. We need to use it now!<\/strong>\r\n\r\n# Can load modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# For an MPI app SLURM knows how many tasks to run\r\nmpirun <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n<\/pre>\n<p>Remember: specifying <em>only<\/em> the total number of cores (e.g., <code>-n 80<\/code>) will NO LONGER WORK.<\/p>\n<\/div>\n<h3>Method 2: specify the number of nodes<\/h3>\n<p><strong>THIS METHOD WORKS (but <code>$SLURM_NTASKS<\/code> will not be set &#8211; see below if you need that environment variable)<\/strong><\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n# Runs in current dir by default\r\n#SBATCH -p multi<strong>node<\/strong>  # (or --partition=) \r\n#SBATCH -N <em>numnodes<\/em>   # (or --nodes=) 2 or more. The job uses all 40 cores on each node.\r\n                      # Note: $SLURM_NTASKS is NOT set if you use <em>only<\/em> the -N (--nodes) flag.\r\n                      # To use $SLURM_NTASKS in your jobscript, add -n (see below)!!\r\n\r\n# Can load modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# For an MPI app SLURM knows to run 40 MPI tasks on <em>each<\/em> compute node\r\nmpirun <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n<\/pre>\n<p>See the mixed-mode example below for a more complex multi-node job.<\/p>\n<h3>Method 3: specify the number of nodes AND number of cores (tasks) &#8211; RECOMMENDED<\/h3>\n<p><strong>THIS METHOD WORKS (and <code>$SLURM_NTASKS<\/code> will be set)<\/strong><br \/>\nMethod 3: specify the number of nodes AND number of tasks<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n# Runs in current dir by default\r\n#SBATCH -p multi<strong>node<\/strong>  # (or --partition=) \r\n#SBATCH -N <em>numnodes<\/em>   # (or --nodes=) 2 or more. The job uses all 40 cores on each node.\r\n#SBATCH -n <em>numtasks<\/em>   # (or --ntasks=) 80 or more - the TOTAL number of tasks in your job.\r\n\r\n# Can load modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# For an MPI app SLURM knows to run 40 MPI tasks on <em>each<\/em> compute node\r\nmpirun <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n<\/pre>\n<p>See the mixed-mode example below for a more complex multi-node job.<\/p>\n<h3>Multinode parallel large mixed-mode (MPI+OpenMP)<\/h3>\n<p>Mixed-mode jobs run a smaller number of MPI processes which themselves use OpenMP (multicore) to provide some of the parallelism. If your application supports mixed-mode parallelism, this method can often provide performance benefits over entirely MPI parallel jobs, by reducing the amount of MPI communication between the nodes.<\/p>\n<p>These jobs will use at least two 40-core compute nodes (but you can request more than two nodes) and will use <em>all<\/em> of the cores on each compute node. 
Your job has exclusive use of the compute nodes (no other jobs will be running on them). The jobscript will require the following lines (see below for specific examples):<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n# Runs in current dir by default\r\n#SBATCH -p multi<strong>node<\/strong>    # (or --partition=)\r\n#SBATCH -N <em>numnodes<\/em>     # (or --nodes=) Total number of compute nodes to use (2 or more.)\r\n#SBATCH -n <em>numtasks<\/em>     # (or --ntasks=) Number of MPI processes to run in total. They will be\r\n                        #                spread across the requested number of nodes.\r\n#SBATCH -c <em>corespertask<\/em> # (or --cpus-per-task=) Number of cores to use for OpenMP in <em>each<\/em> MPI process.\r\n\r\n# Can load modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# Inform each MPI process how many OpenMP cores to use\r\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\r\n\r\n# For an MPI+OpenMP app SLURM knows to run <em>numtasks<\/em> MPI procs across <em>numnodes<\/em> nodes\r\n# But we need to ensure each MPI process grabs (<em>binds to<\/em>) the cores requested above (-c <em>corespertask<\/em>)\r\nmpirun --map-by ppr:<em>N<\/em>:<em>RES<\/em>:PE=$OMP_NUM_THREADS <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n                   #        #\r\n                   #        # PE=$OMP_NUM_THREADS gives each MPI process access to\r\n                   #        # the cores needed for its OpenMP threads (see below).\r\n                   #\r\n                   # <strong>ppr<\/strong> is 'processes per resource'. It means we are about to specify\r\n                   # how MPI processes and OpenMP threads should be placed on the nodes.\r\n                   # <strong>N<\/strong> is a number of MPI processes (you should use a <em>number<\/em> here.) 
\r\n                   # <strong>RES<\/strong> is the resource unit (for example 'node' or 'socket'.)\r\n                   # You will have <em>N<\/em> MPI processes placed on each <em>RES<\/em>.\r\n                   # See below for a real example.\r\n<\/pre>\n<h4>Example 1 &#8211; one MPI task per node<\/h4>\n<p>Here we run a <em>mixed-mode<\/em> application using three compute nodes, where <em>each<\/em> compute node runs one MPI process, and each of those MPI processes uses all 40 cores on the node for OpenMP threads. Note that the CSF4 compute nodes contain two 20-core sockets (CPUs), so the 40 cores will be provided by 20 cores on the first CPU and 20 cores on the second CPU.<\/p>\n<pre>\r\n+-- <span class=\"blue\">Compute node 1<\/span> --+     +-- <span class=\"blue\">Compute node 2<\/span> --+     +-- <span class=\"blue\">Compute node 3<\/span> --+\r\n|Socket (CPU0)       |     |Socket (CPU0)       |     |Socket (CPU0)       |\r\n|   <span class=\"orange\">MPI Proc<\/span>         |     |   <span class=\"orange\">MPI Proc<\/span>         |     |   <span class=\"orange\">MPI Proc<\/span>         |\r\n|     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|\r\n+- - - - - - - - - - +     +- - - - - - - - - - +     +- - - - - - - - - - +\r\n|Socket (CPU1)       |     |Socket (CPU1)       |     |Socket (CPU1)       |\r\n|     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|\r\n|                    |     |                    |     |                    |\r\n+--------------------+     +--------------------+     +--------------------+\r\nThe MPI processes will communicate with each other, using the InfiniBand network between the nodes.\r\n<\/pre>\n<p>The jobscript is as follows:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#SBATCH -p multinode\r\n#SBATCH -N 3          # <span class=\"blue\">3 whole compute nodes (we use all 40 cores on <em>each<\/em> compute node, 120 in total)<\/span>\r\n#SBATCH -n 3          # <span class=\"orange\">3 MPI processes <strong>in total<\/strong> (1 per compute node)<\/span>\r\n#SBATCH -c 40         # <span class=\"purple\">40 cores to be used by <strong>each<\/strong> MPI process<\/span>\r\n\r\n# Load any modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# Inform each MPI process how many OpenMP threads to use (40 in this example)\r\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\r\n\r\n# Run the MPI processes (SLURM knows to use 3 compute nodes, each running 1 MPI process in this example).\r\nmpirun --map-by ppr:1:<strong>node<\/strong>:PE=$OMP_NUM_THREADS <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n                 #             #\r\n                 #             # PE=$OMP_NUM_THREADS gives each MPI process access to\r\n                 #             # the 40 cores of the <strong>node<\/strong> on which it is running.\r\n                 #\r\n                 # ppr is 'processes per resource'. It describes how MPI processes should\r\n                 # be placed on the node. In this case 1 process per <strong>node<\/strong> is specified.\r\n                 # So the MPI process will have access to the cores of both sockets in the node\r\n                 # because our compute nodes have two sockets (CPUs) in them.\r\n<\/pre>\n<h4>Example 2 &#8211; two MPI tasks per node<\/h4>\n<p>In this example, we run a <em>mixed-mode<\/em> application using three compute nodes again, but now <em>each<\/em> compute node runs <strong>two<\/strong> MPI processes (each placed on a <em>socket<\/em> &#8211; aka a CPU, because our nodes have two CPUs in them) and <strong>each<\/strong> MPI process runs its own 20 OpenMP threads:<\/p>\n<pre>\r\n+-- <span class=\"blue\">Compute node 1<\/span> --+     +-- <span class=\"blue\">Compute node 2<\/span> --+     +-- <span class=\"blue\">Compute node 3<\/span> --+\r\n|Socket (CPU0)       |     |Socket (CPU0)       |     |Socket (CPU0)       |\r\n|   <span class=\"orange\">MPI Proc<\/span>         |     |   <span class=\"orange\">MPI Proc<\/span>         |     |   <span class=\"orange\">MPI Proc<\/span>         |\r\n|     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|\r\n+- - - - - - - - - - +     +- - - - - - - - - - +     +- - - - - - - - - - +\r\n|Socket (CPU1)       |     |Socket (CPU1)       |     |Socket (CPU1)       |\r\n|   <span class=\"orange\">MPI Proc<\/span>         |     |   <span class=\"orange\">MPI Proc<\/span>         |     |   <span class=\"orange\">MPI Proc<\/span>         | \r\n|     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|     |     <span class=\"purple\">20 OpenMP cores<\/span>|\r\n+--------------------+     +--------------------+     +--------------------+\r\nThe MPI processes will communicate with each other, using the InfiniBand network between the nodes.\r\n<\/pre>\n<p>The jobscript is as follows:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#SBATCH -p multinode\r\n#SBATCH -N 3          # <span class=\"blue\">3 whole compute nodes (we use all 40 cores on <em>each<\/em> compute node, 120 in total)<\/span>\r\n#SBATCH -n 6          # <span class=\"orange\">6 MPI processes <strong>in total<\/strong> (2 per compute node)<\/span>\r\n#SBATCH -c 20         # <span class=\"purple\">20 cores to be used by <strong>each<\/strong> MPI process<\/span>\r\n\r\n# Load any modulefiles\r\nmodule load appname\/x.y.x\r\n\r\n# Inform each MPI process how many OpenMP threads to use (20 in this example)\r\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\r\n\r\n# Run the MPI processes (SLURM knows to use 3 compute nodes, each running 2 processes in this example).\r\nmpirun --map-by ppr:1:<strong>socket<\/strong>:PE=$OMP_NUM_THREADS <em>mpiapp <\/em><em>arg1<\/em> <em>arg2 ...<\/em>\r\n                 #             #\r\n                 #             # PE=$OMP_NUM_THREADS gives each MPI process access to\r\n                 #             # the 20 cores of the <strong>socket<\/strong> on which it is running.\r\n                 #\r\n                 # ppr is 'processes per resource'. It describes how MPI processes should\r\n                 # be placed on the node. In this case 1 process per <strong>socket<\/strong> is specified.\r\n                 # So the <em>two<\/em> MPI processes will each run on their own socket in the node\r\n                 # because our compute nodes have two sockets (CPUs) in them.\r\n<\/pre>\n<p>Note that the above jobscript can be adapted to run <em>single-node<\/em> MPI+OpenMP mixed-mode jobs by using the <code>multicore<\/code> partition and requesting only a single compute node (<code>-N 1<\/code>) with two tasks (<code>-n 2<\/code>).<\/p>\n<h2>Partitions Summary<\/h2>\n<p>Currently two parallel partitions are available. 
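<\/p>
<p>You can inspect the live state of these partitions from a login node with standard SLURM commands (a hedged example &#8211; the flags shown are generic SLURM, not CSF4-specific):<\/p>
<pre>\r\nsinfo -p multicore,multinode    # node states and time limits for the two parallel partitions\r\nsqueue -p multinode -u $USER    # your queued and running jobs in the multinode partition\r\n<\/pre>
<p>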
Details of each are given below, with corresponding job limits.<\/p>\n<h3>Single Node Multi-core (SMP) and small MPI Jobs<\/h3>\n<table width=\"100%\">\n<tbody>\n<tr>\n<a name=\"multicore\"><\/a>\n<td colspan=\"3\" valign=\"top\"><p>Partition name: <strong>multicore<\/strong><\/p>\n<ul>\n<li>For jobs of <strong>2 to 40 cores<\/strong> (40 is a new maximum since Intel Cascade Lake nodes were installed)<\/li>\n<li>Jobs will use a <em>single<\/em> compute node. Use for OpenMP (or other multicore\/<acronym title=\"Symmetric Multi-Processing\">SMP<\/acronym> jobs) and small <acronym title=\"Message Passing Interface\">MPI<\/acronym> jobs.<\/li>\n<li>4GB of memory per core.<\/li>\n<li>7 day runtime limit.<\/li>\n<li>Currently, jobs may be placed on Cascade Lake (max 40 cores) CPUs.<\/li>\n<li>Large pool of cores.<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<th width=\"25%\"><a name=\"optres\"><\/a>Optional Resources<\/th>\n<th width=\"40%\">Max cores per job, RAM per core<\/th>\n<th>Additional usage guidance<\/th>\n<\/tr>\n<tr>\n<td>NONE<\/td>\n<td>NONE<\/td>\n<td>NONE<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Multi-node large MPI Jobs<\/h3>\n<table width=\"100%\">\n<tbody>\n<tr>\n<a name=\"multinode\"><\/a>\n<td colspan=\"3\" valign=\"top\"><p>Partition name: <strong>multinode<\/strong><\/p>\n<ul>\n<li>For <acronym title=\"Message Passing Interface\">MPI<\/acronym> jobs of <strong>80 or more cores<\/strong>, in <strong>multiples of 40<\/strong>, up to a <strong>maximum of 200<\/strong>.<\/li>\n<li>40-core jobs not permitted as they fit on one compute node so do not utilise the InfiniBand network (see <a href=\"#multicore\">multicore<\/a> for 2&#8211;40 core jobs).<\/li>\n<li>4GB RAM per core.<\/li>\n<li>7 day runtime limit.<\/li>\n<li>Currently, jobs may be placed on Cascade Lake (max 40 cores) CPUs.<\/li>\n<li>Large pool of cores.<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<th width=\"25%\"><a name=\"optres\"><\/a>Optional Resources<\/th>\n<th width=\"40%\">Max cores per job, RAM per 
core<\/th>\n<th>Additional usage guidance<\/th>\n<\/tr>\n<tr>\n<td>NONE<\/td>\n<td>NONE<\/td>\n<td>NONE<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Current Configuration and Parallel Partitions For jobs that require two or more CPU cores, the appropriate SLURM partition should be selected from the table below. Please also consult the software page specific to the code \/ application you are running for advice on the most suitable partition. Parallel jobscripts using a single compute node These jobscripts will use CPU cores on a single compute node. There are a couple of ways that you can run.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/batch\/parallel-jobs\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-261","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/comments?post=261"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/261\/revisions"}],"predecessor-version":[{"id":1508,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/261\/revisions\/1508"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf4
\/wp-json\/wp\/v2\/media?parent=261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}