{"id":9877,"date":"2025-05-14T13:15:39","date_gmt":"2025-05-14T12:15:39","guid":{"rendered":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=9877"},"modified":"2026-02-23T15:49:50","modified_gmt":"2026-02-23T15:49:50","slug":"high-memory-jobs-slurm","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/batch-slurm\/high-memory-jobs-slurm\/","title":{"rendered":"High Memory Jobs (Slurm)"},"content":{"rendered":"<h2>Default Memory on the CSF<\/h2>\n<p>The AMD 168-core Genoa nodes (<code>-p multicore<\/code>) have 8GB RAM per core. If your job needs more RAM on these nodes, request more cores.<\/p>\n<p>The standard Intel nodes (<code>-p serial<\/code> and <code>-p multicore_small<\/code>) have 4GB to 6GB of RAM per core. If your job needs more RAM on these nodes, request more cores.<\/p>\n<p>High-memory Intel nodes are also available, offering up to 2TB RAM in total (and, if really needed, up to 4TB); they are described <a href=\"#memflag\">below<\/a>. First we describe how to check whether your jobs are running out of memory (RAM).<\/p>\n<div class=\"warning\">We only have a small number of high memory nodes. They should <strong>only<\/strong> be used for work that requires significant amounts of RAM. 
<strong>Incorrect use of these nodes may result in restrictions being placed on your account<\/strong>.<\/div>\n<h2>How to check memory usage of your job<\/h2>\n<p>The batch system keeps track of the resources your jobs are using, and also records statistics about the job once it has finished.<\/p>\n<h3>A completed (successful) job<\/h3>\n<p>You can see the <em>peak<\/em> memory usage with the <code>seff<\/code> command, passing in a JOBID:<\/p>\n<pre>[<em>mabcxyz1<\/em>@login1[csf3] ~]$ seff <em>12345<\/em>\r\nJob ID: 12345\r\nCluster: csf3.man.alces.network\r\nUser\/Group: <em>username<\/em>\/<em>xy<\/em>01\r\nState: COMPLETED (exit code 0)\r\nNodes: 1\r\nCores per node: 2\r\nCPU Utilized: 00:04:13\r\nCPU Efficiency: 49.41% of 00:08:32 core-walltime\r\nJob Wall-clock time: 00:04:16\r\n<strong>Memory Utilized: 21.45 GB<\/strong>                       # <strong>Peak memory usage<\/strong>\r\nMemory Efficiency: <strong>33.5%<\/strong> of 64.00 GB            # A low memory efficiency means this job did NOT need\r\n                                                # to use the himem partition. 
You should check this.\r\n<\/pre>\n<p>To check a specific jobarray task, use a JOBID of the form <em>jobid_taskid<\/em>:<\/p>\n<pre>seff 12345_501\r\n<\/pre>\n<p>Alternatively, use the <code>sacct<\/code> command to obtain various stats about a job:<\/p>\n<pre>sacct -j <em>12345<\/em>\r\n\r\n# Or to just get the memory usage\r\nsacct -j <em>12345<\/em> -o maxrss\r\n<\/pre>\n<p>The <code>sacct<\/code> command offers lots of options &#8211; use <code>man sacct<\/code> to get more info.<\/p>\n<p>Depending on the software you are using, you may also find memory usage reported in output files.<\/p>\n<h3>A terminated (out of memory) job<\/h3>\n<p>If, at any point while the job is running, its peak memory usage goes <em>above<\/em> the limit that the job is permitted to use, the batch system will terminate the job.<\/p>\n<p>The <code>seff<\/code> command will show:<\/p>\n<pre>[<em>mabcxyz1<\/em>@login1[csf3] ~]$ seff 12345\r\nState: OUT_OF_MEMORY (exit code 0)\r\n<\/pre>\n<p>You may see the following in your <code>slurm-12345.out<\/code> file:<\/p>\n<pre>[<em>mabcxyz1<\/em>@login1[csf3] ~]$ cat slurm-12345.out\r\n\r\n\/var\/spool\/slurmd\/job12345\/slurm_script: line 4: 1851022 Killed             .\/some-app.exe -in data.dat -out results.dat\r\nslurmstepd: error: Detected 1 oom_kill event in StepId=12345.batch. 
Some of the step tasks have been OOM Killed.\r\n                               #\r\n                               # OOM is \"out of memory\" - this means Slurm killed your job\r\n                               # because it tried to use more memory than allowed.\r\n<\/pre>\n<p>You will need to resubmit your job, either requesting more cores (if using the standard partitions) or using a high memory partition &#8211; see next.<br \/>\n<a name=\"memflag\"><\/a><\/p>\n<h2>Submitting High Memory Jobs<\/h2>\n<p>Memory is a &#8220;consumable&#8221; resource in the <code>himem<\/code> and <code>vhimem<\/code> partitions in the CSF (Slurm) cluster &#8211; you can specify how many cores <em>AND<\/em> how much memory your job requires (in the <code>multicore<\/code> partition, you can&#8217;t do this &#8211; instead you get a fixed amount of RAM per core).<\/p>\n<p>For users who have previously used the CSF (SGE) cluster, there are <em>no<\/em> <code>mem1500<\/code>, <code>mem2000<\/code> or <code>mem4000<\/code> flags! There, memory was <em>not<\/em> a consumable &#8211; you had to request a certain number of cores, and the <code>-l memXXXX<\/code> flag would land the job on a node that gave a fixed amount of memory-per-core.<\/p>\n<p>On the CSF (Slurm) cluster, your jobs will land on <em>any<\/em> of the high-memory nodes that meet your core and memory requirements. You can <em>optionally<\/em> specify a CPU architecture if that is important to your job, but this will restrict the pool of nodes available for Slurm to choose from, and so might lead to longer queue-wait times.<\/p>\n<p>So, in summary, specify the <em>amount<\/em> of memory your job requires. If you need more than one core, specify the number of cores. But the number of cores does not necessarily determine the amount of memory.<\/p>\n<h3>High Memory Resources<\/h3>\n<p>The following resources are available to your job.<\/p>\n<p>The partition (and wallclock limit) are required. 
All other flags are optional.<\/p>\n<p>Note:<\/p>\n<ul>\n<li>If not specified, a job will use 1 CPU core.<\/li>\n<li>Memory units <code>M<\/code> (megabytes), <code>G<\/code> (gigabytes) or <code>T<\/code> (terabytes) can be specified &#8211; e.g., <code>700G<\/code>. If <em>no<\/em> units letter is used, it defaults to <code>M<\/code> megabytes.<\/li>\n<li>Different Intel CPU architectures are available for high-memory jobs, but the architecture flag is optional. We recommend you <em>do not<\/em> specify an architecture: Slurm will place your job on any node that satisfies your core and memory requirements, whereas requesting a particular <em>architecture<\/em> may lead to longer queue-wait times.<\/li>\n<\/ul>\n<p><strong>Remember: you DO NOT specify the old SGE flags &#8211; Slurm knows nothing about these. Just say how much memory your job needs.<\/strong><\/p>\n<table class=\"striped\">\n<tbody>\n<tr>\n<th>Partition (required)<\/th>\n<th>Default job mem-per-core if memory not requested (GB)<\/th>\n<th>Max job size (cores)<\/th>\n<th>Max job memory (GB)<\/th>\n<th>Arch Flag (optional, but will activate specific limits shown in the next columns)<\/th>\n<th>Max job size (cores)<\/th>\n<th>Max job memory (GB)<\/th>\n<th>Has SSD storage<\/th>\n<th>Old SGE flag (DO NOT USE)<\/th>\n<\/tr>\n<tr>\n<td rowspan=\"3\"><code>himem<\/code><\/td>\n<td rowspan=\"3\">31<\/td>\n<td rowspan=\"3\">32<\/td>\n<td rowspan=\"3\">2000<\/td>\n<td><code>-C haswell<\/code><\/td>\n<td>16<\/td>\n<td>496<\/td>\n<td>No<\/td>\n<td>mem512<\/td>\n<\/tr>\n<tr>\n<td><code>-C cascadelake<\/code><\/td>\n<td>32<\/td>\n<td>1472<\/td>\n<td>No<\/td>\n<td>mem1500<\/td>\n<\/tr>\n<tr>\n<td><code>-C icelake<\/code> (also <code>ssd<\/code>)<\/td>\n<td>32<\/td>\n<td>2000<\/td>\n<td>Yes<\/td>\n<td>mem2000<\/td>\n<\/tr>\n<tr>\n<td><code>vhimem<\/code><\/td>\n<td>125<\/td>\n<td>32<\/td>\n<td>4000<\/td>\n<td><code>-C icelake<\/code> (also 
<code>ssd<\/code>)<\/td>\n<td>32<\/td>\n<td>4000<\/td>\n<td>Yes<\/td>\n<td>mem4000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If you have access to the 4TB node, please do not submit lots of jobs to the queue at the same time, even if they genuinely need the RAM. We have to ensure that the node is not dominated by individual users, so that everyone who needs it has an opportunity to run work on it.<\/p>\n<h3>Slurm Memory Flags<\/h3>\n<p>You will need to request an amount of memory for your high-memory job. You can use either of the following Slurm flags:<\/p>\n<pre># Specify the total amount of memory your job will have access to\r\n#SBATCH --mem=<em>n<\/em>G             # A total of <em>n<\/em> Gigabytes\r\n\r\n# OR, specify an amount of memory <em>per core<\/em>\r\n#SBATCH --mem-per-cpu=<em>m<\/em>G     # Your job will get <em>numcores<\/em> x <em>m<\/em> Gigabytes of RAM\r\n<\/pre>\n<p>DO NOT specify BOTH of the above memory flags &#8211; you should only specify one.<\/p>\n<h3>Example jobs<\/h3>\n<p>A high-memory job using the default of 1 core and 31GB RAM. 
Remember &#8211; high-mem jobs can land on <em>any<\/em> of the high-memory compute nodes &#8211; you do not specify the old SGE <code>mem2000<\/code> flag, for example.<\/p>\n<pre>#!\/bin\/bash --login\r\n#SBATCH -p himem       # (or --partition=) A high memory job\r\n#SBATCH -t 1-0         # Wallclock limit, 1-0 is 1 day (max permitted is 7-0, 7 days)\r\nmodule purge\r\nmodule load apps\/some\/thing\/1.2.3\r\nsome-app.exe\r\n<\/pre>\n<p>Further examples will only show the Slurm <code>#SBATCH<\/code> flags for brevity.<\/p>\n<p>A high-memory job requesting 2 cores and 64GB RAM in total:<\/p>\n<pre>#!\/bin\/bash --login\r\n#SBATCH -p himem      # (or --partition=) The high memory partition\r\n#SBATCH -n 2          # (or --ntasks=) Number of cores\r\n#SBATCH --mem=64G     # Total memory for the job (actually per-node but jobs only run on one node)\r\n#SBATCH -t 1-0\r\n<\/pre>\n<p>A high-memory job requesting an entire 2000GB node (32 cores max &#8211; if using all of the 2000 GB memory you might as well request all of the cores!):<\/p>\n<pre>#!\/bin\/bash --login\r\n#SBATCH -p himem      # (or --partition=) The high memory partition\r\n#SBATCH -n 32         # (or --ntasks=) Number of cores\r\n#SBATCH --mem=2000G   # Total memory for the job (actually per-node but jobs only run on one node)\r\n#SBATCH -t 1-0\r\n<\/pre>\n<p>A <strong>vhimem<\/strong> 4000 GB node job, requesting more than 2000GB of RAM. Please <a href=\"\/csf3\/help\">request access<\/a> to this node from us before submitting jobs to this partition. 
Please include some job IDs of jobs that have exceeded the memory of the nodes in the <code>himem<\/code> partition.<\/p>\n<pre>#!\/bin\/bash --login\r\n#SBATCH -p <strong>vhimem<\/strong>     # (or --partition=) The VERY high memory partition\r\n#SBATCH -n 20         # (or --ntasks=) Number of cores (max is 32)\r\n#SBATCH --mem=3000G   # Total memory for the job (actually per-node but jobs only run on one node)\r\n#SBATCH -t 1-0\r\n<\/pre>\n<h2>Runtimes and queue times on high memory nodes<\/h2>\n<p>The maximum job runtime on high-memory nodes is the same as on other CSF nodes, namely 7 days.<\/p>\n<p>Due to the limited number of high memory nodes, we cannot guarantee that jobs submitted to these nodes will start within the 24 hours that we aim for on the standard CSF3 nodes. Queue times may be several days or more.<\/p>\n<div class=\"warning\">We monitor usage of all the high memory nodes and will from time to time advise people if we think they are being incorrectly used. <strong>Persistent unfair use<\/strong> of high memory nodes <strong>may result in a ban from the nodes or limitations<\/strong> being placed on your usage of them.<\/div>\n<h2>Local SSD storage on high memory nodes<\/h2>\n<p>Some of the newer nodes have particularly large, fast local SSD storage. This can be useful if your jobs do a lot of disk I\/O &#8211; frequently reading\/writing large files. Your jobs may benefit from first copying your large datasets to the SSD drives, then running in that area where they can write output files. Finally, copy any results you want to keep back to your scratch area.<\/p>\n<p>To ensure your high-memory job lands on a node with SSD storage, add <code>-C ssd<\/code> to your jobscript.<\/p>\n<p>To access the SSD drives within a jobscript, use the preset environment variable <code>$TMPDIR<\/code>. 
For example:<\/p>\n<pre>#!\/bin\/bash --login\r\n#SBATCH -p himem        # (or --partition=) The high memory partition\r\n#SBATCH -n 8            # (or --ntasks=) Number of cores (max is 32)\r\n#SBATCH --mem=1200G     # Total memory for the job\r\n#SBATCH -C ssd          # Guarantees that the job lands on a node with SSD storage\r\n#SBATCH -t 1-0          # Wallclock time limit, 1-0 is 1 day (max permitted is 7-0, 7 days)\r\n\r\n# Copy data from scratch to the local SSD drives\r\ncp ~\/scratch\/my_huge_dataset.dat $TMPDIR\r\n\r\n# Go to the SSD drives\r\ncd $TMPDIR\r\n# Run your app\r\n<em>myapp<\/em> my_huge_dataset.dat -o my_huge_results.dat\r\n\r\n# Copy result back to scratch\r\ncp my_huge_results.dat ~\/scratch\r\n<\/pre>\n<p>The <code>$TMPDIR<\/code> area (which is private to your job) will be <em>deleted automatically<\/em> by the batch system when the job ends.<\/p>\n<p>Remember, the <code>$TMPDIR<\/code> location is <em>local to each compute node<\/em>. So you won&#8217;t be able to see the same <code>$TMPDIR<\/code> storage on the login nodes or any other compute node. It is only available while a job is running.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Default Memory on the CSF The AMD 168-core Genoa nodes (-p multicore) have 8GB RAM per core. If your job needs more RAM on these nodes, request more cores. The standard Intel nodes (-p serial and -p multicore_small) have 4GB to 6GB of RAM per core. If your job needs more RAM on these nodes, request more cores. High-memory Intel nodes are also available, offering up to 2TB RAM in total, and if really.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/batch-slurm\/high-memory-jobs-slurm\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":9105,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-9877","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/9877","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=9877"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/9877\/revisions"}],"predecessor-version":[{"id":11922,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/9877\/revisions\/11922"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/9105"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=9877"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}