{"id":4682,"date":"2020-07-22T17:08:47","date_gmt":"2020-07-22T16:08:47","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=4682"},"modified":"2024-01-03T15:12:24","modified_gmt":"2024-01-03T15:12:24","slug":"gromacs-2020-3-with-gpu-builds","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/gromacs-2020-3-with-gpu-builds\/","title":{"rendered":"Gromacs 2020.3 (CPU &#038; GPU versions)"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>GROMACS is a package for computing molecular dynamics, simulating the Newtonian equations of motion for systems with hundreds to millions of particles. GROMACS is designed for biochemical molecules with complicated bonded interactions (e.g. proteins, lipids, nucleic acids) but can also be used for non-biological systems (e.g. polymers).<\/p>\n<div class=\"warning\">\n<p><em>Please do <strong>not<\/strong> add the <code>-v<\/code> flag to your <code>mdrun<\/code> command.<\/em><\/p>\n<p>It will write to a log file every second for the duration of your job and can lead to severe overloading of the file servers.<\/p>\n<\/div>\n<h2>Significant Change in this Version<\/h2>\n<p>In GROMACS 2020, the various GROMACS commands (e.g., <code>mdrun<\/code>, <code>grompp<\/code>, <code>g_hbond<\/code>) should now be run using the command:<\/p>\n<pre>gmx <em>command<\/em>\r\n<\/pre>\n<p>where <code><em>command<\/em><\/code> is the name of the command you wish to run (without any <code>g_<\/code> prefix), for example:<\/p>\n<pre>gmx mdrun\r\n<\/pre>\n<p>The <code>gmx<\/code> command changes its name to reflect the GROMACS flavour being used, but the <code><em>command<\/em><\/code> does not change. 
For example, if using the <code>mdrun<\/code> command:<\/p>\n<pre># New 2020.3 method                  # Previous 5.0.4 method\r\n# =================                  # =====================\r\ngmx   mdrun                          mdrun\r\ngmx_d mdrun                          mdrun_d\r\nmpirun -n $NSLOTS gmx_mpi   mdrun    mpirun -n $NSLOTS mdrun_mpi\r\nmpirun -n $NSLOTS gmx_mpi_d mdrun    mpirun -n $NSLOTS mdrun_mpi_d\r\n<\/pre>\n<p>The complete list of <code><em>command<\/em><\/code> names can be found by running the following on the login node:<\/p>\n<pre>gmx help commands<\/pre>\n<pre># The following commands are available:\r\nanadock\t\t\tgangle\t\t\trdf\r\nanaeig\t\t\tgenconf\t\t\trms\r\nanalyze\t\t\tgenion\t\t\trmsdist\r\nangle\t\t\tgenrestr\t\trmsf\r\nawh\t\t\tgrompp\t\t\trotacf\r\nbar\t\t\tgyrate\t\t\trotmat\r\nbundle\t\t\th2order\t\t\tsaltbr\r\ncheck\t\t\thbond\t\t\tsans\r\nchi\t\t\thelix\t\t\tsasa\r\ncluster\t\t\thelixorient\t\tsaxs\r\nclustsize\t\thelp\t\t\tselect\r\nconfrms\t\t\thydorder\t\tsham\r\nconvert-tpr\t\tinsert-molecules\tsigeps\r\ncovar\t\t\tlie\t\t\tsolvate\r\ncurrent\t\t\tmake_edi\t\tsorient\r\ndensity\t\t\tmake_ndx\t\tspatial\r\ndensmap\t\t\tmdmat\t\t\tspol\r\ndensorder\t\tmdrun\t\t\ttcaf\r\ndielectric\t\tmindist\t\t\ttraj\r\ndipoles\t\t\tmk_angndx\t\ttrajectory\r\ndisre\t\t\tmorph\t\t\ttrjcat\r\ndistance\t\tmsd\t\t\ttrjconv\r\ndo_dssp\t\t\tnmeig\t\t\ttrjorder\r\ndos\t\t\tnmens\t\t\ttune_pme\r\ndump\t\t\tnmtraj\t\t\tvanhove\r\ndyecoupl\t\torder\t\t\tvelacc\r\ndyndom\t\t\tpairdist\t\tview\r\neditconf\t\tpdb2gmx\t\t\twham\r\neneconv\t\t\tpme_error\t\twheel\r\nenemat\t\t\tpolystat\t\tx2top\r\nenergy\t\t\tpotential\t\txpm2ps\r\nfilter\t\t\tprincipal\r\nfreevolume\t\trama\r\n<\/pre>\n<p>Notice that the command names do NOT start with <code>g_<\/code> and do NOT reference the flavour being run (e.g., <code>_mpi_d<\/code>). 
Only the main <code>gmx<\/code> command changes its name to reflect the flavour (see the table of modulefiles below for the full list of flavours available).<\/p>\n<p>To obtain more help about a particular command, run:<\/p>\n<pre>gmx help <em>command<\/em>\r\n<\/pre>\n<p>For example:<\/p>\n<pre>gmx help mdrun\r\n<\/pre>\n<h3>Helper scripts<\/h3>\n<p>To assist with moving to the new command-calling method, we have recreated some of the individual commands that you may have used in your jobscript. For example, you can continue to use <code>mdrun<\/code> (or <code>mdrun_d<\/code>) instead of the new <code>gmx mdrun<\/code> (or <code>gmx_d mdrun<\/code>) in this release. These extra commands are automatically included in your environment when you load the GROMACS modulefiles. This old method uses the flavour of GROMACS in the command name (see above for a comparison of new and old commands).<\/p>\n<p>However, please note that some commands are new (since 2020.1) and so can only be run using the new method (<code>gmx <em>command<\/em><\/code>).<\/p>\n<h2>Available Flavours<\/h2>\n<p>For version 2020.3 we have compiled multiple versions of GROMACS, each of which is optimised for a particular CPU architecture. We have also built versions with GPU support (note that GPU versions of GROMACS support single precision only). The module file has been written to detect which CPU the compute node is using and to automatically select the correct GROMACS executable. 
If you want to ensure you get a particular level of optimisation, <a href=\"\/csf3\/batch\/intel-cores\/\">specify an architecture<\/a> in the jobscript, e.g. <code>-l skylake<\/code>.<\/p>\n<h3>2020.3 for Ivybridge (and Haswell, Broadwell and Skylake) nodes only<\/h3>\n<p>With AVX optimisation.<\/p>\n<h3>2020.3 for Haswell and Broadwell (and Skylake) nodes only<\/h3>\n<p>With AVX2 optimisation.<\/p>\n<h3>2020.3 for Skylake nodes only<\/h3>\n<p>With AVX-512 optimisation.<\/p>\n<h3>2020.3 for Skylake nodes with GPU acceleration<\/h3>\n<p>With AVX-512 optimisation and with GPU acceleration turned on. Note that only single-precision versions are available with GPU acceleration.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>GROMACS is free software, available under the GNU General Public License.<\/p>\n<h2>Set up procedure<\/h2>\n<p>You must load the appropriate modulefile:<\/p>\n<pre>module load <em>modulefile<\/em>\r\n<\/pre>\n<p>replacing <em>modulefile<\/em> with one of the modules listed in the table below. 
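<\/p>\n<p>For example, to load the single-precision multi-threaded build and (optionally) confirm which build has been picked up, something like the following can be used on the login node (a sketch; <code>gmx --version<\/code> simply prints the version and build information):<\/p>\n<pre>module load apps\/intel-18.0\/gromacs\/2020.3\/single\r\ngmx --version\r\n<\/pre>\n<p>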
The module file will auto-detect and pick a version of GROMACS with AVX optimisations to match the CPU of the compute node(s) you are assigned.<\/p>\n<table class=\"striped\">\n<tbody>\n<tr>\n<th width=\"27%\">Version<\/th>\n<th width=\"35%\">Modulefile<\/th>\n<th width=\"18%\">Notes<\/th>\n<th width=\"20%\">Typical executable name<\/th>\n<\/tr>\n<tr>\n<td>Single precision multi-threaded (single-node)<\/td>\n<td>apps\/intel-18.0\/gromacs\/2020.3\/single<\/td>\n<td>non-MPI, with GPU acceleration available<\/td>\n<td><code>mdrun<\/code> or <code>gmx\u00a0mdrun<\/code><\/td>\n<\/tr>\n<tr>\n<td>Double precision multi-threaded (single-node)<\/td>\n<td>apps\/intel-18.0\/gromacs\/2020.3\/double<\/td>\n<td>non-MPI<\/td>\n<td><code>mdrun_d<\/code> or <code>gmx_d\u00a0mdrun<\/code><\/td>\n<\/tr>\n<tr>\n<td>Single precision MPI<\/td>\n<td>apps\/intel-18.0\/gromacs\/2020.3\/single_mpi<\/td>\n<td>For MPI<\/td>\n<td><code>mdrun_mpi<\/code> or <code>gmx_mpi\u00a0mdrun<\/code><\/td>\n<\/tr>\n<tr>\n<td>Double precision MPI<\/td>\n<td>apps\/intel-18.0\/gromacs\/2020.3\/double_mpi<\/td>\n<td>For MPI<\/td>\n<td><code>mdrun_mpi_d<\/code> or <code>gmx_mpi_d\u00a0mdrun<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Running the application<\/h2>\n<p>Please do <strong>not<\/strong> run GROMACS on the login node.<\/p>\n<h3>Important notes regarding running jobs in batch<\/h3>\n<p>We now recommend that the relevant module file (single\/double, non-MPI\/MPI) is loaded as part of your batch script. The module file itself will select the most suitable build of GROMACS for the processor architecture you end up running on.<\/p>\n<p>Please note the following, which is important for running jobs correctly and efficiently:<\/p>\n<p>Ensure you inform GROMACS how many cores it can use. This is done using the <code>$NSLOTS<\/code> variable, which is automatically set for you in the jobscript to be the number of cores you request in the jobscript header (see later for complete examples). 
You can use either of the following methods depending on whether you want a multi-core job (running on a single compute node) or a larger job running across multiple compute nodes:<\/p>\n<pre># Multi-core (single-node) or Multi-node MPI jobs\r\n\r\nmpirun -n $NSLOTS mdrun_mpi         # Old method (v5.0.4 and earlier)\r\nmpirun -n $NSLOTS mdrun_mpi_d       # Old method (v5.0.4 and earlier)\r\n\r\nmpirun -n $NSLOTS gmx_mpi mdrun     # New method (v5.1.4 and later)\r\nmpirun -n $NSLOTS gmx_mpi_d mdrun   # New method (v5.1.4 and later)<\/pre>\n<p>or<\/p>\n<pre># Single-node multi-threaded job\r\n\r\nexport OMP_NUM_THREADS=$NSLOTS      # Do this for all versions\r\nmdrun                               # Old method (v5.0.4 and earlier)\r\nmdrun_d                             # Old method (v5.0.4 and earlier)\r\n\r\nexport OMP_NUM_THREADS=$NSLOTS      # Do this for all versions\r\ngmx mdrun                           # New method (v5.1.4 and later)\r\ngmx_d mdrun                         # New method (v5.1.4 and later)\r\n<\/pre>\n<p>Single-node multi-threaded jobs with GPU acceleration are covered in the GPU sections below.<\/p>\n<p>The examples below can be used for single-precision or double-precision GROMACS. Simply run <code>mdrun<\/code> (single precision) or <code>mdrun_d<\/code> (double precision).<\/p>\n<div class=\"warning\">\n<p><em>Please do <strong>not<\/strong> add the <code>-v<\/code> flag to your <code>mdrun<\/code> command.<\/em><\/p>\n<p>It will write to a log file every second for the duration of your job and can lead to severe overloading of the file servers.<\/p>\n<\/div>\n<h3>Multi-threaded single-precision on Intel nodes, 2 to 32 cores<\/h3>\n<p>Note that GROMACS 2020.3 (unlike v4.5.4) does <strong>not<\/strong> support the <code>-nt<\/code> flag to set the number of threads when using the multithreaded OpenMP (non-MPI) version. 
Instead, set the <code>OMP_NUM_THREADS<\/code> environment variable as shown below.<\/p>\n<p>An example batch submission script to run the <strong>single-precision<\/strong> <code>mdrun<\/code> executable with 12 threads:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe smp.pe 12            # Can specify 2 to 32 cores in smp.pe\r\n\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/single\r\nexport OMP_NUM_THREADS=$NSLOTS\r\nmdrun\r\n  #\r\n  # This is the old naming convention (it will still work in this release)\r\n  # The new GROMACS convention is to run: gmx mdrun\r\n<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<p>The system will run your job on an Ivybridge, Haswell, Broadwell or Skylake node depending on the number of cores requested and what is available. Not specifying an architecture (recommended) means that your job will start as soon as any type of compute node that can accommodate it becomes available (it gives the job the biggest pool of nodes to target). 
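<\/p>\n<p>If you do want to pin a job to a particular architecture, the only change needed is an extra resource request in the jobscript header. A sketch, assuming <code>-l haswell<\/code> follows the same pattern as the <code>-l skylake<\/code> flag mentioned earlier:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe smp.pe 12\r\n#$ -l haswell               # Restrict the job to Haswell nodes\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/single\r\nexport OMP_NUM_THREADS=$NSLOTS\r\ngmx mdrun\r\n<\/pre>\n<p>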
To get a more optimised run on Haswell\/Broadwell you should specify the architecture you require, but note that it may take longer for your job to start, as specifying an architecture reduces the size of the pool that the system can target.<\/p>\n<h3>Multi-threaded double-precision on Intel nodes, 2 to 32 cores<\/h3>\n<p>An example batch submission script to run the <strong>double-precision<\/strong> <code>mdrun_d<\/code> executable with 24 threads:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe smp.pe 24\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/double\r\nexport OMP_NUM_THREADS=$NSLOTS\r\nmdrun_d\r\n  #\r\n  # This is the old naming convention (it will still work in this release)\r\n  # The new GROMACS convention is to run: gmx_d mdrun<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<h3>Single precision MPI (single-node), 2 to 32 cores<\/h3>\n<p>An example batch submission script to run the <strong>single-precision<\/strong> <code>mdrun_mpi<\/code> executable on 8 cores using MPI:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe smp.pe 8\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/single_mpi\r\nmpirun -n $NSLOTS mdrun_mpi\r\n<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<h3>Double precision MPI (single-node), 2 to 32 cores<\/h3>\n<p>An example batch submission script to run the <strong>double-precision<\/strong> <code>mdrun_mpi_d<\/code> executable on 8 cores using MPI:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -V\r\n#$ -pe smp.pe 8\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/double_mpi\r\nmpirun -n $NSLOTS mdrun_mpi_d\r\n  #\r\n  # This is the old naming convention (it will still work in this release)\r\n  # The new GROMACS convention is to run: mpirun -n $NSLOTS gmx_mpi_d mdrun\r\n<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<h3>Single-precision, MPI, 48 cores or more 
in multiples of 24<\/h3>\n<p>An example batch submission script to run the <strong>single-precision<\/strong> <code>mdrun_mpi<\/code> executable with 48 MPI processes (48 cores on two 24-core nodes) with the <code>mpi-24-ib.pe<\/code> parallel environment (Intel Haswell nodes using InfiniBand):<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe mpi-24-ib.pe 48           # EG: Two 24-core Intel Haswell nodes\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/single_mpi\r\nmpirun -n $NSLOTS gmx_mpi mdrun\r\n  #\r\n  # This is the new naming convention. The old command (which will still\r\n  # work in this release) is: mpirun -n $NSLOTS mdrun_mpi<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<h3>Double-precision, MPI, 48 cores or more in multiples of 24<\/h3>\n<p>An example batch submission script to run the <strong>double-precision<\/strong> <code>mdrun_mpi_d<\/code> executable with 48 MPI processes (48 cores on two 24-core nodes) with the <code>mpi-24-ib.pe<\/code> parallel environment (Intel Haswell nodes using InfiniBand):<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe mpi-24-ib.pe 48           # EG: Two 24-core Intel Haswell nodes\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/double_mpi\r\nmpirun -n $NSLOTS gmx_mpi_d mdrun\r\n<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<h3>Multi-threaded single-precision on a single node with one GPU<\/h3>\n<p><strong>You need to request being added to the relevant group to access <a href=\"\/csf3\/batch\/gpu-jobs\/\">GPUs<\/a> before you can run GROMACS on them.<\/strong><\/p>\n<p>Please note that if you have &#8216;free at the point of use&#8217; access to the GPUs then the maximum number of GPUs you can request is 2.<\/p>\n<p>The maximum number of CPU cores that anyone can request is 8 per GPU.<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe smp.pe 8             #Specify the number of CPUs, 
maximum of 8 per GPU.\r\n#$ -l v100               #This requests a single GPU.\r\n\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/single\r\ngmx mdrun -ntmpi 1 -ntomp ${NSLOTS} ...\r\n<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<p>This requests one thread-MPI rank for the GPU and $NSLOTS (8 in this case) OpenMP threads per rank.<\/p>\n<h3>Multi-threaded single-precision on a single node with multiple GPUs<\/h3>\n<p><strong>You need to request being added to the relevant group to access <a href=\"\/csf3\/batch\/gpu-jobs\/\">GPUs<\/a> before you can run GROMACS on them.<\/strong><\/p>\n<p>Please note that if you have &#8216;free at the point of use&#8217; access to the GPUs then the maximum number of GPUs you can request is 2 (please therefore follow the previous example).<\/p>\n<p>The maximum number of CPU cores that anyone can request is 8 per GPU requested, e.g. 1 GPU and 8 cores, or 2 GPUs and 16 cores.<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd\r\n#$ -pe smp.pe 16            #Specify the number of CPUs, maximum of 8 per GPU.\r\n#$ -l v100=2             #Specify we want a GPU (nvidia_v100) node with two GPUs, maximum is 4.\r\n\r\nmodule load apps\/intel-18.0\/gromacs\/2020.3\/single\r\ngmx mdrun -ntmpi 2 -ntomp 8 ...\r\n<\/pre>\n<p>Here <code>-ntmpi<\/code> is the number of thread-MPI ranks and <code>-ntomp<\/code> is the number of OpenMP threads per rank.<\/p>\n<p>For the example above, where we have requested 2 GPUs (and therefore have a maximum of 16 cores to use), sensible combinations are:<\/p>\n<pre>-ntmpi 2 -ntomp 8   # 1 rank per GPU, 8 threads per rank\r\n-ntmpi 4 -ntomp 4   # 2 ranks per GPU, 4 threads per rank\r\n-ntmpi 8 -ntomp 2   # 4 ranks per GPU, 2 threads per rank\r\n<\/pre>\n<p>If you have time to experiment, try each combination to see which gives the best performance; if not, use the following:<\/p>\n<pre>export OMP_NUM_THREADS=$((NSLOTS\/NGPUS))\r\ngmx mdrun -ntmpi ${NGPUS} -ntomp 
${OMP_NUM_THREADS}\r\n<\/pre>\n<p>Submit with the command: <code>qsub scriptname<\/code><\/p>\n<h2>Error about OpenMP and cut-off scheme<\/h2>\n<p>If you encounter the following error:<\/p>\n<pre>OpenMP threads have been requested with cut-off scheme Group, but these \r\nare only supported with cut-off scheme Verlet\r\n<\/pre>\n<p>then please try using the MPI version of the software. Note that it is possible to run the MPI versions on a single node (see the examples above).<\/p>\n<h2>Further info<\/h2>\n<ul>\n<li>You can see a list of all the installed GROMACS utilities with the command: <code>ls $GMXDIR\/bin<\/code><\/li>\n<li><a href=\"http:\/\/www.gromacs.org\/About_Gromacs\">GROMACS web page<\/a><\/li>\n<li><a href=\"http:\/\/www.gromacs.org\/Documentation\/Manual\">GROMACS manuals<\/a><\/li>\n<li><a href=\"http:\/\/www.gromacs.org\/Support\/Mailing_Lists\">GROMACS user mailing list<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>Some issues related to free energy calculations were corrected in the 2020.2 release. For further information refer to the <a href=\"http:\/\/manual.gromacs.org\/documentation\/2020.2\/release-notes\/2020\/2020.2.html\">2020.2 release notes<\/a>.<\/p>\n<p>July 2020 &#8211; 2020.3 installed with AVX, AVX2 and AVX-512 support enabled and GPU builds<br \/>\nJun 2020 &#8211; 2020.1 installed with AVX, AVX2 and AVX-512 support enabled and GPU builds<br \/>\nDec 2018 &#8211; 2018.4 installed with AVX, AVX2 and AVX-512 support enabled and GPU builds<br \/>\nOct 2018 &#8211; 2016.4 installed with AVX, AVX2 and AVX-512 support enabled and patched with Plumed 2.4.0<br \/>\nOct 2018 &#8211; 2016.4 installed with AVX, AVX2 and AVX-512 support enabled<br \/>\nOct 2018 &#8211; 2016.3 installed with AVX, AVX2 and AVX-512 support enabled<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview GROMACS is a package for computing molecular dynamics, simulating Newtonian equations of motion for systems with hundreds to millions of particles. 
GROMACS is designed for biochemical molecules with complicated bonded interactions (e.g. proteins, lipids, nucleic acids) but can also be used for non-biological systems (e.g. polymers). Please do not add the -v flag to your mdrun command. It will write to a log file every second for the duration of your job and can.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/gromacs-2020-3-with-gpu-builds\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":14,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-4682","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/4682","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=4682"}],"version-history":[{"count":10,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/4682\/revisions"}],"predecessor-version":[{"id":7344,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/4682\/revisions\/7344"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=4682"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}