Gaussian16 Linda (multinode)

Gaussian is a general-purpose suite of electronic structure programs. The “Linda” version adds multi-node capability to Gaussian, allowing larger parallel jobs to be run.

Version g16c01 Linda is installed on the CSF. It is available as binaries only; the source code is not available on the CSF.

Gaussian 16 (non-Linda) is also available on the CSF.
Gaussian 09 (non-Linda) is also available on the CSF.

Restrictions on use

Gaussian Linda is only available to members of Prof. Kaltsoyannis’s group. All requests to access this version must be approved by Prof. Kaltsoyannis.

Please contact us via its-ri-team@manchester.ac.uk to request access to Gaussian 16 Linda.

Set up procedure

There is only one version available on the CSF, optimized for the Intel “Haswell” architecture or, equivalently, any CPU that provides AVX2 vector extensions. This includes all Intel and AMD compute nodes in the CSF.

The detectcpu modulefile must be loaded inside your jobscript, not on the login node.

After being added to the relevant unix group, you will be able to access the executables by loading the modulefile:

module load apps/binapps/gaussian/g16c01_em64t_detectcpu_linda

Gaussian MUST ONLY be run in batch. Please DO NOT run g16 on the login nodes. Computational work found to be running on the login nodes will be killed WITHOUT WARNING.

Gaussian Scratch

Gaussian uses the environment variable $GAUSS_SCRDIR to specify the directory where it writes scratch (temporary) files (two-electron integral files, integral derivative files and a read-write file for temporary workings). It is set to your scratch directory (~/scratch) when you load the modulefile. This is a Lustre filesystem which provides good I/O performance.

Do not use your home directory for Gaussian scratch files: the files can be huge and put your home area at risk of going over quota. We also recommend using a directory per job in your scratch area. See below for how to do this.

A faster, but smaller, local /tmp area on each compute node is also available should you prefer to use it. It can be more efficient if you need to create lots of small files, but space is limited: /tmp on the Intel compute nodes ranges from 800GB to 3.5TB.
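If you do want to try node-local scratch, the fragment below is a minimal sketch of how the jobscript would change (the directory name is illustrative; remember that /tmp is local to each compute node, so a multi-node Linda job writes scratch on each node and the final rm only cleans up on the node that ran the script):

## Sketch only: use node-local /tmp for Gaussian scratch files
export GAUSS_SCRDIR=/tmp/$USER/gau_temp_$JOB_ID
mkdir -p $GAUSS_SCRDIR

module load apps/binapps/gaussian/g16c01_em64t_detectcpu_linda
$g16root/g16/g16 < file.inp > file.out

## Tidy up - this only removes the copy on the node that ran the script
rm -rf /tmp/$USER/gau_temp_$JOB_ID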

Gaussian should delete scratch files automatically when a job completes successfully or dies cleanly. However, it often fails to do this. Scratch files are also not deleted when a job is killed externally or terminates abnormally so that you can use the scratch files to restart the job (if possible). Consequently, leftover files may accumulate in the scratch directory, and it is your responsibility to delete these files. Please check periodically whether you have a lot of temporary Gaussian files that can be deleted.
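For example, if you use the directory-per-job convention shown below, the following commands (run on the login node; the job ID 123456 is just a placeholder) show how much space leftover scratch directories are using and delete one that is no longer needed:

# How much space are leftover Gaussian scratch directories using?
du -sh ~/scratch/gau_temp_*

# Delete a leftover directory once you are sure it is not needed for a restart
rm -rf ~/scratch/gau_temp_123456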

Using a Scratch Directory per Job

We now recommend using a different scratch directory for each job. This improves file access times if you run many jobs – writing 1000s of scratch files to a single directory can slow down your jobs. It is much better to create a directory for each job within your scratch area. It is also then easy to delete the entire directory if Gaussian has left unwanted scratch files behind.

The example jobscripts below show how to use this method (it is simple – just two extra lines in your jobscript).
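For reference, the two extra lines (also used in the jobscript below) are simply:

export GAUSS_SCRDIR=/scratch/$USER/gau_temp_$JOB_ID
mkdir -p $GAUSS_SCRDIR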

Very large Gaussian scratch files

Occasionally some jobs create .rwf files which are very large (several TB). The batch system will not permit a job to create files bigger than 4TB. If your Gaussian job fails and the .rwf file is 4TB, it may be that this limit prevented your job from completing. You should re-run the job and, in your input file, request that the .rwf file be split into multiple files. For example, to split the file into two 3TB files:

%rwf=/scratch/$USER/myjob/one.rwf,3000GB,/scratch/$USER/myjob/two.rwf,3000GB
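To check whether a failed job hit this limit, list the size of the scratch files it left behind, for example (adjust the directory name to match your job):

ls -lh /scratch/$USER/myjob/*.rwf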

Number of Cores

If your Gaussian input file contains the number of cores – e.g.:

%NProcsShared
%nprocs

then you must remove these lines from your input file.

The Linda modulefile will work out the correct number of cores to use, based on your jobscript settings.
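A quick way to check an input file for such lines before submitting is, for example:

grep -i '%nproc' file.inp

If this prints anything, edit the input file and delete those lines.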

Parallel batch job

You must submit your job to one of the Parallel Environments mpi-24-ib.pe or hpc.pe on the CSF so that it can run as a multi-node job. Our installation will use the fast InfiniBand network for Linda communication. Submitting to the smp.pe or amd.pe environments (to run on a single compute node) will not work.

#!/bin/bash --login
#$ -cwd                     # Run job in directory you submitted from
#$ -pe mpi-24-ib.pe 72      # Number of cores, minimum 48, in multiples of 24
                            # This example uses 3 x 24-core compute nodes.

#### Alternatively, if you have access to a HPC Pool project, use the following
#### two lines INSTEAD of the -pe line above (remove one # to activate them):
##$ -P hpc-projectcode       # The HPC Project code is required
##$ -pe hpc.pe 128           # This example uses 4 x 32-core compute nodes.
####

## Set up scratch dir (please do this!)
export GAUSS_SCRDIR=/scratch/$USER/gau_temp_$JOB_ID
mkdir -p $GAUSS_SCRDIR

# Load g16 Linda
module load apps/binapps/gaussian/g16c01_em64t_detectcpu_linda

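# Run g16 - reads the Gaussian input from file.inp and writes the output to file.out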
$g16root/g16/g16 < file.inp > file.out

Submit the job using qsub jobscript where jobscript is the name of your jobscript.
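For example, if your jobscript is saved as gaussian_linda.sh (an illustrative name):

qsub gaussian_linda.sh
qstat                     # check the status of your jobs in the queue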

Automatic Settings

Note that the Gaussian Linda modulefile automatically makes many of the environment settings required by Gaussian. These include:

# The Linda modulefile will set these variables automatically. In general, DO NOT SET THESE YOURSELF!
GAUSS_PDEF or GAUSS_CDEF   # to specify the number of cores (DO NOT SET YOURSELF)
GAUSS_MDEF                 # to specify the memory to use (default is 3.5GB per core)
GAUSS_WDEF                 # the names of the nodes to use for Linda (DO NOT SET YOURSELF)
GAUSS_SDEF                 # the remote process launcher (DO NOT SET YOURSELF)
GAUSS_LFLAGS2              # the Linda stack size (default is "--LindaOptions -s 20000000")
GAUSS_LFLAGS               # optional flags, e.g., -vv for debugging (default is unset)
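If you want to see what has been set, you can print the Gaussian-related environment variables from inside your jobscript (after the module load line), for example:

env | grep GAUSS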

Optional Settings

NOTE: We DO NOT recommend that you change the defaults. But we document them here for completeness.

It is possible to override the default amount of memory per core used by Gaussian, by setting the following before you load the modulefile:

export GB_PER_CORE=4      # Some number of GB per core. Using too much will crash your job.
module load ...

It is also possible to increase the number of Linda workers per node, which defaults to one. Each Linda worker will use all of the cores available to it on the compute nodes. The default of 1 is strongly recommended, and increasing this will likely reduce performance. But you can set this before loading the modulefile if you want to experiment:

export NTASKS_PER_NODE=2   # Some number of Linda workers per node. The default of 1 is usually best.
module load ...
