GATK

Overview

GATK offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Various versions are installed on the CSF – please see modulefiles below.

Restrictions on use

There are no restrictions on accessing GATK4 on the CSF. It is released under the Apache 2.0 license and all use must adhere to that license.

GATK3 is released under a more restrictive license which prohibits commercial/for-profit use. All usage must adhere to that license. If this is too restrictive, you must switch to GATK4, which is fully open source.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

module load apps/singularity/gatk/4.5.0.0
module load apps/binapps/gatk/4.4.0.0
module load apps/binapps/gatk/4.1.8.0

# For older versions, first load the bioinf modulefile
module load apps/bioinf
# Then the required gatk modulefile
module load apps/gatk/3.8.0               # See StatusLogger Log4j2 error fix below
module load apps/gatk/3.6.0
module load apps/gatk/3.5.0
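
After loading a modulefile, it can be worth confirming that the gatk command is actually available before relying on it in a job. The following is a minimal sketch only; have_cmd is a hypothetical helper name and is not supplied by the modulefiles.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd gatk; then
    echo "gatk found at: $(command -v gatk)"
else
    echo "gatk not found: load one of the gatk modulefiles first" >&2
fi
```

Running module list will also show which modulefiles are currently loaded in your session.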

Running the application

Please do not run gatk on the login node to process data. Jobs should be submitted to the compute nodes via batch.

You may run gatk -h on the login node to see a list of flags that can be used to run the various GATK tools in your jobscripts.

Please note that complete instructions on how to run gatk are beyond the scope of this page. Please consult the GATK Online Documentation for how to use this application.

StatusLogger Log4j2 Error Fix

This section gives a fix for the StatusLogger error, which has been seen in v3.8.0 and may exist in other versions.

If you receive an error similar to the following, particularly in v3.8.0 when running on the AMD compute nodes (#SBATCH -p multicore):

ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory ...
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. 

then please append the following flags to the gatk command-line in your jobscript:

-jdk_inflater -jdk_deflater

Without these flags, gatk will use some optimized components that only run on Intel CPUs.

See the jobscript examples below.
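
If you would rather add the flags only when the job lands on a non-Intel node, the choice can be made from the CPU vendor string. This is a sketch under assumptions: jdk_flags_for_vendor and EXTRA_FLAGS are hypothetical names, and the vendor is read from /proc/cpuinfo, which is Linux-specific. Always appending the flags is the simpler option.

```shell
#!/bin/bash
# Hypothetical helper: print the extra GATK 3.8.0 flags unless the CPU is Intel.
# Argument: a CPU vendor string such as "AuthenticAMD" or "GenuineIntel".
jdk_flags_for_vendor() {
    case "$1" in
        GenuineIntel) echo "" ;;
        *)            echo "-jdk_inflater -jdk_deflater" ;;
    esac
}

# Read the vendor of the node this script is running on (empty if unavailable).
vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
EXTRA_FLAGS=$(jdk_flags_for_vendor "$vendor")
echo "CPU vendor: ${vendor:-unknown}; extra gatk flags: ${EXTRA_FLAGS:-none}"
```

In a jobscript, $EXTRA_FLAGS (unquoted, so that it splits into two flags) would then be appended to the end of the gatk command-line.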

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#SBATCH -p serial     # (or --partition=) Run on the nodes dedicated to 1-core jobs
#SBATCH -t 4-0        # Wallclock time limit. 4-0 is 4 days. Max permitted is 7-0.

# Start with a clean environment - modules are inherited from the login node by default.
module purge
# The gatk command below uses GATK3 syntax (-T ...), so load a GATK3 modulefile
module load apps/bioinf
module load apps/gatk/3.8.0

# Note: The -jdk_inflater -jdk_deflater flags may be needed in v3.8.0 jobs on the AMD (-p multicore) nodes
gatk -T RealignerTargetCreator -R my.fasta -I my.bam -o my_realigner.intervals  -jdk_inflater -jdk_deflater

Submit the jobscript using:

sbatch scriptname

where scriptname is the name of your jobscript.
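
The gatk command in the jobscript above uses GATK3-style arguments (-T ToolName). GATK4 takes the tool name as the first argument instead, and RealignerTargetCreator is not part of GATK4. The lines below are a sketch of a GATK4-style command that could follow the module load line of a jobscript using a GATK4 modulefile. The choice of HaplotypeCaller, the file names and the 4 GB Java heap are assumptions for illustration, and gatk4_cmd is a hypothetical variable name.

```shell
#!/bin/bash
# GATK4 syntax: gatk [--java-options "..."] ToolName [tool arguments]
# Assumptions: my.fasta, my.bam, my.vcf.gz and the 4 GB heap are placeholders.
gatk4_cmd=(gatk --java-options "-Xmx4g" HaplotypeCaller
           -R my.fasta -I my.bam -O my.vcf.gz)

echo "Command: ${gatk4_cmd[*]}"
if command -v gatk >/dev/null 2>&1; then
    "${gatk4_cmd[@]}"
else
    echo "gatk not found: load a GATK4 modulefile first" >&2
fi
```

Holding the command in a bash array keeps the quoted --java-options value intact and lets the exact command be written to the job output before it runs.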

Parallel batch job submission

Create a batch submission script that requests multiple cores on the multicore partition, for example:

#!/bin/bash --login
#SBATCH -p multicore  # (or --partition=) Run on the AMD 168-core nodes
#SBATCH -n 16         # (or --ntasks=) Number of cores to use.
#SBATCH -t 4-0        # Wallclock time limit. 4-0 is 4 days. Max permitted is 7-0.

# Start with a clean environment - modules are inherited from the login node by default.
module purge
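# The gatk command below uses GATK3 syntax (-T ..., -nt ...), so load a GATK3 modulefile
module load apps/bioinf
module load apps/gatk/3.8.0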

# You must inform your app how many cores to use. $SLURM_NTASKS will be set to the -n number above.
# Note: The -jdk_inflater -jdk_deflater flags may be needed in v3.8.0 jobs on the AMD (-p multicore) nodes
gatk -T RealignerTargetCreator -nt $SLURM_NTASKS -R my.fasta -I my.bam -o my_realigner.intervals -jdk_inflater -jdk_deflater

Submit the jobscript using:

sbatch scriptname

where scriptname is the name of your jobscript.
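
The -nt flag used above belongs to GATK3 and is not accepted by GATK4 tools. In GATK4, multithreading is tool-specific; for example, HaplotypeCaller has a --native-pair-hmm-threads option which parallelises only its PairHMM step. The sketch below shows how the core count from the jobscript could be passed to it. The tool choice and file names are assumptions for illustration, and threads and gatk4_cmd are hypothetical variable names.

```shell
#!/bin/bash
# Use the core count requested with "#SBATCH -n"; fall back to 1 outside a Slurm job.
threads=${SLURM_NTASKS:-1}

# Assumptions: my.fasta, my.bam and my.vcf.gz are placeholder file names.
gatk4_cmd=(gatk HaplotypeCaller --native-pair-hmm-threads "$threads"
           -R my.fasta -I my.bam -O my.vcf.gz)

echo "Command: ${gatk4_cmd[*]}"
if command -v gatk >/dev/null 2>&1; then
    "${gatk4_cmd[@]}"
else
    echo "gatk not found: load a GATK4 modulefile first" >&2
fi
```

Because only part of the tool runs in parallel, requesting many cores may not give a proportional speed-up; please check the GATK documentation for the tool you are using.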

Further info

Updates

None.

Last modified on July 23, 2025 at 2:54 pm by George Leaver