The CSF2 has been replaced by the CSF3; please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
GATK
Overview
The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance.
Versions 3.5 & 4.0.8.1 are installed on the CSF.
Restrictions on use
Version 4 is free software released under a BSD licence, so it has no usage restrictions. The following applies only to the older version 3, which is also installed on the CSF.
Version 3 requires that users are added to a unix group to access it. All users, including visitors/collaborators must have an official University IT account and be registered with HR/Student records. Requests for access should be directed to its-ri-team@manchester.ac.uk.
All users MUST read and agree to the GATK 3 license before they can be added to the unix group. The information below provides some further guidance.
What may GATK 3 be used for?
You may use GATK 3 only for academic, non-commercial research purposes. The licence agreement makes this clarification: “Academic sponsored research is not a commercial use under the terms of this Agreement.”
What may GATK 3 not be used for?
You MUST NOT:
- copy, sublicense or distribute the program, in whole or in part;
- use the program, in whole or in part, for any commercial purpose, including without limitation, as the basis of a commercial software or hardware product or to provide services;
- copy or otherwise adapt the program in order to circumvent the need for obtaining a license;
- use any trademark or trade name of the Broad Institute, or any variation, adaptation, or abbreviation, of such marks or trade names, or any names of officers, faculty, students, employees, or agents of the Broad Institute except for the citation attribution in published work as shown below.
How should the program be cited?
If you include the results from GATK 3 in a publication, you MUST include the following citation:
“The GATK3 program was made available through the generosity of Medical and Population Genetics program at the Broad Institute, Inc.”
and follow the citation guidance on the GATK citation webpage.
Export regulations including remote access
You must comply with all United States and United Kingdom export control laws and regulations controlling the export of the software, including, without limitation, all Export Administration Regulations of the United States Department of Commerce. Among other things, these laws and regulations prohibit, or require a license for, the export of certain types of software to specified countries.
Please be aware that allowing remote access from outside the United Kingdom may constitute an export.
Modifications to the program
If you develop bug fixes or modifications to the program, the University is required to provide these promptly to the Broad Institute on their creation and to grant a licence to them. As such modifications may constitute University intellectual property, you should not develop bug fixes or modifications without first seeking guidance about licensing said intellectual property.
Risks
The University is required to indemnify The Broad Institute and a wide range of its officers, students, employees and others against liabilities or damages that they may incur as a result of our use of the software.
Please therefore be aware of and comply with the licence agreement and discuss anything that is unclear with your line management in the first instance to obtain further guidance.
Set up procedure
To access the software you must first load the modulefile for the version you require, either:
module load apps/binapps/gatk/3.5
or:
module load apps/binapps/gatk/4.0.8.1
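The example jobscript on this page uses #$ -V, so the job inherits the environment of the shell it was submitted from, including any loaded modulefile. If no GATK modulefile was loaded before qsub, the gatk command will not be found when the job runs. The sketch below shows one way a jobscript could check for the command before starting an analysis. The function name require_gatk is hypothetical and is not part of GATK or the CSF modulefiles.

```shell
#!/bin/bash
# Hypothetical helper (not part of GATK or the CSF modulefiles):
# return non-zero with a message if the gatk command is not on PATH,
# e.g. because no GATK modulefile was loaded before the job was submitted.
require_gatk() {
  if ! command -v gatk > /dev/null 2>&1; then
    echo "gatk not found: load apps/binapps/gatk/3.5 or apps/binapps/gatk/4.0.8.1 before submitting" >&2
    return 1
  fi
  return 0
}
```

In a jobscript, define the function and then call require_gatk || exit 1 before the first gatk command. The job will then stop with a clear message instead of failing partway through.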
Running the application
Please do not run GATK analyses on the login node. Submit them to the compute nodes as batch jobs instead.
You may run the following command on the login node to see a list of available analyses in GATK:
gatk -h
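With version 3.5, this prints a list of the tool names to be used with the -T flag on the GATK command line in your jobscript (see below). GATK 4 has no -T flag. Instead, the tool name is given as the first argument (e.g. gatk CountReads ...), and gatk --list prints the available tools.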
Serial batch job submission
Make sure you have the modulefile loaded, then create a batch submission script, for example:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd        # Job will run from the current directory
#$ -V          # Job will inherit current environment settings

# -T CountReads: name of the GATK tool (analysis) to run
gatk -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam other args...
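The example jobscript uses GATK 3 syntax. For version 4.0.8.1, the equivalent command drops -T and passes the tool name as the first argument. The sketch below writes a GATK 4 version of the same jobscript to a file. It assumes the 4.0.8.1 modulefile is loaded before submission and reuses the example input file names above. The output file name gatk4_countreads.sh is hypothetical.

```shell
#!/bin/bash
# Write a GATK 4 version of the example jobscript to a file.
# gatk4_countreads.sh is a hypothetical file name; the quoted 'EOF'
# stops the shell expanding anything (e.g. "#$") inside the heredoc.
cat > gatk4_countreads.sh <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -cwd        # Job will run from the current directory
#$ -V          # Job will inherit current environment settings

# GATK 4: the tool name (CountReads) is the first argument, with no -T flag
gatk CountReads -R exampleFASTA.fasta -I exampleBAM.bam
EOF
```

Submit the generated file with qsub gatk4_countreads.sh, adding any further CountReads arguments to the gatk line as required.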
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Further info
- The GATK website includes documentation.
Updates
None.