CheckM
Overview
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage.
Versions 1.2.2 and 1.1.0 are installed on the CSF.
Restrictions on use
There are no restrictions on accessing this software on the CSF. It is licensed using the GNU General Public License version 3 and all usage must adhere to that license.
Please cite your usage of this software using:
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055.
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.
Load one of the following modulefiles:
module load apps/python/checkm/1.2.2
module load apps/python/checkm/1.1.0

# You will also need to load modulefiles for hmm (search and fetch), prodigal and pplacer, as follows:
module load apps/gcc/hmmer/3.2.1
module load apps/gcc/prodigal/2.6.3
module load apps/binapps/pplacer/1.1.alpha19
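As a quick sanity check after loading the modulefiles, a sketch like the following can confirm that the required executables are on your PATH. The function name missing_tools is hypothetical, and the executable names in the example call (hmmsearch, hmmfetch, prodigal, pplacer) are assumed from the tools named above rather than taken from the CSF modulefiles themselves.

```shell
# Hypothetical helper: print the name of every listed command that is
# NOT found on the current PATH. Prints nothing if all are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call after the 'module load' lines (executable names are assumptions):
#   missing_tools checkm hmmsearch hmmfetch prodigal pplacer
```

If the call prints nothing, all of the listed commands were found. Any name it prints points to a modulefile that has not been loaded.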
Standard Data (dataRoot)
The checkm standard database has been downloaded and the dataRoot variable has been configured. The data is installed centrally. You do not need to do anything to configure checkm to use this data. For reference, the data was downloaded from https://data.ace.uq.edu.au/public/CheckM_databases/.
Running the application
Please do not run checkm on the login node. It is very memory hungry. Jobs should be submitted to the compute nodes via batch.
Please see the checkm online documentation for an example of typical usage.
As an exception, you may run checkm without any arguments or flags on the login node to get the help text, which lists the names of the checkm tools that can be run. For example:
checkm

                ...::: CheckM v1.1.0 :::...

  Lineage-specific marker set:
    tree         -> Place bins in the reference genome tree
    tree_qa      -> Assess phylogenetic markers found in each bin
    lineage_set  -> Infer lineage-specific marker sets for each bin

  Taxonomic-specific marker set:
    ...
Serial batch job submission
PLEASE NOTE: due to the high memory requirements of checkm, a serial job is unlikely to give you enough memory. If you are on a compute node with <40GB of memory per core, the --reduced_tree flag can be added to the checkm command to reduce the memory requirements to approximately 14GB. Most CSF3 compute nodes offer only 4–5GB per core! The high memory nodes can offer 16GB, 32GB, 46GB, and 50GB per core. Because checkm can use multiple cores to speed up computation, a multi-core job is recommended: it also gives you access to more memory.
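To make the arithmetic concrete, the sketch below estimates how many cores a job would need to request so that the total memory (cores multiplied by memory per core) reaches a target. The function name cores_needed is hypothetical, and the 40GB and 14GB figures are simply the approximate values quoted above, not measured requirements.

```shell
# Hypothetical helper: smallest number of cores whose combined memory
# reaches the target, i.e. ceiling(needed_gb / per_core_gb).
cores_needed() {
    needed_gb=$1      # memory the run is expected to need, in GB
    per_core_gb=$2    # memory each core provides on the chosen node type, in GB
    echo $(( (needed_gb + per_core_gb - 1) / per_core_gb ))
}

cores_needed 40 5    # full reference tree on a 5GB-per-core node
cores_needed 14 5    # with --reduced_tree on the same node type
cores_needed 40 4    # full reference tree on a 4GB-per-core node
```

Under these assumptions the three calls print 8, 3 and 10, so even the reduced tree does not fit in a one-core job on a standard node.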
Create a batch submission script (which will load the modulefile in the jobscript), for example:
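#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Choose the version you require
module load apps/python/checkm/1.1.0

# checkm also needs the hmmer, prodigal and pplacer modulefiles (see Set up procedure)
module load apps/gcc/hmmer/3.2.1
module load apps/gcc/prodigal/2.6.3
module load apps/binapps/pplacer/1.1.alpha19

# $NSLOTS will be automatically set to 1 (one core) in a serial job
checkm tool -t $NSLOTS arg1 arg2 ...
#
# See the checkm documentation for a list of its available tools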
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores. Can be 2--32.
                    # NO -V line - we load modulefiles in the jobscript

# Choose the version you require
module load apps/python/checkm/1.1.0

# checkm also needs the hmmer, prodigal and pplacer modulefiles (see Set up procedure)
module load apps/gcc/hmmer/3.2.1
module load apps/gcc/prodigal/2.6.3
module load apps/binapps/pplacer/1.1.alpha19

# $NSLOTS will be automatically set to the number of cores requested above
checkm tool -t $NSLOTS arg1 arg2 ...
#
# See the checkm documentation for a list of its available tools
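As an illustration of how the core count interacts with the memory guidance given earlier on this page, the sketch below decides whether --reduced_tree should be added for a given job size. The function name tree_flag is hypothetical, the 40GB threshold is the approximate figure quoted earlier, and the per-core memory is a value you must supply for the node type you are using. The lineage_wf call shown in the comment uses placeholder directory names (bins/ and checkm_out/) and assumes your bins have the .fa extension.

```shell
# Hypothetical helper: print --reduced_tree when the job's total memory
# (cores x GB per core) is below the ~40GB assumed for the full reference tree.
tree_flag() {
    cores=$1
    mem_per_core_gb=$2
    if [ $(( cores * mem_per_core_gb )) -lt 40 ]; then
        echo "--reduced_tree"
    fi
}

# Possible use inside the jobscript (paths and extension are placeholders):
#   checkm lineage_wf -t $NSLOTS $(tree_flag $NSLOTS 5) -x fa bins/ checkm_out/
```

With 5GB per core, a 4-core job would get --reduced_tree, while an 8-core job would run with the full tree. Treat the threshold as a rough guide and check the checkm documentation for the tools that accept this flag.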
Further info
Updates
None.