CheckM

Overview

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage.

Versions 1.2.2 and 1.1.0 are installed on the CSF.

Restrictions on use

There are no restrictions on accessing this software on the CSF. It is licensed under the GNU General Public License version 3 and all usage must adhere to that license.

Please cite your usage of this software using:

Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

module load apps/python/checkm/1.2.2
module load apps/python/checkm/1.1.0

# You will also need to load modulefiles for HMMER (hmmsearch and hmmfetch), prodigal and pplacer, as follows:
module load apps/gcc/hmmer/3.2.1
module load apps/gcc/prodigal/2.6.3
module load apps/binapps/pplacer/1.1.alpha19
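
As a quick check that the modulefiles above have been loaded, the following sketch reports any expected executable that is missing from your PATH. The executable names (checkm, hmmsearch, hmmfetch, prodigal, pplacer) are assumptions about what each modulefile provides, and missing_tools is a hypothetical helper, not part of CheckM.

```shell
# Print the name of every listed command that is not on the PATH.
# (missing_tools is a hypothetical helper, not a CheckM command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names below are assumed to come from the modulefiles above.
missing=$(missing_tools checkm hmmsearch hmmfetch prodigal pplacer)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH (load the matching modulefile):" $missing >&2
fi
```

If nothing is printed, all five commands were found. Running this check is lightweight, unlike a checkm analysis.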

Standard Data (dataRoot)

The checkm standard database has been downloaded and the dataRoot variable has been configured. The data is installed centrally. You do not need to do anything to configure checkm to use this data. For reference, the data was downloaded from https://data.ace.uq.edu.au/public/CheckM_databases/.

Running the application

Please do not run checkm analyses on the login node: they are very memory hungry. Submit them to the compute nodes as batch jobs.

Please see the checkm online documentation for an example of typical usage.

Running checkm without any arguments or flags prints the help text, which lists the checkm tools that can be run. This is the one use of checkm that is fine on the login node. For example:

checkm
                ...::: CheckM v1.1.0 :::...

  Lineage-specific marker set:
    tree         -> Place bins in the reference genome tree
    tree_qa      -> Assess phylogenetic markers found in each bin
    lineage_set  -> Infer lineage-specific marker sets for each bin

  Taxonomic-specific marker set:
...

Serial batch job submission

PLEASE NOTE: because of the high memory requirements of checkm, a serial job is unlikely to give you enough memory. If you are on a compute node with less than 40GB of memory per core, add the --reduced_tree flag to the checkm command to reduce the memory requirement to approximately 14GB. Most CSF3 compute nodes offer only 4–5GB per core! The high memory nodes can offer 16GB, 32GB, 46GB or 50GB per core. Because checkm can use multiple cores to speed up computation, a multi-core job is recommended: it will also give you access to more memory.
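
The memory figures above translate into a minimum core count with a little arithmetic. The sketch below assumes that the memory available to a job is the number of cores multiplied by the per-core memory of the node, and takes the 40GB and 14GB figures quoted above as targets. cores_needed is a hypothetical helper written for this illustration.

```shell
# Smallest number of cores whose combined memory reaches the target.
# Usage: cores_needed TARGET_GB GB_PER_CORE   (whole numbers only)
# Assumes job memory = cores x memory per core.
cores_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling of TARGET / PER_CORE
}

cores_needed 40 5   # full reference tree, 5GB-per-core node: prints 8
cores_needed 14 4   # --reduced_tree, 4GB-per-core node: prints 4
```

Under these assumptions, a job on a standard node needs about 8 cores to avoid --reduced_tree, or about 4 cores when the flag is used.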

Create a batch submission script that loads the modulefile, for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

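# Choose the version you require
module load apps/python/checkm/1.1.0

# CheckM also needs HMMER, prodigal and pplacer
module load apps/gcc/hmmer/3.2.1
module load apps/gcc/prodigal/2.6.3
module load apps/binapps/pplacer/1.1.alpha19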

# $NSLOTS will be automatically set to 1 (one core) in a serial job
checkm tool -t $NSLOTS arg1 arg2 ...
        #
        # See the checkm documentation for a list of its available tools

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

Create a batch submission script that loads the modulefile, for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores. Can be 2-32.
                    # NO -V line - we load modulefiles in the jobscript

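# Choose the version you require
module load apps/python/checkm/1.1.0

# CheckM also needs HMMER, prodigal and pplacer
module load apps/gcc/hmmer/3.2.1
module load apps/gcc/prodigal/2.6.3
module load apps/binapps/pplacer/1.1.alpha19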

# $NSLOTS will be automatically set to the number of cores requested above
checkm tool -t $NSLOTS arg1 arg2 ...
        #
        # See the checkm documentation for a list of its available tools
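
As one possible way to fill in the tool and arg1 arg2 ... placeholders, the sketch below wraps CheckM's lineage_wf workflow in a small function that can print the command before running it. The directory names bins and checkm_out, the fa extension, the lineage_wf_job function and the DRY_RUN variable are all hypothetical choices made for this illustration. Check the options against the checkm help text for the version you load.

```shell
# Run (or, with DRY_RUN=1, only print) a CheckM lineage_wf command.
# Usage: lineage_wf_job BINS_DIR OUT_DIR [EXTENSION]
# lineage_wf_job and DRY_RUN are hypothetical, not part of CheckM.
lineage_wf_job() {
    # -t: threads (NSLOTS is set by the batch system; falls back to 1)
    # -x: file extension of the genome bins (fna if not given)
    set -- checkm lineage_wf -t "${NSLOTS:-1}" -x "${3:-fna}" "$1" "$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"    # show the command without running it
    else
        "$@"         # run checkm for real
    fi
}

# Print the command first, to check it before submitting the job.
DRY_RUN=1
lineage_wf_job bins checkm_out fa
```

In a real jobscript, remove the DRY_RUN=1 line (or set it to 0) so that checkm actually runs. The function has no option for extra flags such as --reduced_tree, so add the flag to the set -- line when the job has limited memory.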

Further info

Updates

None.

Last modified on November 7, 2022 at 12:07 pm by Chris Grave