Kaiju

Overview

Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA. Reads are directly assigned to taxa using the NCBI taxonomy and a reference database of protein sequences from microbial and viral genomes.

Version 1.7.2 is installed on the CSF.

Restrictions on use

There are no restrictions on accessing the software on the CSF. It is released under the GNU GPL v3 license and all usage must adhere to that license.

Please cite your use of this software using the following reference:

The program is described in Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257 (open access).

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

module load apps/gcc/kaiju/1.7.2

Running the application

Please do not run Kaiju on the login node. Jobs should be submitted to the compute nodes via the batch system.

You may run the commands with the -h flag to see a list of flags accepted by the program. For example:

kaiju -h
kaiju-makedb -h

Note that Kaiju (via the kaiju-makedb app) can download reference datasets and process them to build the reference databases. This MUST be done as a batch job, and you must tell kaiju-makedb how many cores your jobscript has requested (see below). By default kaiju-makedb uses 5 cores, so if in doubt, submit a 5-core job. See the examples below for how to run the various executables correctly.
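A minimal sketch of the advice above: the shell snippet below writes a multi-core jobscript for building a database, so that the number of cores requested and the value passed to kaiju-makedb's -t flag always match (the job passes $NSLOTS rather than a hard-coded number). It assumes the CSF single-node parallel environment is named smp.pe and uses refseq as an example source database; please check both against the parallel job documentation and kaiju-makedb -h. The file name kaiju-makedb.sh is hypothetical.

```shell
#!/bin/bash
# Number of cores to request. kaiju-makedb uses 5 threads by default.
CORES=5

# Unquoted EOF: ${CORES} is filled in now, while the escaped \$ signs are
# written literally so the batch system expands $NSLOTS when the job runs.
cat > kaiju-makedb.sh <<EOF
#!/bin/bash --login
#\$ -cwd
#\$ -pe smp.pe ${CORES}     # ASSUMPTION: CSF parallel environment name

module load apps/gcc/kaiju/1.7.2

# Pass the number of cores granted to the job on to kaiju-makedb
kaiju-makedb -t \$NSLOTS -s refseq
EOF

echo "Wrote kaiju-makedb.sh (${CORES} cores). Submit it with: qsub kaiju-makedb.sh"
```

Changing CORES is then the only edit needed to request a different number of cores.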

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the modulefile in the jobscript - use your required version
module load apps/gcc/kaiju/1.7.2

# Classify paired-end reads (-i and -j) using the taxonomy nodes file (-t) and database index (-f)
kaiju -t nodes.dmp -f db.fmi -i reads.fastq -j reads2.fastq

# When making a database (this is best done with a parallel job - see below)
kaiju-makedb -t $NSLOTS -s source_db

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
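The classification command in the jobscript above can also make use of the cores in a parallel job. The helper below is a hypothetical sketch, not part of Kaiju: it assembles a kaiju command for single- or paired-end reads and adds -z for the number of threads and -o for the output file (flags as listed by kaiju -h; please confirm them for version 1.7.2). It only prints the command so it can be checked; in a jobscript you would run the array instead.

```shell
#!/bin/bash
# Hypothetical helper: build a kaiju command line for single- or paired-end reads.
# Arguments: nodes.dmp file, .fmi index, first reads file, second reads file
# (empty string for single-end), output file.
kaiju_cmd() {
    local nodes="$1" fmi="$2" reads1="$3" reads2="$4" out="$5"
    local args=(kaiju -t "$nodes" -f "$fmi" -i "$reads1")
    # Only add the second input file (-j) when paired-end reads are supplied
    if [[ -n "$reads2" ]]; then
        args+=(-j "$reads2")
    fi
    # Use the core count from the batch system, or 1 when NSLOTS is not set
    args+=(-z "${NSLOTS:-1}" -o "$out")
    # Print the command for checking; in a jobscript, run "${args[@]}" instead
    printf '%s\n' "${args[*]}"
}

kaiju_cmd nodes.dmp db.fmi reads.fastq reads2.fastq kaiju.out
```

Pass an empty string as the fourth argument for single-end reads.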

Parallel batch job submission