Kaiju

Overview

Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA. Reads are directly assigned to taxa using the NCBI taxonomy and a reference database of protein sequences from microbial and viral genomes.

Version 1.7.2 is installed on the CSF.

Restrictions on use

There are no restrictions on accessing the software on the CSF. It is released under the GNU GPL v3 license and all usage must adhere to that license.

Please cite your use of this software using the reference:

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load the following modulefile:

module load apps/gcc/kaiju/1.7.2

Running the application

Please do not run Kaiju on the login node. Jobs should be submitted to the compute nodes via batch.

You may run the commands with the -h flag to see a list of flags accepted by the program. For example:

kaiju -h
kaiju-makedb -h

Note that Kaiju can download reference datasets and process them to create the reference databases. This MUST be done as a batch job, and you must tell kaiju-makedb how many cores your jobscript has requested by using its -t flag (see the examples below). By default kaiju-makedb uses 5 cores, so if in doubt, submit a 5-core job. See the examples below for how to run the various executables correctly.
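The rule above (only build databases inside a batch job, and pass the job's core count to kaiju-makedb) can be sketched as a pair of small shell functions. This is an illustration only: the function names in_batch_job and makedb_command are hypothetical and not part of Kaiju, and the sketch assumes that $NSLOTS is set only inside a batch job. The second function prints the command line rather than running it.

```shell
#!/bin/bash
# Illustrative helper functions (hypothetical names, not part of Kaiju).

# Succeed only when $NSLOTS is set, which we ASSUME means "inside a batch job".
in_batch_job() {
    [ -n "${NSLOTS:-}" ]
}

# Print the kaiju-makedb command line for the source database named in $1,
# passing the job's core count via -t. Fails with a message on stderr
# when $NSLOTS is not set.
makedb_command() {
    if ! in_batch_job; then
        echo "NSLOTS is not set: submit this as a batch job" >&2
        return 1
    fi
    echo "kaiju-makedb -t $NSLOTS -s $1"
}

# Example (source_db is the placeholder name used on this page):
# makedb_command source_db
```

Inside a jobscript you could run the printed command directly, or simply use the in_batch_job check as a guard before calling kaiju-makedb yourself.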

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the modulefile in the jobscript - use your required version
module load apps/gcc/kaiju/1.7.2

# There are various commands that can be run
kaiju -t nodes.dmp -f db.fmi -i reads.fastq -j reads2.fastq

# When making a database (this is best done with a parallel job - see below)
kaiju-makedb -t $NSLOTS -s source_db

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores, can be 2 -- 32

# Load the modulefile in the jobscript - use your required version
module load apps/gcc/kaiju/1.7.2

# There are various commands that can be run. The flag to say how
# many cores to use varies from app to app so you should look this up.
# $NSLOTS is automatically set to the number of cores requested above.

# Some examples:
kaiju -z $NSLOTS -t nodes.dmp -f db.fmi -i reads.fastq ...
kaijup -z $NSLOTS -f proteins.fmi -i reads.fastq
kaijux -z $NSLOTS -f proteins.fmi -i reads.fastq ...
kaiju-mergeOutputs -i in1.tsv -j in2.tsv .....
kaiju-mkfmi filename     # Will read filename.bwt and filename.sa
kaiju-mkbwt -n $NSLOTS args filename
kaiju2krona -t nodes.dmp -n names.dmp -i kaiju.out -o kaiju2krona.out
kaiju-convertNR -t nodes.dmp -g prot.accession2taxid -i nr

# When making a database
kaiju-makedb -t $NSLOTS -s source_db

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
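The example commands above read files such as nodes.dmp, db.fmi and reads.fastq from the directory the job runs in. A job that waits in the queue and then stops immediately because a file name was mistyped wastes time, so it can help to check the inputs at the top of the jobscript. The following is a minimal sketch, assuming a hypothetical helper named require_inputs (not part of Kaiju or the batch system):

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaiju): succeed only if every named
# file exists and is non-empty; report the first problem file on stderr.
require_inputs() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input file: $f" >&2
            return 1
        fi
    done
}

# Example use in a jobscript, before the kaiju command (file names are
# the placeholders used in the examples on this page):
# require_inputs nodes.dmp db.fmi reads.fastq || exit 1
# kaiju -z $NSLOTS -t nodes.dmp -f db.fmi -i reads.fastq
```

The check uses the shell's -s file test, so a zero-length file (for example from an interrupted download) is reported in the same way as a missing one.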

Further info

Updates

None.

Last modified on July 26, 2019 at 4:28 pm by George Leaver