Kaiju
Overview
Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA. Reads are directly assigned to taxa using the NCBI taxonomy and a reference database of protein sequences from microbial and viral genomes.
Version 1.7.2 is installed on the CSF.
Restrictions on use
There are no restrictions on accessing the software on the CSF. It is released under the GNU GPL v3 license and all usage must adhere to that license.
Please cite your use of this software using the reference:
- The program is described in Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257 (open access).
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.
Load one of the following modulefiles:
module load apps/gcc/kaiju/1.7.2
Running the application
Please do not run Kaiju on the login node. Jobs should be submitted to the compute nodes via batch.
You may run the commands with the -h flag to see a list of flags accepted by the program. For example:
kaiju -h
kaiju-makedb -h
Note that Kaiju is capable of downloading reference datasets and processing them to create the reference databases. This MUST be done as a batch job, and you must tell kaiju-makedb how many cores your jobscript has requested using its -t flag (see below). By default kaiju-makedb uses 5 cores, so if in doubt, submit a 5-core job. See the examples below for how to run the various executables correctly.
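As a minimal sketch of the point above, the helper below builds the kaiju-makedb command line from $NSLOTS and falls back to the default of 5 when the variable is not set. The function name makedb_cmd is hypothetical (it is not part of Kaiju), and source_db is the same placeholder used in the jobscripts on this page. The function only prints the command so that it can be inspected first.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaiju): print the kaiju-makedb command
# with a thread count matching the cores granted by the batch system.
makedb_cmd() {
    local source_db="$1"           # placeholder database name, as used on this page
    local threads="${NSLOTS:-5}"   # NSLOTS is set by the batch system; 5 is the kaiju-makedb default
    echo "kaiju-makedb -t ${threads} -s ${source_db}"
}

# Show the command that would be run for the placeholder database.
makedb_cmd source_db
```

In a real jobscript the echo would be dropped so that the command itself runs, and the job would need to request the same number of cores.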
Serial batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the modulefile in the jobscript - use your required version
module load apps/gcc/kaiju/1.7.2

# There are various commands that can be run
kaiju -t nodes.dmp -f db.fmi -i reads.fastq -j reads2.fastq

# When making a database (this is best done with a parallel job - see below)
kaiju-makedb -t $NSLOTS -s source_db
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores, can be 2 -- 32

# Load the modulefile in the jobscript - use your required version
module load apps/gcc/kaiju/1.7.2

# There are various commands that can be run. The flag to say how
# many cores to use varies from app to app so you should look this up.
# $NSLOTS is automatically set to the number of cores requested above.
# Some examples:
kaiju -z $NSLOTS -t nodes.dmp -f db.fmi -i reads.fastq ...
kaijup -z $NSLOTS -f proteins.fmi -i reads.fastq
kaijux -z $NSLOTS -f proteins.fmi -i reads.fastq ...
kaiju-mergeOutputs -i in1.tsv -j in2.tsv .....
kaiju-mkfmi filename       # Will read filename.bwt and filename.sa
kaiju-mkbwt -n $NSLOTS args filename
kaiju2krona -t nodes.dmp -n names.dmp -i kaiju.out -o kaiju2krona.out
kaiju-convertNR -t nodes.dmp -g prot.accession2taxid -i nr

# When making a database
kaiju-makedb -t $NSLOTS -s source_db
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Further info
Updates
None.