Diamond

Overview

Diamond is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data. It performs pairwise alignment of proteins and translated DNA at 500x-20,000x the speed of BLAST, offers frameshift alignments for long-read analysis, and supports various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

Version 0.9.26 (binary) is installed on the CSF.

Restrictions on use

There are no restrictions on accessing the software on the CSF. The software is released under the GNU GPL v3 license and all usage should adhere to that license.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load the following modulefile:

module load apps/binapps/diamond/0.9.26

Running the application

Please do not run Diamond on the login node. Jobs should be submitted to the compute nodes via the batch system.

You may run the following command to obtain the help text (diamond accepts a large number of flags):

diamond help

You can also read the manual using:

evince $DIAMONDDIR/diamond_manual.pdf

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

module load apps/binapps/diamond/0.9.26

# In the commands below $NSLOTS is set to the number of cores to use.
# For serial jobscripts this is 1. The -p flag tells diamond how many cores to use.

# To set up a reference database for diamond
diamond makedb --in nr.faa -d nr -p $NSLOTS

# The alignment task may then be initiated using the blastx command like this:
diamond blastx -d nr -q reads.fna -o matches.m8 -p $NSLOTS

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
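
Once the job has finished, the results file can be inspected with standard command-line tools. The following is a minimal sketch, assuming the output is in diamond's default BLAST tabular layout (12 tab-separated columns, with the query ID in column 1 and the bit score in column 12). It keeps only the highest-scoring hit for each query. The file names sample.m8 and best_hits.m8 and the sample rows are made up for illustration; for a real run you would point awk at matches.m8 instead.

```shell
# Hypothetical sample data in BLAST tabular layout (12 tab-separated columns).
# A real run of the jobscript above would produce matches.m8 instead.
printf 'read1\tprotA\t90.0\t50\t5\t0\t1\t150\t1\t50\t1e-20\t95.5\n'   > sample.m8
printf 'read1\tprotB\t80.0\t50\t10\t0\t1\t150\t1\t50\t1e-10\t60.2\n' >> sample.m8
printf 'read2\tprotC\t99.0\t40\t0\t0\t1\t120\t1\t40\t1e-25\t110.0\n' >> sample.m8

# For each query (column 1) keep the row with the largest bit score (column 12).
awk -F'\t' '
    !($1 in best) || $12 + 0 > score[$1] { best[$1] = $0; score[$1] = $12 + 0 }
    END { for (q in best) print best[q] }
' sample.m8 | sort > best_hits.m8

cat best_hits.m8
```

With the sample data above, best_hits.m8 contains one line per query: the protA hit for read1 and the protC hit for read2.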

Parallel batch job submission

To convert the serial jobscript above into a parallel jobscript, add the following line alongside the other #$ lines at the top of the script:

#$ -pe smp.pe 8       # Number of cores (can be 2--32).

Keep the -p $NSLOTS flag on the diamond command line, as shown above, so that diamond uses the number of cores requested by the jobscript.
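
Putting the pieces together, a complete parallel jobscript might look like the sketch below. It simply combines the serial example with the smp.pe line. The commands write the jobscript to a file using a here-document. The file name diamond_parallel.sh is a made-up example, and the core count of 8 and the input files nr.faa and reads.fna are carried over from the examples above and should be adjusted to your own data.

```shell
# Write an example parallel jobscript to a file (the file name is hypothetical).
# The quoted 'EOF' stops the shell expanding $NSLOTS now; the batch system
# sets it when the job runs.
cat > diamond_parallel.sh <<'EOF'
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores (can be 2--32)

module load apps/binapps/diamond/0.9.26

# $NSLOTS is set by the batch system to the number of cores requested above.

# Set up a reference database for diamond
diamond makedb --in nr.faa -d nr -p $NSLOTS

# Run the alignment using the same number of cores
diamond blastx -d nr -q reads.fna -o matches.m8 -p $NSLOTS
EOF
```

The resulting file would then be submitted in the same way as the serial job, with qsub diamond_parallel.sh.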

Further info

Updates

None.

Last modified on June 4, 2020 at 4:32 pm by Ben Pietras