BLAST

Overview

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

Two pre-compiled binary versions are available on the CSF:

  • BLAST+ – a new suite of BLAST tools that uses the NCBI C++ Toolkit. The BLAST+ applications have a number of performance and feature improvements over the legacy BLAST applications. Versions 2.13.0, 2.9.0 and 2.4.0 are available.
  • BLAST (legacy) 2.2.26 – blastall, blastpgp, etc.

For further information about the differences:

Restrictions on use

The software is in the public domain and available to all users. For further information, load the BLAST+ modulefile and consult $BLASTP_HOME/LICENSE

Set up procedure

For BLAST+, load one of the following modulefiles:

module load apps/binapps/blast/2.13.0            # Also loads apps/gcc/perl/5.34.0 for you
module load apps/binapps/blast/2.9.0
module load apps/binapps/blast/2.4.0

# You should also load the following if downloading data, e.g., using update_blastdb.pl
module load tools/env/proxy

and for BLAST (legacy):

module load apps/binapps/blast/legacy/2.2.26

If you have your own set of databases you will need a .ncbirc file in your home directory which tells BLAST where to find them. For example:

[BLAST]
BLASTDB=$HOME/genome_analysis/Blast/data
               #
               # Use the correct path for wherever you store your own databases
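
One way to create such a file from the shell is sketched below. The function name write_ncbirc is hypothetical and not part of BLAST. It writes the database directory exactly as given, so passing an already expanded path (for example "$HOME/..." in double quotes) avoids relying on whether variables are expanded inside .ncbirc.

```shell
#!/bin/bash
# write_ncbirc is a hypothetical helper, not part of BLAST.
# Usage: write_ncbirc DB_DIR [FILE]   (FILE defaults to ~/.ncbirc)
write_ncbirc() {
    local db_dir="$1"
    local rc_file="${2:-$HOME/.ncbirc}"
    # Refuse to overwrite an existing configuration file
    if [ -e "$rc_file" ]; then
        echo "Refusing to overwrite existing $rc_file" >&2
        return 1
    fi
    # Write the [BLAST] section with the database directory
    printf '[BLAST]\nBLASTDB=%s\n' "$db_dir" > "$rc_file"
}
```

For example: write_ncbirc "$HOME/genome_analysis/Blast/data"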

NCBI BLAST Databases

We have downloaded the following FASTA database files from the NCBI BLAST repository:

# Download using:
# wget -N --continue --directory-prefix FASTA ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/FASTA/nr.gz
# Then gunzip'd after checking the md5sum of the download.
FASTA/nr
FASTA/nt
FASTA/pdbaa
FASTA/swissprot
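
The verify-then-decompress step mentioned in the comments above might be scripted roughly as follows. This is a sketch, not the exact procedure used on the CSF: the function name verify_and_gunzip is hypothetical, and it assumes a checksum file named FILE.md5, in md5sum format, has been downloaded into the same directory as FILE.

```shell
#!/bin/bash
# verify_and_gunzip is a hypothetical helper.
# Assumes FILE.md5 (md5sum format) sits alongside FILE.
# Usage: verify_and_gunzip FASTA/nr.gz
verify_and_gunzip() {
    local archive="$1"
    local dir file
    dir=$(dirname "$archive")
    file=$(basename "$archive")
    # md5sum -c resolves file names relative to the current directory,
    # so run the check in a subshell inside the download directory.
    if ( cd "$dir" && md5sum -c --quiet "$file.md5" ); then
        gunzip "$archive"
    else
        echo "Checksum check failed for $archive; not decompressing" >&2
        return 1
    fi
}
```

If the checksum does not match, the compressed file is left in place so the download can be resumed or repeated.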

We have downloaded the following BLAST pre-formatted databases from the NCBI BLAST repository:

# Downloaded using:
# update_blastdb.pl --source aws --decompress --num_threads $NSLOTS DBNAME
nr    # See nr.NN.*
nt    # See nt.NN.*

The 2.13.0 modulefile (see above) sets the following environment variable to point to the location of these files:

NCBI_BLAST_DIR=/mnt/data-sets/ncbi/blast

The intention is to save users some RDS space by having a central copy of some of the databases, where we have received more than one request for the same DB.
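
Before pointing a job at the central copy, it can be worth checking that the database files are where you expect. The sketch below assumes the pre-formatted volumes (named nr.NN.*, as noted above) sit directly under $NCBI_BLAST_DIR; that layout is an assumption, so confirm it with ls first. The function name db_files_present is hypothetical and not part of BLAST.

```shell
#!/bin/bash
# db_files_present is a hypothetical helper, not part of BLAST.
# Usage: db_files_present DIR NAME
# Succeeds if DIR contains at least one file named NAME.* (e.g. nr.00.phr).
db_files_present() {
    local dir="$1" name="$2"
    local f
    for f in "$dir/$name".*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails
        [ -e "$f" ] && return 0
    done
    return 1
}
```

In a jobscript this could guard the BLAST command, for example: db_files_present "$NCBI_BLAST_DIR" nr || exit 1. A database can then be given to -db as a path plus base name, such as "$NCBI_BLAST_DIR/nr", under the same layout assumption.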

Running the application

Please do not run BLAST on the login node. All work must be submitted to batch.

Serial batch job submission

Make sure you have the relevant modulefile loaded then create a batch submission script, for example:

#!/bin/bash --login
#$ -cwd
module load apps/binapps/blast/2.13.0

blastx -db efungi_pep_no_afua -query auto_351/contigs.fa -num_threads $NSLOTS -out 351un.out \
       -evalue 1e-20 -max_target_seqs 1 -outfmt 6

Submit the job using:

qsub script

where script is the name of your jobscript file.
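
With -outfmt 6 and no extra format specifiers, BLAST+ writes tab-separated lines with twelve columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. As an illustration of post-processing such a file, the sketch below keeps only hits at or above a percent-identity threshold. The function name filter_hits and the threshold used in the example are hypothetical choices, not part of BLAST.

```shell
#!/bin/bash
# filter_hits is a hypothetical helper, not part of BLAST.
# Usage: filter_hits MIN_PIDENT FILE
# Prints query ID, subject ID, percent identity and e-value for each hit
# whose percent identity (column 3) is >= MIN_PIDENT.
filter_hits() {
    local min_pident="$1" file="$2"
    awk -F'\t' -v min="$min_pident" \
        '$3 + 0 >= min + 0 { print $1 "\t" $2 "\t" $3 "\t" $11 }' "$file"
}
```

For example, filter_hits 90 351un.out lists the hits in the output file above that have at least 90% identity.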

Parallel batch job submission

Make sure you have the relevant modulefile loaded then create a batch submission script, for example:

#!/bin/bash --login
#$ -cwd
#$ -pe smp.pe 4 
module load apps/binapps/blast/2.13.0

blastx -db efungi_pep_no_afua -query auto_351/contigs.fa -num_threads $NSLOTS -out 351un.out \
       -evalue 1e-20 -max_target_seqs 1 -outfmt 6

Submit the job using:

qsub script

where script is the name of your jobscript file.

Further info

Last modified on August 3, 2022 at 5:28 pm by George Leaver