BLAST
Overview
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Two pre-compiled binary versions are available on the CSF:
- BLAST+ – a newer suite of BLAST tools that uses the NCBI C++ Toolkit. The BLAST+ applications have a number of performance and feature improvements over the legacy BLAST applications. Versions 2.13.0, 2.9.0 and 2.4.0 are available.
- BLAST (legacy) 2.2.26 – blastall, blastpgp, etc.
For further information about the differences:
Restrictions on use
The software is in the public domain and available to all users. For further information, load the BLAST+ modulefile and consult the file $BLASTP_HOME/LICENSE
Set up procedure
For BLAST+ load one of the following modulefiles:
    module load apps/binapps/blast/2.13.0    # Loads apps/gcc/perl/5.34.0 for you
    module load apps/binapps/blast/2.9.0
    module load apps/binapps/blast/2.4.0

    # You should also load the following if downloading data, e.g., using update_blastdb.pl
    module load tools/env/proxy
and for BLAST (legacy):
module load apps/binapps/blast/legacy/2.2.26
If you have your own set of databases you will need a .ncbirc file in your home directory which tells BLAST where to find them. For example:
    [BLAST]
    BLASTDB=$HOME/genome_analysis/Blast/data
    #
    # Use the correct path for wherever you store your own databases
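As a minimal sketch, the file can also be created from the shell. The function name write_ncbirc and its arguments are hypothetical and not part of BLAST. The shell expands $HOME before the file is written, so the file ends up containing an absolute path, and the helper refuses to overwrite an existing file.

```shell
#!/bin/bash
# write_ncbirc DB_DIR [TARGET]
# Hypothetical helper (not part of BLAST): writes a minimal .ncbirc that
# points BLASTDB at DB_DIR. TARGET defaults to $HOME/.ncbirc.
write_ncbirc() {
    local db_dir="$1"
    local target="${2:-$HOME/.ncbirc}"

    if [ -z "$db_dir" ]; then
        echo "usage: write_ncbirc DB_DIR [TARGET]" >&2
        return 2
    fi
    # Never clobber an existing configuration file.
    if [ -e "$target" ]; then
        echo "write_ncbirc: $target already exists, not overwriting" >&2
        return 1
    fi
    printf '[BLAST]\nBLASTDB=%s\n' "$db_dir" > "$target"
}

# Example (uncomment to use; the path is the one from the example above):
# write_ncbirc "$HOME/genome_analysis/Blast/data"
```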
NCBI BLAST Databases
We have downloaded the following FASTA database files from the NCBI BLAST repository:
    # Downloaded using:
    #   wget -N --continue --directory-prefix FASTA ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/FASTA/nr.gz
    # Then gunzip'd after checking the md5sum of the download.
    FASTA/nr
    FASTA/nt
    FASTA/pdbaa
    FASTA/swissprot
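If you download your own copies, the checksum step mentioned above could be sketched as follows. This assumes a checksum file named FILE.md5 sits next to the download and uses the "HASH  NAME" layout that md5sum writes; the function name verify_md5 is hypothetical.

```shell
#!/bin/bash
# verify_md5 FILE
# Hypothetical helper: succeeds only if FILE matches the checksum recorded in
# FILE.md5 (assumed to hold the expected hash as its first field).
verify_md5() {
    local file="$1"
    local expected actual

    # Both the download and its checksum file must exist.
    [ -f "$file" ] && [ -f "$file.md5" ] || return 2

    expected=$(awk 'NR == 1 { print $1 }' "$file.md5")
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}

# Example: only decompress when the checksum matches.
# verify_md5 FASTA/nr.gz && gunzip FASTA/nr.gz
```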
We have downloaded the following BLAST pre-formatted databases from the NCBI BLAST repository:
    # Downloaded using:
    #   update_blastdb.pl --source aws --decompress --num_threads $NSLOTS DBNAME
    nr    # See nr.NN.*
    nt    # See nt.NN.*
The 2.13.0 modulefile (see above) will set the following environment variable to point to the location of the files:
NCBI_BLAST_DIR=/mnt/data-sets/ncbi/blast
The intention is to save users some RDS space by keeping a central copy of those databases for which we have received more than one request.
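To see which pre-formatted databases are present, one option is to collapse the volume files (nr.NN.*, nt.NN.*) back to their database names. The sketch below assumes the volume files are named NAME.NN.EXT and sit directly in the directory given; the function name list_blast_dbs is hypothetical, and the layout under $NCBI_BLAST_DIR should be checked before relying on it.

```shell
#!/bin/bash
# list_blast_dbs [DIR]
# Hypothetical helper: prints each database name once, derived from volume
# files assumed to be named NAME.NN.EXT (for example nr.00.phr).
# DIR defaults to $NCBI_BLAST_DIR, which the 2.13.0 modulefile sets.
list_blast_dbs() {
    local dir="${1:-$NCBI_BLAST_DIR}"
    local f

    for f in "$dir"/*; do
        [ -f "$f" ] && basename "$f"
    done \
        | sed -n -E 's/^(.+)\.[0-9]+\.[^.]+$/\1/p' \
        | sort -u
}

# Example (after loading the 2.13.0 modulefile):
# list_blast_dbs
# If the volumes do sit directly in that directory, BLAST+ can be pointed at
# them through its BLASTDB environment variable:
# export BLASTDB="$NCBI_BLAST_DIR"
```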
Running the application
Please do not run BLAST on the login node. All work must be submitted to batch.
Serial batch job submission
Make sure you have the relevant modulefile loaded then create a batch submission script, for example:
    #!/bin/bash --login
    #$ -cwd

    module load apps/binapps/blast/2.13.0

    blastx -db efungi_pep_no_afua -query auto_351/contigs.fa -num_threads $NSLOTS -out 351un.out \
           -evalue 1e-20 -max_target_seqs 1 -outfmt 6
Submit the job using:
qsub script
where script is the name of your jobscript file.
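The -outfmt 6 option used in the jobscript writes tab-separated text which, with the default column set, has twelve fields: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. As a sketch of post-processing that output, the hypothetical helper below keeps hits at or above a chosen percent identity. The function name and the threshold of 90 are illustrative only.

```shell
#!/bin/bash
# filter_hits MIN_PIDENT [FILE...]
# Hypothetical helper: reads default -outfmt 6 output (from FILE or stdin)
# and prints query id, subject id, percent identity and e-value for every
# hit whose percent identity (column 3) is at least MIN_PIDENT.
filter_hits() {
    local min_pident="$1"
    shift
    awk -F '\t' -v min="$min_pident" \
        '$3 + 0 >= min + 0 { printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $11 }' "$@"
}

# Example, using the output file name from the jobscript above:
# filter_hits 90 351un.out
```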
Parallel batch job submission
Make sure you have the relevant modulefile loaded then create a batch submission script, for example:
    #!/bin/bash --login
    #$ -cwd
    #$ -pe smp.pe 4

    module load apps/binapps/blast/2.13.0

    blastx -db efungi_pep_no_afua -query auto_351/contigs.fa -num_threads $NSLOTS -out 351un.out \
           -evalue 1e-20 -max_target_seqs 1 -outfmt 6
Submit the job using:
qsub script
where script is the name of your jobscript file.
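The batch system sets $NSLOTS to the number of cores granted to the job (4 in the script above), and the jobscript passes it straight to -num_threads. If the variable were ever unset or empty, -num_threads would receive no value. A defensive sketch, using a hypothetical helper named blast_threads, falls back to a single thread in that case.

```shell
#!/bin/bash
# blast_threads
# Hypothetical helper: prints the value to pass to -num_threads.
# Uses $NSLOTS when it is a whole number, otherwise falls back to 1.
blast_threads() {
    case "${NSLOTS:-}" in
        ''|*[!0-9]*) echo 1 ;;          # unset, empty or containing a non-digit
        *)           echo "$NSLOTS" ;;  # value provided by the batch system
    esac
}

# Example use inside a jobscript:
# blastx -db efungi_pep_no_afua -query auto_351/contigs.fa \
#        -num_threads "$(blast_threads)" -out 351un.out \
#        -evalue 1e-20 -max_target_seqs 1 -outfmt 6
```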