Ensembl Tools
Overview
Ensembl provides the Variant Effect Predictor (VEP) script, written in Perl, to determine the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts and protein sequence, as well as on regulatory regions.
Versions 79 and 80 of Ensembl Tools are installed on the iCSF.
Restrictions on use
There are no restrictions on accessing this software on the iCSF. The software is released under the Apache License and all users are required to read and adhere to the license terms.
Set up procedure
To access the software you must first load one of the following modulefiles:
module load apps/binapps/ensembl/80/vep
module load apps/binapps/ensembl/79/vep
This will automatically load the perl 5.20.2 modulefile.
Running the application
Please note that the VEP script will not be able to connect to an external MySQL database. Hence you should use the script in its offline mode.
You will need to download the cache files for offline use. Please see the Ensembl Caches and Databases Download website for full details. A brief summary is given here:
First create a directory for the cache files. By default the VEP script assumes the cache files will be in your home directory in a hidden folder named .vep. However, due to the size of the cache files (approx. 4GB each when compressed, more when uncompressed), it is advised that you put them in an RDS area, e.g., a shared directory for your group on the Isilon storage, or a shortcut in your home directory named data pointing at additional Isilon storage for your group. The example below assumes you have a shortcut named data in your home directory, but you can specify any directory name using the --dir_cache flag as indicated below.
# Create a directory for your VEP cache files
cd ~/data
mkdir vep
cd vep
# Download from the VEP website. For example for version 80, a 4GB (compressed) file:
wget ftp://ftp.ensembl.org/pub/release-80/variation/VEP/homo_sapiens_vep_80_GRCh37.tar.gz
# Uncompress it (expands to 6.1GB)
tar xzf homo_sapiens_vep_80_GRCh37.tar.gz
# Optional - remove the original compressed archive
rm homo_sapiens_vep_80_GRCh37.tar.gz
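Since the compressed archive (about 4GB) and its expanded contents (about 6.1GB) exist side by side until the archive is removed, it may be worth checking free space before downloading and checking the result afterwards. The following is a minimal sketch, not part of the Ensembl tools: the function names has_free_kb and check_vep_cache are hypothetical, and the directory layout tested by check_vep_cache (species/release_assembly under the cache directory) is an assumption about how the archive unpacks.

```shell
# Hypothetical helpers; names and thresholds are illustrative only.

# Succeeds if the filesystem holding directory $1 has at least $2 KB free.
has_free_kb() {
    dir=$1
    needed_kb=$2
    # df -P prints one POSIX-format line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Succeeds if <cache_dir>/<species>/<release>_<assembly> exists.
# ASSUMPTION: the cache archive unpacks to this layout.
check_vep_cache() {
    target="${1}/${2}/${3}_${4}"
    if [ -d "$target" ]; then
        echo "found: $target"
        return 0
    fi
    echo "missing: $target" >&2
    return 1
}

# Example use (11534336 KB is 11GB, a rounded-up allowance for 4GB + 6.1GB):
#   has_free_kb ~/data 11534336 || echo "not enough space in ~/data"
#   check_vep_cache ~/data/vep homo_sapiens 80 GRCh37
```

Both functions only read the filesystem, so they can be run safely at any point.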
To run VEP, ensure you have loaded one of the modulefiles above, then:
# Notice we use the ~/data/vep directory created earlier
variant_effect_predictor.pl --cache --dir_cache ~/data/vep --offline -i input.vcf -o output.txt
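If you run VEP often, the command above could be wrapped in a small shell function that checks the usual prerequisites first. This is only a sketch under stated assumptions: the function name run_vep_offline is hypothetical, the default cache location is the ~/data/vep directory used in the example above, and the only VEP flags used are those already shown.

```shell
# Hypothetical wrapper around the offline VEP command shown above.
# Usage: run_vep_offline INPUT OUTPUT [CACHE_DIR]
run_vep_offline() {
    input=$1
    output=$2
    # Default to the cache directory used in this page's example.
    cache_dir=${3:-"$HOME/data/vep"}

    # The script is only on PATH once a modulefile has been loaded.
    if ! command -v variant_effect_predictor.pl >/dev/null 2>&1; then
        echo "variant_effect_predictor.pl not found; load the modulefile first" >&2
        return 1
    fi
    if [ ! -f "$input" ]; then
        echo "input file not found: $input" >&2
        return 1
    fi
    if [ ! -d "$cache_dir" ]; then
        echo "cache directory not found: $cache_dir" >&2
        return 1
    fi

    variant_effect_predictor.pl --cache --dir_cache "$cache_dir" --offline \
        -i "$input" -o "$output"
}

# Example use:
#   run_vep_offline input.vcf output.txt
#   run_vep_offline input.vcf output.txt /path/to/other/cache
```

Failing early with a clear message avoids starting a run that would stop partway through because the modulefile was not loaded or the cache directory was mistyped.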
An example input vcf file is available by using the filename:
$ENSEMBL_VEP_DIR/example_GRCh38.vcf
This requires the ensembl modulefile to be loaded before use.
Further info
Updates
None.