Ensembl Tools

Overview

Ensembl provides a Variant Effect Predictor (VEP) script, written in Perl, to determine the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Versions 79 and 80 of Ensembl Tools are installed on the iCSF.

Restrictions on use

There are no restrictions on accessing this software on the iCSF. The software is released under the Apache License and all users are required to read and adhere to the license terms.

Set up procedure

To access the software you must first load one of the following modulefiles:

module load apps/binapps/ensembl/80/vep
module load apps/binapps/ensembl/79/vep

This will automatically load the perl 5.20.2 modulefile.
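
To confirm that the modulefile has loaded, you can check that the VEP script is now on your PATH. The following is a minimal sketch; the function name vep_on_path is illustrative only and is not part of the software.

```shell
# Succeeds if the VEP script can be found on PATH.
# (vep_on_path is a hypothetical helper name used only for this example.)
vep_on_path() {
    command -v variant_effect_predictor.pl >/dev/null 2>&1
}

if vep_on_path; then
    echo "VEP script found at: $(command -v variant_effect_predictor.pl)"
else
    echo "VEP script not found: load one of the ensembl modulefiles first" >&2
fi
```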

Running the application

Please note that the VEP script will not be able to connect to an external MySQL database. Hence you should use the script in its offline mode.

You will need to download the cache files for offline use. Please see the Ensembl Caches and Databases Download website for full details. A brief summary is given here:

First create a directory for the cache files. By default the VEP script assumes the cache files are in a hidden folder named .vep in your home directory. However, because the cache files are large (approx. 4GB each when compressed, more when uncompressed), you are advised to put them in an RDS area instead: for example, a shared directory your group has on the Isilon storage, or additional Isilon storage for your group reached through a shortcut named data in your home directory. The example below assumes you have a shortcut named data in your home directory, but you can use any directory by passing it to the --dir_cache flag, as in the run command further down.

# Create a directory for your VEP cache files
cd ~/data
mkdir vep
cd vep

# Download from the VEP website. For example for version 80, a 4GB (compressed) file:
wget ftp://ftp.ensembl.org/pub/release-80/variation/VEP/homo_sapiens_vep_80_GRCh37.tar.gz

# Uncompress it (expands to 6.1GB)
tar xzf homo_sapiens_vep_80_GRCh37.tar.gz

# Optional - remove the original compressed archive
rm homo_sapiens_vep_80_GRCh37.tar.gz

To run VEP, ensure you have loaded one of the modulefiles above, then run:

# Notice we use the ~/data/vep directory created earlier
variant_effect_predictor.pl --cache --dir_cache ~/data/vep --offline -i input.vcf -o output.txt
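
A missing cache directory or input file only shows up once VEP starts, so it can help to check both first. The following is a minimal sketch, not a definitive script: run_vep_offline is a hypothetical wrapper name, and the VEP options it passes are exactly those in the command above.

```shell
# Hypothetical wrapper: check that the cache directory and input file exist,
# then run VEP offline with the same options as the command above.
run_vep_offline() {
    local cache_dir=$1
    local input=$2
    local output=$3

    if [ ! -d "$cache_dir" ]; then
        echo "Cache directory not found: $cache_dir" >&2
        return 1
    fi
    if [ ! -r "$input" ]; then
        echo "Input file not readable: $input" >&2
        return 1
    fi

    variant_effect_predictor.pl --cache --dir_cache "$cache_dir" --offline \
        -i "$input" -o "$output"
}

# Example use (after loading the modulefile):
# run_vep_offline ~/data/vep input.vcf output.txt
```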

An example input VCF file is available at:

$ENSEMBL_VEP_DIR/example_GRCh38.vcf

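The ensembl modulefile must be loaded before this path can be used. Note that, going by its filename, this example file uses GRCh38 coordinates, whereas the cache downloaded in the example above is for GRCh37.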
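
To take a quick look at the example file before running VEP on it, you can skip the VCF header lines (which start with #) and print the first few records. The following is a minimal sketch; preview_vcf is a hypothetical helper name, and the usage line assumes ENSEMBL_VEP_DIR has been set by loading the modulefile.

```shell
# Hypothetical helper: print the first few variant records of a VCF file,
# skipping the header lines that start with '#'.
# Usage: preview_vcf FILE [NUMBER_OF_RECORDS]   (defaults to 5 records)
preview_vcf() {
    if [ ! -r "$1" ]; then
        echo "Cannot read VCF file: $1" >&2
        return 1
    fi
    grep -v '^#' "$1" | head -n "${2:-5}"
}

# Example use (assumes the modulefile has been loaded):
# preview_vcf "$ENSEMBL_VEP_DIR/example_GRCh38.vcf"
```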

Further info

Updates

None.

Last modified on June 2, 2015 at 5:42 pm by Site Admin