tabix

Overview

Tabix is a generic indexer for TAB-delimited genome position files. The package also provides bgzip, a block compression/decompression utility. Both tools are closely associated with SAMtools.

Version 0.2.6 is installed on the iCSF. It was compiled with gcc 4.7.0.

Version 1.3.1 is available through SAMtools.

Restrictions on use

The software is open source. All iCSF users may access it.

Set up procedure

To access the software you must first load the modulefile:

module load apps/gcc/tabix/0.2.6

Running the application

tabix can download VCF files or, more usefully, portions of large VCF files from remote repositories. Specify the remote repository address with http://, not ftp://, so that tabix can download through the University proxy server (Incline has no direct access to off-campus servers). For example:

tabix -fh http://ftp.1000genomes.ebi.ac.uk/somepath/blah.genotypes.vcf.gz \
        1:14720000-15600000 | gzip -c > data.vcf.gz

Notice that the above command pipes the downloaded data (which is plain text) through gzip to produce a compressed file named data.vcf.gz. Text data usually compresses well, so this method will save you a lot of disk space.
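
As a rough illustration of the saving, the sketch below builds a small synthetic TAB-delimited file and compares its size before and after gzip. The file name sample.txt and its contents are made up for this example and are not real tabix output; the ratio you see on real data will differ.

```shell
# Write 5000 made-up, VCF-like TAB-delimited records (hypothetical data).
awk 'BEGIN { for (i = 1; i <= 5000; i++) printf "1\t%d\t.\tA\tG\t.\tPASS\t.\n", 14720000 + i }' > sample.txt

# Compress a copy, leaving the original in place for comparison.
gzip -c sample.txt > sample.txt.gz

# Report both sizes in bytes (tr strips the padding some wc versions add).
plain=$(wc -c < sample.txt | tr -d ' ')
packed=$(wc -c < sample.txt.gz | tr -d ' ')
echo "uncompressed: $plain bytes, compressed: $packed bytes"
```

Remove sample.txt and sample.txt.gz once you have finished experimenting.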

To page through the contents of the compressed file (without needing to uncompress it first) use the following command:

zless data.vcf.gz
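
If you only want a quick, non-interactive check, standard gzip and grep commands can be combined as in the sketch below. It first creates a tiny stand-in file so the commands can be tried safely; the name sample.vcf.gz and its three records are invented for illustration. Substitute your own file (for example data.vcf.gz) and skip the first command in real use.

```shell
# Build a tiny stand-in VCF (hypothetical contents) and compress it.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t14720010\t.\tA\tG\t.\tPASS\t.\n1\t14720250\t.\tC\tT\t.\tPASS\t.\n1\t14721000\t.\tG\tA\t.\tPASS\t.\n' | gzip -c > sample.vcf.gz

# Check that the compressed file is intact without writing an uncompressed copy.
gzip -t sample.vcf.gz && echo "sample.vcf.gz: OK"

# Count the variant records, i.e. every line that does not start with '#'.
records=$(gzip -dc sample.vcf.gz | grep -vc '^#')
echo "records: $records"

# Skip the '##' meta lines, then show the column header and the first two records.
gzip -dc sample.vcf.gz | grep -v '^##' | head -n 3
```

Here gzip -dc writes the uncompressed text to standard output only, so no uncompressed copy is left on disk.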

Further info

Updates

None.

Last modified on August 1, 2016 at 2:16 pm by Site Admin