TEtranscripts

Overview

TEtranscripts is a software package that utilizes both unambiguously (uniquely) and ambiguously (multi-) mapped reads to perform differential enrichment analyses from high throughput sequencing experiments.

Version 2.0.5 is installed on the CSF.

Restrictions on use

There are no restrictions on accessing this software on the CSF. It is released under the GPL v3 license and all usage must adhere to that license.

Please cite your use of this software using:

Jin Y., Tam O.H., Paniagua E. and Hammell M. (2015). TEtranscripts: A package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31: 3593-3599. PubMed ID: 26206304

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load the following modulefile:

module load apps/python/tetranscripts/2.0.5           # Also loads modulefiles for
                                                      # Pysam 0.15.2 and R 3.6.1 (inc DESeq2)

Reference and Test Data

The modulefile will set the following environment variables which can be used in your jobscripts to access the reference / pre-built data and test data files. NOTE: all downloaded datafiles have been uncompressed.

$TE_DATA       set to /mnt/data-sets/tetranscripts/
$TE_TESTDATA   set to /mnt/data-sets/tetranscripts/test_data
$TE_PREBUILT   set to /mnt/data-sets/tetranscripts/Prebuilt_indices/
$TE_GTF        set to /mnt/data-sets/tetranscripts/TE_GTF/

To see which prebuilt indices, GTF files and test data files are available, run the following on the login node after loading the modulefile there:

ls $TE_PREBUILT
ls $TE_GTF
ls $TE_TESTDATA

You can then use the filenames when specifying command-line flags for TEtranscripts. For example:

TEtranscripts ... --TE $TE_GTF/ce10_rmsk_TE.gtf
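
As a minimal sketch, a jobscript could confirm that the chosen annotation file really exists under $TE_GTF before passing it to --TE. The helper name te_gtf_file is hypothetical (it is not part of TEtranscripts or the modulefile), and ce10_rmsk_TE.gtf is simply the filename used in the example above:

```shell
# Hypothetical helper: print the full path of a TE GTF file under $TE_GTF,
# or fail with a message if the variable is unset or the file is missing.
te_gtf_file() {
    local name="$1"
    if [ -z "${TE_GTF:-}" ]; then
        echo "TE_GTF is not set: has the modulefile been loaded?" >&2
        return 1
    fi
    local path="${TE_GTF%/}/$name"    # strip any trailing slash before joining
    if [ ! -f "$path" ]; then
        echo "No such TE GTF file: $path" >&2
        return 1
    fi
    echo "$path"
}

# Example use in a jobscript (after loading the modulefile):
#   TE_ANNOT=$(te_gtf_file ce10_rmsk_TE.gtf) || exit 1
#   TEtranscripts ... --TE "$TE_ANNOT"
```

Failing early like this avoids a job waiting in the queue only to stop on a mistyped filename.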

Running the application

Please do not run TEtranscripts or TEcount on the login node. Submit jobs to the compute nodes via the batch system.

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l mem512        # It is recommended to run on the high-memory nodes.
                    # This will give you 32GB memory to work with (1 core).
                    # NO -V line - we load modulefiles in the jobscript

module load apps/python/tetranscripts/2.0.5

TEtranscripts -t RNAseq1.bam RNAseq2.bam -c CtlRNAseq1.bam CtlRNAseq2.bam \
              --GTF genic-GTF-file --TE TE-GTF-file
                      #
                      # See above for environment variables that provide
                      # the location of the downloaded reference data
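
A jobscript like the one above could also check its input files before starting the analysis. The following is a sketch only: check_inputs is a hypothetical helper, not something provided by TEtranscripts or the CSF, and the filenames in the usage comment are the placeholders from the example jobscript:

```shell
# Hypothetical helper: return non-zero if any listed input file is missing
# or empty, naming each problem file on stderr.
check_inputs() {
    local status=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input file: $f" >&2
            status=1
        fi
    done
    return $status
}

# Example use in the jobscript, before the TEtranscripts line
# (list every BAM and GTF file the job needs):
#   check_inputs RNAseq1.bam RNAseq2.bam genic-GTF-file TE-GTF-file || exit 1
```

The function reports every problem file rather than stopping at the first, so a single failed job shows all the paths that need correcting.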

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
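
If you want to keep track of the submitted job, one option is to record its job ID. The sketch below assumes the batch system prints the usual Grid Engine style message, for example: Your job 123456 ("scriptname") has been submitted. The #$ directives in the jobscript suggest Grid Engine, but the exact message printed on the CSF is an assumption, and job_id_from_qsub is a hypothetical helper name. It handles only plain (non-array) submissions:

```shell
# Hypothetical helper: print the numeric job ID found in a plain
# (non-array) submission message; prints nothing if no ID is found.
job_id_from_qsub() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Example use on the login node:
#   msg=$(qsub scriptname)
#   jobid=$(job_id_from_qsub "$msg")
#   qstat -j "$jobid"     # show details of the job while it is queued or running
```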

Further info

Updates

None.

Last modified on December 10, 2019 at 2:24 pm by George Leaver