RepeatMasker

Overview

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Version 4.0.9 (patch 2) is available on the CSF3.

Restrictions on use

This software is open source.

Set up procedure

To access the software you must first load the modulefile:

module load apps/gcc/repeatmasker/4.0.9.p2

Running the application

Please do not run RepeatMasker on the login node. Submit jobs to the compute nodes via the batch system.

Parallel batch job submission

Make sure you have the modulefile loaded, then create a batch submission script, for example:

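#!/bin/bash
#$ -S /bin/bash
#$ -V                # inherit the current environment (including the loaded modulefile)
#$ -cwd              # run the job from the current directory
#$ -pe smp.pe 4      # request 4 cores in the smp.pe parallel environment

# Use half the allocated cores for -pa (see the notes below)
RepeatMasker -pa $[NSLOTS / 2] sample.fas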

Important notes:

  • By default, RepeatMasker starts 2 threads for every CPU in the node. This badly overloads the node and makes your jobs, and those of other users on the same node, run inefficiently. You must prevent this by setting -pa N, where N is the number of allocated CPUs divided by 2.
  • N must always be greater than or equal to 2, so request at least 4 cores. -pa 1 causes RepeatMasker to run as though the -pa option was not specified, starting 2 threads for each CPU.
  • The example above uses $[NSLOTS / 2], which automatically takes the number of cores specified on the smp.pe line and divides it by two. This is integer division, so an odd number of cores rounds down.
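
The -pa arithmetic described in the notes above can be sketched as a small bash helper. This is only an illustration: pa_from_slots is a hypothetical function name, not part of RepeatMasker or the CSF3 environment, and it assumes the slot count is passed in as a plain integer. It uses $(( )), the standard form of bash arithmetic, which gives the same result as $[ ] here.

```shell
#!/bin/bash
# Hypothetical helper (not part of RepeatMasker or the CSF3 setup):
# print the -pa value for a given slot count, or fail if it would be below 2.
pa_from_slots() {
    local slots=$1
    local pa=$(( slots / 2 ))   # integer division, so 5 slots gives 2
    if (( pa < 2 )); then
        echo "Need at least 4 slots for -pa >= 2 (got ${slots})" >&2
        return 1
    fi
    echo "$pa"
}
```

In a jobscript this could be used as PA=$(pa_from_slots "$NSLOTS") || exit 1, followed by RepeatMasker -pa "$PA" sample.fas, so that a job with too few cores stops early rather than falling back to the default thread count.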

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Further info

repeatmasker.org

Last modified on September 26, 2019 at 3:11 pm by Chris Grave