RepeatMasker
Overview
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.
Version 4.0.9 (patch 2) is available on the CSF3.
Restrictions on use
This software is open source.
Set up procedure
To access the software you must first load the modulefile:
module load apps/gcc/repeatmasker/4.0.9.p2
Running the application
Please do not run RepeatMasker on the login node. Jobs should be submitted to the compute nodes via batch.
Parallel batch job submission
Make sure you have the modulefile loaded, then create a batch submission script, for example:
#!/bin/bash
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -pe smp.pe 4
RepeatMasker -pa $[NSLOTS / 2] sample.fas
Important notes:
- By default, RepeatMasker will start 2 threads for every CPU in the node. This badly overloads the node and causes your jobs, and those of other users running on the same node, to run inefficiently. You must prevent this behavior by setting -pa N, where N is the number of allocated CPUs divided by 2.
- In -pa N, “N” must always be greater than or equal to 2; -pa 1 causes RepeatMasker to run as though the -pa option was not specified and to start 2 threads for each CPU.
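- In the above example we use $[NSLOTS / 2], which automatically takes the number of cores specified on the smp.pe line and divides it by two. This is an integer division, so request an even number of cores, and at least 4, to keep the result at 2 or more.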
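The notes above can be sketched as a small guard in the jobscript. This is only an illustration, not part of the CSF3 setup: the function name pa_value is hypothetical, and it uses the $(( ... )) arithmetic form, which is equivalent to $[ ... ] in bash. It halves the core count and refuses to continue if the result would be below 2, since -pa 1 behaves as though -pa was not given.

```shell
#!/bin/bash
# Hypothetical helper (not provided by RepeatMasker or the CSF3):
# derive the value for -pa from the number of allocated cores.
pa_value() {
    local slots="$1"
    # Integer division: an odd core count rounds down (5 cores gives 2).
    local pa=$(( slots / 2 ))
    # -pa must be 2 or more, so at least 4 cores are needed.
    if [ "$pa" -lt 2 ]; then
        echo "error: ${slots} core(s) gives -pa ${pa}; request at least 4 cores" >&2
        return 1
    fi
    echo "$pa"
}
```

In a jobscript this could be used as PA=$(pa_value "$NSLOTS") || exit 1, followed by RepeatMasker -pa "$PA" sample.fas, so that a job requesting too few cores stops with a message instead of overloading the node.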
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.