The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
Shellfish
Overview
Shellfish carries out a variety of tasks related to principal component analysis of genome-wide SNP data.
Version 1.3 is available.
Restrictions on use
The software is accessible to all users.
Set up procedure
Please load the shellfish modulefile:
module load apps/gcc/shellfish/1.3
Note: This module loads an additional module for python 2.7.1 so unloading it will remove python 2.7.1 from your environment.
Running the application
Shellfish must be used in batch.
Serial batch job submission
- Make sure you have loaded the module.
- Write a submission script, for example:
#!/bin/bash
## Inherit user environment from the login node
#$ -V
## Use the current directory as the working directory for SGE output and determining paths to files
#$ -cwd
## Create a directory called 'out' for the results and cd to it.
mkdir out && cd out
shellfish.py --ignore-sge --pca --numpcs 2 --maxprocs $NSLOTS --file ../input/data
shellfish.py --ignore-sge --snpload --numpcs 2 --maxprocs $NSLOTS --evecs shellfish.evecs --file ../input/data
- Submit with:
qsub scriptname
Parallel batch job submission
- Make sure you have loaded the module.
- Shellfish is not MPI-based, so you should use the smp.pe parallel environment to ensure all processes run on just one compute node.
- Write a submission script, for example:
#!/bin/bash
## Inherit user environment from the login node
#$ -V
## Use the current directory as the working directory for SGE output and determining paths to files
#$ -cwd
## Request parallel environment and a number of cores
#$ -pe smp.pe 12
## Create a directory called 'out' for the results and cd to it.
mkdir out && cd out
shellfish.py --ignore-sge --pca --numpcs 2 --maxprocs $NSLOTS --file ../input/data
shellfish.py --ignore-sge --snpload --numpcs 2 --maxprocs $NSLOTS --evecs shellfish.evecs --file ../input/data
- Submit with:
qsub scriptname
Additional notes about the example scripts
- --ignore-sge: perhaps unintuitively, this option is required on the CSF. Without it your job will either fail, or be allocated resources but do nothing until it reaches the time limit, at which point it will be automatically killed.
- The above scripts assume you have a directory called input containing the genotype files.
- The variable NSLOTS is an SGE variable which ensures maxprocs matches the number of cores requested. For a serial job you do not need to specify a pe: SGE assumes that without it you want one core and schedules the job accordingly.
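As a rough illustration of the notes above, the following bash sketch shows one way a job script could guard against a missing input directory and fall back to one core when NSLOTS is not set (for example when trying commands by hand outside SGE). The helper names resolve_maxprocs and check_input_dir are hypothetical and are not part of Shellfish or SGE; only NSLOTS and the input directory come from the examples on this page.

```shell
#!/bin/bash
# Sketch only: resolve_maxprocs and check_input_dir are hypothetical helper
# names, not part of Shellfish or SGE.

# Print the number of cores to pass to --maxprocs.
# SGE sets NSLOTS inside a job; default to 1 if it is unset or empty.
resolve_maxprocs() {
    echo "${NSLOTS:-1}"
}

# Return success if the given directory exists, otherwise warn on stderr.
check_input_dir() {
    local dir="$1"
    if [ -d "$dir" ]; then
        return 0
    fi
    echo "warning: input directory '$dir' not found" >&2
    return 1
}

# Example use: report the value that would be passed to --maxprocs.
echo "maxprocs=$(resolve_maxprocs)"
```

In a submission script, check_input_dir input could be called before the mkdir out && cd out step, so that a missing genotype directory is reported before any output directory is created.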
Further info
- From the command line, shellfish.py --help gives information about the options available.
- Shellfish website