The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
fastStructure
Overview
fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and is written in Python 2.x.
Version 1.0 is installed on the CSF.
Restrictions on use
There are no restrictions on accessing this software on the CSF. It is released under the MIT license and all use must fall within that license.
Set up procedure
To access the software you must first load the modulefile:
module load apps/gcc/python-packages/anaconda-2.5.0/faststructure/1.0
This automatically loads the Anaconda 2.5.0 modulefile, which provides Python 2.7.11.
The following python scripts are available for use:
chooseK.py distruct.py structure.py
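As a rough illustration of how these scripts fit together, the sketch below runs structure.py over a range of K values and then passes the results to chooseK.py. It assumes the `-K`, `--input` and `--output` options described in fastStructure's own usage text; check the usage text of the installed version before relying on them. The helper names `run` and `fs_scan`, the `DRY_RUN` variable and the file prefixes are hypothetical and are not part of fastStructure or the CSF. The real commands belong inside a batch job, not on the login node.

```bash
#!/bin/bash
# Sketch only: scan K = 1..MAX_K with structure.py, then summarise with chooseK.py.
# The prefixes used in the example invocation below are hypothetical.

# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fs_scan INPUT_PREFIX OUTPUT_PREFIX MAX_K
fs_scan() {
    local input=$1 output=$2 max_k=$3 k
    for ((k = 1; k <= max_k; k++)); do
        run structure.py -K "$k" --input="$input" --output="$output" || return 1
    done
    run chooseK.py --input="$output"
}

# Run only when three arguments are supplied,
# e.g.: bash scan.sh genotypes results/run 5
if [ "$#" -eq 3 ]; then
    fs_scan "$@"
fi
```

Setting DRY_RUN=1 prints the commands without running them, which gives a safe way to check the argument list before submitting the job.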
Running the application
Please do not run fastStructure on the login node. Submit jobs to the compute nodes via the batch system.
Serial batch job submission
Make sure the modulefile is loaded, then create a batch submission script, for example:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd             # Job will run from the current directory
#$ -V               # Job will inherit current environment settings
structure.py args...
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
fastStructure is serial only. If you have multiple datasets to process, submit them as a job array so that many jobs can run at the same time.
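A job array jobscript might look like the following minimal sketch. It assumes a hypothetical file datasets.txt listing one input prefix per line, a hypothetical helper dataset_for_task, and an arbitrary K=3; the `-K`, `--input` and `--output` options are taken from fastStructure's usage text and should be checked against the installed version. The batch system sets SGE_TASK_ID to a different task number for each task in the array.

```bash
#!/bin/bash
#$ -S /bin/bash
#$ -cwd             # Job will run from the current directory
#$ -V               # Job will inherit current environment settings
#$ -t 1-3           # Three tasks, one per line of datasets.txt (hypothetical file)

# Print line number TASK_ID of LIST_FILE; fail if that line is missing or empty.
dataset_for_task() {
    local list_file=$1 task_id=$2 prefix
    prefix=$(sed -n "${task_id}p" "$list_file")
    [ -n "$prefix" ] && printf '%s\n' "$prefix"
}

# SGE sets SGE_TASK_ID to the task number in an array job ("undefined" otherwise).
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    prefix=$(dataset_for_task datasets.txt "$SGE_TASK_ID") || exit 1
    # K=3 is an arbitrary illustrative value.
    structure.py -K 3 --input="$prefix" --output="${prefix}_out"
fi
```

Submit it with qsub in the same way as a serial jobscript. Each task then processes the dataset named on its own line of the list, and the range given to -t should match the number of lines in that file.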
Further info
Updates
None.