Mothur
Overview
mothur is an open-source software package for bioinformatics data processing. The package is frequently used in the analysis of DNA from uncultured microbes. mothur can process data generated by several DNA sequencing methods, including 454 pyrosequencing, Illumina HiSeq and MiSeq, Sanger, PacBio, and IonTorrent.
Restrictions on use
There are no restrictions on accessing this software.
Set up procedure
To access the software you must first load the modulefile:
module load apps/binapps/mothur/1.42.3
Running the application
Please do not run mothur on the login node. Jobs should be run interactively on the backend nodes (via qrsh) or submitted to the compute nodes via batch.
Batch usage
The majority of jobs should be run in batch.
Various mothur commands can take advantage of multiple processors. To use multiple processors, add the processors option to a command. Once a mothur command has set the processors option, the setting is persistent, so subsequent multi-processor commands inherit the original value. This applies to both single-node (smp) and multi-node (mpi) multi-core jobs.
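The persistence of the processors option can be illustrated with a minimal sketch. It writes a two-command file of mothur commands from the shell, setting processors only on the first command. The file name processors_demo.batch is hypothetical, and the two commands are borrowed from the examples elsewhere on this page rather than forming a complete workflow.

```shell
#!/bin/bash
# Sketch: write a small mothur command file in which processors is set once.
# The file name processors_demo.batch is hypothetical.
batch_file="processors_demo.batch"

cat > "$batch_file" <<'EOF'
# The option is given once, on the first multi-processor command
make.contigs(file=stability.files, processors=8)
# Nothing is specified here: the value set above is expected to carry over
summary.seqs(fasta=stability.trim.contigs.fasta)
EOF

# Count how many commands set the option explicitly
grep -c 'processors=' "$batch_file"
```

The final grep only counts explicit settings in the file; it does not run mothur. When mothur runs a file like this, a line such as "Using 8 processors." in the output of the second command would confirm that the value was inherited.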
In batch mode you can supply a file with a list of mothur commands that you want mothur to run. For example the batch file could look like:
# Create mothur input file using txt editor of your choice
$ vim stability.batch

# Enter the following (example usage)
pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F)
system(mv silva.bacteria.pcr.fasta silva.v4.fasta)
# change the name of the file from stability.files to whatever suits your study
make.contigs(file=stability.files, processors=8)
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275)
unique.seqs()
count.seqs(name=current, group=current)
align.seqs(fasta=current, reference=silva.v4.fasta)
screen.seqs(fasta=current, count=current, start=1968, end=11550, maxhomop=8)
filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)
pre.cluster(fasta=current, count=current, diffs=2)
chimera.uchime(fasta=current, count=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)
classify.seqs(fasta=current, count=current, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
remove.groups(count=current, fasta=current, taxonomy=current, groups=Mock)
cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)
make.shared(list=current, count=current, label=0.03)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)
phylotype(taxonomy=current)
make.shared(list=current, count=current, label=1)
classify.otu(list=current, count=current, taxonomy=current, label=1)
Submit this job using a suitable jobscript. Please note: in the above example the mothur input file (stability.batch) requires 8 processors, so the jobscript must also request at least 8 cores:
#!/bin/bash --login
#$ -cwd
#$ -pe smp.pe 8

# Load any required modulefiles
module load apps/binapps/mothur/1.42.3

mothur stability.batch
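Because the processors value in the mothur input file and the core count in the jobscript are set in two different places, they can drift apart. The following is a minimal sketch of a check that could be run in the jobscript before mothur starts. It assumes the scheduler exports the granted core count in the NSLOTS environment variable, as Grid Engine normally does for parallel-environment jobs. The function names max_processors and check_cores are hypothetical helpers, not part of mothur or the batch system.

```shell
#!/bin/bash
# Sketch: compare the largest processors=N value in a mothur input file
# with the number of cores granted to the job (assumed to be in NSLOTS).

# Print the largest N found in any processors=N option (0 if none is found).
# Lines starting with # are skipped so that comments are not counted.
max_processors() {
    local n
    n=$(grep -v '^[[:space:]]*#' "$1" \
        | grep -o 'processors=[0-9][0-9]*' \
        | cut -d= -f2 \
        | sort -n \
        | tail -n 1)
    echo "${n:-0}"
}

# Return 0 if the file asks for no more cores than the job has, 1 otherwise.
check_cores() {
    local requested granted
    requested=$(max_processors "$1")
    granted="${NSLOTS:-1}"   # fall back to 1 core if NSLOTS is unset
    if [ "$requested" -gt "$granted" ]; then
        echo "WARNING: $1 asks for $requested processors but the job has $granted" >&2
        return 1
    fi
    echo "OK: $1 asks for $requested processors, job has $granted"
}

# Demo with a throwaway input file containing one line from the example above
demo=$(mktemp)
printf '%s\n' 'make.contigs(file=stability.files, processors=8)' > "$demo"
NSLOTS=8 check_cores "$demo"
rm -f "$demo"
```

In a real jobscript the demo lines would be replaced by something like check_cores stability.batch placed before the mothur command, so that a mismatch is reported before any compute time is used.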
Submit the jobscript using
qsub jobscript
where jobscript is the name of your jobscript file (not your mothur input file!).
Interactive usage
Interactive jobs should be primarily used for debugging purposes.
To schedule a multi-core interactive job run the following command on the login node:
qrsh -l short -pe smp.pe 2    # Number of required cores
You will be logged in to a compute node and will have access to 2 CPU cores (in this example) and 4GB per core. Ensure that you use only the number of cores reserved for you.
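As a rough aid for staying within the reservation, the sketch below summarises the resources of an interactive session. It assumes the scheduler exports the reserved core count as NSLOTS inside the session and uses the 4GB per core figure quoted above. The function name session_summary is hypothetical.

```shell
#!/bin/bash
# Sketch: summarise the resources of an interactive session.
# Assumes the scheduler exports NSLOTS; falls back to 1 core if it is unset.
GB_PER_CORE=4   # per-core memory figure quoted in the text

session_summary() {
    local cores="${NSLOTS:-1}"
    echo "cores=${cores} memory=$(( cores * GB_PER_CORE ))GB"
}

session_summary
```

For the 2-core example this works out to 2 x 4GB = 8GB in total, and the reported core count is the largest value that should be passed to the processors option in that session.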
# Once logged onto a node you run mothur interactively
$ module load apps/binapps/mothur/1.42.3
$ mothur

Linux version

Using ReadLine,Boost,HDF5

mothur v.1.42.3
Last updated: 8/20/19
by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://forum.mothur.org

Type 'quit()' to exit program

[NOTE]: Setting random seed to 19760620.

Interactive Mode

mothur > make.contigs(file=stability.files, processors=2)
... omitted for clarity ...

mothur > summary.seqs(fasta=stability.trim.contigs.fasta)

Using 2 processors.

              Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:      1      248  248     0       3        1
2.5%-tile:    1      252  252     0       3        3810
25%-tile:     1      252  252     0       4        38091
Median:       1      252  252     0       4        76181
75%-tile:     1      253  253     0       5        114271
97.5%-tile:   1      253  253     6       6        148552
Maximum:      1      502  502     249     243      152360
Mean:         1      252  252     0       4
# of Seqs:    152360

It took 2 secs to summarize 152360 sequences.

Output File Names:
stability.trim.contigs.summary

mothur > quit()