Canu
Overview
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).
The application is written in C++ and can be used in batch or interactively (via the qrsh command).
Versions 2.1.1, 2.0 and 1.8 are installed on the CSF.
Restrictions on use
There are no restrictions on using this software on the CSF. The application is released under the GPLv2 license.
Set up procedure
To access the software you must load ONE of the modulefiles:
module load apps/binapps/canu/2.1.1
module load apps/binapps/canu/2.0
module load apps/canu/1.8
We recommend that you load the modulefile in your jobscript – see examples below.
Running the application
Please do not run canu on the login node. Submit jobs to the compute nodes via the batch system, or run interactively via the qrsh command. Note that canu can submit batch jobs on your behalf, but this does not work on the CSF, because canu would try to submit jobs from within other jobs, which the CSF does not support. You must therefore run canu from a jobscript with the useGrid=false option and submit that job to the batch system in the usual way.
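A wrapper script can check that it is running inside a job before starting canu. The following is a minimal sketch, not part of canu or of the CSF configuration. It assumes the batch system exports the JOB_ID environment variable inside jobs (as Grid Engine normally does), and the function name in_batch_job is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when run inside a scheduler job.
# Assumption: the batch system exports JOB_ID inside jobs
# (standard Grid Engine behaviour); this is not a canu feature.
in_batch_job() {
    [ -n "${JOB_ID:-}" ]
}

if in_batch_job; then
    echo "Running inside job ${JOB_ID}: OK to start canu with useGrid=false"
else
    echo "Not inside a batch job: submit a jobscript with qsub instead" >&2
fi
```

The check only looks at the environment, so it is a convenience guard rather than a guarantee.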
Serial batch job submission
Example jobscript
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory

# Load the software
module load apps/binapps/canu/2.0

# $NSLOTS is automatically set to 1 in a serial job
# You MUST set the maxThreads, mhapThreads and maxMemory options
# to tell canu how much resource it can use. We assume 4GB per core.
# The useGrid=false option is needed to run from a batch job.
canu -d run1 -p godzilla genomeSize=1g maxThreads=$NSLOTS mhapThreads=$NSLOTS maxMemory=$((4*NSLOTS)) \
     useGrid=false -nanopore-raw reads/*.fasta.gz
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
Example jobscript
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores (2-32 permitted)

# Load the software
module load apps/binapps/canu/2.0

# $NSLOTS is automatically set to the number given on the -pe line above.
# You MUST set the maxThreads, mhapThreads and maxMemory options
# to tell canu how much resource it can use. We assume 4GB per core.
#   maxMemory:   ensures 4GB/core is used.
#   mhapThreads: if your pipeline uses the MHAP algorithm, this ensures
#                Java will use the correct number of cores.
# The useGrid=false option is needed to run from a batch job.
canu -d run1 -p godzilla genomeSize=1g maxThreads=$NSLOTS mhapThreads=$NSLOTS maxMemory=$((4*NSLOTS)) \
     useGrid=false -nanopore-raw reads/*.fasta.gz
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
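The resource options in the jobscripts above all follow from the core count. The following is a minimal sketch of that arithmetic, assuming 4GB per core as stated in the jobscript comments. The function name canu_resource_opts is hypothetical and the script only prints the options; it does not run canu.

```shell
#!/bin/bash
# Hypothetical helper: build the canu resource options for a given core count.
# Assumption: 4 GB of memory per core, as in the example jobscripts.
canu_resource_opts() {
    local cores="${1:-1}"
    local gb_per_core=4
    echo "maxThreads=${cores} mhapThreads=${cores} maxMemory=$((gb_per_core * cores))"
}

# NSLOTS is set by the batch system; fall back to 1 core if it is unset.
opts=$(canu_resource_opts "${NSLOTS:-1}")
echo "canu would be run with: ${opts} useGrid=false"
```

For an 8-core job this gives maxThreads=8, mhapThreads=8 and maxMemory=32, matching what the parallel jobscript passes to canu.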
Further info
- Documentation is available on the canu website.
Updates
None.