Canu

Overview

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).

The application is written in C++ and can be used in batch or interactively (via the qrsh command).

Versions 2.1.1, 2.0 and 1.8 are installed on the CSF.

Restrictions on use

There are no restrictions on using this software on the CSF. The application is released under the GPLv2 license.

Set up procedure

To access the software you must load ONE of the modulefiles:

module load apps/binapps/canu/2.1.1
module load apps/binapps/canu/2.0
module load apps/canu/1.8

We recommend that you load the modulefile in your jobscript – see examples below.

Running the application

Please do not run canu on the login node. Jobs should be submitted to the compute nodes via batch or run interactively via the qrsh command.

Note that canu can submit batch jobs on your behalf. However, this does not work correctly on the CSF, because canu would try to submit jobs from within other jobs, which is not supported. You must therefore run canu from a jobscript with the useGrid=false option and submit the job to the batch system in the usual way.
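As a rough illustration of the rule above, a jobscript could check that it really is running under the batch system before starting canu. This is only a sketch: the in_batch_job helper is hypothetical, and it assumes the scheduler exports a JOB_ID variable to running jobs (SGE-style schedulers do this, but check your own environment).

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when running under the batch system.
# Assumption: the scheduler exports JOB_ID to batch and qrsh jobs.
in_batch_job() {
    [ -n "${JOB_ID:-}" ]
}

if in_batch_job; then
    echo "Running as job $JOB_ID; useGrid=false stops canu submitting further jobs."
else
    echo "Not inside a batch job: submit a jobscript with qsub instead." >&2
fi
```

A check like this only guards against accidental runs on the login node; it does not replace the useGrid=false option.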

Serial batch job submission

Example jobscript

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory

# Load the software

module load apps/binapps/canu/2.0

# $NSLOTS is automatically set to 1 in a serial job
# You MUST set the maxThreads, mhapThreads and maxMemory options
# to tell canu how much resource it can use. We assume 4GB per core.
# The useGrid=false option is needed to run from a batch job.

canu -d run1 -p godzilla genomeSize=1g maxThreads=$NSLOTS mhapThreads=$NSLOTS maxMemory=$((4*NSLOTS)) \
    useGrid=false -nanopore-raw reads/*.fasta.gz

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

Example jobscript

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores (2-32 permitted)

# Load the software

module load apps/binapps/canu/2.0

# $NSLOTS is automatically set to the number given on the -pe line above.
# You MUST set the maxThreads, mhapThreads and maxMemory options
# to tell canu how much resource it can use. We assume 4GB per core.
# The useGrid=false option is needed to run from a batch job.

# mhapThreads=$NSLOTS: if your pipeline uses the MHAP algorithm, this
#   ensures Java will use the correct number of cores.
# maxMemory=$((4*NSLOTS)): ensures 4GB/core is used.

canu -d run1 -p godzilla genomeSize=1g maxThreads=$NSLOTS mhapThreads=$NSLOTS maxMemory=$((4*NSLOTS)) \
    useGrid=false -nanopore-raw reads/*.fasta.gz

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
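The resource options in the jobscripts all derive from $NSLOTS. The sketch below shows that arithmetic on its own, so you can check the values canu would receive for a given core count. The canu_resource_opts function and the GB_PER_CORE variable are hypothetical names used for illustration; the 4GB figure is the per-core assumption stated in the jobscript comments.

```shell
#!/bin/bash
# Hypothetical helper: print the canu resource options for a given core count.
# GB_PER_CORE=4 mirrors the "4GB per core" assumption in the jobscript comments.
GB_PER_CORE=4

canu_resource_opts() {
    local slots=${1:-1}    # default to 1 core, as in a serial job
    echo "maxThreads=$slots mhapThreads=$slots maxMemory=$((GB_PER_CORE * slots))"
}

# NSLOTS is set by the batch system; fall back to 1 when testing by hand.
canu_resource_opts "${NSLOTS:-1}"
```

For a job submitted with 8 cores, this prints maxThreads=8 mhapThreads=8 maxMemory=32.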

Further info

  • Documentation is available on the canu website.

Updates

None.

Last modified on April 7, 2021 at 11:55 am by George Leaver