MEGAHIT

Overview

MEGAHIT is an ultra-fast, single-node solution for assembling large and complex metagenomics datasets using a succinct de Bruijn graph.

Version 1.1.4 is installed on the CSF.

Restrictions on use

There are no restrictions on accessing the software on the CSF. It is released under the GNU GPLv3 license and all usage must adhere to this license.

If you use MEGAHIT in your work, please cite it by following the citation instructions.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load the following modulefile:

module load apps/gcc/megahit/1.1.4          # Provides CPU and GPU megahit tools

Running the application

Please do not run megahit on the login node. Jobs should be submitted to the compute nodes via the batch system.

For a complete list of megahit options, run:

megahit -h
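megahit -h lists many options, but most jobs simply pass their read files with the paired-end input options -1 and -2 (or -r for single-end reads). A short check at the top of the jobscript can report missing or empty read files (for example, after a failed transfer) with a clear message before megahit is launched. This is a minimal sketch: check_inputs is a hypothetical helper, not part of megahit.

```shell
#!/bin/bash
# Sketch: check that every megahit input file exists and is non-empty
# before the (possibly long) assembly starts. check_inputs is a
# hypothetical helper, not part of megahit.
check_inputs() {
    status=0
    for input_file in "$@"; do
        # -s is true only for a file that exists and has a size > 0
        if [ ! -s "$input_file" ]; then
            echo "Missing or empty input file: $input_file" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, a jobscript could run check_inputs reads_1.fq.gz reads_2.fq.gz || exit 1 followed by megahit -t $NSLOTS -1 reads_1.fq.gz -2 reads_2.fq.gz -o out_dir, where the file names are hypothetical.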

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# We recommend loading the modulefile inside the jobscript
module load apps/gcc/megahit/1.1.4

# $NSLOTS is automatically set to the number of cores (1 for a serial job)
megahit -t $NSLOTS args... -o out_dir
                                 #
                                 # If an output directory name is not given, megahit will
                                 # use 'megahit_out'. You should use a unique name if
                                 # running more than one job in the same directory; you
                                 # can add $JOB_ID (the current job's unique number) to
                                 # the name. For example: my_output_dir.$JOB_ID
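The comment in the jobscript suggests adding $JOB_ID to the output directory name. A minimal sketch of that idea, assuming the jobscript runs under bash (the code is also POSIX sh compatible): the hypothetical helper megahit_outdir appends $JOB_ID to a base name and stops if a directory of that name already exists. The fallback suffix "manual", used when JOB_ID is not set, is our own choice.

```shell
#!/bin/bash
# Sketch: derive a per-job output directory name from $JOB_ID (set by the
# batch system) and refuse to continue if it already exists.
# megahit_outdir is a hypothetical helper; "manual" is an assumed fallback
# for when the script is run outside the batch system and JOB_ID is unset.
megahit_outdir() {
    outdir="$1.${JOB_ID:-manual}"
    if [ -e "$outdir" ]; then
        echo "Output directory $outdir already exists; choose another name" >&2
        return 1
    fi
    # Print the name so the caller can capture it with $(...)
    printf '%s\n' "$outdir"
}
```

In the jobscript this would be used as OUTDIR=$(megahit_outdir my_output_dir) || exit 1 before running megahit -t $NSLOTS args... -o "$OUTDIR". The helper deliberately does not create the directory: megahit creates the output directory itself and, as we understand it, stops if that directory already exists.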

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 16    # Number of cores (can be 2--32)

# We recommend loading the modulefile inside the jobscript
module load apps/gcc/megahit/1.1.4

# $NSLOTS is automatically set to the number of cores requested above
megahit -t $NSLOTS args... -o out_dir
                                 #
                                 # If an output directory name is not given, megahit will
                                 # use 'megahit_out'. You should use a unique name if
                                 # running more than one job in the same directory; you
                                 # can add $JOB_ID (the current job's unique number) to
                                 # the name. For example: my_output_dir.$JOB_ID
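Once a longer parallel run finishes, it can be useful to check the result from within the same jobscript. The sketch below assumes megahit's usual final assembly file, final.contigs.fa, inside the -o directory; count_contigs is a hypothetical helper that prints the number of contigs (FASTA header lines) or fails if the file is missing or empty.

```shell
#!/bin/bash
# Sketch: after megahit finishes, confirm the assembly file exists and
# report how many contigs it holds. Assumes megahit's usual output file
# name, final.contigs.fa, inside the -o directory; count_contigs is a
# hypothetical helper.
count_contigs() {
    contigs="$1/final.contigs.fa"
    if [ ! -s "$contigs" ]; then
        echo "No assembly found at $contigs" >&2
        return 1
    fi
    # Each FASTA record starts with a '>' header line; grep -c prints the
    # count (and exits non-zero if no headers are found)
    grep -c '^>' "$contigs"
}
```

Adding count_contigs out_dir || exit 1 after the megahit line puts the contig count in the job's standard output file.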

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

GPU batch job submission

Before you can run megahit on the GPUs, you must request to be added to the group that is permitted to use them.

Only the megahit_sdbg_build_gpu tool supports GPU usage, but you run it through the main megahit command (adding the --use-gpu flag) rather than calling it directly. Create a jobscript similar to:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 2     # Number of cores (can be 2--32). You must request at least
                    # 2 cores when running a GPU job (megahit will use at least 2 cores)
#$ -l v100          # Request an NVIDIA v100 GPU (only works if you have been permitted to use GPUs)

# We recommend loading the modulefile inside the jobscript (it will also load the CUDA modulefile)
module load apps/gcc/megahit/1.1.4

# $NSLOTS is automatically set to the number of cores requested above
megahit -t $NSLOTS args... -o out_dir --use-gpu
                                 #
                                 # If an output directory name is not given, megahit will
                                 # use 'megahit_out'. You should use a unique name if
                                 # running more than one job in the same directory; you
                                 # can add $JOB_ID (the current job's unique number) to
                                 # the name. For example: my_output_dir.$JOB_ID
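Because a GPU job must request at least 2 cores, a jobscript can check $NSLOTS before starting megahit and stop with a clear message if too few cores were requested. This is a sketch only: check_gpu_slots is a hypothetical helper, and it treats an unset NSLOTS as 1 core.

```shell
#!/bin/bash
# Sketch: stop early if a GPU job was submitted with fewer than 2 cores,
# since megahit uses at least 2 cores when running on a GPU.
# check_gpu_slots is a hypothetical helper; NSLOTS is set by the batch
# system (treated as 1 if unset).
check_gpu_slots() {
    slots="${NSLOTS:-1}"
    if [ "$slots" -lt 2 ]; then
        echo "GPU megahit jobs need at least 2 cores (got $slots); use -pe smp.pe 2 or more" >&2
        return 1
    fi
}
```

Calling check_gpu_slots || exit 1 after the module load line makes a mis-sized job end immediately instead of running with fewer cores than megahit expects.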

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Further info

Updates

None.

Last modified on April 8, 2019 at 4:08 pm by George Leaver