Medaka

Overview

Medaka is an open-source application from Oxford Nanopore for creating a consensus sequence from nanopore sequencing data. It performs this task by applying neural networks to a pileup of individual sequencing reads aligned against a draft assembly.

Version 0.9.2 is installed on the CSF in both CPU and GPU builds.

Restrictions on use

There are no restrictions on accessing the software on the CSF. It is released under the Mozilla Public License 2.0 and all usage must adhere to that license.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

# Load one of the following modulefiles
module load apps/python/medaka/0.9.2           # CPU only
module load apps/python/medaka/0.9.2-gpu       # GPU (uses GPU-enabled TensorFlow)

The above modulefiles will load any required dependency modulefiles. For the CPU version this includes the modulefiles for tabix, samtools and minimap2. The GPU version has been compiled from source, so these tools are built as part of the installation process; the required CUDA modulefiles will also be loaded for the GPU version.
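As a quick sanity check after loading a modulefile, you can confirm that the dependency tools resolve on your PATH. A minimal sketch (the reported paths will vary with the installation; the `deps_report` variable name is our own):

```shell
# Report where each expected dependency resolves on the PATH
deps_report=$(
  for tool in tabix samtools minimap2; do
    path=$(command -v "$tool" || echo "NOT FOUND - is the modulefile loaded?")
    echo "$tool: $path"
  done
)
echo "$deps_report"
```

If any tool reports NOT FOUND, re-check that the medaka modulefile loaded without errors before submitting a job.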

Running the application

Please do not run Medaka on the login node. Jobs should be submitted to the compute nodes via the batch system.

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the required CPU version
module load apps/python/medaka/0.9.2

# Note that $NSLOTS is set to the number of cores: 1 for a serial job
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
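Note that `$NSLOTS` is only set by the batch system inside a running job; if you adapt these commands for use outside a jobscript it will be unset. A minimal sketch of guarding against this with a default (the `THREADS` variable name is our own):

```shell
# $NSLOTS is set by the batch system inside a job; fall back to 1 thread elsewhere
THREADS=${NSLOTS:-1}
echo "Running medaka with ${THREADS} thread(s)"
```

The `-t $NSLOTS` flag in the jobscripts can then be written as `-t $THREADS` if you reuse the same script interactively.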

Parallel batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores

# Load the required CPU version
module load apps/python/medaka/0.9.2

# Note that $NSLOTS is set to the number of cores requested above
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

GPU batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Run on a single Nvidia V100 GPU
#$ -pe smp.pe 8     # Number of cores (can use up to 8 cores per GPU)

# Load the required GPU version
module load apps/python/medaka/0.9.2-gpu

# Note that $NSLOTS is set to the number of cores requested above
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high

The Medaka website advises that the following setting may be required when running on the GPU:

export TF_FORCE_GPU_ALLOW_GROWTH=true

If your GPU jobs fail with GPU-memory errors, try adding the above line to your jobscript.
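For example, the export line goes before the medaka command in the GPU jobscript. A sketch based on the jobscript above (not runnable outside the batch system):

```shell
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Run on a single Nvidia V100 GPU
#$ -pe smp.pe 8     # Number of cores

# Load the required GPU version
module load apps/python/medaka/0.9.2-gpu

# Ask TensorFlow to allocate GPU memory incrementally rather than
# grabbing it all up front (may avoid GPU-memory errors)
export TF_FORCE_GPU_ALLOW_GROWTH=true

# $NSLOTS is set to the number of cores requested above
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high
```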

Further info

Updates

None.

Last modified on December 9, 2019 at 4:32 pm by George Leaver