Medaka
Overview
Medaka is an open-source tool from Oxford Nanopore for creating a consensus sequence from nanopore sequencing data. It does this by applying neural networks to a pileup of individual sequencing reads aligned against a draft assembly.
Version 0.9.2 is installed on the CSF in both CPU and GPU builds.
Restrictions on use
There are no restrictions on accessing the software on the CSF. It is released under the Mozilla Public License 2.0 and all usage must adhere to that license.
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscripts below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings by adding the #$ -V option to your jobscript.
Load one of the following modulefiles:
module load apps/python/medaka/0.9.2        # CPU only
module load apps/python/medaka/0.9.2-gpu    # GPU (uses GPU-enabled TensorFlow)
The above modulefiles will load any required dependency modulefiles. For the CPU version, this includes the modulefiles for tabix, samtools and minimap2. The GPU version was compiled from source, so these tools were built as part of its installation rather than loaded from separate modulefiles. The GPU version loads the CUDA modulefiles instead.
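After loading a modulefile, a quick check in the jobscript can confirm that the expected commands are on your PATH before Medaka starts. The following is a minimal sketch. The function name check_tools is hypothetical, and the command names you pass to it are assumptions based on the tools listed above. The GPU build bundles its tools differently, so adjust the list for that version.

```shell
#!/bin/bash
# Hypothetical helper: report any of the named commands that are not on $PATH.
# Returns 0 if every command is found, 1 if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "Not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after module load apps/python/medaka/0.9.2 the jobscript could run: check_tools medaka_consensus samtools minimap2 tabix || exit 1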
Running the application
Please do not run Medaka on the login node. Submit jobs to the compute nodes via the batch system.
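You can guard against accidentally running a jobscript directly on the login node by checking for the $NSLOTS variable, which the batch system sets inside a job. This is a minimal sketch that assumes $NSLOTS is unset in an ordinary login-node shell. The function name require_batch_job is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: refuse to continue unless running inside a batch job.
# Assumes the batch system sets $NSLOTS for every job (1 for a serial job)
# and that it is not set in an interactive login-node shell.
require_batch_job() {
  if [ -z "${NSLOTS:-}" ]; then
    echo "NSLOTS is not set: submit this script with qsub instead of running it directly." >&2
    return 1
  fi
  return 0
}
```

Placing require_batch_job || exit 1 before the medaka_consensus line stops the script early if it is run outside the batch system.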
Serial batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the required CPU version
module load apps/python/medaka/0.9.2

# Note that $NSLOTS is set to the number of cores: 1 for a serial job
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
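Once the job finishes, it can be useful to check that a consensus was actually written. The sketch below assumes that medaka_consensus writes its result to a file named consensus.fasta inside the -o directory (resultsdir in the jobscript above), so please check the Medaka documentation for your version. The function name count_consensus is hypothetical.

```shell
#!/bin/bash
# Hypothetical post-run check: print how many sequences are in the consensus
# FASTA inside the given output directory. Fails if the file is missing or empty.
# Assumes the output file is named consensus.fasta.
count_consensus() {
  local fasta="$1/consensus.fasta"
  if [ ! -s "$fasta" ]; then
    echo "No consensus found at $fasta" >&2
    return 1
  fi
  # Each FASTA record starts with a header line beginning with '>'
  grep -c '^>' "$fasta"
}
```

Adding count_consensus resultsdir as the last line of the jobscript prints the number of consensus sequences to the job's output file.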
Parallel batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Number of cores

# Load the required CPU version
module load apps/python/medaka/0.9.2

# Note that $NSLOTS is set to the number of cores requested above
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
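The jobscript passes $NSLOTS to medaka_consensus -t. Printing it at the start of the job therefore records in the job's output file how many cores Medaka was given, which helps keep a full record of how the job was run. This is a minimal sketch. The name log_job_info is hypothetical, and $JOB_ID is assumed to be set by the batch system, as it is in Grid Engine.

```shell
#!/bin/bash
# Hypothetical helper: print job details to the job's output file.
# $JOB_ID and $NSLOTS are assumed to be set by the batch system;
# "unknown" is printed for anything that is not set.
log_job_info() {
  echo "Job ID: ${JOB_ID:-unknown}"
  echo "Host:   ${HOSTNAME:-unknown}"
  echo "Cores:  ${NSLOTS:-unknown}"
  echo "Start:  $(date)"
}
```

Calling log_job_info just before the medaka_consensus line lets you compare the reported core count with the value requested via -pe smp.pe.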
GPU batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Run on a single NVIDIA v100 GPU
#$ -pe smp.pe 8     # Number of cores (can use up to 8 cores per GPU)

# Load the required GPU version
module load apps/python/medaka/0.9.2-gpu

# Note that $NSLOTS is set to the number of cores requested above
medaka_consensus -i basecalls.fa -d assm_final.fa -o resultsdir -t $NSLOTS -m r941_min_high
The Medaka documentation advises that the following setting may be required when running on the GPU:
export TF_FORCE_GPU_ALLOW_GROWTH=true
If your GPU jobs fail with GPU-memory errors, try adding the above line to your jobscript before the medaka_consensus line.
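Before investigating GPU-memory errors, it may help to confirm that the job can see the GPU it was allocated. The sketch below assumes that the batch system sets CUDA_VISIBLE_DEVICES for GPU jobs; this is common for GPU schedulers, but check the CSF GPU documentation. The function name report_gpu is hypothetical.

```shell
#!/bin/bash
# Hypothetical check for the GPU jobscript: report which GPU(s) the job may use
# and whether TensorFlow's memory-growth setting is on.
# Assumes the batch system sets CUDA_VISIBLE_DEVICES for GPU jobs.
report_gpu() {
  if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "CUDA_VISIBLE_DEVICES is not set: no GPU appears to be allocated." >&2
    return 1
  fi
  echo "GPU(s) visible to this job: $CUDA_VISIBLE_DEVICES"
  echo "TF_FORCE_GPU_ALLOW_GROWTH=${TF_FORCE_GPU_ALLOW_GROWTH:-not set}"
}
```

Running report_gpu || exit 1 before medaka_consensus writes both values to the job's output file, which can help when diagnosing a failed GPU job.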
Further info
Updates
None.