bwa-mem2

Overview

bwa-mem2 is the next version of the bwa-mem algorithm in bwa (a software package for mapping DNA sequences against a large reference genome, such as the human genome). It produces alignments identical to bwa and is ~1.3-3.1x faster, depending on the use case, the dataset and the machine it runs on.

Version 2.2.1 is installed on the CSF. There are two different binary executables: one runs only on Intel CPUs (-p multicore_small) and the other runs only on AMD CPUs (-p multicore).

Restrictions on use

There are no restrictions on accessing the software on the CSF. It is distributed under the open source MIT License, and all usage must adhere to that license.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

module load apps/binapps/bwa-mem2/2.2.1   #USE THIS VERSION IF LOOKING TO RUN ON INTEL NODES (-p multicore_small) 
module load apps/gcc/bwa-mem2/2.2.1       #USE THIS VERSION IF LOOKING TO RUN ON AMD NODES (-p multicore)

Running the application

Please do not run bwa-mem2 on the login node. Jobs should be submitted to the compute nodes via batch.

You may run the following to obtain help on the command-line flags:

bwa-mem2

Indexing

# Indexing the reference sequence (Requires 28N GB memory where N is the size of the reference sequence).

bwa-mem2 index [-p prefix] in.fasta
# Where in.fasta is the path to reference sequence fasta file and prefix is the prefix of the names of the files that store the resultant index. Default is in.fasta.
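
The 28N GB rule above can be turned into a rough estimate before requesting memory for an indexing job. The sketch below is not part of bwa-mem2: the function names are hypothetical, and it assumes N can be approximated by the FASTA file size in GB (roughly one byte per base, so header lines and line breaks make it a slight overestimate).

```shell
# Hypothetical helpers, not part of bwa-mem2.
# Assumption: N (reference size) is approximated by the FASTA size in GB.

# Print 28 * bytes / 1e9, rounded up to a whole number of GB.
index_mem_gb_from_bytes() {
    awk -v b="$1" 'BEGIN {
        gb = 28 * b / 1e9
        whole = int(gb)
        if (whole < gb) whole++
        printf "%d\n", whole
    }'
}

# Same estimate, taking the path to a FASTA file.
index_mem_gb() {
    local bytes
    bytes=$(wc -c < "$1") || return 1
    index_mem_gb_from_bytes "$bytes"
}
```

For example, index_mem_gb in.fasta prints the estimated figure in GB. Treat the result as a starting point for the memory request rather than a guarantee.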

Mapping

# Run "bwa-mem2 mem" to get all options

bwa-mem2 mem -t $SLURM_NTASKS prefix reads.fq/fa > out.sam
# Where prefix is the prefix specified when creating the index or the path to the reference fasta file in case no prefix was provided.
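
The -t value above relies on $SLURM_NTASKS being set by the batch system. As a small sketch (the function name bwa_threads is hypothetical), the thread count can fall back to 1 when the variable is unset, for example in a serial job:

```shell
# Hypothetical helper: number of threads to pass to "bwa-mem2 mem -t".
# Uses $SLURM_NTASKS when Slurm has set it, otherwise 1.
bwa_threads() {
    echo "${SLURM_NTASKS:-1}"
}

# Usage inside a jobscript (prefix and reads.fq are placeholders):
#   bwa-mem2 mem -t "$(bwa_threads)" prefix reads.fq > out.sam
```

This keeps the same mapping command usable in both serial and parallel jobscripts.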

Serial batch job submission

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#SBATCH -p serial  # Partition is required. Runs on Intel hardware.
#SBATCH -t 4-0     # Wallclock limit (days-hours). Required!
                   # Max permitted is 7 days (7-0).

# Choose the version compiled only for Intel CPUs
module purge
module load apps/binapps/bwa-mem2/2.2.1

bwa-mem2 mem prefix reads.fq/fa > out.sam

# Where prefix is the prefix specified when creating the index or the path to the reference fasta file in case no prefix was provided.
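
Because the mapping step depends on an index built earlier, it can help to check for the index files before a long job starts. The following is a minimal sketch: the function name is hypothetical, and the list of suffixes is an assumption about what bwa-mem2 index writes in version 2.2.1, so compare it with the files your own indexing run produced.

```shell
# Hypothetical helper: succeed only if every expected index file exists.
# Assumption: the index consists of these five suffixes; adjust if yours differs.
have_bwa_mem2_index() {
    local prefix="$1" suffix
    for suffix in 0123 amb ann bwt.2bit.64 pac; do
        if [ ! -f "${prefix}.${suffix}" ]; then
            echo "Missing index file: ${prefix}.${suffix}" >&2
            return 1
        fi
    done
}

# Usage inside a jobscript (prefix is a placeholder):
#   have_bwa_mem2_index prefix || exit 1
```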

Submit the jobscript using:

sbatch scriptname

where scriptname is the name of your jobscript.

If you need more RAM (memory) to complete the analysis successfully (and you may well do), please add the flags described on the high-memory jobs page.

Parallel batch job submission