Nextflow

Overview

Nextflow is a scientific workflow system for creating scalable, portable, and reproducible workflows. It is based on the dataflow programming model, which greatly simplifies the writing of parallel and distributed pipelines, allowing you to focus on the flow of data and computation.

It consists of a Domain Specific Language (DSL), currently called Nextflow DSL2, based on Apache Groovy. It runs on the Java Virtual Machine (JVM).

You can either write your own workflow pipeline using the Nextflow DSL2, or use one of the pre-existing pipelines developed and published by the community. These are hosted in community repositories such as the nf-core project, or in vendor-specific repositories such as Oxford Nanopore EPI2ME.

Restrictions on use

Nextflow is released under the open source Apache 2.0 license and can be accessed by all CSF users. Your usage must still adhere to the license terms, and publicly available pipelines may have different licenses: for example, nf-core uses the standard MIT License, but Oxford Nanopore (EPI2ME) applies its own Public License.

Set up procedure

We have created a module which sets the required environment variables, loads the Java virtual machine, and provides a Nextflow profile that is compatible with the CSF3 Slurm partitions.

To load the module:

module load apps/binapps/nextflow/25.10.4        # also loads Java OpenJDK 25.0.1

A pipeline you want to run might require a specific version of Nextflow. You generally do not need to request a new Nextflow module to achieve this; see below for details.

Running Nextflow

Nextflow works a bit differently from other software on the CSF. Please read and understand the information below before raising support requests for issues related to Nextflow.

When you run Nextflow, it launches a lightweight main task, which in turn submits a series of sub-tasks to Slurm on the CSF to run the steps in the pipeline.

Because the main task must be available for the whole of the pipeline run, we do not recommend submitting it as a Slurm job itself, as it may time out if sub-tasks have to wait for resources. Instead, you may run the Nextflow main task on one of the login nodes.

Pipelines (e.g. from nf-core) and their dependencies will be automatically downloaded into the ~/scratch/.nextflow directory structure. Be aware that if you use multiple pipelines or different versions of the same pipeline, which download dependencies, this scratch location can become large and you may want to clean up periodically.

Please always run Nextflow from within a directory you have created in scratch. Nextflow downloads and creates a lot of data while running a workflow and launching several jobs, so storing these data in your $HOME directory is not ideal.
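
As a minimal sketch of this advice, the helper below creates a dedicated launch directory under scratch and prints its path. The function name make_run_dir, the nf-runs subdirectory and the run name are illustrative assumptions, not CSF conventions.

```shell
# Create a dedicated launch directory under scratch and print its path.
# make_run_dir, "nf-runs" and the run name are hypothetical names for illustration.
make_run_dir() {
    run_name="$1"
    scratch_base="${2:-$HOME/scratch}"   # optional second argument overrides the scratch location
    run_dir="$scratch_base/nf-runs/$run_name"
    mkdir -p "$run_dir" || return 1
    printf '%s\n' "$run_dir"
}
```

You could then change into the new directory with cd "$(make_run_dir rnaseq-test)" before launching a pipeline. To see how much space the downloaded pipelines and dependencies occupy, du -sh ~/scratch/.nextflow gives a one-line summary.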

If you are an experienced Nextflow user, you can find the path to the Manchester profile configuration file in the environment variable $NXF_UOM_CONFIG after loading the Nextflow module.
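
Before launching a run, it can help to confirm that the variable is actually set in your current shell. The following is a small sketch of such a check; check_uom_config is a hypothetical helper name and is not provided by the module.

```shell
# Succeed only if NXF_UOM_CONFIG is set and points to a readable file.
# check_uom_config is a hypothetical helper name.
check_uom_config() {
    if [ -z "${NXF_UOM_CONFIG:-}" ]; then
        echo "NXF_UOM_CONFIG is not set: load the Nextflow module first" >&2
        return 1
    fi
    if [ ! -r "$NXF_UOM_CONFIG" ]; then
        echo "Cannot read config file: $NXF_UOM_CONFIG" >&2
        return 1
    fi
    echo "Using config: $NXF_UOM_CONFIG"
}
```

Running check_uom_config after module load should print the path of the configuration file; a failure usually means the module has not been loaded in this session.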

Running Nextflow from a login node

We recommend launching Nextflow from one of the three login nodes. Make a note of which login node you use: if you log out, you will need to reconnect to the same node in order to monitor or cancel the pipeline run. You can choose a specific login node when logging in, for example: ssh username@login3-csf3.itservices.manchester.ac.uk to go to login node 3.
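
One simple way to keep that note is to save the node name in the launch directory just before starting the run. This is only a sketch: launch_host.txt is a hypothetical file name, and uname -n is used here because it prints the name of the node you are currently on.

```shell
# Save the name of the current node next to the run, so you know where to reconnect.
# launch_host.txt is a hypothetical file name chosen for this example.
uname -n > launch_host.txt

# Show what was recorded.
cat launch_host.txt
```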

The generic command sequence to run Nextflow on the CSF3 is:

module load apps/binapps/nextflow/25.10.4        # load the module

nextflow -bg run <pipeline> -c <nextflow_config_file> \
         -profile <profile_1,profile_2,...> [arg...] &> NXF_OUT.log

Explanation of the options shown:

  • -bg:

    Run Nextflow as a background process. With this option the Nextflow process keeps running in the background on the login node, even if you log out. The Nextflow process ID (PID) is stored in the .nextflow.pid file created in the directory from which you launched Nextflow. See also: cancelling a pipeline run.

  • <pipeline>:

    For a local pipeline this is your pipeline script, e.g. /path/to/main.nf.
    For a pipeline from an online repository, you can give the project URL or a short version if supported e.g. nf-core pipelines.

  • -c <nextflow_config_file>:

    A custom configuration file is needed to define the parameters and limits specific to the CSF. Loading our Nextflow module gives you access to our custom University of Manchester configuration file through the variable $NXF_UOM_CONFIG.

  • -profile <profile_1,profile_2,...>:

    Use one or more of the defined profiles (in a comma-separated list). A profile is a set of configuration settings to be used during pipeline execution. Profiles may be defined in a pipeline’s configuration file (in the pipeline’s project directory) and in the custom configuration file loaded with the -c option.

    In our $NXF_UOM_CONFIG Nextflow configuration file we currently have the csf3himem profile defined. This will instruct Nextflow to launch all jobs to the himem Slurm partition. This is chosen as the himem partition allows both single and multicore jobs and has large memory, which is a typical requirement for bioinformatics pipelines.

  • [arg...]:

    One or more pipeline-specific arguments. They always start with a double dash (--), in contrast to the generic Nextflow options, which start with a single hyphen (e.g. -bg). Each pipeline may define its own options. For published pipelines these are usually described in the pipeline’s documentation.

  • &> NXF_OUT.log:

    This part of the command redirects both standard output and standard error, which would otherwise be printed to the terminal, into a file called NXF_OUT.log, leaving the command prompt free.
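
To illustrate the redirection on its own, the sketch below uses a stand-in function instead of Nextflow. The &> operator is bash shorthand; the portable form > file 2>&1 used here has the same effect. The names fake_pipeline and demo_out.log are hypothetical.

```shell
# Stand-in for a command that writes to both standard output and standard error.
fake_pipeline() {
    echo "progress message"        # goes to standard output
    echo "warning message" >&2     # goes to standard error
}

# Same effect as "fake_pipeline &> demo_out.log" in bash: both streams go to the file.
fake_pipeline > demo_out.log 2>&1

# Nothing was printed to the terminal above; both lines are in the file.
cat demo_out.log
```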

Example test runs

RNAseq pipeline from nf-core

The basic command to run the test case for the rnaseq pipeline published in the nf-core project is:

nextflow -bg run nf-core/rnaseq -c $NXF_UOM_CONFIG \
         -profile test,singularity,csf3himem --outdir output-rnaseq &> NXF_OUT.log

To run the pipeline for real, remove test from the -profile arguments and add the required arguments for the pipeline at the end e.g. for inputs, outputs, settings. These arguments will be defined in the pipeline documentation.

If you want to run a specific revision of the pipeline (recommended), include the -r option:

nextflow -bg run nf-core/rnaseq -r 3.22.2  -c $NXF_UOM_CONFIG \
         -profile test,singularity,csf3himem --outdir output-rnaseq &> NXF_OUT.log

wf-human-variation workflow from Oxford Nanopore EPI2ME Labs

Detailed instructions to run the workflow are provided on its GitHub project page.

Create a working directory in scratch, and from inside that directory download and unpack the demo dataset:

wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-human-variation/hg002%2bmods.v1/wf-human-variation-demo.tar.gz
tar -xzvf wf-human-variation-demo.tar.gz

Then run the pipeline:

nextflow run epi2me-labs/wf-human-variation \
    -bg \
    -c $NXF_UOM_CONFIG \
    -profile singularity,csf3himem \
    --bam 'wf-human-variation-demo/demo.bam' \
    --ref 'wf-human-variation-demo/demo.fasta' \
    --bed 'wf-human-variation-demo/demo.bed' \
    --sample_name 'DEMO' \
    --snp \
    --sv \
    --mod \
    --phased &> NXF_OUT.log

Running a specific version of Nextflow

If you need a specific Nextflow version, set the NXF_VER variable at the start of the command:

NXF_VER=23.10.1 nextflow -bg run nf-core/rnaseq -r 3.22.2  -c $NXF_UOM_CONFIG \
    -profile test,singularity,csf3himem --outdir output-rnaseq &> NXF_OUT.log

It doesn’t matter if this version is older or newer than the one we have installed on the CSF. The requested version will be downloaded into the ~/scratch/.nextflow directory structure and run from there.

Defining the version as shown above applies only to the run in which it is specified; it does not persist for future runs.
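
This is standard shell behaviour for a variable assignment placed in front of a command. The sketch below demonstrates it with a plain sh -c stand-in rather than Nextflow itself, so it can be tried safely anywhere.

```shell
# Start from a clean state for the demonstration.
unset NXF_VER

# The assignment prefix applies to this one command only.
NXF_VER=23.10.1 sh -c 'echo "during the run: ${NXF_VER:-not set}"'

# Back in the calling shell the variable is still unset.
echo "afterwards: ${NXF_VER:-not set}"
```

If you did want the setting to last for the whole shell session, exporting the variable (export NXF_VER=23.10.1) would be the usual shell approach.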

Monitoring a pipeline’s progress

To check the progress of a pipeline, go to the directory you launched the pipeline from and read the NXF_OUT.log file:

cat NXF_OUT.log

Or if you want to get live updates as the log file grows:

tail -f NXF_OUT.log

If you want more detail, then check the more verbose .nextflow.log file, in the same manner as described above.

After a pipeline run has finished, you can get a report, analyse or debug it, by running nextflow log [options] from the launch directory.
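
To check quickly whether the main Nextflow process is still alive, you can test the PID recorded in .nextflow.pid. The helper below is a sketch under two assumptions: you are on the login node that launched the run, and the hypothetical function name nf_is_running is free in your shell. It relies on kill -0, which sends no signal and only tests whether the process exists.

```shell
# Report whether the process recorded in a PID file is still alive.
# nf_is_running is a hypothetical helper; kill -0 sends no signal, it only tests the PID.
nf_is_running() {
    pid_file="${1:-.nextflow.pid}"
    [ -f "$pid_file" ] || { echo "no PID file: $pid_file" >&2; return 2; }
    if kill -0 "$(cat "$pid_file")" 2>/dev/null; then
        echo "running"
    else
        echo "not running"
        return 1
    fi
}
```

Called without arguments from the launch directory, it reads .nextflow.pid in the current directory.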

Cancelling a pipeline run

To cleanly cancel a pipeline run (along with all related Slurm jobs):

  1. Make sure you are logged in to the same login node you launched the pipeline from.

  2. Type kill PID, where PID is the Nextflow process ID, i.e. the number stored in the .nextflow.pid file located in the directory you launched the pipeline from.

    A convenient way to do this is to change to the launch directory, which contains the .nextflow.pid file, and run: kill $(cat .nextflow.pid).
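
The same steps can be wrapped in a small function that first checks the PID file is present, which guards against running kill from the wrong directory. This is a sketch only; cancel_nf_run is a hypothetical helper name.

```shell
# Send the default TERM signal to the PID recorded in the launch directory.
# cancel_nf_run is a hypothetical helper name.
cancel_nf_run() {
    launch_dir="${1:-.}"
    pid_file="$launch_dir/.nextflow.pid"
    if [ ! -f "$pid_file" ]; then
        echo "No .nextflow.pid in $launch_dir: are you in the launch directory?" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    echo "Sending TERM to process $pid"
    kill "$pid"
}
```

Run it as cancel_nf_run from the launch directory, or pass the launch directory as an argument.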

Resuming a pipeline run

Nextflow also provides the ability to resume a cancelled or interrupted pipeline run. Please read the official documentation on how to achieve this.

Further Info

Official documentation

https://docs.seqera.io/nextflow/

Official courses

https://training.nextflow.io/latest/

Example tutorials for pipelines

https://www.nextflow.io/docs/edge/tutorials/rnaseq-nf.html
https://www.nextflow.io/docs/edge/tutorials/data-lineage.html

VSCode integration

https://www.nextflow.io/docs/latest/vscode.html

Last modified on March 10, 2026 at 8:14 pm by Paraskevas Mitsides