AlphaFold

Overview

AlphaFold is an application for predicting protein structures from their amino acid sequences.

AlphaFold requires a suite of supporting tools, so AlphaFold and all of its supporting tools are provided within a Singularity container.

Restrictions on use

Although the AlphaFold code is licensed under the Apache License, Version 2.0 (the “License”) and is therefore considered open source, AlphaFold uses various genetic databases, model parameters, and third-party software, all under a variety of licenses.

Users wishing to access AlphaFold should review the various licenses and confirm, in an email to its-ri-team@manchester.ac.uk, that they agree to abide by their terms and conditions.

For further information, and to view all associated licenses, please see: AlphaFold License and Disclaimer

Only users who have been added to the alphafold Unix group can run the application.

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper and, if applicable, the AlphaFold-Multimer paper.

Access to GPUs is not automatic. If you wish to use this software on GPUs, please let us know when you request access.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles:

module load apps/singularity/alphafold/2.3.0
module load apps/singularity/alphafold/2.1.1  
module load apps/singularity/alphafold/2.0

Version 2.1.1 and later support multimer modelling; however, the multimer system should not be considered as stable as the monomer AlphaFold system.
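If you are unsure which -m preset suits an input file, the sketch below counts the sequences in a FASTA file and suggests monomer for one sequence and multimer for more than one. This rule is an assumption based on the single-sequence note for -f and the multimer support described above; suggest_model_preset is a hypothetical helper, not part of the AlphaFold module.

```shell
# Hypothetical helper (not part of the AlphaFold module): suggest a -m value
# from the number of sequences in a FASTA file.
# Assumption: one sequence -> monomer, two or more -> multimer.
suggest_model_preset() {
    local fasta="$1"
    local n
    if [ ! -r "$fasta" ]; then
        echo "Cannot read FASTA file: $fasta" >&2
        return 1
    fi
    # Each FASTA record starts with a '>' header line. grep -c prints 0 when
    # nothing matches; '|| true' stops its non-zero exit status from
    # aborting scripts that use 'set -e'.
    n=$(grep -c '^>' "$fasta" || true)
    if [ "$n" -eq 0 ]; then
        echo "No sequences found in $fasta" >&2
        return 1
    elif [ "$n" -eq 1 ]; then
        echo "monomer"
    else
        echo "multimer"
    fi
}
```

For example, preset=$(suggest_model_preset "$PWD/filename.fasta") stores the suggestion so you can check it before passing it to -m.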

Running the application

Please do not run AlphaFold on the login node. Jobs should be submitted to the compute nodes via the batch system.

Ideally, AlphaFold should be run in a parallel environment using 8 cores (some steps in the code are hardcoded to use 8 cores), with or without a single GPU (by default it will use a GPU). AlphaFold cannot use more than one GPU.
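Because some steps are hardcoded to use 8 cores, a jobscript can check that it was actually given 8 cores before starting AlphaFold. This is a minimal sketch assuming the batch system sets NSLOTS to the number of cores requested with #$ -pe smp.pe, as SGE does; check_alphafold_cores is a hypothetical name.

```shell
# Sketch: warn if the job was not given the 8 cores AlphaFold expects.
# NSLOTS is set by the batch system to the number of cores requested with
# "#$ -pe smp.pe N"; this sketch treats an unset NSLOTS as 1 core.
check_alphafold_cores() {
    local expected=8
    local slots="${NSLOTS:-1}"
    if [ "$slots" -ne "$expected" ]; then
        echo "Warning: job has $slots core(s); AlphaFold expects $expected (use: #\$ -pe smp.pe $expected)" >&2
        return 1
    fi
    return 0
}
```

Calling check_alphafold_cores || exit 1 near the top of a jobscript stops a mis-sized job early instead of letting it run with fewer cores than expected.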

Genetic Databases

AlphaFold needs multiple genetic (sequence) databases to run. These have already been downloaded, and AlphaFold has been set up to access them by default. Please note that users must be members of the alphafold Unix group to access them (see the Restrictions on use section).
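To confirm that you have been added to the alphafold Unix group before submitting a job, you can check your group list. A minimal sketch using the standard id command; in_unix_group is a hypothetical helper name.

```shell
# Sketch: succeed (return 0) if the current user belongs to the named
# Unix group, otherwise return 1.
in_unix_group() {
    local group="$1"
    local g
    # id -nG prints the names of all groups the current user belongs to
    for g in $(id -nG); do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

For example: in_unix_group alphafold || echo "Not in the alphafold group; see Restrictions on use." >&2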

Parallel batch job submission with GPU

Please note that access to GPUs is not enabled by default and needs to be requested.

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Job will run using 1 x GPU
#$ -pe smp.pe 8     # Job will run using 8 CPU cores

# Load the version you require
module load apps/singularity/alphafold/2.1.1

# PLEASE NOTE: $PWD is required so the singularity container is able to map to the CSF filesystem.
# Replace filename.fasta, YYYY-MM-DD, output_directory and model_preset with your own values.
run_alphafold.sh -f "$PWD/filename.fasta" -t YYYY-MM-DD -o "$PWD/output_directory" -m model_preset
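The jobscript passes absolute paths by prefixing $PWD. If you prefer to keep relative names in your script, a small helper can do the prefixing for you. This is a minimal sketch; to_abs_path is a hypothetical name, not part of the AlphaFold module.

```shell
# Sketch: turn a relative path into an absolute one by prefixing $PWD,
# mirroring the $PWD/... convention used in the jobscript.
to_abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;       # already absolute: leave unchanged
        *)  printf '%s\n' "$PWD/$1" ;;  # relative: prefix current directory
    esac
}
```

For example: run_alphafold.sh -f "$(to_abs_path filename.fasta)" -t YYYY-MM-DD -o "$(to_abs_path output_directory)" -m model_preset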

Parallel batch job submission without GPU

Create a batch submission script (which will load the modulefile in the jobscript), for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # Job will run using 8 CPU cores

# Load the version you require
module load apps/singularity/alphafold/2.1.1

# PLEASE NOTE: $PWD is required so the singularity container is able to map to the CSF filesystem.
# -g false disables GPU use (the default is -g true).
run_alphafold.sh -f "$PWD/filename.fasta" -t YYYY-MM-DD -o "$PWD/output_directory" -m model_preset -g false
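The only difference in the run_alphafold.sh command between the GPU and CPU-only jobscripts is the -g false option. The sketch below builds the arguments in a bash array so one script can cover both cases. USE_GPU is a hypothetical variable that you set yourself (it is not defined by the CSF or by AlphaFold), and build_alphafold_args is likewise a hypothetical name.

```shell
# Sketch: build the run_alphafold.sh arguments in a bash array.
# USE_GPU is a hypothetical, user-set variable; it defaults to "true" here.
build_alphafold_args() {
    local fasta="$1" max_date="$2" outdir="$3" preset="$4"
    # Global array so the caller can expand it with "${ALPHAFOLD_ARGS[@]}"
    ALPHAFOLD_ARGS=(-f "$fasta" -t "$max_date" -o "$outdir" -m "$preset")
    if [ "${USE_GPU:-true}" = "false" ]; then
        ALPHAFOLD_ARGS+=(-g false)   # CPU-only run, as in the jobscript above
    fi
}
```

After calling build_alphafold_args "$PWD/filename.fasta" YYYY-MM-DD "$PWD/output_directory" model_preset, run run_alphafold.sh "${ALPHAFOLD_ARGS[@]}". You still need to add or remove the #$ -l v100=1 line yourself, since the array only changes the AlphaFold arguments.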

Required and Optional Parameters

Required Parameters:

-o <output_dir> Path to a directory that will store the results.
-m <model_preset> Choose the preset model configuration: the monomer model (monomer), the monomer model with extra ensembling (monomer_casp14), the monomer model with pTM head (monomer_ptm), or the multimer model (multimer) (default: 'monomer')
-f <fasta_path> Path to a FASTA file containing one sequence
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
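Mistyped -t or -m values are only discovered once the job starts, so it can help to check them before running qsub. A minimal sketch; the function names are hypothetical, and the accepted presets are simply those listed above.

```shell
# Sketch: sanity-check -t and -m values before submitting a job.
# valid_template_date checks the YYYY-MM-DD shape and month/day ranges only;
# it does not reject impossible dates such as 2021-02-30.
valid_template_date() {
    local re='^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
    [[ "$1" =~ $re ]]
}

# Accepts only the model presets listed on this page
valid_model_preset() {
    case "$1" in
        monomer|monomer_casp14|monomer_ptm|multimer) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: valid_template_date 2022-01-31 && valid_model_preset monomer_ptm && qsub scriptname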

Optional Parameters:

-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: true)
-c <db_preset> Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <is_prokaryote> Optional for the multimer system; not used by the single-chain system. A boolean specifying true where the target complex is from a prokaryote, and false where it is not or where the origin is unknown. This value determines the pairing method for the MSA (default: 'None')
-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
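The optional parameters take either one of a fixed set of words (-c) or true/false (-g, -p, -b and, for multimer runs, -l). A minimal sketch of matching checks, using hypothetical function names and only the values listed above:

```shell
# Sketch: check optional flag values against the choices listed above.
valid_db_preset() {
    case "$1" in
        reduced_dbs|full_dbs) return 0 ;;
        *) return 1 ;;
    esac
}

# -g, -p, -b (and -l, for multimer runs) take true or false
valid_bool_flag() {
    case "$1" in
        true|false) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: valid_db_preset "$db" || { echo "Unknown -c value: $db" >&2; exit 1; }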

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

If you wish to build your own Singularity image, you should copy the above files to your local PC and run Singularity there. For security reasons, only the sysadmins can build images on the CSF.

Further info

GitHub: deepmind/alphafold

Guide with input files

One of our AlphaFold users on CSF3 has very kindly provided a guide with example input files to help new users get started.

Updates

None.

Last modified on August 21, 2024 at 5:40 pm by George Leaver