AlphaFold
Overview
AlphaFold is an application for predicting models of protein structures.
AlphaFold requires a suite of supporting tools to be installed, so AlphaFold and all supporting tools are provided within a Singularity container.
Restrictions on use
The AlphaFold code is licensed under the Apache License, Version 2.0 (the “License”) and is therefore considered open source. However, AlphaFold also uses various genetic databases, model parameters, and third-party software, each with its own license.
Users wishing to access AlphaFold should review these licenses and confirm in an email to its-ri-team@manchester.ac.uk that they will abide by the T&Cs of each.
Please follow this link for further information and to view all associated licenses – Alphafold License and Disclaimer
Only users who have been added to the alphafold unix group can run the application.
Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper and, if applicable, the AlphaFold-Multimer paper.
Access to GPUs is not automatic. If you wish to use this software on GPUs please let us know when you request access.
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.
Load one of the following modulefiles:
module load apps/singularity/alphafold/2.3.0
module load apps/singularity/alphafold/2.1.1
module load apps/singularity/alphafold/2.0
Version 2.1.1 and later support multimer modelling; however, the multimer system should not be considered as stable as the monomer AlphaFold system.
Running the application
Please do not run AlphaFold on the login node. Jobs should be submitted to the compute nodes via the batch system.
Ideally, AlphaFold should be run in a parallel environment using 8 cores (some steps in the code are hardcoded to use 8 cores), with or without a single GPU (a GPU is used by default). AlphaFold cannot use more than one GPU.
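The 8-core recommendation above can be checked from inside a job. The following is a minimal sketch, assuming the batch system is SGE-style (as the #$ directives and qsub suggest) and therefore sets the NSLOTS environment variable to the number of cores granted. The function name check_slots is hypothetical and not part of the AlphaFold installation.

```shell
#!/bin/bash
# Hypothetical helper: warn if the job was not given the recommended 8 cores.
# Assumption: the batch system exports NSLOTS (standard in SGE parallel jobs).
check_slots() {
    want=8
    got="${NSLOTS:-1}"    # treat an unset NSLOTS as a single core
    if [ "$got" -ne "$want" ]; then
        echo "Warning: job has $got core(s); $want are recommended" >&2
        return 1
    fi
    return 0
}

# In a jobscript this would be called before run_alphafold.sh
check_slots || echo "Continuing with a non-recommended core count" >&2
```

The check only reports a mismatch; whether to stop the job at that point is left to the jobscript author.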
Genetic Databases
AlphaFold needs multiple genetic (sequence) databases to run. These have already been downloaded, and AlphaFold has been set up to access them by default. Please note that users need to be a member of the alphafold unix group in order to access them; see the Restrictions on use section.
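Group membership can be confirmed from the command line before submitting a job. This is a minimal sketch using the standard id command; the group name alphafold comes from the text above, while the function name in_group is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the current user belongs to the named unix group.
in_group() {
    # id -nG prints the names of all groups of the current user, space-separated;
    # split them onto separate lines and look for an exact match.
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group alphafold; then
    echo "You are a member of the alphafold group"
else
    echo "You are not in the alphafold group; request access first"
fi
```

A newly granted group membership usually only appears after logging out and back in.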
Parallel batch job submission with GPU
Please note that access to GPUs is not enabled by default and needs to be requested.
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -l v100=1        # Job will run using 1 x GPU
#$ -pe smp.pe 8

# Load the version you require
module load apps/singularity/alphafold/2.1.1

run_alphafold.sh -f $PWD/filename.fasta -t YYYY-MM-DD -o $PWD/output_directory -m model_preset
#
# PLEASE NOTE: $PWD is required so the singularity container is able to map to the CSF filesystem
Parallel batch job submission without GPU
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8

# Load the version you require
module load apps/singularity/alphafold/2.1.1

run_alphafold.sh -f $PWD/filename.fasta -t YYYY-MM-DD -o $PWD/output_directory -m model_preset -g false
#
# PLEASE NOTE: $PWD is required so the singularity container is able to map to the CSF filesystem
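The jobscripts above use placeholders (filename.fasta, YYYY-MM-DD, output_directory, model_preset) that must be replaced with real values. As an illustration only, the sketch below prints a CPU-only jobscript with those placeholders filled in. The function name make_jobscript and the example values are hypothetical; the directives, modulefile and flags are copied from the example above.

```shell
#!/bin/bash
# Hypothetical helper: print a CPU-only AlphaFold jobscript to stdout.
# Arguments: <fasta_name> <max_template_date> <output_dir_name> <model_preset>
make_jobscript() {
    # Single-quoted lines are emitted verbatim; \$PWD stays literal so that
    # it is expanded when the job runs, not when the script is generated.
    printf '%s\n' \
        '#!/bin/bash --login' \
        '#$ -cwd' \
        '#$ -pe smp.pe 8' \
        '' \
        'module load apps/singularity/alphafold/2.1.1' \
        '' \
        "run_alphafold.sh -f \$PWD/$1 -t $2 -o \$PWD/$3 -m $4 -g false"
}

# Example values (assumptions, not site defaults)
make_jobscript protein.fasta 2021-11-01 results monomer
```

Redirecting the output to a file, for example make_jobscript ... > job.sh, gives a jobscript that can then be submitted in the usual way.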
Required and Optional Parameters
Required Parameters:
-o <output_dir>         Path to a directory that will store the results.
-m <model_preset>       Choose preset model configuration - the monomer model (monomer), the monomer model with extra ensembling (monomer_casp14), monomer model with pTM head (monomer_ptm), or multimer model (multimer) (default: 'monomer')
-f <fasta_path>         Path to a FASTA file containing one sequence
-t <max_template_date>  Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu>              Enable NVIDIA runtime to run with GPUs (default: true)
-c <db_preset>            Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <is_prokaryote>        Optional for multimer system, not used by the single chain system. A boolean specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. This value determines the pairing method for the MSA (default: 'None')
-b <benchmark>            Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
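Because a job may wait in the queue before it starts, it can help to sanity-check the parameter values before submitting. The following is a minimal sketch of such checks, based only on the parameter descriptions above. The function names are hypothetical, and the date check covers the YYYY-MM-DD format only, not whether the date is a real calendar date.

```shell
#!/bin/bash
# Hypothetical pre-submission checks; none of these are part of run_alphafold.sh.

# Succeed if the argument has the form YYYY-MM-DD (format check only).
valid_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the argument is one of the model presets listed above.
valid_preset() {
    case "$1" in
        monomer|monomer_casp14|monomer_ptm|multimer) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the number of sequences in a FASTA file by counting header lines,
# which start with ">".
count_sequences() {
    grep -c '^>' "$1"
}

# Example: validate values before writing them into a jobscript
if valid_date "2021-11-01" && valid_preset "monomer"; then
    echo "arguments look fine"
fi
```

count_sequences can be used to confirm that the file passed to -f holds the number of sequences you expect.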
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
If you wish to build your own singularity image you should copy the above files to your local PC and run singularity there. For security reasons only the sysadmins can build images on the CSF.
Further info
Guide with input files
One of our Alphafold users on CSF3 has very kindly provided a guide with example input files to help get new users started.
Updates
None.