Description

This package provides an implementation of the inference pipeline of AlphaFold 3.

Software Category: bio

For detailed information, visit the AlphaFold website.

Available Versions

To find the available versions, run:

module spider alphafold

For detailed information about a particular version, including the load command, run module spider <name/version>. For example:

module spider alphafold/3.0.0

Module	Version	Module Load Command
alphafold	3.0.0	module load gcc/11.4.0 alphafold/3.0.0
alphafold	2.3.0	module load apptainer/1.3.4 alphafold/2.3.0
alphafold	2.3.2-dev	module load apptainer/1.3.4 alphafold/2.3.2-dev

AlphaFold 3

Model Parameters

The AlphaFold 3 model parameters are subject to the Terms of Use defined here. Our module does not contain the model parameters; instead, each user must submit their own request to DeepMind. Visit here for further instructions.

Upon approval you will receive a download URL for the file af3.bin.zst (~1 GB). Place it in a directory that is not shared with others, e.g. ~/af3.

DIR=~/af3
mkdir -p "$DIR"
cd "$DIR"
wget <your_download_url>
unzstd af3.bin.zst

The last command will extract the file into af3.bin.
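Because the parameters must not be shared with others, it can help to create the directory with owner-only permissions before downloading. The following is a minimal sketch and not part of the module; the function name prepare_model_dir is hypothetical, and the default location ~/af3 simply mirrors the example above.

```shell
#!/bin/bash
# Hypothetical helper: create a private directory for the AlphaFold 3 parameters
# and report whether the extracted file af3.bin is already present.
prepare_model_dir() {
    local dir="${1:-$HOME/af3}"
    mkdir -p "$dir" || return 1      # -p: no error if the directory already exists
    chmod 700 "$dir" || return 1     # owner-only access
    if [ -f "$dir/af3.bin" ]; then
        echo "found: $dir/af3.bin"
    else
        echo "missing: $dir/af3.bin"
    fi
}
```

Calling prepare_model_dir ~/af3 before the wget step creates the directory; calling it again after unzstd should report that af3.bin was found.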

Slurm Script

#!/bin/bash
#SBATCH -A mygroup           # your allocation account
#SBATCH -p gpu               # partition
#SBATCH --gres=gpu:1         # number of GPUs
#SBATCH -C "a40|a6000|a100"  # compatible with A40, A6000, A100
#SBATCH -c 8                 # number of cores
#SBATCH -t 10:00:00          # time

module purge
module load gcc alphafold

python $EBROOTALPHAFOLD/app/alphafold/run_alphafold.py \
    --db_dir=$ALPHAFOLD_DATA_PATH \
    --model_dir=$HOME/af3 \
    --json_path=fold_input.json \
    --output_dir=$PWD

If you put the model parameters in a different location, change the value of --model_dir accordingly. To see the complete list of flags, run:

python $EBROOTALPHAFOLD/app/alphafold/run_alphafold.py --help

Refer to the official documentation for more information.
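The Slurm script above expects a file fold_input.json in the working directory. As a rough sketch, an input for a single protein chain could be written as shown below. The job name, the sequence, and the helper name write_fold_input are placeholders, and the field names reflect the AlphaFold 3 input format as we understand it; please check the official input documentation before relying on them.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal AlphaFold 3 input file.
# The job name and the sequence are placeholders; replace them with your own.
write_fold_input() {
    local out="${1:-fold_input.json}"
    cat > "$out" <<'EOF'
{
  "name": "my_protein",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
```

Running write_fold_input with no argument writes fold_input.json to the current directory, which matches the --json_path value used in the script.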

AlphaFold 2

Installation details

We prepared a Docker image based on the official Dockerfile with some modifications.

Changed tensorflow to tensorflow-cpu, because AlphaFold does not use TensorFlow on the GPU (it uses JAX instead). See issue.
There is no need to have system CUDA libraries since they are already included in the conda environment.
Switched from Miniconda to micromamba.

With a three-stage build, our Docker image is only 5.4 GB on disk (2.1 GB compressed on DockerHub), almost half the size of an image built from the official Dockerfile (10.1 GB).

For further details see here.

AlphaFold launch command

Please refer to run_alphafold.py for all available options.

Launch script `run`

For your convenience, we have prepared a launch script run that takes care of the Apptainer command and the database paths, since these are unlikely to change. If you do need to customize anything, please use the full Apptainer command.

Explanation of Apptainer flags

The database and models are stored in $ALPHAFOLD_DATA_PATH.
A cache file ld.so.cache will be written to /etc, which is not allowed on the HPC system. The workaround is to bind-mount e.g. the current working directory to /etc inside the container. [-B .:/etc]
You must launch AlphaFold from /app/alphafold inside the container due to this issue. [--pwd /app/alphafold]
The --nv flag enables GPU support.
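Putting these flags together, the full command behind the run wrapper might look roughly like the sketch below, which only prints the command so that you can inspect it. The variable ALPHAFOLD_SIF (path to the container image) and the function name print_apptainer_cmd are hypothetical, and binding $ALPHAFOLD_DATA_PATH to /data is an assumption inferred from the /data/... database paths used in the Slurm templates on this page; the actual wrapper may differ.

```shell
#!/bin/bash
# Sketch only: print an Apptainer command assembled from the flags explained above.
# ALPHAFOLD_SIF is a hypothetical variable holding the path to the container image.
print_apptainer_cmd() {
    local data="${ALPHAFOLD_DATA_PATH:-/path/to/alphafold/data}"
    local sif="${ALPHAFOLD_SIF:-/path/to/alphafold.sif}"
    # -B data:/data  databases and models (assumed mount point)
    # -B .:/etc      lets ld.so.cache be written to the current directory
    # --pwd          AlphaFold must start from /app/alphafold
    # --nv           GPU support
    printf 'apptainer run -B %s:/data -B .:/etc --pwd /app/alphafold --nv %s' "$data" "$sif"
    # Append any AlphaFold flags passed to this function.
    local arg
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

For example, print_apptainer_cmd --model_preset=monomer prints the command with that flag appended after the image path.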

Explanation of AlphaFold flags

The default command of the container is /app/run_alphafold.sh.
As a consequence of the Apptainer --pwd flag, the fasta and output paths must be full paths (e.g. /scratch/$USER/mydir), not relative paths (e.g. ./mydir). You may use $PWD as demonstrated.
The max_template_date is of the form YYYY-MM-DD.
Only the database paths in mark_flags_as_required of run_alphafold.py are included because the optional paths depend on db_preset (full_dbs or reduced_dbs) and model_preset.
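Since relative paths break under --pwd and the date must follow a fixed form, a small pre-flight check can catch mistakes before the job starts. This is an illustrative sketch; the function names to_abs_path and is_template_date are hypothetical and not part of the module. The date check only validates the shape YYYY-MM-DD, not whether the date actually exists.

```shell
#!/bin/bash
# Hypothetical helpers for building AlphaFold 2 arguments.

# Turn a relative path into a full path based on the current directory.
to_abs_path() {
    case "$1" in
        /*)  printf '%s\n' "$1" ;;                  # already a full path
        ./*) printf '%s/%s\n' "$PWD" "${1#./}" ;;   # strip the leading ./
        *)   printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}

# Succeed only if the argument looks like YYYY-MM-DD (shape check only).
is_template_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a job script this could be used as --output_dir=$(to_abs_path ./outdir), which has the same effect as writing $PWD/outdir by hand.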

Slurm Script

Below are some Slurm script templates for version 2.3.

Monomer with `full_dbs`

#!/bin/bash
#SBATCH -A mygroup      # your allocation account
#SBATCH -p gpu          # partition
#SBATCH --gres=gpu:1    # number of GPUs
#SBATCH -C "v100|a100"  # request a V100 or A100 GPU
#SBATCH -N 1            # number of nodes
#SBATCH -c 8            # number of cores
#SBATCH -t 10:00:00     # time

module purge
module load apptainer alphafold

run --fasta_paths=$PWD/your_fasta_file \
    --output_dir=$PWD/outdir \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/data/pdb70/pdb70 \
    --uniref30_database_path=/data/uniref30/UniRef30_2021_03 \
    --max_template_date=YYYY-MM-DD \
    --use_gpu_relax=True

Multimer with `reduced_dbs`

#!/bin/bash
#SBATCH -A mygroup      # your allocation account
#SBATCH -p gpu          # partition
#SBATCH --gres=gpu:1    # number of GPUs
#SBATCH -C "v100|a100"  # request a V100 or A100 GPU
#SBATCH -N 1            # number of nodes
#SBATCH -c 8            # number of cores
#SBATCH -t 10:00:00     # time

module purge
module load apptainer alphafold

run --fasta_paths=$PWD/your_fasta_file \
    --output_dir=$PWD/outdir \
    --model_preset=multimer \
    --db_preset=reduced_dbs \
    --pdb_seqres_database_path=/data/pdb_seqres/pdb_seqres.txt \
    --uniprot_database_path=/data/uniprot/uniprot.fasta \
    --small_bfd_database_path=/data/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --max_template_date=YYYY-MM-DD \
    --use_gpu_relax=True

Notes

For users running large protein jobs: Version 2.3.2-dev is based on commit 020cd6d, about 2 years after the official 2.3.2 release. We use a development version because its package requirements are updated for compatibility with the H200 GPU. Users who have experienced out-of-memory errors for large protein calculations should request an H200 GPU (--gres=gpu:h200) and load the 2.3.2-dev version.
Before upgrading to a newer version, please always check the official repo for details, especially on any changes to the parameters, databases, and flags.

You may need to request 8 CPU cores due to this line printed in the output:

Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpys2ocad8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./seq.fasta /share/resources/data/alphafold/mgnify/mgy_clusters.fa"

You must provide a value for --max_template_date. If you are predicting the structure of a protein that is already in PDB and you wish to avoid using it as a template, then max_template_date must be set to a date before the release date of the structure. If you have no such restriction, you can use today's date. For example, if you are running the prediction on August 7, 2021, set --max_template_date=2021-08-07. See here.
You are not required to use the run wrapper script. You can always provide the full Apptainer command.
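The notes above on CPU cores and the template date can be scripted. The sketch below is only an illustration with hypothetical function names. It relies on date +%F, which prints the current date as YYYY-MM-DD, and on SLURM_CPUS_PER_TASK, which Slurm sets inside a job when -c is given.

```shell
#!/bin/bash
# Hypothetical helpers for an AlphaFold 2 Slurm script.

# Print today's date in the YYYY-MM-DD form expected by --max_template_date.
today_template_date() {
    date +%F
}

# Warn if the job has fewer cores than jackhmmer asks for (--cpu 8).
# SLURM_CPUS_PER_TASK is set by Slurm when -c/--cpus-per-task is requested.
check_cpu_request() {
    local have="${SLURM_CPUS_PER_TASK:-0}"
    if [ "$have" -lt 8 ]; then
        echo "warning: $have cores allocated, jackhmmer uses 8"
        return 1
    fi
    echo "ok: $have cores allocated"
}
```

In a job script you could then write --max_template_date=$(today_template_date) in place of a hand-typed date.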

Updated May 1, 2025 | HPC, software bio, gpu, multi-core
