FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
- URL: http://arxiv.org/abs/2203.00854v2
- Date: Fri, 4 Mar 2022 10:08:04 GMT
- Title: FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
- Authors: Shenggan Cheng, Ruidong Wu, Zhongming Yu, Binrui Li, Xiwen Zhang, Jian
Peng, Yang You
- Abstract summary: We propose FastFold, a highly efficient implementation of the protein structure prediction model for training and inference.
FastFold includes a series of GPU optimizations based on a thorough analysis of AlphaFold's performance.
Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup for long-sequence inference.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein structure prediction is an important method for understanding gene
translation and protein function in the domain of structural biology. AlphaFold
introduced the Transformer model to the field of protein structure prediction
with atomic accuracy. However, training and inference of the AlphaFold model
are time-consuming and expensive because of the special performance
characteristics and huge memory consumption. In this paper, we propose
FastFold, a highly efficient implementation of the protein structure prediction
model for training and inference. FastFold includes a series of GPU
optimizations based on a thorough analysis of AlphaFold's performance.
Meanwhile, with Dynamic Axial Parallelism and Duality Async Operation, FastFold
achieves high model parallelism scaling efficiency, surpassing existing popular
model parallelism techniques. Experimental results show that FastFold reduces
overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup
for long-sequence inference. Furthermore, we scaled FastFold to 512 GPUs and
achieved an aggregate of 6.02 PetaFLOPs with 90.1% parallel efficiency. The
implementation can be found at https://github.com/hpcaitech/FastFold
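As a rough, hypothetical illustration of the axial-parallel idea behind Dynamic Axial Parallelism (not FastFold's actual implementation): the pair representation of shape (seq, seq, channels) can be sharded along one sequence axis so that row-wise attention stays device-local, with a collective re-partitioning before column-wise attention. Here the all-to-all collective is only emulated in NumPy:

```python
import numpy as np

def shard(z, n_devices, axis):
    """Split z into n_devices contiguous shards along the given axis."""
    return np.array_split(z, n_devices, axis=axis)

def all_to_all(shards, old_axis, new_axis):
    """Emulate the collective that re-partitions shards from old_axis to
    new_axis (a real distributed run would exchange sub-blocks between
    devices instead of materializing the full tensor)."""
    full = np.concatenate(shards, axis=old_axis)
    return shard(full, len(shards), new_axis)

seq, channels, n_devices = 8, 4, 2
z = np.arange(seq * seq * channels, dtype=np.float32).reshape(seq, seq, channels)

row_shards = shard(z, n_devices, axis=0)   # each device holds full rows
col_shards = all_to_all(row_shards, 0, 1)  # re-partition for column attention
```

In a real distributed run the `all_to_all` would exchange sub-blocks between GPUs; the round trip back to row shards before the next row-attention phase is symmetric.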
Related papers
- SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers [50.18388227899971]
We present SaDiT, a novel framework that accelerates protein backbone generation by integrating SaProt Tokenization with a Diffusion Transformer (DiT) architecture. Experiments demonstrate that SaDiT outperforms state-of-the-art models, including RFDiffusion and Proteina, in both computational speed and structural viability.
arXiv Detail & Related papers (2026-02-06T13:50:13Z) - Triangle Multiplication Is All You Need For Biomolecular Structure Representations [56.26342479807906]
We introduce Pairmixer, a streamlined alternative that eliminates triangle attention while preserving higher-order geometric reasoning capabilities. Pairmixer substantially improves computational efficiency, matching state-of-the-art structure predictors across folding and docking benchmarks. Within BoltzDesign, for example, Pairmixer delivers over 2x faster sampling and scales to sequences 30% longer than the memory limits of Pairformer.
arXiv Detail & Related papers (2025-10-21T17:59:02Z) - Protein Folding with Neural Ordinary Differential Equations [9.980631693646528]
We propose a continuous-depth formulation of the Evoformer, replacing its 48 discrete blocks with a Neural ODE parameterization that preserves its core attention-based operations. We find that the Neural ODE-based Evoformer produces structurally plausible predictions and reliably captures certain secondary structure elements, such as alpha-helices. Our model achieves this performance using dramatically fewer resources: just 17.5 hours of training on a single GPU.
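The continuous-depth idea summarized above can be illustrated with a minimal, hypothetical sketch: instead of stacking discrete residual blocks z_{k+1} = z_k + block(z_k), treat depth as continuous time and integrate dz/dt = f(z, t). Here f is a stand-in nonlinear map, not the real attention-based Evoformer block, and a fixed-step Euler solver stands in for an adaptive ODE solver:

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((16, 16))

def f(z, t):
    # Placeholder dynamics; a real model would use attention + transition.
    return np.tanh(z @ W)

def odeint_euler(f, z0, t0=0.0, t1=1.0, steps=48):
    """Fixed-step Euler integration of dz/dt = f(z, t) from t0 to t1."""
    z, t = z0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        z = z + dt * f(z, t)
        t += dt
    return z

z0 = rng.standard_normal((8, 16))
z1 = odeint_euler(f, z0)   # depth-48 stack collapsed into 48 Euler steps
```

The appeal of the continuous view is that the solver's step count (and hence compute) can be traded off against accuracy at inference time without retraining.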
arXiv Detail & Related papers (2025-10-17T22:56:03Z) - Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale [68.6602625868888]
We introduce convolutional multi-hybrid architectures, with a design grounded in two simple observations.
Operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression.
We train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous generation hybrids.
arXiv Detail & Related papers (2025-02-25T19:47:20Z) - PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners [65.93130697098658]
This paper proposes PredFormer, a pure transformer-based framework for predictive learning.
With its recurrent-free, transformer-based design, PredFormer is both simple and efficient.
Experiments on synthetic and real-world datasets demonstrate that PredFormer achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-07T03:52:06Z) - Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensemble generation.
AlphaFlow-Lit performs on-par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z) - ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours [4.886207598730398]
We conduct a comprehensive analysis of the AlphaFold training procedure based on OpenFold.
We identify that inefficient communications and overhead-dominated computations were the key factors that prevented AlphaFold from effective scaling.
We introduce ScaleFold, a systematic training method that incorporates optimizations specifically for these factors.
arXiv Detail & Related papers (2024-04-17T04:55:33Z) - AlphaFold Meets Flow Matching for Generating Protein Ensembles [11.1639408863378]
We develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins.
Our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling.
Our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories.
arXiv Detail & Related papers (2024-02-07T13:44:47Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
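A minimal sketch of the frozen-backbone adaptation pattern described above (the linear "backbone", shapes, and squared-error objective are all stand-ins, not the paper's architecture): only a lightweight head on top of frozen features is updated, so no gradients are ever propagated through the backbone.

```python
import numpy as np

rng = np.random.default_rng(1)
W_backbone = rng.standard_normal((32, 64))  # frozen, pretrained weights
W_head = np.zeros((64, 10))                 # trainable lightweight head

def forward(x):
    feats = np.maximum(x @ W_backbone, 0.0)  # frozen backbone (no grad needed)
    return feats @ W_head                    # only this part is trained

def train_step(x, y, lr=1e-4):
    """One gradient step on the head only; the backbone is untouched."""
    global W_head
    feats = np.maximum(x @ W_backbone, 0.0)
    residual = feats @ W_head - y
    grad = feats.T @ residual / len(x)       # gradient w.r.t. W_head only
    W_head -= lr * grad

x = rng.standard_normal((4, 32))
y = rng.standard_normal((4, 10))
before = np.mean((forward(x) - y) ** 2)
train_step(x, y)
after = np.mean((forward(x) - y) ** 2)
```

Because backbone features can be computed once and cached, this pattern saves the time and memory of full backpropagation, which is the efficiency argument the paper's summary makes.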
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z) - Revisiting Multi-Scale Feature Fusion for Semantic Segmentation [90.32746095413447]
In this paper, we demonstrate that neither high internal resolution nor atrous convolutions are necessary for accurate semantic segmentation.
We develop a simplified segmentation model, named ESeg, which has neither high internal resolution nor expensive atrous convolutions.
Our simple method can achieve better accuracy with faster speed than prior art across multiple datasets.
arXiv Detail & Related papers (2022-03-23T19:14:11Z) - Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning [14.164984597158501]
The Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT.
We present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence information.
We conduct extensive experiments showing that Shatter achieves better performance than BERT.
arXiv Detail & Related papers (2021-08-30T07:42:12Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best BERT model structure for a given computation budget to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.