FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
- URL: http://arxiv.org/abs/2203.00854v2
- Date: Fri, 4 Mar 2022 10:08:04 GMT
- Title: FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
- Authors: Shenggan Cheng, Ruidong Wu, Zhongming Yu, Binrui Li, Xiwen Zhang, Jian
Peng, Yang You
- Abstract summary: We propose FastFold, a highly efficient implementation of the protein structure prediction model for training and inference.
FastFold includes a series of GPU optimizations based on a thorough analysis of AlphaFold's performance.
Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup for long-sequence inference.
- Score: 11.847436777986323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein structure prediction is an important method for understanding gene
translation and protein function in the domain of structural biology. AlphaFold
introduced the Transformer model to the field of protein structure prediction
with atomic accuracy. However, training and inference of the AlphaFold model
are time-consuming and expensive because of the special performance
characteristics and huge memory consumption. In this paper, we propose
FastFold, a highly efficient implementation of the protein structure prediction
model for training and inference. FastFold includes a series of GPU
optimizations based on a thorough analysis of AlphaFold's performance.
Meanwhile, with Dynamic Axial Parallelism and Duality Async Operation, FastFold
achieves high model parallelism scaling efficiency, surpassing existing popular
model parallelism techniques. Experimental results show that FastFold reduces
overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup
for long-sequence inference. Furthermore, we scaled FastFold to 512 GPUs and
achieved an aggregate of 6.02 PetaFLOPs with 90.1% parallel efficiency. The
implementation can be found at https://github.com/hpcaitech/FastFold
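The Dynamic Axial Parallelism idea described in the abstract can be sketched as follows: axial (row-wise or column-wise) attention on the pair representation only mixes elements along one axis, so the tensor can be sharded along the other axis and each device can run its attention independently, with an all-to-all re-sharding the tensor between the row and column steps. Below is a minimal single-process NumPy sketch of the row-attention case; the shapes, names, and toy single-head attention are illustrative assumptions, not FastFold's actual API.

```python
import numpy as np

np.random.seed(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def row_attention(z):
    # Toy single-head self-attention along axis 1: elements within a
    # row attend to each other; rows never interact.
    scores = np.einsum('ijd,ikd->ijk', z, z) / np.sqrt(z.shape[-1])
    return np.einsum('ijk,ikd->ijd', softmax(scores), z)

n_dev, N, d = 4, 8, 16          # 4 simulated devices, toy pair tensor
z = np.random.randn(N, N, d)    # pair representation [N, N, d]

# Replicated reference: row attention on the full tensor.
ref = row_attention(z)

# Dynamic Axial Parallelism (sketch): shard the rows across devices.
# Because row attention never mixes different rows, each shard can be
# processed independently with no communication.
shards = np.split(z, n_dev, axis=0)
out = np.concatenate([row_attention(s) for s in shards], axis=0)

assert np.allclose(out, ref)

# Before the column-attention step (which mixes along axis 0), an
# all-to-all would re-shard the tensor along columns instead, so that
# the attended axis is again local to each device.
```

The sketch checks that the sharded computation reproduces the replicated one exactly, which is the invariant that makes this form of model parallelism correct; the communication cost is confined to the all-to-all between axes.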
Related papers
- PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners [65.93130697098658]
This paper proposes PredFormer, a pure transformer-based framework for predictive learning.
With its recurrent-free, transformer-based design, PredFormer is both simple and efficient.
Experiments on synthetic and real-world datasets demonstrate that PredFormer achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-07T03:52:06Z)
- Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensembles generation.
AlphaFlow-Lit performs on-par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z)
- ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours [4.886207598730398]
We conduct a comprehensive analysis of the AlphaFold training procedure based on OpenFold.
We identify inefficient communication and overhead-dominated computation as the key factors that prevented AlphaFold from scaling effectively.
We introduce ScaleFold, a systematic training method that incorporates optimizations specifically for these factors.
arXiv Detail & Related papers (2024-04-17T04:55:33Z)
- AlphaFold Meets Flow Matching for Generating Protein Ensembles [11.1639408863378]
We develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins.
Our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling.
Our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories.
arXiv Detail & Related papers (2024-02-07T13:44:47Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z)
- Revisiting Multi-Scale Feature Fusion for Semantic Segmentation [90.32746095413447]
In this paper, we demonstrate that neither high internal resolution nor atrous convolutions are necessary for accurate semantic segmentation.
We develop a simplified segmentation model, named ESeg, which has neither high internal resolution nor expensive atrous convolutions.
Our simple method achieves better accuracy and faster speed than prior art across multiple datasets.
arXiv Detail & Related papers (2022-03-23T19:14:11Z)
- Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning [14.164984597158501]
Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT.
We present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence information.
We conduct extensive experiments showing that Shatter achieves better performance than BERT.
arXiv Detail & Related papers (2021-08-30T07:42:12Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
- Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best BERT model structure for a given computation budget to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.