FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
- URL: http://arxiv.org/abs/2203.00854v2
- Date: Fri, 4 Mar 2022 10:08:04 GMT
- Title: FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
- Authors: Shenggan Cheng, Ruidong Wu, Zhongming Yu, Binrui Li, Xiwen Zhang, Jian
Peng, Yang You
- Abstract summary: We propose FastFold, a highly efficient implementation of the protein structure prediction model for training and inference.
FastFold includes a series of GPU optimizations based on a thorough analysis of AlphaFold's performance.
Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup for long-sequence inference.
- Score: 11.847436777986323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein structure prediction is an important method for understanding gene
translation and protein function in the domain of structural biology. AlphaFold
introduced the Transformer model to the field of protein structure prediction
with atomic accuracy. However, training and inference of the AlphaFold model
are time-consuming and expensive because of the special performance
characteristics and huge memory consumption. In this paper, we propose
FastFold, a highly efficient implementation of the protein structure prediction
model for training and inference. FastFold includes a series of GPU
optimizations based on a thorough analysis of AlphaFold's performance.
Meanwhile, with Dynamic Axial Parallelism and Duality Async Operation, FastFold
achieves high model parallelism scaling efficiency, surpassing existing popular
model parallelism techniques. Experimental results show that FastFold reduces
overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup
for long-sequence inference. Furthermore, we scaled FastFold to 512 GPUs and
achieved an aggregate of 6.02 PetaFLOPs with 90.1% parallel efficiency. The
implementation can be found at https://github.com/hpcaitech/FastFold
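The Dynamic Axial Parallelism idea described in the abstract can be sketched as follows: axial (row-wise or column-wise) attention on the pair representation only mixes elements along one axis, so the tensor can be sharded along the other axis and each device can run its attention independently, with an all-to-all re-sharding the tensor between the row and column steps. Below is a minimal single-process NumPy sketch of the row-attention case; the shapes, names, and toy single-head attention are illustrative assumptions, not FastFold's actual API.

```python
import numpy as np

np.random.seed(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def row_attention(z):
    # Toy single-head self-attention along axis 1: elements within a
    # row attend to each other; rows never interact.
    scores = np.einsum('ijd,ikd->ijk', z, z) / np.sqrt(z.shape[-1])
    return np.einsum('ijk,ikd->ijd', softmax(scores), z)

n_dev, N, d = 4, 8, 16          # 4 simulated devices, toy pair tensor
z = np.random.randn(N, N, d)    # pair representation [N, N, d]

# Replicated reference: row attention on the full tensor.
ref = row_attention(z)

# Dynamic Axial Parallelism (sketch): shard the rows across devices.
# Because row attention never mixes different rows, each shard can be
# processed independently with no communication.
shards = np.split(z, n_dev, axis=0)
out = np.concatenate([row_attention(s) for s in shards], axis=0)

assert np.allclose(out, ref)

# Before the column-attention step (which mixes along axis 0), an
# all-to-all would re-shard the tensor along columns instead, so that
# the attended axis is again local to each device.
```

The sketch checks that the sharded computation reproduces the replicated one exactly, which is the invariant that makes this form of model parallelism correct; the communication cost is confined to the all-to-all between axes.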
Related papers
- PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners [65.93130697098658]
This paper proposes PredFormer, a pure transformer-based framework for predictive learning.
With its recurrent-free, transformer-based design, PredFormer is both simple and efficient.
Experiments on synthetic and real-world datasets demonstrate that PredFormer achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-07T03:52:06Z)
- Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensembles generation.
AlphaFlow-Lit performs on-par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z)
- ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours [4.886207598730398]
We conduct a comprehensive analysis of the AlphaFold training procedure based on OpenFold.
We identify inefficient communication and overhead-dominated computation as the key factors that prevented AlphaFold from scaling effectively.
We introduce ScaleFold, a systematic training method that incorporates optimizations specifically for these factors.
arXiv Detail & Related papers (2024-04-17T04:55:33Z)
- AlphaFold Meets Flow Matching for Generating Protein Ensembles [11.1639408863378]
We develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins.
Our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling.
Our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories.
arXiv Detail & Related papers (2024-02-07T13:44:47Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z)
- Revisiting Multi-Scale Feature Fusion for Semantic Segmentation [90.32746095413447]
In this paper, we demonstrate that neither high internal resolution nor atrous convolutions are necessary for accurate semantic segmentation.
We develop a simplified segmentation model, named ESeg, which has neither high internal resolution nor expensive atrous convolutions.
Our simple method achieves better accuracy and faster speed than prior art across multiple datasets.
arXiv Detail & Related papers (2022-03-23T19:14:11Z)
- Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning [14.164984597158501]
Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT.
We present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence information.
We conduct extensive experiments showing that Shatter achieves better performance than BERT.
arXiv Detail & Related papers (2021-08-30T07:42:12Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
- Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best BERT model structure for a given computation budget to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.