ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours
- URL: http://arxiv.org/abs/2404.11068v1
- Date: Wed, 17 Apr 2024 04:55:33 GMT
- Title: ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours
- Authors: Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch,
- Abstract summary: We conduct a comprehensive analysis of the AlphaFold training procedure based on OpenFold.
We identify inefficient communication and overhead-dominated computation as the key factors that prevent AlphaFold training from scaling effectively.
We introduce ScaleFold, a systematic training method that incorporates optimizations specifically targeting these factors.
- Score: 4.886207598730398
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. The AlphaFold training procedure is prohibitively time-consuming and gains diminishing returns from scaling to more compute resources. In this work, we conducted a comprehensive analysis of the AlphaFold training procedure based on OpenFold and identified that inefficient communications and overhead-dominated computations were the key factors that prevented the AlphaFold training from scaling effectively. We introduced ScaleFold, a systematic training method that incorporates optimizations specifically for these factors. ScaleFold successfully scaled the AlphaFold training to 2080 NVIDIA H100 GPUs with high resource utilization. In the MLPerf HPC v3.0 benchmark, ScaleFold finished the OpenFold benchmark in 7.51 minutes, showing an over $6\times$ speedup over the baseline. For training the AlphaFold model from scratch, ScaleFold completed the pretraining in 10 hours, a significant improvement over the seven days required by the original AlphaFold pretraining baseline.
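The abstract does not spell out ScaleFold's individual optimizations, so the sketch below is only an illustration, under assumptions, of generic PyTorch remedies for the two bottleneck classes it names: overlapping gradient all-reduce with backward compute via DDP bucketing, and removing per-kernel CPU launch overhead by graph-capturing the forward pass. All settings and helper names are hypothetical and are not ScaleFold's code.

```python
# Hedged sketch, not ScaleFold's implementation: generic PyTorch remedies for
# the two bottlenecks the abstract names (inefficient communication and
# overhead-dominated computation).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed() -> None:
    # One process per GPU; NCCL backend for GPU collectives.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

def build_ddp_model(model: torch.nn.Module) -> DDP:
    # Fewer, larger all-reduce buckets and bucket-view gradients reduce
    # communication overhead and let all-reduce overlap with the backward pass.
    return DDP(model.cuda(), bucket_cap_mb=100, gradient_as_bucket_view=True)

def graph_capture_forward(module: torch.nn.Module, sample: torch.Tensor):
    # Replay the many small per-layer kernels from a single CUDA graph, so the
    # CPU issues one launch per step instead of thousands.
    return torch.cuda.make_graphed_callables(module, (sample,))
```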
Related papers
- Improving AlphaFlow for Efficient Protein Ensembles Generation [64.10918970280603]
We propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensemble generation.
AlphaFlow-Lit performs on-par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times.
arXiv Detail & Related papers (2024-07-08T13:36:43Z)
- Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations [62.132347451049455]
Scale has become a main ingredient in obtaining strong machine learning models.
In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule.
We show that weight averaging yields improved performance along the training trajectory, without additional training costs, across different scales.
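The paper's exact averaging scheme is not given in this summary; the snippet below is a minimal sketch of trajectory weight averaging in PyTorch, assuming an equal-weight running mean updated every few optimizer steps.

```python
# Hedged sketch: a running average of model weights along the training
# trajectory, in the spirit of the weight averaging mentioned above
# (the paper's exact scheme is not reproduced here).
import copy
import torch

class TrajectoryAverage:
    def __init__(self, model: torch.nn.Module):
        self.avg_model = copy.deepcopy(model)
        self.n = 0

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> None:
        # Equal-weight running mean: avg <- avg + (w - avg) / (n + 1)
        self.n += 1
        for p_avg, p in zip(self.avg_model.parameters(), model.parameters()):
            p_avg.add_(p.detach() - p_avg, alpha=1.0 / self.n)

# Usage: call avg.update(model) every k optimizer steps and evaluate
# avg.avg_model; the live model keeps training unchanged, so no extra
# training cost is incurred beyond the copy.
```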
arXiv Detail & Related papers (2024-05-28T17:33:54Z)
- Breaking MLPerf Training: A Case Study on Optimizing BERT [9.486916730173661]
We present novel approaches for fast large-scale training of the BERT model.
Load balancing is imperative in distributed BERT training since its training data are characterized by samples of various lengths.
We propose two new ideas, (1) local presorting based on dataset stratification for load balancing and (2) bucket-wise gradient clipping before allreduce.
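As a rough illustration of idea (1), the sketch below groups variable-length samples into length-sorted batches so that every rank receives roughly equal work; it is an assumption-laden stand-in, not the paper's presorting scheme, and the bucket-wise clipping of idea (2) is not reproduced.

```python
# Hedged sketch of length-based presorting for load balancing: batches of
# similarly long samples keep per-rank work balanced and padding small.
from typing import List, Sequence

def length_balanced_batches(
    lengths: Sequence[int], batch_size: int
) -> List[List[int]]:
    # Sort sample indices by sequence length, then cut into contiguous
    # batches; shuffle the batch order each epoch to retain SGD noise.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i : i + batch_size] for i in range(0, len(order), batch_size)]

# Example: lengths [12, 512, 40, 500] with batch_size=2 pairs the two short
# samples together and the two long samples together.
```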
arXiv Detail & Related papers (2024-02-04T11:12:17Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
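A minimal sketch of the up-scaling recipe described above, assuming EFT composes next-token distributions as the large base model plus the small model's fine-tuning delta in log-probability space; the function and tensor names are illustrative, not the paper's API.

```python
# Hedged sketch of LM up-scaling: add the small model's fine-tuning "delta"
# in log-probability space to a large pretrained model, then sample.
import torch
import torch.nn.functional as F

@torch.no_grad()
def eft_next_token_logits(
    large_base_logits: torch.Tensor,   # [vocab] from the large pretrained model
    small_base_logits: torch.Tensor,   # [vocab] from the small pretrained model
    small_tuned_logits: torch.Tensor,  # [vocab] from the small fine-tuned model
) -> torch.Tensor:
    # log p_eft ∝ log p_large_base + (log p_small_tuned - log p_small_base)
    delta = F.log_softmax(small_tuned_logits, dim=-1) - F.log_softmax(
        small_base_logits, dim=-1
    )
    return F.log_softmax(large_base_logits, dim=-1) + delta

# Sampling: probs = torch.softmax(eft_next_token_logits(...), dim=-1)
#           next_token = torch.multinomial(probs, 1)
```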
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Stable and low-precision training for large-scale vision-language models [108.62077651227607]
We introduce new methods for accelerating and stabilizing training for large language-vision models.
For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25%.
For stability, we analyze loss spikes and find they consistently occur 1-8 iterations after the squared gradients become under-estimated.
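SwitchBack itself is not reproduced here; the sketch below only illustrates the int8 forward-path arithmetic such a layer relies on (row-wise activation scales, tensor-wise weight scale, integer matmul emulated in float), with the higher-precision backward pass omitted.

```python
# Hedged sketch of int8 quantized matmul for a linear layer's forward pass,
# in the spirit of the summary above; not the actual SwitchBack layer.
import torch

def int8_linear_forward(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Row-wise scale for activations, tensor-wise scale for weights.
    x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    w_scale = weight.abs().amax().clamp(min=1e-8) / 127.0
    x_q = torch.clamp((x / x_scale).round(), -127, 127)
    w_q = torch.clamp((weight / w_scale).round(), -127, 127)
    # Real kernels run this matmul in int8; here it is emulated in float.
    y_q = x_q @ w_q.t()
    return y_q * x_scale * w_scale   # dequantize back to the float scale
```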
arXiv Detail & Related papers (2023-04-25T17:38:18Z)
- HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [19.331098164638544]
We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption.
Compared with the original AlphaFold2 and OpenFold, HelixFold needs only 7.5 days to complete the full end-to-end training.
HelixFold's accuracy could be on par with AlphaFold2 on the CASP14 and CAMEO datasets.
arXiv Detail & Related papers (2022-07-12T11:43:50Z)
- FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [11.847436777986323]
We propose FastFold, a highly efficient implementation of the protein structure prediction model for training and inference.
FastFold includes a series of GPU optimizations based on a thorough analysis of AlphaFold's performance.
Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5-9.5X speedup for long-sequence inference.
arXiv Detail & Related papers (2022-03-02T03:59:51Z)
- Fast Certified Robust Training via Better Initialization and Shorter Warmup [95.81628508228623]
We propose a new IBP initialization and principled regularizers during the warmup stage to stabilize certified bounds.
We find that batch normalization (BN) is a crucial architectural element to build best-performing networks for certified training.
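For context on what the warmup stabilizes, the sketch below shows plain interval bound propagation (IBP) through a single linear layer; the paper's proposed initialization and regularizers are not reproduced.

```python
# Hedged sketch of interval bound propagation (IBP) through one linear layer:
# the bound-computation primitive that certified training builds on.
import torch

def ibp_linear(lower: torch.Tensor, upper: torch.Tensor,
               weight: torch.Tensor, bias: torch.Tensor):
    # Propagate an axis-aligned box [lower, upper] through y = W x + b.
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    out_center = center @ weight.t() + bias
    out_radius = radius @ weight.t().abs()
    return out_center - out_radius, out_center + out_radius
```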
arXiv Detail & Related papers (2021-03-31T17:58:58Z)
- Large-Scale Training System for 100-Million Classification at Alibaba [43.58719630882661]
Extreme classification has become an essential topic for deep learning.
It is very challenging to train a deep model with millions of classes due to the memory and computation explosion in the last output layer.
First, we build a hybrid parallel training framework to make the training process feasible.
Second, we propose a novel softmax variation named KNN softmax, which reduces both the GPU memory consumption and computation costs.
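The KNN softmax details are not given in this summary; the sketch below shows the general idea of a candidate-restricted softmax, assuming the k candidate classes would in practice come from an approximate nearest-neighbor lookup rather than the full logit matrix computed here.

```python
# Hedged sketch of a candidate-restricted softmax in the spirit of the
# KNN softmax mentioned above: score only k candidate classes per sample
# instead of all N classes. Not the paper's implementation.
import torch
import torch.nn.functional as F

def knn_softmax_loss(feat: torch.Tensor, class_weights: torch.Tensor,
                     target: torch.Tensor, k: int = 1000) -> torch.Tensor:
    # feat: [B, d], class_weights: [N, d], target: [B], with k < N.
    logits_full = feat @ class_weights.t()          # in practice: approximate NN lookup
    topk_idx = logits_full.topk(k, dim=-1).indices  # candidate classes per sample
    # Append the true class so it is always among the candidates
    # (an occasional duplicate is harmless for this illustration).
    cand = torch.cat([topk_idx, target.unsqueeze(1)], dim=1)
    logits = torch.gather(logits_full, 1, cand)
    labels = torch.full_like(target, cand.size(1) - 1)  # target sits in the last slot
    return F.cross_entropy(logits, labels)
```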
arXiv Detail & Related papers (2021-02-09T06:53:31Z)
- EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets [106.79387235014379]
EarlyBERT is a general computationally-efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models.
We are the first to identify structured winning tickets in the early stage of BERT training, and use them for efficient training.
EarlyBERT easily achieves comparable performance to standard BERT with 35-45% less training time.
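The summary above only names the idea; the sketch below is a hypothetical illustration of selecting a structured "early-bird" ticket by ranking learned head-importance coefficients after a short warm-up, not EarlyBERT's actual procedure.

```python
# Hedged sketch: pick a structured ticket early in training by keeping only
# the attention heads with the largest learned importance coefficients.
# The coefficient tensor is an illustrative assumption, not EarlyBERT's API.
import torch

def select_early_ticket(head_coefficients: torch.Tensor,
                        keep_ratio: float = 0.6) -> torch.Tensor:
    # head_coefficients: [num_heads] gates trained for a short warm-up;
    # small magnitude ~ prunable head.
    k = max(1, int(keep_ratio * head_coefficients.numel()))
    keep = head_coefficients.abs().topk(k).indices
    mask = torch.zeros_like(head_coefficients, dtype=torch.bool)
    mask[keep] = True
    return mask   # apply to drop pruned heads, then continue training
```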
arXiv Detail & Related papers (2020-12-31T20:38:20Z)