OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
- URL: http://arxiv.org/abs/2110.15032v2
- Date: Fri, 29 Oct 2021 02:33:23 GMT
- Title: OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
- Authors: Jinhui Yuan and Xinqi Li and Cheng Cheng and Juncheng Liu and Ran Guo
and Shenghang Cai and Chi Yao and Fei Yang and Xiaodong Yi and Chuan Wu and
Haoran Zhang and Jie Zhao
- Abstract summary: OneFlow is a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
SBP enables much easier programming of data parallelism and model parallelism than existing frameworks.
OneFlow outperforms many well-known customized libraries built on top of the state-of-the-art frameworks.
- Score: 17.798586916628174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning frameworks such as TensorFlow and PyTorch provide a productive
interface for expressing and training a deep neural network (DNN) model on a
single device or using data parallelism. Still, they may not be flexible or
efficient enough in training emerging large models on distributed devices,
which require more sophisticated parallelism beyond data parallelism. Plugins
or wrappers have been developed to strengthen these frameworks for model or
pipeline parallelism, but they complicate the usage and implementation of
distributed deep learning. Aiming at a simple, neat redesign of distributed
deep learning frameworks for various parallelism paradigms, we present OneFlow,
a novel distributed training framework based on an SBP (split, broadcast and
partial-value) abstraction and the actor model. SBP enables much easier
programming of data parallelism and model parallelism than existing frameworks,
and the actor model provides a succinct runtime mechanism to manage the complex
dependencies imposed by resource constraints, data movement and computation in
distributed deep learning. We demonstrate the general applicability and
efficiency of OneFlow for training various large DNN models with case studies
and extensive experiments. The results show that OneFlow outperforms many
well-known customized libraries built on top of the state-of-the-art
frameworks. The code of OneFlow is available at:
https://github.com/Oneflow-Inc/oneflow.
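To make the SBP abstraction concrete, the sketch below shows how data parallelism and model parallelism can be expressed by annotating tensors with placement and SBP signatures. This is a minimal illustration rather than the paper's own example: the placement / sbp / to_global names follow OneFlow's publicly documented global-tensor Python API, and exact spellings (e.g. to_global vs. the older to_consistent) may differ across OneFlow versions.

```python
# Minimal sketch of SBP (split / broadcast / partial-value) using OneFlow's
# global-tensor API. Assumes two GPUs and a launch such as
#   python3 -m oneflow.distributed.launch --nproc_per_node 2 sbp_sketch.py
# (API names follow recent OneFlow releases; details may vary by version.)
import oneflow as flow

# Both ranks participate in holding the global tensors.
placement = flow.placement("cuda", ranks=[0, 1])

# Data parallelism: split the batch dimension, broadcast (replicate) the weight.
x = flow.randn(8, 4).to_global(placement=placement, sbp=flow.sbp.split(0))
w = flow.randn(4, 3).to_global(placement=placement, sbp=flow.sbp.broadcast)
y = flow.matmul(x, w)   # inferred SBP: split(0) -- each rank computes its batch shard

# Model parallelism: broadcast the activation, split the weight along columns.
x2 = flow.randn(8, 4).to_global(placement=placement, sbp=flow.sbp.broadcast)
w2 = flow.randn(4, 3).to_global(placement=placement, sbp=flow.sbp.split(1))
y2 = flow.matmul(x2, w2)  # inferred SBP: split(1) -- each rank holds a column shard

# Splitting the reduced dimension on both operands instead would yield a
# partial-value (flow.sbp.partial_sum) result, the "P" in SBP, which an
# all-reduce turns back into a broadcast tensor.
```

In this style, switching between data and model parallelism amounts to changing the SBP signatures of the tensors; according to the paper, the framework derives the required data movement between devices from those signatures rather than requiring the user to program communication explicitly.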
Related papers
- Transformer Architecture for NetsDB [0.0]
We create an end-to-end implementation of a transformer for deep learning model serving in NetsDB.
We load the weights from our model for distributed processing, deployment, and efficient inferencing.
arXiv Detail & Related papers (2024-05-08T04:38:36Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- Does compressing activations help model parallel training? [64.59298055364336]
We present the first empirical study on the effectiveness of compression methods for model parallelism.
We implement and evaluate three common classes of compression algorithms.
We evaluate these methods across more than 160 settings and 8 popular datasets.
arXiv Detail & Related papers (2023-01-06T18:58:09Z)
- Decentralized Training of Foundation Models in Heterogeneous Environments [77.47261769795992]
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive.
We present the first study of training large foundation models with model parallelism in a decentralized regime over a heterogeneous network.
arXiv Detail & Related papers (2022-06-02T20:19:51Z)
- Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training [10.223511922625065]
We present Amazon SageMaker model parallelism, a software library that integrates with PyTorch.
It enables easy training of large models using model parallelism and other memory-saving features.
We evaluate performance over GPT-3, RoBERTa, BERT, and neural collaborative filtering.
arXiv Detail & Related papers (2021-11-10T22:30:21Z)
- Model-Parallel Model Selection for Deep Learning Systems [0.0]
Inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users.
Many ML practitioners have turned to model parallelism as a method of distributing the computational requirements across several devices.
We propose a new form of "shard parallelism" combining task and model parallelism, then package it into a framework we name Hydra.
arXiv Detail & Related papers (2021-07-14T03:20:37Z)
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
- Towards a Scalable and Distributed Infrastructure for Deep Learning Applications [4.4979162962108905]
Phylanx offers a productivity-oriented execution tree that can be executed on multiple nodes.
We present Phylanx, which has the potential to alleviate shortcomings in distributed deep learning frameworks.
arXiv Detail & Related papers (2020-10-06T20:38:47Z)