Amazon SageMaker Model Parallelism: A General and Flexible Framework for
Large Model Training
- URL: http://arxiv.org/abs/2111.05972v1
- Date: Wed, 10 Nov 2021 22:30:21 GMT
- Title: Amazon SageMaker Model Parallelism: A General and Flexible Framework for
Large Model Training
- Authors: Can Karakus, Rahul Huilgol, Fei Wu, Anirudh Subramanian, Cade Daniel,
Derya Cavdar, Teng Xu, Haohan Chen, Arash Rahnama, Luis Quintela
- Abstract summary: We present Amazon SageMaker model parallelism, a software library that integrates with PyTorch.
It enables easy training of large models using model parallelism and other memory-saving features.
We evaluate performance over GPT-3, RoBERTa, BERT, and neural collaborative filtering.
- Score: 10.223511922625065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With deep learning models rapidly growing in size, systems-level solutions
for large-model training are required. We present Amazon SageMaker model
parallelism, a software library that integrates with PyTorch, and enables easy
training of large models using model parallelism and other memory-saving
features. In contrast to existing solutions, the implementation of the
SageMaker library is much more generic and flexible, in that it can
automatically partition and run pipeline parallelism over arbitrary model
architectures with minimal code change, and also offers a general and
extensible framework for tensor parallelism, which supports a wider range of
use cases, and is modular enough to be easily applied to new training scripts.
The library also preserves the native PyTorch user experience to a much larger
degree, supporting module re-use and dynamic graphs, while giving the user full
control over the details of the training step. We evaluate performance over
GPT-3, RoBERTa, BERT, and neural collaborative filtering, and demonstrate
competitive performance over existing solutions.
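The abstract stresses that the library preserves the native PyTorch workflow and requires only minimal code change. As a rough illustration of what such an adapted training script can look like, the sketch below uses the entry points documented for smdistributed.modelparallel.torch (smp.init, smp.DistributedModel, smp.DistributedOptimizer, the @smp.step decorator, and model.backward); these names are not taken from the abstract itself, and exact signatures may vary by library version, so treat this as illustrative rather than authoritative.
```python
# Hypothetical sketch: adapting a plain PyTorch training loop to the
# SageMaker model parallelism library. Names follow the library's documented
# interface but may differ by version.
import torch
import torch.nn as nn
import smdistributed.modelparallel.torch as smp

smp.init()  # initialize the model-parallel runtime (reads the partition config)

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)

# Wrapping the model lets the library partition it across devices
# automatically; the rest of the script stays ordinary PyTorch.
model = smp.DistributedModel(model)
optimizer = smp.DistributedOptimizer(torch.optim.Adam(model.parameters(), lr=1e-4))
loss_fn = nn.CrossEntropyLoss()

@smp.step  # marks the forward/backward computation to be pipelined over microbatches
def train_step(model, inputs, targets):
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    model.backward(loss)  # replaces loss.backward() so the runtime can schedule it
    return loss

def train(loader, epochs=1):
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss_mb = train_step(model, inputs, targets)  # per-microbatch losses
            optimizer.step()
```
The only deviations from a plain PyTorch loop are the two wrappers and the decorated step function, which is what allows the runtime to partition the model and pipeline microbatches without the user specifying the partition by hand.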
Related papers
- FlexModel: A Framework for Interpretability of Distributed Large
Language Models [0.0]
We present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations.
The library is compatible with existing model distribution libraries and encapsulates PyTorch models.
It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals.
arXiv Detail & Related papers (2023-12-05T21:19:33Z)
- CoLLiE: Collaborative Training of Large Language Models in an Efficient Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z)
- Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism [91.9372563527801]
Existing MoE models suffer from tremendous inner-node and inter-node communication overhead.
We propose a novel MoE architecture called Pipeline MoE (PPMoE) to tackle them.
PPMoE builds expert parallelism on top of tensor parallelism and replaces the communication-intensive all-to-all dispatching and gathering.
arXiv Detail & Related papers (2023-04-22T14:09:14Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- Does compressing activations help model parallel training? [64.59298055364336]
We present the first empirical study on the effectiveness of compression methods for model parallelism.
We implement and evaluate three common classes of compression algorithms.
We evaluate these methods across more than 160 settings and 8 popular datasets; a conceptual sketch of activation compression is given after this list.
arXiv Detail & Related papers (2023-01-06T18:58:09Z)
- OneFlow: Redesign the Distributed Deep Learning Framework from Scratch [17.798586916628174]
OneFlow is a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
SBP enables much easier programming of data parallelism and model parallelism than existing frameworks.
OneFlow outperforms many well-known customized libraries built on top of the state-of-the-art frameworks.
arXiv Detail & Related papers (2021-10-28T11:32:14Z)
- Model-Parallel Model Selection for Deep Learning Systems [0.0]
Inefficiencies in machine learning (ML) training prevent most users from making practical use of state-of-the-art models.
Many ML practitioners have turned to model parallelism as a method of distributing the computational requirements across several devices.
We propose a new form of "shard parallelism" combining task and model parallelism, then package it into a framework we name Hydra.
arXiv Detail & Related papers (2021-07-14T03:20:37Z)
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
- Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data-parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turing-NLG; a conceptual sketch of activation recomputation follows this list.
arXiv Detail & Related papers (2020-08-26T07:24:34Z)
- FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints [6.730473762151365]
The integration of artificial intelligence capabilities into modern software systems is increasingly simplified through cloud-based services and representational state transfer (REST) architectures.
Insufficient information about underlying model provenance and a lack of control over model evolution impede wider adoption of these services in operational environments with strict security requirements.
arXiv Detail & Related papers (2020-02-29T18:51:09Z)
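Referenced from the activation-compression entry above: the following is a conceptual, self-contained sketch of one simple compression scheme (top-k sparsification plus fp16 storage) that could be applied to an activation tensor before it crosses a model-parallel boundary. It illustrates the general idea only and is not code or an algorithm from the cited paper.
```python
# Conceptual illustration of activation compression (not from the cited paper).
import math
import torch

def compress_activation(x: torch.Tensor, keep_ratio: float = 0.1):
    """Keep only the largest-magnitude entries of x, stored in fp16."""
    flat = x.flatten()
    k = max(1, int(flat.numel() * keep_ratio))
    _, indices = torch.topk(flat.abs(), k)          # positions of the largest entries
    return flat[indices].half(), indices, x.shape   # (values, positions, original shape)

def decompress_activation(values, indices, shape, dtype=torch.float32):
    """Rebuild a dense tensor from the compressed representation."""
    flat = torch.zeros(math.prod(shape), dtype=dtype, device=values.device)
    flat[indices] = values.to(dtype)
    return flat.reshape(shape)

# Example: compress the output of one stage, reconstruct it on the next.
x = torch.randn(32, 1024)
payload = compress_activation(x, keep_ratio=0.05)
x_hat = decompress_activation(*payload)
```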
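Referenced from the KARMA entry above: the redundant-recomputation half of that strategy can be illustrated with PyTorch's built-in activation checkpointing, which discards intermediate activations during the forward pass and recomputes them during backward. The out-of-core (host-memory offloading) half is not shown, and this sketch does not reproduce KARMA's actual scheduling; the use_reentrant flag assumes a reasonably recent PyTorch release.
```python
# Conceptual illustration of activation recomputation via checkpointing.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """A deep MLP whose per-block activations are recomputed, not stored."""
    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are freed after the forward pass and
            # recomputed during backward, trading extra compute for memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(16, 1024, requires_grad=True)
model(x).sum().backward()
```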