torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
- URL: http://arxiv.org/abs/2004.09910v1
- Date: Tue, 21 Apr 2020 11:27:00 GMT
- Title: torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
- Authors: Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon
Yoon, Ildoo Kim, Sungbin Lim, Sungwoong Kim
- Abstract summary: We design and implement a ready-to-use library in PyTorch for performing micro-batch pipeline parallelism with checkpointing, as proposed by GPipe.
We show that each component is necessary to fully benefit from pipeline parallelism in such an environment, and demonstrate the efficiency of the library.
- Score: 19.024035785367044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design and implement a ready-to-use library in PyTorch for performing
micro-batch pipeline parallelism with checkpointing as proposed by GPipe (Huang et
al., 2019). In particular, we develop a set of design components to enable
pipeline-parallel gradient computation in PyTorch's define-by-run and eager
execution environment. We show that each component is necessary to fully
benefit from pipeline parallelism in such an environment, and demonstrate the
efficiency of the library by applying it to various network architectures
including AmoebaNet-D and U-Net. Our library is available at
https://github.com/kakaobrain/torchgpipe .
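Since the library is described as ready-to-use, a minimal sketch of its documented usage may help. The layer sizes, balance, and chunk count below are illustrative choices, not values from the paper, and at least two CUDA devices are assumed to be visible:

```python
import torch
from torch import nn
from torchgpipe import GPipe

# torchgpipe expects the model as an nn.Sequential so it can be split
# into consecutive partitions, one per device.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1024),
)

# balance=[3, 2] places the first three child modules on the first device
# and the remaining two on the second; chunks=8 splits each mini-batch
# into 8 micro-batches that flow through the pipeline concurrently.
# checkpoint='except_last' trades compute for memory by recomputing
# activations during backward for all but the last micro-batch.
model = GPipe(model, balance=[3, 2], chunks=8, checkpoint='except_last')

# Inputs are expected on the first partition's device; outputs arrive
# on the last partition's device.
x = torch.rand(64, 1024, device=model.devices[0])
y = model(x)
y.norm().backward()
```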
Related papers
- Pipeline Parallelism with Controllable Memory [6.135123843073223]
We show that almost all existing pipeline schedules are memory inefficient.
We introduce a family of memory efficient building blocks with controllable activation memory.
We can achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B.
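For context, the "bubble" is the idle ramp-up and drain time of a synchronous pipeline schedule: with p stages and m micro-batches, a GPipe-style schedule leaves a fraction (p - 1)/(m + p - 1) of device time idle. A back-of-the-envelope sketch using this standard analysis (the stage and micro-batch counts are hypothetical, not from the paper):

```python
def gpipe_bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction of device time in a GPipe-style fill/drain schedule."""
    return (stages - 1) / (micro_batches + stages - 1)

# More micro-batches shrink the bubble but raise activation memory,
# which is exactly the trade-off controllable-memory schedules target.
for m in (4, 8, 32):
    print(f"p=4, m={m}: bubble = {gpipe_bubble_fraction(4, m):.2%}")
```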
arXiv Detail & Related papers (2024-05-24T08:54:36Z)
- torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address the need for a common, extensible GFlowNet implementation.
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)
- Deep Pipeline Embeddings for AutoML [11.168121941015015]
AutoML is a promising direction for democratizing AI by automatically deploying Machine Learning systems with minimal human expertise.
Existing Pipeline Optimization techniques fail to explore deep interactions between pipeline stages/components.
This paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
arXiv Detail & Related papers (2023-05-23T12:40:38Z)
- Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism [91.9372563527801]
Existing MoE models suffer from tremendous intra-node and inter-node communication overhead.
We propose a novel MoE architecture called Pipeline MoE (PPMoE) to tackle them.
PPMoE builds expert parallelism on top of tensor parallelism and replaces the communication-intensive all-to-all dispatch and gather operations.
arXiv Detail & Related papers (2023-04-22T14:09:14Z)
- Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow [50.691232400959656]
Trieste is an open-source Python package for Bayesian optimization and active learning.
Our library enables the plug-and-play of popular models within sequential decision-making loops.
arXiv Detail & Related papers (2023-02-16T17:21:49Z)
- Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch [97.03321382630975]
Continual Inference is a Python library for implementing Continual Inference Networks (CINs) in PyTorch.
We offer a comprehensive introduction to CINs and their implementation in practice, and provide best practices and code examples for composing complex modules in modern Deep Learning.
arXiv Detail & Related papers (2022-04-07T13:03:09Z)
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
- BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training [9.551339069298011]
BaPipe is a pipeline parallelism training framework for distributed deep learning.
It automatically explores pipeline parallelism training methods and balanced partition strategies for distributed training.
BaPipe provides up to 3.2x speedup and 4x memory reduction in various platforms.
arXiv Detail & Related papers (2020-12-23T08:57:39Z)
- Fully Convolutional Networks for Panoptic Segmentation [91.84686839549488]
We present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.
Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline.
Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator.
arXiv Detail & Related papers (2020-12-01T18:31:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.