Pipeline Parallelism with Controllable Memory
- URL: http://arxiv.org/abs/2405.15362v4
- Date: Mon, 04 Nov 2024 03:18:13 GMT
- Title: Pipeline Parallelism with Controllable Memory
- Authors: Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin
- Abstract summary: We show that almost all existing pipeline schedules are memory inefficient.
We introduce a family of memory efficient building blocks with controllable activation memory.
We can achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B.
- Score: 6.135123843073223
- Abstract: Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block, and show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by these observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models. The implementation is open-sourced at https://github.com/sail-sg/zero-bubble-pipeline-parallelism.
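To make the building-block view concrete, below is a minimal Python sketch (illustrative only, not code from the paper or its repository) of the key observation: a stage's peak activation memory equals the maximum number of micro-batches whose forward has run but whose backward has not, so the lifespan of the repeated building block bounds the peak. Applying it to a toy 1F1B schedule recovers the familiar result that the first of p stages holds p micro-batches' activations.

```python
# Minimal sketch, not the paper's implementation: an activation is allocated by a
# forward ('F') and freed by the matching backward ('B'), so a stage's peak
# activation memory is the maximum number of forwards whose backward has not yet run.

def peak_activations(stage_schedule):
    """stage_schedule: ordered list of ('F', microbatch_id) or ('B', microbatch_id)."""
    alive, peak = set(), 0
    for op, mb in stage_schedule:
        if op == 'F':
            alive.add(mb)
        else:
            alive.discard(mb)
        peak = max(peak, len(alive))
    return peak

# Toy 1F1B schedule for the first of 4 stages with 8 micro-batches:
# warm up with 4 forwards, alternate one backward / one forward, then drain.
stages, micro = 4, 8
warmup = [('F', i) for i in range(stages)]
steady = [ev for i in range(stages, micro) for ev in (('B', i - stages), ('F', i))]
drain = [('B', i) for i in range(micro - stages, micro)]
print(peak_activations(warmup + steady + drain))  # -> 4 (= number of pipeline stages)
```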
Related papers
- 2BP: 2-Stage Backpropagation [0.0]
This paper introduces 2-stage backpropagation (2BP).
By splitting the backward propagation step into two separate stages, we can reduce idle compute time.
Using 2BP, we were able to achieve a 1.70x increase in throughput compared to traditional methods.
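The summary above only states that the backward pass is split in two, so here is a hedged sketch of the general idea for a single linear layer (shapes and variable names are mine, not 2BP's code): the gradient with respect to the input sits on the pipeline's critical path and is computed first, while the gradient with respect to the weights has no downstream consumer and can be deferred to otherwise-idle compute slots.

```python
# Illustrative only: two-stage backward for a linear layer y = x @ W.T.
import torch

x = torch.randn(32, 512)           # activations saved from the forward pass
W = torch.randn(1024, 512)         # layer weight
grad_out = torch.randn(32, 1024)   # gradient arriving from the next pipeline stage

# Backward stage 1: input gradient, on the critical path -- send upstream immediately.
grad_x = grad_out @ W              # shape (32, 512)

# Backward stage 2: weight gradient, off the critical path -- can be scheduled later.
grad_W = grad_out.T @ x            # shape (1024, 512)
```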
arXiv Detail & Related papers (2024-05-28T11:02:01Z)
- PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference [5.704297874096985]
PipeFusion partitions images into patches and distributes the model layers across multiple GPUs.
It employs a patch-level pipeline parallel strategy to orchestrate communication and computation efficiently.
arXiv Detail & Related papers (2024-05-23T11:00:07Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
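A hypothetical PyTorch sketch of that recipe (module names and sizes are illustrative, not the paper's): the pretrained backbone is frozen and only a small parallel head trained on its output features receives gradients, so nothing is backpropagated through the backbone.

```python
# Illustrative sketch: train a lightweight head on frozen backbone features.
import torch
import torch.nn as nn

# Stand-in for a frozen, pretrained backbone.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
backbone.requires_grad_(False)

# Lightweight, trainable adaptation network operating on backbone features.
adapter = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

x = torch.randn(8, 3, 224, 224)
with torch.no_grad():        # no graph is built for the backbone
    feats = backbone(x)
logits = adapter(feats)      # only the adapter participates in backpropagation
logits.sum().backward()      # stand-in loss; backbone gradients are never computed
```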
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction [37.05698088730229]
Asynchronous pipeline model parallelism with a "1F1B" (one forward, one backward) schedule generates little bubble overhead.
"1F1B" schedule inevitably leads to weight inconsistency and weight staleness issues due to the cross-training of different mini-batches.
We propose an optimizer-dependent weight prediction strategy (a.k.a. PipeOptim) for asynchronous pipeline training.
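The summary does not spell out the prediction rule, so the following is only a hedged sketch of optimizer-dependent weight prediction in the plain SGD-with-momentum case (function and parameter names are illustrative): before a forward pass, approximate the weights the optimizer will have produced by the time the matching backward and update arrive.

```python
# Illustrative sketch, not PipeOptim's implementation: replay the optimizer's update
# rule for the known number of outstanding updates ("staleness"), here SGD with
# momentum under the simplifying assumption that the velocity stays roughly constant.
def predict_weights(weights, velocity, lr, staleness):
    return [w - lr * staleness * v for w, v in zip(weights, velocity)]

# Example: the backward for this micro-batch will arrive 3 optimizer steps from now,
# so run the forward pass with the predicted (less stale) weights.
w_hat = predict_weights(weights=[1.0, -0.5], velocity=[0.2, 0.1], lr=0.01, staleness=3)
print(w_hat)  # approximately [0.994, -0.503]
```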
arXiv Detail & Related papers (2023-12-01T01:52:38Z)
- Zero Bubble Pipeline Parallelism [6.7021820542657045]
Experimental evaluations show that our method outperforms the 1F1B schedule up to 23% in throughput under a similar memory limit.
We believe our results mark a major step forward in harnessing the true potential of pipeline parallelism.
arXiv Detail & Related papers (2023-11-30T10:40:34Z)
- UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory [69.33445217944029]
Parameter-efficient transfer learning (PETL) is an effective strategy for adapting pre-trained models to downstream domains.
Recent PETL works focus on the more valuable memory-efficient characteristic.
We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT).
arXiv Detail & Related papers (2023-08-28T05:38:43Z)
- Pipe-BD: Pipelined Parallel Blockwise Distillation [7.367308544773381]
We propose Pipe-BD, a novel parallelization method for blockwise distillation.
Pipe-BD aggressively utilizes pipeline parallelism for blockwise distillation.
We implement Pipe-BD on PyTorch, and experiments reveal that Pipe-BD is effective on multiple scenarios, models, and datasets.
arXiv Detail & Related papers (2023-01-29T13:38:43Z)
- RMM: Reinforced Memory Management for Class-Incremental Learning [102.20140790771265]
Class-Incremental Learning (CIL) trains classifiers under a strict memory budget.
Existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal.
We propose a dynamic memory management strategy that is optimized for the incremental phases and different object classes.
arXiv Detail & Related papers (2023-01-14T00:07:47Z)
- BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate, fully binarized BERT that eliminates the performance bottlenecks of binarization.
Our method yields an impressive 56.3 times and 31.2 times saving on FLOPs and model size, respectively.
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
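As a hedged illustration of a Fisher-style importance score (a simplified stand-in written for this summary, not the paper's exact metric or code): attach a multiplicative gate to each channel, accumulate the squared gradient of the loss with respect to the gate over a few batches, and prune the channels with the smallest scores; coupled channels would sum the contributions flowing into their shared gate.

```python
# Illustrative sketch: first-order, Fisher-style channel importance via per-channel gates.
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)
gate = nn.Parameter(torch.ones(16))           # one multiplicative gate per output channel

importance = torch.zeros(16)
for _ in range(4):                            # a few calibration batches
    x = torch.randn(8, 3, 32, 32)
    y = conv(x) * gate.view(1, -1, 1, 1)      # gated activations
    loss = y.pow(2).mean()                    # stand-in for the task loss
    grad = torch.autograd.grad(loss, gate)[0]
    importance += grad.pow(2)                 # accumulate squared gradients

prune_order = importance.argsort()            # least important channels first
```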
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.