PaReprop: Fast Parallelized Reversible Backpropagation
- URL: http://arxiv.org/abs/2306.09342v1
- Date: Thu, 15 Jun 2023 17:59:32 GMT
- Title: PaReprop: Fast Parallelized Reversible Backpropagation
- Authors: Tyler Zhu and Karttikeya Mangalam
- Abstract summary: Reversible transformers have been introduced as an exciting new method for extremely memory-efficient training.
They come with an additional computation overhead of activation re-computation in the backpropagation phase.
We present PaReprop, a fast Parallelized Reversible Backpropagation algorithm.
- Score: 6.901732343162485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing size of datasets and deep learning models has made faster and
memory-efficient training crucial. Reversible transformers have recently been
introduced as an exciting new method for extremely memory-efficient training,
but they come with an additional computation overhead of activation
re-computation in the backpropagation phase. We present PaReprop, a fast
Parallelized Reversible Backpropagation algorithm that parallelizes the
additional activation re-computation overhead in reversible training with the
gradient computation itself in the backpropagation phase. We demonstrate the
effectiveness of the proposed PaReprop algorithm through extensive benchmarking
across model families (ViT, MViT, Swin and RoBERTa), data modalities (Vision &
NLP), model sizes (from small to giant), and training batch sizes. Our
empirical results show that PaReprop achieves up to 20% higher training
throughput than vanilla reversible training, largely mitigating the theoretical
overhead of 25% lower throughput from activation recomputation in reversible
training. Project page: https://tylerzhu.com/pareprop.
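The scheduling idea can be sketched in a few lines. The snippet below is an illustrative reading of the abstract, not the released PaReprop code: it assumes reversible blocks that expose an inverse() helper for re-deriving a block's input from its output and a forward_and_grad() helper for the local gradient step (both names are hypothetical), and it overlaps the re-computation for block i-1 on a side CUDA stream with the gradient computation for block i on the main stream.

```python
import torch

def pareprop_backward(blocks, y_last, grad_y_last):
    """Backward over a stack of reversible blocks (illustrative sketch).

    Assumed per-block helpers (hypothetical names):
      block.inverse(y)              -> x   # re-derive the block input from its output
      block.forward_and_grad(x, gy) -> gx  # recompute forward with grad enabled,
                                           # accumulate parameter grads, return grad
                                           # w.r.t. the block input
    """
    grad_stream = torch.cuda.current_stream()   # main stream: gradient computation
    side_stream = torch.cuda.Stream()           # side stream: activation re-computation

    grad_y = grad_y_last

    # Re-derive the input of the last block before entering the loop.
    side_stream.wait_stream(grad_stream)
    with torch.cuda.stream(side_stream):
        x = blocks[-1].inverse(y_last)

    for i in range(len(blocks) - 1, -1, -1):
        grad_stream.wait_stream(side_stream)     # input of block i is ready
        x_i = x

        if i > 0:
            # Launch re-computation for block i-1 now; it runs concurrently
            # with the gradient computation of block i below.
            with torch.cuda.stream(side_stream):
                x = blocks[i - 1].inverse(x_i)   # x_i equals the output of block i-1

        # Gradient computation for block i on the main stream.
        grad_y = blocks[i].forward_and_grad(x_i, grad_y)

    # (A production version would also pin tensor lifetimes across streams,
    # e.g. via Tensor.record_stream.)
    return grad_y  # gradient w.r.t. the network input
```

Synchronization is kept minimal: the main stream only waits until the input of the block it is about to differentiate has been re-derived, so the re-computation cost is largely hidden behind the gradient computation.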
Related papers
- PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods at both low and high compression rates.
arXiv Detail & Related papers (2024-03-14T09:06:49Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
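A minimal sketch of the adaptation recipe above, with all module names assumed for illustration: the backbone is frozen and evaluated under torch.no_grad(), so no graph or backbone activations are kept, and only the small parallel network receives gradients.

```python
import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    """Frozen backbone + small trainable side network (illustrative sketch)."""

    def __init__(self, backbone, feat_dim, num_classes, hidden=256):
        super().__init__()
        # `backbone` is assumed to return (batch, feat_dim) features.
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)            # nothing backpropagates into the backbone
        self.side = nn.Sequential(             # lightweight parallel network
            nn.Linear(feat_dim, hidden), nn.GELU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, x):
        with torch.no_grad():                  # no graph (and no activations) kept
            feats = self.backbone(x)           # for the frozen backbone
        return self.side(feats)
```

Only the parameters of self.side go into the optimizer, which keeps the backward pass, the optimizer state, and the stored activations small.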
- PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs [22.557682089926004]
We show that updating a small subset of parameters can suffice to recover or even enhance performance after pruning.
We introduce two novel LoRA variants that, unlike standard LoRA, allow merging adapters back without compromising sparsity.
arXiv Detail & Related papers (2023-12-23T11:45:22Z)
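The PERP summary above does not spell out how its LoRA variants keep the merged weight sparse; one natural construction, shown below purely as an assumption rather than PERP's actual method, is to re-apply the pruning mask after folding the low-rank update into the base weight.

```python
import torch

def merge_lora_preserving_sparsity(W, A, B, mask):
    """Fold a LoRA update into a pruned weight without densifying it (sketch).

    W    : (out, in) pruned base weight, zeros where `mask` is False
    A    : (r, in)   LoRA down-projection
    B    : (out, r)  LoRA up-projection
    mask : (out, in) boolean pruning mask (True = kept weight)
    """
    delta = B @ A                      # dense low-rank update
    return (W + delta) * mask          # re-apply the mask so sparsity is unchanged
```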
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
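A rough sketch of where SLTT-style savings come from, assuming a simple leaky integrate-and-fire layer with a surrogate-gradient spike function (a simplified reading, not the paper's exact algorithm): detaching the membrane state between time steps restricts gradients to the spatial path at each step, so the full temporal unrolling of BPTT never has to be stored.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * (v.abs() < 0.5).float()  # boxcar surrogate derivative


def run_snn_layer(linear, x_seq, tau=0.5, threshold=1.0):
    """Run one spiking layer over T time steps with temporal gradients cut (sketch).

    x_seq: (T, batch, in_features) input spike trains.
    """
    T, batch, _ = x_seq.shape
    mem = torch.zeros(batch, linear.out_features, device=x_seq.device)
    out = []
    for x_t in x_seq:
        # Detaching `mem` cuts the gradient path through time: backprop at each
        # step only traverses the spatial connection `linear`, so activations
        # from earlier time steps need not be kept.
        mem = tau * mem.detach() + linear(x_t)
        spike = SurrogateSpike.apply(mem - threshold)
        mem = mem - threshold * spike.detach()     # soft reset, kept out of the graph
        out.append(spike)
    return torch.stack(out)                        # (T, batch, out_features)
```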
- Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study on Image Restoration [21.79667520132755]
We propose a framework to unroll the fixed-point (FP) iteration and approximate each unrolled step via Transformer blocks, called FPformer.
To fully exploit the capability of the Transformer, we apply the proposed model to image restoration with self-supervised pre-training and supervised fine-tuning.
With this training recipe, the proposed FPformer, FPRformer, and FPAformer achieve performance competitive with state-of-the-art image restoration methods and better training efficiency.
arXiv Detail & Related papers (2023-01-29T02:59:14Z)
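The unrolling pattern behind FPformer can be sketched generically (hypothetical constructor arguments, not the paper's architecture): each fixed-point iteration x_{k+1} = f(x_k) is replaced by a learnable block, and sharing weights across iterations yields the lighter variants.

```python
import torch.nn as nn

class UnrolledFixedPoint(nn.Module):
    """K fixed-point iterations, each approximated by a learnable block (sketch)."""

    def __init__(self, make_block, num_iters, share_weights=False):
        super().__init__()
        if share_weights:                      # one block reused for every iteration
            block = make_block()
            self.blocks = nn.ModuleList([block] * num_iters)
        else:                                  # a separate block per iteration
            self.blocks = nn.ModuleList([make_block() for _ in range(num_iters)])

    def forward(self, x):
        for block in self.blocks:              # x_{k+1} = f_k(x_k)
            x = block(x)
        return x

# Example (illustrative): six unrolled iterations of a Transformer encoder layer.
model = UnrolledFixedPoint(
    lambda: nn.TransformerEncoderLayer(d_model=64, nhead=4), num_iters=6
)
```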
- Online Convolutional Re-parameterization [51.97831675242173]
We present Online Convolutional Re-parameterization (OREPA), a two-stage pipeline that aims to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with state-of-the-art re-param models, OREPA reduces the training-time memory cost by about 70% and accelerates training by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
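The squeezing step in OREPA relies on convolution being linear in its kernel. Below is a minimal folding example for a toy block with a 3x3 branch, a 1x1 branch and an identity branch; this is generic structural re-parameterization shown for illustration, not OREPA's full online, normalization-aware pipeline.

```python
import torch
import torch.nn.functional as F

def fold_branches(w3x3, w1x1, channels):
    """Fold 3x3 + 1x1 + identity branches into one 3x3 kernel (sketch).

    w3x3: (C, C, 3, 3) weight of the 3x3 branch
    w1x1: (C, C, 1, 1) weight of the 1x1 branch
    """
    # Embed the 1x1 kernel into the centre of a 3x3 kernel.
    w1x1_as_3x3 = F.pad(w1x1, [1, 1, 1, 1])
    # The identity branch is a 3x3 kernel with 1 at the centre of channel i -> i.
    identity = torch.zeros(channels, channels, 3, 3)
    for c in range(channels):
        identity[c, c, 1, 1] = 1.0
    return w3x3 + w1x1_as_3x3 + identity       # convolution is linear in the kernel

# Sanity check: the folded single conv matches the multi-branch block.
C = 4
x = torch.randn(2, C, 8, 8)
w3, w1 = torch.randn(C, C, 3, 3), torch.randn(C, C, 1, 1)
multi = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1) + x
single = F.conv2d(x, fold_branches(w3, w1, C), padding=1)
assert torch.allclose(multi, single, atol=1e-4)
```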
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
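A minimal sketch of Mesa's trade for a single linear op, assuming naive per-tensor 8-bit quantization of the saved input (the actual compression scheme is more sophisticated): the forward result is computed from the exact activation, and only the compressed copy is kept for the backward pass.

```python
import torch

class MemorySavingLinear(torch.autograd.Function):
    """Linear op that keeps its input in int8 for the backward pass (sketch)."""

    @staticmethod
    def forward(ctx, x, weight):
        y = x @ weight.t()                              # exact activation used forward
        scale = x.abs().amax().clamp(min=1e-8) / 127.0  # per-tensor 8-bit scale
        ctx.save_for_backward((x / scale).round().to(torch.int8), weight, scale)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        x_q, weight, scale = ctx.saved_tensors
        x_hat = x_q.float() * scale                     # dequantized (approximate) input
        grad_x = grad_y @ weight
        grad_w = grad_y.flatten(0, -2).t() @ x_hat.flatten(0, -2)
        return grad_x, grad_w

# Usage: y = MemorySavingLinear.apply(x, weight)
```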
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
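The title points at the usual SEER recipe of freezing the image encoder early in training and thereafter storing its compact embeddings in the replay buffer instead of raw observations; the sketch below illustrates that reading with hypothetical names.

```python
import torch

def store_transition(buffer, encoder, obs, action, reward, next_obs, frozen):
    """Push a transition into the replay buffer (illustrative sketch).

    Once `frozen` is True, the encoder no longer changes, so the compact
    embedding can be stored instead of the raw observation, shrinking the
    buffer and skipping the encoder in later gradient computations.
    """
    if frozen:
        with torch.no_grad():
            obs, next_obs = encoder(obs), encoder(next_obs)
    buffer.append((obs, action, reward, next_obs))
```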
- Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training [18.27946970159625]
We propose a method for reducing the computational cost of backprop, which we name dithered backprop.
We show that our method is fully compatible with state-of-the-art training methods that reduce the bit-precision of training down to 8 bits.
arXiv Detail & Related papers (2020-04-09T17:59:26Z)
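The name refers to dithered (stochastic) quantization of the signals flowing backward. The sketch below shows one such quantizer attached to an activation's gradient via a hook; the bit-width and placement are assumptions, not the paper's exact recipe.

```python
import torch

def dithered_quantize(g, num_bits=8):
    """Stochastically round a gradient onto a coarse grid (illustrative sketch).

    Uniform dither before rounding keeps the quantizer unbiased in expectation,
    while most small entries land on exact zero, so the backward signal becomes
    sparse as well as low-precision.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = g.abs().amax().clamp(min=1e-12) / qmax
    noise = torch.rand_like(g) - 0.5                   # dither in [-0.5, 0.5)
    return torch.clamp((g / scale + noise).round(), -qmax, qmax) * scale

# Example: dither the gradient flowing into a layer's output during backward.
layer = torch.nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)
y = layer(x)
y.register_hook(dithered_quantize)   # applied to dL/dy on the way back
y.sum().backward()
```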