Stochastic Backpropagation: A Memory Efficient Strategy for Training
Video Models
- URL: http://arxiv.org/abs/2203.16755v1
- Date: Thu, 31 Mar 2022 02:24:53 GMT
- Title: Stochastic Backpropagation: A Memory Efficient Strategy for Training
Video Models
- Authors: Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei
Xia
- Abstract summary: We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos.
Experiments show that SBP can be applied to a wide range of models for video tasks, leading to up to 80.0% GPU memory saving and 10% training speedup with less than 1% accuracy drop on action recognition and temporal action detection.
- Score: 42.31924917984774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a memory efficient method, named Stochastic Backpropagation (SBP),
for training deep neural networks on videos. It is based on the finding that
gradients from an incomplete backward pass can still effectively
train the models with minimal accuracy loss, which we attribute to the high
redundancy of video. SBP keeps all forward paths but randomly and independently
removes the backward paths for each network layer in each training step. It
reduces the GPU memory cost by eliminating the need to cache activation values
corresponding to the dropped backward paths, whose amount can be controlled by
an adjustable keep-ratio. Experiments show that SBP can be applied to a wide
range of models for video tasks, leading to up to 80.0% GPU memory saving and
10% training speedup with less than 1% accuracy drop on action recognition and
temporal action detection.
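As a rough illustration of the mechanism described above, the sketch below implements a PyTorch-style frame-wise linear layer that runs the forward pass on every frame but caches activations for only a keep_ratio fraction of randomly chosen frames and propagates gradients only through them. The (batch, frames, channels) layout, the name _SBPMatmul, and the simple gradient rescaling are illustrative assumptions, not the authors' implementation.

    import torch


    class _SBPMatmul(torch.autograd.Function):
        # Forward uses every frame; backward only sees a keep_ratio fraction,
        # so only that fraction of the input activations needs to be cached.

        @staticmethod
        def forward(ctx, x, weight, keep_ratio):
            # x: (B, T, C_in) per-frame features; weight: (C_out, C_in)
            out = x @ weight.t()                      # full forward over all frames
            T = x.shape[1]
            keep = max(1, int(T * keep_ratio))
            idx = torch.randperm(T, device=x.device)[:keep]
            ctx.save_for_backward(x[:, idx].contiguous(), weight, idx)
            ctx.num_frames = T
            return out

        @staticmethod
        def backward(ctx, grad_out):
            x_kept, weight, idx = ctx.saved_tensors
            T, keep = ctx.num_frames, idx.numel()
            g_kept = grad_out[:, idx]                 # gradients only for kept frames
            scale = T / keep                          # one simple rescaling choice
            grad_w = scale * torch.einsum('bto,bti->oi', g_kept, x_kept)
            grad_x = grad_out.new_zeros(grad_out.shape[0], T, weight.shape[1])
            grad_x[:, idx] = g_kept @ weight          # dropped frames get no gradient
            return grad_x, grad_w, None


    # Toy usage: cache activations for only ~25% of the 16 frames.
    x = torch.randn(2, 16, 64, requires_grad=True)
    w = torch.randn(32, 64, requires_grad=True)
    _SBPMatmul.apply(x, w, 0.25).sum().backward()

In a deep video model, wrapping each layer this way is what yields the memory saving, since every layer's cached activations shrink in proportion to the keep-ratio.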
Related papers
- Block Selective Reprogramming for On-device Training of Vision Transformers [12.118303034660531]
We present block selective reprogramming (BSR), in which we fine-tune only a fraction of the total blocks of a pre-trained model.
Compared to the existing alternatives, our approach simultaneously reduces training memory by up to 1.4x and compute cost by up to 2x (a minimal block-freezing sketch appears after this list).
arXiv Detail & Related papers (2024-03-25T08:41:01Z)
- DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation [13.768426626459558]
We propose Dropping Backward Propagation (DropBP) to reduce computational costs and activation memory while maintaining accuracy.
DropBP randomly drops layers during backward propagation, which is essentially equivalent to training shallow submodules.
It can reduce training time by 44% with comparable accuracy to the baseline, accelerate convergence to the same perplexity by 1.5x, and enable training with a sequence length 6.2x larger on a single NVIDIA-A100 GPU (a rough sketch of the layer-dropping idea appears after this list).
arXiv Detail & Related papers (2024-02-27T14:51:11Z)
- Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization [65.33914980022303]
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
Most methods can only train on pre-extracted features without optimizing them for the localization problem.
We propose a novel end-to-end method Re2TAL, which rewires pretrained video backbones for reversible TAL.
arXiv Detail & Related papers (2022-11-25T12:17:30Z)
- An In-depth Study of Stochastic Backpropagation [44.953669040828345]
We study Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.
During backward propagation, SBP calculates gradients by only using a subset of feature maps to save the GPU memory and computational cost.
Experiments on image classification and object detection show that SBP can save up to 40% of GPU memory with less than 1% accuracy drop.
arXiv Detail & Related papers (2022-09-30T23:05:06Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training (a sketch of this caching scheme appears after this list).
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can roughly halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove the channels with the smallest importance.
GDP can be plugged in before convolutional layers, without bells and whistles, to control whether each channel is on or off.
Experiments conducted on the CIFAR-10 and ImageNet datasets show that the proposed GDP achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-09-06T03:17:10Z)
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
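For the Block Selective Reprogramming entry above, the snippet below is a minimal sketch of the block-freezing part of the idea, assuming a generic stack of blocks; the paper's actual block-selection criterion and reprogramming details are not reproduced here.

    import torch
    from torch import nn


    def train_only(blocks: nn.ModuleList, trainable_ids):
        # Freeze every block except the selected ones; frozen blocks keep no
        # gradients or optimizer state, which is where the memory saving comes from.
        for i, block in enumerate(blocks):
            block.requires_grad_(i in trainable_ids)


    # Toy usage: a stack of 12 blocks, of which only 3 are fine-tuned.
    blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])
    train_only(blocks, trainable_ids={3, 7, 11})
    optimizer = torch.optim.SGD([p for p in blocks.parameters() if p.requires_grad], lr=1e-2)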
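For the DropBP entry above, the sketch below drops a block from the backward pass while keeping its forward computation, assuming a residual architecture; the wrapper name DropBPBlock and the uniform drop probability are illustrative choices rather than the authors' code.

    import torch
    from torch import nn


    class DropBPBlock(nn.Module):
        # Residual wrapper whose backward pass is skipped with probability drop_prob.
        # The forward output is unchanged; when dropped, the sub-layer runs under
        # torch.no_grad(), so no activations are cached and its parameters receive
        # no gradient this step.

        def __init__(self, layer: nn.Module, drop_prob: float = 0.5):
            super().__init__()
            self.layer = layer
            self.drop_prob = drop_prob

        def forward(self, x):
            if self.training and torch.rand(()).item() < self.drop_prob:
                with torch.no_grad():
                    residual = self.layer(x)   # no backward graph built here
                return x + residual            # gradients still flow via the skip path
            return x + self.layer(x)           # normal forward and backward


    # Toy usage: four blocks, each dropped from the backward pass half the time.
    model = nn.Sequential(*[DropBPBlock(nn.Linear(64, 64), drop_prob=0.5) for _ in range(4)])
    x = torch.randn(8, 64, requires_grad=True)
    model(x).sum().backward()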
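For the Mesa entry above, the sketch below caches activations at reduced precision for the backward pass while computing the forward pass exactly; fp16 is used here only for simplicity, since the abstract specifies just "low-precision", and the name HalfCachedLinear is an illustrative assumption.

    import torch


    class HalfCachedLinear(torch.autograd.Function):
        # Exact fp32 forward computation, but only an fp16 copy of the input
        # activation is kept for the backward pass.

        @staticmethod
        def forward(ctx, x, weight):
            out = x @ weight.t()                     # exact forward
            ctx.save_for_backward(x.half(), weight)  # cache activations at half precision
            return out

        @staticmethod
        def backward(ctx, grad_out):
            x_half, weight = ctx.saved_tensors
            x = x_half.float()                       # dequantize the cached copy
            grad_x = grad_out @ weight               # (N, C_out) @ (C_out, C_in)
            grad_w = grad_out.t() @ x                # (C_out, N) @ (N, C_in)
            return grad_x, grad_w


    # Toy usage: the weight gradient is computed from the fp16-cached activations.
    x = torch.randn(8, 64, requires_grad=True)
    w = torch.randn(32, 64, requires_grad=True)
    HalfCachedLinear.apply(x, w).sum().backward()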
This list is automatically generated from the titles and abstracts of the papers on this site.