Compressing the Backward Pass of Large-Scale Neural Architectures by
Structured Activation Pruning
- URL: http://arxiv.org/abs/2311.16883v2
- Date: Wed, 29 Nov 2023 14:41:36 GMT
- Title: Compressing the Backward Pass of Large-Scale Neural Architectures by
Structured Activation Pruning
- Authors: Daniel Barley, Holger Fröning
- Abstract summary: Sparsity in Deep Neural Networks (DNNs) has gained attention as a solution to their growing memory demands.
This work focuses on ephemeral sparsity, aiming to reduce memory consumption during training.
We report the effectiveness of activation pruning by evaluating training speed, accuracy, and memory usage of large-scale neural architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The rise of Deep Neural Networks (DNNs) has led to an increase in model size
and complexity, straining the memory capacity of GPUs. Sparsity in DNNs,
characterized as structural or ephemeral, has gained attention as a solution.
This work focuses on ephemeral sparsity, aiming to reduce memory consumption
during training. It emphasizes the significance of activations, an often
overlooked component, and their role in memory usage. This work employs
structured pruning in Block Sparse Compressed Row (BSR) format in combination
with a magnitude-based criterion to efficiently prune activations. We
furthermore introduce efficient block-sparse operators for GPUs and showcase
their effectiveness, as well as the superior compression offered by block
sparsity. We report the effectiveness of activation pruning by evaluating
training speed, accuracy, and memory usage of large-scale neural architectures
using ResMLP on image classification tasks as an example. As a result, we observe
a memory reduction of up to 32% while maintaining accuracy. Ultimately, our
approach aims to democratize large-scale model training, reduce GPU
requirements, and address ecological concerns.
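The following is a minimal, hypothetical sketch of the core idea, not the authors' implementation: prune an activation tensor at block granularity with a simple magnitude (L1) criterion and store the surviving blocks in Block Sparse Compressed Row (BSR) format. The block size, keep ratio, and the helper name prune_to_bsr are illustrative assumptions, the paper's custom block-sparse GPU operators are not reproduced here, and a recent PyTorch release is assumed for the BSR conversion.

import torch

BLOCK = 32        # assumed square block size
KEEP_RATIO = 0.5  # assumed fraction of blocks kept by the magnitude criterion

def prune_to_bsr(act: torch.Tensor) -> torch.Tensor:
    """Keep only the highest-magnitude BLOCK x BLOCK tiles of a 2D activation."""
    rows, cols = act.shape
    assert rows % BLOCK == 0 and cols % BLOCK == 0, "pad activations to a block multiple"
    rb, cb = rows // BLOCK, cols // BLOCK
    tiles = act.reshape(rb, BLOCK, cb, BLOCK)
    scores = tiles.abs().sum(dim=(1, 3))          # L1 magnitude per tile, shape (rb, cb)
    k = max(1, int(KEEP_RATIO * scores.numel()))
    threshold = scores.flatten().topk(k).values.min()
    mask = (scores >= threshold).to(act.dtype)    # 1 = keep tile, 0 = prune tile
    masked = (tiles * mask[:, None, :, None]).reshape(rows, cols)
    # Zeroed tiles are dropped by the conversion; only surviving blocks are stored.
    return masked.to_sparse_bsr((BLOCK, BLOCK))

# Example: compress an activation as it would be retained for the backward pass.
act = torch.randn(128, 256)
act_bsr = prune_to_bsr(act)
print(act_bsr.values().shape)        # (number of kept blocks, BLOCK, BLOCK)
act_dense_again = act_bsr.to_dense()

In an actual training setup the compressed tensor, rather than the dense activation, would be what is kept for the backward pass (for example via PyTorch's saved-tensor hooks), and block-sparse kernels would consume the BSR data directly instead of densifying it.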
Related papers
- HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution [5.110892180215454]
Lightweight methods for single-image super-resolution have achieved impressive performance under limited hardware resources.
We find that using residual connections after each block increases the model's storage and computational cost.
We use depthwise separable convolutions, fully connected layers, and activation functions as the basic feature extraction modules.
arXiv Detail & Related papers (2024-10-13T14:00:21Z)
- Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements.
Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers with thousands of accelerators.
With this approach we are able to reduce the peak memory consumption by 29% at the cost of a longer training schedule.
arXiv Detail & Related papers (2024-09-18T11:57:05Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization [0.21756081703275998]
We present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps.
We show a further reduction in memory consumption (>15%) and a per-epoch runtime speedup of about 5%, even under extreme degrees of quantization.
arXiv Detail & Related papers (2023-09-21T07:59:08Z)
- FPGA Resource-aware Structured Pruning for Real-Time Neural Networks [3.294652922898631]
Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint.
We propose a hardware-centric formulation of pruning, casting it as a knapsack problem with resource-aware tensor structures.
The proposed method achieves reductions ranging between 55% and 92% in DSP utilization and up to 81% in BRAM utilization.
arXiv Detail & Related papers (2023-08-09T18:14:54Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction (GLEAM), an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100, and ADE20K demonstrate that Mesa can roughly halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the activation memory footprint by 12x and enables training with a 6.6x to 14x larger batch size (a generic sketch of this saved-activation compression pattern follows this entry).
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
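Both Mesa and ActNN compress the activations retained for backpropagation rather than pruning them. Below is a minimal, hypothetical sketch of that general saved-activation compression pattern using PyTorch's saved-tensor hooks with naive 8-bit min-max quantization; it is not the Mesa or ActNN implementation, which rely on more elaborate (e.g., 2-bit, stochastic) schemes and dedicated kernels.

import torch

def pack(t: torch.Tensor):
    # Quantize every saved floating-point tensor to 8 bits; a real system would
    # skip parameters and compress only activations.
    if not t.is_floating_point():
        return t
    with torch.no_grad():
        lo, hi = t.min(), t.max()
        scale = (hi - lo).clamp_min(1e-8) / 255.0
        q = ((t - lo) / scale).round().to(torch.uint8)  # low-precision copy kept in memory
    return (q, lo, scale)

def unpack(packed):
    if not isinstance(packed, tuple):
        return packed
    q, lo, scale = packed
    return q.to(torch.float32) * scale + lo             # dequantize on demand for backward

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, 10))
x = torch.randn(64, 256)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    loss = model(x).square().mean()
loss.backward()   # gradients are computed from the dequantized, approximate activations

The same hook mechanism would also be a natural place to plug in the block-sparse activation pruning sketched after the abstract above, trading approximation by quantization for approximation by structured sparsity.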
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)