ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training
- URL: http://arxiv.org/abs/2104.14129v1
- Date: Thu, 29 Apr 2021 05:50:54 GMT
- Title: ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training
- Authors: Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica,
Michael W. Mahoney, Joseph E. Gonzalez
- Abstract summary: ActNN is a memory-efficient training framework that stores randomly quantized activations for back propagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
- Score: 68.63354877166756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing size of neural network models has been critical for
improvements in their accuracy, but device memory is not growing at the same
rate. This creates fundamental challenges for training neural networks within
limited memory environments. In this work, we propose ActNN, a memory-efficient
training framework that stores randomly quantized activations for back
propagation. We prove the convergence of ActNN for general network
architectures, and we characterize the impact of quantization on the
convergence via an exact expression for the gradient variance. Using our
theory, we propose novel mixed-precision quantization strategies that exploit
the activation's heterogeneity across feature dimensions, samples, and layers.
These techniques can be readily applied to existing dynamic graph frameworks,
such as PyTorch, simply by substituting the layers. We evaluate ActNN on
mainstream computer vision models for classification, detection, and
segmentation tasks. On all these tasks, ActNN compresses the activation to 2
bits on average, with negligible accuracy loss. ActNN reduces the memory
footprint of the activation by 12x, and it enables training with a 6.6x to 14x
larger batch size.
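The layer-substitution workflow and the randomly quantized activations can be pictured with a short PyTorch sketch. This is a minimal illustration of the idea only, not the ActNN implementation: the names (quantize_2bit, QuantizedLinear), the per-group min/max scaling, and the fixed group size are assumptions made here, and ActNN's mixed-precision strategy and bit-packing are omitted.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_2bit(x, group_size=256):
    """Per-group 2-bit quantization with stochastic (unbiased) rounding.
    Assumes x.numel() is divisible by group_size; a real implementation would
    pad and bit-pack four 2-bit values per byte."""
    shape = x.shape
    g = x.reshape(-1, group_size)
    lo = g.min(dim=1, keepdim=True).values
    scale = (g.max(dim=1, keepdim=True).values - lo).clamp_min(1e-8) / 3.0  # 4 levels
    q = torch.floor((g - lo) / scale + torch.rand_like(g)).clamp_(0, 3)
    return q.to(torch.uint8), lo, scale, shape

def dequantize_2bit(q, lo, scale, shape):
    return (q.float() * scale + lo).reshape(shape)

class _Act2BitLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias):
        # x assumed 2-D (batch, in_features) for brevity
        out = F.linear(x, weight, bias)              # forward stays full precision
        ctx.packed = quantize_2bit(x.detach())       # save a compressed copy instead of x
        ctx.save_for_backward(weight)
        ctx.has_bias = bias is not None
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (weight,) = ctx.saved_tensors
        x_hat = dequantize_2bit(*ctx.packed)         # approximate activation
        grad_x = grad_out @ weight                   # (batch, in_features)
        grad_w = grad_out.t() @ x_hat                # (out, in), uses the 2-bit copy
        grad_b = grad_out.sum(0) if ctx.has_bias else None
        return grad_x, grad_w, grad_b

class QuantizedLinear(nn.Linear):
    """Drop-in nn.Linear replacement storing 2-bit activations for backward."""
    def forward(self, x):
        return _Act2BitLinear.apply(x, self.weight, self.bias)
```
Because the rounding is stochastic, the quantization error is zero-mean, which is the property the paper's gradient-variance analysis builds on; swapping such layers in for their standard counterparts mirrors, in spirit, the PyTorch integration described in the abstract.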
Related papers
- Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We apply the approach to both fully-connected and convolutional layers, which cover the bulk of most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z)
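The tiling idea above can be sketched briefly: one short learnable vector is binarized and repeated until it fills a whole weight matrix, so the layer stores far fewer than one bit per weight. The sign/straight-through parameterization and the names (TiledBinaryLinear, tile_len) are illustrative assumptions, not the paper's actual training procedure.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiledBinaryLinear(nn.Module):
    """A full weight matrix built by tiling one short learnable binary vector."""
    def __init__(self, in_features, out_features, tile_len=128):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.tile = nn.Parameter(torch.randn(tile_len) * 0.1)  # shared latent tile
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Binarize the tile; the straight-through estimator keeps it trainable.
        b = (torch.sign(self.tile) - self.tile).detach() + self.tile
        n = self.in_features * self.out_features
        w = b.repeat(math.ceil(n / b.numel()))[:n]
        return F.linear(x, self.scale * w.reshape(self.out_features, self.in_features))
```
Storage then amounts to tile_len bits plus a scale per layer, which is where sub-bit-per-weight compression comes from.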
- Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients [51.82488018573326]
We present QP-SBGD, a novel layer-wise optimiser tailored towards training neural networks with binary weights.
Binary neural networks (BNNs) reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy.
Our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware.
arXiv Detail & Related papers (2023-10-23T17:32:38Z)
- Towards Zero Memory Footprint Spiking Neural Network Training [7.4331790419913455]
Spiking Neural Networks (SNNs) process information using discrete-time events known as spikes rather than continuous values.
In this paper, we introduce an innovative framework characterized by a remarkably low memory footprint.
Our design is able to achieve a 58.65x reduction in memory usage compared to the current SNN node.
arXiv Detail & Related papers (2023-08-16T19:49:24Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training [23.536294640280087]
We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model.
arXiv Detail & Related papers (2022-09-22T04:48:48Z)
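To make the Forward-AD idea above concrete, the sketch below fuses a small chain of element-wise operations (bias add followed by the tanh-approximate GELU) and computes the combined local derivative during the forward pass, saving only that single tensor for backward instead of the intermediates of every op. The fused op and its names are illustrative assumptions rather than the paper's framework.
```python
import torch

class FusedBiasGELU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, bias):
        z = x + bias                                   # x: (batch, features), bias: (features,)
        c, a = 0.7978845608028654, 0.044715            # sqrt(2/pi) and the tanh-GELU constant
        t = torch.tanh(c * (z + a * z ** 3))
        y = 0.5 * z * (1.0 + t)
        # Local derivative dy/dz, accumulated during the forward pass and saved
        # as the only tensor kept for backward.
        dydz = 0.5 * (1.0 + t) + 0.5 * z * (1.0 - t ** 2) * c * (1.0 + 3.0 * a * z ** 2)
        ctx.save_for_backward(dydz)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (dydz,) = ctx.saved_tensors
        grad_z = grad_out * dydz                       # chain rule through the fused block
        return grad_z, grad_z.sum(0)                   # gradients w.r.t. x and bias
```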
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
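For context, the 1-bit weight / 1-bit activation setting described above is typically built on a straight-through estimator; the generic baseline block is sketched below. It does not implement BiTAT's task-dependent aggregated transformation, and the per-channel scaling follows the common XNOR-Net-style convention, assumed here for illustration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(t):
    """sign() in the forward pass, identity gradient in the backward pass."""
    return (torch.sign(t) - t).detach() + t

class BinaryLinear(nn.Linear):
    def forward(self, x):
        wb = binarize_ste(self.weight)
        xb = binarize_ste(torch.clamp(x, -1.0, 1.0))                  # clip, then binarize activations
        alpha = self.weight.abs().mean(dim=1, keepdim=True).detach()  # per-output-channel scale
        return F.linear(xb, alpha * wb, self.bias)
```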
- GACT: Activation Compressed Training for General Architectures [37.25798396630518]
Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.
GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size.
arXiv Detail & Related papers (2022-06-22T20:06:23Z)
- Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
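The general pruning-plus-quantization recipe referenced above can be sketched as follows: magnitude-prune a weight matrix, quantize the surviving weights to small integer codes, and store only the codes plus their sparse coordinates. The threshold rule, 8-bit codebook, and coordinate layout here are assumptions for illustration; the paper's actual source-coding format is not reproduced.
```python
import torch

def compress(w, sparsity=0.9, bits=8):
    """Prune-and-quantize a 2-D weight matrix into codes plus sparse indices."""
    thresh = torch.quantile(w.abs().flatten(), sparsity)      # magnitude pruning threshold
    mask = w.abs() >= thresh
    vals = w[mask]
    lo, hi = vals.min(), vals.max()
    scale = ((hi - lo) / (2 ** bits - 1)).clamp_min(1e-12)
    codes = torch.round((vals - lo) / scale).to(torch.uint8)
    return {"codes": codes, "idx": mask.nonzero(), "lo": lo, "scale": scale, "shape": w.shape}

def decompress(c):
    w = torch.zeros(c["shape"])
    w[c["idx"][:, 0], c["idx"][:, 1]] = c["codes"].float() * c["scale"] + c["lo"]
    return w
```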
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
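To illustrate the kind of binarization strategy the Binary Graph Neural Networks entry evaluates, a small sketch follows: a mean-aggregation graph convolution whose weights are sign-binarized with a per-channel scale and trained with a straight-through estimator. The layer design is an assumption for illustration, not the paper's exact architecture.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(t):
    return (torch.sign(t) - t).detach() + t   # sign forward, identity gradient backward

class BinaryGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim); adj: dense (num_nodes, num_nodes) adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp_min(1.0)
        h = (adj @ x) / deg                                    # mean neighbour aggregation
        alpha = self.weight.abs().mean(dim=1, keepdim=True).detach()
        return F.linear(h, alpha * binarize_ste(self.weight))
```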
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.