ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training
- URL: http://arxiv.org/abs/2104.14129v1
- Date: Thu, 29 Apr 2021 05:50:54 GMT
- Title: ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training
- Authors: Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica,
Michael W. Mahoney, Joseph E. Gonzalez
- Abstract summary: ActNN is a memory-efficient training framework that stores randomly quantized activations for back propagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
- Score: 68.63354877166756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing size of neural network models has been critical for
improvements in their accuracy, but device memory is not growing at the same
rate. This creates fundamental challenges for training neural networks within
limited memory environments. In this work, we propose ActNN, a memory-efficient
training framework that stores randomly quantized activations for back
propagation. We prove the convergence of ActNN for general network
architectures, and we characterize the impact of quantization on the
convergence via an exact expression for the gradient variance. Using our
theory, we propose novel mixed-precision quantization strategies that exploit
the activation's heterogeneity across feature dimensions, samples, and layers.
These techniques can be readily applied to existing dynamic graph frameworks,
such as PyTorch, simply by substituting the layers. We evaluate ActNN on
mainstream computer vision models for classification, detection, and
segmentation tasks. On all these tasks, ActNN compresses the activation to 2
bits on average, with negligible accuracy loss. ActNN reduces the memory
footprint of the activation by 12x, and it enables training with a 6.6x to 14x
larger batch size.
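The layer-substitution workflow and the randomly quantized activations can be pictured with a short PyTorch sketch. This is a minimal illustration of the idea only, not the ActNN implementation: the names (quantize_2bit, QuantizedLinear), the per-group min/max scaling, and the fixed group size are assumptions made here, and ActNN's mixed-precision strategy and bit-packing are omitted.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_2bit(x, group_size=256):
    """Per-group 2-bit quantization with stochastic (unbiased) rounding.
    Assumes x.numel() is divisible by group_size; a real implementation would
    pad and bit-pack four 2-bit values per byte."""
    shape = x.shape
    g = x.reshape(-1, group_size)
    lo = g.min(dim=1, keepdim=True).values
    scale = (g.max(dim=1, keepdim=True).values - lo).clamp_min(1e-8) / 3.0  # 4 levels
    q = torch.floor((g - lo) / scale + torch.rand_like(g)).clamp_(0, 3)
    return q.to(torch.uint8), lo, scale, shape

def dequantize_2bit(q, lo, scale, shape):
    return (q.float() * scale + lo).reshape(shape)

class _Act2BitLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias):
        # x assumed 2-D (batch, in_features) for brevity
        out = F.linear(x, weight, bias)              # forward stays full precision
        ctx.packed = quantize_2bit(x.detach())       # save a compressed copy instead of x
        ctx.save_for_backward(weight)
        ctx.has_bias = bias is not None
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (weight,) = ctx.saved_tensors
        x_hat = dequantize_2bit(*ctx.packed)         # approximate activation
        grad_x = grad_out @ weight                   # (batch, in_features)
        grad_w = grad_out.t() @ x_hat                # (out, in), uses the 2-bit copy
        grad_b = grad_out.sum(0) if ctx.has_bias else None
        return grad_x, grad_w, grad_b

class QuantizedLinear(nn.Linear):
    """Drop-in nn.Linear replacement storing 2-bit activations for backward."""
    def forward(self, x):
        return _Act2BitLinear.apply(x, self.weight, self.bias)
```
Because the rounding is stochastic, the quantization error is zero-mean, which is the property the paper's gradient-variance analysis builds on; swapping such layers in for their standard counterparts mirrors, in spirit, the PyTorch integration described in the abstract.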
Related papers
- Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We apply the approach to both fully-connected and convolutional layers, which cover the bulk of most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z)
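The tiling idea above can be sketched briefly: one short learnable vector is binarized and repeated until it fills a whole weight matrix, so the layer stores far fewer than one bit per weight. The sign/straight-through parameterization and the names (TiledBinaryLinear, tile_len) are illustrative assumptions, not the paper's actual training procedure.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiledBinaryLinear(nn.Module):
    """A full weight matrix built by tiling one short learnable binary vector."""
    def __init__(self, in_features, out_features, tile_len=128):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.tile = nn.Parameter(torch.randn(tile_len) * 0.1)  # shared latent tile
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Binarize the tile; the straight-through estimator keeps it trainable.
        b = (torch.sign(self.tile) - self.tile).detach() + self.tile
        n = self.in_features * self.out_features
        w = b.repeat(math.ceil(n / b.numel()))[:n]
        return F.linear(x, self.scale * w.reshape(self.out_features, self.in_features))
```
Storage then amounts to tile_len bits plus a scale per layer, which is where sub-bit-per-weight compression comes from.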
- Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients [51.82488018573326]
We present QP-SBGD, a novel layer-wise optimiser tailored towards training neural networks with binary weights.
Binary neural networks (BNNs) reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy.
Our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware.
arXiv Detail & Related papers (2023-10-23T17:32:38Z)
- Towards Zero Memory Footprint Spiking Neural Network Training [7.4331790419913455]
Spiking Neural Networks (SNNs) process information using discrete-time events known as spikes rather than continuous values.
In this paper, we introduce an innovative framework characterized by a remarkably low memory footprint.
Our design is able to achieve a 58.65x reduction in memory usage compared to the current SNN node.
arXiv Detail & Related papers (2023-08-16T19:49:24Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training [23.536294640280087]
We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model.
arXiv Detail & Related papers (2022-09-22T04:48:48Z)
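To make the Forward-AD idea above concrete, the sketch below fuses a small chain of element-wise operations (bias add followed by the tanh-approximate GELU) and computes the combined local derivative during the forward pass, saving only that single tensor for backward instead of the intermediates of every op. The fused op and its names are illustrative assumptions rather than the paper's framework.
```python
import torch

class FusedBiasGELU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, bias):
        z = x + bias                                   # x: (batch, features), bias: (features,)
        c, a = 0.7978845608028654, 0.044715            # sqrt(2/pi) and the tanh-GELU constant
        t = torch.tanh(c * (z + a * z ** 3))
        y = 0.5 * z * (1.0 + t)
        # Local derivative dy/dz, accumulated during the forward pass and saved
        # as the only tensor kept for backward.
        dydz = 0.5 * (1.0 + t) + 0.5 * z * (1.0 - t ** 2) * c * (1.0 + 3.0 * a * z ** 2)
        ctx.save_for_backward(dydz)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (dydz,) = ctx.saved_tensors
        grad_z = grad_out * dydz                       # chain rule through the fused block
        return grad_z, grad_z.sum(0)                   # gradients w.r.t. x and bias
```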
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
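For context, the 1-bit weight / 1-bit activation setting described above is typically built on a straight-through estimator; the generic baseline block is sketched below. It does not implement BiTAT's task-dependent aggregated transformation, and the per-channel scaling follows the common XNOR-Net-style convention, assumed here for illustration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(t):
    """sign() in the forward pass, identity gradient in the backward pass."""
    return (torch.sign(t) - t).detach() + t

class BinaryLinear(nn.Linear):
    def forward(self, x):
        wb = binarize_ste(self.weight)
        xb = binarize_ste(torch.clamp(x, -1.0, 1.0))                  # clip, then binarize activations
        alpha = self.weight.abs().mean(dim=1, keepdim=True).detach()  # per-output-channel scale
        return F.linear(xb, alpha * wb, self.bias)
```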
- GACT: Activation Compressed Training for General Architectures [37.25798396630518]
Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.
GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size.
arXiv Detail & Related papers (2022-06-22T20:06:23Z)
- Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
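The general pruning-plus-quantization recipe referenced above can be sketched as follows: magnitude-prune a weight matrix, quantize the surviving weights to small integer codes, and store only the codes plus their sparse coordinates. The threshold rule, 8-bit codebook, and coordinate layout here are assumptions for illustration; the paper's actual source-coding format is not reproduced.
```python
import torch

def compress(w, sparsity=0.9, bits=8):
    """Prune-and-quantize a 2-D weight matrix into codes plus sparse indices."""
    thresh = torch.quantile(w.abs().flatten(), sparsity)      # magnitude pruning threshold
    mask = w.abs() >= thresh
    vals = w[mask]
    lo, hi = vals.min(), vals.max()
    scale = ((hi - lo) / (2 ** bits - 1)).clamp_min(1e-12)
    codes = torch.round((vals - lo) / scale).to(torch.uint8)
    return {"codes": codes, "idx": mask.nonzero(), "lo": lo, "scale": scale, "shape": w.shape}

def decompress(c):
    w = torch.zeros(c["shape"])
    w[c["idx"][:, 0], c["idx"][:, 1]] = c["codes"].float() * c["scale"] + c["lo"]
    return w
```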
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
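To illustrate the kind of binarization strategy the Binary Graph Neural Networks entry evaluates, a small sketch follows: a mean-aggregation graph convolution whose weights are sign-binarized with a per-channel scale and trained with a straight-through estimator. The layer design is an assumption for illustration, not the paper's exact architecture.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(t):
    return (torch.sign(t) - t).detach() + t   # sign forward, identity gradient backward

class BinaryGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim); adj: dense (num_nodes, num_nodes) adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp_min(1.0)
        h = (adj @ x) / deg                                    # mean neighbour aggregation
        alpha = self.weight.abs().mean(dim=1, keepdim=True).detach()
        return F.linear(h, alpha * binarize_ste(self.weight))
```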
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.