MetaGrad: Adaptive Gradient Quantization with Hypernetworks
- URL: http://arxiv.org/abs/2303.02347v2
- Date: Wed, 1 Nov 2023 02:31:05 GMT
- Title: MetaGrad: Adaptive Gradient Quantization with Hypernetworks
- Authors: Kaixin Xu, Alina Hui Xiu Lee, Ziyuan Zhao, Zhe Wang, Min Wu, Weisi Lin
- Abstract summary: Quantization-aware Training (QAT) accelerates the forward pass during neural network training and inference.
In this work, we propose to solve this problem by incorporating the gradients into the computation graph of the next training iteration via a hypernetwork.
Various experiments on the CIFAR-10 dataset with different CNN architectures demonstrate that our hypernetwork-based approach can effectively reduce the negative effect of gradient quantization noise.
- Score: 46.55625589293897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A popular line of network compression approaches is Quantization-aware
Training (QAT), which accelerates the forward pass during neural network
training and inference. However, few prior efforts have been made to quantize
and accelerate the backward pass during training, even though it accounts for
around half of the training time. This can be partly attributed to the fact
that errors introduced by low-precision gradients in the backward pass cannot
be amortized by the training objective, as they are in the QAT setting. In this
work, we propose to solve this problem by incorporating the gradients into the
computation graph of the next training iteration via a hypernetwork. Various
experiments on the CIFAR-10 dataset with different CNN architectures
demonstrate that our hypernetwork-based approach can effectively reduce the
negative effect of gradient quantization noise and successfully quantizes the
gradients to INT4 with only a 0.64-point accuracy drop for VGG-16 on CIFAR-10.
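To make the idea concrete, below is a minimal PyTorch-style sketch of the two ingredients the abstract describes: symmetric INT4 fake quantization of a gradient tensor and a small per-layer hypernetwork that refines the quantized gradient before the weight update. The name `GradHyperNet`, its architecture, and the residual form are illustrative assumptions, not the authors' implementation; in particular, training the hypernetwork through the next iteration's computation graph (the paper's key mechanism) is only indicated in comments.

```python
# Hedged sketch (not the authors' code): fake-quantize a gradient to INT4 and
# pass it through a small hypernetwork that refines it before the weight update.
# `GradHyperNet`, its size, and the residual form are illustrative assumptions.
import torch
import torch.nn as nn

def fake_quant_int4(g: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor fake quantization of a gradient to signed INT4 levels."""
    qmax = 7                                        # signed 4-bit range is [-8, 7]
    scale = g.abs().max().clamp(min=1e-12) / qmax
    q = torch.clamp(torch.round(g / scale), -8, qmax)
    return q * scale                                # dequantized float tensor

class GradHyperNet(nn.Module):
    """Tiny per-layer hypernetwork mapping a quantized gradient to a refined one."""
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, g_q: torch.Tensor) -> torch.Tensor:
        flat = g_q.reshape(-1, 1)
        return (flat + self.mlp(flat)).reshape(g_q.shape)   # residual refinement

# Usage inside one training step (model, loss, optimizer, hypernets assumed to exist):
#   loss.backward()
#   for p, hnet in zip(model.parameters(), hypernets):
#       p.grad = hnet(fake_quant_int4(p.grad)).detach()     # refined INT4 gradient
#   optimizer.step()
# The paper additionally trains the hypernetwork through the loss of the next
# training iteration's computation graph, which this sketch omits.
```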
Related papers
- 1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit [41.993927897814785]
Fully quantized training (FQT) accelerates the training of deep neural networks by quantizing the activations, weights, and gradients into lower precision.
We make a first attempt at 1-bit FQT to explore the ultimate limit of FQT (the lowest achievable precision).
arXiv Detail & Related papers (2024-08-26T13:42:43Z)
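Regarding the 1-Bit FQT entry above: as a point of reference for what fully quantized training at the extreme means, here is a generic sign-plus-scale 1-bit quantizer applied to a gradient tensor. It illustrates the kind of low-precision representation FQT works with, not the specific quantization scheme proposed in that paper.

```python
# Generic sign-plus-scale 1-bit quantizer for a gradient tensor; a hedged
# illustration only, not the 1-bit FQT paper's scheme.
import torch

def quant_1bit(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().mean()            # per-tensor scale; finer granularity is also common
    return torch.sign(x) * scale      # dequantized 1-bit approximation of x

g = torch.randn(256, 128)             # stand-in for a gradient tensor
err = (g - quant_1bit(g)).abs().mean()
print(float(err))                     # quantization error the training has to tolerate
```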
- One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training [12.400950982075948]
Weight quantization is an effective technique to compress deep neural networks for their deployment on edge devices with limited resources.
Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient.
This paper proposes a one-step forward and backtrack way for loss-aware quantization to get more accurate and stable gradient direction.
arXiv Detail & Related papers (2024-01-30T05:42:54Z)
- Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks [15.691263438655842]
Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention.
Training an SNN directly poses a challenge due to the undefined gradient of the firing spike process.
We propose a shortcut back-propagation method in our paper, which advocates for transmitting the gradient directly from the loss to the shallow layers.
arXiv Detail & Related papers (2024-01-09T10:54:41Z)
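Regarding the shortcut back-propagation entry above: one common way to realize a shortcut gradient path, shown in the hedged sketch below, is to attach an auxiliary readout to a shallow block so that part of the loss gradient reaches it directly. The auxiliary head and the 0.3 loss weighting are illustrative assumptions; the paper's exact shortcut construction and its SNN-specific surrogate gradients may differ.

```python
# Hedged sketch of a "shortcut gradient" path via an auxiliary readout on a
# shallow block; illustrative only, not the paper's exact construction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetWithShortcut(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
        self.head = nn.Linear(128, num_classes)
        self.aux1 = nn.Linear(256, num_classes)   # shortcut readout for the shallow block

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.head(h2), self.aux1(h1)

model = NetWithShortcut()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
out, aux = model(x)
loss = F.cross_entropy(out, y) + 0.3 * F.cross_entropy(aux, y)  # gradient flows straight to block1
loss.backward()
```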
- Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction [29.61757744974324]
Deep neural networks incur significant memory and computation costs.
Sparse training is one of the most common techniques for reducing these costs.
In this work, we aim to overcome this problem and achieve space-time co-efficiency.
arXiv Detail & Related papers (2023-01-09T18:50:03Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
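Regarding the forward-gradient entry above: the sketch below shows the basic forward-gradient estimator this line of work builds on. One forward-mode pass (torch.func.jvp) along a random direction v yields the directional derivative, and (dL/dv) * v is an unbiased gradient estimate with no backward pass. The paper's activity-perturbation and local-loss refinements are not shown.

```python
# Hedged sketch of the plain forward-gradient estimator (weight perturbation).
import torch
from torch.func import jvp

def loss_fn(w, x, y):
    return ((x @ w - y) ** 2).mean()          # toy regression loss

w = torch.randn(64, 1)
x, y = torch.randn(128, 64), torch.randn(128, 1)

v = torch.randn_like(w)                        # random tangent direction
_, dir_deriv = jvp(lambda w_: loss_fn(w_, x, y), (w,), (v,))
grad_estimate = dir_deriv * v                  # forward gradient estimate of dL/dw
w = w - 0.01 * grad_estimate                   # one SGD-like step using the estimate
```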
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients during training.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
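Regarding the FracTrain entry above: the snippet below is a hedged illustration of what a progressive precision schedule can look like, with the bitwidth of activations, weights, and gradients stepped up as training progresses. The breakpoints and bitwidths are placeholders, not FracTrain's actual schedule.

```python
# Hedged illustration of a staged bitwidth schedule; placeholder values only.
def bitwidth_schedule(epoch: int, total_epochs: int) -> int:
    frac = epoch / max(total_epochs, 1)
    if frac < 0.3:
        return 3
    if frac < 0.6:
        return 4
    if frac < 0.9:
        return 6
    return 8

# e.g. feed bitwidth_schedule(epoch, 160) to the fake-quantizers used in training
```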
- A Statistical Framework for Low-bitwidth Training of Deep Neural Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model.
One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
arXiv Detail & Related papers (2020-10-27T13:57:33Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning based on stochastic recursive gradients, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
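Regarding the SRG-DQN entry above: the estimator behind this family of methods is the stochastic recursive gradient (SARAH-style) update, sketched below on a toy least-squares objective. The objective and step size are illustrative; the paper applies this kind of estimator inside deep Q-learning, which the sketch does not reproduce.

```python
# Hedged sketch of the recursive gradient estimator:
# v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}, reusing the previous estimate
# to reduce variance. Toy quadratic problem, not the paper's DQN setup.
import torch

def stoch_grad(w, i, A, b):
    """Gradient of 0.5 * (a_i . w - b_i)^2 for a single sampled index i."""
    a = A[i]
    return (a @ w - b[i]) * a

torch.manual_seed(0)
A, b = torch.randn(512, 32), torch.randn(512)
w = torch.zeros(32)
v = A.t() @ (A @ w - b) / len(b)              # full gradient as the initial estimate

lr = 0.05
for t in range(100):
    w_new = w - lr * v                        # step with the current estimate
    i = int(torch.randint(0, len(b), (1,)))
    v = stoch_grad(w_new, i, A, b) - stoch_grad(w, i, A, b) + v   # recursive update
    w = w_new
```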
- Towards Unified INT8 Training for Convolutional Neural Network [83.15673050981624]
We build a unified 8-bit (INT8) training framework for common convolutional neural networks.
First, we empirically identify four distinctive characteristics of gradients, which provide insightful clues for gradient quantization.
We then propose two universal techniques, including Direction Sensitive Gradient Clipping, which reduces the direction deviation of gradients.
arXiv Detail & Related papers (2019-12-29T08:37:53Z)
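Regarding the unified INT8 training entry above: the sketch below illustrates the clip-then-quantize intuition, in which limiting the gradient's range before INT8 rounding shrinks the quantization step and hence how far the quantized gradient's direction can drift. The percentile threshold is a placeholder, not the paper's Direction Sensitive Gradient Clipping rule.

```python
# Hedged sketch of clip-then-quantize for INT8 gradients; placeholder threshold.
import torch
import torch.nn.functional as F

def clip_and_quant_int8(g: torch.Tensor, clip_quantile: float = 0.999) -> torch.Tensor:
    clip_val = torch.quantile(g.abs().flatten(), clip_quantile).clamp(min=1e-12)
    g_c = g.clamp(-clip_val, clip_val)          # clip outliers that inflate the scale
    scale = clip_val / 127                      # symmetric signed INT8 range
    return torch.clamp(torch.round(g_c / scale), -128, 127) * scale

g = torch.randn(4096) * torch.rand(4096)        # heavy-tailed stand-in for a gradient
g_q = clip_and_quant_int8(g)
print(float(F.cosine_similarity(g, g_q, dim=0)))   # direction is largely preserved
```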
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.