Distribution Adaptive INT8 Quantization for Training CNNs
- URL: http://arxiv.org/abs/2102.04782v1
- Date: Tue, 9 Feb 2021 11:58:10 GMT
- Title: Distribution Adaptive INT8 Quantization for Training CNNs
- Authors: Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu,
Yinghui Xu
- Abstract summary: In this paper, we propose a novel INT8 quantization training framework for convolutional neural networks.
Specifically, we adopt Gradient Vectorized Quantization to quantize the gradient, based on the observation that layer-wise gradients contain multiple distributions along the channel dimension.
Then, a Magnitude-aware Clipping Strategy is introduced that takes the magnitudes of gradients into consideration when minimizing the quantization error.
- Score: 12.708068468737286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research has demonstrated that low bit-width (e.g., INT8) quantization can
be employed to accelerate the inference process. This makes gradient quantization
very promising, since backward propagation requires approximately twice as much
computation as the forward pass. Due to the variability and uncertainty of gradient
distributions, many methods have been proposed to attain training stability.
However, most of them ignore the channel-wise gradient distributions and the impact
of gradients with different magnitudes, resulting in degraded final accuracy. In
this paper, we propose a novel INT8 quantization training framework for
convolutional neural networks to address these issues. Specifically, we adopt
Gradient Vectorized Quantization to quantize the gradient, based on the observation
that layer-wise gradients contain multiple distributions along the channel
dimension. Then, a Magnitude-aware Clipping Strategy is introduced that takes the
magnitudes of gradients into consideration when minimizing the quantization error,
and we present a theoretical derivation to solve for the quantization parameters of
the different distributions. Experimental results on a broad range of computer
vision tasks, such as image classification, object detection and video
classification, demonstrate that the proposed Distribution Adaptive INT8
Quantization training method achieves almost lossless training accuracy for
different backbones, including ResNet, MobileNetV2, InceptionV3, VGG and AlexNet,
and is superior to state-of-the-art techniques. Moreover, we further implement an
INT8 kernel that accelerates the training iteration by more than 200% on the latest
Turing architecture, i.e., our method excels in both training accuracy and speed.
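As a rough illustration of the two components described in the abstract, the sketch below applies per-channel ("vectorized") symmetric INT8 fake-quantization to a convolutional gradient tensor, choosing each channel's clipping threshold by a magnitude-weighted error search. The candidate search and the |g|-weighted objective are stand-in assumptions for illustration; the paper instead derives the clipping parameters analytically from the observed channel-wise gradient distributions.

```python
import numpy as np

def quantize_grad_channelwise_int8(grad, num_candidates=8):
    """Per-channel symmetric INT8 fake-quantization of a gradient tensor of
    shape (N, C, H, W). The per-channel clipping threshold is chosen by a
    small candidate search that weights quantization error by |g|, a stand-in
    for the paper's magnitude-aware clipping strategy (illustrative only)."""
    n, c, h, w = grad.shape
    out = np.empty_like(grad, dtype=np.float32)
    for ch in range(c):
        g = grad[:, ch, :, :].astype(np.float32)
        g_max = np.abs(g).max()
        if g_max == 0.0:
            out[:, ch, :, :] = 0.0
            continue
        best_err, best_clip = np.inf, g_max
        # Try a few clipping thresholds; weighting the squared error by |g|
        # makes large-magnitude gradients dominate the objective.
        for frac in np.linspace(1.0 / num_candidates, 1.0, num_candidates):
            clip = frac * g_max
            scale = clip / 127.0
            q = np.clip(np.round(g / scale), -127, 127) * scale
            err = np.sum(np.abs(g) * (g - q) ** 2)
            if err < best_err:
                best_err, best_clip = err, clip
        scale = best_clip / 127.0
        out[:, ch, :, :] = np.clip(np.round(g / scale), -127, 127) * scale
    return out

# Example: quantize a random gradient tensor and report the relative error.
grad = np.random.randn(4, 16, 8, 8).astype(np.float32)
q_grad = quantize_grad_channelwise_int8(grad)
rel_err = np.linalg.norm(grad - q_grad) / np.linalg.norm(grad)
print(f"relative quantization error: {rel_err:.4f}")
```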
Related papers
- Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients [24.973203825917906]
We show that lowering the error for large-magnitude gradients boosts the quantization performance significantly.
We also introduce an interval update algorithm that adjusts the quantization interval adaptively to maintain a small quantization error for large gradients.
arXiv Detail & Related papers (2024-07-17T15:06:12Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z) - DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution
Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer from a severe performance drop at ultra-low precision (4 bits or lower), or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z) - A Statistical Framework for Low-bitwidth Training of Deep Neural
Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model.
One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
arXiv Detail & Related papers (2020-10-27T13:57:33Z) - QuantNet: Learning to Quantize by Learning within Fully Differentiable
Framework [32.465949985191635]
This paper proposes a meta-based quantizer named QuantNet, which utilizes a differentiable sub-network to directly binarize the full-precision weights.
Our method not only solves the problem of gradient mismatching, but also reduces the impact of discretization errors caused by the binarizing operation during deployment.
arXiv Detail & Related papers (2020-09-10T01:41:05Z) - EasyQuant: Post-training Quantization via Scale Optimization [15.443708111143412]
8-bit quantization has been widely applied to accelerate network inference in various deep learning applications.
There are two kinds of quantization methods: training-based quantization and post-training quantization.
arXiv Detail & Related papers (2020-06-30T10:43:02Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we derive the optimal condition for both binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z) - Towards Unified INT8 Training for Convolutional Neural Network [83.15673050981624]
We build a unified 8-bit (INT8) training framework for common convolutional neural networks.
First, we empirically find the four distinctive characteristics of gradients, which provide us insightful clues for gradient quantization.
We propose two universal techniques, including Direction Sensitive Gradient Clipping, which reduces the direction deviation of gradients (see the sketch after this list).
arXiv Detail & Related papers (2019-12-29T08:37:53Z)
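As a rough illustration of the clipping idea in the last entry, the sketch below picks a clipping threshold for symmetric INT8 gradient quantization by minimizing the direction deviation (cosine distance) between the full-precision gradient and its quantized counterpart. The brute-force candidate search is an assumption made for illustration only and is not the cited paper's implementation.

```python
import numpy as np

def direction_sensitive_clip(grad, num_candidates=8):
    """Choose a clipping threshold for symmetric INT8 gradient quantization
    by minimizing the direction deviation (1 - cosine similarity) between the
    full-precision and dequantized gradients. Illustrative sketch of the idea
    behind Direction Sensitive Gradient Clipping, not the cited paper's code."""
    g = grad.astype(np.float32).ravel()
    g_max = np.abs(g).max()
    if g_max == 0.0:
        return 0.0
    best_dev, best_clip = np.inf, g_max
    for frac in np.linspace(1.0 / num_candidates, 1.0, num_candidates):
        clip = frac * g_max
        scale = clip / 127.0
        q = np.clip(np.round(g / scale), -127, 127) * scale
        cos = np.dot(g, q) / (np.linalg.norm(g) * np.linalg.norm(q) + 1e-12)
        dev = 1.0 - cos
        if dev < best_dev:
            best_dev, best_clip = dev, clip
    return best_clip

# Example: choose a clipping threshold for a random convolutional gradient.
grad = np.random.randn(16, 3, 3, 3).astype(np.float32)
print(f"chosen clipping threshold: {direction_sensitive_clip(grad):.4f}")
```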