Towards Unified INT8 Training for Convolutional Neural Network
- URL: http://arxiv.org/abs/1912.12607v1
- Date: Sun, 29 Dec 2019 08:37:53 GMT
- Title: Towards Unified INT8 Training for Convolutional Neural Network
- Authors: Feng Zhu, Ruihao Gong, Fengwei Yu, Xianglong Liu, Yanfei Wang, Zhelong
Li, Xiuqi Yang, Junjie Yan
- Abstract summary: We build a unified 8-bit (INT8) training framework for common convolutional neural networks.
First, we empirically find four distinctive characteristics of gradients, which provide insightful clues for gradient quantization.
We propose two universal techniques: Direction Sensitive Gradient Clipping, which reduces the direction deviation of gradients, and Deviation Counteractive Learning Rate Scaling, which avoids illegal gradient updates along the wrong direction.
- Score: 83.15673050981624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, low-bit (e.g., 8-bit) network quantization has been extensively
studied to accelerate inference. Beyond inference, low-bit training with
quantized gradients can bring even greater acceleration, since the backward
pass is often computation-intensive. Unfortunately, inappropriate quantization
of backward propagation usually makes training unstable and can even cause it
to crash. A successful unified low-bit training framework that supports diverse
networks on various tasks is still lacking. In this paper, we attempt to build
a unified 8-bit (INT8) training framework for common convolutional neural
networks, addressing both accuracy and speed. First, we empirically identify
four distinctive characteristics of gradients, which provide insightful clues
for gradient quantization. Then, we give an in-depth theoretical analysis of
the convergence bound and derive two principles for stable INT8 training.
Finally, we propose two universal techniques: Direction Sensitive Gradient
Clipping, which reduces the direction deviation of gradients, and Deviation
Counteractive Learning Rate Scaling, which avoids illegal gradient updates
along the wrong direction. Experiments show that our unified solution delivers
accurate and efficient INT8 training for a variety of networks and tasks,
including MobileNetV2, InceptionV3 and object detection, on which prior studies
have never succeeded. Moreover, it is flexible enough to run on off-the-shelf
hardware, reducing training time by 22% on a Pascal GPU without much
optimization effort. We believe this pioneering study will help lead the
community towards fully unified INT8 training for convolutional neural
networks.
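
To make the two proposed techniques concrete, the following is a minimal, assumption-based sketch (not the authors' released code) of how Direction Sensitive Gradient Clipping and Deviation Counteractive Learning Rate Scaling could be simulated in NumPy: the clipping value is chosen to minimize the cosine deviation between the original and the quantized gradient directions, and the learning rate is scaled down exponentially with that deviation. The candidate search grid and the constants alpha and beta are illustrative choices, not values taken from the paper.

import numpy as np

def quantize_int8(x, clip):
    """Simulated symmetric INT8 quantization of x after clipping to [-clip, clip]."""
    scale = clip / 127.0
    q = np.clip(np.round(np.clip(x, -clip, clip) / scale), -127, 127)
    return q * scale  # dequantized (simulated) INT8 gradient

def cosine_deviation(g, g_q):
    """Direction deviation d = 1 - cos(g, g_q); 0 means the directions agree."""
    denom = np.linalg.norm(g) * np.linalg.norm(g_q) + 1e-12
    return 1.0 - float(np.dot(g, g_q)) / denom

def direction_sensitive_clip(g, num_candidates=20):
    """Pick the clipping value that minimizes the direction deviation (assumed grid search)."""
    g_max = float(np.abs(g).max())
    best_clip, best_dev = g_max, cosine_deviation(g, quantize_int8(g, g_max))
    for c in np.linspace(0.1 * g_max, g_max, num_candidates):
        dev = cosine_deviation(g, quantize_int8(g, c))
        if dev < best_dev:
            best_clip, best_dev = c, dev
    return best_clip, best_dev

def scaled_learning_rate(base_lr, deviation, alpha=10.0, beta=0.1):
    """Deviation-counteractive scaling: shrink the step when the deviation is large."""
    return base_lr * max(float(np.exp(-alpha * deviation)), beta)

# Toy usage on a random "layer gradient".
g = np.random.randn(4096).astype(np.float32)
clip, dev = direction_sensitive_clip(g)
g_int8 = quantize_int8(g, clip)
lr = scaled_learning_rate(0.1, dev)
print(f"clip={clip:.4f}  deviation={dev:.5f}  scaled_lr={lr:.4f}")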
Related papers
- Enabling On-device Continual Learning with Binary Neural Networks [3.180732240499359]
We propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks (BNNs).
Specifically, our approach leverages binary latent replay activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation.
arXiv Detail & Related papers (2024-01-18T11:57:05Z) - Stable and low-precision training for large-scale vision-language models [108.62077651227607]
We introduce new methods for accelerating and stabilizing training for large language-vision models.
For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25%.
For stability, we analyze loss spikes and find they consistently occur 1-8 iterations after the squared gradients become under-estimated.
arXiv Detail & Related papers (2023-04-25T17:38:18Z) - MetaGrad: Adaptive Gradient Quantization with Hypernetworks [46.55625589293897]
Quantization-aware Training (QAT) accelerates the forward pass during neural network training and inference.
In this work, we propose to solve this problem by incorporating the gradients into the computation graph of the next training iteration via a hypernetwork.
Various experiments on CIFAR-10 dataset with different CNN network architectures demonstrate that our hypernetwork-based approach can effectively reduce the negative effect of gradient quantization noise.
arXiv Detail & Related papers (2023-03-04T07:26:34Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Gradient Descent on Neural Networks Typically Occurs at the Edge of
Stability [94.4070247697549]
Full-batch gradient descent on neural network training objectives operates in a regime we call the Edge of Stability.
In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2/\text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales.
arXiv Detail & Related papers (2021-02-26T22:08:19Z) - Distribution Adaptive INT8 Quantization for Training CNNs [12.708068468737286]
In this paper, we propose a novel INT8 quantization training framework for convolutional neural networks.
Specifically, we adopt Gradient Vectorized Quantization to quantize the gradient, based on the observation that layer-wise gradients contain multiple distributions along the channel dimension (a minimal channel-wise sketch follows this list).
Then, a Magnitude-aware Clipping Strategy is introduced, which takes the magnitudes of gradients into account when minimizing the quantization error.
arXiv Detail & Related papers (2021-02-09T11:58:10Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline of integer networks is due to activation quantization, and replace conventional ReLU with Bounded ReLU.
Our integer networks achieve performance equivalent to the corresponding full-precision networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - FrostNet: Towards Quantization-Aware Network Architecture Search [8.713741951284886]
We present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances.
Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized.
arXiv Detail & Related papers (2020-06-17T06:40:43Z) - Binary Neural Networks: A Survey [126.67799882857656]
The binary neural network serves as a promising technique for deploying deep models on resource-limited devices.
Binarization inevitably causes severe information loss and, even worse, its discontinuity makes the deep network difficult to optimize.
We present a survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error.
arXiv Detail & Related papers (2020-03-31T16:47:20Z) - Shifted and Squeezed 8-bit Floating Point format for Low-Precision
Training of Deep Neural Networks [13.929168096016957]
We introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers.
Reduced bit precision allows for a larger effective memory and increased computational speed.
We show that, unlike previous 8-bit precision training methods, the proposed method works out-of-the-box for representative models.
arXiv Detail & Related papers (2020-01-16T06:38:27Z)
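
As referenced in the Distribution Adaptive INT8 Quantization entry above, the following is a minimal channel-wise gradient quantization sketch. It only illustrates the general idea of Gradient Vectorized Quantization (one INT8 scale per output channel of a convolution weight gradient, since channel-wise gradient distributions differ); plain max-abs clipping stands in for that paper's magnitude-aware strategy, and the function name and shapes are assumptions for illustration.

import numpy as np

def quantize_grad_per_channel(grad):
    """Simulated INT8 quantization of a conv weight gradient with one scale per
    output channel. grad has shape (out_channels, in_channels, kH, kW)."""
    out_channels = grad.shape[0]
    flat = grad.reshape(out_channels, -1)
    # Per-channel max-abs clipping value and scale (assumed, for illustration).
    clip = np.abs(flat).max(axis=1, keepdims=True) + 1e-12
    scale = clip / 127.0
    q = np.clip(np.round(flat / scale), -127, 127).astype(np.int8)
    # Return both the INT8 codes and the dequantized gradient for comparison.
    dequant = (q.astype(np.float32) * scale).reshape(grad.shape)
    return q.reshape(grad.shape), dequant

# Toy usage: per-channel scales track very different channel magnitudes.
g = np.random.randn(8, 16, 3, 3).astype(np.float32)
g[0] *= 100.0  # one output channel with much larger gradients
q, g_hat = quantize_grad_per_channel(g)
rel_err = np.linalg.norm(g - g_hat) / np.linalg.norm(g)
print(f"relative quantization error: {rel_err:.4f}")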
This list is automatically generated from the titles and abstracts of the papers on this site.