Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks
- URL: http://arxiv.org/abs/2012.13762v1
- Date: Sat, 26 Dec 2020 15:21:18 GMT
- Title: Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks
- Authors: Tuan Hoang and Thanh-Toan Do and Tam V. Nguyen and Ngai-Man Cheung
- Abstract summary: This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, for low bit-width weights, most existing methods quantize the full-precision network weights, whereas this work directly updates the quantized weights with learnable quantization levels.
Second, for low bit-width activations, existing works treat all channels equally, whereas this work accounts for the quantization errors of individual channels.
- Score: 73.29587731448345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes two novel techniques to train deep convolutional neural
networks with low bit-width weights and activations. First, to obtain low
bit-width weights, most existing methods obtain the quantized weights by
performing quantization on the full-precision network weights. However, this
approach results in a mismatch: gradient descent updates the full-precision
weights, but not the quantized weights. To address this issue, we propose a
novel method that enables direct updating of quantized weights, with learnable
quantization levels, to minimize the cost function using gradient descent.
Second, to obtain low bit-width activations, existing works consider all
channels equally. However, the activation quantizers could be biased toward a
few high-variance channels. To address
this issue, we propose a method to take into account the quantization errors of
individual channels. With this approach, we can learn activation quantizers
that minimize the quantization errors in the majority of channels. Experimental
results demonstrate that our proposed method achieves state-of-the-art
performance on the image classification task, using AlexNet, ResNet and
MobileNetV2 architectures on CIFAR-100 and ImageNet datasets.
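To make the two ideas in the abstract concrete, below is a minimal PyTorch-style sketch, assuming a straight-through estimator, a per-layer learnable step size standing in for the learnable quantization levels, and a median over per-channel errors standing in for the channel-aware activation objective. The names and details are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a straight-through quantizer with a learnable
# level spacing, plus a per-channel activation quantization error aggregated
# robustly (median over channels) so a few high-variance channels do not
# dominate. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class LearnableUniformQuantizer(nn.Module):
    """k-bit uniform quantizer whose step size (level spacing) is a learnable
    parameter, trained jointly with the network by gradient descent."""

    def __init__(self, num_bits: int = 2, init_step: float = 0.05):
        super().__init__()
        self.num_levels = 2 ** num_bits
        self.step = nn.Parameter(torch.tensor(init_step))  # learnable

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Clamp to the representable range, then round to the nearest level.
        max_val = self.step * (self.num_levels // 2 - 1)
        w_c = torch.minimum(torch.maximum(w, -max_val), max_val)
        w_q = torch.round(w_c / self.step) * self.step
        # Straight-through estimator: use w_q in the forward pass, but let
        # gradients flow through w_c (and hence to w and self.step).
        return w_c + (w_q - w_c).detach()


def channel_aware_activation_error(x: torch.Tensor, x_q: torch.Tensor) -> torch.Tensor:
    """Per-channel quantization error aggregated with a median, so the
    quantizer is fit to the majority of channels rather than being biased by
    a few high-variance ones (an illustrative choice, not the paper's exact
    formulation). Expects NCHW activations."""
    per_channel_mse = ((x - x_q) ** 2).mean(dim=(0, 2, 3))  # one value per channel
    return per_channel_mse.median()


if __name__ == "__main__":
    quant = LearnableUniformQuantizer(num_bits=2)
    w = torch.randn(64, 32, 3, 3, requires_grad=True)
    w_q = quant(w)
    loss = w_q.pow(2).mean()
    loss.backward()
    # Gradients reach both the weights and the learnable step size.
    print(w.grad.shape, quant.step.grad)

    # Channel-aware error for a batch of activations (illustrative).
    x = torch.randn(4, 16, 8, 8)
    print(channel_aware_activation_error(x, quant(x)))
```

The straight-through trick lets gradient descent act directly on the quantized values used in the forward pass while also updating the learnable level spacing; the median aggregation is one simple way to keep a few high-variance channels from dominating the activation quantizer.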
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activations.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Standard Deviation-Based Quantization for Deep Neural Networks [17.495852096822894]
Quantization of deep neural networks is a promising approach that reduces the inference cost.
We propose a new framework to learn the quantization intervals (discrete values) using the knowledge of the network's weight and activation distributions.
Our scheme simultaneously prunes the network's parameters and allows us to flexibly adjust the pruning ratio during the quantization process.
arXiv Detail & Related papers (2022-02-24T23:33:47Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train one model for all quantization, i.e., a single quantized network that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (4 bits or lower) or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables and utilize a differentiable method to search for them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
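As a rough illustration of the gradient $\ell_1$ regularization idea summarized above, the sketch below adds the $\ell_1$ norm of the loss gradient with respect to the weights as a penalty, which bounds the first-order effect of small, quantization-like weight perturbations. The model, data, and regularization strength are placeholders, not taken from the paper.

```python
# Minimal sketch (assumed setup): gradient-norm regularization for
# quantization robustness via double backward in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 0.05  # regularization strength (assumed value)

x = torch.randn(8, 16)
y = torch.randint(0, 10, (8,))

task_loss = criterion(model(x), y)
# create_graph=True keeps the graph so the gradient penalty is itself
# differentiable with respect to the parameters.
grads = torch.autograd.grad(task_loss, model.parameters(), create_graph=True)
grad_l1 = sum(g.abs().sum() for g in grads)

total_loss = task_loss + lam * grad_l1
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```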