Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via
Generalized Straight-Through Estimation
- URL: http://arxiv.org/abs/2111.14826v1
- Date: Mon, 29 Nov 2021 18:59:55 GMT
- Title: Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via
Generalized Straight-Through Estimation
- Authors: Zechun Liu and Kwang-Ting Cheng and Dong Huang and Eric Xing and
Zhiqiang Shen
- Abstract summary: Nonuniform-to-Uniform Quantization (N2UQ) is a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient.
N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.7~1.8% on ImageNet.
- Score: 48.838691414561694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The nonuniform quantization strategy for compressing neural networks usually
achieves better performance than its counterpart, i.e., uniform strategy, due
to its superior representational capacity. However, many nonuniform
quantization methods overlook the complicated projection process in
implementing the nonuniformly quantized weights/activations, which incurs
non-negligible time and space overhead in hardware deployment. In this study,
we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can
maintain the strong representation ability of nonuniform methods while being
hardware-friendly and as efficient as uniform quantization for model
inference. We achieve this by learning flexible, non-equidistant input
thresholds that better fit the underlying distribution, while quantizing the
real-valued inputs into equidistant output levels. To train the quantized
network with learnable input thresholds, we introduce a generalized
straight-through estimator (G-STE) for intractable backward derivative
calculation w.r.t. threshold parameters. Additionally, we consider entropy
preserving regularization to further reduce information loss in weight
quantization. Even under this adverse constraint of imposing uniformly
quantized weights and activations, our N2UQ outperforms state-of-the-art
nonuniform quantization methods by 0.7~1.8% on ImageNet, demonstrating the
contribution of N2UQ design. Code will be made publicly available.
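
To make the mechanism concrete, below is a minimal, illustrative sketch (not the authors' released code) of the core N2UQ idea: activations are compared against learnable, non-equidistant thresholds but mapped onto equidistant integer output levels, so inference only ever sees uniformly quantized values. The backward pass uses a plain straight-through estimator as a stand-in for the paper's G-STE, and the class and parameter names (N2UQActQuantizer, num_bits) are assumptions of this sketch.

# Illustrative sketch of N2UQ-style activation quantization: learnable,
# non-equidistant input thresholds mapped onto equidistant output levels,
# with a plain straight-through estimator standing in for the paper's G-STE.
import torch
import torch.nn as nn


class _ThresholdQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, thresholds):
        # Output level = number of thresholds the input exceeds, so the
        # levels 0 .. 2^b - 1 are equidistant by construction.
        q = (x.unsqueeze(-1) > thresholds).sum(dim=-1).float()
        ctx.save_for_backward(x, thresholds)
        return q

    @staticmethod
    def backward(ctx, grad_out):
        x, thresholds = ctx.saved_tensors
        # Straight-through stand-in: pass gradients to inputs inside the
        # quantization range, and give each threshold the negated gradient
        # mass of the inputs it currently gates. The paper derives a
        # principled G-STE rule instead of this placeholder.
        in_range = (x > thresholds[0]) & (x < thresholds[-1])
        grad_x = grad_out * in_range.float()
        gate = (x.unsqueeze(-1) > thresholds).float()
        grad_t = -(grad_out.unsqueeze(-1) * gate).reshape(-1, thresholds.numel()).sum(0)
        return grad_x, grad_t


class N2UQActQuantizer(nn.Module):
    """Hypothetical activation quantizer with learnable input thresholds."""

    def __init__(self, num_bits=2):
        super().__init__()
        n_levels = 2 ** num_bits
        # 2^b - 1 learnable thresholds, initialised equidistantly in (0, 1).
        init = torch.linspace(0.0, 1.0, n_levels + 1)[1:-1]
        self.thresholds = nn.Parameter(init)

    def forward(self, x):
        # Keep thresholds sorted so the input-to-level mapping stays monotone.
        t, _ = torch.sort(self.thresholds)
        return _ThresholdQuantize.apply(x, t)


if __name__ == "__main__":
    quantizer = N2UQActQuantizer(num_bits=2)
    x = torch.rand(4, 8, requires_grad=True)
    y = quantizer(x)       # integer levels 0..3 with uniform spacing
    y.sum().backward()     # gradients reach both x and the thresholds

Because the output levels are equidistant, downstream matrix multiplications can use ordinary fixed-point arithmetic; the learned thresholds only affect how real-valued inputs are binned during quantization.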
Related papers
- NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search [7.971065005161565]
Quantization is a technique for converting floating-point representations into low-bit-width fixed-point representations.
We show how to learn new quantized weights over the entire quantized space.
We show that the method achieves state-of-the-art compression rates in both data-free and data-driven configurations.
arXiv Detail & Related papers (2023-08-10T14:19:58Z) - Designing strong baselines for ternary neural network quantization
through support and mass equalization [7.971065005161565]
Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision.
Their computational burden can be dramatically reduced by quantizing floating-point values to ternary values.
We show experimentally that our approach significantly improves the performance of ternary quantization across a variety of scenarios.
arXiv Detail & Related papers (2023-06-30T07:35:07Z) - Attention Round for Post-Training Quantization [0.9558392439655015]
This paper presents a novel quantization method called Attention Round.
The probability of a weight w being mapped to a given quantized value is negatively correlated with the distance between that value and w, and decays with a Gaussian function. (A rough sketch of this distance-based rounding appears after this list.)
For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper requires only 1,024 training samples and 10 minutes to complete the quantization process.
arXiv Detail & Related papers (2022-07-07T05:04:21Z) - Improved Quantum Algorithms for Fidelity Estimation [77.34726150561087]
We develop new and efficient quantum algorithms for fidelity estimation with provable performance guarantees.
Our algorithms use advanced quantum linear algebra techniques, such as the quantum singular value transformation.
We prove that fidelity estimation to any non-trivial constant additive accuracy is hard in general.
arXiv Detail & Related papers (2022-03-30T02:02:16Z) - Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural
Networks [1.398698203665363]
In this paper, we explore non-linear quantization techniques for exploiting lower bit precision.
We develop a Quantization-Aware Training (QAT) algorithm that enables training of low-bit-width Power-of-Two (PoT) networks.
At the same time, PoT quantization vastly reduces the computational complexity of the neural network.
arXiv Detail & Related papers (2022-03-09T19:57:14Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We obtain 81.29% top-1 accuracy with the DeiT-B model on the ImageNet dataset using about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z) - DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution
Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (4 bits or lower) or require a heavy fine-tuning process to recover performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z) - AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks [0.7378164273177589]
Existing uniform and non-uniform quantization methods exhibit an inherent conflict between representable range and resolution.
We propose a novel quantization method for weights and activations.
The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
arXiv Detail & Related papers (2020-07-08T05:10:53Z) - Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
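
As referenced in the Attention Round entry above, the following is a rough, hedged sketch of the distance-based stochastic rounding that entry describes: a weight w is mapped to a candidate quantized value with probability that decays as a Gaussian of the distance between that value and w. The grid, the sigma value, and the function name are assumptions of this sketch, not the paper's implementation.

# Rough illustration of Gaussian-decaying, distance-based stochastic rounding.
import numpy as np


def attention_round_sketch(w, grid, sigma=0.2, rng=None):
    """Sample a quantized value for w; nearer grid points are more likely."""
    rng = rng or np.random.default_rng()
    # Gaussian-decaying scores based on distance, normalised into probabilities.
    scores = np.exp(-((grid - w) ** 2) / (2.0 * sigma ** 2))
    probs = scores / scores.sum()
    return float(rng.choice(grid, p=probs))


# Example: round w = 0.37 onto a 2-bit uniform grid in [0, 1].
grid = np.linspace(0.0, 1.0, 4)            # levels 0, 1/3, 2/3, 1
print(attention_round_sketch(0.37, grid))  # most often 1/3, sometimes a neighbour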