AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks
- URL: http://arxiv.org/abs/2007.03903v1
- Date: Wed, 8 Jul 2020 05:10:53 GMT
- Title: AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks
- Authors: Liu Fangxin, Zhao Wenbo, Wang Yanzhi, Dai Changzhi, Jiang Li
- Abstract summary: Existing uniform and non-uniform quantization methods exhibit an inherent conflict between the representing range and representing resolution.
We propose a novel quantization method to quantize the weight and activation.
The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
- Score: 0.7378164273177589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization is essential to simplify DNN inference in edge applications.
Existing uniform and non-uniform quantization methods, however, exhibit an
inherent conflict between the representing range and representing resolution,
and thereby result in either underutilized bit-width or significant accuracy
drop. Moreover, these methods encounter three drawbacks: i) the absence of a
quantitative metric for in-depth analysis of the source of the quantization
errors; ii) the limited focus on the image classification tasks based on CNNs;
iii) the unawareness of the real hardware and energy consumption reduced by
lowering the bit-width. In this paper, we first define two quantitative
metrics, i.e., the clipping error and the rounding error, to analyze the
quantization error distribution. We observe that the clipping and rounding
errors vary significantly across layers, models and tasks. Consequently, we
propose a novel quantization method to quantize the weight and activation. The
key idea is to Approximate the Uniform quantization by Adaptively Superposing
multiple Non-uniform quantized values, namely AUSN. AUSN consists of a
decoder-free coding scheme that efficiently exploits the bit-width to its
extreme, a superposition quantization algorithm that can adapt the coding
scheme to different DNN layers, models and tasks without extra hardware design
effort, and a rounding scheme that can eliminate the well-known bit-width
overflow and re-quantization issues. Theoretical analysis~(see Appendix A) and
accuracy evaluation on various DNN models of different tasks show the
effectiveness and generalization of AUSN. The synthesis~(see Appendix B)
results on FPGA show $2\times$ reduction of the energy consumption, and
$2\times$ to $4\times$ reduction of the hardware resource.
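As a rough illustration of the two metrics, the following sketch decomposes the mean squared error of a toy symmetric uniform quantizer into a clipping term and a rounding term. The quantizer setup and function names here are assumptions for illustration, not the paper's exact definitions:

```python
import numpy as np

def uniform_quantize(x, n_bits, clip):
    """Symmetric uniform quantizer: values are clipped to [-clip, clip]
    and rounded to the nearest of 2**n_bits - 1 evenly spaced levels."""
    n_pos = 2 ** (n_bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    step = clip / n_pos
    q = np.clip(np.round(x / step), -n_pos, n_pos)
    return q * step

def error_breakdown(x, n_bits, clip):
    """Split the mean squared quantization error into a clipping term
    (mass outside the representable range) and a rounding term
    (resolution loss inside the range)."""
    x_clipped = np.clip(x, -clip, clip)
    x_q = uniform_quantize(x, n_bits, clip)
    clipping_err = float(np.mean((x - x_clipped) ** 2))
    rounding_err = float(np.mean((x_clipped - x_q) ** 2))
    return clipping_err, rounding_err

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)           # stand-in for one layer's weights
for clip in (1.0, 3.0):
    c, r = error_breakdown(w, n_bits=4, clip=clip)
    print(f"clip={clip}: clipping error {c:.5f}, rounding error {r:.5f}")
```

Widening the clipping range shrinks the clipping error but coarsens the step size and grows the rounding error, which is exactly the range-versus-resolution conflict the abstract describes.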
Related papers
- Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation [3.9177379733188715]
We present an end-to-end solution that aims to address these challenges for efficient GNNs in resource-constrained environments.
We introduce a quantization based approach for all stages of GNNs, from message passing in training to node classification.
The proposed quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization.
arXiv Detail & Related papers (2023-08-29T00:25:02Z)
- QEBVerif: Quantization Error Bound Verification of Neural Networks [6.327780998441913]
Quantization is widely regarded as a promising technique for deploying deep neural networks (DNNs) on edge devices.
Existing verification methods focus on either individual neural networks (DNNs or QNNs) or quantization error bound for partial quantization.
We propose a quantization error bound verification method, named QEBVerif, where both weights and activation tensors are quantized.
arXiv Detail & Related papers (2022-12-06T06:34:38Z)
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization [5.1779694507922835]
We present three insights to robustify a network against quantization.
We propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL)
Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization.
arXiv Detail & Related papers (2022-07-31T02:12:28Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation [48.838691414561694]
Nonuniform-to-Uniform Quantization (N2UQ) is a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient.
N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.7-1.8% on ImageNet.
arXiv Detail & Related papers (2021-11-29T18:59:55Z)
- Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (bit-widths of 4 or lower), or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.