AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks
- URL: http://arxiv.org/abs/2007.03903v1
- Date: Wed, 8 Jul 2020 05:10:53 GMT
- Title: AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks
- Authors: Liu Fangxin, Zhao Wenbo, Wang Yanzhi, Dai Changzhi, Jiang Li
- Abstract summary: Existing uniform and non-uniform quantization methods exhibit an inherent conflict between the representing range and representing resolution.
We propose a novel quantization method to quantize the weight and activation.
The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
- Score: 0.7378164273177589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization is essential to simplify DNN inference in edge applications.
Existing uniform and non-uniform quantization methods, however, exhibit an
inherent conflict between the representing range and representing resolution,
and thereby result in either underutilized bit-width or significant accuracy
drop. Moreover, these methods encounter three drawbacks: i) the absence of a
quantitative metric for in-depth analysis of the source of the quantization
errors; ii) the limited focus on the image classification tasks based on CNNs;
iii) the unawareness of the real hardware and energy consumption reduced by
lowering the bit-width. In this paper, we first define two quantitative
metrics, i.e., the clipping error and the rounding error, to analyze the
quantization error distribution. We observe that the clipping and rounding
errors vary significantly across layers, models and tasks. Consequently, we
propose a novel quantization method to quantize the weight and activation. The
key idea is to Approximate the Uniform quantization by Adaptively Superposing
multiple Non-uniform quantized values, namely AUSN. AUSN consists of a
decoder-free coding scheme that efficiently exploits the bit-width to its
extreme, a superposition quantization algorithm that can adapt the coding
scheme to different DNN layers, models and tasks without extra hardware design
effort, and a rounding scheme that can eliminate the well-known bit-width
overflow and re-quantization issues. Theoretical analysis~(see Appendix A) and
accuracy evaluation on various DNN models of different tasks show the
effectiveness and generalization of AUSN. The synthesis~(see Appendix B)
results on FPGA show $2\times$ reduction of the energy consumption, and
$2\times$ to $4\times$ reduction of the hardware resource.
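As a rough illustration of the two metrics, the following sketch decomposes the mean squared error of a toy symmetric uniform quantizer into a clipping term and a rounding term. The quantizer setup and function names here are assumptions for illustration, not the paper's exact definitions:

```python
import numpy as np

def uniform_quantize(x, n_bits, clip):
    """Symmetric uniform quantizer: values are clipped to [-clip, clip]
    and rounded to the nearest of 2**n_bits - 1 evenly spaced levels."""
    n_pos = 2 ** (n_bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    step = clip / n_pos
    q = np.clip(np.round(x / step), -n_pos, n_pos)
    return q * step

def error_breakdown(x, n_bits, clip):
    """Split the mean squared quantization error into a clipping term
    (mass outside the representable range) and a rounding term
    (resolution loss inside the range)."""
    x_clipped = np.clip(x, -clip, clip)
    x_q = uniform_quantize(x, n_bits, clip)
    clipping_err = float(np.mean((x - x_clipped) ** 2))
    rounding_err = float(np.mean((x_clipped - x_q) ** 2))
    return clipping_err, rounding_err

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)           # stand-in for one layer's weights
for clip in (1.0, 3.0):
    c, r = error_breakdown(w, n_bits=4, clip=clip)
    print(f"clip={clip}: clipping error {c:.5f}, rounding error {r:.5f}")
```

Widening the clipping range shrinks the clipping error but coarsens the step size and grows the rounding error, which is exactly the range-versus-resolution conflict the abstract describes.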
Related papers
- Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation [3.9177379733188715]
We present an end-to-end solution that aims to address these challenges for efficient GNNs in resource-constrained environments.
We introduce a quantization based approach for all stages of GNNs, from message passing in training to node classification.
The proposed quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization.
arXiv Detail & Related papers (2023-08-29T00:25:02Z)
- QEBVerif: Quantization Error Bound Verification of Neural Networks [6.327780998441913]
Quantization is widely regarded as a promising technique for deploying deep neural networks (DNNs) on edge devices.
Existing verification methods focus on either individual neural networks (DNNs or QNNs) or quantization error bound for partial quantization.
We propose a quantization error bound verification method, named QEBVerif, where both weights and activation tensors are quantized.
arXiv Detail & Related papers (2022-12-06T06:34:38Z)
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization [5.1779694507922835]
We present three insights to robustify a network against quantization.
We propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL)
Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization.
arXiv Detail & Related papers (2022-07-31T02:12:28Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation [48.838691414561694]
Nonuniform-to-Uniform Quantization (N2UQ) is a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient.
N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.7-1.8% on ImageNet.
arXiv Detail & Related papers (2021-11-29T18:59:55Z)
- Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (bit-widths of 4 or lower), or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.