CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level
Continuous Sparsification
- URL: http://arxiv.org/abs/2212.02770v1
- Date: Tue, 6 Dec 2022 05:44:21 GMT
- Title: CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level
Continuous Sparsification
- Authors: Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang
Zhang
- Abstract summary: Mixed-precision quantization has been widely applied to deep neural networks (DNNs).
Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence.
We propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability.
- Score: 51.81850995661478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed-precision quantization has been widely applied on deep neural networks
(DNNs) as it leads to significantly better efficiency-accuracy tradeoffs
compared to uniform quantization. Meanwhile, determining the exact precision of
each layer remains challenging. Previous attempts on bit-level regularization
and pruning-based dynamic precision adjustment during training suffer from
noisy gradients and unstable convergence. In this work, we propose Continuous
Sparsification Quantization (CSQ), a bit-level training method to search for
mixed-precision quantization schemes with improved stability. CSQ stabilizes
the bit-level mixed-precision training process with a bi-level gradual
continuous sparsification on both the bit values of the quantized weights and
the bit selection in determining the quantization precision of each layer. The
continuous sparsification scheme enables fully-differentiable training without
gradient approximation while achieving an exact quantized model in the end. A
budget-aware regularization of total model size enables the dynamic growth and
pruning of each layer's precision towards a mixed-precision quantization scheme
of the desired size. Extensive experiments show CSQ achieves better
efficiency-accuracy tradeoff than previous methods on multiple models and
datasets.
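The mechanism described in the abstract can be illustrated with a small PyTorch sketch: temperature-annealed sigmoid gates act both on the soft bit values of each weight and on the selection of how many bits a layer keeps, while a size penalty steers the search toward a target budget. This is a minimal reading of the abstract under my own assumptions; the class name BitGatedLinear, the linear temperature schedule, and the penalty weight are invented for illustration and are not the authors' implementation.
```python
# Minimal, illustrative sketch of bit-level continuous sparsification
# (a simplified reading of the abstract, NOT the authors' code). All names
# here (BitGatedLinear, gate_scores, beta, the penalty weight) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitGatedLinear(nn.Module):
    """Linear layer whose weight is assembled from gated, soft-binarized bit-planes."""

    def __init__(self, in_features, out_features, max_bits=8):
        super().__init__()
        self.max_bits = max_bits
        # One tensor of pre-activations per bit-plane, plus one gate score per bit.
        self.bit_planes = nn.Parameter(0.1 * torch.randn(max_bits, out_features, in_features))
        self.gate_scores = nn.Parameter(torch.zeros(max_bits))
        self.scale = nn.Parameter(torch.ones(out_features, 1))

    def forward(self, x, beta):
        # Bit-selection gates in (0, 1): as beta grows they harden toward 0/1,
        # so unneeded low-order bits are pruned and the layer's precision shrinks.
        gates = torch.sigmoid(beta * self.gate_scores)                  # (max_bits,)
        # Bit values are also kept continuous and pushed toward {-1, +1} by the
        # same temperature, keeping training differentiable end to end.
        soft_bits = 2.0 * torch.sigmoid(beta * self.bit_planes) - 1.0
        # Reconstruct the effective weight as a gated sum of bit-planes.
        weight = sum(gates[b] * (2.0 ** -b) * soft_bits[b] for b in range(self.max_bits))
        return F.linear(x, self.scale * weight)

    def size_penalty(self, beta):
        # Budget-style regularizer: expected number of active bits in this layer
        # times its number of weights (summed over layers, this tracks model size).
        return torch.sigmoid(beta * self.gate_scores).sum() * self.bit_planes[0].numel()


# Toy training loop: anneal the temperature and penalize the expected model size.
layer = BitGatedLinear(16, 4, max_bits=6)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
for step in range(100):
    beta = 1.0 + 0.2 * step                    # simple linear temperature schedule
    x = torch.randn(32, 16)
    loss = layer(x, beta).pow(2).mean() + 1e-6 * layer.size_penalty(beta)
    opt.zero_grad()
    loss.backward()
    opt.step()
```
As the temperature grows, the gates and soft bit values saturate toward hard 0/1 and ±1 choices, so the final model can be read off exactly, with each layer's precision given by its surviving gates.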
Related papers
- MixQuant: Mixed Precision Quantization with a Bit-width Optimization
Search [7.564770908909927]
Quantization is a technique for creating efficient Deep Neural Networks (DNNs).
We propose MixQuant, a search algorithm that finds the optimal custom quantization bit-width for each layer weight based on roundoff error.
We show that combining MixQuant with BRECQ, a state-of-the-art quantization method, yields better quantized model accuracy than BRECQ alone.
arXiv Detail & Related papers (2023-09-29T15:49:54Z)
- Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z)
- Mixed Precision Low-bit Quantization of Neural Network Language Models
for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying sensitivity of different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
- Mixed Precision of Quantization of Transformer Language Models for
Speech Recognition [67.95996816744251]
State-of-the-art neural language models represented by Transformers are becoming increasingly complex and expensive for practical applications.
Current low-bit quantization methods are based on uniform precision and fail to account for the varying sensitivity of different parts of the system to quantization errors.
The optimal local precision settings are automatically learned using two techniques.
Experiments were conducted on the Penn Treebank (PTB) corpus and on an LF-MMI TDNN system trained on Switchboard data.
arXiv Detail & Related papers (2021-11-29T09:57:00Z)
- RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise
Mixed Schemes and Multiple Precisions [43.27226390407956]
This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach.
The proposed RMSMP is tested for the image classification and natural language processing (BERT) applications.
It achieves the best accuracy performance among state-of-the-arts under the same equivalent precisions.
arXiv Detail & Related papers (2021-10-30T02:53:35Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network
Quantization [32.770842274996774]
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle the mixed-precision quantization from a new angle of inducing bit-level sparsity.
arXiv Detail & Related papers (2021-02-20T22:37:41Z)
- FracBits: Mixed Precision Quantization via Fractional Bit-Widths [29.72454879490227]
Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths.
We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints.
arXiv Detail & Related papers (2020-07-04T06:09:09Z)
- Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition for both binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
- Post-training Quantization with Multiple Points: Mixed Precision without
Mixed Precision [20.081543082708688]
We propose multipoint quantization, a method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers.
We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and it can be generalized to more challenging tasks like PASCAL VOC object detection.
arXiv Detail & Related papers (2020-02-20T22:37:45Z)
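Among the related papers above, the multipoint quantization idea (approximating a full-precision weight vector with a linear combination of several low-bit vectors) is concrete enough to sketch. The snippet below uses a greedy residual fit with a uniform symmetric low-bit grid; the helper names and the greedy strategy are assumptions made for illustration, not the paper's exact algorithm.
```python
# Illustrative sketch (assumed details, not the paper's algorithm verbatim) of
# approximating a full-precision vector with a linear combination of low-bit vectors.
import numpy as np


def quantize_low_bit(v, bits=2):
    """Quantize v onto a symmetric uniform low-bit grid and return the dequantized values."""
    levels = 2 ** (bits - 1) - 1                      # e.g. 2 bits -> {-1, 0, +1}
    vmax = np.max(np.abs(v))
    if vmax == 0:
        return np.zeros_like(v)
    step = vmax / levels
    return np.clip(np.round(v / step), -levels, levels) * step


def multipoint_approx(w, n_points=3, bits=2):
    """Greedily build low-bit vectors q_i and scalars a_i with w ~= sum_i a_i * q_i."""
    residual = w.astype(np.float64)
    qs, coeffs = [], []
    for _ in range(n_points):
        q = quantize_low_bit(residual, bits)
        if not np.any(q):
            break
        a = float(q @ residual) / float(q @ q)        # least-squares coefficient
        qs.append(q)
        coeffs.append(a)
        residual = residual - a * q
    return qs, coeffs


w = np.random.randn(64)
qs, coeffs = multipoint_approx(w, n_points=4, bits=2)
approx = sum(a * q for a, q in zip(coeffs, qs))
print("relative error:", np.linalg.norm(w - approx) / np.linalg.norm(w))
```
Each added low-bit vector fits the remaining residual, so accuracy can be traded against the number of stored components while every stored vector stays in low precision.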