BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network
Quantization
- URL: http://arxiv.org/abs/2102.10462v1
- Date: Sat, 20 Feb 2021 22:37:41 GMT
- Title: BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network
Quantization
- Authors: Huanrui Yang, Lin Duan, Yiran Chen, Hai Li
- Abstract summary: Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle the mixed-precision quantization from a new angle of inducing bit-level sparsity.
- Score: 32.770842274996774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixed-precision quantization can potentially achieve the optimal tradeoff
between performance and compression rate of deep neural networks, and thus,
have been widely investigated. However, it lacks a systematic method to
determine the exact quantization scheme. Previous methods either examine only a
small manually-designed search space or utilize a cumbersome neural
architecture search to explore the vast search space. These approaches cannot
lead to an optimal quantization scheme efficiently. This work proposes
bit-level sparsity quantization (BSQ) to tackle the mixed-precision
quantization from a new angle of inducing bit-level sparsity. We consider each
bit of quantized weights as an independent trainable variable and introduce a
differentiable bit-sparsity regularizer. BSQ can induce all-zero bits across a
group of weight elements and realize the dynamic precision reduction, leading
to a mixed-precision quantization scheme of the original model. Our method
enables the exploration of the full mixed-precision space with a single
gradient-based optimization process, with only one hyperparameter to tradeoff
the performance and compression. BSQ achieves both higher accuracy and higher
bit reduction on various model architectures on the CIFAR-10 and ImageNet
datasets comparing to previous methods.
Related papers
- MixQuant: Mixed Precision Quantization with a Bit-width Optimization
Search [7.564770908909927]
Quantization is a technique for creating efficient Deep Neural Networks (DNNs)
We propose MixQuant, a search algorithm that finds the optimal custom quantization bit-width for each layer weight based on roundoff error.
We show that combining MixQuant with BRECQ, a state-of-the-art quantization method, yields better quantized model accuracy than BRECQ alone.
arXiv Detail & Related papers (2023-09-29T15:49:54Z) - CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level
Continuous Sparsification [51.81850995661478]
Mixed-precision quantization has been widely applied on deep neural networks (DNNs)
Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence.
We propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability.
arXiv Detail & Related papers (2022-12-06T05:44:21Z) - SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed datasets or single precision quantization with a lower bitwidth.
arXiv Detail & Related papers (2022-06-09T12:38:18Z) - Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level.
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - Mixed Precision Low-bit Quantization of Neural Network Language Models
for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long-short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization [45.22093693422085]
Mixed-precision quantization model can match different quantization bit-precisions according to the sensitivity of different layers to achieve great performance.
It is a difficult problem to quickly determine the quantization bit-precision of each layer in deep neural networks according to some constraints.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
arXiv Detail & Related papers (2021-03-04T09:15:08Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We present to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differential method to search them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.