One Model for All Quantization: A Quantized Network Supporting Hot-Swap
Bit-Width Adjustment
- URL: http://arxiv.org/abs/2105.01353v1
- Date: Tue, 4 May 2021 08:10:50 GMT
- Title: One Model for All Quantization: A Quantized Network Supporting Hot-Swap
Bit-Width Adjustment
- Authors: Qigong Sun, Xiufang Li, Yan Ren, Zhongjian Huang, Xu Liu, Licheng
Jiao, Fang Liu
- Abstract summary: We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
- Score: 36.75157407486302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an effective technique for deploying deep neural networks on
edge devices, model quantization has been successfully applied in many
practical applications. Whether based on quantization-aware training (QAT) or
post-training quantization (PTQ), existing methods all depend on the target
bit-widths. When the quantization precision is adjusted, the quantized model
must be fine-tuned or the quantization noise minimized, which is inconvenient
in practical applications. In this work, we propose a method to train one
model for all quantization that supports diverse bit-widths (e.g., from 8-bit
to 1-bit) and thus allows online bit-width adjustment. The model is
hot-swappable: it can provide specific quantization strategies for different
candidates through multiscale quantization. We use wavelet decomposition and
reconstruction to increase the diversity of weights, thus significantly
improving the performance of each quantization candidate, especially at
ultra-low bit-widths (e.g., 3-bit, 2-bit, and 1-bit). Experimental results on
ImageNet and COCO show that our method achieves accuracy comparable to that of
dedicated models trained at the same precision.
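A minimal sketch of the multiscale idea described in the abstract, assuming a
single-level Haar transform as the wavelet and a plain symmetric uniform
quantizer; the function names (haar2d, ihaar2d, quantize) and the per-subband
scheme are illustrative choices, not the paper's exact method:

```python
import numpy as np

def haar2d(w):
    """Single-level 2-D Haar decomposition of a weight matrix (even sizes)."""
    a = (w[0::2] + w[1::2]) / 2.0                      # row averages
    d = (w[0::2] - w[1::2]) / 2.0                      # row details
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    w = np.empty((2 * a.shape[0], a.shape[1]))
    w[0::2], w[1::2] = a + d, a - d
    return w

def quantize(w, bits):
    """Symmetric uniform quantizer."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(w / scale) * scale

# One stored weight matrix serves several bit-width candidates: each candidate
# quantizes the wavelet subbands at its own precision and reconstructs.
w = np.random.default_rng(0).normal(size=(8, 8))
for bits in (8, 4, 2):
    w_hat = ihaar2d(*(quantize(s, bits) for s in haar2d(w)))
    print(f"{bits}-bit candidate, mean reconstruction error: {np.abs(w_hat - w).mean():.4f}")
```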
Related papers
- MixQuant: Mixed Precision Quantization with a Bit-width Optimization
Search [7.564770908909927]
Quantization is a technique for creating efficient Deep Neural Networks (DNNs).
We propose MixQuant, a search algorithm that finds the optimal custom quantization bit-width for each layer weight based on roundoff error.
We show that combining MixQuant with BRECQ, a state-of-the-art quantization method, yields better quantized model accuracy than BRECQ alone.
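A toy illustration of per-layer bit-width selection driven by roundoff error,
assuming a symmetric uniform quantizer and a hypothetical error-budget
criterion (rel_budget); MixQuant's actual search is more sophisticated:

```python
import numpy as np

def quant_error(w, bits):
    """Mean absolute roundoff error of symmetric uniform quantization."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.abs(np.round(w / scale) * scale - w).mean()

def choose_bitwidths(layers, candidates=(2, 4, 8), rel_budget=0.2):
    """Pick, per layer, the smallest candidate bit-width whose roundoff error
    stays below a fraction of the layer's mean weight magnitude."""
    chosen = {}
    for name, w in layers.items():
        budget = rel_budget * np.abs(w).mean()
        feasible = [b for b in candidates if quant_error(w, b) <= budget]
        chosen[name] = min(feasible) if feasible else max(candidates)
    return chosen

rng = np.random.default_rng(0)
layers = {f"layer{i}": rng.normal(scale=0.1 * (i + 1), size=(64, 64)) for i in range(3)}
print(choose_bitwidths(layers))
```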
arXiv Detail & Related papers (2023-09-29T15:49:54Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
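One way to picture a vertical-layered representation is bit-plane truncation of
a single stored 8-bit model, so every lower precision is a prefix of the full
codes; this is an illustrative reading, not necessarily the paper's exact
construction:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization: integer codes in [-127, 127] plus a scale."""
    scale = np.max(np.abs(w)) / 127.0 + 1e-12
    return np.clip(np.round(w / scale), -127, 127).astype(np.int32), scale

def truncate_to_bits(codes, bits):
    """Zero out the low-order bit-planes so the lower-precision model shares
    the high-order bits of the stored 8-bit codes."""
    shift = 8 - bits
    return (codes >> shift) << shift

w = np.random.default_rng(0).normal(size=(64, 64))
codes, scale = quantize_int8(w)
for bits in (8, 4, 2):
    w_hat = truncate_to_bits(codes, bits) * scale
    print(f"{bits}-bit slice, mean reconstruction error: {np.abs(w_hat - w).mean():.4f}")
```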
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- Attention Round for Post-Training Quantization [0.9558392439655015]
This paper presents a novel quantization method called Attention Round.
The probability of a weight w being mapped to a given quantized value is negatively correlated with the distance between that value and w, and decays with a Gaussian function.
For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper requires only 1,024 training samples and 10 minutes to complete the quantization process.
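A hedged sketch of Gaussian-weighted stochastic rounding as described above;
the sigma value and the sampling scheme are assumptions made for illustration:

```python
import numpy as np

def attention_round(w, levels, sigma=0.1, rng=None):
    """Map each scalar in w to a quantization level with probability that
    decays as a Gaussian of the distance |w - level|."""
    rng = rng or np.random.default_rng()
    d = np.abs(w[..., None] - levels)                  # distance to every level
    p = np.exp(-0.5 * (d / sigma) ** 2)
    p /= p.sum(axis=-1, keepdims=True)
    idx = np.array([rng.choice(len(levels), p=pi) for pi in p.reshape(-1, len(levels))])
    return levels[idx].reshape(w.shape)

levels = np.linspace(-1.0, 1.0, 2 ** 4)                # 4-bit grid
w = np.random.default_rng(0).uniform(-1, 1, size=(8,))
print(attention_round(w, levels))
```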
arXiv Detail & Related papers (2022-07-07T05:04:21Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
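One possible reading of bit-drop, sketched below: each weight group randomly
loses low-order bits on every forward pass, so training sees a range of
precisions; the grouping and drop schedule here are hypothetical choices, not
the paper's exact DropBits rule:

```python
import numpy as np

def uniform_quantize(w, bits):
    """Symmetric uniform quantization of w at the given bit-width."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(w / scale) * scale

def dropbits_forward(w, max_bits=8, drop_prob=0.3, rng=None):
    """Quantize each row ("group") at a randomly reduced bit-width."""
    rng = rng or np.random.default_rng()
    out = np.empty_like(w)
    for i, row in enumerate(w):
        bits = max_bits
        while bits > 2 and rng.random() < drop_prob:
            bits -= 1                                  # drop one low-order bit
        out[i] = uniform_quantize(row, bits)
    return out

w = np.random.default_rng(0).normal(size=(8, 16))
print("mean perturbation:", np.abs(dropbits_forward(w) - w).mean())
```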
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Differentiable Model Compression via Pseudo Quantization Noise [99.89011673907814]
We propose to add independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator.
We experimentally verify that our method outperforms state-of-the-art quantization techniques on several benchmarks and architectures for image classification, language modeling, and audio source separation.
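A minimal sketch of the pseudo-quantization-noise idea: replace rounding with
additive uniform noise of up to half a quantization step so the operation
stays differentiable; the step-size estimate used here is a simple assumption:

```python
import numpy as np

def pseudo_quant_noise(w, bits, rng=None):
    """Training-time surrogate for a uniform quantizer: add independent noise
    of up to half a quantization step instead of rounding."""
    rng = rng or np.random.default_rng()
    delta = (w.max() - w.min()) / (2 ** bits - 1)      # assumed quantization step
    return w + rng.uniform(-delta / 2, delta / 2, size=w.shape)

w = np.random.default_rng(0).normal(size=(4, 4))
print(pseudo_quant_noise(w, bits=4))
```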
arXiv Detail & Related papers (2021-04-20T14:14:03Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search for them accurately.
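A small sketch of treating discrete weights as searchable variables: each
weight is a softmax-weighted mixture over candidate levels, which can be
trained and then hardened by argmax; the level grid and temperature are
illustrative assumptions:

```python
import numpy as np

def soft_weight(logits, levels, temperature=1.0):
    """Differentiable surrogate for a discrete weight: a softmax over candidate
    levels; annealing the temperature pushes the mixture toward a hard choice."""
    z = logits / temperature
    p = np.exp(z - z.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return (p * levels).sum(axis=-1)

levels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])         # hypothetical low-bit grid
logits = np.random.default_rng(0).normal(size=(8, levels.size))
print("relaxed weights:", soft_weight(logits, levels))
print("discrete weights:", levels[np.argmax(logits, axis=-1)])
```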
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- Robust Quantization: One Model to Rule Them All [13.87610199914036]
We propose a method that provides intrinsic robustness to the model against a broad range of quantization processes.
Our method is motivated by theoretical arguments and enables us to store a single generic model capable of operating at various bit-widths and quantization policies.
arXiv Detail & Related papers (2020-02-18T16:14:36Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
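A toy rendering of gradient $\ell_1$ regularization on a linear least-squares
problem, where the gradient and Hessian are available in closed form; in a
real network the extra term would be handled by double backpropagation, and
lam is an assumed hyperparameter:

```python
import numpy as np

# Toy problem: L(w) = ||Xw - y||^2 with gradient g(w) = 2 X^T (Xw - y) and
# constant Hessian H = 2 X^T X. The regularized objective L(w) + lam*||g(w)||_1
# has (sub)gradient g(w) + lam * H @ sign(g(w)).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
y = rng.normal(size=64)
w = rng.normal(size=16)
lam, lr = 1e-2, 1e-3

H = 2.0 * X.T @ X
for _ in range(500):
    g = 2.0 * X.T @ (X @ w - y)             # gradient of the task loss
    w -= lr * (g + lam * H @ np.sign(g))    # descend the regularized objective

print("final ||grad||_1:", np.abs(2.0 * X.T @ (X @ w - y)).sum())
```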
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
- Post-Training Piecewise Linear Quantization for Deep Neural Networks [13.717228230596167]
Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices.
We propose a piecewise linear quantization scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails.
Compared to state-of-the-art post-training quantization methods, our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.
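A simplified two-region version of piecewise linear quantization, with one
step size for the dense center and another for the long tails; the breakpoint
choice and level split are assumptions, not the paper's optimized scheme:

```python
import numpy as np

def piecewise_linear_quantize(x, bits=4, breakpoint_frac=0.25):
    """Quantize with two uniform regions: a fine step inside [-p, p] and a
    coarser step in the tails, where p = breakpoint_frac * max|x|."""
    m = np.max(np.abs(x)) + 1e-12
    p = breakpoint_frac * m
    levels_per_region = 2 ** (bits - 1)                # split the codebook in half
    step_center = p / levels_per_region
    step_tail = (m - p) / levels_per_region
    center = np.abs(x) <= p
    q = np.empty_like(x)
    q[center] = np.round(x[center] / step_center) * step_center
    tail_mag = np.round((np.abs(x[~center]) - p) / step_tail) * step_tail + p
    q[~center] = np.sign(x[~center]) * tail_mag
    return q

x = np.random.default_rng(0).normal(size=10_000)       # bell-shaped, long-tailed input
print("max |q - x|:", np.max(np.abs(piecewise_linear_quantize(x) - x)))
```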
arXiv Detail & Related papers (2020-01-31T23:47:00Z)