PD-Quant: Post-Training Quantization based on Prediction Difference Metric
- URL: http://arxiv.org/abs/2212.07048v3
- Date: Mon, 27 Mar 2023 05:47:22 GMT
- Title: PD-Quant: Post-Training Quantization based on Prediction Difference Metric
- Authors: Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
- Abstract summary: Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types.
Determining appropriate quantization parameters (e.g., scaling factors and weight rounding) is the main challenge; existing methods rely only on local feature information.
PD-Quant addresses this limitation by considering global information from the difference in network predictions before and after quantization.
- Score: 43.81334288840746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) is a neural network compression technique
that converts a full-precision model into a quantized model using
lower-precision data types. Although it can help reduce the size and
computational cost of deep neural networks, it can also introduce quantization
noise and reduce prediction accuracy, especially in extremely low-bit settings.
How to determine appropriate quantization parameters (e.g., scaling factors
and the rounding of weights) remains the main open problem. Existing methods
attempt to determine these parameters by minimizing the distance between features
before and after quantization, but such an approach considers only local
information and may not yield optimal quantization parameters. We
analyze this issue and propose PD-Quant, a method that addresses this limitation
by considering global information. It determines the quantization parameters by
using the information of differences between network prediction before and
after quantization. In addition, PD-Quant can alleviate the overfitting problem
in PTQ caused by the small number of calibration sets by adjusting the
distribution of activations. Experiments show that PD-Quant leads to better
quantization parameters and improves the prediction accuracy of quantized
models, especially in low-bit settings. For example, PD-Quant pushes the
accuracy of ResNet-18 up to 53.14% and RegNetX-600MF up to 40.67% with 2-bit
weights and 2-bit activations. The code is released at
https://github.com/hustvl/PD-Quant.
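To make the core idea concrete, the sketch below contrasts the conventional local objective (matching one block's intermediate features before and after quantization) with a prediction-difference objective measured at the network output, here a KL divergence between the full-precision and quantized logits. This is a minimal illustration only: the PyTorch-style models `fp_model` and `quant_model`, the list of learnable quantization parameters `quant_params`, and the plain Adam loop are assumptions made for the sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def local_feature_loss(fp_feat, q_feat):
    # Conventional local objective: match one block's intermediate
    # features before and after quantization (e.g., MSE).
    return F.mse_loss(q_feat, fp_feat)

def prediction_difference_loss(fp_logits, q_logits):
    # Global objective: penalize the difference between the final
    # predictions of the full-precision and quantized networks,
    # measured here as KL(full-precision || quantized) over class probabilities.
    fp_prob = F.softmax(fp_logits, dim=-1)
    q_log_prob = F.log_softmax(q_logits, dim=-1)
    return F.kl_div(q_log_prob, fp_prob, reduction="batchmean")

def calibrate(fp_model, quant_model, calib_loader, quant_params,
              steps=1000, lr=1e-3):
    """Tune quantization parameters (scales, rounding variables, ...) on a
    small calibration set by minimizing the prediction difference."""
    fp_model.eval()
    opt = torch.optim.Adam(quant_params, lr=lr)
    data_iter = iter(calib_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(calib_loader)
            x, _ = next(data_iter)
        with torch.no_grad():
            fp_logits = fp_model(x)    # reference predictions
        q_logits = quant_model(x)      # predictions with quantization noise
        loss = prediction_difference_loss(fp_logits, q_logits)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return quant_model
```

Per the abstract, the full method additionally adjusts the activation distribution on the small calibration set to reduce overfitting; that step and the block-wise optimization details are omitted from this sketch.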
Related papers
- Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation.
This is primarily caused by channel-specific and sample-specific outliers.
We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z)
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of the weights.
We develop an improved KL metric to determine optimal quantization scales for activations; a generic KL-based scale search is sketched after this list.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- GHN-QAT: Training Graph Hypernetworks to Predict Quantization-Robust Parameters of Unseen Limited Precision Neural Networks [80.29667394618625]
Graph Hypernetworks (GHN) can predict the parameters of varying unseen CNN architectures with surprisingly good accuracy.
Preliminary research has explored the use of GHNs to predict quantization-robust parameters for 8-bit and 4-bit quantized CNNs.
We show that quantization-aware training can significantly improve quantized accuracy for GHN predicted parameters of 4-bit quantized CNNs.
arXiv Detail & Related papers (2023-09-24T23:01:00Z)
- Designing strong baselines for ternary neural network quantization through support and mass equalization [7.971065005161565]
Deep neural networks (DNNs) achieve state-of-the-art performance across a wide range of computer vision applications.
Their computational burden can be dramatically reduced by quantizing floating-point values to ternary values.
We show experimentally that our approach significantly improves the performance of ternary quantization across a variety of scenarios.
arXiv Detail & Related papers (2023-06-30T07:35:07Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric [43.81334288840746]
Post-Training Quantization (PTQ) directly transforms a full-precision model into a low bit-width model.
PTQ suffers a severe accuracy drop when applied to complex tasks such as object detection.
DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters.
arXiv Detail & Related papers (2023-04-19T16:11:21Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables and to search them accurately with a differentiable method.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
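The coarse & fine weight splitting entry above mentions a KL metric for choosing activation quantization scales. The sketch below shows a generic version of that idea, selecting the clipping range whose quantized activation histogram is closest in KL divergence to the full-precision histogram. The binning, candidate grid, and helper names (`kl_divergence`, `quantize_dequantize`, `search_activation_scale`) are illustrative assumptions, not the paper's improved metric.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two histograms normalized to probability distributions.
    p = p / max(p.sum(), eps)
    q = q / max(q.sum(), eps)
    return float(np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0)))

def quantize_dequantize(x, clip, num_bits):
    # Symmetric uniform quantization clipped at +/- clip, then dequantized.
    qmax = 2 ** (num_bits - 1) - 1
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_activation_scale(activations, num_bits=8, num_candidates=100, bins=2048):
    """Return the quantization scale whose clipped/quantized histogram is
    closest (in KL divergence) to the full-precision activation histogram."""
    x = activations.ravel().astype(np.float64)
    max_abs = float(np.abs(x).max())
    edges = np.linspace(-max_abs, max_abs, bins + 1)
    ref_hist, _ = np.histogram(x, bins=edges)
    best_clip, best_kl = max_abs, float("inf")
    for clip in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
        xq = quantize_dequantize(np.clip(x, -clip, clip), clip, num_bits)
        q_hist, _ = np.histogram(xq, bins=edges)
        kl = kl_divergence(ref_hist.astype(np.float64), q_hist.astype(np.float64))
        if kl < best_kl:
            best_clip, best_kl = clip, kl
    return best_clip / (2 ** (num_bits - 1) - 1)  # scale = clip / qmax

# Example usage on random calibration activations:
# scale = search_activation_scale(np.random.randn(100000), num_bits=4)
```

The returned value is the uniform quantization step size for the activation tensor; per-channel variants follow the same pattern with one search per channel.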