PD-Quant: Post-Training Quantization based on Prediction Difference Metric
- URL: http://arxiv.org/abs/2212.07048v3
- Date: Mon, 27 Mar 2023 05:47:22 GMT
- Title: PD-Quant: Post-Training Quantization based on Prediction Difference Metric
- Authors: Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
- Abstract summary: Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types.
Determining appropriate quantization parameters (e.g., scaling factors and weight rounding) is the main challenge; existing methods rely only on local feature information.
PD-Quant addresses this limitation by considering global information from the difference in network predictions before and after quantization.
- Score: 43.81334288840746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) is a neural network compression technique
that converts a full-precision model into a quantized model using
lower-precision data types. Although it can help reduce the size and
computational cost of deep neural networks, it can also introduce quantization
noise and reduce prediction accuracy, especially in extremely low-bit settings.
How to determine appropriate quantization parameters (e.g., scaling factors
and the rounding of weights) remains the main open problem. Existing methods
attempt to determine these parameters by minimizing the distance between features
before and after quantization, but such an approach considers only local
information and may not yield optimal quantization parameters. We
analyze this issue and propose PD-Quant, a method that addresses this limitation
by considering global information. It determines the quantization parameters by
using the information of differences between network prediction before and
after quantization. In addition, PD-Quant can alleviate the overfitting problem
in PTQ caused by the small number of calibration sets by adjusting the
distribution of activations. Experiments show that PD-Quant leads to better
quantization parameters and improves the prediction accuracy of quantized
models, especially in low-bit settings. For example, PD-Quant pushes the
accuracy of ResNet-18 up to 53.14% and RegNetX-600MF up to 40.67% with 2-bit
weights and 2-bit activations. The code is released at
https://github.com/hustvl/PD-Quant.
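To make the core idea concrete, the sketch below contrasts the conventional local objective (matching one block's intermediate features before and after quantization) with a prediction-difference objective measured at the network output, here a KL divergence between the full-precision and quantized logits. This is a minimal illustration only: the PyTorch-style models `fp_model` and `quant_model`, the list of learnable quantization parameters `quant_params`, and the plain Adam loop are assumptions made for the sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def local_feature_loss(fp_feat, q_feat):
    # Conventional local objective: match one block's intermediate
    # features before and after quantization (e.g., MSE).
    return F.mse_loss(q_feat, fp_feat)

def prediction_difference_loss(fp_logits, q_logits):
    # Global objective: penalize the difference between the final
    # predictions of the full-precision and quantized networks,
    # measured here as KL(full-precision || quantized) over class probabilities.
    fp_prob = F.softmax(fp_logits, dim=-1)
    q_log_prob = F.log_softmax(q_logits, dim=-1)
    return F.kl_div(q_log_prob, fp_prob, reduction="batchmean")

def calibrate(fp_model, quant_model, calib_loader, quant_params,
              steps=1000, lr=1e-3):
    """Tune quantization parameters (scales, rounding variables, ...) on a
    small calibration set by minimizing the prediction difference."""
    fp_model.eval()
    opt = torch.optim.Adam(quant_params, lr=lr)
    data_iter = iter(calib_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(calib_loader)
            x, _ = next(data_iter)
        with torch.no_grad():
            fp_logits = fp_model(x)    # reference predictions
        q_logits = quant_model(x)      # predictions with quantization noise
        loss = prediction_difference_loss(fp_logits, q_logits)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return quant_model
```

Per the abstract, the full method additionally adjusts the activation distribution on the small calibration set to reduce overfitting; that step and the block-wise optimization details are omitted from this sketch.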
Related papers
- Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation.
This is primarily caused by channel-specific and sample-specific outliers.
We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z)
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of the weights.
We develop an improved KL metric to determine optimal quantization scales for activations; a generic KL-based scale search is sketched after this list.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- GHN-QAT: Training Graph Hypernetworks to Predict Quantization-Robust Parameters of Unseen Limited Precision Neural Networks [80.29667394618625]
Graph Hypernetworks (GHN) can predict the parameters of varying unseen CNN architectures with surprisingly good accuracy.
Preliminary research has explored the use of GHNs to predict quantization-robust parameters for 8-bit and 4-bit quantized CNNs.
We show that quantization-aware training can significantly improve quantized accuracy for GHN predicted parameters of 4-bit quantized CNNs.
arXiv Detail & Related papers (2023-09-24T23:01:00Z)
- Designing strong baselines for ternary neural network quantization through support and mass equalization [7.971065005161565]
Deep neural networks (DNNs) achieve state-of-the-art performance across a wide range of computer vision applications.
Their computational burden can be dramatically reduced by quantizing floating-point values to ternary values.
We show experimentally that our approach significantly improves the performance of ternary quantization across a variety of scenarios.
arXiv Detail & Related papers (2023-06-30T07:35:07Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric [43.81334288840746]
Post-Training Quantization (PTQ) directly transforms a full-precision model into a low bit-width model.
PTQ suffers a severe accuracy drop when applied to complex tasks such as object detection.
DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters.
arXiv Detail & Related papers (2023-04-19T16:11:21Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables and to search them accurately with a differentiable method.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
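The coarse & fine weight splitting entry above mentions a KL metric for choosing activation quantization scales. The sketch below shows a generic version of that idea, selecting the clipping range whose quantized activation histogram is closest in KL divergence to the full-precision histogram. The binning, candidate grid, and helper names (`kl_divergence`, `quantize_dequantize`, `search_activation_scale`) are illustrative assumptions, not the paper's improved metric.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two histograms normalized to probability distributions.
    p = p / max(p.sum(), eps)
    q = q / max(q.sum(), eps)
    return float(np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0)))

def quantize_dequantize(x, clip, num_bits):
    # Symmetric uniform quantization clipped at +/- clip, then dequantized.
    qmax = 2 ** (num_bits - 1) - 1
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_activation_scale(activations, num_bits=8, num_candidates=100, bins=2048):
    """Return the quantization scale whose clipped/quantized histogram is
    closest (in KL divergence) to the full-precision activation histogram."""
    x = activations.ravel().astype(np.float64)
    max_abs = float(np.abs(x).max())
    edges = np.linspace(-max_abs, max_abs, bins + 1)
    ref_hist, _ = np.histogram(x, bins=edges)
    best_clip, best_kl = max_abs, float("inf")
    for clip in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
        xq = quantize_dequantize(np.clip(x, -clip, clip), clip, num_bits)
        q_hist, _ = np.histogram(xq, bins=edges)
        kl = kl_divergence(ref_hist.astype(np.float64), q_hist.astype(np.float64))
        if kl < best_kl:
            best_clip, best_kl = clip, kl
    return best_clip / (2 ** (num_bits - 1) - 1)  # scale = clip / qmax

# Example usage on random calibration activations:
# scale = search_activation_scale(np.random.randn(100000), num_bits=4)
```

The returned value is the uniform quantization step size for the activation tensor; per-channel variants follow the same pattern with one search per channel.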