Benchmarking the Reliability of Post-training Quantization: a Particular
Focus on Worst-case Performance
- URL: http://arxiv.org/abs/2303.13003v1
- Date: Thu, 23 Mar 2023 02:55:50 GMT
- Title: Benchmarking the Reliability of Post-training Quantization: a Particular
Focus on Worst-case Performance
- Authors: Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu
Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu
- Abstract summary: Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
- Score: 53.45700148820669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) is a popular method for compressing deep
neural networks (DNNs) without modifying their original architecture or
training procedures. Despite its effectiveness and convenience, the reliability
of PTQ methods in the presence of extreme cases such as distribution shift
and data noise remains largely unexplored. This paper first investigates this
problem on various commonly-used PTQ methods. We aim to answer several research
questions related to the influence of calibration set distribution variations,
calibration paradigm selection, and data augmentation or sampling strategies on
PTQ reliability. A systematic evaluation process is conducted across a wide
range of tasks and commonly-used PTQ paradigms. The results show that most
existing PTQ methods are not reliable enough in terms of worst-case group
performance, highlighting the need for more robust methods. Our findings
provide insights for developing PTQ methods that can effectively handle
distribution shift scenarios and enable the deployment of quantized DNNs in
real-world applications.
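To make the protocol studied here concrete, below is a minimal sketch of eager-mode post-training static quantization in PyTorch followed by a worst-case group evaluation. This is an illustration only, not the paper's benchmark code: the toy network, the synthetic noise-shifted groups, and all helper names are assumptions.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    class ToyNet(nn.Module):
        """Tiny classifier wrapped with Quant/DeQuant stubs for eager-mode static PTQ."""
        def __init__(self, in_dim=16, num_classes=4):
            super().__init__()
            self.quant = torch.quantization.QuantStub()
            self.fc1 = nn.Linear(in_dim, 32)
            self.relu = nn.ReLU()
            self.fc2 = nn.Linear(32, num_classes)
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)
            x = self.relu(self.fc1(x))
            x = self.fc2(x)
            return self.dequant(x)

    def make_loader(n=256, in_dim=16, num_classes=4, noise=0.0, seed=0):
        # Synthetic data; `noise` mimics a distribution-shifted group.
        g = torch.Generator().manual_seed(seed)
        x = torch.randn(n, in_dim, generator=g) * (1.0 + noise)
        y = torch.randint(0, num_classes, (n,), generator=g)
        return DataLoader(TensorDataset(x, y), batch_size=32)

    @torch.no_grad()
    def worst_group_accuracy(model, group_loaders):
        """Per-group accuracy and the worst case (minimum) over all groups."""
        per_group = {}
        for name, loader in group_loaders.items():
            correct = total = 0
            for x, y in loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
            per_group[name] = correct / total
        return per_group, min(per_group.values())

    if __name__ == "__main__":
        model = ToyNet().eval()
        # "fbgemm" targets x86 backends; use "qnnpack" on ARM.
        model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
        prepared = torch.quantization.prepare(model)

        # Calibration pass: activation statistics come only from this calibration
        # distribution, the design choice whose reliability the paper stress-tests.
        for x, _ in make_loader(noise=0.0, seed=1):
            prepared(x)
        quantized = torch.quantization.convert(prepared)

        # Evaluate on groups with increasing shift and report the worst-case group.
        groups = {f"shift={s}": make_loader(noise=s, seed=2) for s in (0.0, 0.5, 1.0)}
        per_group, worst = worst_group_accuracy(quantized, groups)
        print(per_group, "worst-group accuracy:", worst)

Reporting the minimum accuracy over groups rather than the average is the worst-case criterion under which the abstract argues most existing PTQ methods fall short.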
Related papers
- Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
Post-training quantization (PTQ) has been extensively adopted for compressing large language models (LLMs).
Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
- Uncertainty Quantification with the Empirical Neural Tangent Kernel [12.388707890314539]
We propose a post-hoc, sampling-based UQ method for over-parameterized networks at the end of training.
We demonstrate that our method effectively approximates the posterior of a Gaussian process using the empirical Neural Tangent Kernel.
We show that our method not only outperforms competing approaches in computational efficiency (often reducing costs by multiple factors) but also maintains state-of-the-art performance across a variety of UQ metrics for both regression and classification tasks.
arXiv Detail & Related papers (2025-02-05T04:01:34Z)
- Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach [22.25748046511075]
Post-training Quantization (PTQ) techniques rely on calibration processes to maintain their accuracy.
We propose a weight-adaptive PTQ method that can be considered a precursor to calibration-based PTQ methods.
We show that our proposed approach can perform on par with most common calibration-based PTQ methods.
arXiv Detail & Related papers (2025-01-15T19:44:15Z)
- TTAQ: Towards Stable Post-training Quantization in Continuous Domain Adaptation [3.7024647541541014]
Post-training quantization (PTQ) reduces excessive hardware cost by quantizing full-precision models into lower bit representations on a tiny calibration set.
Traditional PTQ methods typically fail in dynamic, ever-changing real-world scenarios.
We propose a novel and stable quantization process for test-time adaptation (TTA), dubbed TTAQ, to address the performance degradation of traditional PTQ.
arXiv Detail & Related papers (2024-12-13T06:34:59Z)
- Distributing Quantum Computations, Shot-wise [1.2061873132374783]
NISQ-era constraints, namely high sensitivity to noise and limited qubit counts, impose significant barriers on the usability of QPUs.
We propose a methodological framework, termed shot-wise, which enables the distribution of shots for a single circuit across multiple QPUs (a toy sketch of shot splitting and count merging follows this list).
arXiv Detail & Related papers (2024-11-25T16:16:54Z)
- Process Reward Model with Q-Value Rankings [18.907163177605607]
Process Reward Modeling (PRM) is critical for complex reasoning and decision-making tasks.
We introduce the Process Q-value Model (PQM), a novel framework that redefines PRM in the context of a Markov Decision Process.
PQM optimizes Q-value rankings based on a novel comparative loss function, enhancing the model's ability to capture the intricate dynamics among sequential decisions.
arXiv Detail & Related papers (2024-10-15T05:10:34Z)
- Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose CoPA, a novel contrastive pre-training framework tailored for PCQA.
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network (a minimal sketch of this idea follows this list).
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
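The "Distributing Quantum Computations, Shot-wise" entry above describes splitting a single circuit's shot budget across several QPUs and then merging the per-QPU results. Below is a toy, library-free sketch of that idea; the proportional allocation rule, the simulated one-qubit backend, and all function names are illustrative assumptions, not the paper's framework:

    from collections import Counter
    import random

    def split_shots(total_shots, weights):
        """Divide a shot budget across QPUs proportionally to `weights`, preserving the total."""
        total_w = sum(weights)
        alloc = [int(total_shots * w / total_w) for w in weights]
        alloc[0] += total_shots - sum(alloc)  # assign the rounding remainder to the first QPU
        return alloc

    def merge_counts(count_dicts):
        """Merge per-QPU measurement histograms by summing counts per bitstring."""
        merged = Counter()
        for counts in count_dicts:
            merged.update(counts)
        return dict(merged)

    def fake_qpu(shots, p0=0.7, seed=None):
        # Stand-in for a real backend: samples a one-qubit outcome distribution.
        rng = random.Random(seed)
        zeros = sum(rng.random() < p0 for _ in range(shots))
        return {"0": zeros, "1": shots - zeros}

    # Usage: 1000 shots split across three unevenly weighted QPUs, then merged shot-wise.
    alloc = split_shots(1000, weights=[1, 2, 2])
    results = [fake_qpu(s, seed=i) for i, s in enumerate(alloc)]
    print(alloc, merge_counts(results))

Summing histograms per bitstring keeps the overall shot budget intact no matter how it was allocated across devices.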
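The "Cross Learning in Deep Q-Networks" entry describes maintaining a set of parallel Q-networks and estimating the bootstrap target from a randomly selected member. The sketch below illustrates only that idea; the network sizes, the single-member update, and the exact target rule are assumptions rather than the paper's algorithm:

    import random
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        """Small Q-network mapping observations to per-action values."""
        def __init__(self, obs_dim=4, n_actions=2):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

        def forward(self, obs):
            return self.net(obs)

    def cross_q_target(ensemble, next_obs, reward, done, gamma=0.99):
        """Bootstrap target estimated by a randomly selected ensemble member."""
        with torch.no_grad():
            q_next = random.choice(ensemble)(next_obs).max(dim=1).values
            return reward + gamma * (1.0 - done) * q_next

    # Usage on a random transition batch (replay-buffer plumbing omitted).
    ensemble = [QNet() for _ in range(4)]
    opts = [torch.optim.Adam(q.parameters(), lr=1e-3) for q in ensemble]
    obs, next_obs = torch.randn(32, 4), torch.randn(32, 4)
    action = torch.randint(0, 2, (32,))
    reward, done = torch.randn(32), torch.zeros(32)

    target = cross_q_target(ensemble, next_obs, reward, done)
    idx = random.randrange(len(ensemble))  # update one member per step
    q_sa = ensemble[idx](obs).gather(1, action.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    opts[idx].zero_grad()
    loss.backward()
    opts[idx].step()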