Benchmarking the Reliability of Post-training Quantization: a Particular
Focus on Worst-case Performance
- URL: http://arxiv.org/abs/2303.13003v1
- Date: Thu, 23 Mar 2023 02:55:50 GMT
- Title: Benchmarking the Reliability of Post-training Quantization: a Particular
Focus on Worst-case Performance
- Authors: Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu
Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu
- Abstract summary: Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
- Score: 53.45700148820669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) is a popular method for compressing deep
neural networks (DNNs) without modifying their original architecture or
training procedures. Despite its effectiveness and convenience, the reliability
of PTQ methods in the presence of extreme cases such as distribution shift
and data noise remains largely unexplored. This paper first investigates this
problem on various commonly-used PTQ methods. We aim to answer several research
questions related to the influence of calibration set distribution variations,
calibration paradigm selection, and data augmentation or sampling strategies on
PTQ reliability. A systematic evaluation process is conducted across a wide
range of tasks and commonly-used PTQ paradigms. The results show that most
existing PTQ methods are not reliable enough in terms of the worst-case group
performance, highlighting the need for more robust methods. Our findings
provide insights for developing PTQ methods that can effectively handle
distribution shift scenarios and enable the deployment of quantized DNNs in
real-world applications.
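As a concrete illustration of the evaluation protocol above, the following Python sketch simulates weight-only post-training quantization with a simple per-tensor min-max quantizer and then reports the worst-case group accuracy over a user-supplied partition of the test data (e.g., by corruption type or subpopulation). This is a minimal sketch under assumed PyTorch models and data loaders, not the paper's benchmark code; names such as group_loaders are placeholders.

import torch
import torch.nn as nn

def _minmax_scale(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Per-tensor symmetric scale derived from the weight's absolute maximum.
    qmax = 2 ** (num_bits - 1) - 1
    return w.abs().max().clamp(min=1e-8) / qmax

def fake_quantize_weights(model: nn.Module, num_bits: int = 8) -> None:
    # Simulated weight-only PTQ: round-to-nearest onto a uniform grid, then dequantize.
    qmax = 2 ** (num_bits - 1) - 1
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            s = _minmax_scale(m.weight.data, num_bits)
            q = torch.clamp(torch.round(m.weight.data / s), -qmax - 1, qmax)
            m.weight.data = q * s

@torch.no_grad()
def worst_case_group_accuracy(model: nn.Module, group_loaders: dict, device: str = "cpu"):
    # group_loaders maps a group name (e.g., a corruption type) to a DataLoader.
    model.eval().to(device)
    per_group = {}
    for name, loader in group_loaders.items():
        correct, total = 0, 0
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=-1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
        per_group[name] = correct / max(total, 1)
    return min(per_group.values()), per_group

# Hypothetical usage: call fake_quantize_weights(model, num_bits=4), then compare the
# worst-case group accuracy before and after quantization to see how much the worst group degrades.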
Related papers
- Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization [0.0]
Post-training quantization has emerged as a widely used technique for compressing large language models (LLMs) without retraining.
The accumulation of quantization errors across layers significantly degrades performance, particularly in low-bit regimes.
We propose Quantization Error Propagation (QEP), a lightweight and general framework that enhances layer-wise PTQ by explicitly propagating the quantization error (a generic sketch of this idea appears after the related-papers list).
arXiv Detail & Related papers (2025-04-13T15:56:00Z)
- APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers [71.2294205496784]
We propose APHQ-ViT, a novel PTQ approach based on importance estimation with the Average Perturbation Hessian (APH).
We show that APHQ-ViT with linear quantizers outperforms existing PTQ methods by substantial margins at 3-bit and 4-bit precision across different vision tasks.
arXiv Detail & Related papers (2025-04-03T11:48:56Z)
- Uncertainty Quantification with the Empirical Neural Tangent Kernel [12.388707890314539]
We propose a post-hoc, sampling-based UQ method for over-parameterized networks at the end of training.
We demonstrate that our method effectively approximates the posterior of a Gaussian process using the empirical Neural Tangent Kernel.
We show that our method not only outperforms competing approaches in computational efficiency (often reducing costs by multiple factors) but also maintains state-of-the-art performance across a variety of UQ metrics for both regression and classification tasks.
arXiv Detail & Related papers (2025-02-05T04:01:34Z)
- Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach [22.25748046511075]
Post-training Quantization (PTQ) techniques rely on calibration processes to maintain their accuracy.
We propose a weight-adaptive PTQ method that can be considered a precursor to calibration-based PTQ methods.
We show that our proposed approach can perform on par with most common calibration-based PTQ methods.
arXiv Detail & Related papers (2025-01-15T19:44:15Z)
- TTAQ: Towards Stable Post-training Quantization in Continuous Domain Adaptation [3.7024647541541014]
Post-training quantization (PTQ) reduces excessive hardware cost by quantizing full-precision models into lower bit representations on a tiny calibration set.
Traditional PTQ methods typically fail in dynamic, ever-changing real-world scenarios.
We propose a novel and stable quantization process for test-time adaptation (TTA), dubbed TTAQ, to address the performance degradation of traditional PTQ.
arXiv Detail & Related papers (2024-12-13T06:34:59Z)
- Distributing Quantum Computations, Shot-wise [1.2061873132374783]
NISQ-era constraints, namely high sensitivity to noise and limited qubit counts, impose significant barriers on the usability of QPUs.
We propose a methodological framework, termed shot-wise, which enables the distribution of shots for a single circuit across multiple QPUs.
arXiv Detail & Related papers (2024-11-25T16:16:54Z)
- Process Reward Model with Q-Value Rankings [18.907163177605607]
Process Reward Modeling (PRM) is critical for complex reasoning and decision-making tasks.
We introduce the Process Q-value Model (PQM), a novel framework that redefines PRM in the context of a Markov Decision Process.
PQM optimizes Q-value rankings based on a novel comparative loss function, enhancing the model's ability to capture the intricate dynamics among sequential decisions.
arXiv Detail & Related papers (2024-10-15T05:10:34Z)
- Attention-aware Post-training Quantization without Backpropagation [11.096116957844014]
Quantization is a promising solution for deploying large-scale language models on resource-constrained devices.
Existing quantization approaches rely on gradient-based optimization.
We propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation.
arXiv Detail & Related papers (2024-06-19T11:53:21Z)
- Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose a novel contrastive pre-training framework tailored for PCQA (CoPA).
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective [74.48124653728422]
Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods in practice.
We argue that oscillation is an overlooked problem in PTQ methods.
arXiv Detail & Related papers (2023-03-21T14:52:52Z)
- Parameter-Parallel Distributed Variational Quantum Algorithm [7.255056332088222]
Variational quantum algorithms (VQAs) have emerged as a promising near-term technique to explore practical quantum advantage on noisy devices.
Here, we propose a parameter-parallel distributed variational quantum algorithm (PPD-VQA) to accelerate the training process by parameter-parallel training with multiple quantum processors.
The results suggest that PPD-VQA could provide a practical solution for coordinating multiple quantum processors to handle large-scale real-world applications.
arXiv Detail & Related papers (2022-07-31T15:09:12Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
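For the Quantization Error Propagation (QEP) entry above, the following is a generic sketch of the underlying idea of error-propagating layer-wise PTQ: each layer is quantized only after the calibration activations have passed through the already-quantized earlier layers, so the accumulated quantization error is visible when later layers are processed. This is an illustrative simplification under assumed PyTorch inputs (sequential_layerwise_ptq and calib_x are placeholder names), not the authors' QEP algorithm.

import torch
import torch.nn as nn

def _fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # Per-tensor round-to-nearest quantization followed by dequantization.
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

@torch.no_grad()
def sequential_layerwise_ptq(layers: nn.Sequential, calib_x: torch.Tensor, num_bits: int = 4) -> nn.Sequential:
    x = calib_x
    for layer in layers:
        if isinstance(layer, nn.Linear):
            # A full method would use x, which already carries the error of the
            # quantized prefix, to drive a per-layer reconstruction objective;
            # this sketch only applies round-to-nearest to the weights.
            layer.weight.data = _fake_quantize(layer.weight.data, num_bits)
        # Propagate through the now-quantized layer so downstream layers are
        # calibrated on error-bearing activations rather than full-precision ones.
        x = layer(x)
    return layers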