Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
- URL: http://arxiv.org/abs/2508.04853v1
- Date: Wed, 06 Aug 2025 20:00:40 GMT
- Title: Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
- Authors: Haoyu Zhang, Shihao Zhang, Ian Colbert, Rayan Saab
- Abstract summary: Post-training quantization (PTQ) has become a crucial tool for reducing the memory and compute costs of modern deep neural networks. The OPTQ framework (also known as GPTQ) has emerged as a leading method due to its computational efficiency and strong empirical performance. Despite its widespread adoption, OPTQ lacks rigorous quantitative theoretical guarantees.
- Score: 11.469337174377046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) has become a crucial tool for reducing the memory and compute costs of modern deep neural networks, including large language models (LLMs). Among PTQ algorithms, the OPTQ framework (also known as GPTQ) has emerged as a leading method due to its computational efficiency and strong empirical performance. Despite its widespread adoption, however, OPTQ lacks rigorous quantitative theoretical guarantees. This paper presents the first quantitative error bounds for both deterministic and stochastic variants of OPTQ, as well as for Qronos, a recent related state-of-the-art PTQ algorithm. We analyze how OPTQ's iterative procedure induces quantization error and derive non-asymptotic 2-norm error bounds that depend explicitly on the calibration data and a regularization parameter that OPTQ uses. Our analysis provides theoretical justification for several practical design choices, including the widely used heuristic of ordering features by decreasing norm, as well as guidance for selecting the regularization parameter. For the stochastic variant, we establish stronger infinity-norm error bounds, which enable control over the required quantization alphabet and are particularly useful for downstream layers and nonlinearities. Finally, we extend our analysis to Qronos, providing new theoretical bounds, for both its deterministic and stochastic variants, that help explain its empirical advantages.
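To make the iterative procedure concrete, here is a minimal Python/NumPy sketch of an OPTQ/GPTQ-style pass over a single weight row. It is illustrative only: the function name optq_quantize_row and the arguments grid, lam, and stochastic are placeholders rather than the authors' API, and the calibration features are assumed to be pre-ordered (e.g. by decreasing norm, the heuristic the paper analyzes). Each coordinate is rounded to the quantization grid, either deterministically (nearest point) or stochastically, and the resulting error is fed back into the not-yet-quantized coordinates via the Cholesky factor of the regularized inverse Hessian built from the calibration data.

```python
import numpy as np

def optq_quantize_row(w, X, grid, lam=1e-2, stochastic=False, seed=0):
    """Illustrative OPTQ/GPTQ-style quantization of one weight row.

    w    : (d,) weights of a single output neuron.
    X    : (d, m) calibration inputs (d features, m samples), assumed
           pre-ordered, e.g. by decreasing feature norm.
    grid : sorted 1-D array (>= 2 points) of representable quantized values.
    lam  : regularization added to the Hessian (the parameter whose
           selection the paper's bounds inform).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=np.float64).copy()
    d = w.size
    # Regularized Hessian of the layer-wise objective ||(w - q)^T X||_2^2.
    H = X @ X.T + lam * np.eye(d)
    # Upper-triangular Cholesky factor of H^{-1}; row j supplies the
    # error-propagation coefficients for step j (the usual stable variant).
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    q = np.empty_like(w)
    for j in range(d):
        x = np.clip(w[j], grid[0], grid[-1])
        if stochastic:
            # Stochastic rounding: round up with probability equal to the
            # fractional position between the two neighboring grid points.
            k = min(np.searchsorted(grid, x, side="right") - 1, len(grid) - 2)
            lo, hi = grid[k], grid[k + 1]
            q[j] = hi if rng.random() < (x - lo) / (hi - lo) else lo
        else:
            # Deterministic variant: round to the nearest grid point.
            q[j] = grid[np.argmin(np.abs(grid - x))]
        # Feed the quantization error back into the not-yet-quantized weights.
        err = (w[j] - q[j]) / U[j, j]
        w[j + 1:] -= err * U[j, j + 1:]
    return q
```

For example, calling optq_quantize_row(w, X, grid=np.linspace(-1, 1, 16)) quantizes a row to a 16-level (4-bit) alphabet; the infinity-norm bounds for the stochastic variant discussed above are what make fixing such an alphabet in advance possible.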
Related papers
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years. Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z) - PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models [64.84734437930362]
Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub 2-bit) quantization. We propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time. Experiments indicate our PTQ1.61 achieves state-of-the-art performance in extremely low-bit quantization.
arXiv Detail & Related papers (2025-02-18T08:04:58Z) - Unified Stochastic Framework for Neural Network Quantization and Pruning [11.721939479875271]
This paper introduces a unified framework for post-training quantization and pruning using path-following algorithms. Our approach builds on the Stochastic Path Following Quantization (SPFQ) method, extending its applicability to pruning and low-bit quantization regimes.
arXiv Detail & Related papers (2024-12-24T05:38:01Z) - Benchmarking the Reliability of Post-training Quantization: a Particular
Focus on Worst-case Performance [53.45700148820669]
Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
arXiv Detail & Related papers (2023-03-23T02:55:50Z) - QFT: Post-training quantization via fast joint finetuning of all degrees
of freedom [1.1744028458220428]
We rethink quantized network parameterization in a hardware-aware fashion, towards a unified analysis of all quantization degrees of freedom (DoF).
Our simple, single-step, and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on par with SoTA.
arXiv Detail & Related papers (2022-12-05T22:38:58Z) - End-to-end resource analysis for quantum interior point methods and portfolio optimization [63.4863637315163]
We provide a complete quantum circuit-level description of the algorithm from problem input to problem output.
We report the number of logical qubits and the quantity/depth of non-Clifford T-gates needed to run the algorithm.
arXiv Detail & Related papers (2022-11-22T18:54:48Z) - A kernel-based quantum random forest for improved classification [0.0]
Quantum Machine Learning (QML) aimed at enhancing traditional classical learning methods has seen various limitations to its realisation.
We extend the linear quantum support vector machine (QSVM) with a kernel function computed through quantum kernel estimation (QKE).
To limit overfitting, we further extend the model to employ a low-rank Nyström approximation to the kernel matrix.
arXiv Detail & Related papers (2022-10-05T15:57:31Z) - Theoretical Error Performance Analysis for Variational Quantum Circuit
Based Functional Regression [83.79664725059877]
In this work, we put forth an end-to-end quantum neural network, namely, TTN-VQC, for dimensionality reduction and functional regression.
We also characterize the optimization properties of TTN-VQC by leveraging the Polyak-Lojasiewicz (PL) condition.
arXiv Detail & Related papers (2022-06-08T06:54:07Z) - A Convergence Theory for Over-parameterized Variational Quantum
Eigensolvers [21.72347971869391]
The Variational Quantum Eigensolver (VQE) is a promising candidate for quantum applications on near-term Noisy Intermediate-Scale Quantum (NISQ) computers.
We provide the first rigorous analysis of the convergence of VQEs in the over-parameterization regime.
arXiv Detail & Related papers (2022-05-25T04:06:50Z) - A Statistical Framework for Low-bitwidth Training of Deep Neural
Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model.
One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
arXiv Detail & Related papers (2020-10-27T13:57:33Z)