Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
- URL: http://arxiv.org/abs/2509.26277v2
- Date: Tue, 07 Oct 2025 09:44:19 GMT
- Title: Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
- Authors: Ali Zoljodi, Radu Timofte, Masoud Daneshtalab
- Abstract summary: Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than Quantization-Aware Training (QAT), it is highly susceptible to accuracy degradation under a low-bit quantization regime. We propose Cluster-based Affine Transformation (CAT), an error-reduction framework that employs cluster-specific parameters to align LQ outputs with FP counterparts.
- Score: 47.791962198275066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than Quantization-Aware Training (QAT), it is highly susceptible to accuracy degradation under a low-bit quantization (LQ) regime (e.g., 2-bit). Affine transformation is a classical technique used to reduce the discrepancy between the information processed by a quantized model and that processed by its full-precision counterpart; however, we find that using plain affine transformation, which applies a uniform affine parameter set for all outputs, worsens the results in low-bit PTQ. To address this, we propose Cluster-based Affine Transformation (CAT), an error-reduction framework that employs cluster-specific parameters to align LQ outputs with FP counterparts. CAT refines LQ outputs with only a negligible number of additional parameters, without requiring fine-tuning of the model or quantization parameters. We further introduce a novel PTQ framework integrated with CAT. Experiments on ImageNet-1K show that this framework consistently outperforms prior PTQ methods across diverse architectures and LQ settings, achieving up to 53.18% Top-1 accuracy on W2A2 ResNet-18. Moreover, CAT enhances existing PTQ baselines by more than 3% when used as a plug-in. We plan to release our implementation alongside the publication of this paper.
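The abstract describes the mechanism only at a high level: cluster-specific affine parameters that map low-bit outputs back toward their full-precision counterparts. As a rough illustration of that idea (not the paper's actual algorithm), the sketch below clusters low-bit calibration outputs with k-means and fits an elementwise (scale, bias) pair per cluster by least squares; the clustering choice, the elementwise parameterization, and all function names here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical sketch of cluster-based affine error correction for PTQ.
# Assumptions (not from the paper): clusters are found with k-means on the
# low-bit outputs, and each cluster gets its own elementwise (scale, bias)
# fitted by least squares against full-precision outputs on calibration data.

def fit_cluster_affine(fp_out, lq_out, n_clusters=8, seed=0):
    """fp_out, lq_out: (N, D) calibration outputs of the FP and low-bit models."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(lq_out)
    params = []
    for c in range(n_clusters):
        x, y = lq_out[km.labels_ == c], fp_out[km.labels_ == c]
        # Per-cluster least-squares fit of y ~= a * x + b (elementwise affine).
        var = x.var(axis=0) + 1e-8
        a = ((x - x.mean(0)) * (y - y.mean(0))).mean(0) / var
        b = y.mean(0) - a * x.mean(0)
        params.append((a, b))
    return km, params

def apply_cluster_affine(lq_out, km, params):
    """Correct new low-bit outputs with the affine parameters of their cluster."""
    labels = km.predict(lq_out)
    out = lq_out.copy()
    for c, (a, b) in enumerate(params):
        out[labels == c] = a * lq_out[labels == c] + b
    return out

# Toy usage: compare error before and after cluster-wise correction.
rng = np.random.default_rng(0)
fp = rng.normal(size=(1024, 64))
lq = np.round(fp * 2) / 2 + 0.1          # crude stand-in for low-bit quantization error
corrected = apply_cluster_affine(lq, *fit_cluster_affine(fp, lq))
print(np.mean((lq - fp) ** 2), np.mean((corrected - fp) ** 2))
```

Such a correction only adds the per-cluster (scale, bias) values as extra parameters; per the abstract, the paper further integrates the correction into a full PTQ framework rather than applying it post hoc as above.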
Related papers
- QuantKAN: A Unified Quantization Framework for Kolmogorov Arnold Networks [6.860988566886594]
Kolmogorov Arnold Networks (KANs) replace linear transformations with spline-based function approximations distributed along network edges. KANs offer strong expressivity and interpretability, but their heterogeneous spline and base branch parameters hinder efficient quantization. We present QuantKAN, a unified framework for quantizing KANs across both quantization-aware training (QAT) and post-training quantization regimes.
arXiv Detail & Related papers (2025-11-24T02:05:16Z)
- Beyond Outliers: A Study of Optimizers Under Quantization [82.75879062804955]
We study the impact of optimizer choice on model robustness under quantization. We evaluate how model performance degrades when models trained with different optimizers are quantized. We derive scaling laws for quantization-aware training under different parameters.
arXiv Detail & Related papers (2025-09-27T21:15:22Z)
- PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z)
- PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models [29.616604431869746]
Post-training quantization of large language models (LLMs) to extremely low bit-widths remains challenging. Existing ultra-low-bit PTQ methods rely on binary approximations or complex compensation mechanisms. We introduce PTQ to Trit-Planes (PTQTP), the first ternary-weight PTQ framework that decomposes weight matrices into structured ternary {-1, 0, 1} trit-planes.
arXiv Detail & Related papers (2025-09-21T09:07:20Z)
- End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost [53.25965863436039]
Quantization-aware training (QAT) provides a more principled solution, but its reliance on backpropagation incurs prohibitive memory costs. We propose ZeroQAT, a zeroth-order optimization-based QAT framework that supports both weight and activation quantization. Experiments show that ZeroQAT consistently outperforms representative PTQ and QAT baselines while requiring significantly less memory.
arXiv Detail & Related papers (2025-08-21T01:18:27Z)
- PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks [9.463776523295303]
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. We propose PTQAT, a novel general hybrid quantization algorithm for the efficient deployment of 3D perception networks.
arXiv Detail & Related papers (2025-08-14T11:55:21Z)
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years. Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z)
- Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression [55.323397702682506]
Post-training quantization (PTQ) reduces a model's memory footprint by mapping full-precision weights into low-bit weights without costly retraining. We develop a new mixed-precision PTQ approach, Task-Circuit Quantization (TaCQ), that draws parallels to automated circuit discovery.
arXiv Detail & Related papers (2025-04-10T02:19:03Z)
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models [64.84734437930362]
Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub-2-bit) quantization. We propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time. Experiments indicate our PTQ1.61 achieves state-of-the-art performance in extremely low-bit quantization.
arXiv Detail & Related papers (2025-02-18T08:04:58Z)
- AffineQuant: Affine Transformation Quantization for Large Language Models [58.45460102764]
Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its compression efficiency and cost-effectiveness in the context of training.
Existing PTQ methods for Large-scale Language Models (LLMs) limit the optimization scope to scaling transformations between pre- and post-quantization weights.
In this paper, we advocate for direct optimization using equivalent affine transformations in PTQ (AffineQuant). (A minimal sketch of this equivalent-transformation idea follows the related-papers list below.)
arXiv Detail & Related papers (2024-03-19T08:40:21Z)
- Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation.
This is primarily caused by channel-specific and sample-specific outliers.
We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z)
- RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs).
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z)
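The AffineQuant entry above contrasts per-channel scaling with full affine transformations between pre- and post-quantization weights. The sketch below illustrates the general idea on a single linear layer: an invertible matrix A rewrites y = x·W as y = (x·A^{-1})·(A·W), so A can be tuned to make A·W easier to quantize while A^{-1} is folded into the preceding operation. The straight-through quantizer, the Adam loop, and the 2-bit setting are illustrative assumptions, not the authors' optimization procedure.

```python
import torch

# Hedged sketch of an "equivalent affine transformation" for quantization:
# for a linear layer y = x @ W, any invertible A gives y = (x @ A^-1) @ (A @ W),
# so we quantize (A @ W) instead of W and fold A^-1 into the previous op.

def fake_quant(w, n_bits=2):
    """Symmetric fake quantization with a straight-through gradient estimator."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax() / qmax + 1e-8
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()          # straight-through estimator

torch.manual_seed(0)
d_in, d_out = 64, 64
W = torch.randn(d_in, d_out)
x = torch.randn(256, d_in)               # calibration activations
y_fp = x @ W

A = torch.eye(d_in, requires_grad=True)  # start from the identity transform
opt = torch.optim.Adam([A], lr=1e-2)
for _ in range(200):
    # Equivalent reparameterization: quantize (A @ W), compensate inputs by A^-1.
    y_q = (x @ torch.linalg.inv(A)) @ fake_quant(A @ W)
    loss = torch.mean((y_q - y_fp) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    naive = torch.mean((x @ fake_quant(W) - y_fp) ** 2)
    print(f"naive quant MSE: {naive.item():.4f}, affine-transformed MSE: {loss.item():.4f}")
```

This is the same family of equivalence that the CAT abstract contrasts with: a single global affine transformation applied to all outputs, as opposed to CAT's cluster-specific affine parameters.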