Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
- URL: http://arxiv.org/abs/2509.26277v2
- Date: Tue, 07 Oct 2025 09:44:19 GMT
- Title: Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
- Authors: Ali Zoljodi, Radu Timofte, Masoud Daneshtalab
- Abstract summary: Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than Quantization-Aware Training (QAT), it is highly susceptible to accuracy degradation under a low-bit quantization regime. We propose Cluster-based Affine Transformation (CAT), an error-reduction framework that employs cluster-specific parameters to align LQ outputs with FP counterparts.
- Score: 47.791962198275066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than Quantization-Aware Training (QAT), it is highly susceptible to accuracy degradation under a low-bit quantization (LQ) regime (e.g., 2-bit). Affine transformation is a classical technique used to reduce the discrepancy between the information processed by a quantized model and that processed by its full-precision counterpart; however, we find that using plain affine transformation, which applies a uniform affine parameter set for all outputs, worsens the results in low-bit PTQ. To address this, we propose Cluster-based Affine Transformation (CAT), an error-reduction framework that employs cluster-specific parameters to align LQ outputs with FP counterparts. CAT refines LQ outputs with only a negligible number of additional parameters, without requiring fine-tuning of the model or quantization parameters. We further introduce a novel PTQ framework integrated with CAT. Experiments on ImageNet-1K show that this framework consistently outperforms prior PTQ methods across diverse architectures and LQ settings, achieving up to 53.18% Top-1 accuracy on W2A2 ResNet-18. Moreover, CAT enhances existing PTQ baselines by more than 3% when used as a plug-in. We plan to release our implementation alongside the publication of this paper.
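The abstract describes the mechanism only at a high level: cluster-specific affine parameters that map low-bit outputs back toward their full-precision counterparts. As a rough illustration of that idea (not the paper's actual algorithm), the sketch below clusters low-bit calibration outputs with k-means and fits an elementwise (scale, bias) pair per cluster by least squares; the clustering choice, the elementwise parameterization, and all function names here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical sketch of cluster-based affine error correction for PTQ.
# Assumptions (not from the paper): clusters are found with k-means on the
# low-bit outputs, and each cluster gets its own elementwise (scale, bias)
# fitted by least squares against full-precision outputs on calibration data.

def fit_cluster_affine(fp_out, lq_out, n_clusters=8, seed=0):
    """fp_out, lq_out: (N, D) calibration outputs of the FP and low-bit models."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(lq_out)
    params = []
    for c in range(n_clusters):
        x, y = lq_out[km.labels_ == c], fp_out[km.labels_ == c]
        # Per-cluster least-squares fit of y ~= a * x + b (elementwise affine).
        var = x.var(axis=0) + 1e-8
        a = ((x - x.mean(0)) * (y - y.mean(0))).mean(0) / var
        b = y.mean(0) - a * x.mean(0)
        params.append((a, b))
    return km, params

def apply_cluster_affine(lq_out, km, params):
    """Correct new low-bit outputs with the affine parameters of their cluster."""
    labels = km.predict(lq_out)
    out = lq_out.copy()
    for c, (a, b) in enumerate(params):
        out[labels == c] = a * lq_out[labels == c] + b
    return out

# Toy usage: compare error before and after cluster-wise correction.
rng = np.random.default_rng(0)
fp = rng.normal(size=(1024, 64))
lq = np.round(fp * 2) / 2 + 0.1          # crude stand-in for low-bit quantization error
corrected = apply_cluster_affine(lq, *fit_cluster_affine(fp, lq))
print(np.mean((lq - fp) ** 2), np.mean((corrected - fp) ** 2))
```

Such a correction only adds the per-cluster (scale, bias) values as extra parameters; per the abstract, the paper further integrates the correction into a full PTQ framework rather than applying it post hoc as above.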
Related papers
- QuantKAN: A Unified Quantization Framework for Kolmogorov Arnold Networks [6.860988566886594]
Kolmogorov Arnold Networks (KANs) replace linear transformations with spline-based function approximations distributed along network edges. KANs offer strong expressivity and interpretability, but their heterogeneous spline and base branch parameters hinder efficient quantization. We present QuantKAN, a unified framework for quantizing KANs across both quantization-aware training (QAT) and post-training quantization regimes.
arXiv Detail & Related papers (2025-11-24T02:05:16Z)
- Beyond Outliers: A Study of Optimizers Under Quantization [82.75879062804955]
We study the impact of optimizer choice on model robustness under quantization. We evaluate how model performance degrades when models trained with different optimizers are quantized. We derive scaling laws for quantization-aware training under different parameters.
arXiv Detail & Related papers (2025-09-27T21:15:22Z)
- PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z)
- PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models [29.616604431869746]
Post-training quantization of large language models (LLMs) to extremely low bit-widths remains challenging. Existing ultra-low-bit PTQ methods rely on binary approximations or complex compensation mechanisms. We introduce PTQ to Trit-Planes (PTQTP), the first ternary-weight PTQ framework that decomposes weight matrices into structured ternary {-1, 0, 1} trit-planes.
arXiv Detail & Related papers (2025-09-21T09:07:20Z)
- End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost [53.25965863436039]
Quantization-aware training (QAT) provides a more principled solution, but its reliance on backpropagation incurs prohibitive memory costs. We propose ZeroQAT, a zeroth-order optimization-based QAT framework that supports both weight and activation quantization. Experiments show that ZeroQAT consistently outperforms representative PTQ and QAT baselines while requiring significantly less memory.
arXiv Detail & Related papers (2025-08-21T01:18:27Z)
- PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks [9.463776523295303]
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. We propose PTQAT, a novel general hybrid quantization algorithm for the efficient deployment of 3D perception networks.
arXiv Detail & Related papers (2025-08-14T11:55:21Z)
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years. Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z)
- Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression [55.323397702682506]
Post-training quantization (PTQ) reduces a model's memory footprint by mapping full-precision weights into low-bit weights without costly retraining. We develop a new mixed-precision PTQ approach, Task-Circuit Quantization (TaCQ), that draws parallels to automated circuit discovery.
arXiv Detail & Related papers (2025-04-10T02:19:03Z)
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models [64.84734437930362]
Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub-2-bit) quantization. We propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time. Experiments indicate our PTQ1.61 achieves state-of-the-art performance in extremely low-bit quantization.
arXiv Detail & Related papers (2025-02-18T08:04:58Z)
- AffineQuant: Affine Transformation Quantization for Large Language Models [58.45460102764]
Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its compression efficiency and cost-effectiveness in the context of training.
Existing PTQ methods for Large-scale Language Models (LLMs) limit the optimization scope to scaling transformations between pre- and post-quantization weights.
In this paper, we advocate for direct optimization using equivalent affine transformations in PTQ (AffineQuant). (A minimal sketch of this equivalent-transformation idea follows the related-papers list below.)
arXiv Detail & Related papers (2024-03-19T08:40:21Z)
- Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation.
This is primarily caused by channel-specific and sample-specific outliers.
We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z)
- RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs).
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
arXiv Detail & Related papers (2022-12-16T02:52:37Z)
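The AffineQuant entry above contrasts per-channel scaling with full affine transformations between pre- and post-quantization weights. The sketch below illustrates the general idea on a single linear layer: an invertible matrix A rewrites y = x·W as y = (x·A^{-1})·(A·W), so A can be tuned to make A·W easier to quantize while A^{-1} is folded into the preceding operation. The straight-through quantizer, the Adam loop, and the 2-bit setting are illustrative assumptions, not the authors' optimization procedure.

```python
import torch

# Hedged sketch of an "equivalent affine transformation" for quantization:
# for a linear layer y = x @ W, any invertible A gives y = (x @ A^-1) @ (A @ W),
# so we quantize (A @ W) instead of W and fold A^-1 into the previous op.

def fake_quant(w, n_bits=2):
    """Symmetric fake quantization with a straight-through gradient estimator."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax() / qmax + 1e-8
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()          # straight-through estimator

torch.manual_seed(0)
d_in, d_out = 64, 64
W = torch.randn(d_in, d_out)
x = torch.randn(256, d_in)               # calibration activations
y_fp = x @ W

A = torch.eye(d_in, requires_grad=True)  # start from the identity transform
opt = torch.optim.Adam([A], lr=1e-2)
for _ in range(200):
    # Equivalent reparameterization: quantize (A @ W), compensate inputs by A^-1.
    y_q = (x @ torch.linalg.inv(A)) @ fake_quant(A @ W)
    loss = torch.mean((y_q - y_fp) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    naive = torch.mean((x @ fake_quant(W) - y_fp) ** 2)
    print(f"naive quant MSE: {naive.item():.4f}, affine-transformed MSE: {loss.item():.4f}")
```

This is the same family of equivalence that the CAT abstract contrasts with: a single global affine transformation applied to all outputs, as opposed to CAT's cluster-specific affine parameters.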