FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization
- URL: http://arxiv.org/abs/2601.11200v1
- Date: Fri, 16 Jan 2026 11:22:23 GMT
- Title: FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization
- Authors: Haiyang Xiao, Weiqing Li, Jinyue Guo, Guochao Jiang, Guohua Liu, Yuewei Zhang,
- Abstract summary: FAQ (Family-Aware Quantization) is a calibration data regeneration framework. It regenerates a series of high-fidelity calibration data using a highly consistent knowledge system. It reduces accuracy loss by up to 28.5% compared to the baseline with original calibration data.
- Score: 9.164335834135551
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices, the representativeness and universality of calibration data remain a core bottleneck in determining the accuracy of quantization parameters. Traditional PTQ methods typically rely on a limited set of samples, making it difficult to capture the activation distribution seen during inference and leading to biased quantization parameters. To address this, we propose FAQ (Family-Aware Quantization), a calibration data regeneration framework that leverages prior knowledge from LLMs of the same family to generate high-fidelity calibration samples. Specifically, FAQ first feeds the original calibration samples into a larger LLM from the same family as the target model, regenerating a series of high-fidelity calibration data from a highly consistent knowledge system. This data, which carries Chain-of-Thought reasoning and conforms to the expected activation distribution, then undergoes group competition under expert guidance to select the best samples, which are re-normalized to enhance the effectiveness of standard PTQ. Experiments on multiple model series, including Qwen3-8B, show that FAQ reduces accuracy loss by up to 28.5% compared to the baseline with original calibration data, demonstrating its strong potential.
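To make the pipeline described in the abstract concrete, below is a minimal sketch of the regenerate, compete, and re-normalize loop. The `generate_fn` and `score_fn` callables, the scoring rule, and the word-budget re-normalization are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a FAQ-style calibration-data regeneration loop.
# generate_fn and score_fn are stand-ins for the larger same-family LLM and an
# expert scoring rule (e.g. negative perplexity under the target model); both
# are assumptions for illustration, not the paper's implementation.
from typing import Callable, List

def regenerate_calibration_set(
    original_samples: List[str],
    generate_fn: Callable[[str], List[str]],  # prompt -> candidate rewrites from the larger family model
    score_fn: Callable[[str], float],         # higher is better (assumed expert/scoring criterion)
    group_size: int = 4,
    max_words: int = 512,
) -> List[str]:
    """Regenerate calibration samples, run a group competition, re-normalize the winners."""
    selected: List[str] = []
    for sample in original_samples:
        # 1) Ask the larger same-family model for step-by-step (CoT-style) rewrites.
        prompt = (
            "Rewrite and expand the following text with explicit step-by-step reasoning, "
            "keeping its topic and style:\n\n" + sample
        )
        candidates = generate_fn(prompt)[:group_size]
        if not candidates:
            selected.append(sample)  # fall back to the original sample
            continue
        # 2) Group competition: keep the highest-scoring candidate.
        best = max(candidates, key=score_fn)
        # 3) Re-normalization: crude length cap as a placeholder for the paper's step.
        selected.append(" ".join(best.split()[:max_words]))
    return selected
```

The returned samples would then replace the original calibration set passed to an off-the-shelf PTQ routine (e.g. GPTQ or AWQ).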
Related papers
- Regularized Calibration with Successive Rounding for Post-Training Quantization [32.31386646428613]
Post-training quantization (PTQ) enables efficient inference by mapping pretrained weights to low-bit formats without retraining. We show that interpolating between symmetric and asymmetric calibration acts as a form of regularization. We derive a simple successive rounding procedure that naturally incorporates asymmetric calibration.
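As a rough illustration of the interpolation idea summarized above (not the paper's code), the sketch below blends a symmetric and an asymmetric clipping range before uniform quantization; the `lambda_` knob and the toy data are assumptions.

```python
# Illustrative only: interpolate between symmetric and asymmetric calibration
# ranges for uniform quantization (lambda_ = 0 fully symmetric, 1 fully asymmetric).
import numpy as np

def interpolated_quantize(x: np.ndarray, n_bits: int = 4, lambda_: float = 0.5) -> np.ndarray:
    qmax = 2 ** n_bits - 1
    sym_lo, sym_hi = -np.abs(x).max(), np.abs(x).max()  # range centred on zero
    asym_lo, asym_hi = x.min(), x.max()                  # range hugging the data
    lo = (1 - lambda_) * sym_lo + lambda_ * asym_lo
    hi = (1 - lambda_) * sym_hi + lambda_ * asym_hi
    scale = (hi - lo) / qmax
    zero_point = np.round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale                      # dequantized values

x = 0.1 * np.random.randn(1024).astype(np.float32)
x[:4] += 2.0                                             # a few outliers skew the asymmetric range
print(np.mean((x - interpolated_quantize(x, lambda_=0.3)) ** 2))  # reconstruction MSE
```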
arXiv Detail & Related papers (2026-02-05T17:18:02Z) - Enhancing Post-Training Quantization via Future Activation Awareness [84.76726857601753]
Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. We propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization. FAQ consistently outperforms prior methods with negligible extra cost, requiring no backward passes, data reconstruction, or tuning.
arXiv Detail & Related papers (2026-01-28T12:03:30Z) - Structured Matrix Scaling for Multi-Class Calibration [48.07988618116422]
Post-hoc recalibration methods are widely used to ensure that classifiers provide faithful probability estimates. We argue that parametric recalibration functions based on logistic regression can be motivated from a simple theoretical setting for both binary and multiclass classification.
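For the logistic-regression view of recalibration mentioned above, plain matrix scaling amounts to fitting a multinomial logistic regression on held-out logits. The synthetic data below and the unconstrained scikit-learn fit are assumptions standing in for the structured variant studied in the paper.

```python
# Minimal matrix-scaling sketch: learn softmax(W @ logits + b) on a held-out split.
# The data is synthetic and for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
logits_cal = 3.0 * rng.normal(size=(500, 5))                   # over-confident logits (synthetic)
true_probs = np.exp(logits_cal / 3.0)
true_probs /= true_probs.sum(axis=1, keepdims=True)             # labels follow softer probabilities
labels_cal = np.array([rng.choice(5, p=p) for p in true_probs])

recalibrator = LogisticRegression(max_iter=1000)                # full affine map on the logits
recalibrator.fit(logits_cal, labels_cal)
print(recalibrator.predict_proba(3.0 * rng.normal(size=(3, 5))))  # recalibrated class probabilities
```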
arXiv Detail & Related papers (2025-11-05T18:09:14Z) - Beyond Outliers: A Study of Optimizers Under Quantization [82.75879062804955]
We study the impact of optimizer choice on model robustness under quantization. We evaluate how model performance degrades when models are trained with different optimizers. We derive scaling laws for quantization-aware training across different settings.
arXiv Detail & Related papers (2025-09-27T21:15:22Z) - End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost [53.25965863436039]
Quantization-aware training (QAT) provides a more principled solution, but its reliance on backpropagation incurs prohibitive memory costs. We propose ZeroQAT, a zeroth-order optimization-based QAT framework that supports both weight and activation quantization. Experiments show that ZeroQAT consistently outperforms representative PTQ and QAT baselines while requiring significantly less memory.
arXiv Detail & Related papers (2025-08-21T01:18:27Z) - Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach [22.25748046511075]
Post-training Quantization (PTQ) techniques rely on calibration processes to maintain their accuracy. We propose a weight-adaptive PTQ method that can be considered a precursor to calibration-based PTQ methods. We show that our proposed approach can perform on par with most common calibration-based PTQ methods.
arXiv Detail & Related papers (2025-01-15T19:44:15Z) - TTAQ: Towards Stable Post-training Quantization in Continuous Domain Adaptation [3.7024647541541014]
Post-training quantization (PTQ) reduces excessive hardware cost by quantizing full-precision models into lower-bit representations on a tiny calibration set. Traditional PTQ methods typically encounter failure in dynamic and ever-changing real-world scenarios. We propose a novel and stable quantization process for test-time adaptation (TTA), dubbed TTAQ, to address the performance degradation of traditional PTQ.
arXiv Detail & Related papers (2024-12-13T06:34:59Z) - Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation.
This is primarily caused by channel-specific and sample-specific outliers.
We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
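For the consistency measures summarized above, the simplest is agreement with the majority answer across several stochastic samples; the sketch below is a generic illustration, with `sample_answer` a hypothetical stand-in for a temperature-sampled LLM call, not the paper's code.

```python
# Hedged sketch of consistency-based confidence: sample k generations and use the
# fraction that agree with the majority answer as the confidence estimate.
from collections import Counter
from typing import Callable, List, Tuple

def consistency_confidence(
    question: str,
    sample_answer: Callable[[str], str],  # one stochastic (temperature > 0) LLM call; assumed interface
    k: int = 10,
) -> Tuple[str, float]:
    answers: List[str] = [sample_answer(question) for _ in range(k)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / k            # answer plus its agreement-based confidence
```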
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z) - PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language
Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z) - Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z) - Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement
Learning [22.31766292657812]
Existing mixed-precision quantization methods mostly predetermine the model's bit-width settings before actual training.
We propose a novel Data Quality-aware Mixed-precision Quantization framework, dubbed DQMQ, to dynamically adapt quantization bit-widths to different data qualities.
arXiv Detail & Related papers (2023-02-09T06:14:00Z) - Calibrate and Prune: Improving Reliability of Lottery Tickets Through
Prediction Calibration [40.203492372949576]
Supervised models with uncalibrated confidences tend to be overconfident even when making wrong predictions.
We study how explicit confidence calibration in the over-parameterized network impacts the quality of the resulting lottery tickets.
Our empirical studies reveal that including calibration mechanisms consistently leads to more effective lottery tickets.
arXiv Detail & Related papers (2020-02-10T15:42:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences of its use.