StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths
- URL: http://arxiv.org/abs/2601.19320v1
- Date: Tue, 27 Jan 2026 08:00:57 GMT
- Title: StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths
- Authors: Tianyi Chen, Sihan Chen, Xiaoyi Qu, Dan Zhao, Ruomei Yan, Jongwoo Ko, Luming Liang, Pashmina Cameron
- Abstract summary: Quantization-aware training (QAT) is essential for deploying large models under strict memory and latency constraints. Common approaches based on the straight-through estimator (STE) or soft quantizers often suffer from gradient mismatch, instability, or high computational overhead. We propose StableQAT, a unified and efficient QAT framework that stabilizes training in ultra-low-bit settings.
- Score: 49.94623294999562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantization-aware training (QAT) is essential for deploying large models under strict memory and latency constraints, yet achieving stable and robust optimization at ultra-low bitwidths remains challenging. Common approaches based on the straight-through estimator (STE) or soft quantizers often suffer from gradient mismatch, instability, or high computational overhead. As such, we propose StableQAT, a unified and efficient QAT framework that stabilizes training in ultra low-bit settings via a novel, lightweight, and theoretically grounded surrogate for backpropagation derived from a discrete Fourier analysis of the rounding operator. StableQAT strictly generalizes STE as the latter arises as a special case of our more expressive surrogate family, yielding smooth, bounded, and inexpensive gradients that improve QAT training performance and stability across various hyperparameter choices. In experiments, StableQAT exhibits stable and efficient QAT at 2-4 bit regimes, demonstrating improved training stability, robustness, and superior performance with negligible training overhead against standard QAT techniques. Our code is available at https://github.com/microsoft/StableQAT.
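To make the baseline concrete: the STE approach the abstract contrasts against applies "fake quantization" (quantize then dequantize) in the forward pass, while the backward pass pretends the rounding step is the identity. The following is a minimal NumPy sketch of that baseline; the function names are illustrative and not taken from the StableQAT codebase, whose surrogate gradient differs from the hard STE shown here.

```python
import numpy as np

def fake_quantize(w, bits=2):
    """Uniform fake quantization: round weights onto a signed integer
    grid, then rescale back to float, so the forward pass sees
    quantized values while training stays in floating point."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 1 for signed 2-bit
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def ste_grad(w, upstream, clip=1.0):
    """Straight-through estimator: treat round() as the identity in the
    backward pass, passing the upstream gradient through unchanged
    inside the clipping range and zeroing it outside."""
    return upstream * (np.abs(w) <= clip)

w = np.array([0.8, -0.3, 1.5, -1.2])
w_q = fake_quantize(w, bits=2)      # weights snapped to the 2-bit grid
g = ste_grad(w, np.ones_like(w))    # gradient masked outside the clip range
```

Because the STE gradient is a crude zero-or-one mask of the true (zero almost everywhere) derivative of rounding, mismatch between forward and backward passes grows as bitwidth shrinks, which is the instability StableQAT's smoother surrogate targets.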
Related papers
- 1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization [6.530091512185435]
Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware.
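The k-means weight quantization this abstract refers to replaces each weight with the nearest entry of a learned, non-uniform codebook, rather than a fixed integer grid. A minimal 1-D sketch under those assumptions (illustrative only, not the paper's implementation):

```python
import numpy as np

def kmeans_quantize(w, k=2, iters=20, seed=0):
    """Quantize a weight vector to k learned centroids via 1-D k-means.
    Returns the quantized weights and the codebook (centroids)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(w, size=k, replace=False)   # init from data
    for _ in range(iters):
        # assign each weight to its nearest centroid
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of its assigned weights
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = w[assign == j].mean()
    return centroids[assign], centroids

w = np.array([1.0, 1.1, -1.0, -0.9])
w_q, codebook = kmeans_quantize(w, k=2)   # 1-bit: two centroids
```

With k = 2 this is effectively a 1-bit codebook; only the cluster index per weight plus the small codebook need to be stored, and the centroids adapt to the weight distribution instead of being fixed integer levels.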
arXiv Detail & Related papers (2026-02-17T13:23:26Z) - What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study [59.44848132298657]
Post-training quantization (PTQ) usually comes with the cost of large accuracy drops, especially for reasoning tasks under low-bit settings. In this study, we present a systematic empirical study of quantization-aware training (QAT) for reasoning models.
arXiv Detail & Related papers (2026-01-21T11:22:29Z) - SASQ: Static Activation Scaling for Quantization-Aware Training in Large Language Models [6.235887167172886]
We propose SASQ: a lightweight QAT framework specifically tailored for activation quantization factors. On LLaMA2-7B, it achieves 5.2% lower perplexity than QuaRot and 4.7% lower perplexity than the FP16 model on WikiText2.
arXiv Detail & Related papers (2025-12-16T15:12:34Z) - CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization. CAGE significantly improves upon the state-of-the-art methods in terms of accuracy, for similar computational cost. For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4-bits (W4A4) with the prior best method.
arXiv Detail & Related papers (2025-10-21T16:33:57Z) - End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost [53.25965863436039]
Quantization-aware training (QAT) provides a more principled solution, but its reliance on backpropagation incurs prohibitive memory costs. We propose ZeroQAT, a zeroth-order optimization-based QAT framework that supports both weight and activation quantization. Experiments show that ZeroQAT consistently outperforms representative PTQ and QAT baselines while requiring significantly less memory.
arXiv Detail & Related papers (2025-08-21T01:18:27Z) - TensorHyper-VQC: A Tensor-Train-Guided Hypernetwork for Robust and Scalable Variational Quantum Computing [50.95799256262098]
We introduce TensorHyper-VQC, a novel tensor-train (TT)-guided hypernetwork framework for quantum machine learning. Our framework delegates the generation of quantum circuit parameters to a classical TT network, effectively decoupling optimization from quantum hardware. These results position TensorHyper-VQC as a scalable and noise-resilient framework for advancing practical quantum machine learning on near-term devices.
arXiv Detail & Related papers (2025-08-01T23:37:55Z) - Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix [0.7261171488281837]
We find that the sharp landscape of loss, which leads to a dramatic performance drop, is an essential factor that causes instability. We propose Feature-Perturbed Quantization (FPQ) to generalize and employ the feature distillation method to the quantized model.
arXiv Detail & Related papers (2025-03-14T07:56:20Z) - QuEST: Stable Training of LLMs with 1-Bit Weights and Activations [27.644652093888745]
QuEST is a new method for training sparse or quantized language models. We show optimality at 4-bits and stable convergence as low as 1-bit weights and activations. Experiments on Llama-type architectures show that QuEST induces stable scaling laws across the entire range of hardware-supported precisions.
arXiv Detail & Related papers (2025-02-07T15:23:34Z) - SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning [51.10866035483686]
High update-to-data (UTD) ratio algorithms in reinforcement learning (RL) improve sample efficiency but incur high computational costs, limiting real-world scalability. We propose Offline Stabilization Phases for Efficient Q-Learning (SPEQ), an RL algorithm that combines low-UTD online training with periodic offline stabilization phases. During these phases, Q-functions are fine-tuned with high UTD ratios on a fixed replay buffer, reducing redundant updates on suboptimal data.
arXiv Detail & Related papers (2025-01-15T09:04:19Z) - In-Distribution Consistency Regularization Improves the Generalization of Quantization-Aware Training [16.475151881506914]
We propose Consistency Regularization (CR) to improve the generalization ability of Quantization-Aware Training (QAT). Our approach significantly outperforms current state-of-the-art QAT methods and even the FP counterparts.
arXiv Detail & Related papers (2024-02-21T03:19:48Z) - Poster: Self-Supervised Quantization-Aware Knowledge Distillation [6.463799944811755]
Quantization-aware training (QAT) starts with a pre-trained full-precision model and performs quantization during retraining.
Existing QAT works require supervision from the labels and they suffer from accuracy loss due to reduced precision.
This paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation framework (SQAKD)
arXiv Detail & Related papers (2023-09-22T23:52:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.