Fugu-MT Paper Translation (Abstract): WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points

Paper abstract: WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points

arxiv url: http://arxiv.org/abs/2605.17471v1
Date: Sun, 17 May 2026 14:20:51 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:48.107735
Title: WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points
Authors: Dongyue Li, Zechun Liu, Kai Yi, Zhenshuo Zhang, Changsheng Zhao, Raghuraman Krishnamoorthi, Harshit Khaitan, Hongyang R. Zhang, Steven Li
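Abstract summary: Quantization-aware training (QAT) is widely adopted to quantize language models. Its main bottleneck is slow convergence and an early performance plateau. We propose an algorithm called WinQ that accelerates QAT.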
Reference score (independently computed attention score): 33.435461430591126
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Quantization-aware training (QAT) is widely adopted to quantize language models by training full-precision weights using gradients from the quantized model. The main bottleneck is its slow convergence and early performance plateau, particularly below 4-bit-widths. While this problem has been observed in prior work, its precise cause remains unclear. In this paper, we analyze the convergence of QAT by estimating the spectrum of the loss-surface Hessians. We find that the weights converge to flat regions around saddle points, where a large fraction of the Hessian eigenvalues are both positive and negative. During training, an increasing fraction of Hessian eigenvalues concentrates around zero, whose magnitude decreases. At lower bit-widths, the magnitude of eigenvalues in the Hessian spectrum is significantly smaller. To mitigate these issues, we propose an algorithm called WinQ to accelerate QAT, which involves: (1) periodically resetting weights to the linear interpolation of full-precision and quantized weights, reducing the distance to the quantization grid and increasing eigenvalue magnitude, and (2) computing gradients of noise-injected weights to regularize the Hessian. Extensive experiments show that WinQ accelerates QAT by up to 4 times across various quantization methods and models. Under the same training cost, WinQ improves state-of-the-art sub-4-bit quantization by up to 8.8%. These results are consistent across 16 settings with different language models, quantization methods, and bit widths.
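The two components named in the abstract, (1) periodically resetting weights to an interpolation of full-precision and quantized weights and (2) computing gradients at noise-injected weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the symmetric uniform quantizer, the Gaussian noise, the function names, and the hyperparameters `alpha`, `sigma`, and `period` are all assumptions chosen for illustration, and `grad_fn` stands in for whatever gradient the QAT setup provides.

```python
import numpy as np


def quantize_uniform(w, bits=3):
    """Symmetric uniform quantizer (assumed stand-in; the abstract names no quantizer)."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for this sketch")
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    if scale == 0.0:
        return w.copy()  # all-zero weights are already on the grid
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def interpolate_reset(w, bits=3, alpha=0.5):
    """Step (1): reset w to a linear interpolation of w and its quantized copy.

    alpha is a hypothetical mixing coefficient; alpha=1 snaps w onto the grid.
    """
    return (1.0 - alpha) * w + alpha * quantize_uniform(w, bits)


def noisy_gradient(grad_fn, w, sigma=0.01, rng=None):
    """Step (2): evaluate the gradient at noise-injected weights (Gaussian noise assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    return grad_fn(w + sigma * rng.standard_normal(w.shape))


def train(grad_fn, w, steps=100, lr=0.1, period=20, bits=3, alpha=0.5,
          sigma=0.01, seed=0):
    """Plain gradient descent with noisy gradients and a periodic interpolation reset."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float).copy()
    for t in range(1, steps + 1):
        w -= lr * noisy_gradient(grad_fn, w, sigma, rng)
        if t % period == 0:
            w = interpolate_reset(w, bits, alpha)
    return w


# Toy usage: the gradient of 0.5 * ||v - target||^2 replaces a real QAT gradient.
target = np.array([0.9, -0.35, 0.1, -0.8])
w_final = train(lambda v: v - target, np.zeros(4))
```

With this quantizer, the reset satisfies `w_new - Q(w) = (1 - alpha) * (w - Q(w))`, so each reset shrinks the distance to the quantized weights by a factor of `1 - alpha`, which matches the abstract's description of reducing the distance to the quantization grid.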


Related papers