Fugu-MT Paper Translation (Summary): Rethinking Residual Errors in Compensation-based LLM Quantization

Paper overview: Rethinking Residual Errors in Compensation-based LLM Quantization

arXiv URL: http://arxiv.org/abs/2604.07955v1
Date: Thu, 09 Apr 2026 08:20:59 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.794168
Title: Rethinking Residual Errors in Compensation-based LLM Quantization
Authors: Shuaiting Li, Juncan Deng, Kedong Xu, Rongtao Deng, Hong Gu, Minghan Jiang, Haibin Shen, Kejie Huang
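Abstract summary: GPTAQ introduced an asymmetric calibration process that aligns the output of each quantized layer with its full-precision counterpart. This work shows that the residual error arises from two sources. One is the output difference of the preceding layer. The other is the discrepancy, within each layer, between the compensated weights and the original weights. The proposed method integrates seamlessly with both GPTQ and GPTAQ and significantly improves their quantization performance.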
Reference score (attention score computed by this site): 15.416446372209924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Methods based on weight compensation, which iteratively apply quantization and weight compensation to minimize the output error, have recently demonstrated remarkable success in quantizing Large Language Models (LLMs). The representative work, GPTQ, introduces several key techniques that make such iterative methods practical for LLMs with billions of parameters. GPTAQ extends this approach by introducing an asymmetric calibration process that aligns the output of each quantized layer with its full-precision counterpart, incorporating a residual error into the weight compensation framework. In this work, we revisit the formulation of the residual error. We identify a sub-optimal calibration objective in existing methods: during the intra-layer calibration process, they align the quantized output with the output from compensated weights, rather than the true output from the original full-precision model. Therefore, we redefine the objective to precisely align the quantized model's output with the original output of the full-precision model at each step. We then reveal that the residual error originates not only from the output difference of the preceding layer but also from the discrepancy between the compensated and original weights within each layer, which we name the 'compensation-aware error'. By inheriting the neuron decomposition technique from GPTAQ, we can efficiently incorporate this compensation-aware error into the weight update process. Extensive experiments on various LLMs and quantization settings demonstrate that our proposed enhancements integrate seamlessly with both GPTQ and GPTAQ, significantly improving their quantization performance. Our code is publicly available at https://github.com/list0830/ResComp.
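The quantize-then-compensate loop that GPTQ made practical can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the ResComp code. It assumes the following:

- a symmetric per-output-row round-to-nearest quantizer;
- the layer-wise objective ||W X - Q X||^2, with Hessian H = 2 X X^T plus a small damping term;
- a fixed left-to-right column order.

After each column is rounded, its rounding error is spread over the remaining columns through the inverse Hessian, as an Optimal Brain Surgeon style update. That column is then eliminated from the inverse. GPTQ itself computes the same updates through a Cholesky factor and lazy block updates for speed; both are omitted here. The names gptq_like and quantize_rtn are hypothetical.

```python
import numpy as np


def quantize_rtn(w, scale, bits):
    """Symmetric round-to-nearest onto a signed `bits`-bit integer grid."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def gptq_like(W, X, bits=4, damp=0.01):
    """Quantize W (d_out x d_in) column by column, compensating the
    not-yet-quantized columns so that W @ X stays close to its original value."""
    W = np.array(W, dtype=np.float64)  # work on a copy
    d_in = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    # One scale per output row, taken from the original (uncompensated) weights
    scale = np.maximum(np.abs(W).max(axis=1), 1e-12) / qmax
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)  # damping for numerical stability
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for i in range(d_in):
        Q[:, i] = quantize_rtn(W[:, i], scale, bits)
        err = (W[:, i] - Q[:, i]) / Hinv[i, i]
        # Spread the rounding error of column i over columns i..d_in-1
        # (column i itself becomes exactly Q[:, i])
        W[:, i:] -= np.outer(err, Hinv[i, i:])
        # Eliminate column i: the block of Hinv for columns > i is now
        # the inverse of H restricted to those columns
        Hinv -= np.outer(Hinv[:, i], Hinv[i, :]) / Hinv[i, i]
    return Q, scale


rng = np.random.default_rng(0)
d_out, d_in, n = 16, 32, 256
W = rng.standard_normal((d_out, d_in))
# Inputs with a strong shared direction, so H is far from diagonal
X = rng.standard_normal((d_in, n)) + 3.0 * np.outer(rng.standard_normal(d_in), rng.standard_normal(n))

Q, scale = gptq_like(W, X, bits=4)
Q_rtn = quantize_rtn(W, scale[:, None], bits=4)  # plain rounding, no compensation
err_gptq = np.linalg.norm((W - Q) @ X) ** 2
err_rtn = np.linalg.norm((W - Q_rtn) @ X) ** 2
print(f"layer output error  compensated: {err_gptq:.1f}  round-to-nearest: {err_rtn:.1f}")
```

When one input direction dominates the Hessian, as here, compensation typically cuts the layer output error well below plain rounding. When inputs are nearly uncorrelated, H is close to diagonal, the off-diagonal entries of its inverse are close to zero, and the two results converge.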
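The abstract's point that the residual error has two sources follows from one line of algebra.

Assume, for illustration, that the residual at some intermediate calibration step is the gap between two outputs. The first is the original full-precision output W X. The second is W_c X_tilde: the current compensated weights W_c applied to the input X_tilde produced by already-quantized layers. Adding and subtracting W X_tilde gives

W X - W_c X_tilde = W (X - X_tilde) + (W - W_c) X_tilde

The first term is the output difference inherited from the preceding layer. The second is the discrepancy between compensated and original weights, which the abstract names the compensation-aware error.

The NumPy check below is a sketch under that assumption. The function residual_decomposition and the random perturbations are hypothetical stand-ins, not the paper's derivation or code.

```python
import numpy as np


def residual_decomposition(W, W_c, X, X_tilde):
    """Split W @ X - W_c @ X_tilde into an inherited-input term and a
    compensation-aware term (illustrative formulation, not the paper's)."""
    residual = W @ X - W_c @ X_tilde
    input_term = W @ (X - X_tilde)            # error carried in from preceding quantized layers
    compensation_term = (W - W_c) @ X_tilde   # drift of compensated weights from the originals
    return residual, input_term, compensation_term


rng = np.random.default_rng(0)
d_out, d_in, n = 8, 16, 64
W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, n))
# Hypothetical stand-ins for upstream quantization noise and accumulated compensation updates
X_tilde = X + 0.05 * rng.standard_normal((d_in, n))
W_c = W + 0.1 * rng.standard_normal((d_out, d_in))

residual, input_term, compensation_term = residual_decomposition(W, W_c, X, X_tilde)
print(np.allclose(residual, input_term + compensation_term))  # True

# Even with an exact input (X_tilde == X), the residual is nonzero once W_c differs from W
residual_same_input, _, _ = residual_decomposition(W, W_c, X, X)
print(np.linalg.norm(residual_same_input) > 0)  # True
```

Under this illustrative formulation, only the first term vanishes when the upstream input is exact. An objective that accounts solely for the preceding layer's output difference can therefore still leave a residual.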

Related papers