Fugu-MT Paper Translation (Abstract): InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

Paper abstract: InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

arxiv url: http://arxiv.org/abs/2605.26175v1
Date: Mon, 25 May 2026 05:34:46 GMT
Status: Translation complete
System update date: 2026-05-27 17:51:41.27057
Title: InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
Title (reference translation): InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
Authors: Ke Li, Dong An, Xiaoling Zang, Can Ye, Liang Xie, Qibo Qiu, Chen Shen, Xiaofei He, Wenxiao Wang
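Abstract summary: Low-bit activation quantization remains a major bottleneck in large language model (LLM) deployment. Existing post-training quantization methods suppress peaks, balance channels, or minimize reconstruction error. We recast activation transformation as quantizer-facing distribution design and analyze quantization error from an information-theoretic perspective. We propose InfoQuant, a train-free method that uses Peak Suppression Orthogonal Transformation (PSOT) to shape activations into more quantization-friendly distributions.
Reference score (independently computed attention score): 16.236156118201116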
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but that their distributions are often poorly matched to a low-bit uniform quantizer. Existing post-training quantization (PTQ) methods suppress peaks, balance channels, or minimize reconstruction error, yet they rarely specify what activation distribution is actually easy to discretize. As a result, activations may appear numerically smoother while still incurring large quantization error because the quantization range remains wide or most values collapse into a few levels near the mean. We recast activation transformation as quantizer-facing distribution design and analyze quantization error from an information-theoretic perspective. Our analysis shows that quantization-friendly activations should jointly have a smaller numerical range and sufficient dispersion within that range. Guided by this analysis, we propose InfoQuant, a train-free method that employs Peak Suppression Orthogonal Transformation (PSOT) to shape activations into more quantization-friendly distributions. We further introduce adaptive outlier-token selection to improve the robustness of PSOT during optimization. Across multiple LLM families, InfoQuant consistently outperforms prior PTQ and end-to-end training baselines. Under W4A4KV4, it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% over the previous state of the art. Code is available at [https://github.com/LLIKKE/InfoQuant](https://github.com/LLIKKE/InfoQuant)
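Abstract (reference translation): Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but also that their distributions are often poorly matched to a low-bit uniform quantizer. Existing post-training quantization (PTQ) methods suppress peaks, balance channels, or minimize reconstruction error, but they rarely specify which activation distribution is actually easy to discretize. As a result, activations may look numerically smoother yet still incur large quantization error, because the quantization range remains wide or most values collapse into a few levels near the mean. We recast activation transformation as quantizer-facing distribution design and analyze quantization error from an information-theoretic perspective. The analysis shows that quantization-friendly activations should have both a smaller numerical range and sufficient dispersion within that range. Guided by this analysis, we propose InfoQuant, a train-free method that uses Peak Suppression Orthogonal Transformation (PSOT) to shape activations into more quantization-friendly distributions. We further introduce adaptive outlier-token selection to improve the robustness of PSOT during optimization. Across multiple LLM families, InfoQuant consistently outperforms prior PTQ and end-to-end training baselines. Under W4A4KV4, it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% relative to the previous state of the art. Code is available at [https://github.com/LLIKKE/InfoQuant](https://github.com/LLIKKE/InfoQuant).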
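
The abstract's central claim, that a low-bit uniform quantizer wants activations with a small numerical range and enough spread across the available levels, can be illustrated with a toy experiment. The sketch below is not the paper's PSOT: it assumes a symmetric per-tensor 4-bit quantizer and swaps in a fixed normalized Hadamard rotation as a generic orthogonal transform. The helper names (`uniform_quantize`, `hadamard`, `level_entropy`) and the synthetic data are illustrative assumptions only.

```python
import numpy as np

def uniform_quantize(x, bits=4):
    """Symmetric per-tensor uniform quantizer with round-to-nearest."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4 bits
    scale = np.max(np.abs(x)) / qmax      # step size is set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, q.astype(int)

def hadamard(n):
    """Normalized Sylvester Hadamard matrix (orthogonal); n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def level_entropy(q):
    """Shannon entropy in bits of the occupied quantization levels."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic activation vector: unit Gaussian values plus one large outlier.
rng = np.random.default_rng(0)
n = 256
x = rng.normal(0.0, 1.0, size=n)
x[3] = 60.0

H = hadamard(n)
x_rot = H @ x                             # orthogonal rotation, norm is preserved

xq, q = uniform_quantize(x)
xq_rot, q_rot = uniform_quantize(x_rot)

mse_plain = float(np.mean((x - xq) ** 2))
# Rotate back before measuring error so both errors are in the original basis.
mse_rot = float(np.mean((x - H.T @ xq_rot) ** 2))

print("max |x|      plain / rotated:", np.max(np.abs(x)), np.max(np.abs(x_rot)))
print("quant. MSE   plain / rotated:", mse_plain, mse_rot)
print("level entropy plain / rotated:", level_entropy(q), level_entropy(q_rot))
```

In the unrotated case the single outlier sets the step size, so nearly every other value falls into the zero level: the range is wide and the level entropy is close to zero. After the rotation the outlier's energy is spread over all coordinates, which shrinks the range and occupies more levels. A learned transform such as the one the abstract describes would presumably optimize this trade-off rather than rely on a fixed rotation.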

Related papers