Fugu-MT Paper Translation (Summary): Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

Paper summary: Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

arxiv url: http://arxiv.org/abs/2606.02288v1
Date: Mon, 01 Jun 2026 14:09:35 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:32.198184
Title: Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization
Authors: Yung-Chin Chen, Chung Peng Lee, Ze-Wei Liou, Naveen Verma
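Abstract summary: Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. The paper shows that the spike-carrying tokens converge, after normalization, to constant vectors that drive the attention sink and value-state drain mechanisms. INSERTQUANT is a post-training quantization framework that clamps spikes and restores their function via pre-computed template vectors.
Reference score (site's own attention metric): 2.1915855082751894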
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that these tokens converge to constant vectors after normalization that drive the attention sink and value-state drain mechanisms. We geometrically substantiate this by analyzing the coordination of projection weights: $W_K$ contrastively amplifies the vector, $W_Q$ aligns semantic tokens toward it, and $W_V$ projects it into the spectral null-space. Furthermore, we reveal that the model actively preserves these structural biases against Rotary Positional Embedding (RoPE) perturbations by localizing them in "zones of rotational stability" utilizing low-frequency bands and coherent channel pairs. Leveraging this, we propose INSERTQUANT, a post-training quantization (PTQ) framework that clamps spikes and restores their function via pre-computed template vectors. This renders activations strictly spike-free, enabling robust low-bit quantization with high fidelity. INSERTQUANT achieves parity with state-of-the-art per-tensor quantization methods on LLMs and uniquely generalizes beyond text to other modalities such as ViTs.
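
The abstract's opening claim, that a massive spike stretches the dynamic range and hurts per-tensor quantization, can be illustrated with a small toy example. This is a minimal sketch and not the authors' INSERTQUANT implementation: the function names, the 4-bit symmetric quantizer, the spike value 2000.0, and the clamp limit 8.0 are all assumptions chosen for illustration. The template-vector restoration step is left out because the abstract does not specify it.

```python
import random


def quantize_per_tensor(values, bits=4):
    """Symmetric per-tensor fake quantization: one scale shared by all entries."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, saturate, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def clamp(values, limit):
    """Clip every entry to the range [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def mean_squared_error(reference, approx):
    return sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)


rng = random.Random(0)
# Hypothetical "ordinary" activations, roughly unit scale.
ordinary = [rng.gauss(0.0, 1.0) for _ in range(1024)]
# Append one hypothetical massive spike (illustrative magnitude only).
spiked = ordinary + [2000.0]

# Quantize with the spike present, then measure error on the ordinary entries.
err_with_spike = mean_squared_error(ordinary, quantize_per_tensor(spiked)[:-1])

# Clamp the spike first (assumed limit), then quantize and measure again.
clamped = clamp(spiked, 8.0)
err_clamped = mean_squared_error(ordinary, quantize_per_tensor(clamped)[:-1])

print(f"MSE on ordinary entries, spike present: {err_with_spike:.4f}")
print(f"MSE on ordinary entries, spike clamped: {err_clamped:.4f}")
```

With the spike present, the shared scale is so coarse that the ordinary entries collapse onto very few quantization levels. Clamping restores a usable step size, but it also discards whatever the spike encoded, which is the gap the abstract says the pre-computed template vectors are meant to fill.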
