Fugu-MT Paper Translation (Summary): State Rank Dynamics in Linear Attention LLMs

Paper summary: State Rank Dynamics in Linear Attention LLMs

arxiv url: http://arxiv.org/abs/2602.02195v1
Date: Mon, 02 Feb 2026 15:00:42 GMT
Status: Translation complete
Last updated (in system): 2026-02-03 19:28:34.237167
Title: State Rank Dynamics in Linear Attention LLMs
Authors: Ao Sun, Hongtao Zhang, Heng Zhou, Yixuan Ma, Yiran Qin, Tongrui Su, Yan Liu, Zhanyu Ma, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He,
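Abstract summary: State Rank Stratification is characterized by a distinct spectral bifurcation among linear attention heads. Low-rank heads are indispensable for model reasoning, whereas high-rank heads exhibit significant redundancy. We propose Joint Rank-Norm Pruning, a zero-shot strategy that reduces KV-cache overhead by 38.9% while largely maintaining model accuracy.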
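The summary does not say how Joint Rank-Norm Pruning combines rank and norm, so the following is only a hypothetical heuristic in the same spirit, not the authors' method. Low-rank heads are always kept, and among high-rank heads those whose state has the smallest norm are dropped first, up to a fixed budget. The function names, per-head statistics, threshold, 64 x 64 head size and 2-byte element size are all made-up illustrative values.

```python
import numpy as np


def select_heads_to_prune(eff_rank, state_norm, rank_threshold, prune_fraction):
    """Hypothetical rank-norm heuristic for choosing linear-attention heads to drop.

    eff_rank, state_norm: one value per head (e.g. measured during a forward pass).
    Heads with eff_rank <= rank_threshold count as low-rank and are always kept.
    Among high-rank heads, the ones with the smallest state norm are pruned
    first, up to prune_fraction of all heads. Returns sorted head indices.
    """
    eff_rank = np.asarray(eff_rank, dtype=float)
    state_norm = np.asarray(state_norm, dtype=float)
    high_rank = np.flatnonzero(eff_rank > rank_threshold)
    budget = int(prune_fraction * eff_rank.size)
    # Order the high-rank candidates by ascending state norm.
    candidates = high_rank[np.argsort(state_norm[high_rank], kind="stable")]
    return np.sort(candidates[:budget])


def state_memory_bytes(n_heads, d_k, d_v, bytes_per_elem=2):
    """Recurrent-state memory of one layer: one d_k x d_v matrix per head."""
    return n_heads * d_k * d_v * bytes_per_elem


# Made-up statistics for 8 heads (illustrative only, not from the paper).
eff_rank = [1.2, 14.8, 0.9, 15.6, 13.1, 1.4, 15.9, 12.7]
state_norm = [3.1, 0.4, 2.8, 1.9, 0.7, 2.2, 0.5, 1.1]

pruned = select_heads_to_prune(eff_rank, state_norm, rank_threshold=4.0, prune_fraction=0.375)
before = state_memory_bytes(8, 64, 64)
after = state_memory_bytes(8 - len(pruned), 64, 64)
saving = 1 - after / before
```

Because each linear-attention head stores one d_k x d_v state matrix, dropping 3 of 8 equal-size heads removes 37.5% of that layer's state memory in this toy example; the 38.9% figure in the summary comes from the paper's own setup.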
Reference score (site-computed interest score): 37.607046806053035
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Linear Attention Large Language Models (LLMs) offer a compelling recurrent formulation that compresses context into a fixed-size state matrix, enabling constant-time inference. However, the internal dynamics of this compressed state remain largely opaque. In this work, we present a comprehensive study on the runtime state dynamics of state-of-the-art Linear Attention models. We uncover a fundamental phenomenon termed State Rank Stratification, characterized by a distinct spectral bifurcation among linear attention heads: while one group maintains an effective rank oscillating near zero, the other exhibits rapid growth that converges to an upper bound. Extensive experiments across diverse inference contexts reveal that these dynamics remain strikingly consistent, indicating that the identity of a head, whether low-rank or high-rank, is an intrinsic structural property acquired during pre-training, rather than a transient state dependent on the input data. Furthermore, our diagnostic probes reveal a surprising functional divergence: low-rank heads are indispensable for model reasoning, whereas high-rank heads exhibit significant redundancy. Leveraging this insight, we propose Joint Rank-Norm Pruning, a zero-shot strategy that achieves a 38.9% reduction in KV-cache overhead while largely maintaining model accuracy.
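The recurrent formulation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes plain causal linear attention without feature maps or normalization: each head keeps a d_k x d_v state S_t = decay * S_{t-1} + k_t v_t^T and reads it out as o_t = q_t S_t. The function names and the scalar decay parameter are hypothetical, not taken from the paper. The abstract also does not define how effective rank is measured, so the sketch assumes the entropy-based definition of Roy and Vetterli (the exponential of the Shannon entropy of the normalized singular values). Under this definition any nonzero state has an effective rank of at least 1, whereas the abstract reports values near zero for low-rank heads, so the paper's own metric may be scaled differently.

```python
import numpy as np


def linear_attention_recurrent(q, k, v, decay=1.0):
    """Causal linear attention computed as a recurrence over a fixed-size state.

    q, k: arrays of shape (T, d_k); v: array of shape (T, d_v).
    `decay` is an illustrative scalar forget factor (hypothetical; 1.0 means
    plain, ungated linear attention). Returns the outputs (T, d_v) and the
    list of states S_1..S_T, each of shape (d_k, d_v).
    """
    d_k, d_v = k.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = np.empty((q.shape[0], d_v))
    states = []
    for t in range(q.shape[0]):
        # Fold the current key/value pair into the state: S_t = decay * S_{t-1} + k_t v_t^T.
        S = decay * S + np.outer(k[t], v[t])
        # Read out with the query; the cost depends on d_k * d_v, not on t.
        outputs[t] = q[t] @ S
        states.append(S.copy())
    return outputs, states


def effective_rank(S, eps=1e-12):
    """Entropy-based effective rank: exp(H(p)), p = normalized singular values."""
    s = np.linalg.svd(S, compute_uv=False)
    if s.sum() < eps:
        return 0.0  # convention for an all-zero state
    p = s / s.sum()
    p = p[p > eps]  # drop zeros so that 0 * log(0) is treated as 0
    return float(np.exp(-(p * np.log(p)).sum()))


rng = np.random.default_rng(0)
T, d_k, d_v = 64, 16, 16
q, k, v = (rng.standard_normal((T, d)) for d in (d_k, d_k, d_v))
out, states = linear_attention_recurrent(q, k, v)
# Effective rank of the state after each step (exactly rank 1 at t = 1).
ranks = [effective_rank(S) for S in states]
```

With decay=1.0 the recurrent outputs match the parallel causal form np.tril(q @ k.T) @ v, and under this definition the state's effective rank can never exceed min(d_k, d_v), a natural ceiling for the growth of high-rank heads.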


Related papers