Fugu-MT Paper Translation (Overview): A Residual-Aware Theory of Position Bias in Transformers

Paper overview: A Residual-Aware Theory of Position Bias in Transformers

arxiv url: http://arxiv.org/abs/2602.16837v1
Date: Wed, 18 Feb 2026 20:01:39 GMT
Status: Translation complete
Last updated in system: 2026-02-20 15:21:28.324936
Title: A Residual-Aware Theory of Position Bias in Transformers
Authors: Hanna Herasimchyk, Robin Labryga, Tomislav Prusina, Sören Laue,
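Abstract summary: Transformer models systematically favor certain token positions. The authors prove that causal Transformers induce a U-shaped position bias, with attention concentrating on early and late tokens. This result provides a principled architectural explanation for the Lost-in-the-Middle phenomenon.
Reference score (popularity, computed by the site): 2.9332247106953098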
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer models systematically favor certain token positions, yet the architectural origins of this position bias remain poorly understood. Under causal masking at infinite depth, prior theoretical analyses of attention rollout predict an inevitable collapse of attention onto the first token. Such collapse, however, does not occur in practice. We resolve this discrepancy with a residual-aware theory of cumulative attention rollout. By incorporating residual connections, we show that this architectural component prevents collapse under realistic conditions. At finite depth, we prove that causal Transformers induce a U-shaped position bias, with attention concentrating on early and late tokens. This result provides a principled architectural explanation for the Lost-in-the-Middle phenomenon.
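The following is a toy sketch of cumulative attention rollout under a causal mask. It is not the authors' construction. It assumes, hypothetically, that every layer uses the same uniform causal attention matrix A, and it models the residual connection with the common rollout convention Ã = αI + (1 - α)A for a fixed α. The rollout after L layers is the product of the per-layer matrices. The function names and the values n = 4, L = 2, α = 0.5 are illustrative choices only.

```python
# Toy sketch of cumulative attention rollout under a causal mask.
# ASSUMPTIONS (hypothetical, not taken from the paper):
#   * every layer uses the same uniform causal attention matrix;
#   * the residual connection is modeled as alpha * I + (1 - alpha) * A
#     with a fixed mixing weight alpha.

def uniform_causal_attention(n):
    """Row i (a query) attends uniformly to key positions 0..i."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]

def with_residual(attn, alpha):
    """Mix the identity into each row to account for the residual stream."""
    n = len(attn)
    return [[(alpha if i == j else 0.0) + (1 - alpha) * attn[i][j]
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rollout(layer_attn, depth):
    """Multiply the same per-layer matrix together over `depth` layers."""
    n = len(layer_attn)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(depth):
        result = matmul(layer_attn, result)
    return result

n, depth, alpha = 4, 2, 0.5
attn = uniform_causal_attention(n)
# Cumulative attention of the last query over all key positions.
plain = rollout(attn, depth)[-1]
resid = rollout(with_residual(attn, alpha), depth)[-1]
print([round(x, 2) for x in plain])  # [0.52, 0.27, 0.15, 0.06]
print([round(x, 2) for x in resid])  # [0.26, 0.19, 0.16, 0.39]
```

In this toy setting, the last query's cumulative attention without the residual term decreases monotonically and is dominated by the first token, while with the residual term it is higher at both ends than in the middle. Because token 0 remains an absorbing state for any fixed α < 1, this simplified model only delays collapse onto the first token as depth grows; it does not reproduce the paper's infinite-depth analysis or the conditions under which the authors show collapse is prevented.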


Related papers