Fugu-MT Paper Translation (Abstract): Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

Paper abstract: Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

arxiv url: http://arxiv.org/abs/2603.10123v1
Date: Tue, 10 Mar 2026 18:01:06 GMT
Status: Translation complete
System update date: 2026-03-12 16:22:32.641516
Title: Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias
Title (reference translation): Loss of the Middle at Birth: An Exact Theory of Transformer Position Bias
Authors: Borun D Chowdhury
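Abstract summary: This paper makes a single, precise claim about the "Lost in the Middle" phenomenon, which is widely attributed to learned Softmax artifacts or the distance-decay of positional encodings like RoPE: the U-shape is already present at initialization. Untrained Qwen2 and GPT-2 architectures exhibit this U-shape at Step 0, and it is identical with or without RoPE.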
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ``Lost in the Middle'' phenomenon -- a U-shaped performance curve where LLMs retrieve well from the beginning and end of a context but fail in the middle -- is widely attributed to learned Softmax artifacts or the distance-decay of positional encodings like RoPE. This paper makes a single, precise claim: \emph{the U-shape is already present at initialization, before any training or positional encoding takes effect.} It is an inherent geometric property of the causal decoder with residual connections. We model multi-layer causal attention as iterated powers of the Cesàro matrix and derive the exact closed-form influence density in the continuous limit. Causal masking forces a logarithmic divergence of gradient influence at the start of the prompt (the Primacy Tail), while residual connections create an isolated $\mathcal{O}(1)$ anchor at the final token (the Recency Delta). Between these extremes lies a factorial dead zone of order $\mathcal{O}(1/(H{-}1)!)$, where $H$ is the network depth, making middle-context retrieval and training structurally hostile. We validate empirically that untrained Qwen2 and GPT-2 architectures exhibit this U-shape at Step~0, and that it is identical with or without RoPE. Comparing initialized and pretrained networks, we show that standard training does not overcome the topological valley, confirming that the U-shape persists as an architectural baseline under standard pretraining objectives. We do not claim that this bias is insurmountable, nor that interventions such as RoPE modifications are useless. We establish what the baseline is and where it comes from, so that future efforts to overcome it can be precisely targeted.
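Abstract (reference translation): The ``Lost in the Middle'' phenomenon (a U-shaped performance curve in which LLMs retrieve well from the beginning and end of a context but fail in the middle) is widely attributed to learned Softmax artifacts or to the distance-decay of positional encodings such as RoPE. This paper makes a single, precise claim: \emph{the U-shape is already present at initialization, before any training or positional encoding takes effect.} It is an inherent geometric property of the causal decoder with residual connections. We model multi-layer causal attention as iterated powers of the Cesàro matrix and derive the exact closed-form influence density in the continuous limit. Causal masking forces a logarithmic divergence of gradient influence at the start of the prompt (the Primacy Tail), while residual connections create an isolated $\mathcal{O}(1)$ anchor at the final token (the Recency Delta). Between these extremes lies a factorial dead zone of order $\mathcal{O}(1/(H{-}1)!)$, where $H$ is the network depth, which makes middle-context retrieval and training structurally hostile. We validate empirically that untrained Qwen2 and GPT-2 architectures exhibit this U-shape at Step~0, and that it is identical with or without RoPE. Comparing initialized and pretrained networks, we show that standard training does not overcome the topological valley, confirming that the U-shape persists as an architectural baseline under standard pretraining objectives. We do not claim that this bias is insurmountable, nor that interventions such as RoPE modifications are useless. We establish what the baseline is and where it comes from, so that future efforts to overcome it can be precisely targeted.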
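
The mechanism described in the abstract can be illustrated with a small numerical sketch. It is not the paper's code: it assumes that attention at initialization is uniform over the causal window (so one attention layer is the Cesàro matrix $C$ with $C_{ij} = 1/(i+1)$ for $j \le i$), and it assumes a residual update of the form $(I + C)/2$ per layer, which is a normalization chosen here for illustration. The function names, the context length 64, and the depth 4 are likewise hypothetical.

```python
import numpy as np


def cesaro_matrix(n):
    """Uniform causal averaging: row i averages positions 0..i."""
    lower = np.tril(np.ones((n, n)))
    # Divide row i by (i + 1) so that every row sums to 1.
    return lower / np.arange(1, n + 1)[:, None]


def final_token_influence(n, depth):
    """Weight of each input position in the final token after `depth` layers.

    Assumption (not from the paper): each layer maps x to (x + C x) / 2,
    i.e. a residual connection around uniform causal attention.
    """
    layer = 0.5 * (np.eye(n) + cesaro_matrix(n))
    # The last row of layer^depth gives the final token's dependence
    # on every input position.
    return np.linalg.matrix_power(layer, depth)[-1]


influence = final_token_influence(n=64, depth=4)
middle = len(influence) // 2
print("first :", influence[0])
print("middle:", influence[middle])
print("last  :", influence[-1])
```

Under these assumptions the first position and the final position both carry more weight than the middle position, which matches the qualitative picture of a primacy tail and a recency anchor. The exact closed-form density and the factorial scaling are results of the paper and are not reproduced by this toy model.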


Related papers