Fugu-MT Paper Translation (Abstract): The Position Curse: LLMs Struggle to Locate the Last Few Items in a List

Paper summary: The Position Curse: LLMs Struggle to Locate the Last Few Items in a List

arxiv url: http://arxiv.org/abs/2605.07127v1
Date: Fri, 08 May 2026 02:04:22 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.736751
Title: The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
Authors: Zhanqi Zhang, Hua-Dong Xiong, Robert C. Wilson, Mikio Aoi, Marcelo G. Mattar, Li Ji-An
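Abstract summary: We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To test whether this capability can be rescued by post-training, we constructed PosBench, a position-focused training dataset.
Reference score (independently computed attention score): 2.1427692215471263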
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern large language models (LLMs) can find a needle in a haystack (locating a single relevant fact buried among hundreds of thousands of irrelevant tokens) with near-saturated accuracy, yet fail to retrieve the last few items in a short list. We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To characterize this failure, we evaluated two complementary queries: given a position in a sequence (of letters or words), retrieve the corresponding item; and given an item, return its position. Each position is specified as a forward or backward offset from an anchor, either an endpoint of the list (its start or end) or another item in the list. Across both open-source and frontier closed-source models, backward retrieval substantially lags forward retrieval. To test whether this capability can be rescued by post-training, we constructed PosBench, a position-focused training dataset. LoRA fine-tuning improves both forward and backward retrieval and generalizes to a held-out code-understanding benchmark (PyIndex), yet absolute performance remains far from saturated. As LLM coding agents increasingly operate over large codebases where precise indexing becomes essential for code understanding and editing, position-based retrieval emerges as a key capability for future pretraining objectives and model design.
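The two complementary queries described in the abstract can be illustrated with a minimal Python sketch that computes the ground-truth answers. This is not the authors' code and not the PosBench format: the function names `retrieve_item` and `retrieve_position`, the 1-based offset convention, and the assumption that list items are unique are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def retrieve_item(items: Sequence[str], offset: int, backward: bool = False,
                  anchor: Optional[str] = None) -> str:
    """Position -> item: return the item at `offset` from an anchor.

    Assumed convention (not from the paper): with anchor=None the anchor is
    an endpoint, so offset 1 is the first item (forward) or the last item
    (backward). With an item anchor, offset counts steps away from it.
    """
    if offset < 1:
        raise ValueError("offset must be >= 1")
    if anchor is None:
        index = len(items) - offset if backward else offset - 1
    else:
        base = items.index(anchor)  # assumes items are unique
        index = base - offset if backward else base + offset
    if not 0 <= index < len(items):
        raise IndexError("offset falls outside the list")
    return items[index]


def retrieve_position(items: Sequence[str], target: str, backward: bool = False,
                      anchor: Optional[str] = None) -> int:
    """Item -> position: return the offset of `target` from the anchor."""
    idx = items.index(target)
    if anchor is None:
        return len(items) - idx if backward else idx + 1
    base = items.index(anchor)
    offset = base - idx if backward else idx - base
    if offset < 1:
        raise ValueError("target does not lie in that direction from the anchor")
    return offset


letters = ["q", "w", "e", "r", "t"]
second_to_last = retrieve_item(letters, 2, backward=True)            # backward, end anchor
before_e = retrieve_item(letters, 1, backward=True, anchor="e")      # backward, item anchor
pos_from_end = retrieve_position(letters, "r", backward=True)        # item -> backward position
```

Under these conventions, "second-to-last" is a backward offset of 2 from the end of the list, which is the kind of query the abstract reports as lagging behind its forward counterpart.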

Related papers