Fugu-MT Paper Translation (Abstract): Short-Context Dominance: How Much Local Context Natural Language Actually Needs?

Paper overview: Short-Context Dominance: How Much Local Context Natural Language Actually Needs?

arxiv url: http://arxiv.org/abs/2512.08082v1
Date: Mon, 08 Dec 2025 22:25:00 GMT
Status: Translation complete
Last updated in system: 2025-12-10 22:28:07.743353
Title: Short-Context Dominance: How Much Local Context Natural Language Actually Needs?
Title (reference translation): Short-Context Dominance: How Much Local Context Does Natural Language Actually Need?
Authors: Vala Vakilian, Zimeng Wang, Ankit Singh Rawat, Christos Thrampoulidis
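Abstract summary: The paper measures the minimum context length (MCL) needed to reproduce accurate full-context predictions. For sequences of 1-7k tokens drawn from long-context documents, 75-80% require at most the last 96 tokens. It also introduces Distributionally Aware MCL (DaMCL), a practical proxy for MCL that does not require knowledge of the actual next token.
Reference score (independently computed attention score): 48.429870236229696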
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the short-context dominance hypothesis: that for most sequences, a small local prefix suffices to predict their next tokens. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences with 1-7k tokens from long-context documents, we consistently find that 75-80% require only the last 96 tokens at most. Given the dominance of short-context tokens, we then ask whether it is possible to detect challenging long-context sequences for which a short local prefix does not suffice for prediction. We introduce a practical proxy to MCL, called Distributionally Aware MCL (DaMCL), that does not require knowledge of the actual next-token and is compatible with sampling strategies beyond greedy decoding. Our experiments validate that simple thresholding of the metric defining DaMCL achieves high performance in detecting long vs. short context sequences. Finally, to counter the bias that short-context dominance induces in LLM output distributions, we develop an intuitive decoding algorithm that leverages our detector to identify and boost tokens that are long-range-relevant. Across Q&A tasks and model architectures, we confirm that mitigating the bias improves performance.
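Abstract (reference translation): We examine the short-context dominance hypothesis, which holds that for most sequences a small local prefix is enough to predict the next token. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences of 1-7k tokens drawn from long-context documents, 75-80% require at most the last 96 tokens. Given this dominance of short-context tokens, we ask whether it is possible to detect the long-context sequences for which a short local prefix is not enough for prediction. We introduce Distributionally Aware MCL (DaMCL), a practical proxy for MCL that does not require knowledge of the actual next token and is compatible with sampling strategies beyond greedy decoding. Our experiments confirm that simple thresholding of the metric defining DaMCL achieves high performance in distinguishing long-context from short-context sequences. Finally, to counter the bias that short-context dominance induces in LLM output distributions, we develop an intuitive decoding algorithm that uses our detector to identify and boost long-range-relevant tokens. Across Q&A tasks and model architectures, we confirm that mitigating this bias improves performance.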
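Illustration: the abstract defines MCL only informally, so the following is a minimal sketch of one plausible reading, not the authors' procedure. It assumes that MCL is the smallest suffix length at which an oracle's greedy prediction still equals the true next token, given that the full-context prediction is already correct. The names `minimum_context_length`, `toy_oracle`, and `candidate_lengths` are hypothetical, and the toy oracle is a simple copy rule standing in for a large language model.

```python
from typing import Callable, Iterable, Optional, Sequence

Token = str
Oracle = Callable[[Sequence[Token]], Token]


def minimum_context_length(
    tokens: Sequence[Token],
    next_token: Token,
    oracle: Oracle,
    candidate_lengths: Iterable[int],
) -> Optional[int]:
    """Return the smallest suffix length that reproduces a correct prediction.

    Assumption (not from the paper): MCL is only defined when the
    full-context prediction is itself correct; otherwise return None.
    """
    if oracle(tokens) != next_token:
        return None
    for k in sorted(candidate_lengths):
        if k >= len(tokens):
            break  # longer candidates are just the full context
        if oracle(tokens[-k:]) == next_token:
            return k
    return len(tokens)  # only the full context suffices


def toy_oracle(context: Sequence[Token]) -> Token:
    """Hypothetical stand-in for an LLM: copy whatever followed the most
    recent earlier occurrence of the last token, else emit "<unk>"."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return context[i + 1]
    return "<unk>"


tokens = ["alice", "met", "bob", ".", "then", "alice", "met"]
mcl = minimum_context_length(tokens, "bob", toy_oracle, [1, 2, 4, 6])
print(mcl)  # prints 6: the suffix must reach back to the earlier "met bob"
```

In this toy case the prediction depends on a token six positions back, so short suffixes fail. Under the paper's reported finding, most real sequences would instead resolve at a small candidate length (at most 96 tokens for 75-80% of sequences).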

Related papers