Fugu-MT Paper Translation (Summary): EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Paper summary: EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

arxiv url: http://arxiv.org/abs/2605.14589v1
Date: Thu, 14 May 2026 09:00:03 GMT
Status: Translation complete
Last updated (system): 2026-05-15 21:45:34.739103
Title: EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
Authors: Han Tian, Luxuan Chen, Xinran Chen, Rui Kong, Fang Wang, Jiamin Chen, Jinman Zhao, Yuchen Li, Jiashu Zhao, Shuaiqiang Wang, Haoyi Xiong, Dawei Yin
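Abstract summary: This paper proposes EndPrompt, a method that achieves effective context extension using only short training sequences. The original short context is preserved as an intact first segment, a brief terminal prompt is appended as a second segment, and that prompt is assigned positional indices near the target context length. EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, surpassing LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23).
Reference score (independently computed attention score): 62.81677226065374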
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require constructing full-length inputs: we preserve the original short context as an intact first segment and append a brief terminal prompt as a second segment, assigning it positional indices near the target context length. This two-segment construction introduces both local and long-range relative distances within a short physical sequence while maintaining the semantic continuity of the training text--a property absent in chunk-based simulation approaches that split contiguous context. We provide a theoretical analysis grounded in Rotary Position Embedding and the Bernstein inequality, showing that position interpolation induces a rigorous smoothness constraint over the attention function, with shared Transformer parameters further suppressing unstable extrapolation to unobserved intermediate distances. Applied to LLaMA-family models extending the context window from 8K to 64K, EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, surpassing LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while requiring substantially less computation. These results demonstrate that long-context generalization can be induced from sparse positional supervision, challenging the prevailing assumption that dense long-sequence training is necessary for reliable context-window extension. The code is available at https://github.com/clx1415926/EndPrompt.
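The two-segment construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the helper names `endprompt_position_ids`, `observed_distances`, and `rope_rotate` are hypothetical, the terminal prompt is assumed to occupy the last `prompt_len` slots of the target window (the abstract only says "near the target context length"), the 64-token prompt length is an illustrative assumption, and RoPE uses the interleaved-pair convention with base 10000.

```python
import numpy as np


def endprompt_position_ids(context_len: int, prompt_len: int, target_len: int) -> np.ndarray:
    """Hypothetical helper: position IDs for one two-segment training sample.

    Segment 1 (the original short context) keeps its natural positions
    0..context_len-1. Segment 2 (the terminal prompt) is ASSUMED to occupy the
    last prompt_len slots of the target window: target_len-prompt_len..target_len-1.
    """
    if context_len + prompt_len > target_len:
        raise ValueError("both segments must fit inside the target window")
    first = np.arange(context_len)
    second = np.arange(target_len - prompt_len, target_len)
    return np.concatenate([first, second])


def observed_distances(pos_ids: np.ndarray) -> set:
    """Relative distances (query position - key position) visible under a causal mask."""
    diffs = pos_ids[:, None] - pos_ids[None, :]
    rows, cols = np.tril_indices(len(pos_ids))  # query index >= key index
    return {int(d) for d in diffs[rows, cols]}


def rope_rotate(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each pair (x[2j], x[2j+1]) by the angle pos * base**(-2j/d)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out


if __name__ == "__main__":
    # Toy sizes so the set of observed distances is easy to read.
    ids = endprompt_position_ids(context_len=8, prompt_len=2, target_len=32)
    print(ids.tolist())                      # [0, 1, 2, 3, 4, 5, 6, 7, 30, 31]
    print(sorted(observed_distances(ids)))   # 0..7 and 23..31; 8..22 are never seen

    # Illustrative 8K -> 64K setting with an assumed 64-token terminal prompt.
    big = endprompt_position_ids(8192, 64, 65536)
    print(len(big), int(big[-1] - big[0]))   # 8256 65535

    # RoPE logits depend only on relative distance, so a prompt token placed at
    # position 65535 sees the same rotation as a pair 65535 apart in a real 64K input.
    rng = np.random.default_rng(0)
    q, k = rng.standard_normal(64), rng.standard_normal(64)
    far = rope_rotate(q, 65535) @ rope_rotate(k, 0)
    shifted = rope_rotate(q, 65535 + 100) @ rope_rotate(k, 100)
    print(np.isclose(far, shifted))          # True
```

Under these assumptions, the model processes only 8,256 tokens yet observes relative distances up to 65,535. The toy case also makes visible the band of unobserved intermediate distances (8 to 22) that the abstract says shared Transformer parameters help keep stable.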
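The abstract's smoothness claim can be made concrete with the textbook form of the RoPE attention logit. This is a hedged illustration, not the paper's derivation. It assumes linear position interpolation with factor $s = L_{\text{train}}/L_{\text{target}}$, standard RoPE frequencies with base $b$, query/key channel pairs written as complex numbers $q_j, k_j$, and that the "Bernstein inequality" in question is the classical derivative bound for bounded functions of exponential type (not the concentration inequality of the same name).

```latex
% Illustrative only: RoPE logit as a function of relative distance,
% assuming linear position interpolation with factor s = L_train / L_target.
\theta_j = b^{-2j/d}, \qquad j = 0, \dots, \tfrac{d}{2} - 1, \qquad 0 < \theta_j \le \theta_0 = 1

a(\Delta) = \operatorname{Re} \sum_{j=0}^{d/2-1} h_j \, e^{\, i s \Delta \theta_j},
\qquad h_j = q_j \, \overline{k_j}, \qquad \Delta = m - n

% a is a bounded finite sum of exponentials with frequencies |s\theta_j| \le s,
% so Bernstein's inequality for functions of exponential type s gives
\sup_{\Delta \in \mathbb{R}} \lvert a'(\Delta) \rvert
  \;\le\; s \, \sup_{\Delta \in \mathbb{R}} \lvert a(\Delta) \rvert
  \;\le\; s \sum_{j} \lvert h_j \rvert

% Example: extending 8K -> 64K gives s = 8192 / 65536 = 1/8.
```

Read this way, interpolation shrinks every rotary frequency by $s$, so the logit's rate of change with respect to relative distance is bounded by $s$ times its magnitude bound; this is one plausible reading of the "smoothness constraint over the attention function" described above.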

Related papers