Fugu-MT Paper Translation (Summary): OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs

Paper summary: OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs

arxiv url: http://arxiv.org/abs/2510.07535v1
Date: Wed, 08 Oct 2025 20:50:46 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:14.727544
Title: OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs
Title (reference translation): OWL: Overcoming Window-Length Dependence in Speculative Decoding for Long Inputs
Authors: Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Gabriele Oliaro, Ye Wang, Samyam Rajbhandari
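Abstract summary: Speculative decoding promises faster inference for large language models, but existing methods fail to generalize to real-world settings. We release a new long-context benchmark (LongSpecBench) and introduce a new model (OWL).
Reference score (independently computed attention measure): 34.709771308054236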
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speculative decoding promises faster inference for large language models (LLMs), yet existing methods fail to generalize to real-world settings. Benchmarks typically assume short contexts (e.g., 2K tokens), whereas practical workloads involve long contexts. We find current approaches degrade severely with long contexts; for instance, EAGLE3 even slows down the generation speed by 0.81x. We address these limitations by releasing a new long-context benchmark (LongSpecBench) and introducing a novel model (OWL). OWL achieves about 5x higher acceptance length than EAGLE3 on long-context inputs through three innovations: (1) an LSTM-based drafter conditioned only on the last-token state, making it generalize to various lengths, (2) a special token [SPEC] in the verifier that produces richer representation for drafter, and (3) a hybrid algorithm combining both tree and non-tree decoding methods. We release all code and datasets to advance future research.
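Abstract (reference translation): Speculative decoding promises faster inference for large language models (LLMs), but existing methods fail to generalize to real-world settings. Benchmarks typically assume short contexts (e.g., 2K tokens), whereas practical workloads involve long contexts, where current approaches degrade severely; for example, EAGLE3 slows generation to 0.81x. We address these limitations by releasing a new long-context benchmark (LongSpecBench) and introducing a new model (OWL). OWL achieves about 5x higher acceptance length than EAGLE3 on long-context inputs through (1) an LSTM-based drafter conditioned only on the last-token state, which lets it generalize to various lengths, (2) a special token [SPEC] in the verifier that produces a richer representation for the drafter, and (3) a hybrid algorithm combining tree and non-tree decoding. We release all code and datasets to advance future research.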
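
The abstract compares methods by "acceptance length" without defining it. As a rough illustration only, the sketch below shows generic greedy verification in speculative decoding, not OWL's or EAGLE3's actual implementation. The function names `verify_greedy` and `mean_acceptance_length` are hypothetical, and counting the verifier's own token toward the length is one common convention that the paper may or may not follow.

```python
from typing import List, Sequence, Tuple


def verify_greedy(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Return the tokens emitted by one speculative step (greedy verification).

    draft:  k token ids proposed by the drafter.
    target: k + 1 token ids the verifier would choose greedily, where
            target[i] is its choice after seeing the prefix plus draft[:i].
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")

    emitted: List[int] = []
    for drafted, wanted in zip(draft, target):
        if drafted != wanted:
            # First mismatch: keep the verifier's token and stop.
            emitted.append(wanted)
            return emitted
        emitted.append(drafted)

    # Every drafted token matched, so the verifier's extra token is free.
    emitted.append(target[-1])
    return emitted


def mean_acceptance_length(steps: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average number of tokens emitted per verifier call (assumed convention)."""
    lengths = [len(verify_greedy(draft, target)) for draft, target in steps]
    return sum(lengths) / len(lengths)
```

Under this convention, a drafter whose proposals are rejected immediately still yields one token per verifier call, so a speedup only appears when the average emitted length is large enough to outweigh the drafter's own cost.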

Related papers