Fugu-MT Paper Translation (Summary): FastEagle: Cascaded Drafting for Accelerating Speculative Decoding

Paper summary: FastEagle: Cascaded Drafting for Accelerating Speculative Decoding

arxiv url: http://arxiv.org/abs/2509.20416v1
Date: Wed, 24 Sep 2025 09:38:32 GMT
Status: Translation complete
Last updated in system: 2025-09-26 20:58:12.517695
Title: FastEagle: Cascaded Drafting for Accelerating Speculative Decoding
Authors: Haiduo Huang, Jiangcheng Song, Wenzhe Zhao, Pengju Ren
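Abstract summary: We introduce FastEagle, a non-autoregressive cascaded drafter that emits an entire draft in a single forward pass. FastEagle delivers substantial wall-clock speedups over strong autoregressive drafters while maintaining competitive acceptance behavior.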
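
To make the contrast between N sequential drafter passes and a single cascaded pass concrete, here is a minimal toy sketch. It is not the authors' architecture: hidden states and tokens are plain integers, and `head`, `make_cascade`, the arithmetic inside each "layer", and the returned call counts are all hypothetical stand-ins. The only point illustrated is structural: the autoregressive drafter feeds each new token back in and is invoked once per token, while the cascade reads one draft token off each layer of a single forward pass.

```python
from typing import Callable, List, Tuple

# Toy stand-ins (hypothetical): a "hidden state" is an int and a "token"
# is an int in [0, VOCAB). Real drafters operate on tensors.
VOCAB = 10


def head(h: int) -> int:
    """Hypothetical shared LM head mapping a hidden state to a token id."""
    return h % VOCAB


def autoregressive_draft(h: int, last_token: int, n: int) -> Tuple[List[int], int]:
    """EAGLE-style drafting (toy): each drafted token is fed back as input,
    so proposing n tokens requires n sequential drafter calls."""
    tokens, calls = [], 0
    for _ in range(n):
        h = 3 * h + last_token      # one toy drafter forward pass
        last_token = head(h)        # token fed into the next call
        tokens.append(last_token)
        calls += 1
    return tokens, calls


def make_cascade(n: int) -> List[Callable[[int], int]]:
    """Hypothetical stack of n lightweight layers; layer i produces the
    hidden state for draft position i."""
    return [lambda h, i=i: 3 * h + i + 1 for i in range(n)]


def cascaded_draft(h: int, layers: List[Callable[[int], int]]) -> Tuple[List[int], int]:
    """Cascaded drafting (toy): one forward pass through the layer stack
    yields the whole draft; no drafted token is fed back in. During
    training, each layer's read-out would get its own loss (layer-wise
    supervision), which is not modeled here."""
    tokens = []
    for layer in layers:
        h = layer(h)
        tokens.append(head(h))      # per-layer read-out of one draft token
    return tokens, 1                # the stack counts as one drafter call
```

In a real network the loop over layers runs inside one forward call of a small model, so the saving comes from avoiding repeated drafter invocations with new token inputs, not from removing the layer loop itself.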
Reference score (attention score computed by the site): 6.482154864678126
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speculative decoding accelerates generation by drafting candidates and verifying them in parallel, yet state-of-the-art drafters (e.g., EAGLE) still require N sequential passes to propose N tokens. We present FastEagle, a non-autoregressive cascaded drafter that emits an entire draft in a single forward pass. FastEagle replaces temporal steps with a lightweight layer cascade and trains with layer-wise supervision to mitigate error accumulation. Coupled with a constrained draft tree that preserves lossless verification cost, FastEagle delivers substantial wall-clock speedups over strong autoregressive drafters while maintaining competitive acceptance behavior. Across multiple LLMs (Vicuna-13B, LLaMA-Instruct 3.x, and DeepSeek-R1-Distill-LLaMA) and tasks (MT-Bench, HumanEval, GSM8K, CNN/DM, Alpaca), FastEagle consistently outperforms EAGLE-3 in speedup under both greedy and stochastic decoding, with comparable average acceptance lengths. These results indicate that removing sequential dependencies in drafting is a practical path toward lossless LLM inference acceleration.
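The "lossless verification" in the abstract means that the target model checks the whole draft in one parallel pass and only keeps tokens it would have produced itself. Below is a minimal sketch of greedy verification for a simple draft chain. It makes two assumptions: the paper's constrained draft tree is replaced by a single chain, and `target_argmax` is a hypothetical stand-in for one parallel forward pass of the target model that returns its greedy next-token choice at every position.

```python
from typing import Callable, List, Sequence


def greedy_verify(
    target_argmax: Callable[[Sequence[int]], List[int]],
    prefix: List[int],
    draft: List[int],
) -> List[int]:
    """Return the tokens to append after verifying `draft` greedily.

    target_argmax(seq)[j] is the target's greedy next token after seq[:j+1]
    (hypothetical stand-in for ONE parallel target forward pass).
    `prefix` must be non-empty.
    """
    assert prefix, "prefix must contain at least one token"
    preds = target_argmax(prefix + draft)   # single parallel verification pass
    base = len(prefix) - 1                  # index of the prediction for draft[0]
    out: List[int] = []
    for i, tok in enumerate(draft):
        want = preds[base + i]
        if tok != want:
            out.append(want)                # first mismatch: take target's token, stop
            return out
        out.append(tok)                     # draft token accepted
    out.append(preds[base + len(draft)])    # every draft token accepted: bonus token
    return out
```

Because every appended token equals the target's own greedy choice, the output matches plain greedy decoding of the target model. In this sketch, `len(out) - 1` is the number of accepted draft tokens for one verification pass.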

