Fugu-MT Paper Translation (Abstract): Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

Paper overview: Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

arxiv url: http://arxiv.org/abs/2509.22134v1
Date: Fri, 26 Sep 2025 09:55:35 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.35136
Title: Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Title (reference translation): Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Authors: Shijing Hu, Jingyang Li, Zhihui Lu, Pan Zhou
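Abstract summary: We introduce Group Tree Optimization (GTO), which aligns training with the decoding-time tree policy. We show that increasing the Draft Tree Reward provably improves acceptance length and speedup. GTO offers a practical, general solution for efficient large language model inference.
Reference score (independently computed attention score): 24.681973968208364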
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speculative decoding accelerates large language model (LLM) inference by letting a lightweight draft model propose multiple tokens that the target model verifies in parallel. Yet existing training objectives optimize only a single greedy draft path, while decoding follows a tree policy that re-ranks and verifies multiple branches. This draft policy misalignment limits achievable speedups. We introduce Group Tree Optimization (GTO), which aligns training with the decoding-time tree policy through two components: (i) Draft Tree Reward, a sampling-free objective equal to the expected acceptance length of the draft tree under the target model, directly measuring decoding performance; (ii) Group-based Draft Policy Training, a stable optimization scheme that contrasts trees from the current and a frozen reference draft model, forming debiased group-standardized advantages and applying a PPO-style surrogate along the longest accepted sequence for robust updates. We further prove that increasing our Draft Tree Reward provably improves acceptance length and speedup. Across dialogue (MT-Bench), code (HumanEval), and math (GSM8K), and multiple LLMs (e.g., LLaMA-3.1-8B, LLaMA-3.3-70B, Vicuna-1.3-13B, DeepSeek-R1-Distill-LLaMA-8B), GTO increases acceptance length by 7.4% and yields an additional 7.7% speedup over prior state-of-the-art EAGLE-3. By bridging draft policy misalignment, GTO offers a practical, general solution for efficient LLM inference.
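Abstract (reference translation): Speculative decoding accelerates large language model (LLM) inference by having a lightweight draft model propose multiple tokens that the target model verifies in parallel. However, existing training objectives optimize only a single greedy draft path, whereas decoding follows a tree policy that re-ranks and verifies multiple branches. This draft policy misalignment limits the achievable speedup. We introduce Group Tree Optimization (GTO), which aligns training with the decoding-time tree policy through two components: (i) Draft Tree Reward, a sampling-free objective equal to the expected acceptance length of the draft tree under the target model; (ii) Group-based Draft Policy Training, a stable optimization scheme that contrasts trees from the current draft model with those from a frozen reference draft model, forms debiased group-standardized advantages, and applies a PPO-style surrogate along the longest accepted sequence for robust updates. We further show that increasing the Draft Tree Reward provably improves acceptance length and speedup. Across dialogue (MT-Bench), code (HumanEval), and math (GSM8K), and across multiple LLMs (e.g., LLaMA-3.1-8B, LLaMA-3.3-70B, Vicuna-1.3-13B, DeepSeek-R1-Distill-LLaMA-8B), GTO increases acceptance length by 7.4% and yields an additional 7.7% speedup over the prior state-of-the-art EAGLE-3. By bridging draft policy misalignment, GTO offers a practical, general solution for efficient LLM inference.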
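
The two components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the code assumes that a drafted token is accepted with the target model's probability of that token given its path, which makes the expected acceptance length the sum of path probabilities over all tree nodes. The names `DraftNode`, `target_prob`, `expected_acceptance_length`, `group_standardized_advantages`, and `clipped_surrogate` are hypothetical, and the contrast with a frozen reference draft model and the debiasing step are omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class DraftNode:
    """One drafted token in the tree (hypothetical structure).

    target_prob is assumed to be the target model's probability of this
    token given the tokens on the path from the root.
    """
    target_prob: float
    children: list[DraftNode] = field(default_factory=list)


def expected_acceptance_length(nodes, prefix_prob=1.0):
    """Sum, over every node, the probability that its whole path is accepted."""
    total = 0.0
    for node in nodes:
        path_prob = prefix_prob * node.target_prob
        total += path_prob + expected_acceptance_length(node.children, path_prob)
    return total


def group_standardized_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip=0.2):
    """Standard PPO clipped objective for a single token (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


# Toy tree: two first-level candidates, one of which has a child.
tree = [
    DraftNode(0.6, [DraftNode(0.5)]),
    DraftNode(0.3),
]
reward = expected_acceptance_length(tree)  # 0.6 + 0.6 * 0.5 + 0.3
advantages = group_standardized_advantages([1.0, 3.0])
```

Under these assumptions the reward needs no sampling, because it is computed directly from the target model's probabilities on the tree; in an actual training loop the probability ratio and advantages would come from the draft model's log-probabilities rather than from hand-set numbers.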

Related papers