Fugu-MT Paper Translation (Abstract): Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

Paper overview: Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

arxiv url: http://arxiv.org/abs/2511.22972v1
Date: Fri, 28 Nov 2025 08:23:30 GMT
Status: Translation complete
Last updated in system: 2025-12-01 19:47:55.818345
Title: Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
Title (reference translation): Training-free speculative decoding: accepting correct drafts beyond strict matching
Authors: Jinze Li, Yixing Xu, Guanchen Li, Shuo Yang, Jinfeng Xu, Xuanwu Yin, Dong Li, Edith C. H. Ngai, Emad Barsoum
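Abstract summary: Training-Free Loosely Speculative Decoding (FLy) is a new method that loosens the strict exact-match verification criterion of speculative decoding. FLy preserves more than 99% of the target model's accuracy while achieving an average 2.81x speedup on Llama-3.1-70B-Instruct.
Reference score (independently computed attention score): 21.810129153556044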
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) achieve strong performance across diverse tasks but suffer from high inference latency due to their autoregressive generation. Speculative Decoding (SPD) mitigates this issue by verifying candidate tokens in parallel from a smaller draft model, yet its strict exact-match verification discards many semantically valid continuations. Moreover, existing training-based SPD methods often suffer from performance degradation on out-of-distribution (OOD) tasks. To this end, we propose Training-Free Loosely Speculative Decoding (FLy), a novel method that loosens the rigid verification criterion by leveraging the target model's self-corrective behavior to judge whether a draft-target mismatch remains semantically valid. FLy introduces a two-tier mechanism: an entropy-level gate that identifies whether the current token allows multiple plausible alternatives or is nearly deterministic, and a token-level deferred window that distinguishes genuine errors from differently worded yet semantically correct variants. To further reduce latency, we design a multi-level acceleration strategy that accelerates not only the target model but also the drafter itself. Owing to its training-free design, FLy composes seamlessly with arbitrary draft-target pairs and generalizes across models and domains without hyperparameter re-tuning. Experiments show that FLy preserves more than 99% of the target model's accuracy while achieving an average 2.81x speedup on Llama-3.1-70B-Instruct and 5.07x speedup on the 405B variant. Notably, on out-of-domain datasets, our method remains highly effective and outperforms the training-based method EAGLE-3 by 1.62x.
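Abstract (reference translation): Large language models (LLMs) achieve strong performance across a wide variety of tasks, but suffer from high inference latency because of autoregressive generation. Speculative decoding (SPD) mitigates this by verifying candidate tokens from a smaller draft model in parallel, but its strict exact-match verification discards many semantically valid continuations. In addition, existing training-based SPD methods often degrade on out-of-distribution (OOD) tasks. This work therefore proposes FLy (Training-Free Loosely Speculative Decoding), a method that loosens the strict verification criterion by using the target model's self-corrective behavior to judge whether a draft-target mismatch is still semantically valid. FLy introduces a two-tier mechanism: an entropy-level gate identifies whether the current token admits multiple plausible alternatives or is nearly deterministic, and a token-level deferred window distinguishes genuine errors from differently worded but semantically correct variants. To reduce latency further, a multi-level acceleration strategy speeds up not only the target model but also the drafter itself. Because it is training-free, FLy composes seamlessly with arbitrary draft-target pairs and generalizes across models and domains without hyperparameter re-tuning. Experiments show that FLy preserves more than 99% of the target model's accuracy while achieving an average 2.81x speedup on Llama-3.1-70B-Instruct and a 5.07x speedup on the 405B variant. Notably, on out-of-domain datasets the method remains effective and outperforms the training-based EAGLE-3 by 1.62x.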
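
The two-tier idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the exact rule. The function names, the entropy threshold of 0.5 nats, the window of 2 tokens, and the choice to reject a mismatch when the window runs past the end of the draft are all assumptions made for illustration. The sketch also assumes greedy decoding and that `target_probs[i]` is the target model's distribution at position `i`, conditioned on the draft tokens before it.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def argmax(probs):
    """Index of the most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)

def strict_accept_len(draft, target_probs):
    """Exact-match baseline: accept draft tokens up to the first mismatch."""
    for i, tok in enumerate(draft):
        if tok != argmax(target_probs[i]):
            return i
    return len(draft)

def loose_accept_len(draft, target_probs, entropy_threshold=0.5, window=2):
    """Hypothetical loosened rule (assumed, not taken from the paper).

    A mismatch is tolerated only if
      1. the target distribution at that position has high entropy
         (several plausible alternatives), and
      2. the target agrees with the draft on the next `window` tokens,
         i.e. it does not try to correct course after the mismatch.
    """
    i, n = 0, len(draft)
    while i < n:
        if draft[i] == argmax(target_probs[i]):
            i += 1
            continue
        # Tier 1, entropy gate: a near-deterministic position means a real error.
        if entropy(target_probs[i]) < entropy_threshold:
            return i
        # Tier 2, deferred window: need `window` further tokens to confirm.
        end = i + 1 + window
        if end > n:
            return i  # assumption: an unconfirmed mismatch is rejected
        if any(draft[j] != argmax(target_probs[j]) for j in range(i + 1, end)):
            return i
        i = end  # mismatch confirmed as acceptable; continue after the window
    return n

# Toy example with a 3-token vocabulary.
draft = [0, 1, 2, 0]
target_probs = [
    [0.90, 0.05, 0.05],  # matches draft token 0
    [0.45, 0.40, 0.15],  # mismatch (draft says 1), but high entropy
    [0.05, 0.05, 0.90],  # matches draft token 2
    [0.90, 0.05, 0.05],  # matches draft token 0
]
strict_len = strict_accept_len(draft, target_probs)
loose_len = loose_accept_len(draft, target_probs)
```

In this toy case the exact-match rule stops at the first mismatch, while the loosened rule keeps the whole draft because the mismatched position is uncertain and the following tokens agree. Replacing the second distribution with a sharply peaked one makes both rules stop at the same position.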


Related papers