Fugu-MT Paper Translation (Abstract): DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding

Paper overview: DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding

arxiv url: http://arxiv.org/abs/2510.02358v1
Date: Sun, 28 Sep 2025 07:00:15 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.038923
Title: DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
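Title (reference translation): DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
Authors: Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, Jun Wang
Abstract summary: DiffuSpec is a training-free drop-in framework that uses a pretrained diffusion language model (DLM) to produce multi-token drafts in a single forward pass. Across benchmarks, DiffuSpec achieves up to 3x wall-clock speedup, establishing diffusion-based drafting as a robust alternative to autoregressive drafters for speculative decoding.
Reference score (independently computed attention score): 66.40658898418316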
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: As large language models (LLMs) scale up, accuracy improves, but the autoregressive (AR) nature of decoding increases latency since each token requires a serial forward pass. Speculative decoding addresses this by employing a fast drafter to propose multi-token drafts, which are then verified in parallel by the target model. However, many deployments still rely on AR drafters, where sequential passes limit wall-clock gains. We revisit the drafting stage and present DiffuSpec, a training-free drop-in framework that uses a pretrained diffusion language model (DLM) to produce multi-token drafts in a single forward pass, while remaining compatible with standard AR verifiers. Because DLM drafts are generated under bidirectional conditioning, parallel per-position candidates form a token lattice in which the locally highest-probability token at each position need not form a causal left-to-right path. Moreover, DLM drafting requires pre-specifying a draft length, inducing a speed-quality trade-off. To address these challenges, we introduce two practical components: (i) a causal-consistency path search (CPS) over this lattice that extracts a left-to-right path aligned with AR verification; and (ii) an adaptive draft-length (ADL) controller that adjusts next proposal size based on recent acceptance feedback and realized generated length. Across benchmarks, DiffuSpec yields up to 3x wall-clock speedup, establishing diffusion-based drafting as a robust alternative to autoregressive drafters for speculative decoding.
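Abstract (reference translation): As large language models (LLMs) scale up, accuracy improves, but the autoregressive (AR) nature of decoding increases latency because each token requires a serial forward pass. Speculative decoding addresses this by using a fast drafter to propose multi-token drafts that the target model then verifies in parallel. However, many deployments still rely on AR drafters, whose sequential passes limit wall-clock gains. DiffuSpec is a training-free drop-in framework that uses a pretrained diffusion language model (DLM) to generate multi-token drafts in a single forward pass while remaining compatible with standard AR verifiers. Because DLM drafts are generated under bidirectional conditioning, the parallel per-position candidates form a token lattice in which the locally highest-probability token at each position need not lie on a causal left-to-right path. In addition, DLM drafting requires the draft length to be specified in advance, which induces a speed-quality trade-off. To address these challenges, we introduce two practical components: (i) a causal-consistency path search (CPS) over this lattice that extracts a left-to-right path aligned with AR verification; and (ii) an adaptive draft-length (ADL) controller that adjusts the next proposal size based on recent acceptance feedback and the realized generated length. Across benchmarks, DiffuSpec achieves up to 3x wall-clock speedup, establishing diffusion-based drafting as a robust alternative to autoregressive drafters for speculative decoding.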
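The abstract names two components but gives no algorithmic detail, so the following is only an illustrative Python sketch of the general ideas, not the paper's method. It assumes a beam search as the path search, a toy bigram table standing in for a causal scoring proxy, greedy prefix matching as the verification rule, and an exponential moving average of accepted lengths as the draft-length controller. It uses only acceptance feedback, not the realized generated length the abstract also mentions. Every name and parameter here (`causal_path_search`, `toy_causal_logp`, `AdaptiveDraftLength`, `beam_width`, `slack`, and so on) is hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Per position: candidate token -> drafter log-probability (hypothetical toy values).
Lattice = List[Dict[str, float]]


def causal_path_search(
    lattice: Lattice,
    causal_logp: Callable[[Sequence[str], str], float],
    beam_width: int = 4,
    weight: float = 1.0,
) -> List[str]:
    """Beam search for a left-to-right path through per-position candidates.

    Path score = sum of drafter log-probs + weight * causal proxy log-probs.
    This scoring rule is an assumption made for illustration.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for candidates in lattice:
        expanded = [
            (score + logp + weight * causal_logp(path, tok), path + [tok])
            for score, path in beams
            for tok, logp in candidates.items()
        ]
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][1]


def accepted_prefix_length(draft: Sequence[str], target_greedy: Sequence[str]) -> int:
    """Greedy verification: count leading draft tokens that match the target's tokens."""
    n = 0
    for d, t in zip(draft, target_greedy):
        if d != t:
            break
        n += 1
    return n


class AdaptiveDraftLength:
    """Toy controller: propose a bit more than the recent average accepted length."""

    def __init__(self, k_min: int = 2, k_max: int = 16, k_init: int = 8,
                 decay: float = 0.5, slack: int = 2) -> None:
        self.k_min, self.k_max = k_min, k_max
        self.decay, self.slack = decay, slack
        self.ema = float(k_init)  # running average of accepted tokens per round
        self.k = k_init

    def update(self, accepted: int) -> int:
        """Fold in the latest accepted length and return the next draft length."""
        self.ema = self.decay * self.ema + (1.0 - self.decay) * accepted
        self.k = max(self.k_min, min(self.k_max, math.ceil(self.ema) + self.slack))
        return self.k


# Toy data: the per-position argmax is ["an", "cat"], which is not a coherent path.
LATTICE: Lattice = [{"an": -0.4, "a": -1.1}, {"cat": -0.3, "apple": -1.4}]
BIGRAM = {("an", "apple"): -0.1, ("a", "cat"): -0.1}


def toy_causal_logp(prefix: Sequence[str], tok: str) -> float:
    """Stand-in for a causal scorer: a bigram lookup with a flat penalty otherwise."""
    if not prefix:
        return 0.0
    return BIGRAM.get((prefix[-1], tok), -5.0)


path = causal_path_search(LATTICE, toy_causal_logp, beam_width=2)
# path is ["a", "cat"] for this toy input, unlike the per-position argmax.
```

In a draft-and-verify loop, the value returned by `AdaptiveDraftLength.update` would set the number of positions requested from the drafter in the next round; how DiffuSpec actually makes that choice is described in the paper, not here.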


Related papers