Fugu-MT Paper Translation (Abstract): Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding

Paper overview: Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding

arxiv url: http://arxiv.org/abs/2509.18085v1
Date: Mon, 22 Sep 2025 17:58:21 GMT
Status: Translation complete
Last updated in system: 2025-09-23 18:58:16.555729
Title: Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Title (reference translation): Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Authors: Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Mingu Lee, Christopher Lott, Fatih Porikli
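Abstract summary: Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs). However, currently available open-source dLLMs often generate tokens at much lower rates than dLLMs could in principle reach. This paper presents Spiffy, a speculative decoding algorithm that accelerates dLLM inference by $\mathbf{2.8{-}3.1\times}$ while provably preserving the model's output distribution.
Reference score (independently computed attention metric): 40.96405124314983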
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token generation rates. However, currently available open-source dLLMs often generate at much lower rates, typically decoding only a single token at every denoising timestep in order to maximize output quality. We present Spiffy, a speculative decoding algorithm that accelerates dLLM inference by $\mathbf{2.8{-}3.1\times}$ while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to the dLLM setting. Spiffy proposes draft states by leveraging the dLLM's distribution itself in an auto-speculative manner. This approach is efficient and effective, and eliminates the overheads of training and running an independent draft model. To structure the candidate draft states, we propose a novel directed draft graph which is uniquely designed to take advantage of the bidirectional, block-wise nature of dLLM generation and can be verified in parallel by the dLLM. To further optimize the structure of these draft graphs, we introduce an efficient, offline calibration algorithm that procedurally determines high-quality graph configurations. These optimized draft graphs, enabling increased acceptance rates, lead to a significant boost in the overall speedup achieved by the system. Crucially, Spiffy is also complementary to other recent innovations in improving dLLM generation speeds such as KV-caching and multi-token unmasking. We demonstrate that when combined with such parallel decoding algorithms, Spiffy is able to effectively multiply the benefits of these methods leading to total speedups of up to $\mathbf{7.9\times}$.
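Abstract (reference translation): Diffusion LLMs (dLLMs) have recently appeared as a strong alternative to autoregressive LLMs (AR-LLMs), with the potential for markedly higher token generation rates. Currently available open-source dLLMs, however, often generate far more slowly, usually decoding only one token per denoising timestep in order to maximize output quality. We propose Spiffy, a speculative decoding algorithm that speeds up dLLM inference by $\mathbf{2.8{-}3.1\times}$ while provably preserving the model's output distribution. This work addresses the unique challenges of carrying ideas from speculative decoding of AR-LLMs over to the dLLM setting. Spiffy proposes draft states by using the dLLM's own distribution in an auto-speculative manner. The approach is efficient and effective, and it removes the overhead of training and running a separate draft model. To structure the candidate draft states, we propose a novel directed draft graph that exploits the bidirectional, block-wise nature of dLLM generation and can be verified in parallel by the dLLM. To further optimize the structure of these draft graphs, we introduce an efficient offline calibration algorithm that procedurally determines high-quality graph configurations. The optimized draft graphs raise acceptance rates and thereby significantly increase the overall speedup of the system. Importantly, Spiffy also complements other recent innovations for improving dLLM generation speed, such as KV-caching and multi-token unmasking. We show that, when combined with such parallel decoding algorithms, Spiffy effectively multiplies their benefits, giving total speedups of up to $\mathbf{7.9\times}$.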
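
Illustrative sketch (not from the paper): the auto-speculative idea described in the abstract, where the model proposes its own draft states and then verifies them in one parallel pass, can be shown on a toy example. Everything below is an assumption made for illustration: `toy_model` is a hypothetical deterministic stand-in for a dLLM forward pass, decoding is greedy (one most-confident token per step), and the drafts form a simple linear chain rather than the paper's directed draft graph. No offline calibration is modeled. The sketch only shows why accepting a draft exactly when it matches what one-token-per-step decoding would have produced leaves the output unchanged while using fewer passes.

```python
MASK = -1    # marker for a position that is still masked
VOCAB = 10   # toy vocabulary size (assumption)


def toy_model(state):
    """Hypothetical stand-in for one dLLM forward pass.

    Returns {position: (token, confidence)} for every masked position.
    The predicted token depends on the left neighbour once it is visible,
    so predictions made early can become stale.
    """
    preds = {}
    for i, tok in enumerate(state):
        if tok != MASK:
            continue
        left_is_odd = i > 0 and state[i - 1] != MASK and state[i - 1] % 2 == 1
        token = (3 * i + int(left_is_odd)) % VOCAB
        preds[i] = (token, -i)  # toy rule: earlier positions are more confident
    return preds


def ranked(preds):
    """Masked positions and their predictions, most confident first."""
    return sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)


def unmask(state, pos, token):
    new_state = list(state)
    new_state[pos] = token
    return tuple(new_state)


def baseline_decode(state):
    """One forward pass per unmasked token."""
    state, passes = tuple(state), 0
    while MASK in state:
        preds = toy_model(state)
        passes += 1
        pos, (tok, _) = ranked(preds)[0]
        state = unmask(state, pos, tok)
    return state, passes


def speculative_decode(state, max_drafts=4):
    """Self-drafted chain of states, verified in one batched pass per round."""
    state = tuple(state)
    if MASK not in state:
        return state, 0
    preds, passes = toy_model(state), 1
    while True:
        # Draft: unmask the top-k positions one after another, all using
        # the predictions from the current state.
        drafts, cur = [], state
        for pos, (tok, _) in ranked(preds)[:max_drafts]:
            cur = unmask(cur, pos, tok)
            drafts.append(cur)
        if MASK not in drafts[0]:
            return drafts[0], passes  # first draft is always exact
        # Verify: evaluate every draft state; counted as ONE batched pass.
        outs = [toy_model(d) for d in drafts]
        passes += 1
        accepted = 0
        while accepted + 1 < len(drafts):
            pos, (tok, _) = ranked(outs[accepted])[0]
            if unmask(drafts[accepted], pos, tok) != drafts[accepted + 1]:
                break  # draft diverges from what baseline decoding would do
            accepted += 1
        # Reuse the verification output of the last accepted draft.
        state, preds = drafts[accepted], outs[accepted]
        if MASK not in state:
            return state, passes


if __name__ == "__main__":
    start = [MASK] * 8
    print("baseline   :", baseline_decode(start))
    print("speculative:", speculative_decode(start, max_drafts=4))
```

In this toy setting both decoders return the same sequence, and the speculative one needs fewer batched passes because each verification pass can confirm several drafted steps at once. The pass counts are properties of the toy model only and say nothing about the speedups reported in the abstract.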

Related papers