Fugu-MT Paper Translation (Abstract): S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

Paper overview: S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

arxiv url: http://arxiv.org/abs/2603.25702v1
Date: Thu, 26 Mar 2026 17:48:50 GMT
Status: translation complete
Last updated in system: 2026-03-27 20:52:48.411012
Title: S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
Authors: Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava
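Abstract summary: Block-diffusion language models combine block-wise autoregressive decoding with within-block parallel denoising. Existing approaches to the brittleness of confidence-thresholded decoding in the few-step regime either require additional training or incur extra test-time compute. The paper presents S2D2, a training-free self-speculative decoding framework for block-diffusion language models.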
Reference score (independently computed attention score): 22.303253139413286
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion decoding and uses lightweight routing policies to decide when verification is worth its cost. This yields a hybrid decoding trajectory in which diffusion proposes tokens in parallel, while the autoregressive mode acts as a local sequence-level critic. Across three mainstream block-diffusion families, S2D2 consistently improves the accuracy-speed tradeoff over strong confidence-thresholding baselines. On SDAR, we observe up to $4.7\times$ speedup over autoregressive decoding, and up to $1.57\times$ over a tuned dynamic decoding baseline while improving accuracy by up to $4.5$ points. On LLaDA2.1-Mini, S2D2 remains complementary to built-in self-correction, including a conservative setting where it is $4.4\times$ faster than the static baseline with slightly higher accuracy.
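The draft-then-verify idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not specify the acceptance rule or the routing policies, so the greedy prefix-matching rule, the names `verify_draft` and `should_verify`, the 0.9 threshold, and the toy verifier are all assumptions made for illustration. A real verifier would likely score every drafted position in one forward pass; the loop here is only for clarity.

```python
from typing import Callable, List, Sequence

Token = int


def verify_draft(
    prefix: Sequence[Token],
    draft: Sequence[Token],
    ar_next_token: Callable[[Sequence[Token]], Token],
) -> List[Token]:
    """Greedy speculative verification (a generic rule, assumed here).

    Accept drafted tokens left to right while the autoregressive verifier
    would have picked the same token. At the first disagreement, keep the
    verifier's token instead and discard the rest of the draft.
    """
    accepted: List[Token] = []
    for drafted in draft:
        expected = ar_next_token(list(prefix) + accepted)
        accepted.append(expected)
        if expected != drafted:
            break  # later drafted tokens were conditioned on a rejected one
    return accepted


def should_verify(confidences: Sequence[float], threshold: float = 0.9) -> bool:
    """Hypothetical routing rule: verify only if some drafted token is low-confidence."""
    return min(confidences) < threshold


def toy_ar_next_token(context: Sequence[Token]) -> Token:
    """Toy stand-in for the block-size-one (autoregressive) mode: previous token + 1."""
    return context[-1] + 1


# The draft agrees with the verifier on 3 and 4, then diverges at 9.
result = verify_draft([1, 2], [3, 4, 9, 10], toy_ar_next_token)
print(result)
```

In this toy run the first two drafted tokens are accepted, the third is replaced by the verifier's choice, and the fourth is dropped, so one verification step still yields three committed tokens.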
