Fugu-MT Paper Translation (Abstract): Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

Paper overview: Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

arxiv url: http://arxiv.org/abs/2510.00294v1
Date: Tue, 30 Sep 2025 21:28:04 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.263783
Title: Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models
Title (reference translation): Free Draft-and-Verification: Toward Lossless Parallel Decoding in Diffusion Large Language Models
Authors: Shutong Wu, Jiawei Zhang
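Abstract summary: Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive prediction. Free Draft-and-Verification (Freedave) is a novel fast sampling algorithm tailored for DLLMs.
Reference score (independently computed attention score): 8.407364705777587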
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Thanks to their bidirectional attention mechanism, DLLMs are more capable of capturing the connection of context, and thus show unique advantages in challenges like the famous "reversal curse" or learning under data-constrained scenarios. However, this bidirectional nature also brings an obstacle that DLLMs are not inherently compatible with KV Cache, and consequently, the inference efficiency is not competitive compared with autoregressive models. Taking advantage of their inherent capability of multi-token prediction, existing parallel decoding algorithms can speed up the DLLM inference, but at the cost of non-negligible performance degradation. To overcome this challenge, we introduce Free Draft-and-Verification (Freedave), a novel fast sampling algorithm tailored for DLLMs that achieves lossless parallel decoding. Specifically, we propose a pipeline of parallel-decoded candidate generation and verification, which is guaranteed to reproduce the same sequence generated by static sampling, without introducing extra model forward calls. By applying Freedave, the throughput of DLLMs can be boosted up to $2.8\times$ without performance degradation on math reasoning tasks.
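The abstract's claim that candidate generation plus verification can reproduce the static-sampling sequence can be illustrated with a toy sketch. This is not the paper's implementation: `toy_model`, the one-token-per-step confidence-ranked static schedule, the `depth` parameter, and counting one batched pass over all drafts as a single forward call are all assumptions made for illustration only.

```python
MASK = -1
VOCAB = 10

def toy_model(seq):
    """Hypothetical stand-in for a DLLM forward pass.
    Returns {masked position: (predicted token, confidence)}."""
    preds = {}
    for i, tok in enumerate(seq):
        if tok != MASK:
            continue
        # Token depends on the left neighbour once it is unmasked (toy context effect).
        left = seq[i - 1] if i > 0 and seq[i - 1] != MASK else 0
        preds[i] = ((3 * i + left) % VOCAB, (7 * i) % 11)
    return preds

def ranked(preds):
    # Highest confidence first; ties broken by lowest position.
    return sorted(preds, key=lambda p: (-preds[p][1], p))

def static_step(seq, preds):
    # Assumed static schedule: unmask the single most confident position.
    pos = ranked(preds)[0]
    out = list(seq)
    out[pos] = preds[pos][0]
    return out

def static_decode(seq):
    seq, calls = list(seq), 0
    while MASK in seq:
        seq = static_step(seq, toy_model(seq))
        calls += 1
    return seq, calls

def draft_and_verify_decode(seq, depth=3):
    seq = list(seq)
    if MASK not in seq:
        return seq, 0
    preds, calls = toy_model(seq), 1
    while MASK in seq:
        # Draft: guess the next `depth` static states from the current predictions.
        drafts, cur = [], seq
        for pos in ranked(preds)[:depth]:
            cur = list(cur)
            cur[pos] = preds[pos][0]
            drafts.append(cur)
        if MASK not in drafts[0]:
            # drafts[0] is exactly the next static state, so decoding is done.
            return drafts[0], calls
        # One batched pass over all drafts (counted as one call, by assumption).
        batch = [toy_model(d) for d in drafts]
        calls += 1
        # Verify: accept draft k+1 only if a static step from draft k reproduces it.
        k = 0
        while k + 1 < len(drafts) and static_step(drafts[k], batch[k]) == drafts[k + 1]:
            k += 1
        # The predictions for the accepted draft are reused, so nothing is recomputed.
        seq, preds = drafts[k], batch[k]
    return seq, calls

slow, slow_calls = static_decode([MASK] * 6)
fast, fast_calls = draft_and_verify_decode([MASK] * 6, depth=3)
assert fast == slow
```

Because the first draft always equals the next static state and every later draft is accepted only after a static step from a verified state reproduces it, the accepted sequence never leaves the static trajectory. In this toy setting the number of batched calls is at most the number of static calls.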
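Abstract (reference translation): Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Thanks to their bidirectional attention mechanism, DLLMs capture contextual connections more readily and show unique advantages on challenges such as the well-known "reversal curse" and learning under data-constrained scenarios. However, this bidirectional nature also means that DLLMs are not inherently compatible with KV Cache, so their inference efficiency is not competitive with that of autoregressive models. Existing parallel decoding algorithms exploit the models' inherent multi-token prediction capability to speed up DLLM inference, but at the cost of non-negligible performance degradation. To overcome this challenge, we introduce Free Draft-and-Verification (Freedave), a novel fast sampling algorithm tailored for DLLMs that achieves lossless parallel decoding. Specifically, we propose a pipeline of parallel-decoded candidate generation and verification that is guaranteed to reproduce the same sequence generated by static sampling without introducing extra model forward calls. By applying Freedave, the throughput of DLLMs can be boosted up to $2.8\times$.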

Related papers