Fugu-MT Paper Translation (Abstract): CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits

Paper overview: CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits

arxiv url: http://arxiv.org/abs/2510.06133v1
Date: Tue, 07 Oct 2025 17:08:33 GMT
Status: Translation complete
Last updated in system: 2025-10-08 17:57:08.372927
Title: CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits
Title (reference translation): CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits
Authors: Kangyu Wang, Zhiyun Jiang, Haibo Feng, Weijia Zhao, Lin Liu, Jianguo Li, Zhenzhong Lan, Weiyao Lin
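Abstract summary: CreditDecoding is a training-free parallel decoding algorithm that accelerates the confidence convergence of correct but underconfident tokens. On eight benchmarks, CreditDecoding achieves a 5.48x speedup and a 0.48 performance improvement over LLaDA-8B-Instruct.
Reference score (independently computed attention metric): 37.06886078519443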
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion large language models (dLLMs) generate text through iterative denoising steps, achieving parallel decoding by denoising only high-confidence positions at each step. However, existing approaches often repetitively remask tokens due to initially low confidence scores, leading to redundant iterations and limiting overall acceleration. Through the analysis of dLLM decoding traces, we observe that the model often determines the final prediction for a token several steps before the decoding step. To leverage this historical information and avoid redundant steps, we introduce the concept of Trace Credit, which quantifies each token's convergence potential by accumulating historical logits. Furthermore, we propose CreditDecoding, a training-free parallel decoding algorithm that accelerates the confidence convergence of correct but underconfident tokens by fusing current logits with Trace Credit. This process significantly reduces redundant iterations and enhances decoding robustness. On eight benchmarks, CreditDecoding achieves a 5.48 times speedup and a 0.48 performance improvement over LLaDA-8B-Instruct, and a 4.11 times speedup with a 0.15 performance improvement over LLaDA-MoE-Instruct. Importantly, CreditDecoding scales effectively to long sequences and is orthogonal to mainstream inference optimizations, making it a readily integrable and versatile solution.
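Abstract (reference translation): Diffusion large language models (dLLMs) generate text through iterative denoising steps and achieve parallel decoding by denoising only the high-confidence positions at each step. Existing approaches, however, often remask tokens repeatedly because their initial confidence scores are low, which causes redundant iterations and limits the overall acceleration. By analyzing dLLM decoding traces, we observe that the model often settles on a token's final prediction several steps before the step at which the token is decoded. To exploit this historical information and avoid redundant steps, we introduce Trace Credit, which quantifies each token's convergence potential by accumulating historical logits. We further propose CreditDecoding, a training-free parallel decoding algorithm that accelerates the confidence convergence of correct but underconfident tokens by fusing the current logits with Trace Credit. This process substantially reduces redundant iterations and improves decoding robustness. On eight benchmarks, CreditDecoding achieves a 5.48x speedup and a 0.48 performance improvement over LLaDA-8B-Instruct, and a 4.11x speedup with a 0.15 performance improvement over LLaDA-MoE-Instruct. Importantly, CreditDecoding scales effectively to long sequences and is orthogonal to mainstream inference optimizations, making it a readily integrable and versatile solution.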
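
The abstract gives only a high-level description of the mechanism: historical logits are accumulated into a per-token credit, and that credit is fused with the current logits so that a consistently preferred token crosses the confidence threshold sooner. The toy sketch below illustrates that idea for a single masked position. It is not the paper's formulation: the additive fusion rule, the exponential decay, and the names `decode_position`, `alpha`, `decay`, and `threshold` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_position(logit_trace, threshold=0.9, alpha=0.0, decay=0.9):
    """Return (step, token) when one masked position is first committed, else None.

    logit_trace: per-step logits for the position (one list per denoising step).
    alpha:       hypothetical weight of the accumulated credit; 0 disables it,
                 which gives plain confidence-threshold decoding.
    decay:       hypothetical discount applied to older evidence.
    """
    credit = [0.0] * len(logit_trace[0])
    for step, logits in enumerate(logit_trace):
        # Assumed fusion rule: add the weighted credit to the current logits.
        fused = [l + alpha * c for l, c in zip(logits, credit)]
        probs = softmax(fused)
        token = max(range(len(probs)), key=probs.__getitem__)
        if probs[token] >= threshold:
            return step, token
        # Assumed accumulation rule: decayed running sum of past logits.
        credit = [decay * c + l for c, l in zip(credit, logits)]
    return None


# The model prefers token 0 at every step, but never confidently on its own.
trace = [[1.0, 0.5, 0.0]] * 10
baseline = decode_position(trace, alpha=0.0)
with_credit = decode_position(trace, alpha=1.0)
print("baseline:", baseline)
print("with credit:", with_credit)
```

In this toy trace the per-step confidence for token 0 stays near 0.51, so the thresholded baseline never commits within ten steps, while the credit-fused variant commits token 0 once enough consistent evidence has accumulated. The numbers say nothing about the reported benchmark results.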

Related papers