Fugu-MT Paper Translation (Summary): Accelerating Diffusion LLM Inference via Local Determinism Propagation

Paper summary: Accelerating Diffusion LLM Inference via Local Determinism Propagation

arxiv url: http://arxiv.org/abs/2510.07081v1
Date: Wed, 08 Oct 2025 14:39:34 GMT
Status: Translation complete
Last updated in system: 2025-10-09 16:41:20.56514
Title: Accelerating Diffusion LLM Inference via Local Determinism Propagation
Title (reference translation): Accelerating Diffusion LLM Inference via Local Determinism Propagation
Authors: Fanheng Kong, Jingyuan Zhang, Yahui Liu, Zirui Wu, Yu Tian, Victoria W., Guorui Zhou
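Abstract summary: LocalLeap is a training-free adaptive parallel decoding strategy. It achieves a 6.94$\times$ throughput improvement and reduces decoding steps to 14.2% of the original requirement.
Reference score (independently computed attention metric): 27.751279909685604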
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion large language models (dLLMs) represent a significant advancement in text generation, offering parallel token decoding capabilities. However, existing open-source implementations suffer from quality-speed trade-offs that impede their practical deployment. Conservative sampling strategies typically decode only the most confident token per step to ensure quality (i.e., greedy decoding), at the cost of inference efficiency due to repeated redundant refinement iterations, a phenomenon we term delayed decoding. Through systematic analysis of dLLM decoding dynamics, we characterize this delayed decoding behavior and propose a training-free adaptive parallel decoding strategy, named LocalLeap, to address these inefficiencies. LocalLeap is built on two fundamental empirical principles: local determinism propagation centered on high-confidence anchors and progressive spatial consistency decay. By applying these principles, LocalLeap identifies anchors and performs localized relaxed parallel decoding within bounded neighborhoods, achieving substantial inference step reduction through early commitment of already-determined tokens without compromising output quality. Comprehensive evaluation on various benchmarks demonstrates that LocalLeap achieves 6.94$\times$ throughput improvements and reduces decoding steps to just 14.2\% of the original requirement, with negligible performance impact. The source code is available at: https://github.com/friedrichor/LocalLeap.
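Abstract (reference translation): Diffusion large language models (dLLMs) represent a significant advance in text generation, offering parallel token decoding. However, existing open-source implementations suffer from quality-speed trade-offs that hinder their practical deployment. Conservative sampling strategies typically decode only the most confident token per step to ensure quality (i.e., greedy decoding), at the cost of inference efficiency due to repeated redundant refinement iterations, a phenomenon termed delayed decoding. Through a systematic analysis of dLLM decoding dynamics, we characterize this delayed decoding behavior and propose LocalLeap, a training-free adaptive parallel decoding strategy that addresses these inefficiencies. LocalLeap is built on two fundamental empirical principles: local determinism propagation centered on high-confidence anchors and progressive spatial consistency decay. By applying these principles, LocalLeap identifies anchors and performs locally relaxed parallel decoding within bounded neighborhoods. Comprehensive evaluation on various benchmarks shows that LocalLeap achieves a 6.94$\times$ throughput improvement and reduces decoding steps to 14.2\% of the original requirement, with negligible performance impact. The source code is available at https://github.com/friedrichor/LocalLeap.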
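
Illustrative sketch (not from the paper): the abstract describes identifying high-confidence anchors and then decoding more permissively inside a bounded neighborhood around each one. The Python below is a minimal sketch of that idea for a single decoding step. The function name `select_positions`, both threshold values, and the fixed `radius` are hypothetical choices made for illustration; the fixed radius is only a crude stand-in for the spatial consistency decay the abstract mentions. The authors' actual implementation is in their repository and may differ substantially.

```python
from typing import List, Sequence, Set


def select_positions(
    confidences: Sequence[float],
    masked: Sequence[bool],
    anchor_threshold: float = 0.9,   # hypothetical strict threshold for anchors
    relaxed_threshold: float = 0.6,  # hypothetical looser threshold near anchors
    radius: int = 2,                 # hypothetical neighborhood half-width
) -> List[int]:
    """Return the masked positions to commit in one decoding step."""
    masked_idx = [i for i, is_masked in enumerate(masked) if is_masked]
    if not masked_idx:
        return []

    # Anchors: still-masked positions whose confidence clears the strict threshold.
    anchors = [i for i in masked_idx if confidences[i] >= anchor_threshold]
    if not anchors:
        # No anchor: fall back to committing only the single most confident token.
        return [max(masked_idx, key=lambda i: confidences[i])]

    chosen: Set[int] = set(anchors)
    last = len(confidences) - 1
    for anchor in anchors:
        lo, hi = max(0, anchor - radius), min(last, anchor + radius)
        for j in range(lo, hi + 1):
            # Within the bounded neighborhood, accept the relaxed threshold.
            if masked[j] and confidences[j] >= relaxed_threshold:
                chosen.add(j)
    return sorted(chosen)


conf = [0.30, 0.70, 0.95, 0.65, 0.50, 0.62, 0.20]
print(select_positions(conf, [True] * len(conf)))  # -> [1, 2, 3]
```

In this toy input, position 2 is the only anchor, so positions 1 and 3 are committed alongside it under the relaxed threshold, while position 5 (confidence 0.62) stays masked because it lies outside the neighborhood.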

Related papers