Fugu-MT Paper Translation (Abstract): AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Paper summary: AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

arxiv url: http://arxiv.org/abs/2509.26432v2
Date: Wed, 01 Oct 2025 11:26:36 GMT
Status: Translation complete
Last updated in system: 2025-10-02 14:33:21.851787
Title: AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
Authors: Guanxi Lu, Hao Mark Chen, Yuto Karashima, Zhican Wang, Daichi Fujiki, Hongxiang Fan
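Abstract summary: Diffusion-based large language models (dLLMs) are gaining attention for their inherent capacity for parallel decoding. This paper presents the first systematic investigation challenging the fixed block size assumption in semi-AR decoding. AdaBlock-dLLM is a training-free, plug-and-play scheduler that adaptively aligns block boundaries with semantic steps by adjusting block size during runtime.
Reference score (attention metric computed independently by this site): 7.442463267121892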
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion-based large language models (dLLMs) are gaining attention for their inherent capacity for parallel decoding, offering a compelling alternative to autoregressive LLMs. Among various decoding strategies, blockwise semi-autoregressive (semi-AR) approaches are widely adopted due to their natural support for KV caching and their favorable accuracy-speed trade-off. However, this paper identifies two fundamental limitations in the conventional semi-AR decoding approach that applies a fixed block size: i) late decoding overhead, where the unmasking of high-confidence tokens outside the current block is unnecessarily delayed, and ii) premature decoding error, where low-confidence tokens inside the current block are committed too early, leading to incorrect tokens. This paper presents the first systematic investigation challenging the fixed block size assumption in semi-AR decoding. Through a statistical analysis of confidence dynamics during the denoising process, we identify a volatility band (VB) region during dLLM decoding, which encodes local semantic structure and can be used to guide adaptive block sizing. Leveraging these insights, we introduce AdaBlock-dLLM, a training-free, plug-and-play scheduler that adaptively aligns block boundaries with semantic steps by adjusting block size during runtime. Extensive experiments across diverse benchmarks show that AdaBlock-dLLM achieves up to 5.3% accuracy improvement under the same throughput budget. Beyond inference-time optimization, we hope our semantics-aware adaptive scheduling approach and confidence-based analysis will inspire future training strategies for dLLMs.
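The two limitations named in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the function name `fixed_block_issues`, the confidence threshold of 0.9, and the sample confidence values are all assumptions made for illustration. The sketch only shows how a fixed block boundary can split a confidence profile badly, leaving high-confidence tokens waiting outside the block and forcing low-confidence tokens inside it.

```python
from typing import List, Tuple


def fixed_block_issues(
    confidences: List[float],
    block_size: int,
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[int], List[int]]:
    """Classify masked positions relative to a fixed block [0, block_size).

    Returns (delayed, premature):
      delayed   - positions outside the block whose confidence is already
                  at or above the threshold (late decoding overhead)
      premature - positions inside the block whose confidence is still
                  below the threshold (premature decoding error risk)
    """
    delayed = [
        i for i, c in enumerate(confidences)
        if i >= block_size and c >= threshold
    ]
    premature = [
        i for i, c in enumerate(confidences)
        if i < block_size and c < threshold
    ]
    return delayed, premature


# Made-up per-position confidences for eight masked tokens.
conf = [0.99, 0.95, 0.40, 0.35, 0.97, 0.92, 0.30, 0.20]
delayed, premature = fixed_block_issues(conf, block_size=4)
print(delayed)    # [4, 5]
print(premature)  # [2, 3]
```

With a block size of 4, positions 4 and 5 are confident but must wait, while positions 2 and 3 must be committed despite low confidence. An adaptive scheduler of the kind the abstract describes would aim to move the boundary so that fewer positions fall into either list; how AdaBlock-dLLM actually chooses that boundary is not specified in the abstract.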

Related papers