Fugu-MT Paper Translation (Summary): $R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

Paper overview: $R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

arxiv url: http://arxiv.org/abs/2604.18995v1
Date: Tue, 21 Apr 2026 02:26:08 GMT
Status: Translation complete
Last updated (in system): 2026-04-22 22:41:49.577395
Title: $R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction
Authors: Zhenbang Du, Kejing Xia, Xinrui Zhong, Yonggan Fu, Nicolai Oswald, Binfei Ji, Brucek Khailany, Pavlo Molchanov, Yingyan Lin
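Abstract summary: We propose a unified framework that reduces decoding redundancy from both the inference and the training perspectives. $R^2$-dLLM reduces the number of decoding steps by up to 75% compared with existing decoding strategies.
Reference score (attention score computed by the site): 28.068667649331246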
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits deployment. In this work, we observe that a substantial part of this inefficiency comes from recurring redundancy in the decoding process, including spatial redundancy caused by confidence clusters and positional ambiguity, and temporal redundancy caused by repeatedly remasking predictions that have already stabilized. Motivated by these patterns, we propose $R^2$-dLLM, a unified framework for reducing decoding redundancy from both inference and training perspectives. At inference time, we introduce training-free decoding rules that aggregate local confidence and token predictions, and finalize temporally stable tokens to avoid redundant decoding steps. We further propose a redundancy-aware supervised fine-tuning pipeline that aligns the model with efficient decoding trajectories and reduces reliance on manually tuned thresholds. Experiments demonstrate that $R^2$-dLLM consistently reduces the number of decoding steps by up to 75% compared to existing decoding strategies, while maintaining competitive generation quality across different models and tasks. These results validate that decoding redundancy is a central bottleneck in dLLMs, and that explicitly reducing it yields substantial practical efficiency gains.
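The abstract names the two inference-time rules but not their exact form. As a rough illustration only, the Python sketch below runs a toy confidence-threshold unmasking loop and adds two optional rules under our own assumptions: a spatial rule that also commits a masked token when the mean confidence of its masked neighbourhood (within `window` positions) clears the threshold, and a temporal rule that commits a token whose predicted id has stayed the same for `stable_steps` consecutive steps instead of remasking it again. `decode`, `toy_model`, `window`, and `stable_steps` are hypothetical names, and the toy model (which ignores context and raises every confidence by 0.1 per step) stands in for a real dLLM; this is not the authors' implementation.

```python
from typing import Callable, List, Optional, Tuple

# One prediction per position: (token_id, confidence).
Prediction = Tuple[int, float]
# Hypothetical model interface: partially decoded sequence (None = masked)
# and the step index in, one prediction per position out.
Model = Callable[[List[Optional[int]], int], List[Prediction]]


def decode(
    model: Model,
    length: int,
    threshold: float = 0.9,
    window: int = 0,        # spatial rule disabled when 0 (hypothetical parameter)
    stable_steps: int = 0,  # temporal rule disabled when 0 (hypothetical parameter)
    max_steps: int = 1000,
) -> Tuple[List[Optional[int]], int]:
    seq: List[Optional[int]] = [None] * length
    history: List[List[int]] = [[] for _ in range(length)]  # past predictions per position
    steps = 0
    while None in seq and steps < max_steps:
        preds = model(seq, steps)
        steps += 1
        masked = [i for i, tok in enumerate(seq) if tok is None]
        commit = set()
        for i in masked:
            tok, conf = preds[i]
            history[i].append(tok)
            # Baseline rule: commit tokens whose own confidence clears the threshold.
            if conf >= threshold:
                commit.add(i)
            # Spatial rule (assumed form): commit if the mean confidence of the
            # masked neighbourhood clears the threshold, so a confident cluster
            # is finalized together.
            if window > 0:
                lo, hi = max(0, i - window), min(length, i + window + 1)
                nbr = [preds[j][1] for j in range(lo, hi) if seq[j] is None]
                if sum(nbr) / len(nbr) >= threshold:
                    commit.add(i)
            # Temporal rule (assumed form): commit a token whose prediction has not
            # changed over the last `stable_steps` steps instead of remasking it.
            if stable_steps > 0:
                recent = history[i][-stable_steps:]
                if len(recent) == stable_steps and len(set(recent)) == 1:
                    commit.add(i)
        if not commit:
            # Guarantee progress: commit the single most confident masked token.
            commit.add(max(masked, key=lambda j: preds[j][1]))
        for i in commit:
            seq[i] = preds[i][0]
    return seq, steps


# Toy stand-in for a dLLM (assumption): always predicts TARGET, and every
# position's confidence rises by 0.1 per step, capped at 1.0.
TARGET = [5, 3, 8, 1, 9, 2, 7, 4]
BASE_CONF = [0.95, 0.2, 0.3, 0.92, 0.1, 0.5, 0.4, 0.3]


def toy_model(seq: List[Optional[int]], step: int) -> List[Prediction]:
    return [(t, min(1.0, c + 0.1 * step)) for t, c in zip(TARGET, BASE_CONF)]


base_out, base_steps = decode(toy_model, len(TARGET))
fast_out, fast_steps = decode(toy_model, len(TARGET), window=1, stable_steps=2)
print(base_steps, fast_steps)  # prints: 7 2
```

With these hand-picked toy numbers the baseline commits only one new token per step after the first and needs 7 steps, while the temporal rule finalizes every remaining token once its prediction has repeated twice, finishing in 2 steps. The thresholds here are tuned by hand; per the abstract, the redundancy-aware fine-tuning pipeline is meant to reduce reliance on exactly this kind of manual tuning.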

