Fugu-MT Paper Translation (Abstract): Residual Context Diffusion Language Models

Paper summary: Residual Context Diffusion Language Models

arxiv url: http://arxiv.org/abs/2601.22954v1
Date: Fri, 30 Jan 2026 13:16:32 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.464366
Title: Residual Context Diffusion Language Models
Title (reference translation): Residual Context Diffusion Language Models
Authors: Yuezhou Hu, Harman Singh, Monishwaran Maheswaran, Haocheng Xi, Coleman Hooper, Jintao Zhang, Aditya Tomar, Michael W. Mahoney, Sewon Min, Mehrdad Farajtabar, Kurt Keutzer, Amir Gholami, Chenfeng Xu
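Abstract summary: Residual Context Diffusion (RCD) is a module that converts discarded token representations into contextual residuals and injects them back at the next denoising step. RCD consistently improves frontier dLLMs by 5-10 accuracy points with minimal extra computation overhead.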
Reference score (independently computed attention level): 90.07635240595926
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a "remasking" mechanism that decodes only the most confident tokens and discards the rest, effectively wasting computation. We demonstrate that recycling computation from the discarded tokens is beneficial, as these tokens retain contextual information useful for subsequent decoding iterations. In light of this, we propose Residual Context Diffusion (RCD), a module that converts these discarded token representations into contextual residuals and injects them back for the next denoising step. RCD uses a decoupled two-stage training pipeline to bypass the memory bottlenecks associated with backpropagation. We validate our method on both long CoT reasoning (SDAR) and short CoT instruction following (LLaDA) models. We demonstrate that a standard dLLM can be efficiently converted to the RCD paradigm with merely ~1 billion tokens. RCD consistently improves frontier dLLMs by 5-10 points in accuracy with minimal extra computation overhead across a wide range of benchmarks. Notably, on the most challenging AIME tasks, RCD nearly doubles baseline accuracy and attains up to 4-5x fewer denoising steps at equivalent accuracy levels.
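Abstract (reference translation): Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs depend on a "remasking" mechanism that decodes only the most confident tokens and discards the remainder, which in effect wastes computation. The authors show that recycling the computation spent on discarded tokens is useful, because those tokens retain contextual information that helps later decoding iterations. They therefore propose Residual Context Diffusion (RCD), a module that converts the discarded token representations into contextual residuals and injects them back at the next denoising step. RCD uses a decoupled two-stage training pipeline to avoid the memory bottlenecks associated with backpropagation. The method is validated on both a long-CoT reasoning model (SDAR) and a short-CoT instruction-following model (LLaDA). A standard dLLM can be converted to the RCD paradigm efficiently with only about 1 billion tokens. Across a wide range of benchmarks, RCD consistently improves frontier dLLMs by 5-10 accuracy points with minimal extra computation overhead. On the most challenging AIME tasks in particular, RCD nearly doubles baseline accuracy and needs up to 4-5x fewer denoising steps to reach equivalent accuracy.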
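As a rough illustration of the mechanism the abstract describes, the toy sketch below contrasts plain confidence-based remasking with a step that recycles the discarded predictions. It is not the paper's implementation. The abstract does not specify how residuals are computed or injected, so the probability-weighted embedding mix, the additive injection into the mask embedding, and every name here (`remask_step`, `contextual_residual`, `next_inputs`) are assumptions made only for this example.

```python
MASK = None  # placeholder for a still-masked position (hypothetical convention)


def remask_step(probs, tokens, k):
    """Commit the k most confident masked positions; report the rest as discarded."""
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    # Confidence of a position = its highest predicted token probability.
    ranked = sorted(masked, key=lambda i: max(probs[i]), reverse=True)
    new_tokens = list(tokens)
    for i in ranked[:k]:
        new_tokens[i] = probs[i].index(max(probs[i]))
    return new_tokens, ranked[k:]


def contextual_residual(p, embeddings):
    """Assumed residual: probability-weighted mix of the token embeddings."""
    dim = len(embeddings[0])
    return [sum(p[v] * embeddings[v][d] for v in range(len(p))) for d in range(dim)]


def next_inputs(tokens, probs, discarded, embeddings, mask_embedding):
    """Build next-step inputs; discarded positions get mask embedding + residual."""
    inputs = []
    for i, t in enumerate(tokens):
        if t is MASK:
            vec = list(mask_embedding)
            if i in discarded:
                r = contextual_residual(probs[i], embeddings)
                vec = [m + x for m, x in zip(vec, r)]
            inputs.append(vec)
        else:
            inputs.append(list(embeddings[t]))
    return inputs


# Toy setup: 3-token vocabulary, 2-dimensional embeddings, 3 masked positions.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask_embedding = [0.0, 0.0]
tokens = [MASK, MASK, MASK]
probs = [[0.9, 0.05, 0.05], [0.4, 0.4, 0.2], [0.1, 0.2, 0.7]]

tokens, discarded = remask_step(probs, tokens, k=1)
inputs = next_inputs(tokens, probs, discarded, embeddings, mask_embedding)
```

With plain remasking, the two uncommitted positions would re-enter the next step as identical mask embeddings. In this sketch they instead carry different vectors that reflect what the model predicted for them, which is the intuition behind recycling the discarded computation.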
