Fugu-MT Paper Translation (Summary): No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models

Paper summary: No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models

arxiv url: http://arxiv.org/abs/2510.19990v1
Date: Wed, 22 Oct 2025 19:41:27 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:16.73522
Title: No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
Authors: Zachary Horvitz, Raghav Singhal, Hao Zou, Carles Domingo-Enrich, Zhou Yu, Rajesh Ranganath, Kathleen McKeown
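Abstract summary: Masked diffusion language models are trained to in-fill positions in randomly masked sequences. The paper proposes reasoning-as-infilling and multi-token entropy decoding. It demonstrates that the training and compute used by MDLMs unlock many new inference and post-training methods.
Reference score (independently computed attention level): 42.158430350703505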
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked diffusion language models (MDLMs) are trained to in-fill positions in randomly masked sequences, in contrast to next-token prediction models. Discussions around MDLMs focus on two benefits: (1) any-order decoding and (2) multi-token decoding. However, we observe that for math and coding tasks, any-order algorithms often underperform or behave similarly to left-to-right sampling, and standard multi-token decoding significantly degrades performance. At inference time, MDLMs compute the conditional distribution of all masked positions. A natural question is: How can we justify this additional compute when left-to-right one-token-at-a-time decoding is on par with any-order decoding algorithms? First, we propose reasoning-as-infilling. By using MDLMs to infill a reasoning template, we can structure outputs and distinguish between reasoning and answer tokens. In turn, this enables measuring answer uncertainty during reasoning, and early exits when the model converges on an answer. Next, given an answer, reasoning-as-infilling enables sampling from the MDLM posterior over reasoning traces conditioned on the answer, providing a new source of high-quality data for post-training. On GSM8k, we observe that fine-tuning LLaDA-8B Base on its posterior reasoning traces provides a performance boost on par with fine-tuning on human-written reasoning traces. Additionally, given an answer, reasoning-as-infilling provides a method for scoring the correctness of the reasoning process at intermediate steps. Second, we propose multi-token entropy decoding (MED), a simple adaptive sampler that minimizes the error incurred by decoding positions in parallel based on the conditional entropies of those positions. MED preserves performance across benchmarks and leads to 2.7x fewer steps. Our work demonstrates that the training and compute used by MDLMs unlock many new inference and post-training methods.
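The abstract mentions measuring answer uncertainty during reasoning and exiting early once the model converges on an answer. The following is only a minimal sketch of that idea, not the authors' implementation: it assumes the model exposes one categorical distribution per masked answer slot, and the names `entropy`, `should_exit_early`, and the threshold value are hypothetical choices made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a single categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_exit_early(answer_dists, threshold=0.1):
    """Return True when every masked answer slot is nearly deterministic.

    answer_dists: list of probability lists, one per answer position.
    threshold: hypothetical entropy cutoff (an assumption, not from the paper).
    """
    return all(entropy(dist) < threshold for dist in answer_dists)
```

In this sketch, a decoding loop would call `should_exit_early` after each step on the distributions at the answer positions of the template and stop filling in reasoning tokens once it returns `True`.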
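The abstract describes multi-token entropy decoding (MED) as an adaptive sampler that chooses which positions to decode in parallel based on their conditional entropies. One plausible reading, sketched below under stated assumptions, is to decode every masked position whose entropy falls under a cutoff and to fall back to the single lowest-entropy position when none qualifies. The selection rule, the cutoff `max_entropy`, the greedy token choice, and the function names are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a single categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(masked_dists, max_entropy=0.2):
    """Pick the masked positions to decode in this step.

    masked_dists: dict mapping position -> probability list over tokens.
    max_entropy: hypothetical cutoff (an assumption, not from the paper).
    """
    entropies = {pos: entropy(dist) for pos, dist in masked_dists.items()}
    chosen = [pos for pos, h in entropies.items() if h <= max_entropy]
    if not chosen:
        # Always make progress: decode the single most certain position.
        chosen = [min(entropies, key=entropies.get)]
    return sorted(chosen)


def decode_step(masked_dists, max_entropy=0.2):
    """Greedily decode the selected positions; returns {position: token_index}."""
    decoded = {}
    for pos in select_positions(masked_dists, max_entropy):
        dist = masked_dists[pos]
        decoded[pos] = max(range(len(dist)), key=dist.__getitem__)
    return decoded
```

With this rule the number of tokens decoded per step adapts to the model's confidence: several low-entropy positions are filled at once, while uncertain regions are decoded one position at a time.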

Related papers