Fugu-MT Paper Translation (Abstract): Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

Paper overview: Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

arxiv url: http://arxiv.org/abs/2603.13243v1
Date: Fri, 20 Feb 2026 09:52:13 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:42.223289
Title: Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning
Title (reference translation): Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning
Authors: Earl J St Sauver
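Abstract summary: We propose plan conditioning, a training-free method that prepends a short natural-language plan from an AR model to the diffusion model's prompt. Plan conditioning costs about $0.002 per problem and adds about 2 seconds of latency. Across 5 random seeds, plan-conditioned GSM8K accuracy has zero standard deviation, making diffusion inference highly stable.
Reference score (independently computed attention level): 0.0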
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while diffusion models must coordinate all positions simultaneously. We propose plan conditioning, a training-free method that prepends a short (~100-token) natural-language plan from an AR model to the diffusion model's prompt. The plan serves as a frozen scaffold -- globally visible context that every token position can attend to from the first denoising step. On GSM8K, plan conditioning improves LLaDA-8B-Instruct from 75.6% to 87.2% (+11.6 percentage points), matching a same-size AR model (LLaMA 3.1 8B, 87.7%) despite a 6.4pp weaker baseline. On HumanEval, the gain is +12.8pp (37.2% to 50.0%), showing plans generalize to code. The same plans improve LLaMA by only +5.7pp on GSM8K and +1.3pp on HumanEval -- diffusion models benefit 2-10x more, supporting the coordination-problem hypothesis. Across 5 random seeds, plan-conditioned GSM8K accuracy has zero standard deviation, making diffusion inference highly stable. Ablations reveal the model follows plan strategy (wrong-strategy plans cause -16.3pp) but is robust to plan values (perturbed numbers: -1.1pp), and that planner quality has a sharp threshold: smaller Llama-class plans hurt (-1.6 to -6.8pp) while frontier plans provide the full lift. Attention analysis confirms the mechanism: plan tokens receive 1.8x excess attention during early denoising, declining to uniform as completion tokens solidify. Plan conditioning costs ~$0.002 per problem and adds ~2s of latency.
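The abstract describes plan conditioning as prepending a short AR-generated plan to the diffusion model's prompt. The following is a minimal sketch of that prompt assembly only, not the paper's implementation: the function names, the `Plan:` / `Problem:` / `Solution:` template, the word-based length cap, and the stub planner are all hypothetical stand-ins, and no real model API is called.

```python
from typing import Callable


def plan_conditioned_prompt(
    problem: str,
    planner: Callable[[str], str],
    max_plan_words: int = 100,
) -> str:
    """Build a diffusion-model prompt with an AR-generated plan placed first.

    `planner` stands in for any autoregressive model that maps a problem
    to a short natural-language plan (hypothetical interface).
    """
    plan = planner(problem)
    # Crude length cap: whitespace-separated words approximate tokens here.
    plan = " ".join(plan.split()[:max_plan_words])
    # Hypothetical template: the plan comes before the problem, so it is
    # part of the fixed context the diffusion model conditions on.
    return f"Plan:\n{plan}\n\nProblem:\n{problem.strip()}\n\nSolution:\n"


def toy_planner(problem: str) -> str:
    """Stub planner used only for illustration."""
    return "1. Identify the given quantities. 2. Set up the equation. 3. Solve and check."


prompt = plan_conditioned_prompt("Tom has 3 apples and buys 4 more. How many now?", toy_planner)
print(prompt)
```

In a real pipeline the returned string would be passed to the diffusion model as its prompt; because the plan is part of that prompt rather than the masked completion, it stays fixed while the completion is denoised.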
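Abstract (reference translation): Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. AR models build coherence token by token, while diffusion models must coordinate all positions simultaneously. We propose plan conditioning, a training-free method that prepends a short (about 100-token) natural-language plan from an AR model to the diffusion model's prompt. The plan serves as a frozen scaffold, providing globally visible context that every token position can attend to from the first denoising step. On GSM8K, plan conditioning improves LLaDA-8B-Instruct from 75.6% to 87.2% (+11.6 percentage points), matching a same-size AR model (LLaMA 3.1 8B, 87.7%) despite a 6.4pp weaker baseline. On HumanEval, the gain is +12.8pp (37.2% to 50.0%), showing that plans generalize to code. The same plans improve LLaMA by only +5.7pp on GSM8K and +1.3pp on HumanEval. Across 5 random seeds, plan-conditioned GSM8K accuracy has zero standard deviation, making diffusion inference highly stable. Planner quality has a sharp threshold: smaller Llama-class plans hurt (-1.6 to -6.8pp), while frontier plans provide the full lift. Plan tokens receive 1.8x excess attention during early denoising, declining to uniform as completion tokens solidify. Plan conditioning costs about $0.002 per problem and adds about 2 seconds of latency.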
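The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the abstract; the one assumption, flagged in the code, is that the 87.7% quoted for LLaMA 3.1 8B is its plan-conditioned GSM8K score, which is the reading under which the "6.4pp weaker baseline" and "+5.7pp" figures line up.

```python
# Figures quoted in the abstract (accuracy in percent, gains in percentage points).
llada_gsm8k_base, llada_gsm8k_plan = 75.6, 87.2
llada_he_base, llada_he_plan = 37.2, 50.0
llama_gsm8k_gain, llama_he_gain = 5.7, 1.3
baseline_gap = 6.4  # LLaDA's GSM8K baseline is 6.4pp below LLaMA's

# Gains for the diffusion model, rounded to one decimal to avoid float noise.
gsm8k_gain = round(llada_gsm8k_plan - llada_gsm8k_base, 1)
he_gain = round(llada_he_plan - llada_he_base, 1)

# Assumption: 87.7% is LLaMA's plan-conditioned GSM8K accuracy, so its
# baseline is LLaDA's baseline plus the stated gap.
llama_gsm8k_base = round(llada_gsm8k_base + baseline_gap, 1)
llama_gsm8k_plan = round(llama_gsm8k_base + llama_gsm8k_gain, 1)

# How many times larger the diffusion model's gain is than LLaMA's.
gsm8k_ratio = gsm8k_gain / llama_gsm8k_gain
he_ratio = he_gain / llama_he_gain

print(f"LLaDA gains: GSM8K +{gsm8k_gain}pp, HumanEval +{he_gain}pp")
print(f"Implied LLaMA GSM8K: {llama_gsm8k_base}% -> {llama_gsm8k_plan}%")
print(f"Gain ratios (LLaDA / LLaMA): GSM8K {gsm8k_ratio:.1f}x, HumanEval {he_ratio:.1f}x")
```

Under that reading the gain ratios come out at roughly 2x on GSM8K and just under 10x on HumanEval, consistent with the "2-10x" range stated in the original abstract.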

