Fugu-MT paper translation (abstract): Reasoning with Latent Tokens in Diffusion Language Models

Paper overview: Reasoning with Latent Tokens in Diffusion Language Models

arxiv url: http://arxiv.org/abs/2602.03769v1
Date: Tue, 03 Feb 2026 17:27:46 GMT
Status: translation complete
Last updated in system: 2026-02-04 18:37:15.604739
Title: Reasoning with Latent Tokens in Diffusion Language Models
Title (reference translation): Reasoning with Latent Tokens in Diffusion Language Models
Authors: Andre He, Sean Welleck, Daniel Fried
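Abstract summary: We show that diffusion models are trained to jointly predict a distribution over unknown tokens, including those that will not be decoded in the current step. We demonstrate that latent tokens can be introduced into autoregressive models through auxiliary multi-token prediction. Our results suggest that latent tokens, while arising naturally in diffusion, are a general mechanism for improving performance on tasks requiring global coherence or lookahead.
Reference score (independently computed attention measure): 47.27454676014286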
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference time. We trace this trade-off to a key mechanism: diffusion models are trained to jointly predict a distribution over all unknown tokens, including those that will not actually be decoded in the current step. Ablating this joint prediction yields faster inference but degrades performance, revealing that accurate prediction at the decoded position relies on joint reasoning about the distribution of undecoded tokens. We interpret these as latent tokens and introduce a method for modulating their number, demonstrating empirically that this enables a smooth tradeoff between inference speed and sample quality. Furthermore, we demonstrate that latent tokens can be introduced into autoregressive models through an auxiliary multi-token prediction objective, yielding substantial improvements on the same reasoning tasks where they have traditionally struggled. Our results suggest that latent tokens, while arising naturally in diffusion, represent a general mechanism for improving performance on tasks requiring global coherence or lookahead.
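Abstract (reference translation): Discrete diffusion models have recently become competitive with autoregressive models for language modeling and even outperform them on reasoning tasks that require planning and global coherence, but they need more computation at inference time. We trace this trade-off to a key mechanism: diffusion models are trained to jointly predict a distribution over all unknown tokens, including those that are not actually decoded in the current step. Ablating this joint prediction gives faster inference but degrades performance, which shows that accurate prediction at the decoded position depends on joint reasoning about the distribution of undecoded tokens. We interpret these as latent tokens and introduce a method for adjusting their number, demonstrating empirically that this enables a smooth trade-off between inference speed and sample quality. Furthermore, we show that latent tokens can be introduced into autoregressive models through an auxiliary multi-token prediction objective, yielding substantial improvements on the same reasoning tasks where those models have traditionally struggled. Our results suggest that latent tokens, while arising naturally in diffusion, are a general mechanism for improving performance on tasks requiring global coherence or lookahead.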

Related papers