Fugu-MT paper translation (abstract): Optimal Inference Schedules for Masked Diffusion Models

Paper abstract: Optimal Inference Schedules for Masked Diffusion Models

arxiv url: http://arxiv.org/abs/2511.04647v1
Date: Thu, 06 Nov 2025 18:38:24 GMT
Status: translation complete
System update date: 2025-11-07 20:17:53.562005
Title: Optimal Inference Schedules for Masked Diffusion Models
Title (reference translation): Optimal Inference Scheduling for Masked Diffusion Models
Authors: Sitan Chen, Kevin Cong, Jerry Li
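Abstract summary: Masked diffusion models (MDMs) can sample tokens out of order and many tokens at once in parallel. The paper shows that, without strong a priori knowledge of the distribution, it is in general impossible to compete with the optimal unmasking schedule.
Reference score (independently computed attention metric): 16.774584258255768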
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A major bottleneck of standard auto-regressive large language models is that their inference process is inherently sequential, resulting in very long and costly inference times. To circumvent this, practitioners proposed a class of language models called diffusion language models, of which the masked diffusion model (MDM) is the most successful. The MDM is able to sample tokens out-of-order and, ostensibly, many tokens at once and in parallel. However, there is very limited rigorous understanding of how much parallel sampling these models can perform without noticeable degradation in their sampling performance. Prior work of Li and Cai obtained some preliminary bounds, but these are not tight for many natural classes of distributions. In this work, we give a new, exact characterization of the expected divergence between the true distribution and the sampled distribution, for any distribution and any unmasking schedule for the sampler, showing an elegant connection to the theory of univariate function approximation. By leveraging this connection, we then attain a number of novel lower and upper bounds for this problem. While the connection to function approximation in principle gives the optimal unmasking schedule for any distribution, we show that it is in general impossible to compete with it without strong a priori knowledge of the distribution, even in seemingly benign settings. However, we also demonstrate new upper bounds and new sampling schedules in terms of well-studied information-theoretic properties of the base distribution, namely, its total correlation and dual total correlation, which show that in some natural settings, one can sample in $O(\log n)$ steps without any visible loss in performance, where $n$ is the total sequence length.
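Abstract (reference translation): A bottleneck of standard auto-regressive large language models is that their inference process is inherently sequential, which makes inference very long and costly. To circumvent this, practitioners proposed a class of language models called diffusion language models, the most successful of which is the masked diffusion model (MDM). The MDM can sample tokens out of order and, ostensibly, many tokens at once in parallel. However, rigorous understanding of how much parallel sampling these models can perform without noticeably degrading sampling performance is very limited. Prior work of Li and Cai obtained some preliminary bounds, but these are not tight for many natural classes of distributions. This work gives a new, exact characterization of the expected divergence between the true distribution and the sampled distribution, for any distribution and any unmasking schedule of the sampler, and shows an elegant connection to the theory of univariate function approximation. Using this connection, it obtains a number of new lower and upper bounds for the problem. Although the connection to function approximation in principle gives the optimal unmasking schedule for any distribution, the work shows that it is in general impossible to compete with that schedule without strong a priori knowledge of the distribution, even in seemingly benign settings. It also presents new upper bounds and new sampling schedules in terms of well-studied information-theoretic properties of the base distribution, namely its total correlation and dual total correlation, which show that in some natural settings one can sample in $O(\log n)$ steps without any visible loss in performance, where $n$ is the total sequence length.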
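
Note on terminology: the abstract names total correlation and dual total correlation without defining them. The standard information-theoretic definitions are given below as a reading aid. The notation ($X = (X_1, \dots, X_n)$ for the token sequence, $H$ for Shannon entropy, $X_{-i}$ for all coordinates except the $i$-th) is an assumption made here and may differ from the paper's own; the paper's bounds themselves are not reproduced.

$$
\mathrm{TC}(X) = \sum_{i=1}^{n} H(X_i) - H(X_1, \dots, X_n),
\qquad
\mathrm{DTC}(X) = H(X_1, \dots, X_n) - \sum_{i=1}^{n} H(X_i \mid X_{-i}).
$$

Both quantities are nonnegative and equal zero exactly when the coordinates $X_1, \dots, X_n$ are mutually independent, the case in which sampling all tokens in parallel from their marginals incurs no error.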

