Fugu-MT Paper Translation (Abstract): Self-Speculative Masked Diffusions

Paper summary: Self-Speculative Masked Diffusions

arxiv url: http://arxiv.org/abs/2510.03929v1
Date: Sat, 04 Oct 2025 20:16:38 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.347899
Title: Self-Speculative Masked Diffusions
Title (reference translation): Self-Speculative Masked Diffusions
Authors: Andrew Campbell, Valentin De Bortoli, Jiaxin Shi, Arnaud Doucet
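Abstract summary: This paper proposes self-speculative masked diffusion models for discrete data. The method reduces the computational burden by generating non-factorized predictions over masked positions. Applied to GPT2-scale text modelling and protein sequence generation, it achieves a ~2x reduction in the required number of network forward passes.
Reference score (independently computed attention score): 46.04054227238148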
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then sampled; however, the factorization approximation means that sampling too many positions in one go leads to poor sample quality. As a result, many simulation steps, and therefore many neural network function evaluations, are required to generate high-quality data. We reduce the computational burden by generating non-factorized predictions over masked positions. This is achieved by modifying the final transformer attention mask from non-causal to causal, enabling draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. This results in a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2-scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.
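Abstract (reference translation): We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that needs significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over the currently masked positions. Some of the masked positions are then sampled, but the factorization approximation means that sampling too many positions at once degrades sample quality. As a result, generating high-quality data takes many simulation steps and therefore many neural network function evaluations. We reduce the computational burden by generating non-factorized predictions over the masked positions. This is done by changing the final transformer attention mask from non-causal to causal, which enables draft token generation and parallel validation through a novel, model-integrated speculative sampling mechanism. The result is a non-factorized predictive distribution over the masked positions in a single forward pass. We apply the method to GPT2-scale text modelling and protein sequence generation and achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.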
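
The abstract names draft token generation followed by parallel validation but does not spell out the acceptance rule. As a rough illustration only, the sketch below shows the generic speculative sampling accept/reject step known from the speculative decoding literature; it is not the paper's model-integrated mechanism. The function name `speculative_accept` and the dict-based distributions are assumptions made for this example, with the draft distribution standing in for the factorized prediction and the target distribution standing in for the causal, non-factorized one.

```python
import random


def speculative_accept(draft_tokens, draft_probs, target_probs, rng):
    """Generic speculative sampling over an ordered list of draft tokens.

    draft_tokens[i] : token proposed at position i (assumed drawn from q_i)
    draft_probs[i]  : dict token -> probability under the draft distribution q_i
    target_probs[i] : dict token -> probability under the target distribution p_i,
                      conditioned on the drafts before position i
    Returns the tokens kept. Stops at the first rejection, where a replacement
    is drawn from the normalized residual max(p_i - q_i, 0).
    """
    kept = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        # Accept the draft with probability min(1, p(tok) / q(tok)).
        accept_prob = min(1.0, p.get(tok, 0.0) / q[tok])
        if rng.random() < accept_prob:
            kept.append(tok)
            continue
        # Rejected: later target distributions were conditioned on this draft,
        # so resample this position from the residual and stop.
        residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
        total = sum(residual.values())
        tokens = list(residual)
        weights = [residual[t] / total for t in tokens]
        kept.append(rng.choices(tokens, weights=weights)[0])
        break
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    q = {"a": 0.5, "b": 0.5}  # hypothetical draft distribution
    p = {"a": 0.8, "b": 0.2}  # hypothetical target distribution
    print(speculative_accept(["a", "b", "a"], [q, q, q], [p, p, p], rng))
```

When the draft and target distributions agree, every draft is accepted and several positions are filled from one validation pass; the more they disagree, the earlier the loop stops. Each emitted token is still distributed according to the target, which is the property that lets a cheap factorized draft be used without sacrificing sample quality.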


Related papers