Fugu-MT paper translation (abstract): Any-Order Flexible Length Masked Diffusion

Paper summary: Any-Order Flexible Length Masked Diffusion

arxiv url: http://arxiv.org/abs/2509.01025v2
Date: Sun, 07 Sep 2025 22:48:13 GMT
Status: translation complete
Last updated in system: 2025-09-09 14:07:03.350667
Title: Any-Order Flexible Length Masked Diffusion
Title (reference translation): Arbitrary Flexible-Length Masked Diffusion
Authors: Jaeyeon Kim, Lee Cheuk-Kit, Carles Domingo-Enrich, Yilun Du, Sham Kakade, Timothy Ngotiaoco, Sitan Chen, Michael Albergo
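Abstract summary: Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. This paper introduces Flexible Masked Diffusion Models (FlexMDMs). The authors show that FlexMDMs match MDMs in perplexity while modeling length statistics with higher fidelity.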
Reference score (independently computed attention score): 53.89217188409148
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on non-causal tasks. However, a crucial limitation is that they do not support token insertions and are thus limited to fixed-length generations. To this end, we introduce Flexible Masked Diffusion Models (FlexMDMs), a discrete diffusion paradigm that simultaneously can model sequences of flexible length while provably retaining MDMs' flexibility of any-order inference. Grounded in an extension of the stochastic interpolant framework, FlexMDMs generate sequences by inserting mask tokens and unmasking them. Empirically, we show that FlexMDMs match MDMs in perplexity while modeling length statistics with much higher fidelity. On a synthetic maze planning task, they achieve $\approx 60 \%$ higher success rate than MDM baselines. Finally, we show pretrained MDMs can easily be retrofitted into FlexMDMs: on 16 H100s, it takes only three days to fine-tune LLaDA-8B into a FlexMDM, achieving superior performance on math (GSM8K, $58\% \to 67\%$) and code infilling performance ($52\% \to 65\%$).
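Abstract (reference translation): Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in any order and in parallel, which enables fast inference and strong performance on non-causal tasks. An important limitation, however, is that they do not support token insertion and are therefore restricted to fixed-length generation. To address this, the paper introduces Flexible Masked Diffusion Models (FlexMDMs), a discrete diffusion paradigm that can model sequences of flexible length while retaining the flexibility of any-order inference. Built on an extension of the stochastic interpolant framework, FlexMDMs generate sequences by inserting mask tokens and then unmasking them. Experiments show that FlexMDMs match MDMs in perplexity while modeling length statistics with higher fidelity. On a synthetic maze planning task, they achieve a success rate about 60% higher than MDM baselines. With 16 H100s, fine-tuning LLaDA-8B into a FlexMDM takes only three days and yields better performance on math (GSM8K, $58\% \to 67\%$) and code infilling ($52\% \to 65\%$).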
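
The abstract describes generation as inserting mask tokens and then unmasking them. The following is only a toy sketch of that loop shape, not the paper's algorithm: uniform random choices stand in for the learned insertion and unmasking predictions, and the function name, the `insert_prob` parameter, and the fixed 0.5 unmasking rate are all hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical placeholder symbol for a masked position


def toy_flexible_generation(vocab, steps, insert_prob, seed=0):
    """Toy insert-then-unmask loop; random choices stand in for a learned model."""
    rng = random.Random(seed)
    seq = []
    for _ in range(steps):
        # Insertion: grow the sequence by placing a mask token at a random slot.
        if rng.random() < insert_prob:
            seq.insert(rng.randint(0, len(seq)), MASK)
        # Unmasking: fill one masked position, chosen in any order.
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if masked and rng.random() < 0.5:
            seq[rng.choice(masked)] = rng.choice(vocab)
    # Fill any masks still left so the final output is fully unmasked.
    for i, tok in enumerate(seq):
        if tok == MASK:
            seq[i] = rng.choice(vocab)
    return seq


if __name__ == "__main__":
    print(toy_flexible_generation(["a", "b", "c"], steps=20, insert_prob=0.5))
```

Because the final length depends on how many insertions happen, different seeds give outputs of different lengths, which is the property a fixed-length MDM lacks.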

Related papers