Fugu-MT Paper Translation (Abstract): Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture

Paper overview: Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture

arxiv url: http://arxiv.org/abs/2506.19935v1
Date: Tue, 24 Jun 2025 18:22:25 GMT
Status: Translation complete
System update date: 2025-06-26 21:00:42.502447
Title: Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
Authors: Shuchen Xue, Tianyu Xie, Tianyang Hu, Zijin Feng, Jiacheng Sun, Kenji Kawaguchi, Zhenguo Li, Zhi-Ming Ma
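Abstract summary: Masked diffusion models (MDMs) are emerging as alternatives to autoregressive (AR) models. AR models are often decoder-only, while MDMs have largely been encoder-only. This research evaluates MDMs within a decoder-only framework and investigates architectural influences (decoder-only vs. encoder-only) within MDMs.
Reference score (independently calculated attention level): 65.88390432432116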
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) predominantly use autoregressive (AR) approaches, but masked diffusion models (MDMs) are emerging as viable alternatives. A key challenge in comparing AR and MDM paradigms is their typical architectural difference: AR models are often decoder-only, while MDMs have largely been encoder-only. This practice of changing both the modeling paradigm and architecture simultaneously makes direct comparisons unfair, as it's hard to distinguish whether observed differences stem from the paradigm itself or the architectural shift. This research evaluates MDMs within a decoder-only framework with two aims. (1) Equitably compare MDM (as Any-Order AR, or AO-AR) and standard AR paradigms. Our investigation suggests that the standard AO-AR objective, which averages over all token permutations, may benefit from refinement, as many permutations appear less informative than the language's inherent left-to-right structure. (2) Investigate architectural influences (decoder-only vs. encoder-only) within MDMs. We demonstrate that while encoder-only MDMs model a simpler conditional probability space, decoder-only MDMs can achieve dramatic generation speedups ($\sim25\times$) and comparable perplexity with temperature annealing despite modeling a vastly larger space, highlighting key trade-offs. This work thus decouples core paradigm differences from architectural influences, offering insights for future model design. Code is available at https://github.com/scxue/AO-GPT-MDM.
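Note: the abstract refers to a standard AO-AR objective that averages over all token permutations. As a rough sketch, assuming the usual any-order autoregressive formulation, the two objectives being compared can be written as below. The notation is an assumption introduced here for illustration and is not taken from the abstract: $x = (x_1, \dots, x_n)$ is a token sequence, $\sigma$ is a permutation drawn uniformly from the set $S_n$ of all orderings of $n$ positions, and $p_\theta$ is the model.

```latex
% Standard left-to-right AR objective (negative log-likelihood)
\mathcal{L}_{\mathrm{AR}}(\theta)
  = -\,\mathbb{E}_{x}\left[\sum_{t=1}^{n} \log p_\theta\!\left(x_t \mid x_1, \dots, x_{t-1}\right)\right]

% Any-order AR objective: the same chain-rule loss, averaged over orderings sigma
\mathcal{L}_{\mathrm{AO\text{-}AR}}(\theta)
  = -\,\mathbb{E}_{x}\,\mathbb{E}_{\sigma \sim \mathcal{U}(S_n)}
    \left[\sum_{t=1}^{n} \log p_\theta\!\left(x_{\sigma(t)} \mid x_{\sigma(1)}, \dots, x_{\sigma(t-1)}\right)\right]
```

Under this reading, standard AR is the special case where $\sigma$ is fixed to the identity (left-to-right) order, and the abstract's remark about refinement concerns the uniform weighting over the $n!$ orderings.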
