Fugu-MT Paper Translation (Summary): BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

Paper summary: BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

arXiv URL: http://arxiv.org/abs/2606.02241v1
Date: Mon, 01 Jun 2026 13:36:54 GMT
Status: Translation complete
System update date: 2026-06-02 21:34:32.109307
Title: BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers
Authors: Justin Deschenaux, Caglar Gulcehre
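Abstract summary: We introduce BlockGen, a blockwise sequence model that we instantiate with both masked and uniform diffusion. BlockGen enables AR-informed predictor-corrector sampling (ARPC), which combines AR and diffusion predictions to re-generate unlikely tokens. With block size 16 on GSM8K, MDMs reach slightly higher accuracy than USDMs, and a similar trend is observed in Generative Perplexity on OpenWebText.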
Reference score (independently computed attention score): 12.083218729202963
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Is the uniform-state diffusion framework a more powerful paradigm for discrete diffusion? Recent studies indicate that this may be the case. In combination with predictor-corrector samplers, uniform-state diffusion models (USDMs) produce higher-quality samples than masked diffusion models (MDMs), and USDMs equal or outperform MDMs in downstream tasks, even though they exhibit higher perplexity. Two issues remain unresolved. First, existing work compares uniform and masked diffusion with uninformed correctors that re-inject noise at random positions, rather than targeting the tokens most likely to be wrong. Second, prior work compares full-sequence diffusion models, so we do not know whether the same conclusion holds when tokens are generated block by block. To address these issues, we introduce BlockGen, a blockwise sequence model that we instantiate with both masked and uniform diffusion. BlockGen trains on a mixture of block sizes, and its likelihood interpolates between AR and pure diffusion more finely than that of models with a fixed block size. BlockGen enables AR-informed predictor-corrector sampling (ARPC), which combines AR and diffusion predictions to re-generate unlikely tokens without an auxiliary verifier. Under ancestral sampling, uniform diffusion outperforms masked diffusion in the block-by-block setting, especially in the few-step regime. Under ARPC, the gap closes and reverses at high NFE (number of function evaluations). With block size 16 on GSM8K, MDMs reach slightly higher accuracy than USDMs, and we observe a similar trend in Generative Perplexity on OpenWebText. Our code is available at https://github.com/jdeschena/blockgen.
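The abstract contrasts two ideas that a small example can make concrete. First, blockwise generation sits between two extremes: with one token per block it behaves like AR generation, and with a single block spanning the sequence it is full-sequence diffusion. Second, an uninformed corrector re-injects noise at random positions, while an informed one targets the tokens most likely to be wrong. The Python sketch below illustrates only these ideas, under stated assumptions; it is not the BlockGen or ARPC implementation. The helper names (`split_into_blocks`, `uninformed_positions`, `informed_positions`, `renoise`), the mask-token id, and the per-token log-probabilities are hypothetical. How ARPC actually combines AR and diffusion predictions into a per-token score is not specified in the abstract, so the sketch simply takes a score per token as input.

```python
import random
from typing import List, Sequence


def split_into_blocks(seq_len: int, block_size: int) -> List[List[int]]:
    """Partition positions 0..seq_len-1 into consecutive blocks.

    block_size == 1 gives one token per block (the AR extreme);
    block_size == seq_len gives one block (full-sequence diffusion).
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [list(range(start, min(start + block_size, seq_len)))
            for start in range(0, seq_len, block_size)]


def uninformed_positions(block: Sequence[int], k: int,
                         rng: random.Random) -> List[int]:
    """Uninformed corrector: pick k positions in the block uniformly at random."""
    return sorted(rng.sample(list(block), k))


def informed_positions(block: Sequence[int],
                       token_logprobs: Sequence[float], k: int) -> List[int]:
    """Informed corrector (hypothetical scoring): pick the k positions whose
    current tokens have the lowest log-probability under some scoring model."""
    ranked = sorted(block, key=lambda i: token_logprobs[i])
    return sorted(ranked[:k])


def renoise(tokens: Sequence[int], positions: Sequence[int], mode: str,
            vocab_size: int, mask_id: int, rng: random.Random) -> List[int]:
    """Re-inject noise at the chosen positions.

    mode == "masked": replace with the mask token (masked diffusion).
    mode == "uniform": replace with a uniformly random vocabulary token
    (uniform-state diffusion).
    """
    out = list(tokens)
    for i in positions:
        if mode == "masked":
            out[i] = mask_id
        elif mode == "uniform":
            out[i] = rng.randrange(vocab_size)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = split_into_blocks(seq_len=10, block_size=4)
    print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    tokens = [5, 7, 2, 9, 1, 3, 8, 6, 4, 0]
    # Made-up per-token log-probabilities, for illustration only.
    logp = [-0.1, -2.3, -0.2, -5.0, -0.4, -0.3, -1.9, -0.5, -0.6, -0.2]
    first = blocks[0]
    print(uninformed_positions(first, 2, rng))  # two random positions
    picked = informed_positions(first, logp, 2)
    print(picked)  # [1, 3]
    # Assumed mask id: one past the last vocabulary token.
    print(renoise(tokens, picked, "masked", vocab_size=10, mask_id=10, rng=rng))
    # [5, 10, 2, 10, 1, 3, 8, 6, 4, 0]
```

The design choice being illustrated is only where the noise goes: both correctors re-noise the same number of tokens, but the informed one spends that budget on the lowest-scoring tokens instead of random ones. The `mode` switch mirrors the masked-versus-uniform comparison in the abstract.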

