Fugu-MT Paper Translation (Overview): Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning

Paper overview: Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning

arxiv url: http://arxiv.org/abs/2511.15190v1
Date: Wed, 19 Nov 2025 07:24:07 GMT
Status: Translation complete
Last updated in system: 2025-11-20 15:51:28.679196
Title: Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
Authors: Yuxuan Gu, Weimin Bai, Yifei Wang, Weijian Luo, He Sun
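Abstract summary: Masked auto-regressive diffusion models (MAR) benefit from the expressive modeling ability of diffusion models. MARVAL (Masked Auto-Regressive Variational Acceleration) is a distillation-based framework that compresses the diffusion chain into a single AR generation step.
Reference score (attention score computed in-house): 23.8766303220919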
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked auto-regressive diffusion models (MAR) benefit from the expressive modeling ability of diffusion models and the flexibility of masked auto-regressive ordering. However, vanilla MAR suffers from slow inference due to its hierarchical inference mechanism: an outer AR unmasking loop and an inner diffusion denoising chain. Such a decoupled structure not only harms generation efficiency but also hinders the practical use of MAR for reinforcement learning (RL), an increasingly critical paradigm for generative model post-training. To address this fundamental issue, we introduce MARVAL (Masked Auto-Regressive Variational Acceleration), a distillation-based framework that compresses the diffusion chain into a single AR generation step while preserving the flexible auto-regressive unmasking order. Such a distillation with MARVAL not only yields substantial inference acceleration but, crucially, makes RL post-training with verifiable rewards practical, resulting in scalable yet human-preferred fast generative models. Our contributions are twofold: (1) a novel score-based variational objective for distilling masked auto-regressive diffusion models into a single generation step without sacrificing sample quality; and (2) an efficient RL framework for masked auto-regressive models via MARVAL-RL. On ImageNet 256×256, MARVAL-Huge achieves an FID of 2.00 with a more than 30× speedup compared with MAR-diffusion, and MARVAL-RL yields consistent improvements in CLIP and image-reward scores on ImageNet datasets with entity names. In conclusion, MARVAL demonstrates the first practical path to distillation and RL of masked auto-regressive diffusion models, enabling fast sampling and better preference alignment.
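The slowdown the abstract attributes to vanilla MAR comes from nesting: every outer AR unmasking step runs a full inner denoising chain, while MARVAL replaces that chain with a single generation step. The Python sketch below is a hypothetical toy cost model, not the authors' code. It only counts evaluations of a per-token generator, and the step counts (64 AR steps, 100 denoising steps) and the function name count_generator_calls are assumptions chosen for illustration.

```python
def count_generator_calls(num_ar_steps: int, steps_per_ar_step: int) -> int:
    """Count generator evaluations for a two-level sampler (toy model, not MAR itself)."""
    calls = 0
    for _ in range(num_ar_steps):            # outer loop: AR unmasking steps
        for _ in range(steps_per_ar_step):   # inner loop: denoising steps per AR step
            calls += 1                       # one evaluation of the per-token generator
    return calls


# Hypothetical settings, chosen only for illustration.
NUM_AR_STEPS = 64
NUM_DENOISING_STEPS = 100

vanilla = count_generator_calls(NUM_AR_STEPS, NUM_DENOISING_STEPS)  # nested denoising chain
distilled = count_generator_calls(NUM_AR_STEPS, 1)                   # single step per AR step
print(vanilla, distilled, vanilla // distilled)  # prints: 6400 64 100
```

With these assumed settings the nested sampler makes 6400 evaluations versus 64 for a single-step generator. That count ratio is not a wall-clock speedup: real cost also depends on how expensive each evaluation is and on any work done once per AR step, so a measured speedup (the paper reports more than 30× for MARVAL-Huge) need not equal the number of denoising steps.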


Related papers