Fugu-MT Paper Translation (Abstract): AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation

Paper overview: AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation

arxiv url: http://arxiv.org/abs/2604.28126v1
Date: Wed, 29 Apr 2026 16:56:05 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:54.217038
Title: AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation
Title (reference translation): AdvDMD: Adversarial Reward Meets DMD, Enabling High-Quality Few-Step Generation
Authors: Xu Wang, Zexian Li, Litong Gong, Tiezheng Ge, Zhijie Deng,
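Abstract summary: Diffusion models offer superior generation quality at the expense of extensive sampling steps. This paper proposes AdvDMD, which seamlessly unifies DMD distillation and RL. To achieve more stable and efficient training, the authors adopt a unified SDE backward simulation and different training schedules for DMD and RL.
Reference score (independently computed attention score): 29.31853528513521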
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models offer superior generation quality at the expense of extensive sampling steps. Distillation methods, with Distribution Matching Distillation (DMD) as a popular example, can mitigate this issue, but performance degradation remains pronounced when sampling steps are limited. Reinforcement learning (RL) has been leveraged to improve the few-step generation quality during distillation, with the potential to even surpass the performance of the teacher model. However, existing approaches are combinatorial in nature, merely integrating an RL process with the distillation process, which introduces unnecessary complexity. To address this gap, we propose AdvDMD, a method that seamlessly unifies DMD distillation and RL. Specifically, AdvDMD employs the adversarially trained discriminator from DMD2 as the reward model, which assigns low scores to generated images and high scores to real ones. The discriminator is trained on both intermediate and final states of the denoising process and updated online with the distilled model, enabling holistic supervision of the sampling trajectories and mitigating reward hacking. We adopt a unified SDE backward simulation and different training schedules for DMD and RL to enable more stable and efficient training. Experimental results demonstrate that the 4-step AdvDMD outperforms the original 40-step model for SD3.5 on DPG-Bench, while achieving significant performance gains for SD3 on GenEval. On Qwen-Image, our 2-step AdvDMD achieves superior performance over TwinFlow.
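Abstract (reference translation): Diffusion models deliver superior generation quality at the cost of many sampling steps. Distillation methods, of which Distribution Matching Distillation (DMD) is a common example, can mitigate this problem, but performance degradation is pronounced when sampling steps are limited. Reinforcement learning (RL) has been used to improve few-step generation quality during distillation and may even exceed the teacher model's performance. However, existing approaches are essentially combinatorial: they merely integrate an RL process with the distillation process, which introduces unnecessary complexity. This work therefore proposes AdvDMD, a method that seamlessly unifies DMD distillation and RL. Specifically, AdvDMD adopts the adversarially trained discriminator of DMD2 as the reward model, assigning low scores to generated images and high scores to real images. The discriminator is trained on both intermediate and final states of the denoising process and updated online together with the distilled model, enabling holistic supervision of sampling trajectories and mitigating reward hacking. To achieve more stable and efficient training, we adopt a unified SDE backward simulation and different training schedules for DMD and RL. Experiments show that 4-step AdvDMD outperforms the original 40-step SD3.5 model on DPG-Bench and achieves significant performance gains for SD3 on GenEval. On Qwen-Image, 2-step AdvDMD outperforms TwinFlow.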
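
Illustrative sketch (not from the paper): the abstract describes using an adversarially trained discriminator as the reward model, scoring generated samples low and real samples high. The toy below shows only that general idea on 1-D numbers. The class `ToyDiscriminator`, the function `reward`, the Gaussian "real" and "generated" distributions, and all hyperparameters are hypothetical. The generator here is fixed, whereas the abstract describes a discriminator updated online alongside the distilled model and trained on intermediate as well as final denoising states, none of which is modeled.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyDiscriminator:
    """Hypothetical 1-D logistic discriminator: logit(x) = w * x + b."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def score(self, x):
        return self.w * x + self.b

    def update(self, real, fake, lr=0.1):
        # One gradient step on binary cross-entropy: real -> 1, generated -> 0.
        labelled = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
        grad_w = grad_b = 0.0
        for x, y in labelled:
            err = sigmoid(self.score(x)) - y  # derivative of BCE w.r.t. the logit
            grad_w += err * x
            grad_b += err
        n = len(labelled)
        self.w -= lr * grad_w / n
        self.b -= lr * grad_b / n


def reward(disc, x):
    # Assumption: the reward for a sample is simply the discriminator's logit.
    return disc.score(x)


rng = random.Random(0)
disc = ToyDiscriminator()
for _ in range(500):
    real = [rng.gauss(2.0, 0.5) for _ in range(32)]  # stand-in for real images
    fake = [rng.gauss(0.0, 0.5) for _ in range(32)]  # stand-in for generator samples
    disc.update(real, fake)

# A sample near the real data should earn a higher reward than a typical generated one.
print(reward(disc, 2.0) > reward(disc, 0.0))
```

In an RL-style setup, a score like this would be fed back to the generator as the reward signal, so samples that look more like real data are reinforced.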
