Fugu-MT Paper Translation (Summary): $R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation

Paper summary: $R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation

arxiv url: http://arxiv.org/abs/2603.28460v1
Date: Mon, 30 Mar 2026 14:01:31 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.432465
Title: $R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation
Title (reference translation): $R_{dm}$: Re-conceptualization of Distribution Matching as a Reward for Diffusion Distillation
Authors: Linqian Fan, Peiqin Sun, Tiancheng Wen, Shun Lu, Chengru Song,
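Abstract summary: Diffusion models achieve state-of-the-art generative performance but are bottlenecked by their slow iterative sampling process. Recent approaches have tried to break the performance ceiling that distillation places on the student by integrating reinforcement learning (RL). In this paper, we propose a new paradigm that re-conceptualizes distribution matching as a reward, denoted $R_{dm}$.
Reference score (attention score computed in-house): 9.105357939499683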
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models achieve state-of-the-art generative performance but are fundamentally bottlenecked by their slow iterative sampling process. While diffusion distillation techniques enable high-fidelity few-step generation, traditional objectives often restrict the student's performance by anchoring it solely to the teacher. Recent approaches have attempted to break this ceiling by integrating Reinforcement Learning (RL), typically through a simple summation of distillation and RL objectives. In this work, we propose a novel paradigm by reconceptualizing distribution matching as a reward, denoted as $R_{dm}$. This unified perspective bridges the algorithmic gap between Distribution Matching Distillation (DMD) and RL, providing several key benefits. (1) Enhanced optimization stability: we introduce Group Normalized Distribution Matching (GNDM), which adapts standard RL group normalization to stabilize $R_{dm}$ estimation. By leveraging group-mean statistics, GNDM establishes a more robust and effective optimization direction. (2) Seamless reward integration: our reward-centric formulation inherently supports adaptive weighting mechanisms, allowing flexible combination of DMD with external reward models. (3) Improved sampling efficiency: by aligning with RL principles, the framework readily incorporates importance sampling (IS), leading to a significant boost in sampling efficiency. Extensive experiments demonstrate that GNDM outperforms vanilla DMD, reducing the FID by 1.87. Furthermore, our multi-reward variant, GNDMR, surpasses existing baselines by achieving a strong balance between aesthetic quality and fidelity, reaching a peak HPS of 30.37 and a low FID-SD of 12.21. Overall, $R_{dm}$ provides a flexible, stable, and efficient framework for real-time high-fidelity synthesis. Code will be released upon publication.
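Point (1) of the abstract adapts standard RL group normalization to $R_{dm}$, and point (2) combines DMD with external reward models through a weighting mechanism. The sketch below shows only the generic GRPO-style form of these two ideas, assuming one scalar reward per sample for a group of samples generated from the same prompt. It is not the authors' GNDM or GNDMR implementation: the abstract does not say how $R_{dm}$ is computed per sample or how the adaptive weights are chosen, so the function names, reward keys, fixed weights, and numbers here are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def group_normalize(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Generic RL group normalization (GRPO-style): center each reward on the
    group mean and scale by the group standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation over the group.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def combine_rewards(
    rewards: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of group-normalized rewards.

    Hypothetical helper: fixed weights stand in for the paper's adaptive
    weighting, whose exact form the abstract does not specify.
    """
    n = len(next(iter(rewards.values())))
    total = [0.0] * n
    for name, values in rewards.items():
        for i, adv in enumerate(group_normalize(values)):
            total[i] += weights[name] * adv
    return total


# One group of four samples from the same prompt (illustrative numbers only).
r_dm = [0.2, 0.5, 0.1, 0.4]       # hypothetical per-sample R_dm values
r_ext = [28.0, 30.0, 29.0, 31.0]  # hypothetical external reward-model scores
advantages = combine_rewards({"dm": r_dm, "ext": r_ext}, {"dm": 1.0, "ext": 0.5})
print(advantages)
```

Normalizing each reward within its group before weighting puts rewards with very different scales on a comparable footing, which is one reason group statistics are a natural fit for mixing a distillation signal with external scores.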
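Abstract (reference translation): Diffusion models achieve state-of-the-art generative performance, but they are fundamentally bottlenecked by their slow iterative sampling process. Diffusion distillation techniques enable high-fidelity few-step generation, but conventional objectives often limit the student's performance by anchoring it solely to the teacher. Recent approaches have attempted to break this ceiling by integrating reinforcement learning (RL) through a simple sum of the distillation and RL objectives. In this paper, we propose a new paradigm that re-conceptualizes distribution matching as a reward, denoted $R_{dm}$. This unified perspective bridges the algorithmic gap between Distribution Matching Distillation (DMD) and RL and offers several key benefits. (1) Improved optimization stability: we introduce Group Normalized Distribution Matching (GNDM), which applies standard RL group normalization to stabilize the estimation of $R_{dm}$. By using group-mean statistics, GNDM establishes a more robust and effective optimization direction. (2) Seamless reward integration: the reward-centric formulation inherently supports adaptive weighting mechanisms, enabling flexible combination of DMD with external reward models. (3) Improved sampling efficiency: by aligning with RL principles, the framework readily incorporates importance sampling (IS), which substantially improves sampling efficiency. Extensive experiments show that GNDM outperforms vanilla DMD, reducing FID by 1.87. Furthermore, the multi-reward variant GNDMR balances aesthetic quality and fidelity, reaching a peak HPS of 30.37 and a low FID-SD of 12.21. Overall, $R_{dm}$ provides a flexible, stable, and efficient framework for real-time high-fidelity synthesis. The code will be released upon publication.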
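Point (3) relies on importance sampling (IS), which lets samples drawn under an older version of the generator be reused for several updates of the current one. A common RL form is the PPO-style clipped surrogate sketched below. Whether the paper clips the ratio, and how it obtains log-probabilities for a diffusion sampler (for example, from the per-step Gaussian transitions of a stochastic sampler), are assumptions not stated in the abstract; `clipped_is_loss` and `clip_eps` are hypothetical names.

```python
import math
from typing import List, Sequence


def clipped_is_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """PPO-style clipped importance-sampling surrogate, averaged over samples.

    ratio_i = exp(logp_new_i - logp_old_i) reweights samples drawn under the
    old policy so their advantages can be reused for the new policy.
    """
    losses: List[float] = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) objective, negated because we minimize a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


# Fresh samples (identical log-probs) reduce to the plain advantage-weighted loss.
print(clipped_is_loss([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))
```

Clipping bounds how much a single sample's reweighted advantage can push the update once the current policy has drifted from the one that generated it, which helps keep the reuse of stale samples stable.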

List of related papers