Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
- URL: http://arxiv.org/abs/2411.11727v1
- Date: Mon, 18 Nov 2024 16:57:41 GMT
- Title: Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
- Authors: Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Bo Du, Dacheng Tao
- Abstract summary: Stepwise Diffusion Policy Optimization (SDPO) is an alignment method tailored for few-step diffusion models.
SDPO incorporates dense reward feedback at every intermediate step to ensure consistent alignment across all denoising steps.
SDPO consistently outperforms prior methods in reward-based alignment across diverse step configurations.
- Score: 81.85515625591884
- License:
- Abstract: Aligning diffusion models with downstream objectives is essential for their practical applications. However, standard alignment methods often struggle with step generalization when directly applied to few-step diffusion models, leading to inconsistent performance across different denoising step scenarios. To address this, we introduce Stepwise Diffusion Policy Optimization (SDPO), a novel alignment method tailored for few-step diffusion models. Unlike prior approaches that rely on a single sparse reward from only the final step of each denoising trajectory for trajectory-level optimization, SDPO incorporates dense reward feedback at every intermediate step. By learning the differences in dense rewards between paired samples, SDPO facilitates stepwise optimization of few-step diffusion models, ensuring consistent alignment across all denoising steps. To promote stable and efficient training, SDPO introduces an online reinforcement learning framework featuring several novel strategies designed to effectively exploit the stepwise granularity of dense rewards. Experimental results demonstrate that SDPO consistently outperforms prior methods in reward-based alignment across diverse step configurations, underscoring its robust step generalization capabilities. Code is available at https://github.com/ZiyiZhang27/sdpo.
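The core idea of learning from dense reward differences between paired samples at every denoising step can be illustrated with a minimal sketch. Everything below (the `ToyDenoiser`, the `toy_reward` placeholder, the Gaussian policy parameterization, and the step count `K`) is an illustrative assumption rather than the authors' released implementation; see the linked repository for the actual code.

```python
# Minimal sketch of stepwise, dense-reward-difference learning for a
# few-step diffusion model. All names here are illustrative assumptions.
import torch
import torch.nn as nn

K = 4  # assumed number of denoising steps in the few-step model


class ToyDenoiser(nn.Module):
    """Stand-in for a few-step diffusion policy: predicts the mean of the
    next (less noisy) latent given the current latent and the step index."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, step):
        t = torch.full_like(x[:, :1], float(step) / K)  # normalized step index
        return self.net(torch.cat([x, t], dim=-1))


def toy_reward(x):
    # Placeholder dense reward on an intermediate sample; a real setup would
    # decode/predict x0 and score it with a learned reward model.
    return -(x ** 2).mean(dim=-1)


def dense_reward_difference_loss(policy, x_T, sigma=0.1):
    """Stepwise loss on a pair of trajectories that share the same start noise:
    at every denoising step, increase the log-likelihood of whichever sample
    of the pair received the higher dense reward, weighted by the reward gap."""
    losses = []
    x_a = x_b = x_T
    for k in reversed(range(K)):
        mu_a, mu_b = policy(x_a, k), policy(x_b, k)
        # Sample the paired next-step latents (detached, as in trajectory RL).
        x_next_a = (mu_a + sigma * torch.randn_like(mu_a)).detach()
        x_next_b = (mu_b + sigma * torch.randn_like(mu_b)).detach()
        r_a, r_b = toy_reward(x_next_a), toy_reward(x_next_b)
        # Gaussian log-likelihoods of the sampled latents under the policy.
        logp_a = -((x_next_a - mu_a) ** 2).sum(dim=-1) / (2 * sigma ** 2)
        logp_b = -((x_next_b - mu_b) ** 2).sum(dim=-1) / (2 * sigma ** 2)
        # The dense reward *difference* decides which sample is preferred here.
        adv = r_a - r_b
        losses.append(-(adv * (logp_a - logp_b)).mean())
        x_a, x_b = x_next_a, x_next_b
    return torch.stack(losses).mean()


# One illustrative update step.
policy = ToyDenoiser()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
x_T = torch.randn(8, 16)  # shared starting noise for the pair of trajectories
opt.zero_grad()
loss = dense_reward_difference_loss(policy, x_T)
loss.backward()
opt.step()
```

Because the loss is accumulated at every intermediate step rather than only at the end of the trajectory, each of the K denoising steps receives its own learning signal, which is the property the abstract credits for consistent alignment across step configurations.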
Related papers
- Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation [18.295352638247362]
We propose Denoised Distribution Estimation (DDE), a novel method for credit assignment.
DDE directly estimates the terminal denoised distribution from the perspective of each step.
It is equipped with two estimation strategies and capable of representing the entire denoising trajectory with a single model inference.
arXiv Detail & Related papers (2024-11-22T11:45:33Z) - Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning [53.60934432718044]
Cross-domain few-shot learning methods face challenges with large-scale pre-trained models due to inaccessible source data and training strategies.
This paper introduces Step-wise Distribution Alignment Guided Style Prompt Tuning (StepSPT)
StepSPT implicitly narrows domain gaps through prediction distribution optimization.
arXiv Detail & Related papers (2024-11-15T09:34:07Z) - Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning [0.0]
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for aligning them with downstream objectives.
We exploit the hierarchical nature of diffusion models (DMs) and train them dynamically at each epoch with a tailored RL method.
We show that models trained with HRF achieve better preservation of diversity in downstream tasks, enhancing fine-tuning robustness without compromising mean rewards.
arXiv Detail & Related papers (2024-10-10T19:06:23Z) - Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling the noise distribution at each denoising step, through optimization, to concentrate density on regions corresponding to high rewards.
To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
arXiv Detail & Related papers (2024-10-08T07:33:49Z) - FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models [10.969811500333755]
We introduce a Fine-tuning Initial Noise Distribution (FIND) framework with policy optimization.
Our method is 10 times faster than the SOTA approach.
arXiv Detail & Related papers (2024-07-28T10:07:55Z) - Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts [33.58165081033569]
We introduce Sparse MetA-Tuning (SMAT), a method inspired by sparse mixture-of-experts approaches.
SMAT successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models.
arXiv Detail & Related papers (2024-03-13T12:46:03Z) - Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z) - Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement [53.2171981279647]
We present a framework that encapsulates both variance-preserving (VP)- and variance-exploding (VE)-based diffusion methods.
To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models.
We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-14T14:22:22Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework built on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.