Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
- URL: http://arxiv.org/abs/2411.11727v1
- Date: Mon, 18 Nov 2024 16:57:41 GMT
- Title: Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
- Authors: Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Bo Du, Dacheng Tao
- Abstract summary: Stepwise Diffusion Policy Optimization (SDPO) is an alignment method tailored for few-step diffusion models.
SDPO incorporates dense reward feedback at every intermediate step to ensure consistent alignment across all denoising steps.
SDPO consistently outperforms prior methods in reward-based alignment across diverse step configurations.
- Score: 81.85515625591884
- Abstract: Aligning diffusion models with downstream objectives is essential for their practical applications. However, standard alignment methods often struggle with step generalization when directly applied to few-step diffusion models, leading to inconsistent performance across different denoising step scenarios. To address this, we introduce Stepwise Diffusion Policy Optimization (SDPO), a novel alignment method tailored for few-step diffusion models. Unlike prior approaches that rely on a single sparse reward from only the final step of each denoising trajectory for trajectory-level optimization, SDPO incorporates dense reward feedback at every intermediate step. By learning the differences in dense rewards between paired samples, SDPO facilitates stepwise optimization of few-step diffusion models, ensuring consistent alignment across all denoising steps. To promote stable and efficient training, SDPO introduces an online reinforcement learning framework featuring several novel strategies designed to effectively exploit the stepwise granularity of dense rewards. Experimental results demonstrate that SDPO consistently outperforms prior methods in reward-based alignment across diverse step configurations, underscoring its robust step generalization capabilities. Code is available at https://github.com/ZiyiZhang27/sdpo.
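The abstract does not spell out the exact training objective, but its core idea (paired samples for the same prompt, a dense reward at every denoising step, and a stepwise update driven by the reward difference) can be sketched roughly as below. This is a minimal illustration, not the published SDPO objective: `step_logprob`, `ref_logprob`, and `dense_reward` are hypothetical callables, and the DPO-style pairing loss is only an assumed stand-in.

```python
import torch
import torch.nn.functional as F

def stepwise_reward_difference_loss(step_logprob, ref_logprob, dense_reward,
                                    traj_a, traj_b, prompt, beta=1.0):
    """Rough sketch of learning from dense reward differences between a pair
    of denoising trajectories sampled for the same prompt.

    Assumed (hypothetical) callables, not taken from the SDPO code:
      step_logprob(x_t, x_prev, t, prompt) -> log-prob of the denoising
          transition under the current few-step policy (a tensor).
      ref_logprob(x_t, x_prev, t, prompt)  -> same under a frozen reference.
      dense_reward(x_pred, prompt)         -> scalar reward of the sample
          predicted at an intermediate step.
    traj_a / traj_b: lists of (x_t, x_prev, x_pred, t) tuples, one per step.
    """
    losses = []
    for (xa_t, xa_prev, xa_pred, t), (xb_t, xb_prev, xb_pred, _) in zip(traj_a, traj_b):
        # Dense rewards for both samples of the pair at this intermediate step.
        r_a = dense_reward(xa_pred, prompt)
        r_b = dense_reward(xb_pred, prompt)

        # DPO-style log-ratio of current policy vs. frozen reference for each sample.
        logr_a = step_logprob(xa_t, xa_prev, t, prompt) - ref_logprob(xa_t, xa_prev, t, prompt)
        logr_b = step_logprob(xb_t, xb_prev, t, prompt) - ref_logprob(xb_t, xb_prev, t, prompt)

        # Push the policy toward the sample with the higher dense reward at this
        # step, weighting the update by the size of the reward gap.
        losses.append(-F.logsigmoid(beta * (r_a - r_b) * (logr_a - logr_b)))
    return torch.stack(losses).mean()
```

The abstract further describes an online reinforcement learning framework with additional strategies for exploiting the stepwise granularity of dense rewards, which this sketch omits.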
Related papers
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening [56.99266993852532]
Diffusion-Sharpening is a fine-tuning approach that enhances downstream alignment by optimizing sampling trajectories.
Our method demonstrates superior training efficiency with faster convergence, and the best inference efficiency without requiring additional NFEs.
arXiv Detail & Related papers (2025-02-17T18:57:26Z) - Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Optimization Preference (TailorPO) framework for aligning diffusion models with human preference.
Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues.
Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z) - Test-time Alignment of Diffusion Models without Reward Over-optimization [8.981605934618349]
Diffusion models excel in generative tasks, but aligning them with specific objectives remains challenging.
We propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution.
We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization.
arXiv Detail & Related papers (2025-01-10T09:10:30Z) - Adaptive Coordinate-Wise Step Sizes for Quasi-Newton Methods: A Learning-to-Optimize Approach [9.82454981262489]
We introduce a novel Learning-to-Optimize (L2O) model within the Broyden-Fletcher-Goldfarb-Shanno (BFGS) framework, which leverages neural networks to predict optimal coordinate-wise step sizes.
Our approach achieves substantial improvements over traditional backtracking line search and hypergradient descent-based methods.
arXiv Detail & Related papers (2024-11-25T07:13:59Z) - Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation [18.295352638247362]
We propose Denoised Distribution Estimation (DDE), a novel method for credit assignment.
DDE directly estimates the terminal denoised distribution from the perspective of each step.
It is equipped with two estimation strategies and capable of representing the entire denoising trajectory with a single model inference.
arXiv Detail & Related papers (2024-11-22T11:45:33Z) - Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning [53.60934432718044]
Cross-domain few-shot learning methods face challenges with large-scale pre-trained models due to inaccessible source data and training strategies.
This paper introduces Step-wise Distribution Alignment Guided Style Prompt Tuning (StepSPT).
StepSPT implicitly narrows domain gaps through prediction distribution optimization.
arXiv Detail & Related papers (2024-11-15T09:34:07Z) - Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling the noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization (a rough sketch of this general idea appears after the related papers list).
To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
arXiv Detail & Related papers (2024-10-08T07:33:49Z) - FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models [10.969811500333755]
We introduce a Fine-tuning Initial Noise Distribution (FIND) framework with policy optimization.
Our method is 10 times faster than the SOTA approach.
arXiv Detail & Related papers (2024-07-28T10:07:55Z) - Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement [53.2171981279647]
We present a framework that encapsulates both the variance-preserving (VP)- and variance-exploding (VE)-based diffusion methods.
To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models.
We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-14T14:22:22Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
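The Demon entry above describes steering the denoising process at inference time, without backpropagation through the reward function, by concentrating sampling density on high-reward regions. A rough, hedged sketch of that general idea (not the published Demon algorithm) is to score several candidate noises per step with a reward model and keep the best one; `denoise_step`, `predict_clean`, and `reward_fn` below are hypothetical callables.

```python
import torch

@torch.no_grad()
def reward_guided_step(x_t, t, denoise_step, predict_clean, reward_fn, num_candidates=8):
    """Illustrative inference-time, backprop-free guidance for one denoising step.

    Assumed (hypothetical) callables:
      denoise_step(x_t, t, noise) -> x_{t-1} for one sampler step driven by `noise`.
      predict_clean(x_prev, t)    -> rough estimate of the final clean sample x_0.
      reward_fn(x0_hat)           -> scalar reward (e.g., an aesthetic score).
    """
    best_x_prev, best_reward = None, -float("inf")
    for _ in range(num_candidates):
        # Draw a candidate noise for this step and take one denoising step with it.
        noise = torch.randn_like(x_t)
        x_prev = denoise_step(x_t, t, noise)

        # Score a cheap estimate of the final sample implied by this candidate.
        reward = float(reward_fn(predict_clean(x_prev, t)))
        if reward > best_reward:
            best_x_prev, best_reward = x_prev, reward

    # Keep the transition whose predicted outcome the reward model prefers, so
    # sampling concentrates on high-reward regions with no gradients through
    # the reward function.
    return best_x_prev
```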
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.