Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
- URL: http://arxiv.org/abs/2509.06942v3
- Date: Thu, 11 Sep 2025 17:14:11 GMT
- Title: Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
- Authors: Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, Yansong Tang
- Abstract summary: We propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any timestep via interpolation. We also introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. By fine-tuning the FLUX model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
- Score: 41.498905319841874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable rewards. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models to achieve the desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any timestep via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
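The two ideas in the abstract can be illustrated with a minimal sketch. It assumes the rectified-flow convention x_t = (1 - σ)·x_0 + σ·ε used by FLUX-style models, represents images as plain Python lists, and treats `reward` as a hypothetical text-conditioned scorer. Reading SRPO's "text-conditioned" reward as the difference between scores under a positively and a negatively augmented prompt is an assumption made for illustration; all function names are hypothetical, and this is not the authors' implementation.

```python
from typing import Callable, List

Vector = List[float]  # a flattened "image", for illustration only


def noise_image(x0: Vector, eps: Vector, sigma: float) -> Vector:
    """Interpolate a clean sample x0 toward a predefined noise prior eps.

    Assumes the rectified-flow convention x_t = (1 - sigma) * x0 + sigma * eps.
    """
    return [(1.0 - sigma) * a + sigma * e for a, e in zip(x0, eps)]


def recover_clean(xt: Vector, eps: Vector, sigma: float) -> Vector:
    """Invert the interpolation in closed form, which works because eps is known.

    No multistep denoising chain is unrolled, so any sigma in [0, 1) can be used.
    Dividing by (1 - sigma) amplifies numerical error as sigma approaches 1.
    """
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must lie in [0, 1) to recover x0")
    return [(b - sigma * e) / (1.0 - sigma) for b, e in zip(xt, eps)]


def semantic_relative_reward(
    reward: Callable[[Vector, str], float],
    image: Vector,
    prompt: str,
    positive: str,
    negative: str,
) -> float:
    """Hypothetical SRPO-style signal: one reward model scores the same image
    under a positively and a negatively augmented prompt, and the difference
    of the two scores is used as the reward."""
    return reward(image, f"{positive} {prompt}") - reward(image, f"{negative} {prompt}")


def toy_reward(image: Vector, text: str) -> float:
    """Toy text-conditioned scorer (an assumption, not a real reward model)."""
    return sum(image) * (2.0 if text.startswith("realistic") else 1.0)


if __name__ == "__main__":
    x0 = [0.5, -0.25, 1.0]
    eps = [0.1, 0.3, -0.2]  # noise prior fixed before noising
    xt = noise_image(x0, eps, sigma=0.75)
    print(recover_clean(xt, eps, sigma=0.75))  # approximately [0.5, -0.25, 1.0]
    print(semantic_relative_reward(toy_reward, x0, "a cat", "realistic", "blurry"))
```

The sketch omits the denoising network, so it only shows why a known noise prior makes single-step recovery possible from any timestep; how the model's prediction enters the recovery during Direct-Align training is not specified here.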
Related papers
- Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference [69.34278282513593]
Preference Score Distillation (PSD) is an optimization-based framework for human-aligned text-to-3D synthesis without 3D training data.
Our key insight stems from the incompatibility of pixel-level gradients.
We introduce an adaptive strategy to co-optimize preference scores and negative text embeddings.
  (2026-03-02T08:23:36Z)
- Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning [23.02076024811612]
Recent advances in text-to-image (T2I) diffusion model fine-tuning leverage reinforcement learning (RL) to align generated images with learnable reward functions.
Existing approaches reformulate denoising as a Markov decision process for RL-driven optimization.
We propose a credit assignment framework that dynamically distributes dense rewards across denoising steps.
  (2025-05-25T15:43:54Z)
- A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising [43.44633086975204]
We propose an intuitive method for leveraging pretrained diffusion models.
We then introduce the proposed Linear Combination Diffusion Denoiser (LCDD).
LCDD achieves state-of-the-art performance and offers controlled, well-behaved trade-offs.
  (2025-03-18T19:02:19Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
  (2025-03-09T09:39:57Z)
- Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Optimization Preference (TailorPO) framework for aligning diffusion models with human preference.
Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues.
Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
  (2025-02-01T16:08:43Z)
- E2ED^2: Direct Mapping from Noise to Data for Enhanced Diffusion Models [15.270657838960114]
Diffusion models have established themselves as the de facto primary paradigm in visual generative modeling.
We present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples to initial noises.
Our method achieves substantial performance gains in terms of Fréchet Inception Distance (FID) and CLIP score, even with fewer sampling steps.
  (2024-12-30T16:06:31Z)
- Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy [44.09909260046396]
We propose AdaptiveDiffusion to reduce noise prediction steps during the denoising process.
Our method can significantly speed up the denoising process while generating identical results to the original process, achieving an average speedup of up to 25x.
  (2024-10-13T15:19:18Z)
- Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement [2.9873893715462185]
We propose a novel zero-reference lighting estimation diffusion model for low-light image enhancement called Zero-LED.
It utilizes the stable convergence ability of diffusion models to bridge the gap between low-light domains and real normal-light domains.
It successfully alleviates the dependence on pairwise training data via zero-reference learning.
  (2024-03-05T11:39:17Z)
- ExposureDiffusion: Learning to Expose for Low-light Image Enhancement [87.08496758469835]
This work addresses low-light image enhancement by seamlessly integrating a diffusion model with a physics-based exposure model.
Our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models.
The proposed framework works with real paired datasets, state-of-the-art (SOTA) noise models, and different backbone networks.
  (2023-07-15T04:48:35Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), face limitations in sequential recommendation.
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
  (2023-04-22T15:32:59Z)