Related papers: Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

URL: http://arxiv.org/abs/2405.18881v2
Date: Wed, 3 Jul 2024 05:45:45 GMT
Title: Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization
Authors: Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang,
Abstract summary: We propose a novel alignment approach, named Direct Noise optimization (DNO) By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.
Score: 45.77751895345154
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with certain probability regularization. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.

Related papers

Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence.<n>Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques.<n>DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL [28.95582264086289]
VAlue-based Reinforced Diffusion (VARD) is a novel approach that first learns a value function predicting expection of rewards from intermediate states.<n>Our method maintains proximity to the pretrained model while enabling effective and stable training via backpropagation.
arXiv Detail & Related papers (2025-05-21T17:44:37Z)
Dynamic Search for Inference-Time Alignment in Diffusion Models [87.35944312589424]
We frame inference-time alignment in diffusion as a search problem and propose Dynamic Search for Diffusion (DSearch) DSearch subsamples from denoising processes and approximates intermediate node rewards. It also dynamically adjusts beam width and tree expansion to efficiently explore high-reward generations.
arXiv Detail & Related papers (2025-03-03T20:32:05Z)
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
arXiv Detail & Related papers (2025-02-20T17:48:45Z)
Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks. We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization. To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
arXiv Detail & Related papers (2024-10-08T07:33:49Z)
FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models [10.969811500333755]
We introduce a Fine-tuning Initial Noise Distribution (FIND) framework with policy optimization. Our method achieves 10 times faster than the SOTA approach.
arXiv Detail & Related papers (2024-07-28T10:07:55Z)
Improved Noise Schedule for Diffusion Training [51.849746576387375]
We propose a novel approach to design the noise schedule for enhancing the training of diffusion models. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule.
arXiv Detail & Related papers (2024-07-03T17:34:55Z)
Diffusion Model for Data-Driven Black-Box Optimization [54.25693582870226]
We focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization. We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons. Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models.
arXiv Detail & Related papers (2024-03-20T00:41:12Z)
Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases [76.9127853906115]
Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative. We propose Temporal Diffusion Policy Optimization with critic active neuron Reset (TDPO-R), a policy gradient algorithm that exploits the temporal inductive bias of diffusion models. Empirical results demonstrate the superior efficacy of our methods in mitigating reward overoptimization.
arXiv Detail & Related papers (2024-02-13T15:55:41Z)
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following [21.81411085058986]
Reward-gradient guided denoising generates trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model. We propose DiffusionES, a method that combines gradient-free optimization with trajectory denoising. We show that DiffusionES achieves state-of-the-art performance on nuPlan, an established closed-loop planning benchmark for autonomous driving.
arXiv Detail & Related papers (2024-02-09T17:18:33Z)
Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM) Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain. We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs) GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations. We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.