Towards Controllable Diffusion Models via Reward-Guided Exploration
- URL: http://arxiv.org/abs/2304.07132v1
- Date: Fri, 14 Apr 2023 13:51:26 GMT
- Title: Towards Controllable Diffusion Models via Reward-Guided Exploration
- Authors: Hengtong Zhang, Tingyang Xu
- Abstract summary: We propose a novel framework that guides the training-phase of diffusion models via reinforcement learning (RL)
RL enables calculating policy gradients via samples from a pay-off distribution proportional to exponential scaled rewards, rather than from policies themselves.
Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.
- Score: 15.857464051475294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By formulating data samples' formation as a Markov denoising process,
diffusion models achieve state-of-the-art performances in a collection of
tasks. Recently, many variants of diffusion models have been proposed to enable
controlled sample generation. Most of these existing methods either formulate
the controlling information as an input (i.e.,: conditional representation) for
the noise approximator, or introduce a pre-trained classifier in the test-phase
to guide the Langevin dynamic towards the conditional goal. However, the former
line of methods only work when the controlling information can be formulated as
conditional representations, while the latter requires the pre-trained guidance
classifier to be differentiable. In this paper, we propose a novel framework
named RGDM (Reward-Guided Diffusion Model) that guides the training-phase of
diffusion models via reinforcement learning (RL). The proposed training
framework bridges the objective of weighted log-likelihood and maximum entropy
RL, which enables calculating policy gradients via samples from a pay-off
distribution proportional to exponential scaled rewards, rather than from
policies themselves. Such a framework alleviates the high gradient variances
and enables diffusion models to explore for highly rewarded samples in the
reverse process. Experiments on 3D shape and molecule generation tasks show
significant improvements over existing conditional diffusion models.
Related papers
- Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization.
To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
arXiv Detail & Related papers (2024-10-08T07:33:49Z) - Diffusion Rejection Sampling [13.945372555871414]
Diffusion Rejection Sampling (DiffRS) is a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep.
The proposed method can be viewed as a mechanism that evaluates the quality of samples at each intermediate timestep and refines them with varying effort depending on the sample.
Empirical results demonstrate the state-of-the-art performance of DiffRS on the benchmark datasets and the effectiveness of DiffRS for fast diffusion samplers and large-scale text-to-image diffusion models.
arXiv Detail & Related papers (2024-05-28T07:00:28Z) - Learning Diffusion Priors from Observations by Expectation Maximization [6.224769485481242]
We present a novel method based on the expectation-maximization algorithm for training diffusion models from incomplete and noisy observations only.
As part of our method, we propose and motivate an improved posterior sampling scheme for unconditional diffusion models.
arXiv Detail & Related papers (2024-05-22T15:04:06Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - Unsupervised Discovery of Interpretable Directions in h-space of
Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From
Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods [27.014858633903867]
We present a training framework for feature disentanglement of Diffusion Models (FDiff)
We propose two sampling methods that can boost the realism of our Diffusion Models and also enhance the controllability.
arXiv Detail & Related papers (2023-02-28T07:43:00Z) - ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion
Trajectories [144.03939123870416]
We propose a novel conditional diffusion model by introducing conditions into the forward process.
We use extra latent space to allocate an exclusive diffusion trajectory for each condition based on some shifting rules.
We formulate our method, which we call textbfShiftDDPMs, and provide a unified point of view on existing related methods.
arXiv Detail & Related papers (2023-02-05T12:48:21Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative
Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.