Extracting Reward Functions from Diffusion Models
- URL: http://arxiv.org/abs/2306.01804v2
- Date: Sat, 9 Dec 2023 06:39:08 GMT
- Title: Extracting Reward Functions from Diffusion Models
- Authors: Felipe Nuti, Tim Franzmeyer, João F. Henriques
- Abstract summary: Decision-making diffusion models can be trained on lower-quality data, and then be steered with a reward function to generate near-optimal trajectories.
We consider the problem of extracting a reward function by comparing a decision-making diffusion model that models low-reward behavior and one that models high-reward behavior.
We show that our approach generalizes beyond sequential decision-making by learning a reward-like function from two large-scale image generation diffusion models.
- Score: 7.834479563217133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved remarkable results in image generation, and
have similarly been used to learn high-performing policies in sequential
decision-making tasks. Decision-making diffusion models can be trained on
lower-quality data, and then be steered with a reward function to generate
near-optimal trajectories. We consider the problem of extracting a reward
function by comparing a decision-making diffusion model that models low-reward
behavior and one that models high-reward behavior; a setting related to inverse
reinforcement learning. We first define the notion of a relative reward
function of two diffusion models and show conditions under which it exists and
is unique. We then devise a practical learning algorithm for extracting it by
aligning the gradients of a reward function -- parametrized by a neural network
-- to the difference in outputs of both diffusion models. Our method finds
correct reward functions in navigation environments, and we demonstrate that
steering the base model with the learned reward functions results in
significantly increased performance in standard locomotion benchmarks. Finally,
we demonstrate that our approach generalizes beyond sequential decision-making
by learning a reward-like function from two large-scale image generation
diffusion models. The extracted reward function successfully assigns lower
rewards to harmful images.
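As a rough illustration of the gradient-alignment idea above, the following is a minimal sketch assuming both diffusion models expose denoiser (epsilon) outputs of the same shape; the names (reward_net, base_model, expert_model), the sign convention, and any timestep weighting are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions noted above): align grad_x r(x_t, t) with the
# difference between the two diffusion models' denoiser outputs.
import torch
import torch.nn as nn

def relative_reward_loss(reward_net: nn.Module,
                         base_model: nn.Module,
                         expert_model: nn.Module,
                         x_t: torch.Tensor,
                         t: torch.Tensor) -> torch.Tensor:
    x_t = x_t.detach().requires_grad_(True)
    r = reward_net(x_t, t).sum()
    # Gradient of the scalar reward with respect to the noisy input.
    grad_r = torch.autograd.grad(r, x_t, create_graph=True)[0]
    with torch.no_grad():
        # Output difference of the two denoisers (sign/scaling assumed).
        target = base_model(x_t, t) - expert_model(x_t, t)
    # Squared error between the reward gradient and the output difference.
    return ((grad_r - target) ** 2).mean()
```

Minimizing this loss over noisy inputs encountered during the diffusion process would, under these assumptions, yield a reward function whose gradient can steer the base model, as described in the abstract.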
Related papers
- Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution of the training dataset, we are often more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
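In the entropy-regularized control view, fine-tuning is commonly cast as maximizing a reward under a KL penalty toward the pretrained model; the generic objective below summarizes that framing, with the notation and weight α chosen here for illustration rather than taken from the paper.

```latex
% Generic KL-regularized (entropy-regularized) fine-tuning objective;
% notation and the weight \alpha are illustrative, not the paper's.
\max_{\theta}\; \mathbb{E}_{x \sim p_{\theta}}\!\left[\, r(x) \,\right]
\;-\; \alpha\, D_{\mathrm{KL}}\!\left( p_{\theta} \,\Vert\, p_{\mathrm{pre}} \right)
```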
arXiv Detail & Related papers (2024-02-23T08:54:42Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
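A generic contrastive-style loss that separates synthetic-image features by class is sketched below; it only illustrates the overlap-reduction idea, and the feature extractor, temperature, and exact loss form are assumptions rather than the paper's method. It also assumes each class appears at least twice in the batch.

```python
# Illustrative contrastive sketch: pull same-class features together and push
# different-class features apart, reducing inter-class overlap.
import torch
import torch.nn.functional as F

def class_separation_loss(features: torch.Tensor,   # (N, D) image features
                          labels: torch.Tensor,     # (N,) class labels
                          temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature                    # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = sim[same & ~eye]                           # same-class pairs (excluding self)
    neg = sim[~same]                                 # different-class pairs
    # Encourage low cross-class similarity and high within-class similarity.
    return neg.mean() - pos.mean()
```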
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
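For context, reward models over chosen/rejected pairs are typically trained with a pairwise ranking loss; the sketch below shows that standard objective only, not the paper's additional contrastive or preference-cleaning techniques.

```python
# Standard pairwise (Bradley-Terry style) reward-modeling loss on
# (chosen, rejected) response pairs; a generic sketch, not the paper's variant.
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, chosen, rejected):
    r_chosen = reward_model(chosen)      # (B,) scalar reward per chosen response
    r_rejected = reward_model(rejected)  # (B,) scalar reward per rejected response
    # Encourage chosen responses to out-score rejected ones.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```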
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
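A rough sketch of the reward-backpropagation idea follows; the sampler interface (a diffusers-style scheduler), the latent shape, and the absence of gradient checkpointing or truncation are simplifying assumptions, so this is not AlignProp's exact procedure.

```python
# Hypothetical sketch: keep the denoising chain differentiable and backprop
# a reward on the final sample into the diffusion model's parameters.
import torch

def reward_backprop_step(diffusion, scheduler, reward_fn, optimizer, prompt_emb):
    # Latent shape assumed for a latent-diffusion model; adjust as needed.
    x = torch.randn(prompt_emb.shape[0], 4, 64, 64, device=prompt_emb.device)
    for t in scheduler.timesteps:
        eps = diffusion(x, t, prompt_emb)           # no torch.no_grad(): keep the graph
        x = scheduler.step(eps, t, x).prev_sample   # differentiable denoising update
    loss = -reward_fn(x).mean()                     # maximize reward of the final sample
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```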
arXiv Detail & Related papers (2023-10-05T17:59:18Z)
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches within a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
- RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation [7.0471949371778795]
We propose two reward functions for the task of abstractive summarisation.
The first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update.
The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward.
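As one possible reading of the sample-selection idea, the sketch below keeps only sampled summaries whose reward beats a baseline before applying a REINFORCE-style update; the specific selection rule is an assumption, not the paper's definition of RwB-Hinge, and RISK is not shown.

```python
# Illustrative sketch: hinge-style selection of samples for a policy-gradient
# update (the selection rule is an assumption, not the paper's exact method).
import torch

def selected_policy_gradient_loss(log_probs: torch.Tensor,        # (B,) sample log-probs
                                  sample_rewards: torch.Tensor,   # (B,) sampled-summary rewards
                                  baseline_rewards: torch.Tensor  # (B,) baseline (e.g. greedy) rewards
                                  ) -> torch.Tensor:
    advantage = sample_rewards - baseline_rewards
    mask = (advantage > 0).float()        # keep only samples that improve on the baseline
    denom = mask.sum().clamp(min=1.0)
    # REINFORCE-style loss over the selected samples only.
    return -(mask * advantage * log_probs).sum() / denom
```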
arXiv Detail & Related papers (2021-06-08T03:30:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.