Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized
Control
- URL: http://arxiv.org/abs/2402.15194v2
- Date: Wed, 28 Feb 2024 09:21:46 GMT
- Title: Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized
Control
- Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali,
Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani,
Sergey Levine
- Abstract summary: Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution of the training dataset, we are often more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
- Score: 54.132297393662654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models excel at capturing complex data distributions, such as those
of natural images and proteins. While diffusion models are trained to represent
the distribution of the training dataset, we are often more concerned with
other properties, such as the aesthetic quality of the generated images or the
functional properties of generated proteins. Diffusion models can be fine-tuned
in a goal-directed way by maximizing the value of some reward function (e.g.,
the aesthetic quality of an image). However, these approaches may lead to
reduced sample diversity, significant deviations from the training data
distribution, and even poor sample quality due to the exploitation of an
imperfect reward function. The last issue often occurs when the reward function
is a learned model meant to approximate a ground-truth "genuine" reward, as is
the case in many practical applications. These challenges, collectively termed
"reward collapse," pose a substantial obstacle. To address this reward
collapse, we frame the finetuning problem as entropy-regularized control
against the pretrained diffusion model, i.e., directly optimizing
entropy-enhanced rewards with neural SDEs. We present theoretical and empirical
evidence that demonstrates our framework is capable of efficiently generating
diverse samples with high genuine rewards, mitigating the overoptimization of
imperfect reward models.
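The entropy-regularized control objective described in the abstract, reward maximization penalized by divergence from the pretrained model, can be illustrated with a small Monte Carlo sketch. The function name, the penalty weight `alpha`, and the sample-based KL estimator below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def entropy_regularized_objective(rewards, logp_finetuned, logp_pretrained,
                                  alpha=0.1):
    """Estimate E_{x ~ p_ft}[r(x)] - alpha * KL(p_ft || p_pre) from samples.

    rewards:          r(x_i) for samples x_i drawn from the fine-tuned model
    logp_finetuned:   log p_ft(x_i) under the fine-tuned model
    logp_pretrained:  log p_pre(x_i) under the pretrained model
    """
    # KL(p_ft || p_pre) estimated on samples from p_ft:
    # E_{x ~ p_ft}[log p_ft(x) - log p_pre(x)]
    kl_estimate = np.mean(logp_finetuned - logp_pretrained)
    return np.mean(rewards) - alpha * kl_estimate
```

With `alpha = 0` this reduces to plain reward maximization, the regime where an imperfect learned reward can be over-exploited; a larger `alpha` keeps the fine-tuned distribution close to the pretrained one, trading some reward for diversity and sample quality.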
Related papers
- Integrating Amortized Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images [19.957503854446735]
Diffusion models (DMs) have emerged as powerful generative models for solving inverse problems.
FlowDiff is a joint training paradigm that leverages a conditional normalizing flow model to facilitate the training of diffusion models on corrupted data sources.
Our experiment shows that FlowDiff can effectively learn clean distributions across a wide range of corrupted data sources.
arXiv Detail & Related papers (2024-07-15T18:33:20Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian
Mixture Models [59.331993845831946]
Diffusion models benefit from injecting task-specific information into the score function to steer sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - Feedback Efficient Online Fine-Tuning of Diffusion Models [52.170384048274364]
We propose a novel reinforcement learning procedure that efficiently explores on the manifold of feasible samples.
We present a theoretical analysis providing a regret guarantee, as well as empirical validation across three domains.
arXiv Detail & Related papers (2024-02-26T07:24:32Z) - Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Extracting Reward Functions from Diffusion Models [7.834479563217133]
Decision-making diffusion models can be trained on lower-quality data, and then be steered with a reward function to generate near-optimal trajectories.
We consider the problem of extracting a reward function by comparing a decision-making diffusion model that models low-reward behavior and one that models high-reward behavior.
We show that our approach generalizes beyond sequential decision-making by learning a reward-like function from two large-scale image generation diffusion models.
arXiv Detail & Related papers (2023-06-01T17:59:12Z) - RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z) - Bi-Noising Diffusion: Towards Conditional Diffusion Models with
Generative Restoration Priors [64.24948495708337]
We introduce a new method that brings predicted samples to the training data manifold using a pretrained unconditional diffusion model.
We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks.
arXiv Detail & Related papers (2022-12-14T17:26:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information provided and is not responsible for any consequences arising from its use.