Reward-Directed Conditional Diffusion: Provable Distribution Estimation
and Reward Improvement
- URL: http://arxiv.org/abs/2307.07055v1
- Date: Thu, 13 Jul 2023 20:20:40 GMT
- Title: Reward-Directed Conditional Diffusion: Provable Distribution Estimation
and Reward Improvement
- Authors: Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang
- Abstract summary: Directed generation aims to generate samples with desired properties as measured by a reward function.
We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels.
- Score: 42.45888600367566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the methodology and theory of reward-directed generation via
conditional diffusion models. Directed generation aims to generate samples with
desired properties as measured by a reward function, which has broad
applications in generative AI, reinforcement learning, and computational
biology. We consider the common learning scenario where the data set consists
of unlabeled data along with a smaller set of data with noisy reward labels.
Our approach leverages a learned reward function on the smaller data set as a
pseudolabeler. From a theoretical standpoint, we show that this directed
generator can effectively learn and sample from the reward-conditioned data
distribution. Additionally, our model is capable of recovering the latent
subspace representation of data. Moreover, we establish that the model
generates a new population that moves closer to a user-specified target reward
value, where the optimality gap aligns with the off-policy bandit regret in the
feature subspace. The improvement in rewards obtained is influenced by the
interplay between the strength of the reward signal, the distribution shift,
and the cost of off-support extrapolation. We provide empirical results to
validate our theory and highlight the relationship between the strength of
extrapolation and the quality of generated samples.
Related papers
- A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [6.647819824559201]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models.
Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
arXiv Detail & Related papers (2024-10-02T20:46:21Z) - Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized
Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z) - Transductive Reward Inference on Graph [53.003245457089406]
We develop a reward inference method based on the contextual properties of information propagation on graphs.
We leverage both the available data and limited reward annotations to construct a reward propagation graph.
We employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data.
arXiv Detail & Related papers (2024-02-06T03:31:28Z) - Fair Sampling in Diffusion Models through Switching Mechanism [5.560136885815622]
We propose a fairness-aware sampling method called textitattribute switching mechanism for diffusion models.
We mathematically prove and experimentally demonstrate the effectiveness of the proposed method on two key aspects.
arXiv Detail & Related papers (2024-01-06T06:55:26Z) - Generative Causal Representation Learning for Out-of-Distribution Motion
Forecasting [13.99348653165494]
We propose Generative Causal Learning Representation to facilitate knowledge transfer under distribution shifts.
While we evaluate the effectiveness of our proposed method in human trajectory prediction models, GCRL can be applied to other domains as well.
arXiv Detail & Related papers (2023-02-17T00:30:44Z) - Breaking the Spurious Causality of Conditional Generation via Fairness
Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z) - Selectively increasing the diversity of GAN-generated samples [8.980453507536017]
We propose a novel method to selectively increase the diversity of GAN-generated samples.
We show the superiority of our method in a synthetic benchmark as well as a real-life scenario simulating data from the Zero Degree Calorimeter of ALICE experiment in CERN.
arXiv Detail & Related papers (2022-07-04T16:27:06Z) - Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited
Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z) - Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.