Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling
in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2304.12824v2
- Date: Tue, 30 May 2023 13:15:39 GMT
- Title: Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling
in Offline Reinforcement Learning
- Authors: Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
- Abstract summary: This paper considers a general setting where the guidance is defined by an (unnormalized) energy function.
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure is unknown and is hard to estimate.
We propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance.
- Score: 44.880922634512096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Guided sampling is a vital approach for applying diffusion models in
real-world tasks that embeds human-defined guidance during the sampling
procedure. This paper considers a general setting where the guidance is defined
by an (unnormalized) energy function. The main challenge for this setting is
that the intermediate guidance during the diffusion sampling procedure, which
is jointly defined by the sampling distribution and the energy function, is
unknown and is hard to estimate. To address this challenge, we propose an exact
formulation of the intermediate guidance as well as a novel training objective
named contrastive energy prediction (CEP) to learn the exact guidance. Our
method is guaranteed to converge to the exact guidance under unlimited model
capacity and data samples, while previous methods cannot. We demonstrate the
effectiveness of our method by applying it to offline reinforcement learning
(RL). Extensive experiments on D4RL benchmarks demonstrate that our method
outperforms existing state-of-the-art algorithms. We also provide some examples
of applying CEP for image synthesis to demonstrate the scalability of CEP on
high-dimensional data.
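The abstract does not spell out the exact formulation of the intermediate guidance. A minimal numerical sketch follows, under these assumptions: the guided target is p_0(x_0) ∝ q_0(x_0) exp(-β E(x_0)), and the intermediate energy at noise level t is E_t(x_t) = -log E_{q(x_0 | x_t)}[exp(-β E(x_0))], so that p_t(x_t) ∝ q_t(x_t) exp(-E_t(x_t)). The toy data distribution, energy function, and noise schedule values are hypothetical and chosen only for illustration. The sketch checks this identity on a one-dimensional discrete example; it does not implement CEP training.

```python
import math

# Hypothetical toy problem (all values below are illustrative assumptions).
X0 = [-2.0, 0.0, 1.5]          # support of the data distribution q_0
Q0 = [0.5, 0.3, 0.2]           # probabilities q_0(x_0)
BETA = 2.0                     # inverse temperature of the guidance
ALPHA_T, SIGMA_T = 0.6, 0.8    # forward kernel q(x_t | x_0) = N(ALPHA_T * x_0, SIGMA_T^2)


def energy(x0):
    """Illustrative energy E(x_0); lower energy means more preferred."""
    return (x0 - 1.0) ** 2


def kernel(xt, x0):
    """Gaussian forward kernel density q(x_t | x_0)."""
    z = (xt - ALPHA_T * x0) / SIGMA_T
    return math.exp(-0.5 * z * z) / (SIGMA_T * math.sqrt(2.0 * math.pi))


def q_t(xt):
    """Marginal q_t(x_t) of the unguided diffusion."""
    return sum(w * kernel(xt, x0) for x0, w in zip(X0, Q0))


def intermediate_energy(xt):
    """E_t(x_t) = -log E_{q(x_0 | x_t)}[exp(-BETA * E(x_0))], via Bayes' rule."""
    joint = sum(w * kernel(xt, x0) * math.exp(-BETA * energy(x0))
                for x0, w in zip(X0, Q0))
    return -math.log(joint / q_t(xt))


def partition():
    """Normalizer Z of p_0(x_0) = q_0(x_0) * exp(-BETA * E(x_0)) / Z."""
    return sum(w * math.exp(-BETA * energy(x0)) for x0, w in zip(X0, Q0))


def p_t(xt):
    """Marginal of the guided target p_0 pushed through the same forward kernel."""
    z = partition()
    return sum(w * math.exp(-BETA * energy(x0)) / z * kernel(xt, x0)
               for x0, w in zip(X0, Q0))


def guided_ratio(xt):
    """p_t(x_t) / (q_t(x_t) * exp(-E_t(x_t))); should equal 1 / Z for every x_t."""
    return p_t(xt) / (q_t(xt) * math.exp(-intermediate_energy(xt)))


if __name__ == "__main__":
    for xt in (-3.0, -0.5, 0.0, 2.0):
        print(f"x_t={xt:+.1f}  ratio={guided_ratio(xt):.6f}  1/Z={1.0 / partition():.6f}")
```

In this sketch the ratio is the same constant at every x_t, which is what makes the guidance "exact": adding -∇E_t(x_t) to the unguided score ∇log q_t(x_t) would give the score of p_t. In realistic settings the posterior expectation inside E_t is intractable, which is the gap a learned energy model is meant to fill.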
Related papers
- Learned Reference-based Diffusion Sampling for multi-modal distributions [2.1383136715042417]
We introduce Learned Reference-based Diffusion Sampler (LRDS), a methodology specifically designed to leverage prior knowledge on the location of the target modes.
LRDS proceeds in two steps by learning a reference diffusion model on samples located in high-density space regions.
We experimentally demonstrate that LRDS best exploits prior knowledge on the target distribution compared to competing algorithms on a variety of challenging distributions.
arXiv Detail & Related papers (2024-10-25T10:23:34Z)
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
- Operator World Models for Reinforcement Learning [37.69110422996011]
Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making.
It is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions.
We introduce a novel approach based on learning a world model of the environment using conditional mean embeddings.
arXiv Detail & Related papers (2024-06-28T12:05:47Z)
- Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
- Diffusion Generative Flow Samplers: Improving learning signals through
partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- Sample Dropout: A Simple yet Effective Variance Reduction Technique in
Deep Policy Optimization [18.627233013208834]
We show that the use of importance sampling could introduce high variance in the objective estimate.
We propose a technique called sample dropout to bound the estimation variance by dropping out samples when their ratio deviation is too high.
arXiv Detail & Related papers (2023-02-05T04:44:35Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited
Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.