Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling
in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2304.12824v2
- Date: Tue, 30 May 2023 13:15:39 GMT
- Title: Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling
in Offline Reinforcement Learning
- Authors: Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
- Abstract summary: This paper considers a general setting where the guidance is defined by an (unnormalized) energy function.
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure is unknown and is hard to estimate.
We propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance.
- Score: 44.880922634512096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Guided sampling is a vital approach for applying diffusion models in
real-world tasks that embeds human-defined guidance during the sampling
procedure. This paper considers a general setting where the guidance is defined
by an (unnormalized) energy function. The main challenge for this setting is
that the intermediate guidance during the diffusion sampling procedure, which
is jointly defined by the sampling distribution and the energy function, is
unknown and is hard to estimate. To address this challenge, we propose an exact
formulation of the intermediate guidance as well as a novel training objective
named contrastive energy prediction (CEP) to learn the exact guidance. Our
method is guaranteed to converge to the exact guidance under unlimited model
capacity and data samples, while previous methods cannot. We demonstrate the
effectiveness of our method by applying it to offline reinforcement learning
(RL). Extensive experiments on D4RL benchmarks demonstrate that our method
outperforms existing state-of-the-art algorithms. We also provide some examples
of applying CEP for image synthesis to demonstrate the scalability of CEP on
high-dimensional data.
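One way to see why an exact intermediate guidance can exist is a short Bayes-rule identity, checked numerically in the sketch below. It assumes a guided target q_0(x_0) proportional to p_0(x_0) exp(-E(x_0)) and a Gaussian forward kernel x_t ~ N(alpha_t x_0, sigma_t^2); under those assumptions the noised guided marginal satisfies q_t(x_t) = p_t(x_t) exp(-E_t(x_t)) / Z with E_t(x_t) = -log E_{p(x_0 | x_t)}[exp(-E(x_0))]. The notation, the toy quadratic energy, the empirical dataset, and the helper names are illustrative assumptions rather than the paper's code, and the sketch does not implement the CEP training objective itself.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def intermediate_energy(x_t, x0, energy, alpha_t, sigma_t):
    """E_t(x_t) = -log E_{p(x0 | x_t)}[exp(-E(x0))] for an empirical p_0.

    p_0 is taken to be uniform over the data points x0 (an assumption),
    so the posterior weights are the normalized Gaussian likelihoods.
    """
    lik = gaussian_pdf(x_t[:, None], alpha_t * x0[None, :], sigma_t)  # (M, N)
    posterior = lik / lik.sum(axis=1, keepdims=True)
    return -np.log(posterior @ np.exp(-energy))


def check_identity(seed=0):
    """Return (q_t, p_t * exp(-E_t) / Z) on a small grid of x_t values."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=50)            # toy 1-D dataset (hypothetical)
    energy = (x0 - 1.0) ** 2            # toy energy function (hypothetical)
    alpha_t, sigma_t = 0.8, 0.6         # assumed noise-schedule values
    x_t = np.linspace(-3.0, 3.0, 7)

    lik = gaussian_pdf(x_t[:, None], alpha_t * x0[None, :], sigma_t)
    p_t = lik.mean(axis=1)              # unguided noised marginal

    weights = np.exp(-energy)
    weights = weights / weights.sum()   # guided q_0 as reweighted data
    q_t = lik @ weights                 # guided noised marginal, computed directly

    z = np.exp(-energy).mean()          # normalizing constant of q_0
    e_t = intermediate_energy(x_t, x0, energy, alpha_t, sigma_t)
    return q_t, p_t * np.exp(-e_t) / z


q_t, rhs = check_identity()
print(np.max(np.abs(q_t - rhs)))
```

In this toy setting the two sides agree up to floating-point error, which illustrates that the intermediate energy depends jointly on the sampling distribution (through the posterior over x_0) and on the energy function, as the abstract states. In realistic settings the posterior expectation is intractable, which is the gap a learned energy model is meant to fill.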
Related papers
- On the Robustness of Fully-Spiking Neural Networks in Open-World Scenarios using Forward-Only Learning Algorithms [6.7236795813629]
We develop a novel algorithm for Out-of-Distribution (OoD) detection using the Forward-Forward Algorithm (FFA).
Our approach measures the likelihood of a sample belonging to the in-distribution (ID) data by using the distance from the latent representation of samples to the class-representative manifold.
We also propose a gradient-free attribution technique that highlights the features of a sample pushing it away from the distribution of any class.
arXiv Detail & Related papers (2024-07-19T08:08:17Z)
- Operator World Models for Reinforcement Learning [37.69110422996011]
Policy Mirror Descent is not directly applicable to Reinforcement Learning (RL).
We introduce a novel approach based on learning a world model of the environment using conditional mean embeddings.
We then leverage the operatorial formulation of RL to express the action-value function in terms of this quantity in closed form via matrix operations.
arXiv Detail & Related papers (2024-06-28T12:05:47Z)
- Data-driven Power Flow Linearization: Theory [9.246677771418428]
Data-driven power flow linearization (DPFL) stands out for its higher approximation accuracy, wide adaptability, and better ability to implicitly incorporate the latest system attributes.
This tutorial first classifies existing DPFL methods into DPFL training algorithms and supportive techniques.
Their mathematical models, analytical solutions, capabilities, limitations, and generalizability are systematically examined, discussed, and summarized.
arXiv Detail & Related papers (2024-06-10T22:22:41Z)
- Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
- Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization [18.627233013208834]
We show that the use of importance sampling could introduce high variance in the objective estimate.
We propose a technique called sample dropout to bound the estimation variance by dropping out samples when their ratio deviation is too high.
arXiv Detail & Related papers (2023-02-05T04:44:35Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.