Related papers: Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

URL: http://arxiv.org/abs/2304.12824v2
Date: Tue, 30 May 2023 13:15:39 GMT
Title: Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
Authors: Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
Abstract summary: This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure is unknown and is hard to estimate. We propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance.
Score: 44.880922634512096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Guided sampling is a vital approach for applying diffusion models in real-world tasks that embeds human-defined guidance during the sampling procedure. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide some examples of applying CEP for image synthesis to demonstrate the scalability of CEP on high-dimensional data.
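
Illustration (not from the paper): the sketch below shows what an energy-guided target distribution means in the simplest case, assuming the common form q(x) ∝ p(x) exp(-β E(x)). It is not CEP and involves no diffusion model; it only reweights samples from p by exp(-β E(x)) using self-normalized importance weights. The names `energy`, `guided_weights`, and `guided_mean`, the quadratic energy, and the choice p = N(0, 1) are all hypothetical, picked so that the guided target has a closed-form mean of 2β / (1 + β) to compare against.

```python
import math
import random


def energy(x):
    # Hypothetical quadratic energy, chosen so the guided target is Gaussian.
    return 0.5 * (x - 2.0) ** 2


def guided_weights(samples, beta):
    # Self-normalized weights proportional to exp(-beta * E(x)).
    logits = [-beta * energy(x) for x in samples]
    shift = max(logits)  # subtract the max for numerical stability
    unnormalized = [math.exp(value - shift) for value in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def guided_mean(n=20000, beta=1.0, seed=0):
    # Draw from the assumed base distribution p = N(0, 1), then reweight.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    weights = guided_weights(samples, beta)
    return sum(w * x for w, x in zip(weights, samples))


if __name__ == "__main__":
    beta = 1.0
    print("estimated guided mean:", guided_mean(beta=beta))
    print("closed-form guided mean:", 2.0 * beta / (1.0 + beta))
```

Reweighting like this only works when samples from p already cover the high-weight region. The abstract's point is that during diffusion sampling the guidance at intermediate noise levels is not available in this direct form, which is the gap the paper's exact formulation and CEP objective are meant to close.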

Related papers

Energy-Weighted Flow Matching for Offline Reinforcement Learning [53.64306385597818]
This paper investigates energy guidance in generative modeling, where the target distribution is defined as $q(\mathbf{x}) \propto p(\mathbf{x})\exp(-\beta \mathcal{E}(\mathbf{x}))$, with $p(\mathbf{x})$ being the data distribution and $\mathcal{E}(\mathbf{x})$ as the energy function. We introduce energy-weighted flow matching (EFM), a method that directly learns the energy-guided flow without the need for auxiliary models. We extend this methodology to energy-weighted
arXiv Detail & Related papers (2025-03-06T21:10:12Z)
Exploratory Diffusion Policy for Unsupervised Reinforcement Learning [28.413426177336703]
Unsupervised reinforcement learning aims to pre-train agents by exploring states or skills in reward-free environments. Existing methods often overlook the fitting ability of pre-trained policies and struggle to handle the heterogeneous pre-training data. We propose Exploratory Diffusion Policy (EDP), which leverages the strong expressive ability of diffusion models to fit the explored data.
arXiv Detail & Related papers (2025-02-11T05:48:51Z)
Learned Reference-based Diffusion Sampling for multi-modal distributions [2.1383136715042417]
We introduce Learned Reference-based Diffusion Sampler (LRDS), a methodology specifically designed to leverage prior knowledge on the location of the target modes. LRDS proceeds in two steps by learning a reference diffusion model on samples located in high-density space regions. We experimentally demonstrate that LRDS best exploits prior knowledge on the target distribution compared to competing algorithms on a variety of challenging distributions.
arXiv Detail & Related papers (2024-10-25T10:23:34Z)
Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
Operator World Models for Reinforcement Learning [37.69110422996011]
Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making. It is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We introduce a novel approach based on learning a world model of the environment using conditional mean embeddings.
arXiv Detail & Related papers (2024-06-28T12:05:47Z)
Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z)
Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM). Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain. We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments. Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization [18.627233013208834]
We show that the use of importance sampling could introduce high variance in the objective estimate. We propose a technique called sample dropout to bound the estimation variance by dropping out samples when their ratio deviation is too high.
arXiv Detail & Related papers (2023-02-05T04:44:35Z)
Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks. We present a generalization bound for meta-learning, which was first derived by Rothfuss et al. We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.