Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2505.01822v1
- Date: Sat, 03 May 2025 14:00:25 GMT
- Title: Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
- Authors: Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao
- Abstract summary: Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process.
- Score: 54.07840818762834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and a closed-form solution of the intermediate guidance when the diffusion model obeys a conditional Gaussian transformation. We then analyze the posterior Gaussian distribution in the log-expectation formulation and obtain a target estimate of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approximate this target estimate. We apply our method to more than 30 offline RL tasks; extensive experiments show that it surpasses numerous representative baselines on the D4RL offline reinforcement learning benchmarks.
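To make the estimation problem concrete, the block below restates the standard energy-guided diffusion setup that this line of work (including the CEP paper listed below) builds on; the notation is assumed from that literature rather than copied from AEPO itself. The intermediate energy is exactly the log-expectation term the abstract describes as intractable, and the guided score differs from the unconditional score only by its gradient.

```latex
% Standard energy-guided diffusion setup (notation assumed, not verbatim from the paper).
% Target distribution: the data distribution q_0 reweighted by an energy function E.
p_0(\mathbf{x}_0) \;\propto\; q_0(\mathbf{x}_0)\, e^{-\beta\, \mathcal{E}(\mathbf{x}_0)}

% The guided marginal at diffusion time t keeps the same form with an
% *intermediate* energy E_t, defined through a log-expectation over the
% denoising posterior q(x_0 | x_t):
p_t(\mathbf{x}_t) \;\propto\; q_t(\mathbf{x}_t)\, e^{-\mathcal{E}_t(\mathbf{x}_t)},
\qquad
\mathcal{E}_t(\mathbf{x}_t) \;=\; -\log \mathbb{E}_{q(\mathbf{x}_0 \mid \mathbf{x}_t)}\!\left[ e^{-\beta\, \mathcal{E}(\mathbf{x}_0)} \right]

% Sampling from p_t only requires the guided score, i.e. the unconditional
% score shifted by the gradient of the intermediate energy:
\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)
\;=\; \nabla_{\mathbf{x}_t} \log q_t(\mathbf{x}_t) \;-\; \nabla_{\mathbf{x}_t} \mathcal{E}_t(\mathbf{x}_t)
```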
Related papers
- Self-Refining Training for Amortized Density Functional Theory [5.5541132320126945]
We propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy. We derive our method as a minimization of the variational upper bound on the KL-divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground state energy.
arXiv Detail & Related papers (2025-06-02T00:32:32Z) - Energy-Weighted Flow Matching for Offline Reinforcement Learning [53.64306385597818]
This paper investigates energy guidance in generative modeling, where the target distribution is defined as $q(\mathbf{x}) \propto p(\mathbf{x})\exp(-\beta \mathcal{E}(\mathbf{x}))$, with $p(\mathbf{x})$ being the data distribution and $\mathcal{E}(\mathbf{x})$ the energy function. We introduce energy-weighted flow matching (EFM), a method that directly learns the energy-guided flow without the need for auxiliary models. We extend this methodology to energy-weighted ...
arXiv Detail & Related papers (2025-03-06T21:10:12Z) - Reward-Directed Score-Based Diffusion Models via q-Learning [8.725446812770791]
We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models for generative AI.
Our formulation does not involve any pretrained model for the unknown score functions of the noise-perturbed data distributions.
arXiv Detail & Related papers (2024-09-07T13:55:45Z) - A Score-Based Density Formula, with Applications in Diffusion Generative Models [6.76974373198208]
Score-based generative models (SGMs) have revolutionized the field of generative modeling, achieving unprecedented success in generating realistic and diverse content.
Despite empirical advances, the theoretical basis for why optimizing the evidence lower bound (ELBO) on the log-likelihood is effective for training diffusion generative models, such as DDPMs, remains largely unexplored.
arXiv Detail & Related papers (2024-08-29T17:59:07Z) - Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z) - Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation.
They still suffer from slow convergence in training and color shift issues in sampling.
In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z) - PDE+: Enhancing Generalization via PDE with Adaptive Distributional
Diffusion [66.95761172711073]
The generalization of neural networks is a central challenge in machine learning.
We propose to enhance it directly through the underlying function of neural networks, rather than focusing on adjusting input data.
We put this theoretical framework into practice as $\textbf{PDE}+$ ($\textbf{PDE}$ with $\textbf{A}$daptive $\textbf{D}$istributional $\textbf{D}$iffusion).
arXiv Detail & Related papers (2023-05-25T08:23:26Z) - Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling
in Offline Reinforcement Learning [44.880922634512096]
This paper considers a general setting where the guidance is defined by an (unnormalized) energy function.
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure is unknown and is hard to estimate.
We propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance (a minimal illustrative sketch of such an objective appears after this list).
arXiv Detail & Related papers (2023-04-25T13:50:41Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative
Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - An Energy-Based Prior for Generative Saliency [62.79775297611203]
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.
With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction.
Experimental results show that our generative saliency model with an energy-based prior can achieve not only accurate saliency predictions but also reliable uncertainty maps consistent with human perception.
arXiv Detail & Related papers (2022-04-19T10:51:00Z)