DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
- URL: http://arxiv.org/abs/2502.02316v1
- Date: Tue, 04 Feb 2025 13:37:14 GMT
- Title: DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
- Authors: Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann
- Abstract summary: Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties.
We propose Diffusion-Based Maximum Entropy RL (DIME) to overcome the intractability of computing the marginal entropy of diffusion-based policies.
- Score: 37.420420953705396
- Abstract: Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.
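As a hedged illustration of the bound the abstract refers to (the notation here is ours, not necessarily the paper's): MaxEnt-RL augments the reward with the policy's entropy,

$$ J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t}\big(r(s_t, a_t) + \alpha\,\mathcal{H}(\pi(\cdot \mid s_t))\big)\Big]. $$

A diffusion policy draws an action $a = a^0$ as the endpoint of a reverse chain $a^K \to \dots \to a^0$ with tractable joint density $p_\theta(a^{0:K} \mid s)$, but the marginal entropy $\mathcal{H}(\pi(\cdot \mid s)) = -\mathbb{E}[\log p_\theta(a^0 \mid s)]$ requires an intractable marginalization over $a^{1:K}$. For any auxiliary noising process $q(a^{1:K} \mid a^0)$, the chain rule of probability and the nonnegativity of the KL divergence give the tractable lower bound

$$ \mathcal{H}(\pi(\cdot \mid s)) = \mathbb{E}_{p_\theta}\big[\log p_\theta(a^{1:K} \mid a^0, s) - \log p_\theta(a^{0:K} \mid s)\big] \ge \mathbb{E}_{p_\theta}\big[\log q(a^{1:K} \mid a^0) - \log p_\theta(a^{0:K} \mid s)\big], $$

since swapping the true conditional for $q$ only subtracts an expected KL term. Plugging this bound into $J(\pi)$ yields a surrogate that can be optimized jointly over the reverse chain and $q$. This is the general shape such bounds take in the diffusion-sampler literature; DIME's exact construction may differ.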
Related papers
- Maximum Entropy Reinforcement Learning with Diffusion Policy [24.889485955864547]
In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective.
Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy.
arXiv Detail & Related papers (2025-02-17T09:55:58Z)
- Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z)
- Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO).
Specifically, we introduce the Q-weighted variational loss, which is provably a tight lower bound of the policy objective in online RL under certain conditions; a minimal sketch of this style of loss appears after this list.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z)
- Diffusion Actor-Critic with Entropy Regulator [32.79341490514616]
We propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER).
This algorithm conceptualizes the reverse process of the diffusion model as a novel policy function.
Experiments on MuJoCo benchmarks and a multimodal task demonstrate that the DACER algorithm achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-05-24T03:23:27Z)
- Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning [11.0460569590737]
This paper presents advanced techniques for training diffusion policies for offline reinforcement learning (RL).
We show that the policy's sampling SDE admits a solution from which the log probability of the policy can be computed, yielding an entropy regularizer that improves the exploration of offline datasets; the change-of-variables identity behind this kind of computation is sketched after this list.
By combining the entropy-regularized diffusion policy with Q-ensembles in offline RL, our method achieves state-of-the-art performance on most tasks in D4RL benchmarks.
arXiv Detail & Related papers (2024-02-06T15:34:30Z)
- DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning [11.678012836760967]
Constrained policy search is a fundamental problem in offline reinforcement learning.
We propose a novel approach, Diffusion-based Constrained Policy Search (dubbed DiffCPS).
arXiv Detail & Related papers (2023-10-09T01:29:17Z)
- Efficient Diffusion Policies for Offline Reinforcement Learning [85.73757789282212]
Diffusion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model.
We propose efficient diffusion policy (EDP) to overcome these two challenges.
EDP constructs actions from corrupted ones at training to avoid running the sampling chain.
arXiv Detail & Related papers (2023-05-31T17:55:21Z)
- Policy Representation via Diffusion Probability Model for Reinforcement Learning [67.56363353547775]
We build a theoretical foundation of policy representation via the diffusion probability model.
We present a convergence guarantee for diffusion policy, which provides a theory to understand the multimodality of diffusion policy.
We propose DIPO, an implementation of model-free online RL with a DIffusion POlicy.
arXiv Detail & Related papers (2023-05-22T15:23:41Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL), which utilizes a conditional diffusion model to represent the policy; a minimal sketch of this type of objective appears after this list.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
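The following is a minimal sketch of the kind of Q-weighted denoising loss the QVPO entry above alludes to (see the forward reference there). It assumes an epsilon-prediction DDPM parameterization of the policy and a softmax Q-weight; all names (EpsNet, q_weighted_loss, the schedule tensor alphas_bar) are ours for illustration, not from the paper.

```python
import torch
import torch.nn as nn

class EpsNet(nn.Module):
    """Predicts the noise added to an action, conditioned on state and step k."""
    def __init__(self, state_dim, action_dim, hidden=64, K=10):
        super().__init__()
        self.K = K
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, k):
        k_feat = k.float().unsqueeze(-1) / self.K  # normalized step index
        return self.net(torch.cat([state, noisy_action, k_feat], dim=-1))


def q_weighted_loss(eps_net, q_net, state, action, alphas_bar):
    """Denoising loss on policy-sampled (state, action) pairs, re-weighted so
    that high-Q actions dominate the regression target."""
    B = action.shape[0]
    k = torch.randint(0, eps_net.K, (B,))
    ab = alphas_bar[k].unsqueeze(-1)                      # cumulative alpha_bar_k
    eps = torch.randn_like(action)
    noisy = ab.sqrt() * action + (1.0 - ab).sqrt() * eps  # forward noising of the action
    with torch.no_grad():                                 # weights carry no gradient
        w = torch.softmax(q_net(state, action).squeeze(-1), dim=0)
    per_sample = ((eps - eps_net(state, noisy, k)) ** 2).mean(dim=-1)
    return (w * per_sample).sum()
```

In a full training loop, `action` would be sampled from the current diffusion policy and `q_net` trained as an ordinary TD critic; the softmax weighting is one common choice, and the paper's exact weight function may differ.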
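The entropy-regularized offline-RL entry above states that the policy's SDE has a solution from which log probabilities can be computed. A standard identity that enables this (the probability-flow construction of Song et al.; whether that paper uses exactly this form is our assumption) replaces the sampling SDE with an ODE sharing the same marginals and integrates the divergence of its drift:

$$ \frac{\mathrm{d}a}{\mathrm{d}t} = \tilde{f}(a, t) := f(a, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_a \log p_t(a), \qquad \log p_0\big(a(0)\big) = \log p_T\big(a(T)\big) + \int_0^T \nabla_a \cdot \tilde{f}\big(a(t), t\big)\, \mathrm{d}t, $$

where $f$ and $g$ are the drift and diffusion coefficients of the forward SDE, the score $\nabla_a \log p_t$ is replaced by the learned network, and $p_T$ is the known terminal (noise) prior. The resulting log probability is what turns the policy's entropy into a computable regularizer.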
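Finally, a minimal sketch of a Diffusion-QL-style objective from the last entry: the standard denoising (behavior-cloning) loss on dataset actions plus a term that raises Q at actions sampled from the reverse chain. It reuses EpsNet from the sketch above; `eta` and the DDPM update rule are textbook choices, and the paper's exact coefficients may differ.

```python
import torch  # EpsNet as defined in the previous sketch

def sample_action(eps_net, state, action_dim, alphas, alphas_bar):
    """Reverse DDPM chain, kept differentiable so Q can be maximized through it."""
    a = torch.randn(state.shape[0], action_dim)
    for k in reversed(range(eps_net.K)):
        kk = torch.full((state.shape[0],), k)
        eps_hat = eps_net(state, a, kk)
        # Posterior mean of the DDPM reverse step.
        a = (a - (1.0 - alphas[k]) / (1.0 - alphas_bar[k]).sqrt() * eps_hat) / alphas[k].sqrt()
        if k > 0:  # no noise is added on the final step
            a = a + (1.0 - alphas[k]).sqrt() * torch.randn_like(a)
    return a


def diffusion_ql_loss(eps_net, q_net, state, data_action, alphas, alphas_bar, eta=1.0):
    # Behavior-cloning term: plain denoising loss on dataset actions.
    B, A = data_action.shape
    k = torch.randint(0, eps_net.K, (B,))
    ab = alphas_bar[k].unsqueeze(-1)
    eps = torch.randn_like(data_action)
    noisy = ab.sqrt() * data_action + (1.0 - ab).sqrt() * eps
    bc = ((eps - eps_net(state, noisy, k)) ** 2).mean()
    # Policy-improvement term: push sampled actions toward high Q.
    q = q_net(state, sample_action(eps_net, state, A, alphas, alphas_bar))
    return bc - eta * q.mean()
```

Minimizing this trades staying close to the data (the first term) against the critic's value of the policy's own samples (the second), which is the coupling the entry describes.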
This list is automatically generated from the titles and abstracts of the papers in this site.