Diffusion Policies for Out-of-Distribution Generalization in Offline
Reinforcement Learning
- URL: http://arxiv.org/abs/2307.04726v3
- Date: Mon, 4 Sep 2023 02:40:31 GMT
- Title: Diffusion Policies for Out-of-Distribution Generalization in Offline
Reinforcement Learning
- Authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur
- Abstract summary: Offline RL methods leverage previous experiences to learn better policies than the behavior policy used for data collection.
However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training.
We introduce a novel method named State Reconstruction for Diffusion Policies (SRDP), incorporating state reconstruction feature learning in the recent class of diffusion policies.
- Score: 1.9336815376402723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) methods leverage previous experiences to
learn better policies than the behavior policy used for data collection. In
contrast to behavior cloning, which assumes the data is collected from expert
demonstrations, offline RL can work with non-expert data and multimodal
behavior policies. However, offline RL algorithms face challenges in handling
distribution shifts and effectively representing policies due to the lack of
online interaction during training. Prior work on offline RL uses conditional
diffusion models to represent multimodal behavior in the dataset. Nevertheless,
these methods are not tailored to improve generalization to out-of-distribution
(OOD) states. We introduce a novel method named State Reconstruction for
Diffusion Policies (SRDP), which incorporates state reconstruction feature
learning into the recent class of diffusion policies to address the OOD
generalization problem. The state reconstruction loss promotes generalizable
state representation learning, alleviating the distribution shift incurred by
OOD states. We design a novel 2D Multimodal
Contextual Bandit environment to illustrate the OOD generalization and faster
convergence of SRDP compared to prior algorithms. In addition, we assess the
performance of our model on D4RL continuous control benchmarks, namely the
navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and
walker2d, achieving state-of-the-art results.
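The abstract describes SRDP only at a high level: a conditional diffusion policy over actions trained jointly with an auxiliary state reconstruction loss. The minimal PyTorch sketch below illustrates that combination under stated assumptions; the module names, network sizes, timestep encoding, and loss weighting are illustrative and are not taken from the authors' implementation.

```python
import torch
import torch.nn as nn

class DiffusionPolicyWithStateRecon(nn.Module):
    """Sketch: conditional diffusion policy over actions plus an auxiliary
    state reconstruction head (names and sizes are illustrative)."""

    def __init__(self, state_dim, action_dim, hidden=256, latent=64):
        super().__init__()
        # Shared state encoder: its features condition the diffusion model
        # and feed the reconstruction decoder.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent))
        # Decoder used only for the auxiliary state reconstruction loss.
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, state_dim))
        # Noise-prediction network epsilon_theta(a_k, k | z(s)); the diffusion
        # step k is passed as a single scalar feature here for brevity.
        self.eps_net = nn.Sequential(
            nn.Linear(latent + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def losses(self, state, action, alpha_bar):
        """alpha_bar: (K,) cumulative noise schedule of a DDPM-style process."""
        z = self.encoder(state)
        # Standard denoising-diffusion objective on actions from the dataset.
        k = torch.randint(0, alpha_bar.shape[0], (state.shape[0],),
                          device=state.device)
        noise = torch.randn_like(action)
        ab = alpha_bar[k].unsqueeze(-1)
        noisy_action = ab.sqrt() * action + (1.0 - ab).sqrt() * noise
        eps_pred = self.eps_net(
            torch.cat([z, noisy_action, k.float().unsqueeze(-1)], dim=-1))
        diffusion_loss = ((eps_pred - noise) ** 2).mean()
        # Auxiliary state reconstruction loss: encourages state features that
        # stay informative for OOD states, as described in the abstract.
        recon_loss = ((self.decoder(z) - state) ** 2).mean()
        return diffusion_loss, recon_loss
```

A training step would then minimize `diffusion_loss + w * recon_loss` for some weight `w` (hypothetical here), optionally alongside the Q-learning guidance term used by diffusion-policy offline RL methods such as Diffusion-QL.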
Related papers
- DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning [22.323173093804897]
Offline reinforcement learning can learn optimal policies from pre-collected offline datasets without interacting with the environment.
Recent works address the distribution shift issue by employing generative adversarial networks (GANs).
Inspired by diffusion models, we propose a new offline RL method named Diffusion Policies with Generative Adversarial Networks (DiffPoGAN).
arXiv Detail & Related papers (2024-06-13T13:15:40Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is the distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- A Policy-Guided Imitation Approach for Offline Reinforcement Learning [9.195775740684248]
We introduce Policy-guided Offline RL (POR).
POR demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline RL.
arXiv Detail & Related papers (2022-10-15T15:54:28Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning [62.19209005400561]
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to learning purely from static datasets.
A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy.
We regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process.
arXiv Detail & Related papers (2022-06-14T20:56:16Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
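The MOPO entry above compresses its key mechanism into one sentence: rewards from the learned model are penalized by a dynamics-uncertainty estimate. Below is a minimal sketch of that penalty, assuming an ensemble of probabilistic dynamics models whose predictive standard deviation serves as the uncertainty proxy; the function name and the coefficient `lam` are illustrative assumptions, not MOPO's reference implementation.

```python
import torch

def penalized_reward(reward_pred, ensemble_std, lam=1.0):
    """Uncertainty-penalized reward in the spirit of MOPO:
    r_tilde(s, a) = r_hat(s, a) - lam * u(s, a),
    where u(s, a) is taken here as the largest predictive-std norm across an
    ensemble of learned dynamics models (an illustrative choice)."""
    # ensemble_std: (num_models, batch, state_dim) predictive std of next state.
    u = ensemble_std.norm(dim=-1).max(dim=0).values  # (batch,) uncertainty proxy
    return reward_pred - lam * u

# Example with random tensors standing in for model outputs.
r_hat = torch.randn(32)        # predicted rewards for a batch of (s, a) pairs
stds = torch.rand(5, 32, 17)   # ensemble of 5 models, 17-dimensional state
r_tilde = penalized_reward(r_hat, stds, lam=0.5)
```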