Related papers: Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

URL: http://arxiv.org/abs/2307.04726v4
Date: Fri, 06 Jun 2025 11:23:26 GMT
Title: Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning
Authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur,
Abstract summary: We propose a novel method that incorporates state reconstruction feature learning in the recent class of diffusion policies.<n>Our method promotes learning of generalizable state representation to alleviate the distribution shift caused by OOD states.
Score: 1.7205106391379026
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. However, they face challenges handling distribution shifts due to the lack of online interaction during training. To this end, we propose a novel method named State Reconstruction for Diffusion Policies (SRDP) that incorporates state reconstruction feature learning in the recent class of diffusion policies to address the problem of out-of-distribution (OOD) generalization. Our method promotes learning of generalizable state representation to alleviate the distribution shift caused by OOD states. To illustrate the OOD generalization and faster convergence of SRDP, we design a novel 2D Multimodal Contextual Bandit environment and realize it on a 6-DoF real-world UR10 robot, as well as in simulation, and compare its performance with prior algorithms. In particular, we show the importance of the proposed state reconstruction via ablation studies. In addition, we assess the performance of our model on standard continuous control benchmarks (D4RL), namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results. Finally, we demonstrate that our method can achieve 167% improvement over the competing baseline on a sparse continuous control navigation task where various regions of the state space are removed from the offline RL dataset, including the region encapsulating the goal.

Related papers

Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning [11.523827300607541]
We propose DIffusion policies with Value-conditional Optimization (DIVO)<n>DIVO generates high-quality, broadly covered in-distribution state-action samples while facilitating efficient policy improvement.<n>We evaluate DIVO on the D4RL benchmark and compare it against state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-12T02:59:07Z)
Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning [53.9544543607396]
We propose a novel framework that integrates reward rendering with Imitation from Observation (IfO)<n>By instantiating F-distance in different ways, we derive two theoretical analysis and develop a practical algorithm called Accessible State Oriented Policy Regularization (ASOR)<n>ASOR serves as a general add-on module that can be incorporated into various approaches RL, including offline RL and off-policy RL.
arXiv Detail & Related papers (2025-03-10T03:50:20Z)
DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning [22.323173093804897]
offline reinforcement learning can learn optimal policies from pre-collected offline datasets without interacting with the environment. Recent works address this issue by employing generative adversarial networks (GANs) Inspired by the diffusion, we propose a new offline RL method named Diffusion Policies with Generative Adversarial Networks (DiffPoGAN)
arXiv Detail & Related papers (2024-06-13T13:15:40Z)
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning [25.071018803326254]
Distribution shift is a major obstacle in offline reinforcement learning. Previous conservative offline RL algorithms struggle to generalize to unseen actions. We propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions.
arXiv Detail & Related papers (2024-06-11T17:59:29Z)
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
We propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy.<n>To address these challenges, we propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy.<n>To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
arXiv Detail & Related papers (2024-06-08T10:12:00Z)
State-Constrained Offline Reinforcement Learning [9.38848713730931]
We introduce state-constrained offline RL, a novel framework that focuses solely on the dataset's state distribution.<n>We also introduce StaCQ, a deep learning algorithm that achieves state-of-the-art performance on the D4RL benchmark datasets.
arXiv Detail & Related papers (2024-05-23T09:50:04Z)
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
offline reinforcement learning (RL) algorithms learn optimal polices using historical (offline) data. One of the main challenges in offline RL is the distribution shift. We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
Variational Latent Branching Model for Off-Policy Evaluation [23.073461349048834]
We propose a variational latent branching model (VLBM) to learn the transition function of Markov decision processes (MDPs) We introduce the branching architecture to improve the model's robustness against randomly model weights. We show that the VLBM outperforms existing state-of-the-art OPE methods in general.
arXiv Detail & Related papers (2023-01-28T02:20:03Z)
Model-based trajectory stitching for improved behavioural cloning and its applications [7.462336024223669]
Trajectory Stitching (TS) generates new trajectories by stitching' pairs of states that were disconnected in the original data. We demonstrate that the iterative process of replacing old trajectories with new ones incrementally improves the underlying behavioural policy.
arXiv Detail & Related papers (2022-12-08T14:18:04Z)
Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability. Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged. We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
A Policy-Guided Imitation Approach for Offline Reinforcement Learning [9.195775740684248]
We introduce Policy-guided Offline RL (textttPOR) textttPOR demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline RL.
arXiv Detail & Related papers (2022-10-15T15:54:28Z)
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset. We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy. We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts [11.4219428942199]
Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model. In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework. BIFRL empowers the agent to both reach to and explore from high-value states in a more efficient manner.
arXiv Detail & Related papers (2022-08-04T04:04:05Z)
Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values. It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning [62.19209005400561]
offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets. A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy. We regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process.
arXiv Detail & Related papers (2022-06-14T20:56:16Z)
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
offline reinforcement learning can exploit the massive amount of offline data for complex decision-making tasks. Current offline RL algorithms are generally designed to be conservative for value estimation and action selection. We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z)
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE) MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary. In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
MOPO: Model-based Offline Policy Optimization [183.6449600580806]
offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. We show that an existing model-based RL algorithm already produces significant gains in the offline setting. We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.