Behavioral Priors and Dynamics Models: Improving Performance and Domain
Transfer in Offline RL
- URL: http://arxiv.org/abs/2106.09119v2
- Date: Fri, 18 Jun 2021 04:46:41 GMT
- Title: Behavioral Priors and Dynamics Models: Improving Performance and Domain
Transfer in Offline RL
- Authors: Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
- Abstract summary: We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
- Score: 82.93243616342275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline Reinforcement Learning (RL) aims to extract near-optimal policies
from imperfect offline data without additional environment interactions.
Extracting policies from diverse offline datasets has the potential to expand
the range of applicability of RL by making the training process safer, faster,
and more streamlined. We investigate how to improve the performance of offline
RL algorithms, their robustness to the quality of offline data, and their
generalization capabilities. To this end, we introduce Offline Model-based RL
with Adaptive Behavioral Priors (MABE). Our algorithm is based on the finding
that dynamics models, which support within-domain generalization, and
behavioral priors, which support cross-domain generalization, are
complementary. When combined together, they substantially improve the
performance and generalization of offline RL policies. In the widely studied
D4RL offline RL benchmark, we find that MABE achieves higher average
performance compared to prior model-free and model-based algorithms. In
experiments that require cross-domain generalization, we find that MABE
outperforms prior methods. Our website is available at
https://sites.google.com/berkeley.edu/mabe .
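As a rough illustration of the idea in the abstract, the sketch below performs a single MABE-style policy update: actions sampled from the policy are evaluated under a learned dynamics model, while a weighted KL term regularizes the policy toward a behavioral prior. Everything here (toy dimensions, untrained stand-in networks, the fixed `kl_weight`) is an assumption for illustration; it is not the authors' implementation, which trains these components on offline data and adapts the prior weight to the dataset.
```python
# Minimal, hypothetical sketch of a MABE-style policy update (PyTorch).
# Assumptions: toy dimensions, untrained stand-in networks, and a fixed KL weight.
import torch
import torch.nn as nn
import torch.distributions as D

obs_dim, act_dim = 11, 3


def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))


def gaussian_policy(net, obs):
    # Interpret the network output as mean and log-std of a diagonal Gaussian.
    mean, log_std = net(obs).chunk(2, dim=-1)
    return D.Normal(mean, log_std.clamp(-5.0, 2.0).exp())


dynamics = mlp(obs_dim + act_dim, obs_dim + 1)  # learned model: (s, a) -> (s', r)
behavior_prior = mlp(obs_dim, 2 * act_dim)      # prior fit to the offline dataset
policy = mlp(obs_dim, 2 * act_dim)              # policy being optimized
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(256, obs_dim)                 # stand-in for a batch of offline states

pi = gaussian_policy(policy, obs)
prior = gaussian_policy(behavior_prior, obs)

# Evaluate policy actions under the learned dynamics model (reparameterized sample
# so gradients flow back into the policy); the predicted reward stands in for a
# longer model-based rollout or a learned value estimate.
actions = pi.rsample()
model_reward = dynamics(torch.cat([obs, actions], dim=-1))[:, -1].mean()

# Behavioral-prior regularization: keep the policy close to dataset behavior.
kl_weight = 1.0
kl = D.kl_divergence(pi, prior).sum(dim=-1).mean()

loss = -(model_reward - kl_weight * kl)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
The sketch mirrors the complementarity the abstract describes: the dynamics model supplies synthetic experience within the training domain, while the KL term toward the behavioral prior keeps the policy on-distribution, which is what supports transfer across domains.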
Related papers
- Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization [1.631115063641726]
We propose a framework that enhances PPO algorithms by incorporating a diffusion model to generate high-quality virtual trajectories for offline datasets.
Our contributions are threefold: we explore the potential of diffusion models in RL, particularly for offline datasets, extend the application of online RL to offline environments, and experimentally validate the performance improvements of PPO with diffusion models.
arXiv Detail & Related papers (2024-09-02T19:10:32Z)
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Contextual Transformer for Offline Meta Reinforcement Learning [16.587320914107128]
We show how prompts can improve sequence modeling-based offline reinforcement learning (offline RL) algorithms.
We propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.
We extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks.
arXiv Detail & Related papers (2022-11-15T10:00:14Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them with rewards artificially penalized by the uncertainty of the learned dynamics (a toy sketch of such a penalty appears after this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
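Following up on the MOPO entry above, here is a tiny, self-contained sketch of an uncertainty-penalized reward. The ensemble is a random stand-in rather than a trained model, and the disagreement measure is a simplified proxy for the variance-based penalty described in that paper; the function name `penalized_reward` and the weight `lam` are illustrative assumptions, not the paper's code.
```python
# Hypothetical sketch of an uncertainty-penalized reward in the spirit of MOPO (NumPy).
# Assumption: a stand-in ensemble of random linear models; in practice the ensemble is
# trained on the offline dataset and the penalty uses the models' predicted variances.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, ensemble_size = 11, 3, 5

# Each stand-in member maps (state, action) -> (next_state, reward).
ensemble = [rng.normal(size=(obs_dim + act_dim, obs_dim + 1)) for _ in range(ensemble_size)]


def penalized_reward(state, action, lam=1.0):
    x = np.concatenate([state, action])
    preds = np.stack([x @ w for w in ensemble])            # (ensemble_size, obs_dim + 1)
    reward_pred = preds[:, -1].mean()                       # mean predicted reward
    # Disagreement of the next-state predictions as an uncertainty proxy.
    deviations = np.linalg.norm(preds[:, :-1] - preds[:, :-1].mean(axis=0), axis=-1)
    return reward_pred - lam * deviations.max()


print(penalized_reward(rng.normal(size=obs_dim), rng.normal(size=act_dim)))
```
Policy optimization then proceeds on model-generated rollouts using this pessimistic reward, which discourages the policy from exploiting regions where the learned dynamics are unreliable.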
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.