Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators
- URL: http://arxiv.org/abs/2407.00806v1
- Date: Sun, 30 Jun 2024 19:22:59 GMT
- Title: Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators
- Authors: Ori Linial, Guy Tennenholtz, Uri Shalit
- Abstract summary: We outline four principal challenges for combining offline data with imperfect simulators in reinforcement learning.
These challenges include simulator modeling error, partial observability, state and action discrepancies, and hidden confounding.
Our results underscore the necessity of such benchmarks for future research.
- Score: 16.740841615738642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many reinforcement learning (RL) applications one cannot easily let the agent act in the world; this is true for autonomous vehicles, healthcare applications, and even some recommender systems, to name a few examples. Offline RL provides a way to train agents without real-world exploration, but it often suffers from biases due to data distribution shifts, limited coverage, and incomplete representation of the environment. To address these issues, practical applications have tried to combine simulators with grounded offline data, using so-called hybrid methods. However, constructing a reliable simulator is itself often challenging due to intricate system complexities as well as missing or incomplete information. In this work, we outline four principal challenges for combining offline data with imperfect simulators in RL: simulator modeling error, partial observability, state and action discrepancies, and hidden confounding. To help drive the RL community to pursue these problems, we construct "Benchmarks for Mechanistic Offline Reinforcement Learning" (B4MRL), which provide dataset-simulator benchmarks for the aforementioned challenges. Our results underscore the necessity of such benchmarks for future research.
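To make the dataset-simulator setting concrete, here is a minimal, hypothetical sketch (not the actual B4MRL code; every name and constant is illustrative) of a pair exhibiting two of the four challenges, simulator modeling error and biased offline coverage:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    """Ground-truth dynamics of a toy 1-D system (unknown to the agent)."""
    return 0.9 * s + a + 0.1 * rng.normal()

def sim_step(s, a, model_error=0.2):
    """Imperfect simulator: a biased drift term stands in for modeling error."""
    return (0.9 + model_error) * s + a

def collect_offline_data(n=1000, behavior_bias=0.5):
    """Biased behavior policy -> narrow state-action coverage in the log."""
    data, s = [], 0.0
    for _ in range(n):
        a = behavior_bias + 0.1 * rng.normal()  # actions cluster around the bias
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

offline = collect_offline_data()
# A hybrid method trains on `offline` while querying `sim_step` for coverage,
# and has to cope with the gap between simulated and real next states.
s0, a0, s1 = offline[0]
print("dynamics gap on first logged pair:", abs(sim_step(s0, a0) - s1))
```

The remaining two challenges, partial observability and hidden confounding, would enter this sketch as masking parts of the state in the log and as an unlogged variable influencing both actions and transitions.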
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Improving Offline Reinforcement Learning with Inaccurate Simulators [34.54402525918925]
We propose a novel approach that combines the offline dataset with inaccurate simulation data more effectively.
Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset.
Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and the limited offline datasets, achieving better performance than state-of-the-art methods.
arXiv Detail & Related papers (2024-05-07T13:29:41Z)
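The GAN pre-training step above is not reproduced here; as a hedged stand-in, the same idea, fitting the offline state distribution and using it to gate simulated data, can be sketched with a plain logistic-regression discriminator (all data, names, and hyperparameters are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: offline states cluster near 1.0, simulated states drift.
offline_states = rng.normal(1.0, 0.2, size=(500, 1))
sim_states = rng.normal(1.5, 0.4, size=(500, 1))

# Logistic-regression discriminator D(s) ~= P(s came from the offline data).
X = np.vstack([offline_states, sim_states])
y = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = offline, 0 = simulated
w, b = np.zeros(1), 0.0

for _ in range(2000):  # plain gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w += 0.5 * (X.T @ (y - p)) / len(y)
    b += 0.5 * np.mean(y - p)

# Down-weight simulated states the discriminator flags as out-of-distribution.
trust = 1.0 / (1.0 + np.exp(-(sim_states @ w + b)))
print("mean trust weight on simulated states:", float(trust.mean()))
```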
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps [31.608209251850553]
We develop a new algorithm, called H2O+, which offers great flexibility in bridging various choices of offline and online learning methods.
We demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
arXiv Detail & Related papers (2023-09-22T08:58:22Z)
- When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning [7.786094194874359]
We propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to answer the question of when to trust the simulator.
H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes the Q function learning on simulated state-action pairs with large dynamics gaps.
We demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms.
arXiv Detail & Related papers (2022-06-27T17:18:11Z)
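As a rough illustration of H2O's dynamics-aware penalty described above (a deliberate simplification: the real method estimates the dynamics gap adaptively rather than taking it as given, and `beta` here is a fixed illustrative coefficient):

```python
def dynamics_aware_target(r, q_next, dyn_gap, gamma=0.99, beta=1.0):
    """Hedged sketch: penalize the value target of a simulated transition in
    proportion to its estimated dynamics gap; real transitions pay no penalty."""
    return r + gamma * q_next - beta * dyn_gap

# Simulated transition whose next state disagrees with real dynamics by 0.3:
print(dynamics_aware_target(r=1.0, q_next=5.0, dyn_gap=0.3))  # penalized
# Real (offline) transition: no penalty.
print(dynamics_aware_target(r=1.0, q_next=5.0, dyn_gap=0.0))
```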
- DR2L: Surfacing Corner Cases to Robustify Autonomous Driving via Domain Randomization Reinforcement Learning [4.040937987024427]
Domain Randomization (DR) is a methodology that can bridge the sim-to-real gap with little or no real-world data.
An adversarial model is put forward to robustify DeepRL-based autonomous vehicles trained in simulation.
arXiv Detail & Related papers (2021-07-25T09:15:46Z)
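The core loop of domain randomization is simple enough to sketch; the following hypothetical snippet (parameter names and ranges invented for illustration, not DR2L's actual setup) resamples simulator nuisance parameters each episode:

```python
import random

def sample_sim_params():
    """Domain randomization: resample nuisance parameters every episode so the
    policy cannot overfit a single simulator instance (ranges illustrative)."""
    return {
        "friction": random.uniform(0.5, 1.5),
        "sensor_noise_std": random.uniform(0.0, 0.1),
        "mass_scale": random.uniform(0.8, 1.2),
    }

for episode in range(3):
    print(f"episode {episode}: configure simulator with {sample_sim_params()}")
```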
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
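The advantage-weighting rule at AWAC's core is compact enough to sketch (a hedged toy version: the actual method estimates advantages with a learned critic, and `lam` is an illustrative temperature):

```python
import numpy as np

def awac_weights(advantages, lam=1.0):
    """AWAC regresses the policy toward dataset actions with exponential
    advantage weights, exp(A / lambda)."""
    w = np.exp(np.asarray(advantages) / lam)
    return w / w.sum()  # normalized for a stable weighted regression

adv = [-1.0, 0.0, 2.0]    # toy advantages of three logged actions
print(awac_weights(adv))  # high-advantage actions dominate the update
```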
- From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning [69.23334811890919]
Deep Reinforcement Learning has proven able to solve many control tasks in different fields, but these systems do not always behave as expected when deployed in real-world scenarios.
This is mainly due to the lack of domain adaptation between simulated and real-world data, together with the absence of a distinction between training and test datasets.
We present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios.
arXiv Detail & Related papers (2020-05-13T14:22:20Z)
- Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks [70.56451186797436]
We study how to use meta-reinforcement learning to solve the bulk of the problem in simulation.
We demonstrate our approach by training an agent to successfully perform challenging real-world insertion tasks.
arXiv Detail & Related papers (2020-04-29T18:00:22Z)