Improving Offline Reinforcement Learning with Inaccurate Simulators
- URL: http://arxiv.org/abs/2405.04307v1
- Date: Tue, 7 May 2024 13:29:41 GMT
- Title: Improving Offline Reinforcement Learning with Inaccurate Simulators
- Authors: Yiwen Hou, Haoyuan Sun, Jinming Ma, Feng Wu
- Abstract summary: We propose a novel approach that combines the offline dataset with inaccurate simulation data more effectively.
Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset.
Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and the limited offline dataset, achieving better performance than state-of-the-art methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL highly depends on the quality of the dataset, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, data collected from the inaccurate simulator cannot be used directly in offline RL due to the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address these issues, we propose a novel approach that combines the offline dataset with the inaccurate simulation data more effectively. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from states sampled from the generator and reweight the simulated data using the discriminator. Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and the limited offline dataset, achieving better performance than state-of-the-art methods.
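Read as pseudocode, the pipeline above amounts to: fit a GAN to the offline state distribution, reset the simulator to generator samples, and weight each simulated transition by the discriminator's realism score. The sketch below is a minimal illustration under assumed details (network sizes, a hypothetical `reset_to()` simulator hook, and the exact weighting scheme), not the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, NOISE_DIM = 17, 32  # e.g. a D4RL locomotion task (assumed sizes)

# Generator maps noise to synthetic start states; discriminator scores realism.
G = nn.Sequential(nn.Linear(NOISE_DIM, 128), nn.ReLU(), nn.Linear(128, STATE_DIM))
D = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real_states):
    """One vanilla-GAN update on a batch of states from the offline dataset."""
    n = real_states.size(0)
    fake = G(torch.randn(n, NOISE_DIM))
    # Discriminator: push offline states toward 1, generated states toward 0.
    d_loss = (bce(D(real_states), torch.ones(n, 1))
              + bce(D(fake.detach()), torch.zeros(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator.
    g_loss = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

def collect_weighted_rollout(sim_env, policy, horizon=50):
    """Roll out from a generator-sampled state and weight each transition by
    the discriminator's realism score. `sim_env.reset_to()` is a hypothetical
    hook for resetting the simulator to an arbitrary state."""
    with torch.no_grad():
        start = G(torch.randn(1, NOISE_DIM)).squeeze(0).numpy()
    obs = sim_env.reset_to(start)
    batch = []
    for _ in range(horizon):
        act = policy(obs)
        next_obs, rew, done, info = sim_env.step(act)
        with torch.no_grad():
            w = D(torch.as_tensor(next_obs, dtype=torch.float32)).item()
        batch.append((obs, act, rew, next_obs, done, w))  # w = sample weight
        if done:
            break
        obs = next_obs
    return batch
```

The weighted batch can then be mixed with the offline dataset in any standard offline RL learner, with `w` used to downweight transitions the discriminator deems unrealistic.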
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy (a minimal differentiable-rollout sketch appears after this list).
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling [34.547551367941246]
Real-world data collected from sensors or humans often contains noise and errors.
Traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption.
We propose Robust Decision Transformer (RDT) by incorporating several robust techniques.
arXiv Detail & Related papers (2024-07-05T06:34:32Z)
- Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators [16.740841615738642]
We outline four principal challenges for combining offline data with imperfect simulators in reinforcement learning.
These challenges include simulator modeling error, partial observability, state and action discrepancies, and hidden confounding.
Our results suggest that such benchmarks are essential for future research.
arXiv Detail & Related papers (2024-06-30T19:22:59Z)
- Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning [0.0]
We implement two methods to improve the trained models for wastewater treatment data.
Experimental results show that these methods improve simulator behavior, as measured by Dynamic Time Warping over a full year of data (a minimal DTW sketch appears after this list).
arXiv Detail & Related papers (2024-03-22T10:20:09Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Diffusion Dataset Generation: Towards Closing the Sim2Real Gap for Pedestrian Detection [0.11470070927586014]
We propose a novel method of synthetic data creation meant to close the sim2real gap for the pedestrian detection task.
Our method uses a diffusion-based architecture to learn a real-world distribution which, once trained, is used to generate datasets.
We show that training on a combination of generated and simulated data increases average precision by as much as 27.3% for pedestrian detection models in real-world data.
arXiv Detail & Related papers (2023-05-16T12:33:51Z)
- Continual learning autoencoder training for a particle-in-cell simulation via streaming [52.77024349608834]
The upcoming exascale era will provide a new generation of physics simulations with high resolution. Storing such large amounts of simulation data on disk is nearly impossible, which affects how machine learning models can be trained on them.
This work presents an approach that trains a neural network concurrently with a running simulation, without storing data on disk.
arXiv Detail & Related papers (2022-11-09T09:55:14Z)
- Training robust anomaly detection using ML-Enhanced simulations [1.370633147306388]
Simulations can provide edge conditions for anomaly detection which may be sparse or non-existent in real-world data.
Our approach enhances simulations using neural networks trained on real-world data to create outputs that are more realistic and variable than traditional simulations.
arXiv Detail & Related papers (2020-08-27T12:28:07Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
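As referenced in the analytic policy gradients (APG) entry above, training through a differentiable simulator amounts to backpropagating a trajectory cost through the dynamics at every step. The sketch below is a minimal illustration under assumed stand-ins: a 1-D double integrator replaces the AV simulator, and the quadratic goal-reaching loss, network sizes, and horizon are invented for the example.

```python
import torch
import torch.nn as nn

def dynamics(state, action, dt=0.05):
    """Toy differentiable dynamics: a 1-D double integrator.
    State = (position, velocity); gradients flow through this step."""
    pos, vel = state[..., 0], state[..., 1]
    vel = vel + dt * action.squeeze(-1)
    pos = pos + dt * vel
    return torch.stack([pos, vel], dim=-1)

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
target = torch.tensor([1.0, 0.0])  # reach position 1 and stop

for step in range(200):
    state = torch.zeros(2)
    loss = torch.tensor(0.0)
    # Differentiable rollout: the whole trajectory stays in the autograd
    # graph, which is the core of the analytic policy gradient.
    for t in range(40):
        action = policy(state)
        state = dynamics(state, action)
        loss = loss + ((state - target) ** 2).sum()
    opt.zero_grad()
    loss.backward()  # backprop through time and through the simulator
    opt.step()
```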
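The wastewater-treatment entry above evaluates simulator fidelity with Dynamic Time Warping (DTW). For reference, a minimal textbook DTW distance between two 1-D sequences looks like this; the sine curves are synthetic stand-ins for simulated versus measured signals.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Example: simulated vs. measured signal (synthetic numbers).
sim = np.sin(np.linspace(0, 6, 100))
real = np.sin(np.linspace(0, 6, 120) + 0.1)
print(dtw_distance(sim, real))
```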