H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- URL: http://arxiv.org/abs/2309.12716v1
- Date: Fri, 22 Sep 2023 08:58:22 GMT
- Title: H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- Authors: Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan
- Abstract summary: We develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods.
We demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving real-world complex tasks using reinforcement learning (RL) without
high-fidelity simulation environments or large amounts of offline data can be
quite challenging. Online RL agents trained in imperfect simulation
environments can suffer from severe sim-to-real issues. Offline RL approaches,
although they bypass the need for simulators, often impose demanding requirements on
the size and quality of the offline datasets. The recently emerged hybrid
offline-and-online RL paradigm provides an attractive framework that enables joint use
of limited offline data and an imperfect simulator for transferable policy
learning. In this paper, we develop a new algorithm, called H2O+, which offers
great flexibility to bridge various choices of offline and online learning
methods, while also accounting for dynamics gaps between the real and
simulation environments. Through extensive simulation and real-world robotics
experiments, we demonstrate superior performance and flexibility over advanced
cross-domain online and offline RL algorithms.
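To make the idea of jointly using limited offline data and an imperfect simulator concrete, here is a minimal tabular sketch. It is not the H2O+ algorithm. It alternates Q-learning sweeps over real offline transitions and simulator transitions, and scales each simulated update by 1 / (1 + gap), where `gap` is a hypothetical, hand-supplied dynamics-gap estimate per state-action pair. The weighting rule, the function names, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict

# Transition = (state, action, reward, next_state, done); all data below is a hypothetical toy.
def td_update(q, transition, actions, alpha, gamma, weight=1.0):
    """Weighted one-step Q-learning update: Q(s,a) += alpha * weight * (target - Q(s,a))."""
    s, a, r, s2, done = transition
    target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * weight * (target - q[(s, a)])

def hybrid_q_learning(offline_data, sim_data, gap, actions,
                      epochs=2000, alpha=0.1, gamma=0.9):
    """Alternate sweeps over real offline data and simulated data.

    gap maps (state, action) -> non-negative dynamics-gap estimate (assumed given);
    simulated updates are scaled by the illustrative weight 1 / (1 + gap).
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for t in offline_data:  # real transitions: full weight
            td_update(q, t, actions, alpha, gamma)
        for t in sim_data:      # simulator transitions: down-weighted where the gap is large
            w = 1.0 / (1.0 + gap.get((t[0], t[1]), 0.0))
            td_update(q, t, actions, alpha, gamma, weight=w)
    return q

# Toy one-step task: the simulator misjudges action "a", which is flagged with a large gap.
offline = [(0, "a", 1.0, 0, True), (0, "b", 0.2, 0, True)]
sim = [(0, "a", -1.0, 0, True), (0, "b", 0.8, 0, True)]
actions = ["a", "b"]

q_aware = hybrid_q_learning(offline, sim, gap={(0, "a"): 9.0}, actions=actions)
q_naive = hybrid_q_learning(offline, sim, gap={}, actions=actions)
print(max(actions, key=lambda b: q_aware[(0, b)]))  # prints a
print(max(actions, key=lambda b: q_naive[(0, b)]))  # prints b
```

With the gap-aware weight, the real data dominates the estimate for the mis-modeled action, so the greedy choice is "a". With uniform weights, the simulator's error flips the choice to "b". Practical methods estimate the gap from data rather than supplying it by hand.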
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
(2024-08-15T22:27:00Z)
- A Benchmark Environment for Offline Reinforcement Learning in Racing Games
Offline Reinforcement Learning (ORL) is a promising approach to reduce the high sample complexity of traditional Reinforcement Learning (RL).
This paper introduces OfflineMania, a novel environment for ORL research.
It is inspired by the iconic TrackMania series and developed using the Unity 3D game engine.
(2024-07-12T16:44:03Z)
- Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators
We outline four principal challenges for combining offline data with imperfect simulators in reinforcement learning.
These challenges include simulator modeling error, partial observability, state and action discrepancies, and hidden confounding.
Our results underscore the need for such benchmarks in future research.
(2024-06-30T19:22:59Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
(2023-06-12T05:10:10Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds: pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
(2023-05-17T15:17:23Z)
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL) in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
We adapt the classical Q-learning/iteration algorithm to the hybrid setting; we call the resulting algorithm Hybrid Q-Learning, or Hy-Q.
We show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks.
(2022-10-13T04:19:05Z)
- When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
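We propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to answer affirmatively whether limited real-world offline data and an imperfect simulator can be combined to address the drawbacks of both offline and online RL.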
H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes the Q function learning on simulated state-action pairs with large dynamics gaps.
We demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms.
(2022-06-27T17:18:11Z)
- Offline Reinforcement Learning from Images with Latent Space Models
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
(2020-12-21T18:28:17Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
(2020-06-16T17:54:41Z)
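The H2O entry above adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps. One way to quantify such a gap, borrowed here as an assumption from classifier-based domain-adaptation methods such as DARC, is the ratio P_real(s'|s,a) / P_sim(s'|s,a), recovered from two binary domain classifiers via Bayes' rule. The sketch below shows only that arithmetic. The classifiers are assumed to be already trained on the same real/simulated data mix, so the class-prior odds cancel. `dynamics_ratio` and `sim_weight` are hypothetical helpers, not H2O's code.

```python
import math

def dynamics_ratio(p_real_sas, p_real_sa, eps=1e-6):
    """Estimate P_real(s'|s,a) / P_sim(s'|s,a) from two domain classifiers.

    p_real_sas: classifier probability that (s, a, s') came from the real domain.
    p_real_sa:  classifier probability that (s, a) came from the real domain.
    Both classifiers are assumed to see the same real/sim mix, so the prior odds cancel.
    """
    p_sas = min(max(p_real_sas, eps), 1.0 - eps)  # clip away from 0 and 1 before taking logs
    p_sa = min(max(p_real_sa, eps), 1.0 - eps)
    # log-odds of the (s, a, s') classifier minus log-odds of the (s, a) classifier
    log_ratio = (math.log(p_sas) - math.log(1.0 - p_sas)
                 - math.log(p_sa) + math.log(1.0 - p_sa))
    return math.exp(log_ratio)

def sim_weight(p_real_sas, p_real_sa, max_weight=10.0):
    """Clipped importance weight for a simulated transition (illustrative choice)."""
    return min(dynamics_ratio(p_real_sas, p_real_sa), max_weight)

print(round(dynamics_ratio(0.2, 0.5), 3))  # prints 0.25
print(sim_weight(0.99, 0.01))              # prints 10.0
```

A ratio below 1 means the simulated next state is less likely under real dynamics, so a weight like this could shrink that transition's contribution to the Bellman error. The clip at `max_weight` is an illustrative safeguard against exploding importance weights.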