When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.13464v1
- Date: Mon, 27 Jun 2022 17:18:11 GMT
- Title: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online
Reinforcement Learning
- Authors: Haoyi Niu, Shubham Sharma, Yiwen Qiu, Ming Li, Guyue Zhou, Jianming
Hu, Xianyuan Zhan
- Abstract summary: We propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to give an affirmative answer to the question of whether learning from limited real data can be combined with unrestricted exploration in an imperfect simulator.
H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes the Q function learning on simulated state-action pairs with large dynamics gaps.
We demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms.
- Score: 7.786094194874359
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning effective reinforcement learning (RL) policies to solve real-world
complex tasks can be quite challenging without a high-fidelity simulation
environment. In most cases, we are only given imperfect simulators with
simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL
policy learning. The recently emerged field of offline RL provides another
possibility to learn policies directly from pre-collected historical data.
However, to achieve reasonable performance, existing offline RL algorithms need
impractically large offline data with sufficient state-action space coverage
for training. This brings up a new question: is it possible to combine learning
from limited real data in offline RL and unrestricted exploration through
imperfect simulators in online RL to address the drawbacks of both approaches?
In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online
Reinforcement Learning (H2O) framework to provide an affirmative answer to this
question. H2O introduces a dynamics-aware policy evaluation scheme, which
adaptively penalizes the Q function learning on simulated state-action pairs
with large dynamics gaps, while also simultaneously allowing learning from a
fixed real-world dataset. Through extensive simulation and real-world tasks, as
well as theoretical analysis, we demonstrate the superior performance of H2O
against other cross-domain online and offline RL algorithms. H2O provides a
brand new hybrid offline-and-online RL paradigm, which can potentially shed
light on future RL algorithm design for solving practical real-world tasks.
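To make the dynamics-aware policy evaluation concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a standard Bellman loss over both the fixed real-world dataset and simulator rollouts, plus a penalty that pushes down Q-values on simulated state-action pairs in proportion to an estimated dynamics gap. The function gap_weight is a hypothetical stand-in for H2O's dynamics-gap estimator, and the loss shape follows the abstract's description rather than the paper's exact objective.

import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def dynamics_aware_q_loss(q, q_target, real_batch, sim_batch, gap_weight,
                          gamma=0.99, alpha=1.0):
    # Bellman error on both the fixed real-world dataset and simulator rollouts.
    bellman = 0.0
    for obs, act, rew, next_obs, next_act in (real_batch, sim_batch):
        with torch.no_grad():
            target = rew + gamma * q_target(next_obs, next_act)
        bellman = bellman + ((q(obs, act) - target) ** 2).mean()

    # Adaptive penalty: lower Q on simulated state-action pairs, scaled by how
    # large their estimated dynamics gap is (large gap means less trust).
    sim_obs, sim_act = sim_batch[0], sim_batch[1]
    w = gap_weight(sim_obs, sim_act)            # assumed to return weights in [0, 1]
    penalty = (w * q(sim_obs, sim_act)).mean()

    return bellman + alpha * penalty

A training step would alternate this critic loss with an ordinary actor update (e.g., SAC-style), drawing real_batch from the offline dataset and sim_batch from a replay buffer filled by simulator exploration; alpha trades off conservatism against simulator exploitation.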
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps [31.608209251850553]
We develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods.
We demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
arXiv Detail & Related papers (2023-09-22T08:58:22Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments [11.272582555795989]
We study whether agents can leverage offline data, in the form of trajectories, to improve sample efficiency in procedurally generated environments.
We consider two settings for using imitation learning (IL) from offline data: (1) pre-training a policy before online RL training, and (2) training a policy concurrently with online RL and IL from offline data.
arXiv Detail & Related papers (2023-04-18T16:23:15Z)
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient [42.47810044648846]
We consider a hybrid reinforcement learning setting (Hybrid RL) in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
We adapt the classical Q learning/iteration algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q.
We show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks.
arXiv Detail & Related papers (2022-10-13T04:19:05Z)
- Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning [0.8399688944263843]
Deep reinforcement learning (DRL) is a promising approach to solve complex control tasks by learning policies through interactions with the environment.
Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world.
This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real-time.
arXiv Detail & Related papers (2022-03-04T10:27:01Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
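Where H2O intervenes on the critic, the AWAC entry above works on the actor side. Below is a minimal, illustrative sketch of an advantage-weighted policy update in that spirit, assuming policy(obs) returns a torch.distributions object with per-sample log_prob (e.g., an Independent Normal) and q is a critic like the QNet sketched earlier; the one-sample value baseline and the temperature lam are simplifications, not the paper's exact recipe.

import torch

def awac_policy_loss(policy, q, obs, act, lam=1.0):
    # Weight the log-likelihood of dataset/replay actions by exp(advantage / lam),
    # so the actor imitates the actions that the current critic scores well.
    dist = policy(obs)
    with torch.no_grad():
        baseline = q(obs, dist.sample())                 # one-sample estimate of V(s)
        adv = q(obs, act) - baseline
        weights = torch.exp(adv / lam).clamp(max=100.0)  # clip for numerical stability
    return -(weights * dist.log_prob(act)).mean()

During fine-tuning, obs and act would be drawn from a buffer that mixes the prior demonstration data with freshly collected online experience, which is the hybrid offline-and-online ingredient these papers share.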