Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
- URL: http://arxiv.org/abs/2305.15260v4
- Date: Tue, 29 Oct 2024 09:41:58 GMT
- Title: Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
- Authors: Qi Wang, Junming Yang, Yunbo Wang, Xin Jin, Wenjun Zeng, Xiaokang Yang
- Abstract summary: This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
- Score: 93.99377042564919
- Abstract: Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.
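The abstract describes the approach only at a high level. As a rough illustration of what a "flexible" value constraint could look like, the sketch below penalizes the offline critic only where its estimate exceeds a reference value produced by an auxiliary critic with access to an online simulator, rather than suppressing all values uniformly. The names, the quadratic penalty, and the weighting coefficient are illustrative assumptions, not the actual CoWorld implementation.

```python
# Hypothetical sketch of a "flexible" value constraint: the offline critic is
# pushed down only where it exceeds a simulator-informed reference estimate,
# instead of being uniformly conservative. Illustrative only.
import torch
import torch.nn.functional as F

def flexible_value_penalty(offline_q, reference_q, margin=0.0):
    """Penalize only the part of the offline Q-estimate that exceeds the
    reference estimate (plus an optional slack margin)."""
    excess = torch.relu(offline_q - (reference_q + margin))
    return excess.pow(2).mean()

def critic_loss(offline_critic, reference_critic, batch, gamma=0.99, beta=1.0):
    # batch: tensors sampled from the offline dataset
    s, a, r, s_next, a_next = batch
    with torch.no_grad():
        td_target = r + gamma * offline_critic(s_next, a_next)
        reference_q = reference_critic(s, a)  # critic trained with simulator access
    q = offline_critic(s, a)
    bellman = F.mse_loss(q, td_target)          # standard TD regression
    penalty = flexible_value_penalty(q, reference_q)
    return bellman + beta * penalty
```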
Related papers
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods (a generic ensemble-critic sketch in this spirit appears after this list).
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
- Survival Instinct in Offline Reinforcement Learning [28.319886852612672]
Offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels.
We demonstrate that this surprising property is attributable to an interplay between the notion of pessimism in offline RL algorithms and certain implicit biases in common data collection practices.
Our empirical and theoretical results suggest a new paradigm for RL, whereby an agent is nudged to learn a desirable behavior with imperfect reward but purposely biased data coverage.
arXiv Detail & Related papers (2023-06-05T22:15:39Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning [7.786094194874359]
We propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question.
H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes the Q function learning on simulated state-action pairs with large dynamics gaps.
We demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms.
arXiv Detail & Related papers (2022-06-27T17:18:11Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- Offline Reinforcement Learning Hands-On [60.36729294485601]
Offline RL aims to turn large datasets into powerful decision-making engines without any online interactions with the environment.
This work aims to reflect upon these efforts from a practitioner viewpoint.
We experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL.
arXiv Detail & Related papers (2020-11-29T14:45:02Z)
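The ENOTO entry above mentions enlarging the set of Q-networks to bridge offline pre-training and online fine-tuning. Its abstract gives no implementation details, so the following is a minimal sketch of the generic ensemble-critic pattern such methods build on; the module layout and the `pessimism` coefficient are illustrative assumptions rather than ENOTO's actual design.

```python
# Generic Q-ensemble pattern: several critics, a pessimistic aggregate for
# offline training, and a less pessimistic aggregate once online fine-tuning
# starts. Illustrative sketch, not the ENOTO code.
import torch
import torch.nn as nn

class QEnsemble(nn.Module):
    def __init__(self, state_dim, action_dim, n_critics=10, hidden=256):
        super().__init__()
        self.critics = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_critics)
        ])

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        # Stack per-critic estimates: shape (n_critics, batch, 1)
        return torch.stack([q(x) for q in self.critics], dim=0)

    def aggregate(self, state, action, pessimism=1.0):
        """Lower-confidence-bound aggregate: mean - pessimism * std."""
        qs = self.forward(state, action)
        return qs.mean(dim=0) - pessimism * qs.std(dim=0)
```

During offline pre-training, a large pessimism coefficient keeps the aggregate value conservative; once online fine-tuning begins, it can be annealed toward zero so exploration is no longer suppressed.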