Offline Reinforcement Learning from Images with Latent Space Models
- URL: http://arxiv.org/abs/2012.11547v1
- Date: Mon, 21 Dec 2020 18:28:17 GMT
- Title: Offline Reinforcement Learning from Images with Latent Space Models
- Authors: Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn
- Abstract summary: Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
- Score: 60.69745540036375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) refers to the problem of learning
policies from a static dataset of environment interactions. Offline RL enables
extensive use and re-use of historical datasets, while also alleviating safety
concerns associated with online exploration, thereby expanding the real-world
applicability of RL. Most prior work in offline RL has focused on tasks with
compact state representations. However, the ability to learn directly from rich
observation spaces like images is critical for real-world applications such as
robotics. In this work, we build on recent advances in model-based algorithms
for offline RL, and extend them to high-dimensional visual observation spaces.
Model-based offline RL algorithms have achieved state-of-the-art results in
state-based tasks and have strong theoretical guarantees. However, they rely
crucially on the ability to quantify uncertainty in the model predictions,
which is particularly challenging with image observations. To overcome this
challenge, we propose to learn a latent-state dynamics model, and represent the
uncertainty in the latent space. Our approach is both tractable in practice and
corresponds to maximizing a lower bound of the ELBO in the unknown POMDP. In
experiments on a range of challenging image-based locomotion and manipulation
tasks, we find that our algorithm significantly outperforms previous offline
model-free RL methods as well as state-of-the-art online visual model-based RL
methods. Moreover, we also find that our approach excels on an image-based
drawer closing task on a real robot using a pre-existing dataset. All results
including videos can be found online at https://sites.google.com/view/lompo/ .
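As a concrete illustration of the approach described in the abstract, the sketch below shows one plausible way to quantify uncertainty in latent space: an ensemble of latent dynamics models is fit to the offline data (with the encoder from images to latents assumed to be trained separately, e.g. by maximizing an ELBO, as the abstract describes), and the ensemble's disagreement over next-latent predictions is subtracted from the reward as a pessimism penalty. This is a minimal sketch under those assumptions; the class and function names are illustrative and not the authors' code.

```python
# Illustrative sketch (hypothetical names, not the authors' implementation):
# penalize the reward by the disagreement of an ensemble of latent dynamics
# models, so uncertainty is measured in latent space rather than pixel space.
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """One ensemble member: a Gaussian over the next latent state given (s, a)."""
    def __init__(self, latent_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-std of next latent
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor):
        mean, log_std = self.net(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        return mean, log_std

def penalized_reward(ensemble, s, a, reward_fn, lam: float = 1.0):
    """Reward minus an ensemble-disagreement penalty computed in latent space.

    Assumes s is a latent state produced by an image encoder trained
    separately (e.g., by maximizing the ELBO of a latent state-space model).
    """
    means = torch.stack([m(s, a)[0] for m in ensemble])  # (ensemble, batch, dim)
    disagreement = means.std(dim=0).mean(dim=-1)         # (batch,)
    return reward_fn(s, a) - lam * disagreement
```

In this kind of scheme, the penalty weight lam trades off pessimism against reward: larger values keep model rollouts closer to the support of the offline dataset.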
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide a theoretical guarantee of model uncertainty and performance bound of SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
Offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
- Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z)
- Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations [58.758928936316785]
Offline reinforcement learning from visual observations with continuous action spaces remains under-explored.
We show that modifications to two popular vision-based online reinforcement learning algorithms suffice to outperform existing offline RL methods.
arXiv Detail & Related papers (2022-06-09T22:08:47Z)
- A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems [0.0]
Reinforcement learning (RL) has experienced a dramatic increase in popularity.
There is still a wide range of domains inaccessible to RL due to the high cost and danger of interacting with the environment.
Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions.
arXiv Detail & Related papers (2022-03-02T20:05:11Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.