Finetuning Offline World Models in the Real World
- URL: http://arxiv.org/abs/2310.16029v1
- Date: Tue, 24 Oct 2023 17:46:12 GMT
- Title: Finetuning Offline World Models in the Real World
- Authors: Yunhai Feng, Nicklas Hansen, Ziyan Xiong, Chandramouli Rajagopalan,
Xiaolong Wang
- Abstract summary: Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
- Score: 13.46766121896684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) is notoriously data-inefficient, which makes
training on a real robot difficult. While model-based RL algorithms (world
models) improve data-efficiency to some extent, they still require hours or
days of interaction to learn skills. Recently, offline RL has been proposed as
a framework for training RL policies on pre-existing datasets without any
online interaction. However, constraining an algorithm to a fixed dataset
induces a state-action distribution shift between training and inference, and
limits its applicability to new tasks. In this work, we seek to get the best of
both worlds: we consider the problem of pretraining a world model with offline
data collected on a real robot, and then finetuning the model on online data
collected by planning with the learned model. To mitigate extrapolation errors
during online interaction, we propose to regularize the planner at test-time by
balancing estimated returns and (epistemic) model uncertainty. We evaluate our
method on a variety of visuo-motor control tasks in simulation and on a real
robot, and find that our method enables few-shot finetuning to seen and unseen
tasks even when offline data is limited. Videos, code, and data are available
at https://yunhaifeng.com/FOWM .
Related papers
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot
Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement
Learning [45.73223325256312]
We investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster.
We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models.
arXiv Detail & Related papers (2022-10-19T17:57:06Z) - Efficient Robotic Manipulation Through Offline-to-Online Reinforcement
Learning and Goal-Aware State Information [5.604859261995801]
We propose a unified offline-to-online RL framework that resolves the transition performance drop issue.
We introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning.
Our framework achieves great training efficiency and performance compared with the state-of-the-art methods in multiple robotic manipulation tasks.
arXiv Detail & Related papers (2021-10-21T05:34:25Z) - A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL analogous to the relatively well-understood for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z) - Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.