MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
- URL: http://arxiv.org/abs/2401.03306v1
- Date: Sat, 6 Jan 2024 21:04:31 GMT
- Title: MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
- Authors: Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn
- Abstract summary: We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
- Score: 52.101643259906915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of offline pre-training and online fine-tuning for
reinforcement learning from high-dimensional observations in the context of
realistic robot tasks. Recent offline model-free approaches successfully use
online fine-tuning to either improve the performance of the agent over the data
collection policy or adapt to novel tasks. At the same time, model-based RL
algorithms have achieved significant progress in sample efficiency and the
complexity of the tasks they can solve, yet remain under-utilized in the
fine-tuning setting. In this work, we argue that existing model-based offline
RL methods are not suitable for offline-to-online fine-tuning in
high-dimensional domains due to issues with distribution shifts, off-dynamics
data, and non-stationary rewards. We propose an on-policy model-based method
that can efficiently reuse prior data through model-based value expansion and
policy regularization, while preventing model exploitation by controlling
epistemic uncertainty. We find that our approach successfully solves tasks from
the MetaWorld benchmark, as well as the Franka Kitchen robot manipulation
environment completely from images. To the best of our knowledge, MOTO is the
first method to solve this environment from pixels.
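For intuition, here is a minimal sketch of the two ingredients the abstract names, model-based value expansion combined with an ensemble-based penalty on epistemic uncertainty. Every interface below (the toy ensemble, policy, reward, and value functions) is an assumption made for illustration, not the authors' implementation.

```python
# Hedged sketch of model-based value expansion (MVE) with an ensemble-based
# epistemic-uncertainty penalty, in the spirit of the abstract above.
# All interfaces (EnsembleDynamics, policy, reward_fn, value_fn) are assumptions.
import numpy as np

class EnsembleDynamics:
    """Toy ensemble of linear dynamics models; stands in for learned world models."""
    def __init__(self, n_members: int, obs_dim: int, act_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.1, size=(n_members, obs_dim, obs_dim))
        self.B = rng.normal(scale=0.1, size=(n_members, obs_dim, act_dim))

    def predict(self, obs: np.ndarray, act: np.ndarray) -> np.ndarray:
        """Next-state predictions from every ensemble member: (n_members, obs_dim)."""
        return np.einsum("mij,j->mi", self.A, obs) + np.einsum("mij,j->mi", self.B, act)

def mve_target(obs, ensemble, policy, reward_fn, value_fn,
               horizon=5, gamma=0.99, uncertainty_coef=1.0):
    """H-step value-expansion target with a penalty on ensemble disagreement."""
    ret, discount, s = 0.0, 1.0, obs
    for _ in range(horizon):
        a = policy(s)
        preds = ensemble.predict(s, a)                # (n_members, obs_dim)
        disagreement = preds.std(axis=0).mean()       # epistemic-uncertainty proxy
        ret += discount * (reward_fn(s, a) - uncertainty_coef * disagreement)
        s = preds.mean(axis=0)                        # roll forward on the mean model
        discount *= gamma
    return ret + discount * value_fn(s)               # bootstrap with the critic

if __name__ == "__main__":
    obs_dim, act_dim = 4, 2
    ens = EnsembleDynamics(n_members=5, obs_dim=obs_dim, act_dim=act_dim)
    policy = lambda s: np.tanh(s[:act_dim])           # placeholder policy
    reward_fn = lambda s, a: -float(np.sum(s ** 2))   # placeholder reward
    value_fn = lambda s: 0.0                          # placeholder critic
    print(mve_target(np.ones(obs_dim), ens, policy, reward_fn, value_fn))
```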
Related papers
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to the policy exploiting model errors, which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
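A toy sketch of the basic loop this summary describes, unrolling a learned dynamics model from dataset states to produce imaginary transitions for policy optimization, follows; the model, policy, and reward interfaces are placeholders, not the paper's code.

```python
# Toy sketch: policy-optimization data from imaginary model rollouts.
# `model`, `policy`, and `reward_fn` are assumed placeholder interfaces.
def imagine_rollouts(model, policy, reward_fn, start_states, horizon):
    """Unroll the learned dynamics model from dataset states into a synthetic buffer."""
    synthetic = []
    for s in start_states:
        for _ in range(horizon):
            a = policy(s)
            s_next = model.predict(s, a)   # a single (or ensemble) learned model
            synthetic.append((s, a, reward_fn(s, a), s_next))
            s = s_next                     # compounding model error grows with horizon
    return synthetic
```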
arXiv Detail & Related papers (2024-02-05T10:18:15Z) - Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
Offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
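A schematic sketch of the offline-pretrain / online-finetune loop described here is shown below; the world model, planner, and environment interfaces are all assumptions, not the paper's actual components.

```python
# Schematic offline-to-online world-model loop; every interface here is an assumption.
def offline_to_online(world_model, planner, env, offline_data, n_online_iters, episode_len):
    world_model.fit(offline_data)                      # 1) pretrain on the offline dataset
    online_data = []
    for _ in range(n_online_iters):
        obs = env.reset()
        for _ in range(episode_len):                   # 2) collect data by planning with the model
            action = planner.plan(world_model, obs)
            next_obs, reward, done, _ = env.step(action)
            online_data.append((obs, action, reward, next_obs))
            obs = next_obs
            if done:
                break
        world_model.fit(offline_data + online_data)    # 3) finetune on offline + online data
    return world_model
```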
arXiv Detail & Related papers (2023-10-24T17:46:12Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
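One plausible reading of "world models as tools to measure task relevance" is to score offline source transitions by how well the pretrained model predicts them; the weighting scheme below is purely illustrative and is not the paper's actual procedure.

```python
# Illustrative only: score offline source data by the pretrained world model's
# one-step prediction error, and use the score as a transfer weight.
import numpy as np

def task_relevance_weights(world_model, transitions, temperature=1.0):
    """Lower prediction error -> higher assumed relevance under the model's dynamics."""
    errors = np.array([
        np.linalg.norm(world_model.predict(s, a) - s_next)
        for (s, a, s_next) in transitions
    ])
    return np.exp(-errors / temperature)  # weights in (0, 1]
```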
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
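A bare-bones sketch of the alternating scheme the summary describes follows; the fitting and improvement calls are placeholders standing in for the paper's lower-bound objective.

```python
# Bare-bones alternating loop; `fit`, `rollout`, and `improve` are placeholder
# interfaces, not the paper's actual losses or lower bound.
def alternating_mbrl(dataset, model, policy, n_rounds, rollout_horizon):
    for _ in range(n_rounds):
        model.fit(dataset, policy=policy)                       # model step
        rollouts = model.rollout(policy, dataset.start_states,  # policy step on model rollouts
                                 horizon=rollout_horizon)
        policy.improve(rollouts, real_data=dataset)             # maximize a surrogate objective
    return model, policy
```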
arXiv Detail & Related papers (2022-10-12T04:58:51Z) - Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions to its neighbors; the policies are then trained on model rollouts.
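A toy sketch of this communication pattern, with each agent keeping its own model, exchanging predictions with neighbors, and updating its policy on model rollouts, follows; the Agent class and all of its methods are assumed placeholders.

```python
# Toy decentralized step; Agent and its methods are assumed placeholder interfaces.
def decentralized_step(agents, neighbors):
    """agents: list of Agent; neighbors: dict mapping agent index -> neighbor indices."""
    # 1) every agent predicts its own future state with its local dynamics model
    predictions = {i: agent.model_predict(agent.local_state, agent.last_action)
                   for i, agent in enumerate(agents)}
    # 2) predictions are shared only with local neighbors
    for i, agent in enumerate(agents):
        received = [predictions[j] for j in neighbors[i]]
        # 3) each agent imagines rollouts conditioned on neighbor predictions
        #    and updates its policy on those rollouts
        rollouts = agent.imagine(agent.local_state, received)
        agent.update_policy(rollouts)
```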
arXiv Detail & Related papers (2022-07-13T23:52:14Z) - UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning [1.1339580074756188]
Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data.
Self-driving vehicles (SDVs) can learn a policy that potentially even outperforms the behavior policy that produced the sub-optimal dataset.
This motivates the use of model-based offline RL approaches, which leverage planning.
arXiv Detail & Related papers (2021-11-22T10:37:52Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
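Below is a schematic of the kind of conservative critic regularizer this summary points at: push Q-values down on model-generated (potentially out-of-support) state-actions and up on dataset state-actions. It paraphrases the general idea in code and is not COMBO's exact objective.

```python
# Schematic conservative critic penalty in the spirit of the summary above
# (not COMBO's exact objective). `q_fn` is an assumed callable Q(s, a).
import numpy as np

def conservative_penalty(q_fn, model_batch, data_batch):
    """Penalize high Q on model-generated state-actions, reward high Q on dataset ones."""
    q_model = np.mean([q_fn(s, a) for s, a in model_batch])  # likely out-of-support
    q_data = np.mean([q_fn(s, a) for s, a in data_batch])    # in-support
    return q_model - q_data   # added (scaled) to the usual Bellman loss
```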
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
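A minimal sketch of the latent-space pipeline this summary implies, encode image observations into a compact latent state, learn latent dynamics, and optimize the policy in that latent space, is shown below with placeholder interfaces.

```python
# Placeholder latent-space pipeline; encoder, dynamics, and policy are assumed interfaces.
def latent_offline_rl(encoder, latent_dynamics, policy, image_dataset, horizon):
    # 1) encode image observations into latent states
    latent_data = [(encoder(o), a, r, encoder(o_next))
                   for (o, a, r, o_next) in image_dataset]
    # 2) fit latent dynamics (and, in practice, a reward head) on encoded transitions
    latent_dynamics.fit(latent_data)
    # 3) optimize the policy on imagined latent rollouts instead of raw pixels
    starts = [z for (z, _, _, _) in latent_data]
    rollouts = latent_dynamics.rollout(policy, starts, horizon=horizon)
    policy.improve(rollouts)
    return policy
```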
arXiv Detail & Related papers (2020-12-21T18:28:17Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods so that rewards are artificially penalized by the uncertainty of the dynamics.
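The core modification described here, penalizing rewards by an estimate of dynamics uncertainty, fits in a few lines; the ensemble-disagreement estimator below is one common choice, shown only as an assumed stand-in.

```python
# Uncertainty-penalized reward in the spirit of the summary above.
# Ensemble disagreement is used as the (assumed) uncertainty estimate.
import numpy as np

def penalized_reward(reward, next_state_preds, penalty_coef):
    """next_state_preds: array (n_ensemble_members, obs_dim) of model predictions."""
    uncertainty = np.max(np.linalg.norm(
        next_state_preds - next_state_preds.mean(axis=0), axis=-1))
    return reward - penalty_coef * uncertainty
```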
arXiv Detail & Related papers (2020-05-27T08:46:41Z)