Pre-Training for Robots: Offline RL Enables Learning New Tasks from a
Handful of Trials
- URL: http://arxiv.org/abs/2210.05178v3
- Date: Sat, 23 Sep 2023 23:25:32 GMT
- Title: Pre-Training for Robots: Offline RL Enables Learning New Tasks from a
Handful of Trials
- Authors: Aviral Kumar, Anikait Singh, Frederik Ebert, Mitsuhiko Nakamoto,
Yanlai Yang, Chelsea Finn, Sergey Levine
- Abstract summary: We present a framework based on offline RL that learns new tasks effectively.
It combines pre-training on existing robotic datasets with rapid fine-tuning on a new task, using as few as 10 demonstrations.
To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot.
- Score: 97.95400776235736
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Progress in deep learning highlights the tremendous potential of
utilizing diverse robotic datasets for attaining effective generalization,
making it enticing to leverage broad datasets for robust generalization in
robotic learning as well. However, in practice, we often want
to learn a new skill in a new environment that is unlikely to be contained in
the prior data. Therefore we ask: how can we leverage existing diverse offline
datasets in combination with small amounts of task-specific data to solve new
tasks, while still enjoying the generalization benefits of training on large
amounts of data? In this paper, we demonstrate that end-to-end offline RL can
be an effective approach for doing this, without the need for any
representation learning or vision-based pre-training. We present pre-training
for robots (PTR), a framework based on offline RL that attempts to effectively
learn new tasks by combining pre-training on existing robotic datasets with
rapid fine-tuning on a new task, with as few as 10 demonstrations. PTR utilizes
an existing offline RL method, conservative Q-learning (CQL), but extends it to
include several crucial design decisions that enable PTR to actually work and
outperform a variety of prior methods. To our knowledge, PTR is the first RL
method that succeeds at learning new tasks in a new domain on a real WidowX
robot with as few as 10 task demonstrations, by effectively leveraging an
existing dataset of diverse multi-task robot data collected in a variety of toy
kitchens. We also demonstrate that PTR can enable effective autonomous
fine-tuning and improvement in a handful of trials, without needing any
demonstrations. An accompanying overview video can be found in the
supplementary material and at this URL: https://sites.google.com/view/ptr-final/
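PTR's recipe, as described above, is to pre-train a conservative Q-function on the diverse prior dataset and then fine-tune the same objective on batches that mix prior data with the handful of new-task demonstrations. The snippet below is a minimal sketch of the underlying CQL objective, not the authors' implementation: it assumes PyTorch, a discrete action space, and toy network and batch sizes for brevity, whereas PTR itself operates on image observations with continuous actions and additional design decisions described in the paper.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative Q-network; PTR's actual architecture (a ResNet image
    encoder) is not reproduced here."""
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def cql_loss(q_net, obs, actions, td_targets, alpha=1.0):
    """TD error plus the CQL regularizer: the logsumexp term pushes down
    Q-values across all actions while the dataset actions' Q-values are
    pushed up, keeping the Q-function conservative on unseen actions."""
    q_all = q_net(obs)                                       # (batch, num_actions)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_error = ((q_data - td_targets) ** 2).mean()
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    return td_error + alpha * conservative

# Toy usage: the same loss is minimized in both phases; pre-training draws
# batches from the diverse prior dataset, and fine-tuning draws batches that
# mix prior data with the ~10 new-task demonstrations.
obs_dim, num_actions, batch = 32, 6, 64
q_net = QNetwork(obs_dim, num_actions)
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
obs = torch.randn(batch, obs_dim)
actions = torch.randint(0, num_actions, (batch,))
td_targets = torch.randn(batch)                              # stand-in for r + gamma * max_a' Q_target(s', a')
loss = cql_loss(q_net, obs, actions, td_targets)
opt.zero_grad()
loss.backward()
opt.step()
```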
Related papers
- EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data [22.471559284344462]
Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces.
While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks.
We demonstrate through experiments in sparse, image-based robot manipulation environments that EXTRACT can learn new tasks more quickly than prior works.
arXiv Detail & Related papers (2024-06-25T17:50:03Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Robotic Offline RL from Internet Videos via Value-Function Pre-Training [67.44673316943475]
We develop a system for leveraging large-scale human video datasets in robotic offline RL.
We show that value learning on video datasets learns representations more conducive to downstream robotic offline RL than other approaches.
arXiv Detail & Related papers (2023-09-22T17:59:14Z) - Efficient Robotic Manipulation Through Offline-to-Online Reinforcement
Learning and Goal-Aware State Information [5.604859261995801]
We propose a unified offline-to-online RL framework that resolves the performance drop that occurs when transitioning from offline to online training.
We introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning.
Our framework achieves high training efficiency and strong performance compared with state-of-the-art methods on multiple robotic manipulation tasks.
arXiv Detail & Related papers (2021-10-21T05:34:25Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - COG: Connecting New Skills to Past Experience with Offline Reinforcement
Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.