Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
- URL: http://arxiv.org/abs/2011.10024v1
- Date: Thu, 19 Nov 2020 18:47:40 GMT
- Title: Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
- Authors: Avi Singh, Huihan Liu, Gaoyue Zhou, Albert Yu, Nicholas Rhinehart,
Sergey Levine
- Abstract summary: We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
- Score: 79.32403825036792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning provides a general framework for flexible decision
making and control, but requires extensive data collection for each new task
that an agent needs to learn. In other machine learning fields, such as natural
language processing or computer vision, pre-training on large, previously
collected datasets to bootstrap learning for new tasks has emerged as a
powerful paradigm to reduce data requirements when learning a new task. In this
paper, we ask the following question: how can we enable similarly useful
pre-training for RL agents? We propose a method for pre-training behavioral
priors that can capture complex input-output relationships observed in
successful trials from a wide range of previously seen tasks, and we show how
this learned prior can be used for rapidly learning new tasks without impeding
the RL agent's ability to try out novel behaviors. We demonstrate the
effectiveness of our approach in challenging robotic manipulation domains
involving image observations and sparse reward functions, where our method
outperforms prior works by a substantial margin.
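To make the idea concrete, here is a minimal sketch of a Parrot-style behavioral prior: an invertible, state-conditioned map from a latent z ~ N(0, I) to an environment action, pre-trained by maximum likelihood on (state, action) pairs from successful trials. This is an illustrative single affine flow layer, not the paper's implementation; the actual method uses a deeper normalizing flow conditioned on image observations, and all names below are assumptions.

```python
import math
import torch
import torch.nn as nn

class AffineBehavioralPrior(nn.Module):
    """Invertible map a = f(z; s): latent z ~ N(0, I) -> environment action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),  # predicts shift and log-scale
        )

    def forward(self, state, z):
        shift, log_scale = self.net(state).chunk(2, dim=-1)
        return shift + torch.exp(log_scale) * z  # action executed in the env

    def log_prob(self, state, action):
        # Change of variables: log p(a|s) = log N(z; 0, I) - sum(log_scale).
        shift, log_scale = self.net(state).chunk(2, dim=-1)
        z = (action - shift) * torch.exp(-log_scale)
        base = -0.5 * (z ** 2).sum(-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
        return base - log_scale.sum(-1)

# Pre-training: maximize log-likelihood of successful-trial data, e.g.
#   loss = -prior.log_prob(states, actions).mean()
# RL fine-tuning: the agent's policy outputs z and f(z; s) is executed, so
# sampling z ~ N(0, I) yields prior-like behavior while distant z values
# still reach novel actions (the map is invertible in z).
```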
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
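As a rough illustration of this idea (the architecture and names below are assumptions, not the paper's exact design), such a critic scores an entire action sequence from a state:

```python
import torch
import torch.nn as nn

class ActionSequenceCritic(nn.Module):
    """Q(s, a_{t:t+k}): value of executing k consecutive actions from state s."""
    def __init__(self, state_dim, action_dim, seq_len, hidden=256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + seq_len * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # one scalar Q-value for the whole sequence
        )

    def forward(self, state, action_seq):
        # action_seq: (batch, seq_len, action_dim), flattened into one vector
        flat = action_seq.flatten(start_dim=1)
        return self.q(torch.cat([state, flat], dim=-1))
```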
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Learning and Retrieval from Prior Data for Skill-based Imitation Learning [47.59794569496233]
We develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data.
We identify several key design choices that significantly improve performance on novel tasks.
arXiv Detail & Related papers (2022-10-20T17:34:59Z)
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
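A hedged sketch of the retrieval component (the interface here is assumed; FIST additionally learns skill transition models): match a target-task demonstration state to the prior data by nearest neighbor and reuse the skill segment that follows it.

```python
import numpy as np

def retrieve_skill(demo_state, prior_states, prior_skills):
    """Nearest-neighbor lookup over prior data.

    prior_states: (N, d) array of states from the offline dataset.
    prior_skills: length-N list of the skill segments starting at each state.
    """
    dists = np.linalg.norm(prior_states - demo_state, axis=1)
    return prior_skills[int(np.argmin(dists))]
```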
arXiv Detail & Related papers (2021-07-19T15:56:01Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
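The dynamic-programming mechanism can be sketched in a toy tabular form (COG itself uses deep offline RL over image observations; this simplification is ours): Bellman backups over the union of prior and new-task transitions let value propagate from the new task's reward back into behaviors that appear only in the prior data.

```python
def bellman_backup(q, transitions, gamma=0.99, iters=100):
    """Tabular value iteration over pooled offline data.

    transitions: list of (s, a, r, s_next, candidate_next_actions) tuples
    drawn from both the prior dataset and the new-task dataset.
    """
    for _ in range(iters):
        for s, a, r, s_next, next_actions in transitions:
            best_next = max((q.get((s_next, a2), 0.0) for a2 in next_actions),
                            default=0.0)
            q[(s, a)] = r + gamma * best_next  # reward flows into prior data
    return q
```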
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks [1.1802674324027231]
Self-supervised pre-training for transfer learning is becoming an increasingly popular technique to improve state-of-the-art results using unlabeled data.
We provide an overview of the taxonomy for self-supervised learning and transfer learning, and highlight some prominent methods for designing pre-training tasks across different domains.
arXiv Detail & Related papers (2020-07-01T22:55:48Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
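A minimal sketch of that insight (the names and training loop are assumptions; the full method also meta-trains the model and relabels past experience): adapting a dynamics model is supervised regression on (s, a, s') tuples, so it remains consistent even when the data is off-policy.

```python
import torch

def adapt_dynamics_model(model, optimizer, batch, steps=5):
    """batch: (states, actions, next_states) collected under any policy."""
    states, actions, next_states = batch
    for _ in range(steps):
        pred = model(torch.cat([states, actions], dim=-1))
        loss = ((pred - next_states) ** 2).mean()  # one-step prediction error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```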
arXiv Detail & Related papers (2020-06-12T13:34:46Z)