Learning from Guided Play: A Scheduled Hierarchical Approach for
Improving Exploration in Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2112.08932v1
- Date: Thu, 16 Dec 2021 14:58:08 GMT
- Title: Learning from Guided Play: A Scheduled Hierarchical Approach for
Improving Exploration in Adversarial Imitation Learning
- Authors: Trevor Ablett, Bryan Chan, Jonathan Kelly
- Abstract summary: We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
- Score: 7.51557557629519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective exploration continues to be a significant challenge that prevents
the deployment of reinforcement learning for many physical systems. This is
particularly true for systems with continuous and high-dimensional state and
action spaces, such as robotic manipulators. The challenge is accentuated in
the sparse rewards setting, where the low-level state information required for
the design of dense rewards is unavailable. Adversarial imitation learning
(AIL) can partially overcome this barrier by leveraging expert-generated
demonstrations of optimal behaviour and providing, essentially, a replacement
for dense reward information. Unfortunately, the availability of expert
demonstrations does not necessarily improve an agent's capability to explore
effectively and, as we empirically show, can lead to inefficient or stagnated
learning. We present Learning from Guided Play (LfGP), a framework in which we
leverage expert demonstrations of, in addition to a main task, multiple
auxiliary tasks. Subsequently, a hierarchical model is used to learn each task
reward and policy through a modified AIL procedure, in which exploration of all
tasks is enforced via a scheduler composing different tasks together. This
affords many benefits: learning efficiency is improved for main tasks with
challenging bottleneck transitions, expert data becomes reusable between tasks,
and transfer learning through the reuse of learned auxiliary task models
becomes possible. Our experimental results in a challenging multitask robotic
manipulation domain indicate that our method compares favourably to supervised
imitation learning and to a state-of-the-art AIL method. Code is available at
https://github.com/utiasSTARS/lfgp.
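To make the scheduled hierarchical idea concrete, below is a minimal, hypothetical sketch of the training loop described in the abstract: one discriminator (serving as a learned reward) and one policy per task, with a scheduler that selects which task to explore. The task names, network sizes, uniform random scheduler, GAIL-style reward form, and stand-in data are all illustrative assumptions and not the authors' released implementation (see the repository above for that).

```python
"""Minimal sketch of a scheduled, multi-task adversarial imitation learning loop.

Assumptions (not from the paper): task set, network sizes, uniform scheduler,
and the -log(1 - D) reward form, which is a common AIL choice.
"""
import random
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2
TASKS = ["main", "reach", "grasp", "lift"]  # hypothetical main + auxiliary tasks

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# One discriminator per task: D(s, a) -> probability the transition is expert-like.
discriminators = {t: mlp(OBS_DIM + ACT_DIM, 1) for t in TASKS}
# One low-level policy per task (deterministic here for brevity).
policies = {t: mlp(OBS_DIM, ACT_DIM) for t in TASKS}
disc_opts = {t: torch.optim.Adam(discriminators[t].parameters(), lr=3e-4) for t in TASKS}

def learned_reward(task, obs, act):
    """AIL-style reward from the task's discriminator: r = -log(1 - D(s, a))."""
    with torch.no_grad():
        d = torch.sigmoid(discriminators[task](torch.cat([obs, act], dim=-1)))
        return -torch.log1p(-d + 1e-6)

def scheduler_pick():
    """Uniform random scheduler; a simplifying assumption for this sketch only."""
    return random.choice(TASKS)

def discriminator_step(task, expert_sa, agent_sa):
    """Binary cross-entropy: expert transitions labelled 1, agent transitions 0."""
    logits_e = discriminators[task](expert_sa)
    logits_a = discriminators[task](agent_sa)
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits_e, torch.ones_like(logits_e)
    ) + nn.functional.binary_cross_entropy_with_logits(
        logits_a, torch.zeros_like(logits_a)
    )
    disc_opts[task].zero_grad()
    loss.backward()
    disc_opts[task].step()
    return loss.item()

# One scheduler period: pick a task, roll out its policy, relabel the rollout
# with the learned reward, and update that task's discriminator.
for period in range(3):
    task = scheduler_pick()
    obs = torch.randn(16, OBS_DIM)                  # stand-in for environment observations
    act = policies[task](obs).detach()
    r = learned_reward(task, obs, act)              # would feed an off-policy RL update
    expert_sa = torch.randn(16, OBS_DIM + ACT_DIM)  # stand-in for expert demos of `task`
    agent_sa = torch.cat([obs, act], dim=-1)
    d_loss = discriminator_step(task, expert_sa, agent_sa)
    print(f"period {period}: task={task}, "
          f"mean reward={r.mean().item():.3f}, disc loss={d_loss:.3f}")
```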
Related papers
- Learning from Guided Play: Improving Exploration for Adversarial
Imitation Learning with Simple Auxiliary Tasks [8.320969283401233]
We show that the standard, naive approach to exploration can manifest as a suboptimal local maximum.
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks.
arXiv Detail & Related papers (2022-12-30T20:38:54Z) - Reinforcement learning with Demonstrations from Mismatched Task under
Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert.
Existing LfD methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z) - Learning and Retrieval from Prior Data for Skill-based Imitation
Learning [47.59794569496233]
We develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data.
We identify several key design choices that significantly improve performance on novel tasks.
arXiv Detail & Related papers (2022-10-20T17:34:59Z) - Unsupervised Reinforcement Learning for Transferable Manipulation Skill
Discovery [22.32327908453603]
Current reinforcement learning (RL) in robotics often struggles to generalize to new downstream tasks.
We propose a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward.
We show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks.
arXiv Detail & Related papers (2022-04-29T06:57:46Z) - Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z) - Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
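The projection step described in the last entry above lends itself to a short worked example. The following is a minimal sketch of that gradient-surgery projection using plain NumPy, not the paper's released code; the two-dimensional toy gradients are made up for illustration.

```python
"""Sketch of the gradient-surgery projection: when two task gradients conflict
(negative dot product), project one onto the normal plane of the other."""
import numpy as np

def gradient_surgery(grads):
    """Apply pairwise projection to a list of per-task gradient vectors."""
    projected = [g.copy() for g in grads]
    for i in range(len(projected)):
        g_i = projected[i]
        # Project g_i away from every other task's original gradient it
        # conflicts with, visiting the other tasks in random order.
        for j in np.random.permutation(len(grads)):
            if j == i:
                continue
            g_j = grads[j]
            dot = g_i @ g_j
            if dot < 0:  # conflicting gradient
                g_i -= dot / (g_j @ g_j) * g_j
    # The combined update direction is the sum of the projected gradients.
    return sum(projected)

# Toy example with two conflicting 2-D task gradients.
g1 = np.array([1.0, 1.0])
g2 = np.array([-1.0, 0.5])
print(gradient_surgery([g1, g2]))
```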
This list is automatically generated from the titles and abstracts of the papers on this site.