Adversarial Option-Aware Hierarchical Imitation Learning
- URL: http://arxiv.org/abs/2106.05530v2
- Date: Fri, 11 Jun 2021 05:43:42 GMT
- Title: Adversarial Option-Aware Hierarchical Imitation Learning
- Authors: Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong,
Chuang Gan, Lei Li
- Abstract summary: We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks.
- Score: 89.92994158193237
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: It has been a challenge to learn skills for an agent from long-horizon,
unannotated demonstrations. Existing approaches such as Hierarchical Imitation
Learning (HIL) are prone to compounding errors or suboptimal solutions. In this
paper, we propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train
the policy via generative adversarial optimization. In particular, we propose
an Expectation-Maximization (EM)-style algorithm: an E-step that samples the
options of the expert conditioned on the current learned policy, and an M-step that
updates the low- and high-level policies of the agent simultaneously to minimize
the newly proposed option-occupancy measurement between the expert and the
agent. We theoretically prove the convergence of the proposed algorithm.
Experiments show that Option-GAIL consistently outperforms its counterparts
across a variety of tasks.
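The E-step described in the abstract, inferring the expert's latent options under the current hierarchical policy, can be sketched with a toy tabular model. The following is a minimal, runnable illustration using Viterbi decoding; the tabular parameterization, the uniform initial-option prior, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def infer_expert_options(states, actions, pi_hi, pi_lo, n_options):
    """E-step sketch: Viterbi decoding of the latent option sequence for one
    expert trajectory under the current hierarchical policy.
    pi_hi[o_prev, s, o] = P(o | s, o_prev), pi_lo[o, s, a] = P(a | s, o)."""
    T = len(states)
    log_delta = np.full((T, n_options), -np.inf)  # best log-prob of a path ending in option o
    backptr = np.zeros((T, n_options), dtype=int)
    # Uniform prior over the initial option (an assumption of this toy model).
    log_delta[0] = -np.log(n_options) + np.log(pi_lo[:, states[0], actions[0]])
    for t in range(1, T):
        s, a = states[t], actions[t]
        for o in range(n_options):
            # Score of switching (or staying) from every previous option to o.
            scores = log_delta[t - 1] + np.log(pi_hi[:, s, o])
            backptr[t, o] = int(np.argmax(scores))
            log_delta[t, o] = scores[backptr[t, o]] + np.log(pi_lo[o, s, a])
    # Backtrack the most likely option path.
    options = np.zeros(T, dtype=int)
    options[-1] = int(np.argmax(log_delta[-1]))
    for t in range(T - 2, -1, -1):
        options[t] = backptr[t + 1, options[t + 1]]
    return options

# Toy setup: 1 state, 2 actions, 2 options; option i prefers action i,
# and the high-level policy is "sticky" (options tend to persist).
pi_lo = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])  # pi_lo[o, s, a]
pi_hi = np.array([[[0.8, 0.2]], [[0.2, 0.8]]])  # pi_hi[o_prev, s, o]
opts = infer_expert_options([0, 0, 0, 0], [0, 0, 1, 1], pi_hi, pi_lo, 2)
print(opts)  # → [0 0 1 1]: the action switch is explained by an option switch
```

In the full method, the inferred options would feed an M-step that adversarially matches the agent's option-occupancy measure to the expert's; that part is omitted here.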
Related papers
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm that alternates between regret-minimization algorithms instantiated at different (high and low) levels of temporal abstraction.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with MuJoCo and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
- Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z)
- The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning [59.777127897688594]
We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
arXiv Detail & Related papers (2022-01-24T13:18:02Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Online Baum-Welch algorithm for Hierarchical Imitation Learning [7.271970309320002]
We propose an online algorithm to perform hierarchical imitation learning in the options framework.
We show that this approach works well in both discrete and continuous environments.
arXiv Detail & Related papers (2021-03-22T22:03:25Z)
- Learning Diverse Options via InfoMax Termination Critic [0.0]
We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning.
Motivated by the recent success of mutual information based skill learning, we hypothesize that more diverse options are more reusable.
We propose a method for learning options by maximizing the MI between options and the corresponding state transitions.
arXiv Detail & Related papers (2020-10-06T14:21:05Z)
- SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
arXiv Detail & Related papers (2020-06-25T13:06:59Z)
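The MI objective mentioned in the "Learning Diverse Options via InfoMax Termination Critic" entry above, mutual information between options and state transitions, can be illustrated with a simple plug-in estimator on discrete samples. This is a toy sketch for intuition only, not the paper's gradient-based method; the function name and the discretization of transitions into labels are assumptions.

```python
import numpy as np
from collections import Counter

def empirical_mi(options, transitions):
    """Plug-in estimate of I(O; S -> S') from paired samples of option labels
    and discretized state-transition labels (a toy estimator)."""
    n = len(options)
    joint = Counter(zip(options, transitions))  # empirical joint counts
    p_o = Counter(options)                      # marginal counts over options
    p_t = Counter(transitions)                  # marginal counts over transitions
    mi = 0.0
    for (o, t), c in joint.items():
        # Sum p(o,t) * log( p(o,t) / (p(o) p(t)) ) over observed pairs.
        mi += (c / n) * np.log((c / n) / ((p_o[o] / n) * (p_t[t] / n)))
    return mi

# Options that fully determine the transition carry maximal information...
print(empirical_mi([0, 0, 1, 1], [0, 0, 1, 1]))  # → log 2 ≈ 0.693
# ...while options independent of the transition carry none.
print(empirical_mi([0, 1, 0, 1], [0, 0, 1, 1]))  # → 0.0
```

The hypothesis in that entry, that more diverse options are more reusable, corresponds to driving this quantity up: high MI means knowing the option tells you a lot about what transition will occur.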
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.