Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control
- URL: http://arxiv.org/abs/2210.01969v5
- Date: Fri, 26 May 2023 03:19:00 GMT
- Title: Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control
- Authors: Jiayu Chen, Tian Lan, Vaneet Aggarwal
- Abstract summary: Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
- Score: 44.77500987121531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical Imitation Learning (HIL) has been proposed to recover
highly complex behaviors in long-horizon tasks from expert demonstrations by
modeling the task hierarchy with the option framework. Existing methods either
overlook the causal relationship between the subtask and its corresponding
policy or cannot learn the policy in an end-to-end fashion, which leads to
suboptimality. In this work, we develop a novel HIL algorithm based on
Adversarial Inverse Reinforcement Learning and adapt it with the
Expectation-Maximization algorithm to directly recover a hierarchical policy
from unannotated demonstrations. Further, we introduce a directed information
term into the objective function to enhance causality, and we propose a
Variational Autoencoder framework for learning with these objectives in an
end-to-end fashion. Theoretical justifications and evaluations on challenging
robotic control tasks demonstrate the superiority of our algorithm. The code
is available at https://github.com/LucasCJYSDL/HierAIRL.
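For concreteness, here is a minimal PyTorch sketch of a one-step option hierarchy together with a variational proxy for the directed-information bonus, in the spirit of the abstract above. The architectures, dimensions, and the posterior-based bonus are illustrative assumptions, not the released HierAIRL implementation.

```python
import torch
import torch.nn as nn

# Illustrative dimensions, not taken from the paper.
STATE_DIM, NUM_OPTIONS, ACTION_DIM = 8, 4, 2

class HighLevelPolicy(nn.Module):
    """pi_hi(o_t | s_t, o_{t-1}): selects the current option (subtask)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NUM_OPTIONS, 64),
                                 nn.Tanh(), nn.Linear(64, NUM_OPTIONS))

    def forward(self, s, prev_o):
        return torch.distributions.Categorical(
            logits=self.net(torch.cat([s, prev_o], dim=-1)))

class LowLevelPolicy(nn.Module):
    """pi_lo(a_t | s_t, o_t): acts under the selected option."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NUM_OPTIONS, 64),
                                 nn.Tanh(), nn.Linear(64, ACTION_DIM))

    def forward(self, s, o):
        return torch.distributions.Normal(self.net(torch.cat([s, o], dim=-1)), 1.0)

pi_hi, pi_lo = HighLevelPolicy(), LowLevelPolicy()
s = torch.randn(1, STATE_DIM)
prev_o = nn.functional.one_hot(torch.tensor([0]), NUM_OPTIONS).float()
o_idx = pi_hi(s, prev_o).sample()                      # pick a subtask
o = nn.functional.one_hot(o_idx, NUM_OPTIONS).float()
a = pi_lo(s, o).sample()                               # act under it

# A standard variational lower bound on directed information uses a
# posterior q(o | s, a); its log-likelihood rewards options that are
# recoverable from the behavior they generate (one common construction,
# assumed here rather than taken from the paper).
posterior = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64),
                          nn.Tanh(), nn.Linear(64, NUM_OPTIONS))
log_q = torch.log_softmax(posterior(torch.cat([s, a], dim=-1)), dim=-1)
di_bonus = log_q.gather(-1, o_idx.unsqueeze(-1))       # per-sample bonus
```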
Related papers
- MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback
and Dynamic Distance Constraint [40.3872201560003]
Hierarchical reinforcement learning (HRL) uses a hierarchical framework that divides tasks into subgoals and completes them sequentially.
Current methods struggle to find subgoals that ensure a stable learning process.
We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
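The summary gives few mechanistic details, but a dynamic distance constraint on subgoal selection could plausibly look like this hypothetical NumPy sketch, where the admissible subgoal radius is scheduled with training progress:

```python
import numpy as np

def schedule_distance(progress, d_min=0.5, d_cap=5.0):
    # Hypothetical schedule: widen the admissible subgoal distance as the
    # low-level policy improves (progress in [0, 1]).
    return d_min + progress * (d_cap - d_min)

def filter_subgoals(candidates, state, d_max):
    # Keep only candidate subgoals within d_max of the current state.
    dists = np.linalg.norm(candidates - state, axis=-1)
    return candidates[dists <= d_max]

candidates = np.random.randn(10, 2) * 3.0
state = np.zeros(2)
feasible = filter_subgoals(candidates, state, schedule_distance(progress=0.2))
```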
arXiv Detail & Related papers (2024-02-22T03:11:09Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
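A minimal instance of the intervention-as-reward idea, assuming the common convention of a fixed penalty per intervention (the exact reward construction in RLIF may differ):

```python
def intervention_reward(intervened: bool) -> float:
    # -1 whenever the expert intervenes, 0 otherwise: minimizing
    # interventions becomes the reinforcement learning objective.
    return -1.0 if intervened else 0.0

# Relabel a logged rollout with intervention-based rewards
# (the flags are illustrative data, not from the paper).
intervention_flags = [False, False, True, False, True]
rewards = [intervention_reward(f) for f in intervention_flags]
print(rewards)  # [0.0, 0.0, -1.0, 0.0, -1.0]
```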
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Chain-of-Thought Predictive Control [32.30974063877643]
We study generalizable policy learning from demonstrations for complex low-level control.
We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos.
arXiv Detail & Related papers (2023-04-03T07:59:13Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods.
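The subgoal quantization step can be sketched with a standard vector-quantization layer: map each encoded subgoal to its nearest codebook entry and use a straight-through estimator for gradients. Codebook size and embedding width below are illustrative assumptions.

```python
import torch

def vector_quantize(z, codebook):
    # Nearest-neighbor lookup into a learned codebook of discrete subgoals,
    # with the straight-through trick so gradients reach the encoder.
    d = torch.cdist(z, codebook)      # (batch, K) pairwise distances
    idx = d.argmin(dim=-1)            # index of nearest code per sample
    z_q = codebook[idx]               # quantized subgoal embeddings
    return z + (z_q - z).detach(), idx

codebook = torch.randn(16, 32, requires_grad=True)   # K = 16 discrete subgoals
z = torch.randn(4, 32, requires_grad=True)           # encoder outputs
z_q, idx = vector_quantize(z, codebook)
```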
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- Attaining Interpretability in Reinforcement Learning via Hierarchical Primitive Composition [3.1078562713129765]
We propose a novel hierarchical reinforcement learning algorithm that mitigates the aforementioned issues by decomposing the original task into a hierarchy.
We show how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator.
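As a toy illustration of why primitive composition aids interpretability, a pick-and-place hierarchy might dispatch among named primitives with readable logic (hypothetical primitives, not the paper's):

```python
# Hypothetical decomposition of pick-and-place into reusable primitives;
# the selector plays the role of the higher level in the hierarchy.
def select_primitive(holding: bool, at_object: bool, at_target: bool) -> str:
    if not holding:
        return "grasp" if at_object else "reach"
    return "release" if at_target else "move_to_target"

assert select_primitive(holding=False, at_object=False, at_target=False) == "reach"
assert select_primitive(holding=True, at_object=False, at_target=True) == "release"
```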
arXiv Detail & Related papers (2021-10-05T05:59:31Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
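As a worked toy of the NML construction: refit the maximum-likelihood model once per candidate label of the query point, then normalize the resulting likelihoods. A Bernoulli success model stands in here for the paper's classifier.

```python
def nml_bernoulli(k_success: int, n: int):
    # NML over a query outcome: append the candidate label to the data,
    # refit the Bernoulli MLE, evaluate it on that label, then normalize.
    p1 = (k_success + 1) / (n + 1)      # MLE likelihood if query labeled 1
    p0 = (n - k_success + 1) / (n + 1)  # MLE likelihood if query labeled 0
    z = p0 + p1
    return p0 / z, p1 / z               # NML probabilities for labels 0, 1

# With little data the NML distribution stays near uniform (high uncertainty),
# which is what makes it usable as an exploration-aware reward signal.
print(nml_bernoulli(k_success=0, n=1))   # approx (0.67, 0.33)
```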
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
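The adversarial part can be sketched as a GAIL-style discriminator over option-augmented samples (state, option, action): expert tuples are labeled real, policy rollouts fake, and -log D then serves as the imitation reward. Dimensions below are illustrative.

```python
import torch
import torch.nn as nn

S, O, A = 8, 4, 2   # illustrative state / option / action dimensions
disc = nn.Sequential(nn.Linear(S + O + A, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(expert_batch, policy_batch):
    # Expert (s, o, a) tuples labeled 1, policy samples labeled 0.
    logits_e, logits_p = disc(expert_batch), disc(policy_batch)
    return (bce(logits_e, torch.ones_like(logits_e)) +
            bce(logits_p, torch.zeros_like(logits_p)))

loss = discriminator_loss(torch.randn(32, S + O + A), torch.randn(32, S + O + A))
```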
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
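One way to read "a well-shaped reward function" is the log-likelihood of the desired outcome under a learned outcome model; a Gaussian model gives a compact stand-in (an assumption for illustration, not the paper's exact derivation):

```python
import torch

def outcome_reward(pred_mean, pred_std, goal):
    # Reward = log p(outcome = goal | s, a) under a learned Gaussian
    # outcome model; closer, more confident predictions score higher.
    return torch.distributions.Normal(pred_mean, pred_std).log_prob(goal).sum(-1)

goal = torch.tensor([1.0, 0.0])
print(outcome_reward(torch.tensor([0.8, 0.1]), torch.tensor([0.2, 0.2]), goal))
```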
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
- Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function, termed Learning Off-Policy with Online Planning (LOOP).
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
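The core computation is easy to sketch: roll a learned dynamics model forward H steps under the policy, accumulate discounted rewards, and close the horizon with the terminal value estimate. The function handles below are placeholders, not LOOP's actual API.

```python
import torch

def h_step_return(model, reward_fn, value_fn, policy, s, H=5, gamma=0.99):
    # H-step lookahead: simulate H steps with the learned model, then
    # bootstrap the remainder of the return with the value function.
    total, discount = 0.0, 1.0
    for _ in range(H):
        a = policy(s)
        total = total + discount * reward_fn(s, a)
        s = model(s, a)
        discount *= gamma
    return total + discount * value_fn(s)

# Toy instantiation with linear dynamics and a quadratic value (illustrative).
s0 = torch.tensor([1.0, -0.5])
ret = h_step_return(model=lambda s, a: 0.9 * s + 0.1 * a,
                    reward_fn=lambda s, a: -(s * s).sum(),
                    value_fn=lambda s: -(s * s).sum(),
                    policy=lambda s: -s,
                    s=s0)
```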
arXiv Detail & Related papers (2020-08-23T16:18:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.