Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control
- URL: http://arxiv.org/abs/2210.01969v5
- Date: Fri, 26 May 2023 03:19:00 GMT
- Title: Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control
- Authors: Jiayu Chen, Tian Lan, Vaneet Aggarwal
- Abstract summary: Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
- Score: 44.77500987121531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical Imitation Learning (HIL) has been proposed to recover
highly complex behaviors in long-horizon tasks from expert demonstrations by
modeling the task hierarchy with the option framework. Existing methods either
overlook the causal relationship between the subtask and its corresponding
policy or cannot learn the policy in an end-to-end fashion, which leads to
suboptimality. In this work, we develop a novel HIL algorithm based on
Adversarial Inverse Reinforcement Learning and adapt it with the
Expectation-Maximization algorithm to directly recover a hierarchical policy
from unannotated demonstrations. Further, we introduce a directed information
term into the objective function to enhance causality, and we propose a
Variational Autoencoder framework for learning with these objectives in an
end-to-end fashion. Theoretical justifications and evaluations on challenging
robotic control tasks demonstrate the superiority of our algorithm. The code
is available at https://github.com/LucasCJYSDL/HierAIRL.
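For concreteness, here is a minimal PyTorch sketch of a one-step option hierarchy together with a variational proxy for the directed-information bonus, in the spirit of the abstract above. The architectures, dimensions, and the posterior-based bonus are illustrative assumptions, not the released HierAIRL implementation.

```python
import torch
import torch.nn as nn

# Illustrative dimensions, not taken from the paper.
STATE_DIM, NUM_OPTIONS, ACTION_DIM = 8, 4, 2

class HighLevelPolicy(nn.Module):
    """pi_hi(o_t | s_t, o_{t-1}): selects the current option (subtask)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NUM_OPTIONS, 64),
                                 nn.Tanh(), nn.Linear(64, NUM_OPTIONS))

    def forward(self, s, prev_o):
        return torch.distributions.Categorical(
            logits=self.net(torch.cat([s, prev_o], dim=-1)))

class LowLevelPolicy(nn.Module):
    """pi_lo(a_t | s_t, o_t): acts under the selected option."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NUM_OPTIONS, 64),
                                 nn.Tanh(), nn.Linear(64, ACTION_DIM))

    def forward(self, s, o):
        return torch.distributions.Normal(self.net(torch.cat([s, o], dim=-1)), 1.0)

pi_hi, pi_lo = HighLevelPolicy(), LowLevelPolicy()
s = torch.randn(1, STATE_DIM)
prev_o = nn.functional.one_hot(torch.tensor([0]), NUM_OPTIONS).float()
o_idx = pi_hi(s, prev_o).sample()                      # pick a subtask
o = nn.functional.one_hot(o_idx, NUM_OPTIONS).float()
a = pi_lo(s, o).sample()                               # act under it

# A standard variational lower bound on directed information uses a
# posterior q(o | s, a); its log-likelihood rewards options that are
# recoverable from the behavior they generate (one common construction,
# assumed here rather than taken from the paper).
posterior = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64),
                          nn.Tanh(), nn.Linear(64, NUM_OPTIONS))
log_q = torch.log_softmax(posterior(torch.cat([s, a], dim=-1)), dim=-1)
di_bonus = log_q.gather(-1, o_idx.unsqueeze(-1))       # per-sample bonus
```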
Related papers
- MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback
and Dynamic Distance Constraint [40.3872201560003]
Hierarchical reinforcement learning (HRL) uses a hierarchical framework that divides tasks into subgoals and completes them sequentially.
Current methods struggle to find subgoals that ensure a stable learning process.
We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
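The summary gives few mechanistic details, but a dynamic distance constraint on subgoal selection could plausibly look like this hypothetical NumPy sketch, where the admissible subgoal radius is scheduled with training progress:

```python
import numpy as np

def schedule_distance(progress, d_min=0.5, d_cap=5.0):
    # Hypothetical schedule: widen the admissible subgoal distance as the
    # low-level policy improves (progress in [0, 1]).
    return d_min + progress * (d_cap - d_min)

def filter_subgoals(candidates, state, d_max):
    # Keep only candidate subgoals within d_max of the current state.
    dists = np.linalg.norm(candidates - state, axis=-1)
    return candidates[dists <= d_max]

candidates = np.random.randn(10, 2) * 3.0
state = np.zeros(2)
feasible = filter_subgoals(candidates, state, schedule_distance(progress=0.2))
```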
arXiv Detail & Related papers (2024-02-22T03:11:09Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
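A minimal instance of the intervention-as-reward idea, assuming the common convention of a fixed penalty per intervention (the exact reward construction in RLIF may differ):

```python
def intervention_reward(intervened: bool) -> float:
    # -1 whenever the expert intervenes, 0 otherwise: minimizing
    # interventions becomes the reinforcement learning objective.
    return -1.0 if intervened else 0.0

# Relabel a logged rollout with intervention-based rewards
# (the flags are illustrative data, not from the paper).
intervention_flags = [False, False, True, False, True]
rewards = [intervention_reward(f) for f in intervention_flags]
print(rewards)  # [0.0, 0.0, -1.0, 0.0, -1.0]
```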
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Chain-of-Thought Predictive Control [32.30974063877643]
We study generalizable policy learning from demonstrations for complex low-level control.
We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos.
arXiv Detail & Related papers (2023-04-03T07:59:13Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods.
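The subgoal quantization step can be sketched with a standard vector-quantization layer: map each encoded subgoal to its nearest codebook entry and use a straight-through estimator for gradients. Codebook size and embedding width below are illustrative assumptions.

```python
import torch

def vector_quantize(z, codebook):
    # Nearest-neighbor lookup into a learned codebook of discrete subgoals,
    # with the straight-through trick so gradients reach the encoder.
    d = torch.cdist(z, codebook)      # (batch, K) pairwise distances
    idx = d.argmin(dim=-1)            # index of nearest code per sample
    z_q = codebook[idx]               # quantized subgoal embeddings
    return z + (z_q - z).detach(), idx

codebook = torch.randn(16, 32, requires_grad=True)   # K = 16 discrete subgoals
z = torch.randn(4, 32, requires_grad=True)           # encoder outputs
z_q, idx = vector_quantize(z, codebook)
```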
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- Attaining Interpretability in Reinforcement Learning via Hierarchical Primitive Composition [3.1078562713129765]
We propose a novel hierarchical reinforcement learning algorithm that mitigates the aforementioned issues by decomposing the original task into a hierarchy.
We show how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator.
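As a toy illustration of why primitive composition aids interpretability, a pick-and-place hierarchy might dispatch among named primitives with readable logic (hypothetical primitives, not the paper's):

```python
# Hypothetical decomposition of pick-and-place into reusable primitives;
# the selector plays the role of the higher level in the hierarchy.
def select_primitive(holding: bool, at_object: bool, at_target: bool) -> str:
    if not holding:
        return "grasp" if at_object else "reach"
    return "release" if at_target else "move_to_target"

assert select_primitive(holding=False, at_object=False, at_target=False) == "reach"
assert select_primitive(holding=True, at_object=False, at_target=True) == "release"
```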
arXiv Detail & Related papers (2021-10-05T05:59:31Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
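As a worked toy of the NML construction: refit the maximum-likelihood model once per candidate label of the query point, then normalize the resulting likelihoods. A Bernoulli success model stands in here for the paper's classifier.

```python
def nml_bernoulli(k_success: int, n: int):
    # NML over a query outcome: append the candidate label to the data,
    # refit the Bernoulli MLE, evaluate it on that label, then normalize.
    p1 = (k_success + 1) / (n + 1)      # MLE likelihood if query labeled 1
    p0 = (n - k_success + 1) / (n + 1)  # MLE likelihood if query labeled 0
    z = p0 + p1
    return p0 / z, p1 / z               # NML probabilities for labels 0, 1

# With little data the NML distribution stays near uniform (high uncertainty),
# which is what makes it usable as an exploration-aware reward signal.
print(nml_bernoulli(k_success=0, n=1))   # approx (0.67, 0.33)
```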
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
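The adversarial part can be sketched as a GAIL-style discriminator over option-augmented samples (state, option, action): expert tuples are labeled real, policy rollouts fake, and -log D then serves as the imitation reward. Dimensions below are illustrative.

```python
import torch
import torch.nn as nn

S, O, A = 8, 4, 2   # illustrative state / option / action dimensions
disc = nn.Sequential(nn.Linear(S + O + A, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(expert_batch, policy_batch):
    # Expert (s, o, a) tuples labeled 1, policy samples labeled 0.
    logits_e, logits_p = disc(expert_batch), disc(policy_batch)
    return (bce(logits_e, torch.ones_like(logits_e)) +
            bce(logits_p, torch.zeros_like(logits_p)))

loss = discriminator_loss(torch.randn(32, S + O + A), torch.randn(32, S + O + A))
```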
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
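One way to read "a well-shaped reward function" is the log-likelihood of the desired outcome under a learned outcome model; a Gaussian model gives a compact stand-in (an assumption for illustration, not the paper's exact derivation):

```python
import torch

def outcome_reward(pred_mean, pred_std, goal):
    # Reward = log p(outcome = goal | s, a) under a learned Gaussian
    # outcome model; closer, more confident predictions score higher.
    return torch.distributions.Normal(pred_mean, pred_std).log_prob(goal).sum(-1)

goal = torch.tensor([1.0, 0.0])
print(outcome_reward(torch.tensor([0.8, 0.1]), torch.tensor([0.2, 0.2]), goal))
```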
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
- Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function, termed Learning Off-Policy with Online Planning (LOOP).
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
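The core computation is easy to sketch: roll a learned dynamics model forward H steps under the policy, accumulate discounted rewards, and close the horizon with the terminal value estimate. The function handles below are placeholders, not LOOP's actual API.

```python
import torch

def h_step_return(model, reward_fn, value_fn, policy, s, H=5, gamma=0.99):
    # H-step lookahead: simulate H steps with the learned model, then
    # bootstrap the remainder of the return with the value function.
    total, discount = 0.0, 1.0
    for _ in range(H):
        a = policy(s)
        total = total + discount * reward_fn(s, a)
        s = model(s, a)
        discount *= gamma
    return total + discount * value_fn(s)

# Toy instantiation with linear dynamics and a quadratic value (illustrative).
s0 = torch.tensor([1.0, -0.5])
ret = h_step_return(model=lambda s, a: 0.9 * s + 0.1 * a,
                    reward_fn=lambda s, a: -(s * s).sum(),
                    value_fn=lambda s: -(s * s).sum(),
                    policy=lambda s: -s,
                    s=s0)
```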
arXiv Detail & Related papers (2020-08-23T16:18:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.