Curricular Subgoals for Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2306.08232v1
- Date: Wed, 14 Jun 2023 04:06:41 GMT
- Title: Curricular Subgoals for Inverse Reinforcement Learning
- Authors: Shunyu Liu, Yunpeng Qing, Shuqi Xu, Hongyan Wu, Jiangtao Zhang,
Jingyuan Cong, Tianhao Chen, Yunfu Liu, Mingli Song
- Abstract summary: Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly disentangles one task into several local subgoals to guide agent imitation.
- Score: 21.038691420095525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function
from expert demonstrations to facilitate policy learning, and has demonstrated
its remarkable success in imitation learning. To promote expert-like behavior,
existing IRL methods mainly focus on learning global reward functions to
minimize the trajectory difference between the imitator and the expert.
However, these global designs still suffer from redundant noise and error
propagation, leading to unsuitable reward assignment and thus degrading the
agent's capability in complex multi-stage tasks. In this paper, we propose a
novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL)
framework that explicitly disentangles one task into several local subgoals to
guide agent imitation. Specifically, CSIRL first introduces the
decision uncertainty of the trained agent over expert trajectories to
dynamically select subgoals, which directly determines the exploration boundary
of different task stages. To further acquire local reward functions for each
stage, we customize a meta-imitation objective based on these curricular
subgoals to train an intrinsic reward generator. Experiments on the D4RL and
autonomous driving benchmarks demonstrate that the proposed method yields
results superior to the state-of-the-art counterparts, as well as better
interpretability. Our code is available at https://github.com/Plankson/CSIRL.
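To make the two-step idea above concrete, here is a minimal, hypothetical sketch (not the authors' implementation; see the linked repository for that) of uncertainty-driven subgoal selection along an expert trajectory. The `q_ensemble` interface and the sparse stage reward are assumptions for illustration:

```python
import numpy as np

def decision_uncertainty(q_ensemble, state):
    """Disagreement of an ensemble of Q-functions at `state` (one simple
    proxy for the agent's decision uncertainty; hypothetical interface)."""
    q_values = np.stack([q(state) for q in q_ensemble])  # (n_heads, n_actions)
    return q_values.std(axis=0).mean()

def select_subgoal(expert_trajectory, q_ensemble, threshold=0.1):
    """Pick the furthest expert state the agent is still confident about.

    The returned state serves as the local subgoal; states before it bound
    the exploration of the current curriculum stage.
    """
    subgoal_idx = 0
    for t, state in enumerate(expert_trajectory):
        if decision_uncertainty(q_ensemble, state) > threshold:
            break
        subgoal_idx = t
    return expert_trajectory[subgoal_idx]

def stage_reward(state, subgoal, tol=0.5):
    """Sparse local reward: 1 when the subgoal is (approximately) reached.
    CSIRL instead trains an intrinsic reward generator per stage via a
    meta-imitation objective; this simple reward only stands in for it."""
    return float(np.linalg.norm(np.asarray(state) - np.asarray(subgoal)) < tol)
```

As the agent's uncertainty on earlier stages decreases, the selected subgoal moves further along the expert trajectory, so the curriculum advances stage by stage.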
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
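The linear factorization referred to here writes the action-value as $Q^\pi(s,a) = \psi^\pi(s,a)^\top w$, with successor features $\psi^\pi$ and reward vector $w$; a toy sketch of this identity (random features, assumed shapes) is below:

```python
import numpy as np

def discounted_successor_features(features, gamma=0.99):
    """psi_t = sum_{k>=t} gamma^(k-t) * phi(s_k, a_k) for one trajectory.

    `features` holds per-step feature vectors phi(s_k, a_k), shape (T, d);
    the reward is assumed linear in these features.
    """
    T, d = features.shape
    psi = np.zeros((T, d))
    running = np.zeros(d)
    for t in reversed(range(T)):
        running = features[t] + gamma * running
        psi[t] = running
    return psi

# With a reward vector w, per-step rewards are r_t = phi_t @ w, and the
# discounted return from step t is recovered as psi[t] @ w.
phi = np.random.randn(100, 8)   # toy feature trajectory, d = 8
w = np.random.randn(8)          # toy reward vector
psi = discounted_successor_features(phi)
assert np.isclose(psi[0] @ w, sum(0.99**k * (phi[k] @ w) for k in range(100)))
```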
arXiv Detail & Related papers (2024-11-11T14:05:50Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations [25.536792010283566]
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations.
We introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework.
Our framework demonstrates significant performance improvements over previous SOTA methods.
arXiv Detail & Related papers (2023-10-13T02:38:35Z)
- ReProHRL: Towards Multi-Goal Navigation in the Real World using Hierarchical Agents [1.3194749469702445]
We present Ready for Production Hierarchical RL (ReProHRL), which divides tasks into hierarchical multi-goal navigation guided by reinforcement learning.
We also use object detectors as a pre-processing step to learn multi-goal navigation and transfer it to the real world.
For the real-world implementation and proof of concept demonstration, we deploy the proposed method on a nano-drone named Crazyflie with a front camera.
arXiv Detail & Related papers (2023-08-17T02:23:59Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
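As a rough illustration of such a discretizing bottleneck (not the paper's exact architecture), a continuous goal embedding can be snapped to the nearest entry of a learned codebook, and the discrete index used as the goal abstraction:

```python
import numpy as np

def quantize_goal(goal_embedding, codebook):
    """Map a continuous goal embedding to its nearest codebook entry.

    goal_embedding: (d,) vector produced by a goal encoder.
    codebook: (K, d) learned discrete codes; the returned index acts as a
    discrete goal abstraction for the downstream policy.
    """
    distances = np.linalg.norm(codebook - goal_embedding, axis=1)
    idx = int(distances.argmin())
    return idx, codebook[idx]

codebook = np.random.randn(16, 4)   # K=16 codes of dimension 4 (toy values)
idx, code = quantize_goal(np.random.randn(4), codebook)
```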
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reinforcement Learning Agent Training with Goals for Real World Tasks [3.747737951407512]
Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks.
We propose a specification language (Inkling Goal Specification) for complex control and optimization tasks.
We include a set of experiments showing that the proposed method makes it easy to specify a wide range of real-world tasks.
arXiv Detail & Related papers (2021-07-21T23:21:16Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
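One simple way to picture state-marginal matching (a simplification, not f-IRL's exact objective) is to reward the log-ratio of expert to agent state densities; the density estimators and dimensions below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio_reward(expert_states, agent_states):
    """Build r(s) = log rho_expert(s) - log rho_agent(s) from state samples.

    This rewards visiting states the expert visits often; f-IRL instead
    recovers a stationary reward by gradient descent on an f-divergence
    between the two state marginals.
    """
    rho_expert = gaussian_kde(expert_states.T)  # KDE over expert state samples
    rho_agent = gaussian_kde(agent_states.T)    # KDE over agent state samples

    def reward(state):
        s = np.asarray(state, dtype=float).reshape(-1, 1)
        return float(rho_expert.logpdf(s)[0] - rho_agent.logpdf(s)[0])

    return reward

# Toy 2-D example with synthetic state samples (illustrative only).
reward_fn = density_ratio_reward(np.random.randn(500, 2) + 2.0,
                                 np.random.randn(500, 2))
print(reward_fn([2.0, 2.0]) > reward_fn([-2.0, -2.0]))  # usually True
```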
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
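For intuition about the subtask-graph formulation in this last entry, a task can be encoded as subtasks with precondition edges; the toy graph below (made-up subtask names) shows how the set of currently eligible subtasks follows from the completed ones:

```python
# Toy subtask graph: each subtask lists the subtasks that must be completed
# first (its preconditions). Subtask names are made up for illustration.
SUBTASK_GRAPH = {
    "get_wood":  [],
    "get_stone": [],
    "make_axe":  ["get_wood", "get_stone"],
    "chop_tree": ["make_axe"],
}

def eligible_subtasks(completed):
    """Subtasks whose preconditions are all satisfied and not yet done."""
    done = set(completed)
    return [s for s, pre in SUBTASK_GRAPH.items()
            if s not in done and all(p in done for p in pre)]

print(eligible_subtasks(["get_wood"]))  # ['get_stone']
```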