Context-Hierarchy Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2202.12597v1
- Date: Fri, 25 Feb 2022 10:29:05 GMT
- Title: Context-Hierarchy Inverse Reinforcement Learning
- Authors: Wei Gao, David Hsu, Wee Sun Lee
- Abstract summary: An inverse reinforcement learning (IRL) agent learns to act intelligently by observing expert demonstrations and learning the expert's underlying reward function.
We present Context Hierarchy IRL (CHIRL), a new IRL algorithm that exploits context to scale up IRL and learn reward functions of complex behaviors.
Experiments on benchmark tasks, including a large-scale autonomous driving task in the CARLA simulator, show promising results in scaling up IRL for tasks with complex reward functions.
- Score: 30.71220625227959
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An inverse reinforcement learning (IRL) agent learns to act intelligently by
observing expert demonstrations and learning the expert's underlying reward
function. Although learning the reward functions from demonstrations has
achieved great success in various tasks, several other challenges are mostly
ignored. First, existing IRL methods try to learn the reward function from
scratch without relying on any prior knowledge. Second, traditional IRL
methods assume that the reward function is homogeneous across all
demonstrations. Some IRL methods have been extended to heterogeneous
demonstrations, but they still assume a single hidden variable that affects
the behavior and learn that hidden variable together with the reward from the
demonstrations. To address these issues, we present Context Hierarchy IRL
(CHIRL), a new IRL algorithm that exploits context to scale up
IRL and learn reward functions of complex behaviors. CHIRL models the context
hierarchically as a directed acyclic graph; it represents the reward function
as a corresponding modular deep neural network that associates each network
module with a node of the context hierarchy. The context hierarchy and the
modular reward representation enable data sharing across multiple contexts and
state abstraction, significantly improving the learning performance. CHIRL has
a natural connection with hierarchical task planning when the context hierarchy
represents subtask decomposition. This makes it possible to incorporate prior
knowledge of the causal dependencies among subtasks and to solve large,
complex tasks by decomposing them into several subtasks and conquering each
subtask in order to solve the original task. Experiments on benchmark tasks,
including a large-scale autonomous driving task in the CARLA simulator, show
promising results in
scaling up IRL for tasks with complex reward functions.
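To make the modular reward representation described in the abstract concrete, the following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation. The class name, the example driving contexts, and the choice to sum per-node module outputs along the root-to-leaf context path are all illustrative; CHIRL's exact network composition may differ.

```python
import torch
import torch.nn as nn


class ContextHierarchyReward(nn.Module):
    """Modular reward network with one small module per context-hierarchy node.

    A minimal illustrative sketch of the idea in the abstract, NOT the authors'
    implementation: the reward of a state under a leaf context is taken to be
    the sum of the outputs of all modules on the root-to-leaf path, so ancestor
    modules are shared (and trained) across sibling contexts.
    """

    def __init__(self, state_dim, context_paths, hidden=64):
        super().__init__()
        # context_paths maps a leaf context name to its root-to-leaf node path,
        # e.g. {"follow_lane": ["drive", "follow_lane"],
        #       "left_turn":   ["drive", "intersection", "left_turn"]}
        self.context_paths = context_paths
        node_names = {n for path in context_paths.values() for n in path}
        self.node_modules = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(state_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for name in node_names
        })

    def forward(self, state, context):
        # Sum the per-node reward contributions along the context's path.
        return sum(self.node_modules[name](state)
                   for name in self.context_paths[context])


if __name__ == "__main__":
    # Hypothetical driving contexts, loosely inspired by the CARLA task.
    paths = {
        "follow_lane": ["drive", "follow_lane"],
        "left_turn": ["drive", "intersection", "left_turn"],
        "right_turn": ["drive", "intersection", "right_turn"],
    }
    reward_net = ContextHierarchyReward(state_dim=8, context_paths=paths)
    states = torch.randn(4, 8)              # a batch of 4 state feature vectors
    print(reward_net(states, "left_turn"))  # shape: (4, 1) reward estimates
```

Sharing ancestor modules (e.g. the hypothetical `drive` node) across sibling contexts is what lets demonstrations from one context improve the reward estimate in another, which is the data-sharing effect the abstract highlights.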
Related papers
- Automated Feature Selection for Inverse Reinforcement Learning [7.278033100480175]
Inverse reinforcement learning (IRL) is an imitation learning approach to learning reward functions from expert demonstrations.
We propose a method that employs basis functions to form a candidate set of features.
We demonstrate the approach's effectiveness by recovering reward functions that capture expert policies.
arXiv Detail & Related papers (2024-03-22T10:05:21Z)
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly decomposes one task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z)
- BC-IRL: Learning Generalizable Reward Functions from Demonstrations [51.535870379280155]
BC-IRL is an inverse reinforcement learning method that learns reward functions that generalize better than those of maximum-entropy IRL approaches (see the background note below).
We show that BC-IRL learns rewards that generalize better on an illustrative simple task and two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
arXiv Detail & Related papers (2023-03-28T17:57:20Z)
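For reference, since the entry above compares against maximum-entropy IRL: the standard maximum-entropy formulation (general background, not specific to either paper) models trajectories as exponentially more likely the higher their reward and fits the reward parameters by maximum likelihood.

```latex
% Maximum-entropy IRL, standard background formulation.
\begin{aligned}
  p_\theta(\tau) &\propto \exp\!\big(R_\theta(\tau)\big),
  \qquad R_\theta(\tau) = \sum_t r_\theta(s_t, a_t), \\
  \nabla_\theta \log \mathcal{L}(\theta)
    &= \mathbb{E}_{\tau \sim \text{expert}}\big[\nabla_\theta R_\theta(\tau)\big]
     - \mathbb{E}_{\tau \sim p_\theta}\big[\nabla_\theta R_\theta(\tau)\big].
\end{aligned}
```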
- Reward Learning using Structural Motifs in Inverse Reinforcement Learning [3.04585143845864]
The Inverse Reinforcement Learning (IRL) problem has seen rapid evolution in the past few years, with important applications in domains like robotics, cognition, and health.
We explore the inefficacy of current IRL methods in learning an agent's reward function from expert trajectories depicting long-horizon, complex sequential tasks.
We propose a novel IRL method, SMIRL, that first learns the (approximate) structure of a task as a finite-state-automaton (FSA), then uses the structural motif to solve the IRL problem.
arXiv Detail & Related papers (2022-09-25T18:34:59Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reward Shaping with Dynamic Trajectory Aggregation [7.6146285961466]
Potential-based reward shaping is a basic method for enriching rewards (the standard shaping form is recalled below for reference).
SARSA-RS learns the potential function and acquires it.
We propose a trajectory aggregation that uses subgoal series.
arXiv Detail & Related papers (2021-04-13T13:07:48Z)
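As background for the reward-shaping entry above (a classical result, not specific to that paper): potential-based shaping adds a term derived from a potential function Φ to the reward, changing the learning signal without changing which policies are optimal.

```latex
% Potential-based reward shaping, classical background form.
\begin{aligned}
  \tilde r(s, a, s') &= r(s, a, s') + F(s, s'), \\
  F(s, s')           &= \gamma\,\Phi(s') - \Phi(s).
\end{aligned}
```

For any choice of Φ, the shaped MDP keeps the same set of optimal policies as the original one.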
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
- Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning (successor features are recalled below for reference).
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
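Since the entry above builds on successor features, here is the standard definition (general background, not a detail of that paper): when the reward is linear in features φ with weights w, the action-value function factors through the successor features ψ^π.

```latex
% Successor features, standard background definition.
\begin{aligned}
  \psi^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\Big[\sum_{t \ge 0} \gamma^{t}\,\phi(s_t, a_t)
                      \;\Big|\; s_0 = s,\ a_0 = a\Big], \\
  Q^{\pi}_{r}(s, a) &= \psi^{\pi}(s, a)^{\top} w
  \quad\text{when } r(s, a) = \phi(s, a)^{\top} w.
\end{aligned}
```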
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.