oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally
Extended Actions
- URL: http://arxiv.org/abs/2002.09043v1
- Date: Thu, 20 Feb 2020 22:21:41 GMT
- Title: oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally
Extended Actions
- Authors: David Venuto, Jhelum Chakravorty, Leonard Boussioux, Junhao Wang,
Gavin McCracken, Doina Precup
- Abstract summary: Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
- Score: 37.66289166905027
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Explicit engineering of reward functions for given environments has been a
major hindrance to reinforcement learning methods. While Inverse Reinforcement
Learning (IRL) is a solution to recover reward functions from demonstrations
only, these learned rewards are generally heavily entangled with the
dynamics of the environment and therefore not portable or robust to
changing environments. Modern adversarial methods have yielded some success in
reducing reward entanglement in the IRL setting. In this work, we leverage one
such method, Adversarial Inverse Reinforcement Learning (AIRL), to propose an
algorithm that learns hierarchical disentangled rewards with a policy over
options. We show that this method has the ability to learn generalizable
policies and reward functions in complex transfer learning tasks, while
yielding results in continuous control benchmarks that are comparable to those
of the state-of-the-art methods.
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv Detail & Related papers (2024-11-11T14:05:50Z) - Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble [8.857776147129464]
Recovering a reward function from expert demonstrations is a fundamental problem in reinforcement learning.
We present a dynamics-agnostic discriminator-ensemble reward learning method capable of learning both state-action and state-only reward functions.
arXiv Detail & Related papers (2022-06-01T05:16:39Z) - Adversarial Motion Priors Make Good Substitutes for Complex Reward
Functions [124.11520774395748]
Reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors.
We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations.
A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies.
arXiv Detail & Related papers (2022-03-28T21:17:36Z) - Temporal Abstractions-Augmented Temporally Contrastive Learning: An
Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z) - OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via
Distribution Matching [12.335788185691916]
Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious.
Prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance.
We present Off-Policy Inverse Reinforcement Learning (OPIRL), which adopts off-policy data distribution instead of on-policy.
arXiv Detail & Related papers (2021-09-09T14:32:26Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi \Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards [1.2691047660244335]
We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).
Our method assigns the immediate rewards at each timestep with constant values according to their final episodic rewards.
We demonstrate the effectiveness of our method in some challenging continuous robotics control tasks in MuJoCo simulation.
arXiv Detail & Related papers (2020-10-14T11:12:07Z) - Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in reinforcement learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample efficient as well as gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)