Related papers: oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions

URL: http://arxiv.org/abs/2002.09043v1
Date: Thu, 20 Feb 2020 22:21:41 GMT
Title: oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions
Authors: David Venuto, Jhelum Chakravorty, Leonard Boussioux, Junhao Wang, Gavin McCracken, Doina Precup
Abstract summary: Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods. We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
Score: 37.66289166905027
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods. While Inverse Reinforcement Learning (IRL) is a solution to recover reward functions from demonstrations only, these learned rewards are generally heavily entangled with the dynamics of the environment and therefore not portable or robust to changing environments. Modern adversarial methods have yielded some success in reducing reward entanglement in the IRL setting. In this work, we leverage one such method, Adversarial Inverse Reinforcement Learning (AIRL), to propose an algorithm that learns hierarchical disentangled rewards with a policy over options. We show that this method has the ability to learn generalizable policies and reward functions in complex transfer learning tasks, while yielding results in continuous control benchmarks that are comparable to those of the state-of-the-art methods.

Related papers

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models and a learner optimizes the reward through repeated RL procedures. We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv Detail & Related papers (2024-11-11T14:05:50Z)
RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently. Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble [8.857776147129464]
Recovering a reward function from expert demonstrations is a fundamental problem in reinforcement learning. We present a dynamics-agnostic discriminator-ensemble reward learning method capable of learning both state-action and state-only reward functions.
arXiv Detail & Related papers (2022-06-01T05:16:39Z)
Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions [124.11520774395748]
Reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies.
arXiv Detail & Related papers (2022-03-28T21:17:36Z)
Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting. We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation. We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching [12.335788185691916]
Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. Prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. We present Off-Policy Inverse Reinforcement Learning (OPIRL), which adopts off-policy data distribution instead of on-policy.
arXiv Detail & Related papers (2021-09-09T14:32:26Z)
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD). We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations. We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards [1.2691047660244335]
We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR). Our method assigns constant values to the immediate rewards at each timestep according to their final episodic rewards. We demonstrate the effectiveness of our method in some challenging continuous robotics control tasks in MuJoCo simulation.
arXiv Detail & Related papers (2020-10-14T11:12:07Z)
Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in reinforcement learning (RL). We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm that is sample-efficient and gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z)
Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.