Regularized Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2010.03691v2
- Date: Thu, 3 Dec 2020 01:34:00 GMT
- Title: Regularized Inverse Reinforcement Learning
- Authors: Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau
- Abstract summary: Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior.
Regularized IRL applies strongly convex regularizers to the learner's policy.
We propose tractable solutions, and practical methods to obtain them, for regularized IRL.
- Score: 49.78352058771138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability
to imitate expert behavior by acquiring reward functions that explain the
expert's decisions. Regularized IRL applies strongly convex regularizers to the
learner's policy in order to avoid the expert's behavior being rationalized by
arbitrary constant rewards, also known as degenerate solutions. We propose
tractable solutions, and practical methods to obtain them, for regularized IRL.
Current methods are restricted to the maximum-entropy IRL framework, which
limits them to Shannon-entropy regularizers, and they propose solutions that
are intractable in practice. We present theoretical backing for our proposed
IRL method's applicability to both discrete and continuous controls, and
empirically validate its performance on a variety of tasks.
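The degeneracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single-state problem with three actions and a Shannon-entropy regularizer only; the function names and numbers are hypothetical and are not taken from the paper. Without regularization, a constant reward gives every policy the same score, so it "explains" any expert. With the entropy term, the maximizer is the softmax of the reward, so a constant reward can only explain the uniform policy.

```python
import math


def softmax(values):
    """Softmax of a list of floats, shifted by the max for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(policy):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in policy if p > 0)


def expected_reward(policy, reward):
    """Unregularized objective in a one-state problem: E_pi[r(a)]."""
    return sum(p * r for p, r in zip(policy, reward))


def regularized_objective(policy, reward):
    """Entropy-regularized objective: E_pi[r(a)] + H(pi)."""
    return expected_reward(policy, reward) + entropy(policy)


# Hypothetical expert policy and a constant (degenerate) reward.
expert = [0.7, 0.2, 0.1]
uniform = [1 / 3, 1 / 3, 1 / 3]
constant_reward = [5.0, 5.0, 5.0]

# Unregularized: every policy scores the same under a constant reward.
print(expected_reward(expert, constant_reward),
      expected_reward(uniform, constant_reward))

# Regularized: the maximizer is softmax(reward), which is uniform here,
# so the non-uniform expert is no longer optimal for the constant reward.
print(softmax(constant_reward))
print(regularized_objective(expert, constant_reward),
      regularized_objective(uniform, constant_reward))

# A non-constant reward such as log(expert) does recover the expert.
log_reward = [math.log(p) for p in expert]
print(softmax(log_reward))
```

In this toy setting the reward `log_reward` is one of many rewards (all shifts by a constant) whose softmax equals the expert policy; other strongly convex regularizers would lead to a different mapping from reward to optimal policy.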
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv Detail & Related papers (2024-11-11T14:05:50Z)
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results for efficient IRL in vanilla offline and online settings using polynomial samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z)
- Deconfounding Imitation Learning with Variational Inference [19.99248795957195]
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent.
This is because partial observability gives rise to hidden confounders in the causal graph.
We propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy.
arXiv Detail & Related papers (2022-11-04T18:00:02Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Versatile Inverse Reinforcement Learning via Cumulative Rewards [22.56145954060092]
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert.
We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators.
arXiv Detail & Related papers (2021-11-15T10:49:15Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
On locomotion tasks, our approach is comparable with the state of the art in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
- Inverse Reinforcement Learning in the Continuous Setting with Formal Guarantees [31.122125783516726]
Inverse Reinforcement Learning (IRL) is the problem of finding a reward function which describes observed/known expert behavior.
We provide a new IRL algorithm for the continuous state space setting with unknown transition dynamics.
arXiv Detail & Related papers (2021-02-16T03:17:23Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.