Imitating Past Successes can be Very Suboptimal
- URL: http://arxiv.org/abs/2206.03378v1
- Date: Tue, 7 Jun 2022 15:13:43 GMT
- Title: Imitating Past Successes can be Very Suboptimal
- Authors: Benjamin Eysenbach, Soumith Udatha, Sergey Levine and Ruslan
Salakhutdinov
- Abstract summary: We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy.
We show that a simple modification results in a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
- Score: 145.70788608016755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work has proposed a simple strategy for reinforcement learning (RL):
label experience with the outcomes achieved in that experience, and then
imitate the relabeled experience. These outcome-conditioned imitation learning
methods are appealing because of their simplicity, strong performance, and
close ties with supervised learning. However, it remains unclear how these
methods relate to the standard RL objective, reward maximization. In this
paper, we prove that existing outcome-conditioned imitation learning methods do
not necessarily improve the policy; rather, in some settings they can decrease
the expected reward. Nonetheless, we show that a simple modification results in
a method that does guarantee policy improvement, under some assumptions. Our
aim is not to develop an entirely new method, but rather to explain how a
variant of outcome-conditioned imitation learning can be used to maximize
rewards.
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimize the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv Detail & Related papers (2024-11-11T14:05:50Z) - Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z) - State Augmented Constrained Reinforcement Learning: Overcoming the
Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
arXiv Detail & Related papers (2021-02-23T21:07:35Z) - Non-Adversarial Imitation Learning and its Connections to Adversarial
Methods [21.89749623434729]
We present a framework for non-adversarial imitation learning.
The resulting algorithms are similar to their adversarial counterparts.
We also show that our non-adversarial formulation can be used to derive novel algorithms.
arXiv Detail & Related papers (2020-08-08T13:43:06Z) - Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z) - Reward-Conditioned Policies [100.64167842905069]
imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.