Error Bounds of Imitating Policies and Environments
- URL: http://arxiv.org/abs/2010.11876v1
- Date: Thu, 22 Oct 2020 17:13:31 GMT
- Title: Error Bounds of Imitating Policies and Environments
- Authors: Tian Xu, Ziniu Li, Yang Yu
- Abstract summary: We first analyze the value gap between the expert policy and the policies obtained by two imitation methods, behavioral cloning and generative adversarial imitation.
The results support that generative adversarial imitation reduces the compounding errors of behavioral cloning and thus enjoys better sample complexity.
They also show that environment models can be imitated more effectively by generative adversarial imitation than by behavioral cloning, suggesting a novel application of adversarial imitation to model-based reinforcement learning.
- Score: 11.154257789731467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning trains a policy by mimicking expert demonstrations.
Various imitation methods have been proposed and empirically evaluated, yet their
theoretical understanding requires further study. In this paper, we first analyze
the value gap between the expert policy and the policies obtained by two imitation
methods, behavioral cloning and generative adversarial imitation. The results
support that generative adversarial imitation can reduce the compounding errors
suffered by behavioral cloning and thus enjoys better sample complexity. Noting
that the environment transition model can be regarded as a dual agent, imitation
learning can also be used to learn the environment model. Therefore, based on the
bounds for imitating policies, we further analyze the performance of imitating
environments. The results show that environment models can be imitated more
effectively by generative adversarial imitation than by behavioral cloning,
suggesting a novel application of adversarial imitation to model-based
reinforcement learning. We hope these results can inspire future advances in
imitation learning and model-based reinforcement learning.
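To make the compounding-error claim concrete, the display below sketches the shape of the two value-gap bounds the abstract refers to, assuming the standard infinite-horizon discounted MDP setting with discount factor gamma and a per-state imitation error epsilon on the expert distribution; read it as an informal summary rather than a verbatim restatement of the paper's theorems.

```latex
% Informal summary (assumed setting: discounted MDP with factor \gamma,
% expert policy \pi_E, imitated policy \pi, imitation error \epsilon).
\begin{align*}
  \text{Behavioral cloning:}
    &\quad \bigl|V(\pi_E) - V(\pi)\bigr| = O\!\left(\frac{\epsilon}{(1-\gamma)^{2}}\right)
    && \text{errors compound quadratically in the effective horizon,}\\
  \text{Adversarial imitation:}
    &\quad \bigl|V(\pi_E) - V(\pi)\bigr| = O\!\left(\frac{\epsilon}{1-\gamma}\right)
    && \text{the gap grows only linearly in the effective horizon.}
\end{align*}
```

The environment-imitation results in the abstract follow the same pattern once the transition model is treated as the agent being imitated.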
Related papers
- Toward Understanding In-context vs. In-weight Learning [50.24035812301655]
We identify simplified distributional properties that give rise to the emergence and disappearance of in-context learning.
We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
arXiv Detail & Related papers (2024-10-30T14:09:00Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Sequential Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
"Monkey see monkey do" is an age-old adage, referring to na"ive imitation without a deep understanding of a system's underlying mechanics.
This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode.
arXiv Detail & Related papers (2022-08-12T13:53:23Z)
- Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z)
- Model-Based Imitation Learning Using Entropy Regularization of Model and Policy [0.456877715768796]
We propose model-based Entropy-Regularized Imitation Learning (MB-ERIL) under the entropy-regularized Markov decision process.
A policy discriminator distinguishes the actions generated by a robot from expert ones, and a model discriminator distinguishes the counterfactual state transitions generated by the model from the actual ones.
Computer simulations and real robot experiments show that MB-ERIL achieves competitive performance and significantly improves sample efficiency compared with baseline methods (a minimal sketch of this two-discriminator setup appears after this list).
arXiv Detail & Related papers (2022-06-21T04:15:12Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
The paper introduces IMPLANT, a new meta-algorithm for imitation learning that performs planning at test time.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation learning from observation frames policy learning in a way similar to human learning: the agent learns from observed states rather than from explicit action labels.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z)
- Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization.
Our empirical results show that D2-Imitation achieves good sample efficiency, outperforming several off-policy extensions of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z)
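For readers curious how the two-discriminator idea summarized in the MB-ERIL entry (and, more broadly, adversarial imitation of environment models as discussed in the main paper) might look in code, here is a minimal, hypothetical PyTorch sketch; the network sizes, names, and plain BCE losses are illustrative assumptions, not the authors' implementation.

```python
# A minimal, hypothetical sketch (not the authors' code) of training two
# GAN-style discriminators: one scores (state, action) pairs against expert
# data, the other scores (state, action, next_state) transitions against
# real environment transitions. Dimensions and architectures are assumptions.
import torch
import torch.nn as nn

def mlp(in_dim: int) -> nn.Module:
    # Small two-layer scorer that outputs a single logit per input.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, 1))

state_dim, action_dim = 4, 2
policy_disc = mlp(state_dim + action_dim)              # D_pi(s, a)
model_disc = mlp(state_dim + action_dim + state_dim)   # D_M(s, a, s')
bce = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(
    list(policy_disc.parameters()) + list(model_disc.parameters()), lr=3e-4
)

def discriminator_step(expert_sa, agent_sa, real_sas, model_sas):
    """One update: expert/real batches are labeled 1, generated batches 0."""
    ones = lambda x: torch.ones(len(x), 1)
    zeros = lambda x: torch.zeros(len(x), 1)
    loss = (
        bce(policy_disc(expert_sa), ones(expert_sa))
        + bce(policy_disc(agent_sa), zeros(agent_sa))
        + bce(model_disc(real_sas), ones(real_sas))
        + bce(model_disc(model_sas), zeros(model_sas))
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Random tensors stand in for mini-batches of (s, a) and (s, a, s') data.
# A full method would alternate this step with policy and model updates that
# use the discriminator logits as learned reward signals.
loss = discriminator_step(
    torch.randn(32, state_dim + action_dim),
    torch.randn(32, state_dim + action_dim),
    torch.randn(32, 2 * state_dim + action_dim),
    torch.randn(32, 2 * state_dim + action_dim),
)
print(f"discriminator loss: {loss:.3f}")
```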
This list is automatically generated from the titles and abstracts of papers on this site.