Generalization Guarantees for Imitation Learning
- URL: http://arxiv.org/abs/2008.01913v2
- Date: Thu, 3 Dec 2020 06:35:17 GMT
- Title: Generalization Guarantees for Imitation Learning
- Authors: Allen Z. Ren, Sushant Veer, Anirudha Majumdar
- Abstract summary: Control policies from imitation learning can often fail to generalize to novel environments.
We present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework.
- Score: 6.542289202349586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Control policies from imitation learning can often fail to generalize to
novel environments due to imperfect demonstrations or the inability of
imitation learning algorithms to accurately infer the expert's policies. In
this paper, we present rigorous generalization guarantees for imitation
learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework
to provide upper bounds on the expected cost of policies in novel environments.
We propose a two-stage training method where a latent policy distribution is
first embedded with multi-modal expert behavior using a conditional variational
autoencoder, and then "fine-tuned" in new training environments to explicitly
optimize the generalization bound. We demonstrate strong generalization bounds
and their tightness relative to empirical performance in simulation for (i)
grasping diverse mugs, (ii) planar pushing with visual feedback, and (iii)
vision-based indoor navigation, as well as through hardware experiments for the
two manipulation tasks.
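For context, PAC-Bayes bounds of the kind leveraged here typically take the following generic form. This is the standard McAllester-style statement, assuming costs bounded in [0, 1] and N i.i.d. training environments; it is illustrative of the approach and not necessarily the exact bound derived in the paper.
```latex
% Generic McAllester-style PAC-Bayes bound (illustrative; the paper's exact
% bound and constants may differ). With probability at least 1 - \delta over
% the draw of N i.i.d. training environments E_1, ..., E_N, for every
% posterior P over policies w (with prior P_0 fixed before seeing the data):
\[
\underbrace{\mathbb{E}_{E}\,\mathbb{E}_{w \sim P}\big[C(w; E)\big]}_{\text{expected cost in novel environments}}
\;\le\;
\frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{w \sim P}\big[C(w; E_i)\big]
\;+\;
\sqrt{\frac{\mathrm{KL}(P \,\|\, P_0) + \ln\frac{2\sqrt{N}}{\delta}}{2N}},
\qquad C(w; E) \in [0, 1].
\]
```
In the two-stage method described in the abstract, the first stage roughly corresponds to fitting the prior P_0 (the conditional-VAE embedding of expert behavior) and the second stage to choosing the posterior P by minimizing the right-hand side on the new training environments. A minimal, hypothetical sketch of evaluating that right-hand side (names and inputs are illustrative, not the authors' code):
```python
import math

def pac_bayes_upper_bound(costs, kl, delta=0.01):
    """Right-hand side of the generic bound above (illustrative sketch only).

    costs : per-environment expected costs, each assumed to lie in [0, 1]
    kl    : KL divergence between the posterior and prior policy distributions
    delta : the bound holds with probability at least 1 - delta
    """
    n = len(costs)
    empirical_cost = sum(costs) / n
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    return empirical_cost + slack

# Hypothetical example: 100 training environments, average cost 0.15, KL of 5 nats.
print(pac_bayes_upper_bound([0.15] * 100, kl=5.0, delta=0.01))
```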
Related papers
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interaction with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Robust Learning from Observation with Model Misspecification [33.92371002674386]
Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
arXiv Detail & Related papers (2022-02-12T07:04:06Z)
- Dual Behavior Regularized Reinforcement Learning [8.883885464358737]
Reinforcement learning has been shown to perform a range of complex tasks through interaction with an environment or by leveraging collected experience.
We propose a dual, advantage-based behavior policy based on counterfactual regret minimization.
arXiv Detail & Related papers (2021-09-19T00:47:18Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.