Causal Imitation Learning with Unobserved Confounders
- URL: http://arxiv.org/abs/2208.06267v1
- Date: Fri, 12 Aug 2022 13:29:53 GMT
- Title: Causal Imitation Learning with Unobserved Confounders
- Authors: Junzhe Zhang, Daniel Kumor, Elias Bareinboim
- Abstract summary: We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
- Score: 82.22545916247269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the common ways children learn is by mimicking adults. Imitation
learning focuses on learning policies with suitable performance from
demonstrations generated by an expert, with an unspecified performance measure
and an unobserved reward signal. Popular methods for imitation learning start by
either directly mimicking the behavior policy of an expert (behavior cloning)
or by learning a reward function that prioritizes observed expert trajectories
(inverse reinforcement learning). However, these methods rely on the assumption
that covariates used by the expert to determine her/his actions are fully
observed. In this paper, we relax this assumption and study imitation learning
when sensory inputs of the learner and the expert differ. First, we provide a
non-parametric, graphical criterion that is complete (both necessary and
sufficient) for determining the feasibility of imitation from the combinations
of demonstration data and qualitative assumptions about the underlying
environment, represented in the form of a causal model. We then show that when
such a criterion does not hold, imitation could still be feasible by exploiting
quantitative knowledge of the expert trajectories. Finally, we develop an
efficient procedure for learning the imitating policy from experts'
trajectories.
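As a concrete illustration of the setting above, the following minimal sketch shows how behavior cloning falls short of the expert when the expert's action depends on a covariate the imitator never observes. The data-generating process, variable names, and numbers are illustrative assumptions, not the paper's experiments.

```python
# Minimal synthetic sketch of imitation under an unobserved covariate.
# The expert sees U; the imitator only sees a noisy proxy X of U.
# Behavior cloning matches the conditional P(A | X) exactly, yet its
# induced reward still falls short of the expert's.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

U = rng.integers(0, 2, size=n)                     # covariate seen only by the expert
X = np.where(rng.random(n) < 0.75, U, 1 - U)       # noisy proxy, observed by both
A_expert = U                                       # expert acts on U directly
Y_expert = (A_expert == U).astype(float)           # reward: match the hidden covariate

# Behavior cloning: estimate P(A = 1 | X) from demonstrations, then act on X alone.
p_a1_given_x = np.array([A_expert[X == x].mean() for x in (0, 1)])
A_bc = (rng.random(n) < p_a1_given_x[X]).astype(int)
Y_bc = (A_bc == U).astype(float)

print(f"expert mean reward:   {Y_expert.mean():.3f}")  # 1.000
print(f"imitator mean reward: {Y_bc.mean():.3f}")      # about 0.625
```

Matching the expert's conditional action distribution is not enough in this toy example; whether the expert's reward distribution can be matched at all from the available data and causal assumptions is what the paper's graphical criterion decides.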
Related papers
- IDIL: Imitation Learning of Intent-Driven Expert Behavior [2.07180164747172]
We introduce IDIL, a novel imitation learning algorithm to mimic diverse intent-driven behaviors of experts.
It is capable of addressing sequential tasks with high-dimensional state representations.
Because it learns a generative model, IDIL demonstrates superior performance on intent inference metrics.
arXiv Detail & Related papers (2024-04-25T19:18:30Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations.
We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z)
- How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement [20.91491585498749]
We propose a novel active imitation learning framework based on a teacher-student interaction model.
We show that AdapMen can improve the error bound and avoid compounding error under mild conditions.
arXiv Detail & Related papers (2023-03-03T16:44:33Z)
- Deconfounding Imitation Learning with Variational Inference [19.99248795957195]
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent.
This is because partial observability gives rise to hidden confounders in the causal graph.
We propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-11-04T18:00:02Z)
- Evaluating Disentanglement in Generative Models Without Knowledge of Latent Factors [71.79984112148865]
We introduce a method for ranking generative models based on the training dynamics exhibited during learning.
Inspired by recent theoretical characterizations of disentanglement, our method does not require supervision of the underlying latent factors.
arXiv Detail & Related papers (2022-10-04T17:27:29Z)
- Imitating Past Successes can be Very Suboptimal [145.70788608016755]
We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy.
We show that a simple modification results in a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
arXiv Detail & Related papers (2022-06-07T15:13:43Z)
- Imitation by Predicting Observations [17.86983397979034]
We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks.
Our method, which we call FORM, is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations.
We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
arXiv Detail & Related papers (2021-07-08T14:09:30Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
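The Deconfounding Imitation Learning with Variational Inference entry above describes inferring the expert's latent information and conditioning the policy on it. The sketch below is a rough, assumed rendering of that idea in PyTorch: a trajectory encoder plays the role of the variational inference model and a latent-conditional policy is trained with an ELBO-style behavior-cloning objective. Module names, dimensions, and the loss weighting are illustrative, not the paper's architecture.

```python
# Rough sketch (all design choices assumed): infer a latent z from an expert
# trajectory, then clone the expert's actions with a policy conditioned on (x, z).
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 8, 4, 2

class TrajectoryEncoder(nn.Module):
    """q(z | trajectory): encodes an (obs, action) sequence into a Gaussian latent."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM + ACT_DIM, 32, batch_first=True)
        self.to_mu = nn.Linear(32, LATENT_DIM)
        self.to_logvar = nn.Linear(32, LATENT_DIM)

    def forward(self, obs_seq, act_seq):
        h, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        last = h[:, -1]
        return self.to_mu(last), self.to_logvar(last)

class LatentConditionalPolicy(nn.Module):
    """pi(a | x, z): action logits from the current observation and inferred latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM)
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

encoder, policy = TrajectoryEncoder(), LatentConditionalPolicy()
optim = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()), lr=1e-3)

# One illustrative training step on a fake batch of expert trajectories.
B, T = 16, 10
obs_seq = torch.randn(B, T, OBS_DIM)
act_seq = torch.randint(0, ACT_DIM, (B, T))
act_onehot = nn.functional.one_hot(act_seq, ACT_DIM).float()

optim.zero_grad()
mu, logvar = encoder(obs_seq, act_onehot)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterization
logits = policy(obs_seq.reshape(B * T, -1), z.repeat_interleave(T, dim=0))
recon = nn.functional.cross_entropy(logits, act_seq.reshape(-1))
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
loss = recon + 1e-3 * kl                                        # ELBO-style objective
loss.backward()
optim.step()
```

At deployment the true latent is unknown, so how z is obtained or marginalized for the imitator is a separate design question that this sketch does not settle.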