Sequential Causal Imitation Learning with Unobserved Confounders
- URL: http://arxiv.org/abs/2208.06276v1
- Date: Fri, 12 Aug 2022 13:53:23 GMT
- Title: Sequential Causal Imitation Learning with Unobserved Confounders
- Authors: Daniel Kumor, Junzhe Zhang, Elias Bareinboim
- Abstract summary: "Monkey see monkey do" is an age-old adage, referring to na"ive imitation without a deep understanding of a system's underlying mechanics.
This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode.
- Score: 82.22545916247269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Monkey see monkey do" is an age-old adage, referring to na\"ive imitation
without a deep understanding of a system's underlying mechanics. Indeed, if a
demonstrator has access to information unavailable to the imitator (monkey),
such as a different set of sensors, then no matter how perfectly the imitator
models its perceived environment (See), attempting to reproduce the
demonstrator's behavior (Do) can lead to poor outcomes. Imitation learning in
the presence of a mismatch between demonstrator and imitator has been studied
in the literature under the rubric of causal imitation learning (Zhang et al.,
2020), but existing solutions are limited to single-stage decision-making. This
paper investigates the problem of causal imitation learning in sequential
settings, where the imitator must make multiple decisions per episode. We
develop a graphical criterion that is necessary and sufficient for determining
the feasibility of causal imitation, providing conditions when an imitator can
match a demonstrator's performance despite differing capabilities. Finally, we
provide an efficient algorithm for determining imitability and corroborate our
theory with simulations.
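
As a rough illustration of what such a graphical criterion looks like in the single-decision case, the sketch below checks a backdoor-style imitability condition in the spirit of Zhang et al. (2020): imitation is treated as feasible if some subset Z of the covariates observed by the imitator d-separates the action X from the outcome Y once the edges leaving X are removed (and Z contains no descendants of X). The toy graphs, variable names, and brute-force search are illustrative assumptions; they are not the sequential criterion or the efficient algorithm developed in this paper.

```python
# Minimal sketch, NOT the paper's algorithm: a backdoor-style single-stage
# imitability check on a toy causal DAG. Imitation is treated as feasible
# if some observed set Z (containing no descendants of the action X)
# d-separates X from the outcome Y in the graph with X's outgoing edges
# removed. Uses nx.d_separated (networkx >= 2.8; renamed
# nx.is_d_separator in newer releases).
from itertools import combinations

import networkx as nx


def find_imitability_witness(graph, action, outcome, observed):
    """Brute-force search for an admissible observed set Z; returns Z or None."""
    g_cut = graph.copy()
    g_cut.remove_edges_from(list(graph.out_edges(action)))  # cut edges out of X
    banned = nx.descendants(graph, action) | {action, outcome}
    candidates = [v for v in observed if v not in banned]
    for r in range(len(candidates) + 1):
        for z in combinations(candidates, r):
            if nx.d_separated(g_cut, {action}, {outcome}, set(z)):
                return set(z)
    return None


# Hypothetical toy graphs: U is hidden from the imitator, W is observed,
# X is the demonstrator's action, Y is the outcome/reward.
confounded = nx.DiGraph([("U", "X"), ("U", "Y"), ("W", "X"), ("X", "Y")])
mediated = nx.DiGraph([("U", "W"), ("W", "X"), ("W", "Y"), ("X", "Y")])

print(find_imitability_witness(confounded, "X", "Y", observed={"W"}))  # None
print(find_imitability_witness(mediated, "X", "Y", observed={"W"}))    # {'W'}
```

In the confounded toy graph the hidden variable U opens a backdoor path that no observed set can block, so the check fails; in the mediated graph conditioning on W suffices. The paper's sequential setting generalizes this kind of question to multiple decisions per episode.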
Related papers
- Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments [45.213059639254475]
We propose a new topic called imitator learning (ItorL).
It aims to derive an imitator module that can reconstruct imitation policies from very limited expert demonstrations.
For autonomous imitation-policy building, we design a demonstration-based attention architecture for the imitator policy.
arXiv Detail & Related papers (2023-10-09T13:35:28Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual assumption so that the demonstrator and the imitator are only required to share the same state space, not the same dynamics.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators, but it introduces a new challenge.
We develop a better transferability measurement to tackle this new challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z)
- A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all samples belonging to the same instance class lie on the same low-dimensional subspace (a generic sketch of such a low-rank-promoting penalty is given after this list).
Empirical evidence shows that the proposed algorithm clearly surpasses state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z)
- Skeletal Feature Compensation for Imitation Learning with Embodiment Mismatch [51.03498820458658]
SILEM is an imitation learning technique that compensates for differences between the skeletal features obtained from the learner and the expert.
We create toy domains based on PyBullet's HalfCheetah and Ant to assess SILEM's benefits for this type of embodiment mismatch.
We also provide qualitative and quantitative results on more realistic problems -- teaching simulated humanoid agents to walk by observing human demonstrations.
arXiv Detail & Related papers (2021-04-15T22:50:48Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Feedback in Imitation Learning: Confusion on Causality and Covariate Shift [12.93527098342393]
We argue that conditioning policies on previous actions leads to a dramatic divergence between "held out" error and performance of the learner in situ.
We analyze existing benchmarks used to test imitation learning approaches.
We find, in a surprising contrast with previous literature, that naive behavioral cloning provides excellent results.
arXiv Detail & Related papers (2021-02-04T20:18:56Z)
- Error Bounds of Imitating Policies and Environments [11.154257789731467]
We first analyze the value gap between the expert policy and imitated policies under two imitation methods, behavioral cloning and generative adversarial imitation.
The results support that generative adversarial imitation can reduce the compounding errors compared to behavioral cloning, and thus has better sample complexity (a rough numerical illustration of this scaling is given after this list).
The results show that environment models can be imitated more effectively by generative adversarial imitation than by behavioral cloning, suggesting a novel application of adversarial imitation to model-based reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:13:31Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
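
For the Low Rank Promoting Prior entry above, the snippet below is a generic, assumption-laden sketch of how a low-rank-promoting penalty on the features of one instance's augmented views can be formed (via the sum of singular values, i.e. the nuclear norm); it is not the specific prior or objective used in that paper.

```python
# Generic low-rank-promoting penalty (illustrative only, not the cited
# paper's objective): the nuclear norm of the stacked feature matrix is
# small when the augmented views of one instance lie near a
# low-dimensional subspace.
import torch


def low_rank_penalty(views: torch.Tensor) -> torch.Tensor:
    """views: (n_views, feat_dim) embeddings of augmentations of one instance.
    Returns the nuclear norm, i.e. the sum of singular values."""
    return torch.linalg.svdvals(views).sum()


views = torch.randn(8, 128)        # 8 hypothetical augmented views, 128-dim
penalty = low_rank_penalty(views)  # would be added, weighted, to a contrastive loss
print(penalty.item())
```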
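
As a back-of-the-envelope illustration of the "compounding errors" discussed in the Error Bounds entry above: in this line of analysis, a per-step imitation error eps typically translates into a value gap growing quadratically in the horizon H for behavioral cloning but only linearly for adversarial distribution matching. The numbers below are arbitrary and serve only to show the scaling; they are not results reported by the paper.

```python
# Illustrative scaling only: O(eps * H^2) compounding for behavioral
# cloning versus O(eps * H) for distribution matching. Arbitrary numbers.
eps, H = 0.01, 100
print(f"behavioral cloning   ~ eps * H^2 = {eps * H ** 2:.1f}")  # 100.0
print(f"adversarial matching ~ eps * H   = {eps * H:.1f}")       # 1.0
```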