Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
- URL: http://arxiv.org/abs/2202.02967v1
- Date: Mon, 7 Feb 2022 06:33:35 GMT
- Title: Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
- Authors: Zhangjie Cao, Zihan Wang, Dorsa Sadigh
- Abstract summary: We study the problem of learning from imperfect demonstrations by learning a confidence predictor.
We learn a common latent space through adversarial distribution matching of multi-length partial trajectories.
Our experiments in three simulated environments and a real robot reaching task demonstrate that our approach learns a policy with the highest expected return.
- Score: 44.14553613304978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing learning from demonstration algorithms usually assume access to
expert demonstrations. However, this assumption is limiting in many real-world
applications since the collected demonstrations may be suboptimal or even
consist of failure cases. We therefore study the problem of learning from
imperfect demonstrations by learning a confidence predictor. Specifically, we
rely on demonstrations, along with their confidence values, from a different
but corresponding environment (the source environment) to learn a confidence
predictor for the environment we aim to learn a policy in (the target
environment, where we only have unlabeled demonstrations). We learn a common
latent space through adversarial distribution matching of multi-length partial
trajectories, which enables the transfer of confidence across the source and
target environments. The learned confidence reweights the demonstrations so
that the policy learns more from informative demonstrations and discards
irrelevant ones. Our experiments in three simulated environments and on a real
robot reaching task demonstrate that our approach learns a policy with the
highest expected return.
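Read as pseudocode, the abstract suggests three coupled pieces: trajectory encoders for the two environments aligned by a discriminator, a confidence predictor fit on the labeled source embeddings, and a confidence-weighted imitation loss on the target demonstrations. Below is a minimal, hypothetical PyTorch sketch of that recipe, not the authors' implementation; the module names, the GRU encoders, the input dimensions, and the squared-error imitation loss are all assumptions.

```python
import torch
import torch.nn as nn

class TrajEncoder(nn.Module):
    """Encodes a (padded) partial trajectory into the shared latent space."""
    def __init__(self, obs_act_dim, latent_dim=64):
        super().__init__()
        self.gru = nn.GRU(obs_act_dim, latent_dim, batch_first=True)

    def forward(self, traj):                       # traj: (B, T, obs_act_dim)
        _, h = self.gru(traj)                      # h: (1, B, latent_dim)
        return h.squeeze(0)                        # (B, latent_dim)

LATENT = 64
src_enc = TrajEncoder(obs_act_dim=10, latent_dim=LATENT)  # source dim: assumed
tgt_enc = TrajEncoder(obs_act_dim=12, latent_dim=LATENT)  # target dim: assumed
disc = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, 1))
conf = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(),
                     nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCEWithLogitsLoss()

def adversarial_matching_losses(src_traj, tgt_traj):
    """Discriminator separates source from target latents; the target
    encoder is trained to fool it, aligning the two distributions."""
    z_s, z_t = src_enc(src_traj), tgt_enc(tgt_traj)
    d_loss = bce(disc(z_s.detach()), torch.ones(len(z_s), 1)) \
           + bce(disc(z_t.detach()), torch.zeros(len(z_t), 1))
    g_loss = bce(disc(z_t), torch.ones(len(z_t), 1))
    return d_loss, g_loss

def confidence_and_bc_losses(src_traj, src_conf, tgt_traj, policy, obs, act):
    """Fit the confidence head on labeled source data (src_conf: (B, 1));
    because the latent space is shared, its target predictions serve as
    weights that rescale a simple behavioral-cloning loss."""
    conf_loss = nn.functional.mse_loss(conf(src_enc(src_traj)), src_conf)
    w = conf(tgt_enc(tgt_traj)).detach()           # (B, 1) per-demo weights
    bc_loss = (w * (policy(obs) - act).pow(2).sum(-1, keepdim=True)).mean()
    return conf_loss, bc_loss
```

A full training loop would alternate the `d_loss` and `g_loss` updates and sample partial trajectories of several different lengths per batch, since the abstract emphasizes multi-length matching.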
Related papers
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, collected demonstrations are likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed (a generic sketch of the positive-unlabeled ingredient follows this entry).
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
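On the positive-unlabeled discriminator named above: a standard building block for training a classifier from a small labeled-positive set plus an unlabeled mix is the non-negative PU risk estimator of Kiryo et al. (2017). The sketch below shows that generic estimator with a logistic loss; how the paper assigns agent rollouts and imperfect demonstrations to the positive and unlabeled pools, and its choice of class prior, come from the paper itself and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def nnpu_discriminator_loss(d_pos, d_unl, prior=0.5):
    """Non-negative PU risk (Kiryo et al., 2017) on discriminator logits:
    d_pos are known positives, d_unl is the unlabeled positive/negative mix,
    and `prior` is the assumed fraction of positives among the unlabeled."""
    loss_pos = F.binary_cross_entropy_with_logits(
        d_pos, torch.ones_like(d_pos))
    loss_pos_as_neg = F.binary_cross_entropy_with_logits(
        d_pos, torch.zeros_like(d_pos))
    loss_unl_as_neg = F.binary_cross_entropy_with_logits(
        d_unl, torch.zeros_like(d_unl))
    # Unbiased negative risk, clamped at zero: the clamp is the
    # "non-negative" correction that prevents overfitting pathologies.
    neg_risk = torch.clamp(loss_unl_as_neg - prior * loss_pos_as_neg, min=0.0)
    return prior * loss_pos + neg_risk
```

In a GAIL-style loop this loss would replace the usual expert-vs-agent discriminator objective, after which the policy is trained adversarially against the resulting discriminator, matching the "fool the discriminator" description above.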
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual same-dynamics assumption and only requires the demonstrator and the imitator to share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators, but it introduces a new challenge.
We develop a better transferability measurement to tackle this challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating the abilities of pretrained language models in limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear, since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones, taking a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real robotic continuous control tasks from Robomimic and on discrete environments such as MiniGrid and chess (a minimal sketch of one such joint weighting follows this entry).
arXiv Detail & Related papers (2022-02-02T21:23:19Z)
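One simple way to realize a joint model over a learned policy and demonstrator expertise levels is to give each demonstrator a free expertise parameter that is optimized together with the policy, so unhelpful demonstrators are automatically down-weighted. The sketch below is a hypothetical rendering under that reading, not the paper's actual model, which is more structured; `joint_expertise_bc_loss`, the softmax weighting, and the squared-error imitation loss are all assumptions.

```python
import torch
import torch.nn as nn

def joint_expertise_bc_loss(policy, demos, rho):
    """demos: list of (obs, act) tensor pairs, one batch per demonstrator.
    rho: nn.Parameter of shape (num_demonstrators,), raw expertise logits
    optimized jointly with the policy. Note: without a prior or entropy
    term, the softmax weights can collapse onto one easy demonstrator."""
    weights = torch.softmax(rho, dim=0)        # normalized expertise levels
    loss = torch.zeros(())
    for w, (obs, act) in zip(weights, demos):
        loss = loss + w * (policy(obs) - act).pow(2).mean()
    return loss

# Usage sketch: rho = nn.Parameter(torch.zeros(num_demonstrators)) and a
# single optimizer over list(policy.parameters()) + [rho].
```

Unlike the confidence-transfer setup in the main paper, the weights here are free per-demonstrator parameters rather than the outputs of a transferred predictor.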
- Learning Feasibility to Imitate Demonstrators with Different Dynamics [23.239058855103067]
The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations.
We learn a feasibility metric that captures the likelihood that a demonstration is feasible for the imitator.
Our experiments on four simulated environments and on a real robot show that the policy learned with our approach achieves a higher expected return than prior works.
arXiv Detail & Related papers (2021-10-28T14:15:47Z)
- Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality [30.51436098631477]
Confidence-Aware Imitation Learning (CAIL) learns a well-performing policy from confidence-reweighted demonstrations.
We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments.
arXiv Detail & Related papers (2021-10-27T20:29:38Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment to learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.