Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2302.06271v1
- Date: Mon, 13 Feb 2023 11:26:44 GMT
- Title: Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning
- Authors: Yunke Wang, Bo Du, Chang Xu
- Abstract summary: In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
- Score: 48.595574101874575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial imitation learning has become a widely used imitation learning framework. The discriminator is typically trained by treating expert demonstrations and policy trajectories as examples of two categories (positive vs. negative, respectively), and the policy is then expected to produce trajectories that are indistinguishable from the expert demonstrations. In the real world, however, collected expert demonstrations are likely to be imperfect, with only an unknown fraction of them being optimal. Instead of treating imperfect expert demonstrations as absolutely positive or negative, we take them as they are: unlabeled. A positive-unlabeled adversarial imitation learning algorithm is developed to dynamically sample the expert demonstrations that best match the trajectories of the continually optimized agent policy. The trajectories of an initial agent policy may be closer to the non-optimal expert demonstrations, but within the adversarial imitation learning framework the agent policy is optimized to fool the discriminator and thus produces trajectories that resemble the optimal expert demonstrations. Theoretical analysis shows that our method learns from the imperfect demonstrations in a self-paced manner. Experimental results on the MuJoCo and RoboSuite platforms demonstrate the effectiveness of our method from several perspectives.
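To make the setup concrete, the sketch below (PyTorch) shows the standard adversarial-imitation discriminator loss that the abstract contrasts against, together with a generic non-negative positive-unlabeled (PU) risk estimator in the style of Kiryo et al. (2017). It is only an illustration of the ingredients named in the abstract, not the paper's actual objective: the `Discriminator` architecture, the assignment of positive vs. unlabeled roles, and the class prior `pi_p` are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Scores state-action pairs; a higher logit means "more expert-like"."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def gail_discriminator_loss(disc, expert_obs, expert_act, policy_obs, policy_act):
    """Standard adversarial imitation loss: demonstrations are labeled
    positive (1) and agent trajectories negative (0)."""
    logits_e = disc(expert_obs, expert_act)
    logits_p = disc(policy_obs, policy_act)
    loss_e = F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
    loss_p = F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p))
    return loss_e + loss_p


def nn_pu_loss(logits_pos, logits_unlabeled, pi_p: float = 0.5):
    """Generic non-negative PU risk estimator (Kiryo et al., 2017).

    `logits_pos` score samples assumed positive, `logits_unlabeled` score the
    unlabeled mixture; `pi_p` is the (generally unknown) positive-class prior.
    The estimated negative risk is clipped at zero to limit overfitting.
    """
    pos_as_pos = F.binary_cross_entropy_with_logits(
        logits_pos, torch.ones_like(logits_pos))
    pos_as_neg = F.binary_cross_entropy_with_logits(
        logits_pos, torch.zeros_like(logits_pos))
    unl_as_neg = F.binary_cross_entropy_with_logits(
        logits_unlabeled, torch.zeros_like(logits_unlabeled))
    negative_risk = unl_as_neg - pi_p * pos_as_neg
    return pi_p * pos_as_pos + torch.clamp(negative_risk, min=0.0)
```

In GAIL-style training the policy is then rewarded with, e.g., `-log(1 - sigmoid(logit))`, so maximizing reward means producing trajectories the discriminator scores as expert-like; a PU-style loss replaces the hard "all demonstrations are positive" assumption when the demonstration set is an unlabeled mixture of optimal and non-optimal behavior.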
Related papers
- Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning [54.40927310957792]
We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
arXiv Detail & Related papers (2024-03-13T20:11:20Z) - Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Divide and Repair: Using Options to Improve Performance of Imitation Learning Against Adversarial Demonstrations [0.6853165736531939]
We consider the problem of learning to perform a task from demonstrations given by teachers or experts.
Some of the experts' demonstrations might be adversarial and demonstrate an incorrect way to perform the task.
We propose a novel technique that can identify parts of demonstrated trajectories that have not been significantly modified by the adversary.
arXiv Detail & Related papers (2023-06-07T16:33:52Z) - Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
Behavior cloning of expert demonstrations can learn optimal policies more sample-efficiently than reinforcement learning, but the cloned policy can fail once it drifts away from the demonstrated trajectories.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
arXiv Detail & Related papers (2022-10-17T18:02:19Z) - Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ.
We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z) - Learning from Imperfect Demonstrations via Adversarial Confidence Transfer [44.14553613304978]
We study the problem of learning from imperfect demonstrations by learning a confidence predictor.
We learn a common latent space through adversarial distribution matching of multi-length partial trajectories.
Our experiments in three simulated environments and a real robot reaching task demonstrate that our approach learns a policy with the highest expected return.
arXiv Detail & Related papers (2022-02-07T06:33:35Z) - Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
arXiv Detail & Related papers (2022-02-02T21:23:19Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - Combating False Negatives in Adversarial Imitation Learning [67.99941805086154]
In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior.
As the trained policy learns to be more successful, the negative examples become increasingly similar to expert ones.
We propose a method to alleviate the impact of false negatives and test it on the BabyAI environment.
arXiv Detail & Related papers (2020-02-02T14:56:39Z)