Imitation Learning from Purified Demonstration
- URL: http://arxiv.org/abs/2310.07143v1
- Date: Wed, 11 Oct 2023 02:36:52 GMT
- Title: Imitation Learning from Purified Demonstration
- Authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu
- Abstract summary: We propose to purify the potential perturbations in imperfect demonstrations and conduct imitation learning from purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the total variation distance between the purified and optimal demonstration distributions can be upper-bounded.
- Score: 55.23663861003027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning has emerged as a promising approach for addressing
sequential decision-making problems, with the assumption that expert
demonstrations are optimal. However, in real-world scenarios, expert
demonstrations are often imperfect, leading to challenges in effectively
applying imitation learning. While existing research has focused on optimizing
with imperfect demonstrations, the training typically requires a certain
proportion of optimal demonstrations to guarantee performance. To tackle these
problems, we propose to purify the potential perturbations in imperfect
demonstrations and subsequently conduct imitation learning from purified
demonstrations. Motivated by the success of diffusion models, we introduce a
two-step purification via the diffusion process. In the first step, we apply a
forward diffusion process to effectively smooth out the potential perturbations
in imperfect demonstrations by introducing additional noise. Subsequently, a
reverse generative process is utilized to recover the optimal expert
demonstrations from the diffused ones. We provide theoretical evidence
supporting our approach, demonstrating that the total variation distance between
the purified and optimal demonstration distributions can be upper-bounded. The
evaluation results on MuJoCo demonstrate the effectiveness of our method from
different aspects.
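The two-step purification described above can be sketched as a DDPM-style forward/reverse pass. This is a minimal illustrative sketch, not the authors' implementation: `predict_eps` stands in for a trained noise-prediction network, and the noise schedule `betas` and diffusion depth `t_star` are assumptions chosen for illustration.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Forward diffusion: smooth out perturbations by blending the
    demonstration with Gaussian noise,
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def reverse_step(xt, t, alpha, alpha_bar, predict_eps, rng):
    """One DDPM-style reverse (generative) step using a noise predictor."""
    eps_hat = predict_eps(xt, t)
    x = (xt - (1.0 - alpha[t]) / np.sqrt(1.0 - alpha_bar[t]) * eps_hat)
    x = x / np.sqrt(alpha[t])
    if t > 0:  # no noise is injected at the final step
        x = x + np.sqrt(1.0 - alpha[t]) * rng.standard_normal(xt.shape)
    return x

def purify(demo, t_star, betas, predict_eps, seed=0):
    """Two-step purification: diffuse an imperfect demonstration to
    step t_star, then run the reverse process back to step 0."""
    rng = np.random.default_rng(seed)
    alpha = 1.0 - betas
    alpha_bar = np.cumprod(alpha)
    x = forward_diffuse(demo, t_star, alpha_bar, rng)
    for t in range(t_star, -1, -1):
        x = reverse_step(x, t, alpha, alpha_bar, predict_eps, rng)
    return x
```

In practice `predict_eps` would be trained on (possibly imperfect) demonstrations, and `t_star` trades off how much perturbation is smoothed away against how much expert signal survives the forward pass.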
Related papers
- Good Better Best: Self-Motivated Imitation Learning for noisy
Demonstrations [12.627982138086892]
Imitation Learning aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations.
In this paper, we introduce Self-Motivated Imitation LEarning (SMILE), a method capable of progressively filtering out demonstrations collected by policies deemed inferior to the current policy.
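As a toy illustration of the filtering idea (not SMILE's actual criterion, which the summary does not specify), one could keep only the demonstrations whose estimated return beats the current policy's:

```python
def filter_demos(demos, demo_returns, policy_return):
    """Keep demonstrations estimated to outperform the current policy.

    `demo_returns` and `policy_return` are assumed to come from some
    return estimator; they are placeholders for illustration only."""
    return [d for d, ret in zip(demos, demo_returns) if ret > policy_return]
```

Repeating this filter as the policy improves progressively discards demonstrations from inferior behavior policies.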
arXiv Detail & Related papers (2023-10-24T13:09:56Z) - Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations.
We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z) - Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z) - Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual assumption of identical dynamics to only requiring that the demonstrator and the imitator share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z) - Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial for an excellent final performance of prompt-tuning.
The proposed approach can be: (i) Plugged into any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories.
Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z) - Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations
using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
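The shaping idea can be illustrated with classic potential-based reward shaping; here `phi` is a hypothetical stand-in for the learned generative-model potential (the paper uses state-and-action-dependent potentials, while this sketch uses a state-only one for simplicity):

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping: add gamma * phi(s') - phi(s) to the
    environment reward. With a state-only potential this provably leaves
    the optimal policy unchanged while steering exploration toward
    high-potential regions."""
    return r + gamma * phi(s_next) - phi(s)
```

A potential derived from a density model of expert transitions would assign high values to states the expert visits, so the shaped reward guides the agent toward those regions first.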
arXiv Detail & Related papers (2020-11-02T20:32:05Z) - Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.