Imitation Learning from Purified Demonstration
- URL: http://arxiv.org/abs/2310.07143v1
- Date: Wed, 11 Oct 2023 02:36:52 GMT
- Title: Imitation Learning from Purified Demonstration
- Authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu
- Abstract summary: We propose to purify the potential perturbations in imperfect demonstrations and conduct imitation learning from purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the total variation distance between the purified and optimal demonstration distributions can be upper-bounded.
- Score: 55.23663861003027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning has emerged as a promising approach for addressing
sequential decision-making problems, with the assumption that expert
demonstrations are optimal. However, in real-world scenarios, expert
demonstrations are often imperfect, leading to challenges in effectively
applying imitation learning. While existing research has focused on optimizing
with imperfect demonstrations, the training typically requires a certain
proportion of optimal demonstrations to guarantee performance. To tackle these
problems, we propose to purify the potential perturbations in imperfect
demonstrations and subsequently conduct imitation learning from purified
demonstrations. Motivated by the success of diffusion models, we introduce a
two-step purification via the diffusion process. In the first step, we apply a
forward diffusion process to effectively smooth out the potential perturbations
in imperfect demonstrations by introducing additional noise. Subsequently, a
reverse generative process is utilized to recover the optimal expert
demonstrations from the diffused ones. We provide theoretical evidence
supporting our approach, demonstrating that the total variation distance between
the purified and optimal demonstration distributions can be upper-bounded. The
evaluation results on MuJoCo demonstrate the effectiveness of our method from
different aspects.
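The purification step described in the abstract can be pictured as a standard forward/reverse diffusion pass over demonstration data. The following is a minimal sketch under common diffusion-model conventions, not the authors' released code: the noise-prediction network eps_model, the cumulative schedule alphas_cumprod, the purification depth k, and the deterministic (DDIM-style) reverse update are all assumptions made for illustration.

    import torch

    def purify(x, eps_model, alphas_cumprod, k):
        """Purify a batch of demonstration vectors x (e.g., flattened
        state-action pairs): diffuse k steps forward, then denoise back."""
        # Step 1: forward diffusion -- add Gaussian noise so that small
        # perturbations in the imperfect demonstrations are smoothed out.
        a_k = alphas_cumprod[k]
        x_t = a_k.sqrt() * x + (1.0 - a_k).sqrt() * torch.randn_like(x)

        # Step 2: reverse generative process -- step the sample back from
        # level k to level 0 with the trained noise predictor.
        for t in range(k, 0, -1):
            a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
            t_batch = torch.full((x.shape[0],), t, device=x.device)
            eps = eps_model(x_t, t_batch)                  # predicted noise
            x0_hat = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
            x_t = a_prev.sqrt() * x0_hat + (1.0 - a_prev).sqrt() * eps
        return x_t

The purified batch would then replace the raw imperfect demonstrations in an ordinary imitation-learning pipeline, which is the setting the abstract describes.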
Related papers
- Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual assumption so that the demonstrator and the imitator only need to share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Let Me Check the Examples: Enhancing Demonstration Learning via Explicit Imitation [9.851250429233634]
Demonstration learning aims to guide the prompt prediction by providing answered demonstrations in few-shot settings.
Existing work only incorporates the answered examples as demonstrations into the prompt template without any additional operation.
We introduce Imitation DEMOnstration Learning (Imitation-Demo) to strengthen demonstration learning via explicitly imitating human review behaviour.
arXiv Detail & Related papers (2022-08-31T06:59:36Z)
- Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost Sensor for Dexterous Manipulation [0.5669790037378094]
Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities.
We investigate characteristics of such additional demonstrations and their impact on performance.
We show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations.
arXiv Detail & Related papers (2022-04-15T19:55:46Z)
- Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial to the final performance of prompt-tuning.
The proposed approach can be: (i) Plugged into any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories.
Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
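The entry above combines reinforcement and imitation learning through a state-and-action-dependent shaping potential. The sketch below shows generic potential-based reward shaping with a demonstration log-density standing in for that potential; it illustrates the general idea only, and the toy density toy_log_p and all argument names are hypothetical.

    import math

    def shaped_reward(r, log_p, s, a, s_next, a_next, gamma=0.99):
        """Add a shaping term built from a potential phi(s, a) = log_p(s, a),
        the log-density of (state, action) pairs under a generative model
        (e.g., a normalizing flow) fit to demonstrations."""
        # gamma * phi(s', a') - phi(s, a) nudges the agent toward
        # state-action regions that the demonstrations visit.
        return r + gamma * log_p(s_next, a_next) - log_p(s, a)

    # Hypothetical usage with a stand-in density: an isotropic Gaussian
    # centered on a single demonstrated (state, action) pair.
    def toy_log_p(s, a, mu_s=0.0, mu_a=0.0):
        return -0.5 * ((s - mu_s) ** 2 + (a - mu_a) ** 2) - math.log(2.0 * math.pi)

    r_shaped = shaped_reward(r=1.0, log_p=toy_log_p, s=0.5, a=0.1,
                             s_next=0.2, a_next=0.0)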
This list is automatically generated from the titles and abstracts of the papers on this site.