Imitation Learning from Purified Demonstrations
- URL: http://arxiv.org/abs/2310.07143v2
- Date: Tue, 6 Aug 2024 00:51:37 GMT
- Title: Imitation Learning from Purified Demonstrations
- Authors: Yunke Wang, Minjing Dong, Yukun Zhao, Bo Du, Chang Xu,
- Abstract summary: We propose to purify the potential noises in imperfect demonstrations first, and subsequently conduct imitation learning from these purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded.
- Score: 47.52316615371601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning has emerged as a promising approach for addressing sequential decision-making problems, with the assumption that expert demonstrations are optimal. However, in real-world scenarios, most demonstrations are often imperfect, leading to challenges in the effectiveness of imitation learning. While existing research has focused on optimizing with imperfect demonstrations, the training typically requires a certain proportion of optimal demonstrations to guarantee performance. To tackle these problems, we propose to purify the potential noises in imperfect demonstrations first, and subsequently conduct imitation learning from these purified demonstrations. Motivated by the success of diffusion model, we introduce a two-step purification via diffusion process. In the first step, we apply a forward diffusion process to smooth potential noises in imperfect demonstrations by introducing additional noise. Subsequently, a reverse generative process is utilized to recover the optimal demonstration from the diffused ones. We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded. Empirical results on MuJoCo and RoboSuite demonstrate the effectiveness of our method from different aspects.
Related papers
- Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose method by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
Agent policy will be optimized to cheat the discriminator and produce trajectories similar to those optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z) - Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the assumption to that the demonstrator and the imitator have the same state spaces.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly-emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z) - Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenario.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive of the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z) - Let Me Check the Examples: Enhancing Demonstration Learning via Explicit
Imitation [9.851250429233634]
Demonstration learning aims to guide the prompt prediction via providing answered demonstrations in the few shot settings.
Existing work onlycorporas the answered examples as demonstrations to the prompt template without any additional operation.
We introduce Imitation DEMOnstration Learning (Imitation-Demo) to strengthen demonstration learning via explicitly imitating human review behaviour.
arXiv Detail & Related papers (2022-08-31T06:59:36Z) - Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost
Sensor for Dexterous Manipulation [0.5669790037378094]
Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities.
We investigate characteristics of such additional demonstrations and their impact on performance.
We show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations.
arXiv Detail & Related papers (2022-04-15T19:55:46Z) - Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial for an excellent final performance of prompt-tuning.
The proposed approach can be: (i) Plugged into any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories.
Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations
using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.