Good Better Best: Self-Motivated Imitation Learning for noisy
Demonstrations
- URL: http://arxiv.org/abs/2310.15815v1
- Date: Tue, 24 Oct 2023 13:09:56 GMT
- Title: Good Better Best: Self-Motivated Imitation Learning for noisy
Demonstrations
- Authors: Ye Yuan, Xin Li, Yong Heng, Leiji Zhang, MingZhong Wang
- Abstract summary: Imitation Learning aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations.
In this paper, we introduce Self-Motivated Imitation LEarning (SMILE), a method capable of progressively filtering out demonstrations collected by policies deemed inferior to the current policy.
- Score: 12.627982138086892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation Learning (IL) aims to discover a policy by minimizing the
discrepancy between the agent's behavior and expert demonstrations. However, IL
is susceptible to limitations imposed by noisy demonstrations from non-expert
behaviors, presenting a significant challenge due to the lack of supplementary
information to assess their expertise. In this paper, we introduce
Self-Motivated Imitation LEarning (SMILE), a method capable of progressively
filtering out demonstrations collected by policies deemed inferior to the
current policy, eliminating the need for additional information. We utilize the
forward and reverse processes of Diffusion Models to emulate the shift in
demonstration expertise from low to high and vice versa, thereby extracting the
noise information that diffuses expertise. Then, the noise information is
leveraged to predict the diffusion steps between the current policy and
demonstrators, which we theoretically show to be equivalent to their
expertise gap. We further explain in detail how the predicted diffusion steps
are applied to filter out noisy demonstrations in a self-motivated manner, and
provide the theoretical grounds for this procedure. Through empirical
evaluations on MuJoCo tasks,
we demonstrate that our method is proficient in learning the expert policy
amidst noisy demonstrations, and effectively filters out demonstrations with
expertise inferior to the current policy.
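To make the filtering step more tangible, the sketch below is a minimal illustration only: it assumes a step-predictor network (called StepPredictor here) that has already been trained, in the spirit of the abstract, to map a state-action pair to a signed estimate of the diffusion steps separating it from the current policy. The architecture, the sign convention, and the keep/drop rule are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch only: self-motivated filtering of demonstration
# trajectories by a predicted expertise gap. The StepPredictor network, its
# signed-output convention, and the keep/drop rule are assumptions for
# illustration, not the authors' implementation.
import torch
import torch.nn as nn


class StepPredictor(nn.Module):
    """Maps a state-action pair to an estimated (signed) diffusion-step count.
    Convention assumed here: positive means the pair looks like it came from a
    policy inferior to the current one; non-positive means equal or better."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)


def filter_demonstrations(step_predictor, demo_trajs, threshold: float = 0.0):
    """Keep only trajectories whose mean predicted step count is at or below
    the threshold, i.e. drop demonstrations that appear to come from policies
    inferior to the current one."""
    kept = []
    with torch.no_grad():
        for states, actions in demo_trajs:
            if step_predictor(states, actions).mean().item() <= threshold:
                kept.append((states, actions))
    return kept
```

Because the predicted gap is measured relative to the current policy, a demonstration that survives early in training can be discarded later as the policy improves, which is the progressive, self-motivated filtering the abstract describes.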
Related papers
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach that leverages high-quality demonstration data to overcome the challenges of LLM alignment.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- Imitation Learning from Purified Demonstrations [47.52316615371601]
We propose to first purify the potential noise in imperfect demonstrations and subsequently conduct imitation learning from the purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded.
arXiv Detail & Related papers (2023-10-11T02:36:52Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is then optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials (a generic sketch of potential-based shaping with a learned state-action potential appears after this list).
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
- Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization [1.0965065178451106]
We study the problem of obtaining a control policy that can mimic and then outperform expert demonstrations in Markov decision processes.
One main relevant approach is inverse reinforcement learning (IRL), which focuses on inferring a reward function from expert demonstrations.
We propose a novel method that enables the learning agent to outperform the demonstrator via a new concurrent reward and action policy learning approach.
arXiv Detail & Related papers (2020-09-21T02:16:21Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
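The state-only imitation entry directly above follows a standard recipe: fit an inverse dynamics model on the agent's own transitions, use it to infer actions for the state-only demonstrations, and then imitate the inferred actions. The sketch below illustrates that recipe; the network sizes, training loop, and plain regression targets are assumptions for illustration, not the cited paper's exact procedure.

```python
# Minimal sketch of state-only imitation via an inverse dynamics model (IDM).
# Shapes, names, and the simple supervised losses are illustrative assumptions.
import torch
import torch.nn as nn


class InverseDynamics(nn.Module):
    """Predicts the action that moves the agent from state s to next state s'."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))


def fit_idm(idm, agent_s, agent_a, agent_s_next, epochs: int = 100, lr: float = 1e-3):
    """Supervised regression on the agent's own (s, a, s') transitions."""
    opt = torch.optim.Adam(idm.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(idm(agent_s, agent_s_next), agent_a)
        loss.backward()
        opt.step()
    return idm


def label_demonstration(idm, demo_states):
    """Infer pseudo-actions for consecutive state pairs of a state-only
    demonstration, yielding (state, pseudo-action) pairs for behavior cloning."""
    with torch.no_grad():
        s, s_next = demo_states[:-1], demo_states[1:]
        pseudo_actions = idm(s, s_next)
    return s, pseudo_actions
```

A natural refinement, not shown here, is to refit the inverse dynamics model as the agent collects more transitions so that the pseudo-labels improve alongside the policy.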
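The "Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models" entry above augments the environment reward with a state-and-action-dependent potential. The sketch below shows only the generic potential-based shaping step with a placeholder potential network; in the cited paper the potential comes from a normalizing flow or GAN fit to demonstrations, which is not reproduced here.

```python
# Sketch of reward shaping with a state-and-action potential ("look-ahead"
# form). The Potential network is a stand-in: any model scoring how
# demonstration-like a (state, action) pair is could supply Phi.
import torch
import torch.nn as nn


class Potential(nn.Module):
    """Placeholder potential Phi(s, a); in practice a generative model fit to
    demonstration (s, a) pairs would provide this score."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)


def shaped_reward(r, phi, s, a, s_next, a_next, gamma: float = 0.99):
    """r'(s, a, s', a') = r + gamma * Phi(s', a') - Phi(s, a)."""
    with torch.no_grad():
        bonus = gamma * phi(s_next, a_next) - phi(s, a)
    return r + bonus
```

The discounted shaping term telescopes along a trajectory, so its main effect is to steer early exploration toward demonstration-like state-action regions, which matches the acceleration claim in the entry above.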