Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations
- URL: http://arxiv.org/abs/2306.07919v1
- Date: Tue, 13 Jun 2023 17:24:37 GMT
- Title: Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations
- Authors: Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong
Chen, Yanchi Liu, Wei Cheng, Haifeng Chen
- Abstract summary: We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
- Score: 60.241144377865716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning has achieved great success in many sequential
decision-making tasks, in which a neural agent is learned by imitating
collected human demonstrations. However, existing algorithms typically require
a large number of high-quality demonstrations that are difficult and expensive
to collect. In practice, a trade-off must usually be made between demonstration
quality and quantity. Targeting this problem, in this work we consider the
imitation of sub-optimal demonstrations, with both a small clean demonstration
set and a large noisy set. Some pioneering works have been proposed, but they
suffer from many limitations, e.g., assuming a demonstration to be of the same
optimality throughout time steps and failing to provide any interpretation
w.r.t. knowledge learned from the noisy set. Addressing these problems, we
propose a method that evaluates and imitates at the sub-demonstration level,
encoding action primitives of varying quality into different skills.
Concretely, the proposed method consists of a high-level controller to discover skills
and a skill-conditioned module to capture action-taking policies, and is
trained following a two-phase pipeline by first discovering skills with all
demonstrations and then adapting the controller to only the clean set. A
mutual-information-based regularization and a dynamic sub-demonstration
optimality estimator are designed to promote disentanglement in the skill
space. Extensive experiments are conducted over two gym environments and a
real-world healthcare dataset to demonstrate the superiority of the proposed method in
learning from sub-optimal demonstrations and its improved interpretability by
examining learned skills.
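To make the described architecture concrete, below is a minimal sketch of the two-phase pipeline, assuming a discrete skill space, small MLP modules, and a simple entropy-based proxy for the mutual-information regularizer. All module names, sizes, and the marginalization strategy are illustrative assumptions; the abstract does not specify the authors' implementation.

```python
# Minimal sketch of the two-phase pipeline described in the abstract.
# Discrete skills, MLP modules, and the entropy-based MI proxy are all
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_SKILLS, STATE_DIM, N_ACTIONS = 8, 16, 6  # illustrative sizes

class SkillController(nn.Module):
    """High-level controller: state -> categorical distribution over skills."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_SKILLS))

    def forward(self, s):
        return F.softmax(self.net(s), dim=-1)  # (B, K)

class SkillPolicy(nn.Module):
    """Skill-conditioned module: (state, skill one-hot) -> action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_SKILLS, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

def bc_loss(controller, policy, s, a):
    """Behavior cloning with the skill variable marginalized out:
    log p(a|s) = logsumexp_k [log p(z=k|s) + log p(a|s, z=k)]."""
    p_z = controller(s)  # (B, K)
    per_skill = []
    for k in range(N_SKILLS):
        z = F.one_hot(torch.full((s.shape[0],), k), N_SKILLS).float()
        logp = F.log_softmax(policy(s, z), dim=-1)
        per_skill.append(logp.gather(1, a.unsqueeze(1)).squeeze(1))
    logp_a = torch.logsumexp(torch.stack(per_skill, dim=1)
                             + p_z.clamp_min(1e-8).log(), dim=1)
    return -logp_a.mean()

def mi_regularizer(p_z):
    """Proxy for I(skill; state) = H(mean p_z) - mean H(p_z): use every
    skill overall, but assign each state confidently, which pushes skills
    to disentangle."""
    marginal = p_z.mean(0)
    h_marginal = -(marginal * marginal.clamp_min(1e-8).log()).sum()
    h_cond = -(p_z * p_z.clamp_min(1e-8).log()).sum(1).mean()
    return h_cond - h_marginal  # minimizing this maximizes the MI proxy

# Phase 1: minimize bc_loss + lambda * mi_regularizer over BOTH the clean
# and the noisy demonstrations, so skills of varying quality are discovered.
# Phase 2: freeze SkillPolicy and fine-tune SkillController on the clean
# set only, so states are routed to the high-quality skills.
```

The dynamic sub-demonstration optimality estimator mentioned in the abstract would additionally weight each transition's contribution to the phase-1 loss; it is omitted from this sketch.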
Related papers
- Learning to Discern: Imitating Heterogeneous Human Demonstrations with
Preference and Representation Learning [12.4468604987226]
This paper introduces Learning to Discern (L2D), an offline imitation learning framework for learning from demonstrations with diverse quality and style.
We show that L2D can effectively assess and learn from varying demonstrations, thereby leading to improved policy performance across a range of tasks in both simulations and on a physical robot.
arXiv Detail & Related papers (2023-10-22T06:08:55Z)
- Eliciting Compatible Demonstrations for Multi-Human Imitation Learning [16.11830547863391]
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation.
Natural human behavior has a great deal of heterogeneity, with several optimal ways to demonstrate a task.
This heterogeneity presents a problem for interactive imitation learning, where sequences of users improve on a policy by iteratively collecting new, possibly conflicting demonstrations.
We show that we can both identify incompatible demonstrations via post-hoc filtering, and apply our compatibility measure to actively elicit compatible demonstrations from new users.
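As a concrete reading of "post-hoc filtering", one simple stand-in for a compatibility measure is the current policy's log-likelihood of a demonstration; the paper's actual measure may differ.

```python
import numpy as np

def compatibility(policy_logprob, demo):
    """Mean log-likelihood of a demo's actions under the current policy.
    (A hypothetical stand-in for the paper's compatibility measure.)"""
    return float(np.mean([policy_logprob(s, a) for s, a in demo]))

def filter_incompatible(demos, policy_logprob, threshold=-2.0):
    """Post-hoc filtering: keep only demos scoring above a threshold."""
    return [d for d in demos if compatibility(policy_logprob, d) >= threshold]
```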
arXiv Detail & Related papers (2022-10-14T19:37:55Z)
- Extraneousness-Aware Imitation Learning [25.60384350984274]
Extraneousness-Aware Imitation Learning (EIL) learns visuomotor policies from third-person demonstrations with extraneous subsequences.
EIL learns action-conditioned observation embeddings in a self-supervised manner and retrieves task-relevant observations across visual demonstrations.
Experimental results show that EIL outperforms strong baselines and achieves policies comparable to those trained with perfect demonstrations.
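A minimal sketch of the retrieval step, assuming the observation embeddings are already learned; cosine similarity and `top_k` are illustrative choices, not EIL's exact procedure.

```python
import numpy as np

def retrieve_task_relevant(query_emb, demo_embs, top_k=5):
    """Return indices of the demo frames whose embeddings are most similar
    to the query, discarding (extraneous) frames outside the top-k."""
    q = query_emb / np.linalg.norm(query_emb)
    d = demo_embs / np.linalg.norm(demo_embs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:top_k]
```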
arXiv Detail & Related papers (2022-10-04T04:42:26Z)
- Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
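One hedged approximation of such a joint model is an EM-style loop: re-estimate per-demonstrator expertise from how well the current policy fits their data, then re-weight the imitation loss accordingly; the paper's joint model is richer than this sketch.

```python
import numpy as np

def expertise_weighted_bc(nll_per_demo, demonstrator_of, n_demonstrators, temp=1.0):
    """nll_per_demo[i]: current policy's negative log-likelihood on demo i;
    demonstrator_of[i]: id of the demonstrator who produced demo i
    (each demonstrator is assumed to have at least one demo)."""
    # E-step: expertise falls off with the average loss on a demonstrator's data.
    per_dem = np.array([np.mean([nll for nll, d in zip(nll_per_demo, demonstrator_of)
                                 if d == j]) for j in range(n_demonstrators)])
    expertise = np.exp(-per_dem / temp)
    expertise /= expertise.sum()
    # M-step objective: imitation loss weighted by the demonstrator's expertise.
    weights = np.array([expertise[d] for d in demonstrator_of])
    return float(np.sum(weights * np.asarray(nll_per_demo)))
```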
arXiv Detail & Related papers (2022-02-02T21:23:19Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
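A toy composition of the two scores into a single usefulness weight; the convex combination and `alpha` are illustrative assumptions, not the paper's definition.

```python
def demo_usefulness(feasibility, optimality, alpha=0.5):
    """Weight a demonstration by how realizable it is under the imitator's
    dynamics (feasibility) and how good it is (optimality)."""
    return alpha * feasibility + (1.0 - alpha) * optimality
```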
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
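A tiny illustration of why mode averaging fails: combining a "pass above" and a "pass below" demonstration yields a path straight through the obstacle.

```python
import numpy as np

# Two valid demonstrations avoid an obstacle at y = 0 on opposite sides.
above = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
below = np.array([[0.0, -1.0], [1.0, -1.0], [2.0, -1.0]])
mean_traj = (above + below) / 2  # [[0,0],[1,0],[2,0]]: an invalid trajectory
```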
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
Prototype-centered Attentive Learning (PAL) is a model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
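A generic sketch of a prototype-centered contrastive loss, written from this summary rather than from PAL's exact formulation: each class prototype treats same-class queries as positives and all other queries as negatives.

```python
import torch
import torch.nn.functional as F

def prototype_centered_loss(prototypes, queries, query_labels, tau=0.1):
    """prototypes: (C, D) class prototypes; queries: (Q, D) query embeddings;
    query_labels: (Q,) class index of each query; tau: temperature."""
    sims = F.normalize(prototypes, dim=1) @ F.normalize(queries, dim=1).T / tau
    log_p = F.log_softmax(sims, dim=1)  # softmax over queries, per prototype
    mask = (query_labels.unsqueeze(0) ==
            torch.arange(prototypes.size(0)).unsqueeze(1)).float()  # (C, Q)
    pos_logp = (log_p * mask).sum(1) / mask.sum(1).clamp_min(1.0)
    return -pos_logp.mean()
```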
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
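A hedged sketch of such a joint objective: an RL term plus an imitation term whose per-demo weights can be adapted down for noisy demonstrations (the weighting scheme and `lam` are illustrative assumptions).

```python
import torch

def combined_objective(rl_loss, bc_nll, demo_weights, lam=0.5):
    """rl_loss: scalar loss from environment interaction;
    bc_nll: per-demo negative log-likelihoods (tensor);
    demo_weights: adaptive per-demo weights, lowered for noisy demos."""
    weighted_bc = (demo_weights * bc_nll).sum() / demo_weights.sum().clamp_min(1e-8)
    return rl_loss + lam * weighted_bc
```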
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning is effective in sparse-reward tasks because it leverages existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.