Imitation Learning by Estimating Expertise of Demonstrators
- URL: http://arxiv.org/abs/2202.01288v1
- Date: Wed, 2 Feb 2022 21:23:19 GMT
- Title: Imitation Learning by Estimating Expertise of Demonstrators
- Authors: Mark Beliaev, Andy Shih, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani
- Abstract summary: We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
- Score: 92.20185160311036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing imitation learning datasets are collected from multiple
demonstrators, each with different expertise at different parts of the
environment. Yet, standard imitation learning algorithms typically treat all
demonstrators as homogeneous, regardless of their expertise, absorbing the
weaknesses of any suboptimal demonstrators. In this work, we show that
unsupervised learning over demonstrator expertise can lead to a consistent
boost in the performance of imitation learning algorithms. We develop and
optimize a joint model over a learned policy and expertise levels of the
demonstrators. This enables our model to learn from the optimal behavior and
filter out the suboptimal behavior of each demonstrator. Our model learns a
single policy that can outperform even the best demonstrator, and can be used
to estimate the expertise of any demonstrator at any state. We illustrate our
findings on real robotic continuous control tasks from Robomimic and discrete
environments such as MiniGrid and chess, outperforming competing methods in
$21$ out of $23$ settings, with an average of $7\%$ and up to $60\%$
improvement in terms of the final reward.
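To make the joint model described above concrete, the following is a minimal sketch of the general idea for a discrete-action setting (as in MiniGrid or chess): a single policy and one expertise parameter per demonstrator are fit together by maximizing a weighted likelihood in which each demonstrated action is explained either by the learned policy or by uniform noise. This is an illustrative, assumption-laden toy and not the authors' released implementation; the paper additionally models expertise as varying across states, which this sketch collapses to a single scalar per demonstrator, and names such as PolicyNet and fit_policy_and_expertise are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Small discrete-action policy; returns log-probabilities over actions."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        return F.log_softmax(self.net(obs), dim=-1)

def fit_policy_and_expertise(demos, obs_dim, n_actions, epochs=200, lr=1e-2):
    """demos: list indexed by demonstrator; each entry is an (obs, act) tensor pair."""
    policy = PolicyNet(obs_dim, n_actions)
    # One free expertise parameter per demonstrator, squashed into (0, 1).
    expertise_logits = torch.zeros(len(demos), requires_grad=True)
    opt = torch.optim.Adam(list(policy.parameters()) + [expertise_logits], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.tensor(0.0)
        for i, (obs, act) in enumerate(demos):
            rho = torch.sigmoid(expertise_logits[i])          # inferred expertise
            logp = policy(obs).gather(1, act.unsqueeze(1)).squeeze(1)
            # Each action is modeled as drawn from the learned policy with
            # probability rho, and uniformly at random otherwise, so low-expertise
            # demonstrators are explained away as noise instead of being imitated.
            mix = rho * logp.exp() + (1.0 - rho) / n_actions
            loss = loss - torch.log(mix + 1e-8).mean()
        loss.backward()
        opt.step()
    return policy, torch.sigmoid(expertise_logits).detach()
```

After fitting, torch.sigmoid(expertise_logits) gives a rough per-demonstrator expertise estimate, and demonstrations inferred to be unreliable contribute correspondingly little to the learned policy.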
Related papers
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations.
We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z) - Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z) - Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the standard assumption so that the demonstrator and the imitator need only share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z) - Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality [30.51436098631477]
Confidence-Aware Imitation Learning (CAIL) learns a well-performing policy from confidence-reweighted demonstrations (a generic sketch of this reweighting idea appears after this list).
We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments.
arXiv Detail & Related papers (2021-10-27T20:29:38Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - Intrinsic Reward Driven Imitation Learning via Generative Model [48.97800481338626]
Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in a high-dimensional environment.
We propose a novel reward learning module to generate intrinsic reward signals via a generative model.
Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration.
arXiv Detail & Related papers (2020-06-26T15:39:40Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning performs effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
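A recurring mechanism across several of the related papers above (notably the confidence-aware and varying-optimality entries) is reweighting demonstrations by an estimated quality or confidence score before imitating them. The snippet below is a generic, hedged sketch of that idea applied to behavioral cloning, not a reproduction of any single paper's algorithm; the weighted_bc_loss helper and the policy interface are assumptions made for illustration, and the source of the weights (learned, annotated, or heuristic) is left abstract.

```python
import torch
import torch.nn.functional as F

def weighted_bc_loss(policy, trajectories, weights):
    """Confidence-weighted behavioral cloning loss.

    policy: module mapping observation tensors to action logits.
    trajectories: list of (obs, act) tensor pairs, one per demonstration.
    weights: 1-D tensor of per-demonstration confidence scores in [0, 1].
    """
    total = torch.tensor(0.0)
    for w, (obs, act) in zip(weights, trajectories):
        nll = F.cross_entropy(policy(obs), act)   # mean NLL over the trajectory
        total = total + w * nll                   # low-confidence demos count less
    return total / weights.sum().clamp_min(1e-8)
```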