Confidence-Aware Imitation Learning from Demonstrations with Varying
Optimality
- URL: http://arxiv.org/abs/2110.14754v1
- Date: Wed, 27 Oct 2021 20:29:38 GMT
- Title: Confidence-Aware Imitation Learning from Demonstrations with Varying
Optimality
- Authors: Songyuan Zhang, Zhangjie Cao, Dorsa Sadigh, Yanan Sui
- Abstract summary: Confidence-Aware Imitation Learning (CAIL) learns a well-performing policy from confidence-reweighted demonstrations.
We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments.
- Score: 30.51436098631477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing imitation learning approaches assume the demonstrations are
drawn from experts who are optimal, but relaxing this assumption enables us to
use a wider range of data. Standard imitation learning may learn a suboptimal
policy from demonstrations with varying optimality. Prior works use confidence
scores or rankings to capture beneficial information from demonstrations with
varying optimality, but they suffer from limitations such as requiring manually
annotated confidence scores or demonstrations with high average optimality. In
this paper, we propose a general framework to learn from demonstrations with
varying optimality that jointly learns the confidence score and a
well-performing policy. Our approach, Confidence-Aware Imitation Learning
(CAIL), learns a well-performing policy from confidence-reweighted
demonstrations, while using an outer loss to track the performance of our model
and to learn the confidence. We provide theoretical guarantees on the
convergence of CAIL and evaluate its performance in both simulated and real
robot experiments. Our results show that CAIL significantly outperforms other
imitation learning methods from demonstrations with varying optimality. We
further show that even without access to any optimal demonstrations, CAIL can
still learn a successful policy, and outperforms prior work.
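The abstract describes a bi-level structure: an inner imitation loss reweighted by per-demonstration confidence, and an outer loss that tracks the learner's performance and adapts those confidences. Below is a minimal sketch of that structure, assuming a linear policy, a behavioral-cloning inner loss, a small trusted set for the outer loss, and synthetic data; all of these choices and names are illustrative assumptions, not the paper's implementation.
```python
# Minimal sketch of confidence-reweighted, bi-level imitation learning.
# Everything concrete here (linear policy, BC inner loss, trusted outer set,
# learning rates, synthetic data) is an assumption for illustration only.
import torch

torch.manual_seed(0)
obs_dim, act_dim, n_demos, demo_len = 4, 2, 10, 25
inner_lr = 0.1

# Synthetic demonstrations of varying optimality (placeholder data).
demo_obs = torch.randn(n_demos, demo_len, obs_dim)
demo_act = torch.randn(n_demos, demo_len, act_dim)
# A small set assumed to reflect good behavior, used only by the outer loss.
outer_obs = torch.randn(demo_len, obs_dim)
outer_act = torch.randn(demo_len, act_dim)

W = torch.zeros(obs_dim, act_dim, requires_grad=True)    # linear policy: a = s @ W
conf_logits = torch.zeros(n_demos, requires_grad=True)   # one confidence per demo
conf_opt = torch.optim.Adam([conf_logits], lr=0.05)

def per_demo_bc(W):
    # Behavioral-cloning error of the policy on each demonstration.
    return ((demo_obs @ W - demo_act) ** 2).mean(dim=(1, 2))

for step in range(300):
    conf = torch.softmax(conf_logits, dim=0)

    # Inner loss: imitation loss reweighted by the learned confidences.
    inner_loss = (conf * per_demo_bc(W)).sum()

    # One-step lookahead of the policy update, kept differentiable in conf_logits
    # so the outer loss can adjust the confidences (a simple bi-level approximation).
    grad_W, = torch.autograd.grad(inner_loss, W, create_graph=True)
    W_lookahead = W - inner_lr * grad_W

    # Outer loss: how well the lookahead policy matches the trusted set.
    outer_loss = ((outer_obs @ W_lookahead - outer_act) ** 2).mean()
    conf_opt.zero_grad()
    outer_loss.backward()
    conf_opt.step()

    # Commit the actual policy update using the refreshed (detached) confidences.
    conf = torch.softmax(conf_logits, dim=0).detach()
    grad_W, = torch.autograd.grad((conf * per_demo_bc(W)).sum(), W)
    with torch.no_grad():
        W -= inner_lr * grad_W
```
The one-step lookahead stands in for the paper's bi-level optimization: the confidences are updated according to how much they would improve the policy's performance on the outer objective, while the policy itself is trained on the confidence-reweighted demonstrations.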
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA)
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the best set of human demonstrations that is most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to optimal expert demonstrations.
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
- Imitating Past Successes can be Very Suboptimal [145.70788608016755]
We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy.
We show that a simple modification results in a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
arXiv Detail & Related papers (2022-06-07T15:13:43Z)
- Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation learning from observation describes policy learning in a way similar to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z)
- Learning from Imperfect Demonstrations via Adversarial Confidence Transfer [44.14553613304978]
We study the problem of learning from imperfect demonstrations by learning a confidence predictor.
We learn a common latent space through adversarial distribution matching of multi-length partial trajectories.
Our experiments in three simulated environments and a real robot reaching task demonstrate that our approach learns a policy with the highest expected return.
arXiv Detail & Related papers (2022-02-07T06:33:35Z)
- Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
arXiv Detail & Related papers (2022-02-02T21:23:19Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning works effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.