Support-weighted Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2002.08803v1
- Date: Thu, 20 Feb 2020 15:34:30 GMT
- Title: Support-weighted Adversarial Imitation Learning
- Authors: Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris
- Abstract summary: Adversarial Imitation Learning (AIL) is a family of imitation learning methods designed to mimic expert behaviors from demonstrations.
We propose Support-weighted Adversarial Imitation Learning (SAIL), a general framework that extends a given AIL algorithm with information derived from support estimation of the expert policies.
We show that the proposed method achieves better performance and training stability than baseline methods on a wide range of benchmark control tasks.
- Score: 39.42395724783555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial Imitation Learning (AIL) is a broad family of imitation learning
methods designed to mimic expert behaviors from demonstrations. While AIL has
shown state-of-the-art performance on imitation learning with only a small number
of demonstrations, it faces several practical challenges such as potential
training instability and implicit reward bias. To address these challenges, we
propose Support-weighted Adversarial Imitation Learning (SAIL), a general
framework that extends a given AIL algorithm with information derived from
support estimation of the expert policies. SAIL improves the quality of the
reinforcement signals by weighting the adversarial reward with a confidence
score from support estimation of the expert policy. We also show that SAIL is
always at least as efficient as the underlying AIL algorithm that SAIL uses for
learning the adversarial reward. Empirically, we show that the proposed method
achieves better performance and training stability than baseline methods on a
wide range of benchmark control tasks.
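The mechanism described in the abstract, weighting an adversarial reward by a confidence score obtained from support estimation of the expert policy, can be illustrated with a short sketch. The following is a minimal, hypothetical example assuming a GAIL-style discriminator reward and an abstract support_estimator fit on expert demonstrations; the paper's exact reward formulation and choice of support estimator may differ.

```python
import numpy as np

def adversarial_reward(discriminator_prob, eps=1e-8):
    """GAIL-style reward from the discriminator's probability that a
    (state, action) pair came from the expert. Different AIL algorithms
    use different reward forms; this is one common choice."""
    return -np.log(1.0 - discriminator_prob + eps)

def support_confidence(state_action, support_estimator):
    """Hypothetical confidence score in [0, 1] from a support estimator
    fit on expert demonstrations (e.g. a density or one-class model)."""
    return float(np.clip(support_estimator(state_action), 0.0, 1.0))

def support_weighted_reward(state_action, discriminator_prob, support_estimator):
    """Support-weighted reward: scale the adversarial reward by the
    confidence that the pair lies in the expert policy's estimated support.
    Illustrative only; the paper's exact formulation may differ."""
    w = support_confidence(state_action, support_estimator)
    return w * adversarial_reward(discriminator_prob)

if __name__ == "__main__":
    # Toy support estimator: confidence decays with distance from the origin.
    toy_support = lambda sa: np.exp(-np.linalg.norm(sa))
    r = support_weighted_reward(np.array([0.1, 0.2]), 0.9, toy_support)
    print(f"support-weighted reward: {r:.4f}")
```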
Related papers
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment [63.15719512614899]
Refusal Training (RT) struggles to generalize against various OOD jailbreaking attacks.
We observe significant improvements in generalization as N increases.
We propose training the model to perform safety reasoning for each query.
arXiv Detail & Related papers (2025-02-06T13:01:44Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.
It addresses two key challenges -- the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z) - Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations.
We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z) - Imitating Past Successes can be Very Suboptimal [145.70788608016755]
We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy.
We show that a simple modification results in a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
arXiv Detail & Related papers (2022-06-07T15:13:43Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality [30.51436098631477]
Confidence-Aware Imitation Learning (CAIL) learns a well-performing policy from confidence-reweighted demonstrations.
We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments.
arXiv Detail & Related papers (2021-10-27T20:29:38Z) - Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback [6.367592686247906]
We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested our proposed method in six physics-based control tasks.
arXiv Detail & Related papers (2021-04-14T02:58:51Z) - Self-Imitation Advantage Learning [43.8107780378031]
Self-imitation learning is a Reinforcement Learning method that encourages actions whose returns were higher than expected.
We propose a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator.
arXiv Detail & Related papers (2020-12-22T13:21:50Z) - Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in Reinforcement Learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm that is sample efficient and achieves good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning can learn effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.