Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration
- URL: http://arxiv.org/abs/2411.06965v1
- Date: Mon, 11 Nov 2024 13:11:18 GMT
- Title: Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration
- Authors: Xingrui Yu, Zhenglin Wan, David Mark Bossens, Yueming Lyu, Qing Guo, Ivor W. Tsang
- Abstract summary: This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL).
It improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE).
It mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus.
- Score: 37.836675202590406
- License:
- Abstract: Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on the challenging continuous control tasks derived from MuJoCo environments.
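The abstract describes a measure-conditioned reward combined with a single-step archive exploration bonus. As a rough illustration only, the Python sketch below shows one way such a reward could be composed; the grid archive, the `shaped_reward` composition, and all names and scales are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

class MeasureArchive:
    """Toy grid archive over behavior measures (illustrative, not the paper's code)."""

    def __init__(self, low, high, bins=10):
        self.low, self.high, self.bins = np.asarray(low), np.asarray(high), bins
        self.visited = set()

    def cell(self, measure):
        # Discretize a measure vector into a grid cell index.
        frac = (np.asarray(measure) - self.low) / (self.high - self.low + 1e-8)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def exploration_bonus(self, measure):
        # Single-step bonus: reward the first visit to an archive cell, then mark it visited.
        c = self.cell(measure)
        is_new = c not in self.visited
        self.visited.add(c)
        return 1.0 if is_new else 0.0


def shaped_reward(imitation_reward, step_measure, target_measure, archive, bonus_scale=0.1):
    """Hypothetical composition: an imitation reward conditioned on a target measure,
    plus a single-step archive exploration bonus."""
    measure_alignment = -np.linalg.norm(np.asarray(step_measure) - np.asarray(target_measure))
    return imitation_reward + measure_alignment + bonus_scale * archive.exploration_bonus(step_measure)


# Usage: a 2-D behavior measure (e.g., per-leg ground-contact fractions in locomotion).
archive = MeasureArchive(low=[0.0, 0.0], high=[1.0, 1.0], bins=10)
r = shaped_reward(imitation_reward=0.8, step_measure=[0.3, 0.7],
                  target_measure=[0.25, 0.75], archive=archive)
print(r)
```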
Related papers
- Quality Diversity Imitation Learning [9.627530753815968]
We introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL).
Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method.
Our method even achieves 2x expert performance in the most challenging Humanoid environment.
arXiv Detail & Related papers (2024-10-08T15:49:33Z)
- Explorative Imitation Learning: A Path Signature Approach for Continuous Environments [9.416194245966022]
Continuous Imitation Learning from Observation (CILO) is a new method augmenting imitation learning with two important features.
CILO's exploration allows for more diverse state transitions, requiring fewer expert trajectories and resulting in fewer training iterations.
It has the best overall performance of all imitation learning methods in all environments, outperforming the expert in two of them.
arXiv Detail & Related papers (2024-07-05T20:25:39Z)
- Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics [7.600968522331612]
Quality-Diversity Actor-Critic (QDAC) is an off-policy actor-critic deep reinforcement learning algorithm.
Compared with other Quality-Diversity methods, QDAC achieves significantly higher performance and more diverse behaviors.
We also demonstrate that we can harness the learned skills to adapt better than other baselines to five perturbed environments.
arXiv Detail & Related papers (2024-03-15T00:09:47Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
arXiv Detail & Related papers (2022-12-08T22:34:59Z)
- Diverse Imitation Learning via Self-Organizing Generative Models [6.783186172518836]
Imitation learning is the task of replicating an expert's policy from demonstrations, without access to a reward function.
We adopt an encoder-free generative model for behavior cloning (BC) to accurately distinguish and imitate different modes.
We show that our method significantly outperforms the state of the art across multiple experiments.
arXiv Detail & Related papers (2022-05-06T21:55:31Z)
- Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations [126.78199124026398]
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces.
In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL).
We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching.
arXiv Detail & Related papers (2021-06-17T05:44:04Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
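The projection described in the last entry above lends itself to a compact illustration. The following is a minimal NumPy sketch of projecting one task's gradient onto the normal plane of a conflicting task's gradient; the function name, example vectors, and the small epsilon guard are illustrative choices, not code from the paper.

```python
import numpy as np

def project_conflicting_gradient(g_i, g_j):
    """If g_i conflicts with g_j (negative dot product), project g_i onto the
    normal plane of g_j; otherwise return it unchanged."""
    dot = np.dot(g_i, g_j)
    if dot < 0:
        return g_i - (dot / (np.dot(g_j, g_j) + 1e-12)) * g_j
    return g_i

# Example with two conflicting task gradients.
g_task_a = np.array([1.0, -0.5])
g_task_b = np.array([-1.0, 0.2])
print(project_conflicting_gradient(g_task_a, g_task_b))
```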