Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics
- URL: http://arxiv.org/abs/2403.09930v3
- Date: Mon, 3 Jun 2024 09:46:32 GMT
- Title: Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics
- Authors: Luca Grillotti, Maxence Faldor, Borja G. León, Antoine Cully
- Abstract summary: Quality-Diversity Actor-Critic (QDAC) is an off-policy actor-critic deep reinforcement learning algorithm.
Compared with other Quality-Diversity methods, QDAC achieves significantly higher performance and more diverse behaviors.
We also demonstrate that we can harness the learned skills to adapt better than other baselines to five perturbed environments.
- Score: 7.600968522331612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key aspect of intelligence is the ability to demonstrate a broad spectrum of behaviors for adapting to unexpected situations. Over the past decade, advancements in deep reinforcement learning have led to groundbreaking achievements in solving complex continuous control tasks. However, most approaches return only one solution specialized for a specific problem. We introduce Quality-Diversity Actor-Critic (QDAC), an off-policy actor-critic deep reinforcement learning algorithm that leverages a value function critic and a successor features critic to learn high-performing and diverse behaviors. In this framework, the actor optimizes an objective that seamlessly unifies both critics using constrained optimization to (1) maximize return, while (2) executing diverse skills. Compared with other Quality-Diversity methods, QDAC achieves significantly higher performance and more diverse behaviors on six challenging continuous control locomotion tasks. We also demonstrate that we can harness the learned skills to adapt better than other baselines to five perturbed environments. Finally, qualitative analyses showcase a range of remarkable behaviors: adaptive-intelligent-robotics.github.io/QDAC.
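The abstract only sketches the actor objective at a high level: maximize return via the value critic while executing the commanded skill via the successor features critic. Below is a minimal illustrative sketch of one way such a constrained objective can be relaxed into a single actor loss; the function names, the Lagrangian relaxation, and the distance-based skill constraint are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch (PyTorch) of a constrained actor objective in the spirit of the
# abstract: maximize Q(s, a) subject to the successor-features prediction
# staying close to the commanded skill z. All names, the Lagrangian relaxation,
# and the feature scaling are illustrative assumptions.
import torch

def actor_loss(policy, q_critic, sf_critic, states, skills,
               lagrange_multiplier, epsilon=0.1):
    actions = policy(states, skills)                      # skill-conditioned actions
    q_values = q_critic(states, actions)                  # (1) return term
    # Successor-features prediction, assumed rescaled to the skill space.
    predicted_features = sf_critic(states, actions, skills)
    skill_error = torch.norm(predicted_features - skills, dim=-1)  # (2) skill-execution term
    # Lagrangian relaxation of:  max E[Q]  s.t.  E[skill_error] <= epsilon
    return -(q_values - lagrange_multiplier * (skill_error - epsilon)).mean()
```

In such a relaxation the multiplier would typically be adapted as well (e.g., increased when the constraint is violated), so the trade-off between return and skill execution is tuned automatically rather than hand-set.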
Related papers
- Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration [37.836675202590406]
This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL)
It improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE)
It mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus.
arXiv Detail & Related papers (2024-11-11T13:11:18Z)
- Quality Diversity Imitation Learning [9.627530753815968]
We introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL)
Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method.
Our method even achieves 2x expert performance in the most challenging Humanoid environment.
arXiv Detail & Related papers (2024-10-08T15:49:33Z)
- Testing for Fault Diversity in Reinforcement Learning [13.133263651395865]
We argue that policy testing should not aim to find as many failures as possible (e.g., inputs that trigger similar car crashes), but rather to reveal faults in the decision model that are as informative and diverse as possible.
We show that QD optimisation, while being conceptually simple and generally applicable, finds effectively more diverse faults in the decision model.
arXiv Detail & Related papers (2024-03-22T09:46:30Z)
- On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs).
However, both paradigms are prone to the critical problem of overconfidence (i.e., miscalibration).
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning [14.16864939687988]
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning.
Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields.
arXiv Detail & Related papers (2023-05-23T08:05:59Z)
- Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
arXiv Detail & Related papers (2022-12-08T22:34:59Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, which frames single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
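As a rough illustration of the recipe the entry above summarizes, the sketch below (PyTorch; the class name, network sizes, and averaging choice are illustrative assumptions, not the paper's implementation) restricts each action dimension to two bang-bang options and decomposes the joint Q-value into per-dimension utilities.

```python
# Hedged sketch: bang-bang action discretization (each action dimension picks
# one of two extreme values) combined with value decomposition (per-dimension
# utilities averaged into a joint Q-value), in the spirit of critic-only
# continuous control framed as cooperative MARL.
import torch
import torch.nn as nn

class DecomposedBangBangQ(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden=256):
        super().__init__()
        self.action_dim = action_dim
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Two bang-bang options (low or high torque) per action dimension.
        self.heads = nn.Linear(hidden, action_dim * 2)

    def forward(self, obs):
        # Per-dimension utilities, shape (batch, action_dim, 2).
        return self.heads(self.trunk(obs)).view(-1, self.action_dim, 2)

    def q_value(self, obs, discrete_actions):
        # Joint Q-value = mean of the chosen per-dimension utilities.
        utilities = self.forward(obs)
        chosen = utilities.gather(-1, discrete_actions.unsqueeze(-1)).squeeze(-1)
        return chosen.mean(dim=-1)

    def greedy_action(self, obs, low=-1.0, high=1.0):
        # Each dimension independently picks its better bang-bang option.
        idx = self.forward(obs).argmax(dim=-1)            # (batch, action_dim)
        return torch.where(idx == 0,
                           torch.full_like(idx, low, dtype=torch.float),
                           torch.full_like(idx, high, dtype=torch.float))
```

With this decomposition, greedy action selection reduces to an independent argmax per dimension, which avoids enumerating the exponentially large joint discrete action space.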
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL)
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)