Related papers: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

URL: http://arxiv.org/abs/2401.08632v1
Date: Sun, 10 Dec 2023 19:53:15 GMT
Title: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
Authors: Maxence Faldor, F\'elix Chalumeau, Manon Flageat, Antoine Cully
Abstract summary: Quality-Diversity optimization is a family of Evolutionary Algorithms, that generates collections of both diverse and high-performing solutions. MAP-Elites is a prominent example, that has been successfully applied to a variety of domains, including evolutionary robotics. We present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, and (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost.
Score: 4.787389127632926
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms, that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example, that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and thus, is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.

Related papers

Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods. We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
arXiv Detail & Related papers (2023-04-21T17:53:05Z)
MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy [1.376408511310322]
DCG-MAP-Elites improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks. Our algorithm, DCG-MAP-Elites improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks.
arXiv Detail & Related papers (2023-03-07T11:58:01Z)
Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition [63.67574523750839]
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches. We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z)
Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains [1.376408511310322]
We show that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments. In addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments.
arXiv Detail & Related papers (2022-10-24T12:17:18Z)
Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning [8.591356221688773]
Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures. We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks. One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
arXiv Detail & Related papers (2022-02-08T05:53:55Z)
Result Diversification by Multi-objective Evolutionary Algorithms with Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem, and solve it by a multi-objective evolutionary algorithm (EA) We theoretically prove that the GSEMO can achieve the optimal-time approximation ratio, $1/2$. When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in running time, addressing the open question proposed by Borodin et al.
arXiv Detail & Related papers (2021-10-18T14:00:22Z)
Gradient Regularized Contrastive Learning for Continual Domain Adaptation [86.02012896014095]
We study the problem of continual domain adaptation, where the model is presented with a labelled source domain and a sequence of unlabelled target domains. We propose Gradient Regularized Contrastive Learning (GRCL) to solve the obstacles. Experiments on Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach.
arXiv Detail & Related papers (2021-03-23T04:10:42Z)
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization [7.8499505363825755]
Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off. This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches.
arXiv Detail & Related papers (2020-06-15T16:04:06Z)
Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification. Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.