MAP-Elites with Descriptor-Conditioned Gradients and Archive
Distillation into a Single Policy
- URL: http://arxiv.org/abs/2303.03832v1
- Date: Tue, 7 Mar 2023 11:58:01 GMT
- Title: MAP-Elites with Descriptor-Conditioned Gradients and Archive
Distillation into a Single Policy
- Authors: Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully
- Abstract summary: DCG-MAP-Elites improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks.
- Score: 1.376408511310322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality-Diversity algorithms, such as MAP-Elites, are a branch of
Evolutionary Computation generating collections of diverse and high-performing
solutions that have been successfully applied to a variety of domains,
particularly in evolutionary robotics. However, MAP-Elites performs a divergent
search based on random mutations originating from Genetic Algorithms, and thus,
is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites
overcomes this limitation by integrating a gradient-based variation operator
inspired by Deep Reinforcement Learning which enables the evolution of large
neural networks. Although high-performing in many environments, PGA-MAP-Elites
fails on several tasks where the convergent search of the gradient-based
operator does not direct mutations towards archive-improving solutions. In this
work, we present two contributions: (1) we enhance the Policy Gradient
variation operator with a descriptor-conditioned critic that improves the
archive across the entire descriptor space, (2) we exploit the actor-critic
training to learn a descriptor-conditioned policy at no additional cost,
distilling the knowledge of the archive into one single versatile policy that
can execute the entire range of behaviors contained in the archive. Our
algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on
average, on a set of challenging locomotion tasks.
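
To make contribution (1) concrete, below is a minimal, hypothetical sketch of a MAP-Elites archive driven by two variation operators: the standard Genetic-Algorithm mutation and a toy stand-in for the descriptor-conditioned variation step. The fitness and descriptor functions, the grid resolution, and the finite-difference nudge toward a sampled target descriptor are illustrative assumptions only; the paper's actual operator uses a learned descriptor-conditioned critic, and contribution (2) further distills the resulting archive into a single descriptor-conditioned policy, which is not shown here.

```python
# Minimal MAP-Elites sketch with a toy stand-in for the descriptor-conditioned
# variation step. The fitness/descriptor functions and the finite-difference
# "gradient" operator are illustrative assumptions, not the paper's
# actor-critic machinery.
import numpy as np

rng = np.random.default_rng(0)
DIM, CELLS = 8, 16                      # solution dimension, cells per descriptor axis

def fitness(x):                         # toy objective (higher is better)
    return -float(np.sum(x ** 2))

def descriptor(x):                      # toy 2-D behavior descriptor in [0, 1]^2
    return (np.tanh(x[:2]) + 1.0) / 2.0

def cell_index(d):                      # map a descriptor to a grid cell
    return tuple(np.minimum((d * CELLS).astype(int), CELLS - 1))

archive = {}                            # cell -> (fitness, solution)

def try_insert(x):                      # MAP-Elites insertion rule
    f, c = fitness(x), cell_index(descriptor(x))
    if c not in archive or f > archive[c][0]:
        archive[c] = (f, x)

def ga_variation(x):                    # divergent search: Gaussian mutation
    return x + 0.1 * rng.normal(size=DIM)

def dc_variation(x, target_d, step=0.05, eps=1e-3):
    # Stand-in for the descriptor-conditioned gradient operator: nudge x so
    # that its descriptor moves toward a sampled target descriptor, using
    # finite differences instead of a learned descriptor-conditioned critic.
    err0 = np.sum((descriptor(x) - target_d) ** 2)
    grad = np.zeros(DIM)
    for i in range(DIM):
        e = np.zeros(DIM); e[i] = eps
        grad[i] = (np.sum((descriptor(x + e) - target_d) ** 2) - err0) / eps
    return x - step * grad

for x in rng.normal(size=(64, DIM)):    # random initialisation
    try_insert(x)
for _ in range(2000):                   # main QD loop
    keys = list(archive)
    parent = archive[keys[rng.integers(len(keys))]][1]
    try_insert(ga_variation(parent))                        # GA mutation
    try_insert(dc_variation(parent, rng.uniform(size=2)))   # descriptor-conditioned stand-in

qd_score = sum(f for f, _ in archive.values())              # QD score: sum of elite fitnesses
print(f"cells filled: {len(archive)}/{CELLS * CELLS}, QD score: {qd_score:.2f}")
```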
Related papers
- Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning [4.851070356054758]
Quality-Diversity algorithms are evolutionary methods designed to generate a set of diverse and high-fitness solutions.
As a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces.
We introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model.
arXiv Detail & Related papers (2023-12-10T19:53:15Z) - Don't Bet on Luck Alone: Enhancing Behavioral Reproducibility of
Quality-Diversity Solutions in Uncertain Domains [2.639902239625779]
We introduce the Archive Reproducibility Improvement Algorithm (ARIA).
ARIA is a plug-and-play approach that improves the quality of solutions present in an archive.
We show that our algorithm enhances the quality and descriptor space coverage of any given archive by at least 50%.
arXiv Detail & Related papers (2023-04-07T14:45:14Z) - Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain
Domains [1.376408511310322]
We show that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments.
In addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments.
arXiv Detail & Related papers (2022-10-24T12:17:18Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning [8.591356221688773]
Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures.
We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks.
One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
arXiv Detail & Related papers (2022-02-08T05:53:55Z) - Result Diversification by Multi-objective Evolutionary Algorithms with
Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem and solve it by a multi-objective evolutionary algorithm (EA), the GSEMO.
We theoretically prove that the GSEMO can achieve the optimal polynomial-time approximation ratio, $1/2$.
When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in polynomial running time, addressing the open question proposed by Borodin et al.
arXiv Detail & Related papers (2021-10-18T14:00:22Z) - AP-Loss for Accurate One-Stage Object Detection [49.13608882885456]
One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously.
The former suffers from extreme foreground-background imbalance due to the large number of anchors.
This paper proposes a novel framework to replace the classification task in one-stage detectors with a ranking task.
arXiv Detail & Related papers (2020-08-17T13:22:01Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Diversity Policy Gradient for Sample Efficient Quality-Diversity
Optimization [7.8499505363825755]
Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off.
This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches.
arXiv Detail & Related papers (2020-06-15T16:04:06Z) - Zeroth-order Deterministic Policy Gradient [116.87117204825105]
We introduce Zeroth-order Deterministic Policy Gradient (ZDPG).
ZDPG approximates policy-reward gradients via two-point evaluations of the $Q$-function (a generic two-point estimator is sketched after this list).
New finite sample complexity bounds for ZDPG improve upon existing results by up to two orders of magnitude.
arXiv Detail & Related papers (2020-06-12T16:52:29Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Stochastic Gradient algorithm for non-convex non-concave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.