Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain
Domains
- URL: http://arxiv.org/abs/2210.13156v1
- Date: Mon, 24 Oct 2022 12:17:18 GMT
- Title: Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain
Domains
- Authors: Manon Flageat, Felix Chalumeau, and Antoine Cully
- Abstract summary: We show that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments.
In addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments.
- Score: 1.376408511310322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality-Diversity algorithms, among which MAP-Elites, have emerged as
powerful alternatives to performance-only optimisation approaches as they
enable generating collections of diverse and high-performing solutions to an
optimisation problem. However, they are often limited to low-dimensional search
spaces and deterministic environments. The recently introduced Policy Gradient
Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this limitation by
pairing the traditional Genetic operator of MAP-Elites with a gradient-based
operator inspired by Deep Reinforcement Learning. This new operator guides
mutations toward high-performing solutions using policy-gradients. In this
work, we propose an in-depth study of PGA-MAP-Elites. We demonstrate the
benefits of policy-gradients on the performance of the algorithm and the
reproducibility of the generated solutions when considering uncertain domains.
We first prove that PGA-MAP-Elites is highly performant in both deterministic
and uncertain high-dimensional environments, decorrelating the two challenges
it tackles. Secondly, we show that in addition to outperforming all the
considered baselines, the collections of solutions generated by PGA-MAP-Elites
are highly reproducible in uncertain environments, approaching the
reproducibility of solutions found by Quality-Diversity approaches built
specifically for uncertain applications. Finally, we propose an ablation and
in-depth analysis of the dynamics of the policy-gradient-based variation. We
demonstrate that the policy-gradient variation operator is decisive for the
performance of PGA-MAP-Elites but is only essential during the early stage of
the process, where it finds high-performing regions of the search space.
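As a reading aid, the sketch below mirrors the loop the abstract describes: a MAP-Elites archive filled half by a genetic variation operator and half by a policy-gradient variation operator that pushes mutations toward higher fitness. It is a minimal toy in Python, not the authors' implementation: `evaluate`, `genetic_variation`, and `policy_gradient_variation` are illustrative stand-ins, and the gradient step uses finite differences on a synthetic fitness, whereas the real operator trains a TD3-style critic on neural-network controllers.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CELLS = 8, 20  # toy search-space dimension and archive resolution per axis


def evaluate(theta):
    """Stand-in for an episode rollout: return (fitness, behaviour descriptor)."""
    fitness = -np.sum(theta ** 2)       # maximised; peak at the origin
    descriptor = np.tanh(theta[:2])     # 2-D descriptor in (-1, 1)
    return fitness, descriptor


def to_cell(descriptor):
    """Discretise a descriptor into an archive cell index."""
    return tuple(np.clip(((descriptor + 1) / 2 * CELLS).astype(int), 0, CELLS - 1))


def try_insert(archive, theta):
    """MAP-Elites rule: keep the solution if its cell is empty or it is fitter."""
    fitness, descriptor = evaluate(theta)
    cell = to_cell(descriptor)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, theta)


def genetic_variation(parent_a, parent_b, sigma=0.05, line_sigma=0.2):
    """Iso+line-style undirected variation used by the genetic operator."""
    return parent_a + sigma * rng.normal(size=DIM) \
        + line_sigma * rng.normal() * (parent_b - parent_a)


def policy_gradient_variation(theta, lr=0.05, eps=1e-3):
    """Illustrative 'guided mutation': one finite-difference ascent step on fitness.

    The actual operator back-propagates through a learned TD3-style critic.
    """
    base, _ = evaluate(theta)
    grad = np.zeros(DIM)
    for i in range(DIM):
        bumped = theta.copy()
        bumped[i] += eps
        grad[i] = (evaluate(bumped)[0] - base) / eps
    return theta + lr * grad


archive = {}
for _ in range(32):                       # random initialisation
    try_insert(archive, rng.normal(size=DIM))

for _ in range(200):                      # main loop: half genetic, half gradient-guided
    elites = [theta for _, theta in archive.values()]
    for _ in range(8):
        a, b = rng.choice(len(elites), size=2)
        try_insert(archive, genetic_variation(elites[a], elites[b]))
    for _ in range(8):
        try_insert(archive, policy_gradient_variation(elites[rng.integers(len(elites))]))

print(f"cells filled: {len(archive)}, best fitness: {max(f for f, _ in archive.values()):.3f}")
```

The archive rule (keep a solution if its cell is empty or it beats the current elite) and the split between undirected and gradient-guided offspring are the only parts of PGA-MAP-Elites the toy keeps; the batch sizes, the critic, and the replay buffer are omitted.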
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning [0.0]
We have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization.
In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO).
PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and a local search method (a minimal sketch of PPO's clipped objective appears after this list).
arXiv Detail & Related papers (2024-02-16T19:35:58Z)
- Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning [4.851070356054758]
Quality-Diversity algorithms are evolutionary methods designed to generate a set of diverse and high-fitness solutions.
As a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces.
We introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model.
arXiv Detail & Related papers (2023-12-10T19:53:15Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- MAP-Elites with Descriptor-Conditioned Gradients and Archive
Distillation into a Single Policy [1.376408511310322]
Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks.
arXiv Detail & Related papers (2023-03-07T11:58:01Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy-gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning [8.591356221688773]
Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures.
We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks.
One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
arXiv Detail & Related papers (2022-02-08T05:53:55Z)
- Result Diversification by Multi-objective Evolutionary Algorithms with
Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem and solve it with a multi-objective evolutionary algorithm (EA).
We theoretically prove that the GSEMO can achieve the optimal polynomial-time approximation ratio of $1/2$.
When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in polynomial running time, addressing the open question proposed by Borodin et al. (a toy GSEMO sketch is given after this list).
arXiv Detail & Related papers (2021-10-18T14:00:22Z)
- Deep Reinforcement Learning for Field Development Optimization [0.0]
In this work, the goal is to apply convolutional neural network-based (CNN) deep reinforcement learning (DRL) algorithms to the field development optimization problem.
The proximal policy optimization (PPO) algorithm is considered with two CNN architectures that differ in their number of layers and composition.
Both networks obtained policies that provide satisfactory results when compared to a hybrid particle swarm optimization - mesh adaptive direct search (PSO-MADS) algorithm.
arXiv Detail & Related papers (2020-08-05T06:26:13Z)
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and
TRPO [90.90009491366273]
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations": implementation details that are not part of the core algorithm.
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
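Several entries above (the PWR core-reload work, the topologically constrained policy gradients, the field-development study, and the PPO/TRPO case study) build on Proximal Policy Optimization. As a reference point, here is a minimal sketch of PPO's clipped surrogate objective in Python with NumPy; the array names and toy batch are illustrative, and a full implementation adds a value-function loss, an entropy bonus, advantage estimation (e.g. GAE), and exactly the kind of code-level optimizations the last entry analyses.

```python
import numpy as np


def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (to be maximised).

    log_probs / old_log_probs: log pi(a|s) under the current and the
    data-collecting policy for each sampled action; advantages: estimated
    advantages for those actions.
    """
    ratio = np.exp(log_probs - old_log_probs)          # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))     # pessimistic (clipped) bound


# Toy usage: a batch of 4 sampled actions.
new_lp = np.array([-0.9, -1.2, -0.3, -2.0])
old_lp = np.array([-1.0, -1.0, -0.5, -1.5])
adv = np.array([0.5, -0.2, 1.0, 0.1])
print(f"clipped surrogate: {ppo_clipped_loss(new_lp, old_lp, adv):.4f}")
```

The objective is maximised over minibatches of collected trajectories; the clipping keeps the updated policy close to the data-collecting policy without TRPO's explicit trust-region constraint.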
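The result-diversification entry relies on GSEMO, a simple multi-objective EA. The sketch below runs plain GSEMO on the classic LOTZ bi-objective toy benchmark rather than on that paper's quality/diversity reformulation, which is not reproduced here; the `lotz` function and the budget of 20000 iterations are illustrative choices.

```python
import random

N = 12  # bit-string length for the toy problem


def lotz(x):
    """Bi-objective LOTZ benchmark: (#leading ones, #trailing zeros), both maximised."""
    lead = next((i for i, b in enumerate(x) if b == 0), N)
    trail = next((i for i, b in enumerate(reversed(x)) if b == 1), N)
    return (lead, trail)


def dominates(a, b):
    """a is at least as good as b everywhere and strictly better somewhere."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))


random.seed(0)
x = [random.randint(0, 1) for _ in range(N)]
population = {tuple(x): lotz(x)}          # archive of mutually non-dominated solutions

for _ in range(20000):
    parent = random.choice(list(population))                  # uniform parent selection
    child = [b ^ (random.random() < 1 / N) for b in parent]   # flip each bit w.p. 1/N
    f_child = lotz(child)
    # accept the child only if nothing in the archive dominates or duplicates it
    if not any(dominates(f, f_child) or f == f_child for f in population.values()):
        # drop archive members the child weakly dominates, then insert the child
        population = {s: f for s, f in population.items()
                      if not all(c >= fi for c, fi in zip(f_child, f))}
        population[tuple(child)] = f_child

print(sorted(population.values()))        # approximated Pareto front of LOTZ
```

The design is the usual GSEMO recipe: keep an archive of non-dominated solutions, mutate a uniformly chosen member each iteration, and let the archive grow only with offspring that are not dominated by anything already stored.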
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.