Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement
Learning
- URL: http://arxiv.org/abs/2401.08632v1
- Date: Sun, 10 Dec 2023 19:53:15 GMT
- Title: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement
Learning
- Authors: Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully
- Abstract summary: Quality-Diversity optimization is a family of Evolutionary Algorithms that generate collections of both diverse and high-performing solutions.
MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics.
We present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, and (3) we exploit the descriptor-conditioned actor by injecting it into the population.
- Score: 4.787389127632926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fundamental trait of intelligence involves finding novel and creative
solutions to address a given challenge or to adapt to unforeseen situations.
Reflecting this, Quality-Diversity optimization is a family of Evolutionary
Algorithms that generate collections of both diverse and high-performing
solutions. Among these, MAP-Elites is a prominent example that has been
successfully applied to a variety of domains, including evolutionary robotics.
However, MAP-Elites performs a divergent search with random mutations
originating from Genetic Algorithms, and thus, is limited to evolving
populations of low-dimensional solutions. PGA-MAP-Elites overcomes this
limitation using a gradient-based variation operator inspired by deep
reinforcement learning which enables the evolution of large neural networks.
Although high-performing in many environments, PGA-MAP-Elites fails on several
tasks where the convergent search of the gradient-based variation operator
hinders diversity. In this work, we present three contributions: (1) we enhance
the Policy Gradient variation operator with a descriptor-conditioned critic
that reconciles diversity search with gradient-based methods, (2) we leverage
the actor-critic training to learn a descriptor-conditioned policy at no
additional cost, distilling the knowledge of the population into one single
versatile policy that can execute a diversity of behaviors, (3) we exploit the
descriptor-conditioned actor by injecting it in the population, despite network
architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher
QD score and coverage compared to all baselines on seven challenging continuous
control locomotion tasks.
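To make contribution (1) concrete, the sketch below shows one plausible way to structure the descriptor-conditioned networks described in the abstract: the critic scores a (state, action) pair for a target descriptor d, and the single distilled actor conditions its action on the descriptor it should exhibit. This is an illustrative PyTorch-style sketch with invented class and parameter names, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of descriptor-conditioned
# networks: the descriptor is simply concatenated to the network input.
import torch
import torch.nn as nn

class DescriptorConditionedCritic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, desc_dim: int, hidden: int = 256):
        super().__init__()
        # Q(s, a | d): estimates the value of action a in state s for
        # a policy that is supposed to exhibit descriptor d.
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + desc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, descriptor):
        return self.net(torch.cat([state, action, descriptor], dim=-1))

class DescriptorConditionedActor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, desc_dim: int, hidden: int = 256):
        super().__init__()
        # pi(s | d): one versatile policy that can express many behaviors,
        # selected by the descriptor it is conditioned on.
        self.net = nn.Sequential(
            nn.Linear(state_dim + desc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, descriptor):
        return self.net(torch.cat([state, descriptor], dim=-1))
```

Conditioning by concatenation is the simplest choice; it lets the same critic provide policy gradients toward different descriptors, which is how the gradient-based operator can be reconciled with diversity search.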
Related papers
- AlphaEvolve: A coding agent for scientific and algorithmic discovery [63.13852052551106]
We present AlphaEvolve, an evolutionary coding agent that substantially enhances the capabilities of state-of-the-art LLMs.
AlphaEvolve orchestrates an autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to the code.
We demonstrate the broad applicability of this approach by applying it to a number of important computational problems.
arXiv Detail & Related papers (2025-06-16T06:37:18Z)
- Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization [25.633698252033756]
We propose the Evolutionary Augmentation Mechanism (EAM) to synergize the learning efficiency of DRL with the global search power of GAs.
EAM operates by generating solutions from a learned policy and refining them through domain-specific genetic operations such as crossover and mutation.
EAM can be seamlessly integrated with state-of-the-art DRL solvers such as the Attention Model, POMO, and SymNCO.
arXiv Detail & Related papers (2025-06-11T05:17:30Z)
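As a rough illustration of the EAM loop summarized above (a policy-seeded population refined by crossover and mutation), here is a minimal sketch for permutation-encoded solutions such as tours; `sample_from_policy` and `fitness` are hypothetical placeholders, not the paper's API.

```python
# Schematic hybrid DRL + GA loop: seed the population from a learned
# policy, then refine with genetic operators. Illustrative only.
import random

def order_crossover(p1, p2):
    # Classic OX crossover for permutation-encoded solutions (e.g. tours).
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    segment = p1[i:j]
    rest = [g for g in p2 if g not in segment]
    return rest[:i] + segment + rest[i:]

def swap_mutation(sol, rate=0.2):
    sol = list(sol)
    if random.random() < rate:
        a, b = random.sample(range(len(sol)), 2)
        sol[a], sol[b] = sol[b], sol[a]
    return sol

def evolutionary_augmentation(sample_from_policy, fitness, pop_size=32, generations=10):
    # Seed the population from the learned DRL policy rather than at random.
    population = [sample_from_policy() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = random.sample(population, 2)
            offspring.append(swap_mutation(order_crossover(p1, p2)))
        # Elitist selection, assuming lower fitness is better (e.g. tour length).
        population = sorted(population + offspring, key=fitness)[:pop_size]
    return population[0]
```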
- Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models [52.8949080772873]
We propose an evolution-based region adversarial prompt tuning method called ER-APT.
In each training iteration, we first generate adversarial examples (AEs) using traditional gradient-based methods.
Subsequently, a genetic evolution mechanism incorporating selection, mutation, and crossover is applied to optimize the AEs.
The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning.
arXiv Detail & Related papers (2025-03-17T07:08:47Z)
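The ER-APT pipeline above (gradient-crafted adversarial examples refined by selection, crossover, and mutation) can be illustrated with the simplified single-input sketch below; `loss_fn` and `fgsm_perturbation` are stand-ins for the model-specific pieces, and the region-based machinery of the paper is omitted.

```python
# Simplified sketch: evolve a population of adversarial perturbations
# that were initialized by a gradient-based attack. Illustrative only.
import numpy as np

def evolve_perturbations(x, loss_fn, fgsm_perturbation, eps=8 / 255,
                         pop_size=8, steps=5, rng=np.random.default_rng(0)):
    # 1) Gradient-based initialization (e.g. FGSM/PGD-style deltas).
    pop = [fgsm_perturbation(x) for _ in range(pop_size)]
    for _ in range(steps):
        # 2) Selection: keep the perturbations that most increase the loss.
        pop.sort(key=lambda d: loss_fn(x + d), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), 2, replace=False)
            mask = rng.random(x.shape) < 0.5           # uniform crossover
            child = np.where(mask, parents[a], parents[b])
            child += rng.normal(0, eps / 10, x.shape)  # mutation
            children.append(np.clip(child, -eps, eps)) # stay in the eps-ball
        pop = parents + children
    # The strongest evolved AE is then used for prompt tuning.
    return max(pop, key=lambda d: loss_fn(x + d))
```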
- Exploring the Generalization Capabilities of AID-based Bi-level Optimization [50.3142765099442]
We present two types of bi-level optimization methods: approximate implicit differentiation (AID)-based and iterative differentiation (ITD)-based approaches.
AID-based methods cannot be easily transformed into a single-level problem and must retain the two-level structure.
We demonstrate the effectiveness and potential applications of these methods on real-world tasks.
arXiv Detail & Related papers (2024-11-25T04:22:17Z)
- Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods.
We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground.
We also propose the Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z)
- GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model [69.71629949747884]
Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data.
In this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples.
arXiv Detail & Related papers (2024-01-11T16:43:16Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component into evolutionary algorithms has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the section on applications of RL-EA, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z)
- MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy [1.376408511310322]
Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average on a set of challenging locomotion tasks.
arXiv Detail & Related papers (2023-03-07T11:58:01Z)
- Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains [1.376408511310322]
We show that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments.
In addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments.
arXiv Detail & Related papers (2022-10-24T12:17:18Z)
- Self-Referential Quality Diversity Through Differential Map-Elites [5.2508303190856624]
Differential MAP-Elites is a novel algorithm that combines the illumination capacity of CVT-MAP-Elites with the continuous-space optimization capacity of Differential Evolution.
The basic Differential MAP-Elites algorithm, introduced here for the first time, is relatively simple in that it simply combines the operators from Differential Evolution with the map structure of CVT-MAP-Elites.
arXiv Detail & Related papers (2021-07-11T04:31:10Z)
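A minimal sketch of the combination described in the Differential MAP-Elites entry above: a DE/rand/1 variation step with binomial crossover whose parents are drawn from a MAP-Elites grid. The archive layout and helper functions are illustrative assumptions, not the paper's code.

```python
# Illustrative Differential-Evolution variation inside a MAP-Elites loop.
import random

def de_variation(archive, F=0.5, CR=0.9):
    # archive: dict mapping grid cell -> (solution, fitness), with
    # solutions encoded as lists of floats; needs >= 4 occupied cells.
    x, r1, r2, r3 = (s for s, _ in random.sample(list(archive.values()), 4))
    donor = [a + F * (b - c) for a, b, c in zip(r1, r2, r3)]  # DE/rand/1
    j = random.randrange(len(x))  # force at least one gene from the donor
    return [d if (random.random() < CR or i == j) else xi
            for i, (xi, d) in enumerate(zip(x, donor))]

def map_elites_step(archive, evaluate, descriptor_to_cell):
    child = de_variation(archive)
    fitness, descriptor = evaluate(child)
    cell = descriptor_to_cell(descriptor)
    # Standard MAP-Elites insertion: keep the best solution per cell.
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (child, fitness)
```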
- Adam revisited: a weighted past gradients perspective [57.54752290924522]
We propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues.
We prove that WADA can achieve a weighted data-dependent regret bound, which could be better than the original regret bound of ADAGRAD.
arXiv Detail & Related papers (2021-01-01T14:01:52Z)
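The "weighted past gradients" idea in the entry above can be sketched as follows: replace the exponential moving average of squared gradients with an explicitly weighted average (linearly growing weights here, as one possible scheme). This illustrates the general idea, not the paper's exact update rule.

```python
# Sketch of an adaptive update with explicitly weighted past squared
# gradients; the linear weighting is an assumption for illustration.
import numpy as np

def weighted_adaptive_update(params, grad, state, lr=0.01, eps=1e-8):
    state["t"] += 1
    w_t = state["t"]                       # linearly growing weight on g_t^2
    state["v"] += w_t * grad ** 2          # weighted sum of past g^2
    state["w_sum"] += w_t
    v_hat = state["v"] / state["w_sum"]    # weighted average of past g^2
    return params - lr * grad / (np.sqrt(v_hat) + eps)

# Usage on a toy objective f(x) = ||x||^2, whose gradient is 2x.
state = {"t": 0, "v": np.zeros(3), "w_sum": 0.0}
params = np.ones(3)
for _ in range(100):
    grad = 2 * params
    params = weighted_adaptive_update(params, grad, state)
```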
- Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations [1.827510863075184]
We show that Multidimensional Archive of Phenotypic Elites (MAP-Elites) can deliver better-performing solutions than one of the state-of-the-art RL methods.
This paper demonstrates that EAs combined with modern computational resources display promising characteristics.
arXiv Detail & Related papers (2020-09-17T17:41:46Z)
- Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters [1.827510863075184]
We introduce Multi-Emitter MAP-Elites (ME-MAP-Elites), an algorithm that directly extends CMA-ME and improves its quality, diversity and data efficiency.
A bandit algorithm dynamically finds the best selection of emitters depending on the current situation.
We evaluate the performance of ME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100 dimensions) to complex locomotion tasks in robotics.
arXiv Detail & Related papers (2020-07-10T12:45:02Z)
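The bandit mechanism in the Multi-Emitter MAP-Elites entry above can be illustrated with a UCB1 selector over emitters, rewarded by how much each emitter's last batch improved the archive; the class below is a generic sketch with invented names, not the paper's implementation.

```python
# UCB1 bandit over emitters: pick the emitter whose batches have most
# improved the archive so far, with an exploration bonus. Illustrative.
import math

class EmitterBandit:
    def __init__(self, n_emitters: int):
        self.counts = [0] * n_emitters
        self.values = [0.0] * n_emitters   # running mean reward per emitter
        self.total = 0

    def select(self) -> int:
        # Try every emitter once, then follow the UCB1 rule.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        return max(range(len(self.counts)),
                   key=lambda i: self.values[i]
                   + math.sqrt(2 * math.log(self.total) / self.counts[i]))

    def update(self, i: int, reward: float):
        self.counts[i] += 1
        self.total += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

# Example: reward an emitter by the number of archive cells its batch
# added or improved.
bandit = EmitterBandit(n_emitters=3)
i = bandit.select()
# ... run emitter i for one batch, count archive improvements ...
bandit.update(i, reward=5.0)
```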
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.