Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2401.08632v2
- Date: Thu, 03 Oct 2024 19:13:56 GMT
- Title: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
- Authors: Maxence Faldor, FĂ©lix Chalumeau, Manon Flageat, Antoine Cully
- Abstract summary: Quality-Diversity algorithms are evolutionary methods designed to generate a set of diverse and high-fitness solutions.
As a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces.
We introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model.
- Abstract: A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, thus limiting its scalability to more complex domains, such as learning to control agents directly from high-dimensional inputs. To address this limitation, advanced methods like PGA-MAP-Elites and DCG-MAP-Elites have been developed, which combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly enhancing the performance and efficiency of Quality-Diversity algorithms in complex, high-dimensional tasks. While these methods have successfully leveraged the trained critic to guide more effective mutations, the potential of the trained actor remains underutilized in improving both the quality and diversity of the evolved population. In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model to produce diverse solutions, which are then injected into the offspring batch at each generation. Additionally, we present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, we present a second empirical analysis shedding light on the synergies between the different variation operators and explaining the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.
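The MAP-Elites loop that the abstract builds on can be sketched in a few lines. This is an illustrative stdlib-Python toy, not the paper's implementation: the fitness function, the sign-pattern descriptor, and all parameters are invented for the example, and the descriptor-conditioned actor of DCRL-MAP-Elites (which would inject actor-generated solutions into each offspring batch) is omitted.

```python
import random

# Toy MAP-Elites sketch (illustrative only; not the paper's implementation).
# Solutions are 2-D vectors; fitness is negative squared distance to the
# origin; the behavior descriptor is the sign pattern, giving a 2x2 archive.

def fitness(x):
    return -(x[0] ** 2 + x[1] ** 2)

def descriptor(x):
    # Map a solution to a discrete archive cell.
    return (x[0] >= 0, x[1] >= 0)

def map_elites(generations=200, batch=8, sigma=0.3, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)

    def try_add(x):
        cell, f = descriptor(x), fitness(x)
        # Keep the solution only if its cell is empty or it beats the elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)

    # Initialise the archive with random solutions.
    for _ in range(batch):
        try_add([rng.uniform(-1, 1), rng.uniform(-1, 1)])

    for _ in range(generations):
        # Standard MAP-Elites variation: mutate randomly selected elites.
        # DCRL-MAP-Elites would also inject actor-generated solutions here.
        for _ in range(batch):
            _, parent = archive[rng.choice(list(archive))]
            child = [v + rng.gauss(0, sigma) for v in parent]
            try_add(child)
    return archive

archive = map_elites()
```

The key property is that selection pressure is local to each descriptor cell, so the archive accumulates a diverse set of elites rather than a single optimum.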
Related papers
- Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods.
We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground.
We also propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z) - GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model [69.71629949747884]
Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data.
In this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples.
arXiv Detail & Related papers (2024-01-11T16:43:16Z) - Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning, integrated as a component of evolutionary algorithms, has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z) - A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z) - MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy [1.376408511310322]
Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks.
arXiv Detail & Related papers (2023-03-07T11:58:01Z) - Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains [1.376408511310322]
We show that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments.
In addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments.
arXiv Detail & Related papers (2022-10-24T12:17:18Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Self-Referential Quality Diversity Through Differential Map-Elites [5.2508303190856624]
Differential MAP-Elites is a novel algorithm that combines the illumination capacity of MAP-Elites with the continuous-space optimization capacity of Differential Evolution.
The Differential MAP-Elites algorithm, introduced here for the first time, is relatively simple in that it combines the operators from Differential Evolution with the map structure of MAP-Elites.
arXiv Detail & Related papers (2021-07-11T04:31:10Z) - Adam revisited: a weighted past gradients perspective [57.54752290924522]
We propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues.
We prove that WADA can achieve a weighted data-dependent regret bound, which could be better than the original regret bound of ADAGRAD.
arXiv Detail & Related papers (2021-01-01T14:01:52Z) - Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations [1.827510863075184]
We show that Multidimensional Archive of Phenotypic Elites (MAP-Elites) can deliver better-performing solutions than one of the state-of-the-art RL methods.
This paper demonstrates that EAs combined with modern computational resources display promising characteristics.
arXiv Detail & Related papers (2020-09-17T17:41:46Z) - Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters [1.827510863075184]
We introduce Multi-Emitter MAP-Elites (ME-MAP-Elites), an algorithm that directly extends CMA-ME and improves its quality, diversity and data efficiency.
A bandit algorithm dynamically finds the best selection of emitters depending on the current situation.
We evaluate the performance of ME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100 dimensions) to complex locomotion tasks in robotics.
arXiv Detail & Related papers (2020-07-10T12:45:02Z)
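The bandit-based emitter selection described in the ME-MAP-Elites entry above can be illustrated with a small sketch. This is a hypothetical UCB1 example, not the algorithm from the paper (which does not necessarily use UCB1): each "arm" stands in for an emitter, and the reward for an emitter improving the archive is simulated here as a fixed-rate Bernoulli draw.

```python
import math
import random

# Hypothetical UCB1 sketch of bandit-based emitter selection, in the spirit
# of ME-MAP-Elites. Each arm is an emitter; reward is 1 when its offspring
# "improves the archive", simulated here with a fixed per-emitter rate.

def ucb1_select(counts, rewards, t):
    # Play every arm once, then pick the arm maximizing
    # mean reward + exploration bonus.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + math.sqrt(2 * math.log(t) / counts[i]))

def run(success_rates, steps=2000, seed=0):
    rng = random.Random(seed)
    k = len(success_rates)
    counts, rewards = [0] * k, [0.0] * k
    for t in range(1, steps + 1):
        arm = ucb1_select(counts, rewards, t)
        reward = 1.0 if rng.random() < success_rates[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts

# The emitter with the highest (simulated) improvement rate should be
# selected most often once the bandit has explored all arms.
counts = run([0.2, 0.5, 0.8])
```

The design point is that the selection adapts online: emitters that stop improving the archive see their empirical mean drop and are picked less often, matching the "depending on the current situation" behavior described above.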
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.