Diversity Policy Gradient for Sample Efficient Quality-Diversity
Optimization
- URL: http://arxiv.org/abs/2006.08505v5
- Date: Tue, 31 May 2022 08:57:21 GMT
- Title: Diversity Policy Gradient for Sample Efficient Quality-Diversity
Optimization
- Authors: Thomas Pierrot, Valentin Macé, Félix Chalumeau, Arthur Flajolet,
Geoffrey Cideron, Karim Beguir, Antoine Cully, Olivier Sigaud and Nicolas
Perrin-Gilbert
- Abstract summary: Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off.
This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches.
- Score: 7.8499505363825755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fascinating aspect of nature lies in its ability to produce a large and
diverse collection of organisms that are all high-performing in their niche. By
contrast, most AI algorithms focus on finding a single efficient solution to a
given problem. Aiming for diversity in addition to performance is a convenient
way to deal with the exploration-exploitation trade-off that plays a central
role in learning. It also allows for increased robustness when the returned
collection contains several working solutions to the considered problem, making
it well-suited for real applications such as robotics. Quality-Diversity (QD)
methods are evolutionary algorithms designed for this purpose. This paper
proposes a novel algorithm, QDPG, which combines the strength of Policy
Gradient algorithms and Quality Diversity approaches to produce a collection of
diverse and high-performing neural policies in continuous control environments.
The main contribution of this work is the introduction of a Diversity Policy
Gradient (DPG) that exploits information at the time-step level to drive
policies towards more diversity in a sample-efficient manner. Specifically,
QDPG selects neural controllers from a MAP-Elites grid and uses two
gradient-based mutation operators to improve both quality and diversity. Our
results demonstrate that QDPG is significantly more sample-efficient than its
evolutionary competitors.
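The QDPG loop described in the abstract can be sketched in a toy form: policies live in a MAP-Elites grid keyed by behaviour descriptor, and each iteration mutates a sampled parent with either a quality-improving or a diversity-seeking step. The functions `quality_grad`, `diversity_grad`, `descriptor`, and `fitness` below are illustrative placeholders, not the paper's actual policy-gradient operators.

```python
import random

def quality_grad(theta):
    # placeholder "quality" operator: step that increases the toy return
    # (here, pushes parameters toward 1.0)
    return [t + 0.1 * (1.0 - t) for t in theta]

def diversity_grad(theta):
    # placeholder "diversity" operator: step that pushes the policy
    # toward novel behaviour via random perturbation
    return [t + 0.1 * random.uniform(-1, 1) for t in theta]

def descriptor(theta):
    # toy behaviour descriptor: mean parameter, bucketed to one decimal
    return round(sum(theta) / len(theta), 1)

def fitness(theta):
    # toy return: closeness of parameters to 1.0 (always <= 0)
    return -sum((1.0 - t) ** 2 for t in theta)

def qdpg(iterations=200, dim=4, seed=0):
    random.seed(seed)
    grid = {}  # descriptor cell -> (fitness, params)

    def insert(theta):
        cell, f = descriptor(theta), fitness(theta)
        # MAP-Elites rule: keep the best solution per descriptor cell
        if cell not in grid or f > grid[cell][0]:
            grid[cell] = (f, theta)

    insert([random.uniform(-1, 1) for _ in range(dim)])
    for _ in range(iterations):
        parent = random.choice(list(grid.values()))[1]
        op = quality_grad if random.random() < 0.5 else diversity_grad
        insert(op(parent))
    return grid
```

Running `qdpg()` fills several descriptor cells, each holding its highest-fitness solution, which is the collection-of-diverse-elites outcome the abstract describes.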
Related papers
- Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning [4.787389127632926]
Quality-Diversity optimization is a family of Evolutionary Algorithms that generate collections of both diverse and high-performing solutions.
MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics.
We present two contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, and (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost.
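The descriptor-conditioned critic idea can be illustrated minimally: rather than training one critic per behaviour niche, a single value function takes the target descriptor as an extra input, Q(s, a, d). The linear form and all names below are hypothetical simplifications for illustration.

```python
def descriptor_conditioned_q(weights, state, action, descriptor):
    # A single critic conditioned on the target descriptor: the descriptor
    # is concatenated to the (state, action) input, so one set of weights
    # covers every behaviour niche. Linear here purely for illustration;
    # in practice this would be a neural network.
    features = list(state) + list(action) + list(descriptor)
    return sum(w * f for w, f in zip(weights, features))
```

The design benefit is that experience collected while targeting one descriptor can still train the critic for all others, which is where the sample-efficiency gain comes from.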
arXiv Detail & Related papers (2023-12-10T19:53:15Z)
- Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization [13.436983663467938]
This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions.
Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery.
In open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model.
arXiv Detail & Related papers (2023-10-18T16:46:16Z)
- Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning [14.16864939687988]
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning.
Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields.
arXiv Detail & Related papers (2023-05-23T08:05:59Z)
- Efficient Exploration using Model-Based Quality-Diversity with Gradients [4.788163807490196]
In this paper, we propose a model-based Quality-Diversity approach.
It extends existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration.
We demonstrate that it maintains the divergent search capabilities of population-based approaches on tasks with deceptive rewards while significantly improving their sample efficiency and quality of solutions.
arXiv Detail & Related papers (2022-11-22T22:19:01Z)
- Multi-Objective GFlowNets [59.16787189214784]
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization.
In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives.
We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse optimal solutions, based on GFlowNets.
arXiv Detail & Related papers (2022-10-23T16:15:36Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning [8.591356221688773]
Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures.
We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks.
One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
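When exact gradients of the objective are unavailable, as in the RL setting above, one generic way to approximate them is antithetic evolution-strategies sampling. This is a hedged, generic sketch of that idea, not the paper's specific estimator.

```python
import random

def es_gradient(f, theta, sigma=0.1, n_samples=64, seed=0):
    # Antithetic ES gradient estimate: probe f at theta +/- sigma * eps
    # for Gaussian directions eps, and average the scaled differences.
    rng = random.Random(seed)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_samples):
        eps = [rng.gauss(0, 1) for _ in range(dim)]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = f(plus) - f(minus)
        for i in range(dim):
            grad[i] += diff * eps[i] / (2 * sigma * n_samples)
    return grad
```

For a linear objective such as `f(x) = sum(x)`, the estimate converges to the true gradient (all ones) as the sample count grows.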
arXiv Detail & Related papers (2022-02-08T05:53:55Z)
- Result Diversification by Multi-objective Evolutionary Algorithms with Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem, and solve it by a multi-objective evolutionary algorithm (EA)
We theoretically prove that the GSEMO can achieve the optimal polynomial-time approximation ratio, $1/2$.
When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in polynomial running time, addressing the open question proposed by Borodin et al.
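The GSEMO scheme referenced above can be sketched on a toy bi-objective problem: maintain a set of mutually non-dominated bit strings, mutate a random member by independent bit flips, and keep the child unless it is dominated. The two objectives below (count of selected items, count of selected even-indexed items) are illustrative stand-ins for the paper's quality and diversity objectives.

```python
import random

def objectives(x):
    # toy bi-objective: (number of ones, number of ones at even indices)
    f1 = sum(x)
    f2 = sum(b for i, b in enumerate(x) if i % 2 == 0)
    return (f1, f2)

def dominates(a, b):
    # a dominates b: at least as good everywhere, strictly better somewhere
    return all(ai >= bi for ai, bi in zip(a, b)) and a != b

def gsemo(n=8, iterations=500, seed=1):
    random.seed(seed)
    pop = {tuple([0] * n): objectives([0] * n)}  # solution -> objectives
    for _ in range(iterations):
        parent = list(random.choice(list(pop)))
        # standard bit-wise mutation: flip each bit with probability 1/n
        child = [b ^ (random.random() < 1 / n) for b in parent]
        fc = objectives(child)
        if not any(dominates(fv, fc) for fv in pop.values()):
            # accept child and drop every solution it now dominates
            pop = {x: fv for x, fv in pop.items() if not dominates(fc, fv)}
            pop[tuple(child)] = fc
    return pop
```

The population is a Pareto archive by construction, which is what lets GSEMO return a whole front of quality/diversity trade-offs in one run.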
arXiv Detail & Related papers (2021-10-18T14:00:22Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool in dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to MP algorithms reveals that the smoothness, or lack of smoothness, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
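The two SUNRISE ingredients can be illustrated in a discrete toy setting: Bellman targets are down-weighted where the Q-ensemble disagrees, and actions are chosen by an upper confidence bound over the ensemble. The weighting function and its `temperature` parameter below are simplified assumptions, not the paper's exact formulation.

```python
import statistics

def weighted_target(q_ensemble_next, reward, gamma=0.99, temperature=10.0):
    # q_ensemble_next: each ensemble member's Q-value for the next state.
    # The target is shrunk where the ensemble disagrees (high std dev),
    # a simplified stand-in for SUNRISE's uncertainty-based re-weighting.
    mean_q = statistics.mean(q_ensemble_next)
    std_q = statistics.pstdev(q_ensemble_next)
    weight = 1.0 / (1.0 + temperature * std_q)  # in (0, 1]
    return weight * (reward + gamma * mean_q)

def ucb_action(q_per_action_ensemble, bonus_scale=1.0):
    # q_per_action_ensemble: {action: [Q_i(s, a) for each member i]}.
    # Pick the action with the highest mean-plus-uncertainty bonus,
    # so disagreement among members encourages exploration.
    def ucb(qs):
        return statistics.mean(qs) + bonus_scale * statistics.pstdev(qs)
    return max(q_per_action_ensemble, key=lambda a: ucb(q_per_action_ensemble[a]))
```

Note how an action with a slightly lower mean but higher ensemble disagreement can win under the UCB rule; that disagreement is exactly the exploration signal the abstract describes.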
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.