Diversity Policy Gradient for Sample Efficient Quality-Diversity
Optimization
- URL: http://arxiv.org/abs/2006.08505v5
- Date: Tue, 31 May 2022 08:57:21 GMT
- Title: Diversity Policy Gradient for Sample Efficient Quality-Diversity
Optimization
- Authors: Thomas Pierrot, Valentin Macé, Félix Chalumeau, Arthur Flajolet,
Geoffrey Cideron, Karim Beguir, Antoine Cully, Olivier Sigaud and Nicolas
Perrin-Gilbert
- Abstract summary: Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off.
This paper proposes a novel algorithm, QDPG, which combines the strengths of Policy Gradient algorithms and Quality-Diversity approaches.
- Score: 7.8499505363825755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fascinating aspect of nature lies in its ability to produce a large and
diverse collection of organisms that are all high-performing in their niche. By
contrast, most AI algorithms focus on finding a single efficient solution to a
given problem. Aiming for diversity in addition to performance is a convenient
way to deal with the exploration-exploitation trade-off that plays a central
role in learning. It also allows for increased robustness when the returned
collection contains several working solutions to the considered problem, making
it well-suited for real applications such as robotics. Quality-Diversity (QD)
methods are evolutionary algorithms designed for this purpose. This paper
proposes a novel algorithm, QDPG, which combines the strengths of Policy
Gradient algorithms and Quality-Diversity approaches to produce a collection of
diverse and high-performing neural policies in continuous control environments.
The main contribution of this work is the introduction of a Diversity Policy
Gradient (DPG) that exploits information at the time-step level to drive
policies towards more diversity in a sample-efficient manner. Specifically,
QDPG selects neural controllers from a MAP-Elites grid and uses two
gradient-based mutation operators to improve both quality and diversity. Our
results demonstrate that QDPG is significantly more sample-efficient than its
evolutionary competitors.
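To make the loop described in the abstract concrete, the sketch below shows a QDPG-style iteration in Python: parents are selected from a MAP-Elites archive and mutated either by a quality policy-gradient step or by a diversity policy-gradient step before re-insertion. The helper functions (evaluate, quality_pg_step, diversity_pg_step) and the grid discretization are illustrative assumptions, not the authors' implementation.

```python
import random

def qdpg_sketch(init_policy, evaluate, quality_pg_step, diversity_pg_step,
                num_iterations=1000, batch_size=16, p_quality=0.5):
    """Minimal QDPG-style loop (illustrative only).

    evaluate(policy)          -> (fitness, behaviour_descriptor) from environment rollouts
    quality_pg_step(policy)   -> policy mutated by a policy gradient on the return
    diversity_pg_step(policy) -> policy mutated by a policy gradient on a
                                 time-step-level novelty/diversity signal
    """
    archive = {}  # MAP-Elites grid: descriptor cell -> (fitness, policy)

    def try_insert(policy):
        fitness, descriptor = evaluate(policy)
        cell = tuple(round(d, 1) for d in descriptor)  # crude grid discretization
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, policy)

    try_insert(init_policy)
    for _ in range(num_iterations):
        for _ in range(batch_size):
            _, parent = random.choice(list(archive.values()))
            if random.random() < p_quality:
                child = quality_pg_step(parent)    # push towards higher return
            else:
                child = diversity_pg_step(parent)  # push towards novel behaviour
            try_insert(child)
    return archive
```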
Related papers
- Diversity-Rewarded CFG Distillation [62.08448835625036]
We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations.
Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt.
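As a rough sketch of how the two objectives could be combined, the snippet below mixes a KL distillation term (match the CFG-augmented predictions) with a diversity term over several generations for the same prompt. In the paper the diversity reward is optimized with RL; here it is folded into a single differentiable surrogate purely for illustration, and the weight beta and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def distill_plus_diversity_loss(student_logits, cfg_teacher_logits, generations, beta=0.1):
    """Illustrative combination of the two objectives (not the paper's exact formulation).

    student_logits:     logits of the model alone (no CFG), shape (batch, vocab)
    cfg_teacher_logits: logits of the CFG-augmented predictions, same shape
    generations:        embeddings of several outputs for one prompt, shape (k, dim)
    """
    # (1) Distillation: make the bare model imitate the CFG-augmented distribution.
    distill = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(cfg_teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # (2) Diversity: reward spread among generations for the same prompt
    # (mean pairwise distance, maximized, hence subtracted).
    diversity = torch.cdist(generations, generations).mean()
    return distill - beta * diversity
```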
arXiv Detail & Related papers (2024-10-08T14:40:51Z) - Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization [13.436983663467938]
This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions.
Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery.
In open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model.
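One generic way to infer a diversity metric from human similarity judgments is to fit an embedding with a triplet loss and use distances in that latent space as the diversity measure driving QD. The sketch below illustrates that idea; the network sizes and training interface are assumptions, not QDHF's actual procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiversityEmbedding(nn.Module):
    """Maps solutions to a latent space whose distances should reflect human similarity judgments."""
    def __init__(self, in_dim, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, x):
        return self.net(x)

def triplet_update(model, optimizer, anchor, similar, dissimilar, margin=1.0):
    """One update from a judgment: 'anchor is more similar to `similar` than to `dissimilar`'."""
    loss = F.triplet_margin_loss(model(anchor), model(similar), model(dissimilar), margin=margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Distances in the learned space can then play the role of the behaviour descriptor that QD methods normally require to be hand-designed.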
arXiv Detail & Related papers (2023-10-18T16:46:16Z) - Proximal Policy Gradient Arborescence for Quality Diversity
Reinforcement Learning [14.16864939687988]
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning.
Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields.
arXiv Detail & Related papers (2023-05-23T08:05:59Z) - Efficient Exploration using Model-Based Quality-Diversity with Gradients [4.788163807490196]
In this paper, we propose a model-based Quality-Diversity approach.
It extends existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration.
We demonstrate that it maintains the divergent search capabilities of population-based approaches on tasks with deceptive rewards while significantly improving their sample efficiency and quality of solutions.
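A loose sketch of the two ingredients mentioned above: scoring random parameter perturbations "in imagination" (i.e. with a learned dynamics model instead of the real environment) for exploration, plus a gradient-based step for exploitation. The model and policy interfaces here (model.reset, model.step, policy.act, a params vector, and the gradient_step helper) are assumptions for illustration.

```python
import copy
import numpy as np

def imagined_return(model, policy, horizon=100):
    """Roll out a policy inside a learned dynamics model instead of the real environment."""
    state, total = model.reset(), 0.0
    for _ in range(horizon):
        action = policy.act(state)
        state, reward = model.step(state, action)
        total += reward
    return total

def propose_children(model, parent, gradient_step, n_perturbations=8, sigma=0.02, seed=0):
    """Exploration: random perturbations scored in imagination; exploitation: one gradient step."""
    rng = np.random.default_rng(seed)
    children = []
    for _ in range(n_perturbations):
        child = copy.deepcopy(parent)
        child.params = child.params + sigma * rng.standard_normal(child.params.shape)
        children.append(child)
    best_explorer = max(children, key=lambda c: imagined_return(model, c))
    improved = gradient_step(parent, model)  # e.g. ascent on the model-predicted return
    return best_explorer, improved
```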
arXiv Detail & Related papers (2022-11-22T22:19:01Z) - Multi-Objective GFlowNets [59.16787189214784]
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization.
In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives.
We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse optimal solutions, based on GFlowNets.
arXiv Detail & Related papers (2022-10-23T16:15:36Z) - Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z) - Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning [8.591356221688773]
Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures.
We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks.
One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
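In RL the objective and the behaviour measures are not differentiable end to end, so a DQD variant has to approximate their gradients from rollouts. The snippet below shows one standard, generic estimator of this kind (an antithetic evolution-strategies estimate); it is illustrative and not necessarily the exact approximation used in the paper.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.02, n_samples=32, rng=None):
    """Evolution-strategies estimate of grad f(theta) from blackbox evaluations only.

    f: maps a parameter vector to a scalar, e.g. the objective or one behaviour measure.
    """
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.shape)
        # Antithetic pair (theta + sigma*eps, theta - sigma*eps) reduces variance.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (2.0 * sigma * n_samples)
```

Estimates of this kind for the objective and for each measure can then be supplied to CMA-MEGA in place of exact gradients.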
arXiv Detail & Related papers (2022-02-08T05:53:55Z) - Result Diversification by Multi-objective Evolutionary Algorithms with
Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem, and solve it by a multi-objective evolutionary algorithm (EA)
We theoretically prove that the GSEMO can achieve the optimal approximation ratio of $1/2$.
When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in polynomial running time, addressing an open question posed by Borodin et al.
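For context, GSEMO is a very simple multi-objective EA over bit strings: it keeps an archive of mutually non-dominated solutions, mutates a uniformly chosen archive member by independent bit flips, and inserts the offspring unless it is dominated. The sketch below is a textbook-style rendering with a caller-supplied bi-objective function, not the paper's exact setup.

```python
import random

def dominates(a, b):
    """True if objective vector a dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def gsemo(objectives, n, iterations, seed=0):
    """objectives: maps a length-n bit tuple to a tuple of objective values."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(n))
    archive = {x: objectives(x)}  # mutually non-dominated solutions
    for _ in range(iterations):
        parent = rng.choice(list(archive))
        child = tuple(bit ^ (rng.random() < 1.0 / n) for bit in parent)  # flip each bit w.p. 1/n
        f_child = objectives(child)
        if not any(dominates(f, f_child) for f in archive.values()):
            archive = {s: f for s, f in archive.items() if not dominates(f_child, f)}
            archive[child] = f_child
    return archive

# Illustrative bi-objective call: maximize a coverage-like term and a negated cost-like term.
# archive = gsemo(lambda s: (sum(s), -sum(i * b for i, b in enumerate(s))), n=20, iterations=10_000)
```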
arXiv Detail & Related papers (2021-10-18T14:00:22Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations required for QD optimization in these environments.
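A minimal, assumed rendering of that idea: record the parameter vectors visited while running QD on training tasks, then sample from them to form the initial population on an unseen task (the run_qd entry point and its initial_population keyword are hypothetical interfaces).

```python
import random

def build_prior_population(training_paths, population_size, seed=0):
    """training_paths: one list per training task of parameter vectors visited by QD optimization."""
    rng = random.Random(seed)
    visited = [theta for path in training_paths for theta in path]
    return rng.sample(visited, k=min(population_size, len(visited)))

def few_shot_qd(run_qd, training_paths, new_task, population_size=64):
    """Initialize QD on an unseen task from the prior population instead of random parameters."""
    prior = build_prior_population(training_paths, population_size)
    return run_qd(new_task, initial_population=prior)  # hypothetical QD entry point
```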
arXiv Detail & Related papers (2021-09-14T17:12:20Z) - Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward
Model [24.030426634281643]
In continuous control tasks, the widely used policies with Gaussian distributions result in ineffective exploration of environments.
We propose a density-free off-policy algorithm, Generative Actor-Critic, using the push-forward model to increase the expressiveness of policies.
We show that push-forward policies possess desirable features, such as multi-modality, which can noticeably improve exploration efficiency and algorithm performance.
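A push-forward policy replaces the usual Gaussian head with a network that pushes a simple noise distribution forward into action space, so the induced action distribution can be multi-modal even though its density is never written down. The sketch below is a minimal illustration of that construction (sizes and activations are assumptions), not the paper's architecture.

```python
import torch
import torch.nn as nn

class PushForwardPolicy(nn.Module):
    """Samples actions as a = f(s, z) with z ~ N(0, I); no explicit action density is needed."""
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=128):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # keeps actions in [-1, 1]
        )

    def forward(self, state):
        z = torch.randn(state.shape[0], self.noise_dim, device=state.device)
        return self.net(torch.cat([state, z], dim=-1))

# Drawing several actions for one repeated state reveals the (possibly multi-modal) distribution:
# policy = PushForwardPolicy(state_dim=17, action_dim=6)
# actions = policy(torch.zeros(32, 17))
```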
arXiv Detail & Related papers (2021-05-08T16:29:20Z) - Selection-Expansion: A Unifying Framework for Motion-Planning and
Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to motion-planning (MP) algorithms reveals that the smoothness, or lack of smoothness, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)