Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning
- URL: http://arxiv.org/abs/2202.03666v1
- Date: Tue, 8 Feb 2022 05:53:55 GMT
- Title: Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning
- Authors: Bryon Tjanaka, Matthew C. Fontaine, Julian Togelius, Stefanos
Nikolaidis
- Abstract summary: Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures.
We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks.
One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
- Score: 8.591356221688773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consider a walking agent that must adapt to damage. To approach this task, we
can train a collection of policies and have the agent select a suitable policy
when damaged. Training this collection may be viewed as a quality diversity
(QD) optimization problem, where we search for solutions (policies) which
maximize an objective (walking forward) while spanning a set of measures
(measurable characteristics). Recent work shows that differentiable quality
diversity (DQD) algorithms greatly accelerate QD optimization when exact
gradients are available for the objective and measures. However, such gradients
are typically unavailable in RL settings due to non-differentiable
environments. To apply DQD in RL settings, we propose to approximate objective
and measure gradients with evolution strategies and actor-critic methods. We
develop two variants of the DQD algorithm CMA-MEGA, each with different
gradient approximations, and evaluate them on four simulated walking tasks. One
variant achieves comparable performance (QD score) with the state-of-the-art
PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks
but is less efficient than PGA-MAP-Elites in two tasks. These results provide
insight into the limitations of CMA-MEGA in domains that require rigorous
optimization of the objective and where exact gradients are unavailable.
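To make the gradient-approximation idea concrete, below is a minimal sketch of the kind of evolution-strategies estimator the ES-based variant relies on: objective and measure gradients are estimated from mirrored Gaussian perturbations of the policy parameters. The function name es_gradients, the evaluate callback, and all hyperparameters are illustrative assumptions rather than the paper's implementation; the actor-critic variant mentioned in the abstract would instead replace the objective-gradient estimate with one derived from a learned critic, which is not shown here.

```python
import numpy as np

def es_gradients(theta, evaluate, sigma=0.02, batch_size=100, rng=None):
    """Estimate objective and measure gradients at policy parameters ``theta``
    with antithetic (mirrored) Gaussian perturbations, in the style of OpenAI-ES.

    ``evaluate(params)`` rolls out the policy and returns ``(objective, measures)``,
    where ``measures`` is a 1-D array of behavioral measures (e.g. foot contact times).
    """
    rng = np.random.default_rng() if rng is None else rng
    half = batch_size // 2
    noise = rng.standard_normal((half, theta.size))

    obj_diff = np.empty(half)
    meas_diff = []
    for i, eps in enumerate(noise):
        f_plus, m_plus = evaluate(theta + sigma * eps)
        f_minus, m_minus = evaluate(theta - sigma * eps)
        obj_diff[i] = f_plus - f_minus
        meas_diff.append(np.asarray(m_plus) - np.asarray(m_minus))
    meas_diff = np.asarray(meas_diff)                         # (half, n_measures)

    # Antithetic ES estimator: mean over samples of (f(x+s*eps) - f(x-s*eps)) * eps / (2*s).
    obj_grad = noise.T @ obj_diff / (half * 2 * sigma)        # (n_params,)
    meas_grads = meas_diff.T @ noise / (half * 2 * sigma)     # (n_measures, n_params)
    return obj_grad, meas_grads
```

In practice, ES implementations often rank-normalize the returns before forming this estimate; the raw-value form is kept here for brevity.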
Related papers
- Efficient Quality-Diversity Optimization through Diverse Quality Species [3.428706362109921]
We show that a diverse population of solutions can be found without the limitation of needing an archive or defining the range of behaviors in advance.
We propose Diverse Quality Species (DQS) as an alternative to archive-based Quality-Diversity (QD) algorithms.
arXiv Detail & Related papers (2023-04-14T23:15:51Z)
- Enhancing MAP-Elites with Multiple Parallel Evolution Strategies [8.585387103144825]
We propose a novel Quality-Diversity (QD) algorithm based on Evolution Strategies (ES).
MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and reset mechanism designed for QD optimisation.
We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks.
arXiv Detail & Related papers (2023-03-10T18:55:02Z)
- MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy [1.376408511310322]
Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average on a set of challenging locomotion tasks.
arXiv Detail & Related papers (2023-03-07T11:58:01Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the presence of conflicting gradients between tasks.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that this considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- Differentiable Quality Diversity [1.0965065178451106]
We present the differentiable quality diversity (DQD) problem, where both the objective and measure functions are first-order differentiable.
We then present MAP-Elites via Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions (a sketch of this branching step follows this list).
Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-of-the-art QD algorithms.
arXiv Detail & Related papers (2021-06-07T18:11:53Z)
- Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization [21.473252641133413]
MADGRAD shows excellent performance on deep learning optimization problems from multiple fields.
For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance.
arXiv Detail & Related papers (2021-01-26T20:38:26Z)
- Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation [120.69747175899421]
Optimal Transport (OT) distances such as Wasserstein have been used in several areas such as GANs and domain adaptation.
We propose a computationally-efficient dual form of the robust OT optimization that is amenable to modern deep learning applications.
Our approach can train state-of-the-art GAN models on noisy datasets corrupted with outlier distributions.
arXiv Detail & Related papers (2020-10-12T17:13:40Z)
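As a complement to the Differentiable Quality Diversity entry above, the sketch below illustrates the gradient-arborescence branching step that approximate objective and measure gradients would feed: candidates are branched from the current search point by combining the gradients with sampled coefficients. The function name branch_candidates, the isotropic coefficient distribution, and the normalization are illustrative assumptions; CMA-MEGA instead adapts the coefficient distribution with CMA-ES and ranks branches by how much they improve a MAP-Elites archive.

```python
import numpy as np

def branch_candidates(theta, obj_grad, meas_grads, n_branches=36,
                      coeff_sigma=0.1, rng=None):
    """Branch candidate solutions theta' = theta + c0*grad_f + sum_j cj*grad_mj.

    ``obj_grad`` has shape (n_params,); ``meas_grads`` has shape
    (n_measures, n_params).  Coefficients are drawn from an isotropic Gaussian
    here for simplicity; CMA-MEGA adapts this distribution instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.vstack([obj_grad[None, :], meas_grads])                     # (1 + n_measures, n_params)
    grads = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-8)  # unit-length gradients
    coeffs = rng.normal(scale=coeff_sigma, size=(n_branches, grads.shape[0]))
    coeffs[:, 0] = np.abs(coeffs[:, 0])   # keep the objective coefficient non-negative
    return theta[None, :] + coeffs @ grads                                 # (n_branches, n_params)
```

Each branched parameter vector would then be evaluated with the same rollouts used for the gradient estimates and inserted into the archive, and the resulting archive improvements would drive the next update of the coefficient distribution.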
This list is automatically generated from the titles and abstracts of the papers on this site.