Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning
- URL: http://arxiv.org/abs/2202.03666v1
- Date: Tue, 8 Feb 2022 05:53:55 GMT
- Title: Approximating Gradients for Differentiable Quality Diversity in
Reinforcement Learning
- Authors: Bryon Tjanaka, Matthew C. Fontaine, Julian Togelius, Stefanos
Nikolaidis
- Abstract summary: Differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures.
We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks.
One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks.
- Score: 8.591356221688773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consider a walking agent that must adapt to damage. To approach this task, we
can train a collection of policies and have the agent select a suitable policy
when damaged. Training this collection may be viewed as a quality diversity
(QD) optimization problem, where we search for solutions (policies) which
maximize an objective (walking forward) while spanning a set of measures
(measurable characteristics). Recent work shows that differentiable quality
diversity (DQD) algorithms greatly accelerate QD optimization when exact
gradients are available for the objective and measures. However, such gradients
are typically unavailable in RL settings due to non-differentiable
environments. To apply DQD in RL settings, we propose to approximate objective
and measure gradients with evolution strategies and actor-critic methods. We
develop two variants of the DQD algorithm CMA-MEGA, each with different
gradient approximations, and evaluate them on four simulated walking tasks. One
variant achieves comparable performance (QD score) with the state-of-the-art
PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks
but is less efficient than PGA-MAP-Elites in two tasks. These results provide
insight into the limitations of CMA-MEGA in domains that require rigorous
optimization of the objective and where exact gradients are unavailable.
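To make the gradient-approximation idea concrete, below is a minimal sketch of the kind of evolution-strategies estimator the ES-based variant relies on: objective and measure gradients are estimated from mirrored Gaussian perturbations of the policy parameters. The function name es_gradients, the evaluate callback, and all hyperparameters are illustrative assumptions rather than the paper's implementation; the actor-critic variant mentioned in the abstract would instead replace the objective-gradient estimate with one derived from a learned critic, which is not shown here.

```python
import numpy as np

def es_gradients(theta, evaluate, sigma=0.02, batch_size=100, rng=None):
    """Estimate objective and measure gradients at policy parameters ``theta``
    with antithetic (mirrored) Gaussian perturbations, in the style of OpenAI-ES.

    ``evaluate(params)`` rolls out the policy and returns ``(objective, measures)``,
    where ``measures`` is a 1-D array of behavioral measures (e.g. foot contact times).
    """
    rng = np.random.default_rng() if rng is None else rng
    half = batch_size // 2
    noise = rng.standard_normal((half, theta.size))

    obj_diff = np.empty(half)
    meas_diff = []
    for i, eps in enumerate(noise):
        f_plus, m_plus = evaluate(theta + sigma * eps)
        f_minus, m_minus = evaluate(theta - sigma * eps)
        obj_diff[i] = f_plus - f_minus
        meas_diff.append(np.asarray(m_plus) - np.asarray(m_minus))
    meas_diff = np.asarray(meas_diff)                         # (half, n_measures)

    # Antithetic ES estimator: mean over samples of (f(x+s*eps) - f(x-s*eps)) * eps / (2*s).
    obj_grad = noise.T @ obj_diff / (half * 2 * sigma)        # (n_params,)
    meas_grads = meas_diff.T @ noise / (half * 2 * sigma)     # (n_measures, n_params)
    return obj_grad, meas_grads
```

In practice, ES implementations often rank-normalize the returns before forming this estimate; the raw-value form is kept here for brevity.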
Related papers
- Efficient Quality-Diversity Optimization through Diverse Quality Species [3.428706362109921]
We show that a diverse population of solutions can be found without the limitation of needing an archive or defining the range of behaviors in advance.
We propose Diverse Quality Species (DQS) as an alternative to archive-based Quality-Diversity (QD) algorithms.
arXiv Detail & Related papers (2023-04-14T23:15:51Z)
- Enhancing MAP-Elites with Multiple Parallel Evolution Strategies [8.585387103144825]
We propose a novel Quality-Diversity (QD) algorithm based on Evolution Strategies (ES).
MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and reset mechanism designed for QD optimisation.
We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks.
arXiv Detail & Related papers (2023-03-10T18:55:02Z)
- MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy [1.376408511310322]
Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average on a set of challenging locomotion tasks.
arXiv Detail & Related papers (2023-03-07T11:58:01Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the presence of conflicting gradients between tasks.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that this considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- Differentiable Quality Diversity [1.0965065178451106]
We present the differentiable quality diversity (DQD) problem, where both the objective and measure functions are first-order differentiable.
We then present MAP-Elites via Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions (a sketch of this branching step follows this list).
Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-of-the-art QD algorithms.
arXiv Detail & Related papers (2021-06-07T18:11:53Z)
- Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization [21.473252641133413]
MADGRAD shows excellent performance on deep learning optimization problems from multiple fields.
For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance.
arXiv Detail & Related papers (2021-01-26T20:38:26Z)
- Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation [120.69747175899421]
Optimal Transport (OT) distances such as Wasserstein have been used in several areas such as GANs and domain adaptation.
We propose a computationally-efficient dual form of the robust OT optimization that is amenable to modern deep learning applications.
Our approach can train state-of-the-art GAN models on noisy datasets corrupted with outlier distributions.
arXiv Detail & Related papers (2020-10-12T17:13:40Z)
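As a complement to the Differentiable Quality Diversity entry above, the sketch below illustrates the gradient-arborescence branching step that approximate objective and measure gradients would feed: candidates are branched from the current search point by combining the gradients with sampled coefficients. The function name branch_candidates, the isotropic coefficient distribution, and the normalization are illustrative assumptions; CMA-MEGA instead adapts the coefficient distribution with CMA-ES and ranks branches by how much they improve a MAP-Elites archive.

```python
import numpy as np

def branch_candidates(theta, obj_grad, meas_grads, n_branches=36,
                      coeff_sigma=0.1, rng=None):
    """Branch candidate solutions theta' = theta + c0*grad_f + sum_j cj*grad_mj.

    ``obj_grad`` has shape (n_params,); ``meas_grads`` has shape
    (n_measures, n_params).  Coefficients are drawn from an isotropic Gaussian
    here for simplicity; CMA-MEGA adapts this distribution instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.vstack([obj_grad[None, :], meas_grads])                     # (1 + n_measures, n_params)
    grads = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-8)  # unit-length gradients
    coeffs = rng.normal(scale=coeff_sigma, size=(n_branches, grads.shape[0]))
    coeffs[:, 0] = np.abs(coeffs[:, 0])   # keep the objective coefficient non-negative
    return theta[None, :] + coeffs @ grads                                 # (n_branches, n_params)
```

Each branched parameter vector would then be evaluated with the same rollouts used for the gradient estimates and inserted into the archive, and the resulting archive improvements would drive the next update of the coefficient distribution.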
This list is automatically generated from the titles and abstracts of the papers on this site.