Proximal Policy Gradient Arborescence for Quality Diversity
Reinforcement Learning
- URL: http://arxiv.org/abs/2305.13795v2
- Date: Mon, 29 Jan 2024 20:05:18 GMT
- Title: Proximal Policy Gradient Arborescence for Quality Diversity
Reinforcement Learning
- Authors: Sumeet Batra, Bryon Tjanaka, Matthew C. Fontaine, Aleksei Petrenko,
Stefanos Nikolaidis, Gaurav Sukhatme
- Abstract summary: Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning.
Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields.
- Score: 14.16864939687988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training generally capable agents that thoroughly explore their environment
and learn new and diverse skills is a long-term goal of robot learning. Quality
Diversity Reinforcement Learning (QD-RL) is an emerging research area that
blends the best aspects of both fields -- Quality Diversity (QD) provides a
principled form of exploration and produces collections of behaviorally diverse
agents, while Reinforcement Learning (RL) provides a powerful performance
improvement operator enabling generalization across tasks and dynamic
environments. Existing QD-RL approaches have been constrained to
sample-efficient, deterministic off-policy RL algorithms and/or evolution strategies,
and struggle with highly stochastic environments. In this work, we, for the
first time, adapt on-policy RL, specifically Proximal Policy Optimization
(PPO), to the Differentiable Quality Diversity (DQD) framework and propose
additional improvements over prior work that enable efficient optimization and
discovery of novel skills on challenging locomotion tasks. Our new algorithm,
Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art
results, including a 4x improvement in best reward over baselines on the
challenging humanoid domain.
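Since the paper's core move is bringing PPO into the DQD loop, a minimal sketch of PPO's clipped surrogate objective is useful context. This is the standard PPO loss (Schulman et al., 2017), not the paper's code; per the abstract, PPGA applies on-policy updates of this kind to the task objective and, in the DQD setting, to the measure functions as well. Function and argument names are illustrative.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss."""
    # Importance ratio between the updated and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps].
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated because optimizers minimize; PPO maximizes this surrogate.
    return -torch.min(unclipped, clipped).mean()
```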
Related papers
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component in evolutionary algorithms has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the section on RL-EA applications, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
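As a rough illustration of "policy as a generative model", the sketch below is a generic latent-variable policy: marginalizing a simple decoder over a latent z yields a multimodal action distribution. This is a hypothetical stand-in, not the paper's architecture; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class LatentVariablePolicy(nn.Module):
    """Generic multimodal policy sketch: sampling a latent z selects a mode,
    and the decoder maps (obs, z) to an action. Sizes are illustrative."""

    def __init__(self, obs_dim, act_dim, z_dim=8, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        # Each latent sample induces one mode of the action distribution.
        z = torch.randn(obs.shape[0], self.z_dim)
        return self.decoder(torch.cat([obs, z], dim=-1))
```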
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is non-independent and identically distributed (non-i.i.d.), leading to inefficient meta-training.
We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
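A hedged sketch of the decoupling described above: a transition autoencoder trained purely on reconstruction supplies a task latent, while the policy is trained separately and merely conditions on that latent. All names and dimensions are assumptions, not TIGR's implementation.

```python
import torch.nn as nn

class TransitionAutoencoder(nn.Module):
    """Task inference trained on an unsupervised reconstruction objective,
    decoupled from the policy loss."""

    def __init__(self, transition_dim, z_dim=8, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim))
        self.decode = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, transition_dim))

    def reconstruction_loss(self, transition):
        # The policy never backpropagates through this; it only consumes
        # the inferred latent self.encode(transition) as an extra input.
        z = self.encode(transition)
        return ((self.decode(z) - transition) ** 2).mean()
```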
arXiv Detail & Related papers (2021-08-08T19:32:44Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the variational empowerment objective.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
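For reference, the variational empowerment objective in question is the standard variational (Barber-Agakov) lower bound on the mutual information between a latent goal z and the states the policy reaches, with a learned posterior q_phi:

```latex
I(S; Z) = H(Z) - H(Z \mid S)
        \geq H(Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(\cdot \mid z)}
            \left[ \log q_\phi(z \mid s) \right]
```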
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
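A minimal sketch of the two ingredients as described above; the critic interface, temperature, and lambda are assumptions, not SUNRISE's exact hyperparameters.

```python
import torch

def weighted_bellman_targets(q_ensemble, rewards, next_obs, next_actions,
                             gamma=0.99, temperature=10.0):
    # Ingredient (a): down-weight Bellman targets where ensemble members
    # disagree. Each critic maps (obs, action) batches to scalar Q-values.
    next_qs = torch.stack([q(next_obs, next_actions) for q in q_ensemble])
    targets = rewards + gamma * next_qs.mean(dim=0)
    # Weight lies in (0.5, 1.0] because the ensemble std is non-negative.
    weights = torch.sigmoid(-next_qs.std(dim=0) * temperature) + 0.5
    return targets, weights

def ucb_action(q_ensemble, obs, candidate_actions, lam=1.0):
    # Ingredient (b): pick the candidate action with the highest
    # upper-confidence bound, mean + lam * std, under the Q-ensemble.
    qs = torch.stack([q(obs, candidate_actions) for q in q_ensemble])
    ucb = qs.mean(dim=0) + lam * qs.std(dim=0)
    return candidate_actions[ucb.argmax()]
```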
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization [7.8499505363825755]
Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off.
This paper proposes a novel algorithm, QDPG, which combines the strengths of Policy Gradient algorithms and Quality Diversity approaches.
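One standard way to make diversity a policy-gradient-friendly signal, which QDPG-style methods build on, is a novelty reward in behavior-descriptor space: the mean distance to the k nearest descriptors already stored in an archive. A NumPy sketch; the archive layout and k are assumptions.

```python
import numpy as np

def novelty_reward(descriptor, archive, k=10):
    """Mean Euclidean distance from a behavior descriptor to its k nearest
    neighbors in the archive; a higher value rewards more novel behavior."""
    if len(archive) == 0:
        return 0.0
    dists = np.linalg.norm(np.asarray(archive) - descriptor, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())
```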
arXiv Detail & Related papers (2020-06-15T16:04:06Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method.
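The Langevin-dynamics ingredient in the title is, in its generic stochastic-gradient form (SGLD), a gradient step plus scaled Gaussian noise, which turns optimization into sampling. A generic sketch of that update, not the paper's two-player algorithm:

```python
import numpy as np

def sgld_step(params, grad_log_density, step_size, rng):
    """One stochastic gradient Langevin dynamics update: follow the gradient
    of the log-density and inject sqrt(2 * step_size) Gaussian noise."""
    noise = rng.normal(size=params.shape) * np.sqrt(2.0 * step_size)
    return params + step_size * grad_log_density + noise
```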
arXiv Detail & Related papers (2020-02-14T14:59:14Z)