Neuroevolution is a Competitive Alternative to Reinforcement Learning
for Skill Discovery
- URL: http://arxiv.org/abs/2210.03516v4
- Date: Fri, 8 Sep 2023 09:33:41 GMT
- Title: Neuroevolution is a Competitive Alternative to Reinforcement Learning
for Skill Discovery
- Authors: Felix Chalumeau, Raphael Boige, Bryan Lim, Valentin Macé, Maxime
Allard, Arthur Flajolet, Antoine Cully, Thomas Pierrot
- Abstract summary: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
- Score: 12.586875201983778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for
training neural policies to solve complex control tasks. However, these
policies tend to be overfit to the exact specifications of the task and
environment they were trained on, and thus do not perform well when conditions
deviate slightly or when composed hierarchically to solve even more complex
tasks. Recent work has shown that training a mixture of policies, as opposed to
a single one, that are driven to explore different regions of the state-action
space can address this shortcoming by generating a diverse set of behaviors,
referred to as skills, that can be collectively used to great effect in
adaptation tasks or for hierarchical planning. This is typically realized by
including a diversity term - often derived from information theory - in the
objective function optimized by RL. However, these approaches often require
careful hyperparameter tuning to be effective. In this work, we demonstrate
that less widely-used neuroevolution methods, specifically Quality Diversity
(QD), are a competitive alternative to information-theory-augmented RL for
skill discovery. Through an extensive empirical evaluation comparing eight
state-of-the-art algorithms (four flagship algorithms from each line of work)
on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the
skills' performance on adaptation tasks, and (iii) the skills' performance when
used as primitives for hierarchical planning, QD methods are found to provide
equal, and sometimes improved, performance whilst being less sensitive to
hyperparameters and more scalable. As no single method is found to provide
near-optimal performance across all environments, there is a rich scope for
further research which we support by proposing future directions and providing
optimized open-source implementations.
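For readers less familiar with the Quality Diversity side of this comparison, below is a minimal MAP-Elites-style loop, the canonical QD algorithm, sketched in Python. The rollout, behavior descriptor, and all constants are illustrative placeholders and not the authors' released implementations (those are linked from the paper).

    import numpy as np

    def evaluate(params):
        # Placeholder: roll out a policy parameterized by `params` and return
        # (fitness, behavior_descriptor); both are task-specific in practice.
        fitness = -float(np.sum(params ** 2))            # dummy objective
        descriptor = np.clip(params[:2], 0.0, 1.0)       # dummy 2-D behavior descriptor
        return fitness, descriptor

    def map_elites(num_params=8, cells_per_dim=10, iterations=10_000, sigma=0.1, seed=0):
        rng = np.random.default_rng(seed)
        archive_params, archive_fitness = {}, {}         # one elite per descriptor cell
        for _ in range(iterations):
            if archive_params:
                keys = list(archive_params)
                parent = archive_params[keys[rng.integers(len(keys))]]
                child = parent + sigma * rng.standard_normal(num_params)  # mutate an elite
            else:
                child = rng.standard_normal(num_params)                   # random bootstrap
            fitness, descriptor = evaluate(child)
            cell = tuple(np.minimum((descriptor * cells_per_dim).astype(int),
                                    cells_per_dim - 1))                   # discretize descriptors
            # Keep the child only if its cell is empty or it beats the current elite.
            if cell not in archive_fitness or fitness > archive_fitness[cell]:
                archive_params[cell] = child
                archive_fitness[cell] = fitness
        return archive_params, archive_fitness           # the archive is the set of skills

The archive of elites plays the role of the skill set that information-theory-augmented RL methods obtain by adding a diversity bonus to the reward.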
Related papers
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Hierarchical Deep Counterfactual Regret Minimization [53.86223883060367]
In this paper, we introduce the first hierarchical version of Deep CFR (HDCFR), an innovative method that boosts learning efficiency in tasks involving extensively large state spaces and deep game trees.
A notable advantage of HDCFR over previous works is its ability to facilitate learning with predefined (human) expertise and foster the acquisition of skills that can be transferred to similar tasks.
arXiv Detail & Related papers (2023-05-27T02:05:41Z)
- Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning [14.16864939687988]
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning.
Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields.
arXiv Detail & Related papers (2023-05-23T08:05:59Z)
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations required for QD optimization in these environments.
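As a rough illustration of the seeding step described above, the sketch below draws a prior population from parameter vectors visited during optimization on the training tasks; the helper name, sampling scheme, and the QD entry point in the usage comment are assumptions made for illustration, not the paper's exact procedure.

    import numpy as np

    def build_prior_population(optimization_paths, population_size, seed=0):
        # optimization_paths: one array per training task, shape (num_steps, num_params),
        # i.e. the path taken through parameter space while solving that task.
        rng = np.random.default_rng(seed)
        all_params = np.concatenate(optimization_paths, axis=0)
        idx = rng.choice(len(all_params), size=population_size, replace=False)
        return all_params[idx]

    # Hypothetical usage: seed a QD run on an unseen task with the prior population
    # instead of random solutions, so far fewer generations are needed.
    # prior = build_prior_population(paths_from_training_tasks, population_size=256)
    # archive = run_qd(new_task, initial_population=prior)   # assumed QD entry point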
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
arXiv Detail & Related papers (2021-08-08T19:32:44Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how the standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
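To make the two ingredients concrete, here is a schematic NumPy sketch of an uncertainty-weighted Bellman target and UCB-style action selection over a Q-ensemble; the particular weighting function, temperature, and exploration coefficient are illustrative assumptions and may differ from the exact definitions in the SUNRISE paper.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def weighted_bellman_targets(q_ensemble_next, rewards, dones, gamma=0.99, temperature=10.0):
        # q_ensemble_next: array (ensemble_size, batch) of target Q-values at the
        # next state for the chosen next actions.
        q_mean = q_ensemble_next.mean(axis=0)
        q_std = q_ensemble_next.std(axis=0)
        targets = rewards + gamma * (1.0 - dones) * q_mean
        # Confidence weight that shrinks as ensemble disagreement grows
        # (illustrative form; the exact weighting in SUNRISE may differ).
        weights = sigmoid(-q_std * temperature) + 0.5
        return targets, weights                 # `weights` scales each sample's TD loss

    def ucb_action(q_ensemble_candidates, exploration_coef=1.0):
        # q_ensemble_candidates: array (ensemble_size, num_candidate_actions).
        mean = q_ensemble_candidates.mean(axis=0)
        std = q_ensemble_candidates.std(axis=0)
        return int(np.argmax(mean + exploration_coef * std))   # highest upper confidence bound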
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Simultaneously Evolving Deep Reinforcement Learning Models using Multifactorial Optimization [18.703421169342796]
This work proposes a framework capable of simultaneously evolving several DQL models towards solving interrelated Reinforcement Learning tasks.
Thorough experimentation is presented and discussed to assess the performance of the framework.
arXiv Detail & Related papers (2020-02-25T10:36:57Z)