Discovering General Reinforcement Learning Algorithms with Adversarial
Environment Design
- URL: http://arxiv.org/abs/2310.02782v1
- Date: Wed, 4 Oct 2023 12:52:56 GMT
- Title: Discovering General Reinforcement Learning Algorithms with Adversarial
Environment Design
- Authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio,
Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
- Abstract summary: Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a generalization gap when these algorithms are applied to unseen environments.
In this work, we examine how characteristics of the meta-training distribution impact the generalization performance of these algorithms.
- Score: 54.39859618450935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The past decade has seen vast progress in deep reinforcement learning (RL) on
the back of algorithms manually designed by human researchers. Recently, it has
been shown that it is possible to meta-learn update rules, with the hope of
discovering algorithms that can perform well on a wide range of RL tasks.
Despite impressive initial results from algorithms such as Learned Policy
Gradient (LPG), there remains a generalization gap when these algorithms are
applied to unseen environments. In this work, we examine how characteristics of
the meta-training distribution impact the generalization performance of these
algorithms. Motivated by this analysis and building on ideas from Unsupervised
Environment Design (UED), we propose a novel approach for automatically
generating curricula to maximize the regret of a meta-learned optimizer, in
addition to a novel approximation of regret, which we name algorithmic regret
(AR). The result is our method, General RL Optimizers Obtained Via Environment
Design (GROOVE). In a series of experiments, we show that GROOVE achieves
superior generalization to LPG, and evaluate AR against baseline metrics from
UED, identifying it as a critical component of environment design in this
setting. We believe this approach is a step towards the discovery of truly
general RL algorithms, capable of solving a wide range of real-world
environments.
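The abstract describes GROOVE at a high level: a UED-style curriculum generator proposes candidate environments, scores them by how much regret they induce for the meta-learned optimizer (approximated by algorithmic regret, AR), and meta-trains on the highest-regret levels. The following is a minimal, illustrative sketch of that loop, assuming hypothetical helpers (sample_level_params, score_regret, meta_update); the regret heuristic here is a placeholder, not the paper's actual AR computation or LPG-style meta-update.

```python
# Illustrative sketch of a regret-maximizing curriculum loop in the spirit of GROOVE.
# All names and the regret heuristic are hypothetical stand-ins.
import random

def sample_level_params(rng):
    """Sample parameters that define a candidate training environment."""
    return {"grid_size": rng.randint(3, 9), "noise": rng.random()}

def score_regret(level, meta_optimizer_state):
    """Hypothetical stand-in for algorithmic regret (AR): an estimate of how much
    better a reference update rule would do than the meta-learned one on this level."""
    # Placeholder heuristic: prefer larger, noisier levels.
    return level["grid_size"] * level["noise"]

def meta_update(meta_optimizer_state, level):
    """Hypothetical meta-gradient step of the learned update rule on one level."""
    meta_optimizer_state["steps"] += 1
    return meta_optimizer_state

def groove_style_meta_training(num_iters=100, candidates_per_iter=8, seed=0):
    rng = random.Random(seed)
    state = {"steps": 0}
    for _ in range(num_iters):
        # Adversarial environment design: propose candidates and keep the one
        # with the highest estimated regret for the current optimizer.
        candidates = [sample_level_params(rng) for _ in range(candidates_per_iter)]
        hardest = max(candidates, key=lambda lvl: score_regret(lvl, state))
        state = meta_update(state, hardest)
    return state

if __name__ == "__main__":
    print(groove_style_meta_training())
```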
Related papers
- Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning [0.0]
We have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization.
In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO).
PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method.
arXiv Detail & Related papers (2024-02-16T19:35:58Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component in the evolutionary algorithm has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the section on RL-EA applications, we also demonstrate the strong performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- Discovered Policy Optimisation [17.458523575470384]
We explore the Mirror Learning space by meta-learning a "drift" function.
We refer to the immediate result as Learnt Policy Optimisation (LPO).
By analysing LPO, we gain original insights into policy optimisation, which we use to formulate a novel, closed-form RL algorithm, Discovered Policy Optimisation (DPO).
arXiv Detail & Related papers (2022-10-11T17:32:11Z)
- Identifying Co-Adaptation of Algorithmic and Implementational Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g., value functions) and 'how to learn from it', discovered by interacting with a set of environments; a toy sketch of such a learned update-rule interface appears after this list.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- A Brief Look at Generalization in Visual Meta-Reinforcement Learning [56.50123642237106]
We evaluate the generalization performance of meta-reinforcement learning algorithms.
We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks.
arXiv Detail & Related papers (2020-06-12T15:17:17Z)
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO [90.90009491366273]
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm.
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
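As referenced in the "Discovering Reinforcement Learning Algorithms" entry above, the discovered update rule maps an agent's experience to targets for both the policy and a learned prediction vector ("what to predict"), rather than relying on hand-coded value functions and losses. Below is a toy sketch of that interface under stated assumptions; the names are hypothetical and the reward-to-go heuristic merely stands in for the meta-learned backward RNN of an LPG-style rule.

```python
# Toy, hypothetical sketch of a learned update-rule interface (names are illustrative).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Transition:
    observation: int
    action: int
    reward: float
    done: bool
    prediction: List[float]  # the agent's current learned prediction vector for this state

def learned_update_rule(rollout: List[Transition]) -> List[Tuple[float, List[float]]]:
    """Stand-in for the meta-learned update network.

    For each transition it returns (pi_target, y_target): a scalar used to update
    the policy's action probability and a target for the prediction vector. A real
    LPG-style rule would compute these with a meta-learned backward RNN over the
    rollout; here a simple discounted reward-to-go keeps the sketch runnable.
    """
    targets = []
    reward_to_go = 0.0
    for t in reversed(rollout):
        reward_to_go = t.reward + (0.0 if t.done else 0.99 * reward_to_go)
        pi_target = reward_to_go                        # how strongly to reinforce the action
        y_target = [reward_to_go] * len(t.prediction)   # what the prediction vector should move toward
        targets.append((pi_target, y_target))
    return list(reversed(targets))

if __name__ == "__main__":
    rollout = [Transition(0, 1, 0.0, False, [0.0]), Transition(1, 0, 1.0, True, [0.0])]
    print(learned_update_rule(rollout))
```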