A Brief Look at Generalization in Visual Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2006.07262v3
- Date: Fri, 3 Jul 2020 13:55:03 GMT
- Title: A Brief Look at Generalization in Visual Meta-Reinforcement Learning
- Authors: Safa Alver, Doina Precup
- Abstract summary: We evaluate the generalization performance of meta-reinforcement learning algorithms.
We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks.
- Score: 56.50123642237106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the realization that deep reinforcement learning algorithms trained on
high-dimensional tasks can strongly overfit to their training environments,
there have been several studies that investigated the generalization
performance of these algorithms. However, there has been no similar study that
evaluated the generalization performance of algorithms that were specifically
designed for generalization, i.e. meta-reinforcement learning algorithms. In
this paper, we assess the generalization performance of these algorithms by
leveraging high-dimensional, procedurally generated environments. We find that
these algorithms can display strong overfitting when they are evaluated on
challenging tasks. We also observe that scalability to high-dimensional tasks
with sparse rewards remains a significant problem among many of the current
meta-reinforcement learning algorithms. With these results, we highlight the
need for developing meta-reinforcement learning algorithms that can both
generalize and scale.
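The evaluation described in the abstract boils down to comparing an agent's average return on the procedurally generated levels it was meta-trained on against its return on held-out levels. The sketch below illustrates, under stated assumptions, how such a train/test generalization gap could be computed; `make_env(seed)` and `agent.act(obs)` are hypothetical placeholders for a level-seeded environment constructor and a trained meta-RL policy, not APIs from the paper.

```python
# Minimal sketch: estimating a train/test generalization gap for a trained
# (meta-)RL agent on procedurally generated levels. `make_env(seed)` and
# `agent` are hypothetical placeholders, not components from the paper.

import statistics


def episode_return(env, agent, max_steps=1000):
    """Roll out one episode and return the undiscounted sum of rewards."""
    obs = env.reset()
    total, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        action = agent.act(obs)              # trained policy (greedy or sampled)
        obs, reward, done, _info = env.step(action)
        total += reward
        steps += 1
    return total


def mean_return(agent, level_seeds, make_env, episodes_per_level=5):
    """Average return over a set of procedurally generated levels."""
    returns = []
    for seed in level_seeds:
        env = make_env(seed)                 # one fixed level per seed
        for _ in range(episodes_per_level):
            returns.append(episode_return(env, agent))
    return statistics.mean(returns)


def generalization_gap(agent, train_seeds, test_seeds, make_env):
    """Difference between returns on training levels and unseen test levels.

    A large positive gap indicates overfitting to the training levels.
    """
    train = mean_return(agent, train_seeds, make_env)
    test = mean_return(agent, test_seeds, make_env)
    return train - test
```

In the paper's setting, `train_seeds` would correspond to the levels seen during meta-training and `test_seeds` to held-out levels, so a large positive gap matches the strong overfitting reported above.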
Related papers
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
- Large-scale Benchmarking of Metaphor-based Optimization Heuristics [5.081212121019668]
We run a set of 294 algorithm implementations on the BBOB function suite.
We investigate how the choice of the budget, the performance measure, or other aspects of experimental design impact the comparison of these algorithms.
arXiv Detail & Related papers (2024-02-15T08:54:46Z)
- Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design [54.39859618450935]
We show that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a gap when these algorithms are applied to unseen environments.
In this work, we examine how characteristics of the meta-supervised-training distribution impact the performance of these algorithms.
arXiv Detail & Related papers (2023-10-04T12:52:56Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting the duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems allows for better algorithmic learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z)
- A Generalist Neural Algorithmic Learner [18.425083543441776]
We build a single graph neural network processor capable of learning to execute a wide range of algorithms.
We show that it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime.
arXiv Detail & Related papers (2022-09-22T16:41:33Z)
- Information-theoretic generalization bounds for black-box learning algorithms [46.44597430985965]
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm.
We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
arXiv Detail & Related papers (2021-10-04T17:28:41Z)
- Identifying Co-Adaptation of Algorithmic and Implementational Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms that obtain good generalization performance on other classical control tasks, gridworld-type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)