Related papers: Introducing Symmetries to Black Box Meta Reinforcement Learning

URL: http://arxiv.org/abs/2109.10781v1
Date: Wed, 22 Sep 2021 15:09:58 GMT
Title: Introducing Symmetries to Black Box Meta Reinforcement Learning
Authors: Louis Kirsch, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, Yutian Chen
Abstract summary: In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. We show that a successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries. These symmetries can play an important role in meta-generalisation.
Score: 26.338797667571693
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building off recent work in black-box supervised meta learning, we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action & observation spaces, tasks, and environments.
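The two symmetries named in the abstract (reuse of the learning rule, and invariance to input and output permutations) can be illustrated with a toy NumPy sketch. This is not the paper's architecture: the function `shared_rule`, its three-parameter form, and the signals `x` and `err` are hypothetical choices made only for illustration. The sketch applies one small rule with shared parameters to every connection of a weight matrix and checks that permuting the inputs and outputs simply permutes the resulting update.

```python
import numpy as np

def shared_rule(x, err, w, theta):
    """Apply one tiny rule, with shared parameters theta, to every connection.

    x:     (n_in,)        input-side signal
    err:   (n_out,)       output-side signal
    w:     (n_in, n_out)  current weights
    theta: (3,)           parameters reused by all connections (hypothetical form)
    """
    pre = x[:, None]     # broadcast the input signal across outputs
    post = err[None, :]  # broadcast the output signal across inputs
    return theta[0] * pre * post + theta[1] * w + theta[2] * pre

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
x = rng.normal(size=n_in)
err = rng.normal(size=n_out)
w = rng.normal(size=(n_in, n_out))
theta = rng.normal(size=3)

# Random relabelling of input and output units.
p_in = rng.permutation(n_in)
p_out = rng.permutation(n_out)

update = shared_rule(x, err, w, theta)
update_perm = shared_rule(x[p_in], err[p_out], w[p_in][:, p_out], theta)

# Because the same rule is reused everywhere, permuting the units
# only permutes the update in the same way.
equivariant = np.allclose(update_perm, update[p_in][:, p_out])
print(equivariant)
```

Because `theta` does not depend on the unit index, the rule has no way to prefer one input or output position over another, which is the intuition behind sharing a learning rule across a network.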

Related papers

Hereditary Geometric Meta-RL: Nonlocal Generalization via Task Symmetries [3.2228025627337864]
We develop a geometric perspective that endows the task space with a "hereditary geometry" induced by the inherent symmetries of the underlying system. We show that when the task space is inherited from the symmetries of the underlying system, the task space embeds into a subgroup of those symmetries whose actions are linearizable, connected, and compact--properties that enable efficient learning and inference at test time.
arXiv Detail & Related papers (2026-02-28T00:57:39Z)
How Should We Meta-Learn Reinforcement Learning Algorithms? [74.37180723338591]
We carry out an empirical comparison of the different approaches when applied to a range of meta-learned algorithms. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time. We propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.
arXiv Detail & Related papers (2025-07-23T16:31:38Z)
Black box meta-learning intrinsic rewards for sparse-reward environments [0.0]
This work investigates how meta-learning can improve the training signal received by RL agents. We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function. The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations.
arXiv Detail & Related papers (2024-07-31T12:09:33Z)
Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
Symmetry Detection in Trajectory Data for More Meaningful Reinforcement Learning Representations [0.0]
We present a method of automatically detecting RL symmetries directly from raw trajectory data without requiring active control of the system. We show in experiments on two simulated RL use cases that our method can determine the symmetries underlying both the environment physics and the trained RL policy.
arXiv Detail & Related papers (2022-11-29T17:00:26Z)
Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments [10.360491332190433]
We develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy. We also show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments.
arXiv Detail & Related papers (2022-09-26T22:01:12Z)
Learning from Symmetry: Meta-Reinforcement Learning with Symmetrical Behaviors and Language Instructions [10.357414274820577]
Language-conditioned meta-RL improves the generalization capability by matching language instructions with the agent's behaviors. We propose a dual-MDP meta-reinforcement learning method that enables learning new tasks efficiently with symmetrical behaviors and language instructions.
arXiv Detail & Related papers (2022-09-21T20:54:21Z)
Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms. Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
Offline Meta-Reinforcement Learning with Advantage Weighting [125.21298190780259]
This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting. Offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data. We propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training.
arXiv Detail & Related papers (2020-08-13T17:57:14Z)
Curriculum in Gradient-Based Meta-Reinforcement Learning [10.447238563837173]
We show that gradient-based meta-learners are sensitive to task distributions. With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability.
arXiv Detail & Related papers (2020-02-19T01:40:45Z)
HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem [107.52043871875898]
We develop a novel meta reinforcement learning framework called Hyper-Meta RL (HMRL) for sparse reward RL problems. It consists of three modules, including a cross-environment meta state embedding module that constructs a common meta state space to adapt to different environments. Experiments with sparse-reward environments show the superiority of HMRL in both transferability and policy learning efficiency.
arXiv Detail & Related papers (2020-02-11T07:31:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.