Transformers are Meta-Reinforcement Learners
- URL: http://arxiv.org/abs/2206.06614v1
- Date: Tue, 14 Jun 2022 06:21:13 GMT
- Title: Transformers are Meta-Reinforcement Learners
- Authors: Luckeciano C. Melo
- Abstract summary: We present TrMRL, a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture.
We show that the self-attention computes a consensus representation that minimizes the Bayes Risk at each layer.
Results show that TrMRL presents comparable or superior performance, sample efficiency, and out-of-distribution generalization.
- Score: 0.060917028769172814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The transformer architecture and its variants have achieved remarkable
success across many machine learning tasks in recent years. This success is intrinsically
related to the capability of handling long sequences and the presence of
context-dependent weights from the attention mechanism. We argue that these
capabilities suit the central role of a Meta-Reinforcement Learning algorithm.
Indeed, a meta-RL agent needs to infer the task from a sequence of
trajectories. Furthermore, it requires a fast adaptation strategy to adapt its
policy for a new task -- which can be achieved using the self-attention
mechanism. In this work, we present TrMRL (Transformers for Meta-Reinforcement
Learning), a meta-RL agent that mimics the memory reinstatement mechanism using
the transformer architecture. It associates the recent past of working memories
to build an episodic memory recursively through the transformer layers. We show
that the self-attention computes a consensus representation that minimizes the
Bayes Risk at each layer and provides meaningful features to compute the best
actions. We conducted experiments in high-dimensional continuous control
environments for locomotion and dexterous manipulation. Results show that TrMRL
presents comparable or superior asymptotic performance, sample efficiency, and
out-of-distribution generalization compared to the baselines in these
environments.
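As a brief aside (notation ours, not taken from the paper), the abstract's claim that self-attention computes a consensus representation minimizing a Bayes risk at each layer can be read through a standard fact about softmax-weighted averages under a quadratic loss:

```latex
% Illustrative reading of the "consensus" claim (our notation, not the paper's).
o \;=\; \sum_{i=1}^{T} \alpha_i v_i,
\qquad
\alpha_i \;=\; \operatorname{softmax}_i\!\left(\frac{q^{\top} k_i}{\sqrt{d}}\right),
\qquad
o \;=\; \arg\min_{z} \sum_{i=1}^{T} \alpha_i \,\lVert z - v_i \rVert^{2}.
```

Since the attention weights are nonnegative and sum to one, the layer's output is exactly the minimizer of this weighted quadratic risk over the value vectors, which is one way to interpret the per-layer "consensus" statement.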
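To make the described architecture concrete, below is a minimal sketch of a transformer policy that attends over the most recent transitions ("working memories") and acts in-context. This is an illustrative assumption of how such an agent can be wired, not the authors' implementation; the class, method, and parameter names (TransformerMetaPolicy, context_len, etc.) are ours.

```python
# Minimal sketch (not the authors' code): a transformer policy that attends over
# the T most recent transitions to infer the task and pick an action.
import torch
import torch.nn as nn

class TransformerMetaPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=128, n_heads=4,
                 n_layers=4, context_len=64):
        super().__init__()
        # Each working memory is an (obs, action, reward) transition embedded into d_model.
        self.embed = nn.Linear(obs_dim + act_dim + 1, d_model)
        self.pos = nn.Embedding(context_len, d_model)  # context length is capped at context_len
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.policy_head = nn.Linear(d_model, act_dim)  # e.g. mean of a Gaussian policy

    def forward(self, obs, actions, rewards):
        # obs: (B, T, obs_dim); actions: (B, T, act_dim); rewards: (B, T, 1)
        x = torch.cat([obs, actions, rewards], dim=-1)
        h = self.embed(x) + self.pos(torch.arange(x.size(1), device=x.device))
        h = self.encoder(h)              # self-attention over the recent context
        return self.policy_head(h[:, -1])  # act from the most recent working memory

# Illustrative usage with arbitrary dimensions: feed the recent trajectory each step.
policy = TransformerMetaPolicy(obs_dim=17, act_dim=6)
action = policy(torch.randn(1, 64, 17), torch.randn(1, 64, 6), torch.randn(1, 64, 1))
```

The context here is the agent's recent trajectory in the current (unknown) task, so adaptation happens through attention over that context rather than through gradient updates at test time, in line with the abstract's framing of fast adaptation via self-attention.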
Related papers
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL [109.44370201929246]
We show that training value functions with categorical cross-entropy improves performance and scalability in a variety of domains.
These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers.
arXiv Detail & Related papers (2024-03-06T18:55:47Z)
- Hierarchical Transformers are Efficient Meta-Reinforcement Learners [19.79721574250755]
We introduce Hierarchical Transformers for Meta-Reinforcement Learning (HTrMRL), a powerful online meta-reinforcement learning approach.
We demonstrate how past episodes serve as a rich source of information, which our model effectively distills and applies to new contexts.
arXiv Detail & Related papers (2024-02-09T13:40:11Z) - Multi-Objective Decision Transformers for Offline Reinforcement Learning [7.386356540208436]
Offline RL derives policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model.
arXiv Detail & Related papers (2023-08-31T00:47:58Z) - Recurrent Action Transformer with Memory [39.58317527488534]
This paper proposes a novel model architecture that incorporates a recurrent memory mechanism designed to regulate information retention.
We conduct experiments on memory-intensive environments (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory), classic Atari games, and MuJoCo control environments.
The results show that using memory can significantly improve performance in memory-intensive environments, while maintaining or improving results in classic environments.
arXiv Detail & Related papers (2023-06-15T19:29:08Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- Curriculum in Gradient-Based Meta-Reinforcement Learning [10.447238563837173]
We show that gradient-based meta-learners are sensitive to task distributions.
With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability.
arXiv Detail & Related papers (2020-02-19T01:40:45Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.