Related papers: AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers

URL: http://arxiv.org/abs/2411.11188v1
Date: Sun, 17 Nov 2024 22:25:40 GMT
Title: AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
Authors: Jake Grigsby, Justin Sasek, Samyak Parajuli, Daniel Adebi, Amy Zhang, Yuke Zhu
Abstract summary: We build upon recent advancements in Transformer-based (in-context) meta-RL. We evaluate a simple yet scalable solution where both an agent's actor and critic objectives are converted to classification terms. This design unlocks significant progress in online multi-task adaptation and memory problems without explicit task labels.
Score: 28.927809804613215
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models trained on diverse datasets unlock generalization by in-context learning. Reinforcement Learning (RL) policies can achieve a similar effect by meta-learning within the memory of a sequence model. However, meta-RL research primarily focuses on adapting to minor variations of a single task. It is difficult to scale towards more general behavior without confronting challenges in multi-task optimization, and few solutions are compatible with meta-RL's goal of learning from large training sets of unlabeled tasks. To address this challenge, we revisit the idea that multi-task RL is bottlenecked by imbalanced training losses created by uneven return scales across different tasks. We build upon recent advancements in Transformer-based (in-context) meta-RL and evaluate a simple yet scalable solution where both an agent's actor and critic objectives are converted to classification terms that decouple optimization from the current scale of returns. Large-scale comparisons in Meta-World ML45, Multi-Game Procgen, Multi-Task POPGym, Multi-Game Atari, and BabyAI find that this design unlocks significant progress in online multi-task adaptation and memory problems without explicit task labels.
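The abstract does not spell out how the actor and critic objectives become classification terms, so the following is only a minimal sketch under stated assumptions. It assumes the critic predicts a categorical distribution over fixed return bins with two-hot targets, and that the actor is trained by behavior cloning on actions whose advantage is positive. The function names, the bin layout, and the binary advantage filter are illustrative choices, not the paper's confirmed implementation.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def two_hot(values, bins):
    """Encode scalars as two-hot probability vectors over sorted bin centers."""
    values = np.clip(np.atleast_1d(values).astype(np.float64), bins[0], bins[-1])
    # Index of the neighbouring bin at or above each value (at least 1).
    upper = np.clip(np.searchsorted(bins, values, side="left"), 1, len(bins) - 1)
    lower = upper - 1
    # Linear interpolation weight placed on the upper bin.
    w_upper = (values - bins[lower]) / (bins[upper] - bins[lower])
    target = np.zeros((len(values), len(bins)))
    rows = np.arange(len(values))
    target[rows, lower] = 1.0 - w_upper
    target[rows, upper] = w_upper
    return target

def critic_classification_loss(logits, returns, bins):
    """Cross-entropy between predicted bin logits and two-hot return targets."""
    target = two_hot(returns, bins)
    return -(target * log_softmax(logits)).sum(axis=1).mean()

def actor_classification_loss(action_logits, actions, advantages):
    """Behavior cloning on actions with positive advantage (assumed filter).

    Only the sign of the advantage is used, so rescaling returns
    leaves the loss unchanged.
    """
    log_probs = log_softmax(action_logits)
    chosen = log_probs[np.arange(len(actions)), actions]
    keep = (advantages > 0).astype(np.float64)
    return -(keep * chosen).sum() / max(keep.sum(), 1.0)
```

In this sketch the actor loss depends only on the sign of each advantage, and the critic loss is a cross-entropy over a fixed number of bins, so neither loss grows with the magnitude of a task's returns. That is the sense in which classification objectives can decouple optimization from return scale.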

Related papers

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners [60.75160178669076]
We show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online reinforcement learning. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL.
arXiv Detail & Related papers (2025-05-29T06:41:45Z)
OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds [35.652208216209985]
We introduce OmniRL, a highly generalizable in-context reinforcement learning model that is meta-trained on hundreds of thousands of diverse tasks. For the first time, we demonstrate that in-context learning (ICL) alone, without any gradient-based fine-tuning, can successfully tackle unseen Gymnasium tasks.
arXiv Detail & Related papers (2025-02-05T03:59:13Z)
MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks [63.016244188951696]
We propose a method for few-shot learning with fewer tasks, which we call MetaModulation. We modify parameters at various batch levels to increase the number of meta-training tasks. We also introduce learning variational feature hierarchies by incorporating variational MetaModulation.
arXiv Detail & Related papers (2023-05-17T15:47:47Z)
A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms. Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks. We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy. We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit tasks. Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy. The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks. This problem is still not fully understood, and two major challenges need to be addressed. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.