AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
- URL: http://arxiv.org/abs/2310.09971v4
- Date: Thu, 1 Feb 2024 00:42:31 GMT
- Title: AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
- Authors: Jake Grigsby, Linxi Fan, Yuke Zhu
- Abstract summary: We introduce AMAGO, an in-context Reinforcement Learning agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning.
Our agent is scalable and applicable to a wide range of problems, and we demonstrate its strong performance empirically in meta-RL and long-term memory domains.
- Score: 36.71024242963793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses
sequence models to tackle the challenges of generalization, long-term memory,
and meta-learning. Recent works have shown that off-policy learning can make
in-context RL with recurrent policies viable. Nonetheless, these approaches
require extensive tuning and limit scalability by creating key bottlenecks in
agents' memory capacity, planning horizon, and model size. AMAGO revisits and
redesigns the off-policy in-context approach to successfully train
long-sequence Transformers over entire rollouts in parallel with end-to-end RL.
Our agent is scalable and applicable to a wide range of problems, and we
demonstrate its strong performance empirically in meta-RL and long-term memory
domains. AMAGO's focus on sparse rewards and off-policy data also allows
in-context learning to extend to goal-conditioned problems with challenging
exploration. When combined with a multi-goal hindsight relabeling scheme, AMAGO
can solve a previously difficult category of open-world domains, where agents
complete many possible instructions in procedurally generated environments.
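To make the goal-conditioned, sparse-reward setting concrete, below is a minimal sketch of multi-goal hindsight relabeling over stored trajectories; the data layout and function names are illustrative assumptions, not AMAGO's actual implementation.

```python
import random

def hindsight_relabel(traj, num_alternatives=4):
    """Relabel one stored trajectory with goals it actually achieved.

    `traj` is a list of per-step dicts with keys 'obs', 'action', and
    'achieved_goals' (the set of goal tokens completed at that step).
    Returns extra relabeled copies for an off-policy replay buffer.
    (Hypothetical layout for illustration only.)
    """
    achieved = set().union(*(step["achieved_goals"] for step in traj))
    if not achieved:
        return []  # a fully unsuccessful trajectory gives nothing to relabel against

    relabeled = []
    for _ in range(num_alternatives):
        # Sample a multi-goal instruction from goals the agent actually reached.
        k = random.randint(1, min(3, len(achieved)))
        new_goals = random.sample(sorted(achieved), k)
        remaining = set(new_goals)
        copy = []
        for step in traj:
            hit = remaining & step["achieved_goals"]
            remaining -= hit
            copy.append({
                **step,
                "goal": tuple(new_goals),
                "reward": float(bool(hit)),  # sparse reward only when a goal is completed
            })
        relabeled.append(copy)
    return relabeled
```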
Related papers
- AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers [28.927809804613215]
We build upon recent advancements in Transformer-based (in-context) meta-RL.
We evaluate a simple yet scalable solution where both an agent's actor and critic objectives are converted to classification terms.
This design unlocks significant progress in online multi-task adaptation and memory problems without explicit task labels.
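One common way to turn a scalar value-regression target into a classification objective is a "two-hot" encoding over fixed value bins; the PyTorch sketch below illustrates that generic idea and makes no claim about the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def two_hot(targets, bins):
    """Encode scalar targets as distributions over fixed, sorted value bins.

    Each target places mass on its two neighboring bins, proportional to its
    distance from them, so the expectation under the distribution is exact.
    """
    targets = targets.clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, targets, right=True).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (targets - lo) / (hi - lo + 1e-8)
    probs = torch.zeros(*targets.shape, len(bins), device=targets.device)
    probs.scatter_(-1, (idx - 1).unsqueeze(-1), (1 - w_hi).unsqueeze(-1))
    probs.scatter_(-1, idx.unsqueeze(-1), w_hi.unsqueeze(-1))
    return probs

def classification_critic_loss(logits, td_targets, bins):
    """Cross-entropy between predicted bin logits and two-hot TD targets."""
    target_probs = two_hot(td_targets.detach(), bins)
    return -(target_probs * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```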
arXiv Detail & Related papers (2024-11-17T22:25:40Z)
- Reinforcement Learning for Dynamic Memory Allocation [0.0]
We present a framework in which an RL agent continuously learns from interactions with the system to improve memory management tactics.
Our results show that RL can successfully train agents that can match and surpass traditional allocation strategies.
We also explore the potential of history-aware policies that leverage previous allocation requests to enhance the allocator's ability to handle complex request patterns.
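As a purely hypothetical illustration of such an interaction loop, the toy environment below exposes the occupancy map and the pending request as the observation, lets the action choose a placement offset, and penalizes failed allocations and fragmentation; it is not the paper's actual setup.

```python
import numpy as np

class ToyAllocatorEnv:
    """Hypothetical toy allocator environment (illustrative only).

    Memory is a 1-D array of cells; each step a request of random size
    arrives and the action chooses the offset at which to place it.
    """
    def __init__(self, size=64, max_request=8, seed=0):
        self.size, self.max_request = size, max_request
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.mem = np.zeros(self.size, dtype=bool)  # False marks a free cell
        self.request = int(self.rng.integers(1, self.max_request + 1))
        return self._obs()

    def _obs(self):
        # Observation: occupancy map plus the pending request size.
        return np.concatenate([self.mem.astype(np.float32), [float(self.request)]])

    def step(self, offset):
        fits = (offset + self.request <= self.size
                and not self.mem[offset:offset + self.request].any())
        if fits:
            self.mem[offset:offset + self.request] = True
        # Fragmentation proxy: number of separate free runs after the placement.
        occ = self.mem.astype(int)
        free_runs = int(np.count_nonzero(np.diff(occ) == -1)) + int(occ[0] == 0)
        reward = (1.0 if fits else -1.0) - 0.1 * free_runs
        done = not fits or self.mem.mean() > 0.9
        self.request = int(self.rng.integers(1, self.max_request + 1))
        return self._obs(), reward, done, {}
```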
arXiv Detail & Related papers (2024-10-20T20:13:46Z)
- Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
Decision Mamba is a novel multi-grained state space model with a self-evolving policy learning strategy.
To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations.
arXiv Detail & Related papers (2024-06-08T10:12:00Z)
- Hierarchical Continual Reinforcement Learning via Large Language Model [15.837883929274758]
Hi-Core is designed to facilitate the transfer of high-level knowledge.
It orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM) and low-level policy learning by an RL agent guided by the resulting goals.
Hi-Core demonstrates its effectiveness on diverse continual RL (CRL) tasks, outperforming popular baselines.
arXiv Detail & Related papers (2024-01-25T03:06:51Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
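A simple stand-in for adaptive action quantization is to cluster the dataset's continuous actions and let a discrete policy choose among the resulting codes; the k-means sketch below illustrates the interface (fit, encode, decode) without reproducing the paper's learned scheme.

```python
import numpy as np

def fit_action_codebook(dataset_actions, num_bins=32, iters=50, seed=0):
    """Fit a k-means codebook over continuous actions of shape (N, action_dim).

    Stand-in for a learned adaptive quantizer: a discrete policy then picks
    one of `num_bins` codes, which is decoded back to a continuous action.
    """
    rng = np.random.default_rng(seed)
    actions = np.asarray(dataset_actions, dtype=np.float64)
    centers = actions[rng.choice(len(actions), num_bins, replace=False)]
    for _ in range(iters):
        # Assign each action to its nearest center, then recompute the centers.
        dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_bins):
            members = actions[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers

def encode(action, centers):
    """Map a continuous action to the index of its nearest code."""
    return int(np.linalg.norm(centers - action, axis=-1).argmin())

def decode(index, centers):
    """Map a discrete code back to a continuous action."""
    return centers[index]
```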
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
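The two-policy mechanism can be sketched as a roll-in procedure: a guide policy acts for the first h steps of each episode, the learning policy takes over afterwards, and a curriculum shrinks h as performance improves. The names and thresholds below are illustrative, not the authors' code.

```python
def jump_start_rollout(env, guide_policy, explore_policy, h, max_steps=1000):
    """Collect one episode: the guide acts for the first `h` steps, then the
    learning (exploration) policy takes over. Returns the transitions."""
    obs, transitions = env.reset(), []
    for t in range(max_steps):
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions

def shrink_horizon(h, eval_return, best_return, tolerance=0.05, step=10):
    """Curriculum: once the combined agent matches its best return within a
    tolerance, hand over more of the episode to the learning policy."""
    if eval_return >= best_return * (1.0 - tolerance):
        return max(0, h - step)
    return h
```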
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
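The distance metric learning named in the title can be illustrated with a generic loss on per-transition task embeddings that pulls same-task pairs together and pushes different-task pairs apart; the sketch below is an illustration of that idea, not the paper's exact objective.

```python
import torch

def task_embedding_metric_loss(z, task_ids, eps=0.1):
    """Generic distance-metric loss on task embeddings `z` of shape [N, d]
    with integer `task_ids` of shape [N].

    Same-task pairs are pulled together with a squared distance; different-task
    pairs are pushed apart with an inverse-distance penalty.
    """
    dists = torch.cdist(z, z, p=2)                         # pairwise distances
    same = task_ids.unsqueeze(0) == task_ids.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos, neg = same & ~eye, ~same
    pull = (dists[pos] ** 2).mean() if pos.any() else z.new_zeros(())
    push = (1.0 / (dists[neg] ** 2 + eps)).mean() if neg.any() else z.new_zeros(())
    return pull + push
```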
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
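An information-theoretic regularizer of this kind can be illustrated with a standard variational information bottleneck: a stochastic encoder produces a latent whose KL divergence to a unit Gaussian prior is penalized with an annealed weight. The sketch below is a generic VIB term under that assumption, not the paper's exact objective.

```python
import torch

def vib_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(-1).mean()

def annealed_beta(step, warmup_steps=50_000, beta_max=1e-3):
    """Linearly anneal the bottleneck weight from 0 to beta_max over training."""
    return beta_max * min(1.0, step / warmup_steps)

# Training-loop fragment (illustrative): total loss = RL loss + beta * KL.
# mu, logvar = encoder(obs_context)
# z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample
# loss = rl_loss(policy(z, obs)) + annealed_beta(step) * vib_kl(mu, logvar)
```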
arXiv Detail & Related papers (2020-08-03T02:24:20Z)