Human-Timescale Adaptation in an Open-Ended Task Space
- URL: http://arxiv.org/abs/2301.07608v1
- Date: Wed, 18 Jan 2023 15:39:21 GMT
- Title: Human-Timescale Adaptation in an Open-Ended Task Space
- Authors: Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal
Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang,
Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol
Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw,
Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic,
Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, Sarah
York, Alexander Zacherl, Lei Zhang
- Abstract summary: We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.
Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
- Score: 56.55530165036327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have shown impressive adaptation and scalability in
supervised and self-supervised learning problems, but so far these successes
have not fully translated to reinforcement learning (RL). In this work, we
demonstrate that training an RL agent at scale leads to a general in-context
learning algorithm that can adapt to open-ended novel embodied 3D problems as
quickly as humans. In a vast space of held-out environment dynamics, our
adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration,
efficient exploitation of acquired knowledge, and can successfully be prompted
with first-person demonstrations. Adaptation emerges from three ingredients:
(1) meta-reinforcement learning across a vast, smooth and diverse task
distribution, (2) a policy parameterised as a large-scale attention-based
memory architecture, and (3) an effective automated curriculum that prioritises
tasks at the frontier of an agent's capabilities. We demonstrate characteristic
scaling laws with respect to network size, memory length, and richness of the
training task distribution. We believe our results lay the foundation for
increasingly general and adaptive RL agents that perform well across
ever-larger open-ended domains.
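
To make ingredient (3) concrete, below is a minimal sketch of a curriculum that prioritises tasks at the frontier of the agent's capabilities, i.e. tasks the agent currently solves roughly half the time. It is not the paper's implementation: the `FrontierCurriculum` class, its success-rate heuristic, and the `run_episode` placeholder are illustrative assumptions.

```python
# A minimal sketch (NOT the paper's implementation) of an automated curriculum
# that prioritises tasks at the frontier of the agent's capabilities.
import random
from collections import defaultdict


class FrontierCurriculum:
    """Over-samples tasks the agent solves neither almost never nor almost always."""

    def __init__(self, task_pool, target_success=0.5, explore_prob=0.1, window=20):
        self.task_pool = list(task_pool)
        self.outcomes = defaultdict(list)     # task -> recent solved/failed flags
        self.target_success = target_success  # "frontier" ~ 50% success rate
        self.explore_prob = explore_prob      # probability of a uniform sample
        self.window = window                  # number of recent episodes to trust

    def _priority(self, task):
        history = self.outcomes[task][-self.window:]
        if not history:                       # unseen task: maximal priority
            return 1.0
        rate = sum(history) / len(history)
        # Peaks at 1.0 when the success rate equals target_success, with a small
        # floor so already-solved or currently-unsolvable tasks are not dropped.
        return max(0.05, 1.0 - abs(rate - self.target_success) / self.target_success)

    def sample(self):
        if random.random() < self.explore_prob:
            return random.choice(self.task_pool)
        weights = [self._priority(t) for t in self.task_pool]
        return random.choices(self.task_pool, weights=weights, k=1)[0]

    def update(self, task, solved):
        self.outcomes[task].append(1.0 if solved else 0.0)


def run_episode(task):
    """Placeholder for a full RL rollout; returns whether the task was solved."""
    return random.random() < 0.5


curriculum = FrontierCurriculum(task_pool=[f"task_{i}" for i in range(1000)])
for _ in range(100):
    task = curriculum.sample()
    curriculum.update(task, solved=run_episode(task))
```

The paper trains over a vast, smooth and diverse task distribution rather than a small fixed pool; the fixed `task_pool` above only keeps the sketch self-contained.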
Related papers
- Empowering Large Language Model Agents through Action Learning [85.39581419680755]
Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error.
We argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents.
We introduce a framework LearnAct with an iterative learning strategy to create and improve actions in the form of Python functions.
arXiv Detail & Related papers (2024-02-24T13:13:04Z)
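
As a side note on the LearnAct entry above, the following hedged sketch shows what "actions in the form of Python functions" could look like in practice. The registry, the draft/refined action pair, and the toy inventory task are illustrative assumptions, not the LearnAct framework itself.

```python
# A minimal sketch of an agent that creates and refines actions as Python functions.
from typing import Callable, Dict

ACTION_REGISTRY: Dict[str, Callable] = {}

def register_action(name: str, source_code: str) -> None:
    """Compile generated source into a callable and store it as a reusable action."""
    namespace: dict = {}
    exec(source_code, namespace)          # in practice this would be sandboxed
    ACTION_REGISTRY[name] = namespace[name]

# An action the agent might propose on its first attempt.
DRAFT = """
def count_items(inventory):
    return len(inventory)
"""

# A refinement after observing failures on nested inventories.
REFINED = """
def count_items(inventory):
    total = 0
    for item in inventory:
        total += count_items(item) if isinstance(item, list) else 1
    return total
"""

register_action("count_items", DRAFT)
assert ACTION_REGISTRY["count_items"]([1, [2, 3]]) == 2   # draft miscounts nested items
register_action("count_items", REFINED)
assert ACTION_REGISTRY["count_items"]([1, [2, 3]]) == 3   # refined action fixes it
```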
- Task Phasing: Automated Curriculum Learning from Demonstrations [46.1680279122598]
Applying reinforcement learning to sparse-reward domains is notoriously challenging due to insufficient guiding signals.
This paper introduces a principled task phasing approach that uses demonstrations to automatically generate a curriculum sequence.
Experimental results on three sparse-reward domains demonstrate that our task phasing approach outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-10-20T03:59:11Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges [27.474011433615317]
Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks.
We investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task learning (MTL) agents.
arXiv Detail & Related papers (2022-05-28T17:59:00Z)
- REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents [0.0]
In this paper, we introduce a meta-learning scheme that shifts the objective from learning to solve a task to learning to learn to solve a task (or a set of tasks).
Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent that learns how to produce other RL agents.
Experimental results show that our model achieves remarkable performance in popular OpenAI Gym environments compared to traditional state-of-the-art deep RL algorithms.
arXiv Detail & Related papers (2021-10-11T10:13:49Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment [73.9469267445146]
First-person object-interaction tasks in high-fidelity, simulated 3D environments such as AI2Thor pose significant sample-efficiency challenges for reinforcement learning agents.
We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
arXiv Detail & Related papers (2020-10-28T19:27:26Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method accurately infers the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)