Open-Ended Learning Leads to Generally Capable Agents
- URL: http://arxiv.org/abs/2107.12808v1
- Date: Tue, 27 Jul 2021 13:30:07 GMT
- Title: Open-Ended Learning Leads to Generally Capable Agents
- Authors: Open-Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros,
Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg,
Michael Mathieu, Nat McAleese, Nathalie Bradley-Schmieg, Nathaniel Wong,
Nicolas Porcel, Roberta Raileanu, Steph Hughes-Fitt, Valentin Dalibard,
Wojciech Marian Czarnecki
- Abstract summary: We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are capable across this vast space and beyond.
The resulting space is exceptionally diverse in terms of the challenges posed to agents, and as such, even measuring the learning progress of an agent is an open research problem.
We show that through constructing an open-ended learning process, which dynamically changes the training task distributions and training objectives such that the agent never stops learning, we achieve consistent learning of new behaviours.
- Score: 12.079718607356178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we create agents that can perform well beyond a single,
individual task, and that exhibit much wider generalisation of behaviour to a
massive, rich space of challenges. We define a universe of tasks within an
environment domain and demonstrate the ability to train agents that are
generally capable across this vast space and beyond. The environment is
natively multi-agent, spanning the continuum of competitive, cooperative, and
independent games, which are situated within procedurally generated physical 3D
worlds. The resulting space is exceptionally diverse in terms of the challenges
posed to agents, and as such, even measuring the learning progress of an agent
is an open research problem. We propose an iterative notion of improvement
between successive generations of agents, rather than seeking to maximise a
singular objective, allowing us to quantify progress despite tasks being
incomparable in terms of achievable rewards. We show that through constructing
an open-ended learning process, which dynamically changes the training task
distributions and training objectives such that the agent never stops learning,
we achieve consistent learning of new behaviours. The resulting agent is able
to score reward in every one of our humanly solvable evaluation levels, with
behaviour generalising to many held-out points in the universe of tasks.
Examples of this zero-shot generalisation include good performance on Hide and
Seek, Capture the Flag, and Tag. Through analysis and hand-authored probe tasks
we characterise the behaviour of our agent, and find interesting emergent
heuristic behaviours such as trial-and-error experimentation, simple tool use,
option switching, and cooperation. Finally, we demonstrate that the general
capabilities of this agent could unlock larger scale transfer of behaviour
through cheap finetuning.
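Two mechanisms in this abstract lend themselves to a concrete illustration: scoring agents by per-task rank rather than raw reward (since rewards are incomparable across tasks), and shifting the training task distribution toward tasks the agent has not yet mastered. The following Python sketch is a toy rendering of those ideas, not the authors' implementation; the function names, the percentile normalisation, the Pareto-style improvement test, and the sampling weights are all illustrative assumptions.

```python
# Toy sketch (not the authors' code) of two ideas from the abstract:
# (1) compare agents across tasks with incomparable reward scales by
#     ranking each per-task score against a reference population, and
# (2) resample the training distribution toward tasks the current
#     agent has not yet mastered. All names and numbers are illustrative.
import random

def percentile_vs_reference(score, reference):
    """Fraction of reference scores this score meets or beats."""
    return sum(score >= r for r in reference) / len(reference)

def normalised_scores(agent, reference):
    """Per-task percentiles: reward scales differ across tasks, ranks do not."""
    return {t: percentile_vs_reference(agent[t], reference[t]) for t in agent}

def improved(new, old):
    """Generation-over-generation improvement as a Pareto-style criterion
    (no task worse, at least one better) instead of one scalar objective."""
    return all(new[t] >= old[t] for t in new) and any(new[t] > old[t] for t in new)

def sample_training_task(norm, rng=random):
    """Bias training toward tasks with low normalised score, so the task
    distribution keeps shifting as the agent improves."""
    tasks, scores = zip(*norm.items())
    weights = [1.0 - s + 1e-3 for s in scores]
    return rng.choices(tasks, weights=weights)[0]

# Toy usage with made-up scores on three tasks from a vast task space.
reference = {"hide_and_seek": [0.0, 1.0, 3.0], "capture_flag": [0.0, 5.0, 50.0], "tag": [2.0, 2.0, 4.0]}
gen1 = {"hide_and_seek": 1.0, "capture_flag": 10.0, "tag": 2.0}
gen2 = {"hide_and_seek": 3.0, "capture_flag": 40.0, "tag": 3.0}
n1, n2 = normalised_scores(gen1, reference), normalised_scores(gen2, reference)
print(improved(n2, n1))          # True: gen2 dominates gen1 after normalisation
print(sample_training_task(n2))  # more likely to draw a still-weak task
```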
Related papers
- Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning [2.296343533657165]
Recent works have shown that intricate cooperative behaviors can emerge in agents trained with meta-reinforcement learning on open-ended task distributions using self-play.
We argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world.
arXiv Detail & Related papers (2023-11-01T16:56:44Z)
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors [93.38830440346783]
We propose AgentVerse, a multi-agent framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that AgentVerse can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z)
- Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors [72.62423312645953]
Humans intuitively solve tasks in versatile ways, varying their behavior both in trajectory-level planning and in individual steps.
Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting.
Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility.
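The "mixture of movement primitives" in the summary above can be pictured concretely: each mixture component is a distribution over weights of trajectory basis functions, so different components decode to different solutions of the same task. The sketch below only illustrates that structure; the basis features, mixture parameters, and the omitted distribution-matching objective are assumptions, not the paper's model.

```python
# Minimal sketch of a mixture of movement primitives: each component is a
# Gaussian over weights of trajectory basis functions, so sampling different
# components yields different ("versatile") solutions. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T, K, C = 50, 8, 2               # timesteps, basis functions, mixture components

# Radial-basis features over normalised time, shape (T, K).
t = np.linspace(0, 1, T)
centres = np.linspace(0, 1, K)
phi = np.exp(-((t[:, None] - centres[None, :]) ** 2) / (2 * 0.05))
phi /= phi.sum(axis=1, keepdims=True)

# Hypothetical learned mixture: component probabilities, means, stddev.
pi = np.array([0.5, 0.5])
mu = np.stack([np.sin(centres * np.pi), -np.sin(centres * np.pi)])  # (C, K)
sigma = 0.05

def sample_trajectory():
    """Pick a mode, sample primitive weights, decode a trajectory."""
    c = rng.choice(C, p=pi)
    w = mu[c] + sigma * rng.standard_normal(K)
    return phi @ w               # (T,) desired positions over time

# Two samples will often come from different modes: distinct solutions.
print(sample_trajectory()[:5], sample_trajectory()[:5])
```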
arXiv Detail & Related papers (2022-10-17T16:42:59Z)
- Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery [22.32327908453603]
Current reinforcement learning (RL) in robotics often experiences difficulty in generalizing to new downstream tasks.
We propose a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward.
We show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks.
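A minimal way to picture task-agnostic pre-training is an agent trained first on an intrinsic reward alone, with the task-specific reward only swapped in downstream. The sketch below uses a count-based novelty bonus purely for illustration; the paper's actual pre-training objective is not specified in this summary and may well differ.

```python
# Sketch of task-agnostic pre-training followed by task finetuning.
# The count-based novelty bonus is an illustrative stand-in, not the
# paper's objective.
from collections import Counter

visits = Counter()

def intrinsic_reward(state):
    """Novelty bonus: rarely-visited states pay more."""
    visits[state] += 1
    return 1.0 / visits[state] ** 0.5

def reward(state, task_reward_fn=None):
    """Pre-training uses only the intrinsic term; finetuning adds the task."""
    r = intrinsic_reward(state)
    if task_reward_fn is not None:      # downstream phase
        r += task_reward_fn(state)
    return r

# Pre-training phase: no task reward available.
print([round(reward(s), 2) for s in ["a", "b", "a", "a"]])
# Finetuning phase: the task-specific reward becomes accessible.
print(round(reward("goal", task_reward_fn=lambda s: 10.0 if s == "goal" else 0.0), 2))
```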
arXiv Detail & Related papers (2022-04-29T06:57:46Z)
- Collaborative Training of Heterogeneous Reinforcement Learning Agents in Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation, with the aim of achieving more efficient exploration and faster learning.
Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z)
- Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration [20.38772636693469]
We argue that merely using curiosity for fast environment exploration or as a bonus reward for a specific task does not harness the full potential of this technique.
We propose to shift the focus towards retaining the behaviours which emerge during curiosity-based learning.
arXiv Detail & Related papers (2021-09-17T15:28:25Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
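The quoted mechanism, giving rewards directly to other agents through a learned incentive function, can be sketched as each agent carrying a small parameterised function that maps a co-player's behaviour to a non-negative reward handed to that co-player. Everything below (the features, the softplus output, the gift cost, the absent learning rule) is an illustrative assumption rather than the paper's algorithm.

```python
# Minimal sketch of reward gifting via a learned incentive function.
# Shapes, features, and the missing update rule are illustrative.
import numpy as np

rng = np.random.default_rng(1)

class Incentivizer:
    def __init__(self, obs_dim):
        self.w = rng.standard_normal(obs_dim) * 0.1   # learned parameters

    def incentive(self, other_obs_action):
        """Non-negative reward gift, a function of the recipient's behaviour."""
        return float(np.log1p(np.exp(self.w @ other_obs_action)))  # softplus >= 0

# Hypothetical two-agent step: each agent's effective reward is its
# environment reward plus gifts received minus the cost of gifts given.
obs_act = {0: rng.standard_normal(4), 1: rng.standard_normal(4)}
env_reward = {0: 1.0, 1: 0.0}
givers = {0: Incentivizer(4), 1: Incentivizer(4)}
gift_cost = 0.1

gifts = {i: givers[1 - i].incentive(obs_act[i]) for i in (0, 1)}  # received by i
total = {i: env_reward[i] + gifts[i] - gift_cost * gifts[1 - i] for i in (0, 1)}
print(total)
```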
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
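The question in the summary suggests a direct Monte-Carlo reading: repeatedly keep a random sub-group of the observed entities and average the resulting utility. The sketch below illustrates only that estimator; the toy utility function and the keep-probability of 1/2 are assumptions, not the paper's factorisation architecture.

```python
# Monte-Carlo estimate of an agent's expected utility over random
# sub-groups of its observed entities. Utility function is a toy stand-in.
import numpy as np

rng = np.random.default_rng(2)

def utility(entity_feats, mask):
    """Toy utility: a score computed only from the unmasked entities."""
    kept = entity_feats[mask]
    return float(kept.sum()) if kept.size else 0.0

def expected_subgroup_utility(entity_feats, n_samples=100):
    """Average utility when each entity is kept independently with
    probability 1/2 (resampling if the sub-group would be empty)."""
    n = len(entity_feats)
    vals = []
    for _ in range(n_samples):
        mask = rng.random(n) < 0.5
        if not mask.any():
            mask[rng.integers(n)] = True   # keep at least one entity
        vals.append(utility(entity_feats, mask))
    return float(np.mean(vals))

entities = rng.standard_normal((5, 3)).sum(axis=1)   # 5 observed entities
print(expected_subgroup_utility(entities))
```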
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
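Plan2Explore's intrinsic signal is the disagreement of an ensemble of learned one-step latent models: where the world model's predictions diverge, the planner steers to explore. The sketch below shows only that disagreement bonus, with random linear models standing in for the learned ensemble; the planning loop and training are omitted.

```python
# Sketch of exploration via ensemble disagreement: the variance of an
# ensemble's next-latent-state predictions marks novelty. Random linear
# models stand in for learned one-step models here.
import numpy as np

rng = np.random.default_rng(3)
LATENT, ENSEMBLE = 4, 5
models = [rng.standard_normal((LATENT, LATENT)) * 0.5 for _ in range(ENSEMBLE)]

def disagreement_bonus(z):
    """High where ensemble predictions diverge, i.e. where the world
    model is uncertain and exploration should be directed."""
    preds = np.stack([m @ z for m in models])   # (ENSEMBLE, LATENT)
    return float(preds.var(axis=0).mean())

# A planner would roll candidate action sequences through the world
# model and pick the one maximising the expected bonus.
z = rng.standard_normal(LATENT)
print(disagreement_bonus(z))
```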
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Weakly-Supervised Reinforcement Learning for Controllable Behavior [126.04932929741538]
Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
arXiv Detail & Related papers (2020-04-06T17:50:28Z)