Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination
- URL: http://arxiv.org/abs/2201.12436v1
- Date: Fri, 28 Jan 2022 21:43:58 GMT
- Title: Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination
- Authors: Keane Lucas and Ross E. Allen
- Abstract summary: We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play.
We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm.
We propose the Any-Play learning augmentation for generalizing self-play-based algorithms to the inter-algorithm cross-play setting.
- Score: 0.4153433779716327
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Cooperative artificial intelligence with human or superhuman proficiency in
collaborative tasks stands at the frontier of machine learning research. Prior
work has tended to evaluate cooperative AI performance under the restrictive
paradigms of self-play (teams composed of agents trained together) and
cross-play (teams of agents trained independently but using the same
algorithm). Recent work has indicated that AI optimized for these narrow
settings may make for undesirable collaborators in the real world. We
formalize an alternative criterion for evaluating cooperative AI, referred to as
inter-algorithm cross-play, where agents are evaluated on teaming performance
with all other agents in an experiment pool, with no assumption of algorithmic
similarity between agents. We show that existing state-of-the-art
cooperative AI algorithms, such as Other-Play and Off-Belief Learning,
underperform in this paradigm. We propose the Any-Play learning augmentation
-- a multi-agent extension of diversity-based intrinsic rewards for zero-shot
coordination (ZSC) -- for generalizing self-play-based algorithms to the
inter-algorithm cross-play setting. We apply the Any-Play learning augmentation
to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art
performance in the collaborative card game Hanabi.
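As a concrete reading of the inter-algorithm cross-play criterion above, the following minimal sketch scores every pairing of independently trained agents in a mixed-algorithm pool; the pool structure and the `play_episodes` harness are hypothetical stand-ins, not the paper's code.

```python
# Minimal sketch of inter-algorithm cross-play scoring: every distinct pair
# of agents in the pool is teamed and scored, regardless of which algorithm
# trained each agent. `play_episodes` is a hypothetical, environment-specific
# evaluation harness.
from itertools import combinations
from statistics import mean


def play_episodes(agent_a, agent_b, n_episodes=100):
    """Hypothetical harness: returns the mean task score (e.g., Hanabi
    points) achieved by the pair over n_episodes."""
    raise NotImplementedError


def inter_algorithm_cross_play(pool):
    """pool: list of (algorithm_name, agent) pairs from independently
    trained runs, mixing different algorithms. Every distinct pair is
    teamed and scored, with no assumption of algorithmic similarity
    between teammates."""
    scores = [
        play_episodes(agent_a, agent_b)
        for (_, agent_a), (_, agent_b) in combinations(pool, 2)
    ]
    return mean(scores)
```

Self-play and same-algorithm cross-play fall out as special cases of the same matrix; the criterion above simply averages over the pairings those paradigms exclude.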
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
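To ground the coordination failure this entry refers to, here is the textbook one-shot prisoner's dilemma that underlies the paper's iterated analysis; the payoff values are standard defaults, not taken from the paper.

```python
# One-shot prisoner's dilemma: for purely self-interested agents, defection
# dominates cooperation, so naive independent learners converge to mutual
# defection even though mutual cooperation pays more. Payoff values are
# textbook defaults, not taken from the paper.
PAYOFFS = {  # (my_action, their_action) -> my_reward
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_response(their_action):
    """Return my payoff-maximizing action against a fixed opponent action."""
    return max("CD", key=lambda a: PAYOFFS[(a, their_action)])

assert best_response("C") == "D" and best_response("D") == "D"  # defect dominates
assert PAYOFFS[("C", "C")] > PAYOFFS[("D", "D")]  # yet mutual cooperation pays more
```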
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Aligning Individual and Collective Objectives in Multi-Agent Cooperation [18.082268221987956]
Mixed-motive cooperation is one of the most prominent challenges in multi-agent learning.
We introduce a novel optimization method named Altruistic Gradient Adjustment (AgA) that employs gradient adjustments to progressively align individual and collective objectives.
We evaluate the effectiveness of AgA on benchmark environments designed to test mixed-motive collaboration with small-scale agents.
arXiv Detail & Related papers (2024-02-19T08:18:53Z)
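The summary above names AgA's mechanism (gradient adjustment) but not its exact rule; the sketch below is a generic gradient-alignment heuristic in that spirit, with the conflict test, the projection, and the weight `lam` all assumptions rather than the paper's method.

```python
import numpy as np


def aligned_gradient(g_individual, g_collective, lam=0.5):
    """Generic gradient-alignment sketch, NOT the paper's exact AgA rule:
    when an agent's individual gradient conflicts with the collective
    gradient (negative inner product), remove the conflicting component
    and add a pull toward the collective direction. `lam` is an assumed
    blending weight."""
    g_i = np.asarray(g_individual, dtype=float)
    g_c = np.asarray(g_collective, dtype=float)
    if g_i @ g_c < 0:
        # Project out the component of g_i that opposes g_c, then blend in
        # the collective gradient so the adjusted direction has positive
        # alignment with the collective objective.
        g_i = g_i - (g_i @ g_c) / (g_c @ g_c) * g_c + lam * g_c
    return g_i
```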
- MindAgent: Emergent Gaming Interaction [103.73707345211892]
Large Language Models (LLMs) are capable of performing complex scheduling in a multi-agent system.
We propose MindAgent to evaluate emergent planning and coordination capabilities in gaming interaction.
arXiv Detail & Related papers (2023-09-18T17:52:22Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination [36.33334853998621]
We introduce the Cooperative Open-ended LEarning (COLE) framework to solve cooperative incompatibility in learning.
COLE formulates open-ended objectives in two-player cooperative games from a graph-theoretic perspective, evaluating and pinpointing the cooperative capacity of each strategy.
Theoretical and empirical analysis shows that COLE effectively overcomes cooperative incompatibility.
arXiv Detail & Related papers (2023-06-05T16:51:38Z)
- A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA uses a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams [14.215359943041369]
We propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration.
We propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm.
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
arXiv Detail & Related papers (2021-10-02T08:17:30Z)
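As a rough illustration of how a partner-aware term can extend the single-agent Upper Confidence Bound algorithm named above, here is a standard UCB1 learner with a hypothetical partner-frequency bonus; the paper's actual coupled-reward strategy differs in its details.

```python
import math


class PartnerAwareUCB:
    """Illustrative sketch only: standard UCB1 arm scores blended with an
    empirical model of the partner's arm choices. The partner-frequency
    bonus and weight `w` are hypothetical; the paper's coupled-reward
    formulation is richer than this."""

    def __init__(self, n_arms, w=0.5):
        self.n = [0] * n_arms        # my pull counts per arm
        self.mean = [0.0] * n_arms   # my empirical mean reward per arm
        self.partner = [0] * n_arms  # observed partner pulls per arm
        self.t = 0                   # total rounds played
        self.w = w

    def select(self):
        self.t += 1
        for arm, count in enumerate(self.n):
            if count == 0:
                return arm  # pull every arm once before using UCB scores
        total_partner = sum(self.partner) or 1

        def score(arm):
            ucb = self.mean[arm] + math.sqrt(2 * math.log(self.t) / self.n[arm])
            return ucb + self.w * self.partner[arm] / total_partner

        return max(range(len(self.n)), key=score)

    def update(self, arm, reward, partner_arm):
        self.n[arm] += 1
        self.mean[arm] += (reward - self.mean[arm]) / self.n[arm]
        self.partner[partner_arm] += 1
```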
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The Other-Play (OP) learning algorithm enhances self-play by searching for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
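A minimal sketch of the Other-Play objective described above, assuming access to the problem's symmetry group; the `symmetries` list and the `evaluate` rollout are hypothetical placeholders, not the paper's implementation.

```python
import random


def other_play_value(policy, partner_policy, symmetries, evaluate, n_samples=100):
    """Sketch of the Other-Play idea: rather than scoring a policy against
    an identical copy of itself (plain self-play), score it against
    symmetry-relabeled copies of its partner, so the learned strategy
    cannot exploit arbitrary symmetry-breaking conventions. `symmetries`
    is a list of relabeling functions and `evaluate` is a hypothetical
    environment rollout returning a mean episode score."""
    total = 0.0
    for _ in range(n_samples):
        phi = random.choice(symmetries)  # sample a problem symmetry
        total += evaluate(policy, phi(partner_policy))
    return total / n_samples
```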
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.