Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination
- URL: http://arxiv.org/abs/2306.03034v2
- Date: Sun, 7 Jan 2024 16:18:02 GMT
- Title: Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination
- Authors: Yang Li, Shao Zhang, Jichen Sun, Wenhao Zhang, Yali Du, Ying Wen,
Xinbing Wang, Wei Pan
- Abstract summary: We introduce the Cooperative Open-ended LEarning (COLE) framework to solve cooperative incompatibility in learning.
COLE formulates open-ended objectives in two-player cooperative games from a graph-theoretic perspective to evaluate and pinpoint the cooperative capacity of each strategy.
We show theoretically and empirically that COLE effectively overcomes cooperative incompatibility.
- Score: 36.33334853998621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Securing coordination between an AI agent and teammates (human
players or AI agents) in contexts involving unfamiliar humans continues to pose
a significant challenge in Zero-Shot Coordination (ZSC). The issue of
cooperative incompatibility becomes particularly prominent when an AI agent
fails to synchronize with certain previously unknown partners. Traditional algorithms
have aimed to collaborate with partners by optimizing fixed objectives within a
population, fostering diversity in strategies and behaviors. However, these
techniques may lead to learning loss and an inability to cooperate with
specific strategies within the population, a phenomenon named cooperative
incompatibility in learning. In order to solve cooperative incompatibility in
learning and effectively address the problem in the context of ZSC, we
introduce the Cooperative Open-ended LEarning (COLE) framework, which
formulates open-ended objectives in cooperative games with two players using
perspectives of graph theory to evaluate and pinpoint the cooperative capacity
of each strategy. We present two practical algorithms, specifically \algo and
\algoR, which incorporate insights from game theory and graph theory. We
further show, through theoretical and empirical analysis, that COLE
effectively overcomes cooperative incompatibility. Subsequently, we built an online
Overcooked human-AI experiment platform, the COLE platform, which enables easy
customization of questionnaires, model weights, and other aspects. Utilizing
the COLE platform, we enlisted 130 participants for human experiments. Our
findings reveal that, across a variety of subjective metrics, participants
prefer our approach over state-of-the-art methods. Moreover, objective experimental
outcomes in the Overcooked game environment indicate that our method surpasses
existing ones when coordinating with previously unencountered AI agents and the
human proxy model.
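To make the abstract's graph-theoretic idea concrete, here is a minimal, illustrative sketch in Python. It is not the paper's algorithm: it assumes a matrix of pairwise cross-play returns over a population and scores each strategy's cooperative capacity with a PageRank-style centrality; the payoff values, the damping factor, and the coop_capacity helper are all illustrative assumptions.

```python
# Illustrative sketch only: score each strategy's "cooperative capacity"
# from pairwise cross-play returns. The edge rule and PageRank-style
# scoring are assumptions, not the paper's exact open-ended objective.
import numpy as np

def coop_capacity(payoffs, damping=0.85, iters=100):
    """payoffs[i, j] = return when strategy i partners with strategy j."""
    p = payoffs.astype(float).copy()
    np.fill_diagonal(p, 0.0)              # ZSC cares about cross-play, not self-play
    weights = p / p.sum(axis=1, keepdims=True)
    n = p.shape[0]
    score = np.full(n, 1.0 / n)
    for _ in range(iters):                # PageRank-style power iteration
        score = (1 - damping) / n + damping * (weights.T @ score)
    return score

# Toy population: strategy 2 coordinates poorly with everyone else, the
# kind of partner a fixed population objective can quietly give up on.
payoffs = np.array([[8.0, 6.0, 1.0],
                    [6.0, 7.0, 1.0],
                    [1.0, 1.0, 9.0]])
print(coop_capacity(payoffs))             # strategy 2 gets the lowest score
```

A strategy that scores low under such a measure is exactly the kind of partner the abstract describes as cooperatively incompatible.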
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Aligning Individual and Collective Objectives in Multi-Agent Cooperation [18.082268221987956]
Mixed-motive cooperation is one of the most prominent challenges in multi-agent learning.
We introduce a novel optimization method named Altruistic Gradient Adjustment (AgA) that employs gradient adjustments to progressively align individual and collective objectives.
We evaluate the effectiveness of our algorithm AgA through benchmark environments for testing mixed-motive collaboration with small-scale agents.
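As a rough intuition for gradient adjustment (a toy sketch only; AgA's actual update rule differs, and the quadratic objectives and lambda below are invented for illustration):

```python
# Toy sketch of aligning individual and collective objectives by blending
# each agent's self-interested gradient with the collective gradient.
import numpy as np

def individual_grads(x):
    # Two agents with scalar actions; each wants its own action near 1.
    return np.array([2 * (x[0] - 1.0), 2 * (x[1] - 1.0)])

def collective_grad(x):
    # Gradient of a shared penalty (x0 - x1)^2 that rewards agreement.
    d = 2 * (x[0] - x[1])
    return np.array([d, -d])

x, lam, lr = np.array([0.0, 2.0]), 0.5, 0.1   # lam is a made-up blend knob
for _ in range(200):
    x -= lr * (individual_grads(x) + lam * collective_grad(x))
print(x)  # both actions converge near 1 while staying aligned with each other
```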
arXiv Detail & Related papers (2024-02-19T08:18:53Z)
- Cooperative Open-ended Learning Framework for Zero-shot Coordination [35.330951448600594]
We propose a framework to construct open-ended objectives in cooperative games with two players.
We also propose a practical algorithm that leverages knowledge from game theory and graph theory.
Our method outperforms current state-of-the-art methods when coordinating with different-level partners.
arXiv Detail & Related papers (2023-02-09T18:37:04Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
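A toy sketch of the context-aware identification step, under strong assumptions: partner primitives are represented here as fixed action distributions and matched by log-likelihood, whereas PECAN learns ensembles and context encoders; the primitive names are made up.

```python
# Hedged sketch: match a partner's recent actions against a small library
# of known policy "primitives" and condition on the closest match.
import numpy as np

primitives = {                      # hypothetical action distributions
    "onion_runner": np.array([0.7, 0.2, 0.1]),
    "dish_washer":  np.array([0.1, 0.7, 0.2]),
    "server":       np.array([0.1, 0.2, 0.7]),
}

def identify(recent_actions):
    """Pick the primitive with the highest log-likelihood of the history."""
    scores = {name: np.sum(np.log(p[recent_actions]))
              for name, p in primitives.items()}
    return max(scores, key=scores.get)

history = [0, 0, 1, 0, 0]           # partner mostly takes action 0
print(identify(history))            # -> "onion_runner"
```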
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Coordination with Humans via Strategy Matching [5.072077366588174]
We present an algorithm for autonomously recognizing available task-completion strategies by observing human-human teams performing a collaborative task.
By transforming team actions into low-dimensional representations using hidden Markov models, we can identify strategies without prior knowledge.
Robot policies are learned on each of the identified strategies to construct a Mixture-of-Experts model that adapts to the task strategies of unseen human partners.
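A minimal sketch of the Mixture-of-Experts dispatch described above, with a stand-in featurizer in place of the paper's hidden Markov model representations; the centroids and expert names are hypothetical.

```python
# Illustrative dispatch: map an observed team trajectory to a low-dim
# feature, match it to the nearest known strategy cluster, and hand
# control to that strategy's expert policy.
import numpy as np

strategy_centroids = np.array([[0.9, 0.1],   # e.g. "divide and conquer"
                               [0.2, 0.8]])  # e.g. "leader-follower"
experts = ["policy_divide_conquer", "policy_leader_follower"]  # placeholders

def featurize(trajectory):
    # Stand-in for an HMM-derived representation: mean action statistics.
    return trajectory.mean(axis=0)

def select_expert(trajectory):
    z = featurize(trajectory)
    dists = np.linalg.norm(strategy_centroids - z, axis=1)
    return experts[int(np.argmin(dists))]

traj = np.array([[0.8, 0.2], [1.0, 0.0], [0.9, 0.1]])
print(select_expert(traj))  # -> "policy_divide_conquer"
```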
arXiv Detail & Related papers (2022-10-27T01:00:50Z)
- Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination [0.4153433779716327]
We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play.
We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, under-perform in this paradigm.
We propose the Any-Play learning augmentation for generalizing self-play-based algorithms to the inter-algorithm cross-play setting.
arXiv Detail & Related papers (2022-01-28T21:43:58Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
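The subspace idea can be sketched with plain linear algebra, assuming strategies are given as parameter vectors; the paper learns the subspace jointly with policies, and the toy data below is deliberately constructed to be low-rank.

```python
# Rough sketch: extract a low-rank basis over known partner strategies
# with SVD, then express a new partner as coordinates in that subspace.
import numpy as np

rng = np.random.default_rng(0)
true_rank = 3
latent = rng.normal(size=(20, true_rank))
factors = rng.normal(size=(true_rank, 50))
known_partners = latent @ factors          # 20 strategies in a rank-3 subspace

_, _, vt = np.linalg.svd(known_partners, full_matrices=False)
basis = vt[:true_rank]                     # (rank, 50) subspace basis

new_partner = known_partners[:5].mean(axis=0)   # unseen blend of strategies
coords = basis @ new_partner                    # infer subspace coordinates
reconstructed = coords @ basis                  # adapt by interpolating
print(np.linalg.norm(new_partner - reconstructed))  # near-zero residual
```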
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams [14.215359943041369]
We propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration.
We propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm.
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
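A hedged sketch of what "partner-aware" can mean in a coupled-reward bandit: keep statistics per (own arm, partner arm) pair and score arms against the partner's empirically most frequent arm. The paper's actual strategy differs; the prediction rule and constants here are assumptions.

```python
# Toy partner-aware UCB on a coupled-reward bandit (illustrative only).
import numpy as np

K = 3                                   # arms per agent
rng = np.random.default_rng(1)
true_mean = rng.uniform(size=(K, K))    # reward mean for (my arm, partner arm)
counts = np.ones((K, K))                # optimistic init avoids div-by-zero
sums = np.zeros((K, K))
partner_counts = np.ones(K)

for t in range(1, 2001):
    p_arm = rng.choice(K, p=[0.7, 0.2, 0.1])     # stationary partner
    partner_counts[p_arm] += 1
    likely = int(np.argmax(partner_counts))      # predicted partner arm
    ucb = (sums[:, likely] / counts[:, likely]
           + np.sqrt(2.0 * np.log(t + 1) / counts[:, likely]))
    my_arm = int(np.argmax(ucb))
    reward = true_mean[my_arm, p_arm] + 0.1 * rng.normal()
    counts[my_arm, p_arm] += 1
    sums[my_arm, p_arm] += reward

print(np.argmax(sums.sum(axis=1) / counts.sum(axis=1)))  # learned best response
```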
arXiv Detail & Related papers (2021-10-02T08:17:30Z)
- On the Critical Role of Conventions in Adaptive Human-AI Collaboration [73.21967490610142]
We propose a learning framework that teases apart rule-dependent representation from convention-dependent representation.
We experimentally validate our approach on three collaborative tasks varying in complexity.
arXiv Detail & Related papers (2021-04-07T02:46:19Z)
- Cooperative Control of Mobile Robots with Stackelberg Learning [63.99843063704676]
Multi-robot cooperation requires agents to make decisions consistent with the shared goal.
We propose a method named SLiCC: Stackelberg Learning in Cooperative Control.
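The Stackelberg structure underlying SLiCC can be illustrated with a toy bimatrix game; this shows the leader-follower equilibrium concept only, not the paper's learning method, and the payoff matrices are invented.

```python
# Toy Stackelberg equilibrium: the leader commits to the action whose
# follower best response yields the highest leader payoff.
import numpy as np

leader_payoff = np.array([[3.0, 1.0],
                          [2.0, 4.0]])
follower_payoff = np.array([[2.0, 0.0],
                            [1.0, 3.0]])

def stackelberg(leader_payoff, follower_payoff):
    best = None
    for a in range(leader_payoff.shape[0]):
        b = int(np.argmax(follower_payoff[a]))   # follower best response
        if best is None or leader_payoff[a, b] > best[2]:
            best = (a, b, leader_payoff[a, b])
    return best

print(stackelberg(leader_payoff, follower_payoff))  # -> (1, 1, 4.0)
```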
arXiv Detail & Related papers (2020-08-03T07:21:51Z)