"Other-Play" for Zero-Shot Coordination
- URL: http://arxiv.org/abs/2003.02979v3
- Date: Wed, 12 May 2021 05:22:20 GMT
- Title: "Other-Play" for Zero-Shot Coordination
- Authors: Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster
- Abstract summary: The other-play (OP) learning algorithm enhances self-play by looking for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
- Score: 21.607428852157273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of zero-shot coordination - constructing AI agents
that can coordinate with novel partners they have not seen before (e.g.
humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically
focus on the self-play (SP) setting where agents construct strategies by
playing the game with themselves repeatedly. Unfortunately, applying SP naively
to the zero-shot coordination problem can produce agents that establish highly
specialized conventions that do not carry over to novel partners they have not
been trained with. We introduce a novel learning algorithm called other-play
(OP), which enhances self-play by looking for more robust strategies, exploiting
the presence of known symmetries in the underlying problem. We characterize OP
theoretically as well as experimentally. We study the cooperative card game
Hanabi and show that OP agents achieve higher scores when paired with
independently trained agents. In preliminary results we also show that our OP
agents obtain higher average scores when paired with human players, compared
to state-of-the-art SP agents.
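To make the OP idea concrete, below is a self-contained toy sketch (not the paper's implementation) in the spirit of the lever coordination game the paper uses as motivation: nine of ten levers are interchangeable under a relabelling symmetry, self-play breaks the tie arbitrarily and then fails in cross-play, while the OP objective, which averages over random relabellings of the symmetric levers, favors the unique symmetry-robust lever.

```python
# Toy "lever game": two players each pull one of 10 levers and score the
# lever's payoff only if they pull the same one.  Levers 0..8 pay 1.0 and
# are interchangeable under relabelling; lever 9 pays 0.9 and is unique.
import itertools
import random

PAYOFFS = [1.0] * 9 + [0.9]

def self_play_policy(seed):
    """Self-play breaks the tie arbitrarily: pick one of the nine 1.0 levers."""
    return random.Random(seed).randrange(9)

def cross_play_value(a, b):
    return PAYOFFS[a] if a == b else 0.0

def op_value(lever):
    """Expected cross-play return of `lever` when the partner's labels of the
    nine symmetric levers are permuted uniformly at random (the OP objective)."""
    total, count = 0.0, 0
    for perm in itertools.permutations(range(9)):
        relabel = perm + (9,)              # the unique lever is fixed by the symmetry
        total += cross_play_value(lever, relabel[lever])
        count += 1
    return total / count

sp_a, sp_b = self_play_policy(0), self_play_policy(1)
print("self-play cross-play score:", cross_play_value(sp_a, sp_b))  # likely 0.0
print("OP value of a symmetric lever:", round(op_value(0), 3))      # 1/9, about 0.111
print("OP value of the unique lever:", round(op_value(9), 3))       # 0.9
```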
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
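The regularization described here can be read as maximizing expected reward minus a KL penalty toward the imitation-learned anchor policy. The sketch below shows the generic one-step closed form of such an objective (a softmax reweighting of the anchor policy by exponentiated action values); it is illustrative only and not a reproduction of the DiL-piKL algorithm itself, and the Q-values and anchor policy are made up.

```python
# Generic KL-regularized best response: maximize  E_pi[Q] - lam * KL(pi || anchor).
# For a single decision the optimum is  pi(a) proportional to anchor(a) * exp(Q(a) / lam).
import numpy as np

def kl_regularized_policy(q_values, anchor_probs, lam):
    """Closed-form maximizer of E_pi[Q] - lam * KL(pi || anchor)."""
    logits = np.log(anchor_probs) + np.asarray(q_values, dtype=float) / lam
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

q = [1.0, 0.0, 0.2]                            # hypothetical action values
anchor = np.array([0.2, 0.7, 0.1])             # hypothetical human-imitation policy
print(kl_regularized_policy(q, anchor, lam=10.0))  # large lam: stays near the anchor
print(kl_regularized_policy(q, anchor, lam=0.1))   # small lam: nearly argmax Q
```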
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Learning in Stackelberg Games with Non-myopic Agents [60.927889817803745]
We study Stackelberg games where a principal repeatedly interacts with a non-myopic long-lived agent, without knowing the agent's payoff function.
We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents.
arXiv Detail & Related papers (2022-08-19T15:49:30Z)
- K-level Reasoning for Zero-Shot Coordination in Hanabi [26.38814779896388]
We show that we can obtain competitive ZSC and ad-hoc teamplay performance in Hanabi.
We also introduce a new method, synchronous-k-level reasoning with a best response.
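For context, k-level (cognitive-hierarchy) reasoning builds a hierarchy in which level 0 plays a fixed naive policy and each level k plays a best response to level k-1. The toy sketch below illustrates that generic construction on a small common-payoff matrix game; the paper's synchronous training and best-response details are not reproduced here.

```python
# k-level reasoning on a toy common-payoff game: level 0 is uniform random,
# level k is a best response to the level k-1 policy.
import numpy as np

# payoff[i, j] = shared reward when the row player picks i and the column player picks j
payoff = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.9]])

def best_response(partner_policy):
    """Deterministic best response to a partner mixed strategy."""
    expected = payoff @ partner_policy         # expected reward of each own action
    response = np.zeros_like(expected)
    response[np.argmax(expected)] = 1.0
    return response

def k_level_policy(k):
    policy = np.ones(3) / 3.0                  # level 0: naive uniform policy
    for _ in range(k):
        policy = best_response(policy)         # level k best-responds to level k-1
    return policy

for k in range(3):
    print(f"level-{k} policy:", k_level_policy(k))
```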
arXiv Detail & Related papers (2022-07-14T18:53:34Z)
- On-the-fly Strategy Adaptation for ad-hoc Agent Coordination [21.029009561094725]
Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world.
The vast majority of focus has been on the self-play paradigm.
This paper proposes to solve this problem by adapting agent strategies on the fly, using a posterior belief over the other agents' strategy.
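A minimal sketch of that idea, under the assumption that the partner is drawn from a known pool of candidate policies: maintain a Bayesian posterior over the pool, update it from the partner's observed actions, and respond to the posterior-weighted partner. The pool and policies below are illustrative placeholders, not the paper's construction.

```python
# Bayesian adaptation to an unknown partner: keep a posterior over a pool of
# candidate partner policies and update it from the partner's observed actions.
import numpy as np

# Each row is a hypothetical candidate partner policy (a distribution over 3 actions).
candidate_policies = np.array([[0.8, 0.1, 0.1],
                               [0.1, 0.8, 0.1],
                               [0.1, 0.1, 0.8]])

def update_posterior(posterior, observed_action):
    """Bayes rule: P(policy | action) is proportional to P(action | policy) * P(policy)."""
    likelihood = candidate_policies[:, observed_action]
    posterior = posterior * likelihood
    return posterior / posterior.sum()

def predicted_partner(posterior):
    """Posterior-weighted mixture of the candidate partner policies."""
    return posterior @ candidate_policies

posterior = np.ones(len(candidate_policies)) / len(candidate_policies)
for observed in [1, 1, 1]:                     # partner keeps choosing action 1
    posterior = update_posterior(posterior, observed)
print("posterior over candidates:", np.round(posterior, 3))
print("predicted partner policy:", np.round(predicted_partner(posterior), 3))
```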
arXiv Detail & Related papers (2022-03-08T02:18:11Z)
- Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination [0.4153433779716327]
We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play.
We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, under-perform in this paradigm.
We propose the Any-Play learning augmentation for generalizing self-play-based algorithms to the inter-algorithm cross-play setting.
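A minimal sketch of cross-play evaluation as used throughout this list: agents from independent training runs (possibly different algorithms) are paired with partners they never trained with, and the zero-shot coordination score is the average over those pairings. The toy agents below just carry an arbitrary "convention" label; in practice each pairing would play episodes of the real game (e.g. Hanabi).

```python
# Cross-play: score every pairing of agents that were *not* trained together.
import itertools
import statistics

def evaluate_pair(agent_a, agent_b):
    """Toy stand-in for playing the game: full score only if conventions match."""
    return 1.0 if agent_a["convention"] == agent_b["convention"] else 0.0

def cross_play_score(agents):
    """Mean score over all ordered pairs of distinct training runs."""
    return statistics.mean(evaluate_pair(a, b)
                           for a, b in itertools.permutations(agents, 2))

sp_runs = [{"convention": c} for c in ("red", "blue", "green")]  # arbitrary conventions
op_runs = [{"convention": "robust"}] * 3                          # shared robust convention
print("self-play cross-play:", cross_play_score(sp_runs))   # 0.0
print("other-play cross-play:", cross_play_score(op_runs))  # 1.0
```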
arXiv Detail & Related papers (2022-01-28T21:43:58Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi [4.777698073163644]
In Hanabi, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies with no previous coordination.
This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose.
We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche.
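For context, one common Quality Diversity algorithm is MAP-Elites: keep an archive with one elite per behavioural niche and accept a candidate only if its niche is empty or it beats the current elite. The sketch below is a generic toy version with a made-up fitness and niche function, not the paper's Hanabi setup.

```python
# MAP-Elites sketch: an archive holds one elite per behavioural niche; a new
# candidate replaces the elite only if its niche is empty or it scores higher.
import random

def fitness(x):
    return -(x - 0.3) ** 2                       # toy quality measure on [0, 1]

def niche_of(x, n_niches=10):
    return min(int(x * n_niches), n_niches - 1)  # which bin of [0, 1] x falls into

def map_elites(iterations=2000, seed=0):
    rng = random.Random(seed)
    archive = {}                                 # niche -> (fitness, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.5:       # mutate a random existing elite...
            parent = rng.choice(list(archive.values()))[1]
            x = min(max(parent + rng.gauss(0.0, 0.1), 0.0), 1.0)
        else:                                    # ...or explore at random
            x = rng.random()
        niche = niche_of(x)
        if niche not in archive or fitness(x) > archive[niche][0]:
            archive[niche] = (fitness(x), x)
    return archive

archive = map_elites()
print(f"{len(archive)} niches filled -> a diverse population, one elite per niche")
```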
arXiv Detail & Related papers (2020-04-28T05:03:19Z)
- Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners [4.4532936483984065]
Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior.
In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training.
arXiv Detail & Related papers (2020-04-28T04:24:44Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)