A Generalist Hanabi Agent
- URL: http://arxiv.org/abs/2503.14555v1
- Date: Mon, 17 Mar 2025 22:25:15 GMT
- Title: A Generalist Hanabi Agent
- Authors: Arjun V Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar
- Abstract summary: Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, MARL systems are unable to perform well in any setting other than the one they have been trained on. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card game.
- Score: 14.30496247213363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, these systems are unable to perform well in any setting other than the one they were trained in, and struggle to cooperate successfully with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card game that requires complex reasoning and precise assistance to other agents. Current MARL agents for Hanabi can only learn one specific game setting (e.g., 2-player games), and play with the same algorithmic agents. This is in stark contrast to humans, who can quickly adjust their strategies to work with unfamiliar partners or situations. In this paper, we introduce Recurrent Replay Relevance Distributed DQN (R3D2), a generalist agent for Hanabi designed to overcome these limitations. We reformulate the task using text, as language has been shown to improve transfer. We then propose a distributed MARL algorithm that copes with the resulting dynamic observation and action spaces. In doing so, our agent is the first that can play all game settings concurrently and extend strategies learned in one setting to others. As a consequence, our agent can also collaborate with different algorithmic agents -- agents that are themselves unable to do so. The implementation code is available at: https://github.com/chandar-lab/R3D2-A-Generalist-Hanabi-Agent
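The abstract does not spell out the architecture, but a minimal sketch of the core idea it describes (text observations scored against a variable-size set of text-encoded actions, so one network serves every player count) might look as follows; all class and parameter names here are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn

class TextQNetwork(nn.Module):
    """Hypothetical sketch: encode the text observation and each
    currently-legal text action, then score every action against the
    observation. Because the actions are a *set* of strings, the same
    network handles 2- to 5-player games with different action spaces."""

    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.obs_rnn = nn.GRU(dim, dim, batch_first=True)  # recurrent observation encoder
        self.act_rnn = nn.GRU(dim, dim, batch_first=True)  # action-text encoder

    def forward(self, obs_tokens, action_tokens):
        # obs_tokens: (1, obs_len); action_tokens: (num_actions, act_len)
        _, h_obs = self.obs_rnn(self.embed(obs_tokens))      # (1, 1, dim)
        _, h_act = self.act_rnn(self.embed(action_tokens))   # (1, num_actions, dim)
        # One Q-value per legal action: dot product of the two encodings.
        return h_act.squeeze(0) @ h_obs.reshape(-1)          # (num_actions,)
```

Because the Q-head is a similarity score rather than a fixed-width output layer, adding a player (and hence new hint actions) only changes the number of rows being scored, not the network itself.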
Related papers
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
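For context, a standard external-regret benchmark of the kind such results are typically stated against (this exact formulation is an assumption, not quoted from the paper) is:

```latex
% Player i's regret after T rounds, against the best fixed strategy in hindsight:
R_i(T) = \max_{\pi \in \Pi_i} \sum_{t=1}^{T} u_i\big(\pi, \pi_{-i}^{t}\big)
       - \sum_{t=1}^{T} u_i\big(\pi_i^{t}, \pi_{-i}^{t}\big),
\qquad R_i(T) = O\big(T^{2/3}\big).
```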
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods declines, suggesting that they may not perform well in the limit.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
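A minimal sketch of such an analyze-then-act loop, assuming only a generic `llm` callable; the prompt wording and output format below are illustrative, not ProAgent's:

```python
def proactive_step(llm, state, partner_actions):
    """Hypothetical sketch: ask an LLM to infer the teammate's intent
    from recent observations, then pick a cooperative action."""
    prompt = (
        f"Current state: {state}\n"
        f"Teammate's recent actions: {partner_actions}\n"
        "Infer what the teammate is trying to do and propose a helpful "
        "action. Reply exactly as 'intent: <...>; action: <...>'."
    )
    reply = llm(prompt)
    intent_part, action_part = reply.split(";", 1)
    intent = intent_part.split(":", 1)[1].strip()
    action = action_part.split(":", 1)[1].strip()
    return intent, action
```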
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- On-the-fly Strategy Adaptation for ad-hoc Agent Coordination [21.029009561094725]
Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world.
Most prior work has focused on the self-play paradigm, which can produce agents that cooperate poorly outside their training population.
This paper proposes to solve this problem by adapting agent strategies on the fly, using a posterior belief over the other agents' strategies.
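A minimal sketch of that posterior update, assuming each candidate partner policy exposes action probabilities (the names below are illustrative):

```python
def update_belief(belief, candidate_policies, obs, observed_action):
    """Bayesian re-weighting: each candidate partner policy is scored by
    how likely it was to produce the action we just observed."""
    posterior = {
        name: belief[name] * policy(obs).get(observed_action, 1e-9)
        for name, policy in candidate_policies.items()
    }
    total = sum(posterior.values())
    return {name: weight / total for name, weight in posterior.items()}
```

The acting agent can then best-respond to the belief-weighted mixture of candidate partner strategies.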
arXiv Detail & Related papers (2022-03-08T02:18:11Z)
- On the Critical Role of Conventions in Adaptive Human-AI Collaboration [73.21967490610142]
We propose a learning framework that teases apart rule-dependent representation from convention-dependent representation.
We experimentally validate our approach on three collaborative tasks varying in complexity.
arXiv Detail & Related papers (2021-04-07T02:46:19Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
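A hedged sketch of the decomposition idea named in the title: each agent's Q-value splits into a self term that depends only on its own observation and an interactive term that corrects for teammates. Layer sizes and names here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class DecomposedQ(nn.Module):
    """Sketch: Q(own, team) = Q_self(own) + Q_interact(own, team)."""

    def __init__(self, obs_dim, team_dim, n_actions):
        super().__init__()
        self.q_self = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.q_interact = nn.Sequential(
            nn.Linear(obs_dim + team_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, own_obs, team_obs):
        joint = torch.cat([own_obs, team_obs], dim=-1)
        return self.q_self(own_obs) + self.q_interact(joint)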
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- The Design Of "Stratega": A General Strategy Games Framework [62.997667081978825]
Stratega is a framework for creating turn-based and real-time strategy games.
The framework has been built with a focus on statistical forward planning (SFP) agents.
We hope that the development of this framework and its respective agents helps to better understand the complex decision-making process in strategy games.
arXiv Detail & Related papers (2020-09-11T20:02:00Z)
- Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi [4.777698073163644]
In Hanabi, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires an agent to adapt to its partners' strategies with no prior coordination.
This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose.
We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche.
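A minimal sketch of a Quality Diversity loop in the MAP-Elites style (a common member of the QD family; the helper callables are assumptions):

```python
import random

def qd_search(init_agent, mutate, evaluate, niche_of, iterations=1000):
    """Keep the best agent per behavioral niche; the archive itself
    becomes the diverse population of training partners."""
    archive = {}  # niche descriptor -> (score, agent)
    agent = init_agent()
    for _ in range(iterations):
        score, behavior = evaluate(agent)   # e.g., hint rate, risk-taking
        niche = niche_of(behavior)
        if niche not in archive or score > archive[niche][0]:
            archive[niche] = (score, agent)
        # continue the search from a random surviving elite
        _, elite = random.choice(list(archive.values()))
        agent = mutate(elite)
    return archive
```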
arXiv Detail & Related papers (2020-04-28T05:03:19Z)
- Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners [4.4532936483984065]
Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior.
In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training.
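The evaluation protocol implied here is simple cross-play; a hedged sketch, assuming a minimal reset/step environment API (not the paper's actual code):

```python
def cross_play_score(make_env, trained_agent, partner, episodes=100):
    """Average team score when a self-play-trained agent is paired with
    a partner it never saw during training."""
    total = 0.0
    for _ in range(episodes):
        env = make_env()
        obs, done, current = env.reset(), False, 0
        agents = [trained_agent, partner]
        while not done:
            action = agents[current].act(obs)
            obs, reward, done = env.step(action)
            total += reward            # shared team reward, as in Hanabi
            current = 1 - current      # alternate turns
    return total / episodes
```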
arXiv Detail & Related papers (2020-04-28T04:24:44Z)
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The other-play (OP) learning algorithm enhances self-play by looking for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
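A minimal sketch of the OP idea: train under random relabelings of the game's symmetries (in Hanabi, the interchangeable card colors), so that learned conventions cannot depend on arbitrary labels. The code below is an illustration of the mechanism, not the paper's implementation:

```python
import itertools
import random

COLORS = ["red", "green", "blue", "yellow", "white"]
SYMMETRIES = list(itertools.permutations(COLORS))  # all color relabelings

def sample_relabeling():
    """Pick a random color permutation; during other-play training, one
    player's observations and actions are passed through this relabeling,
    so conventions tied to specific colors stop paying off."""
    perm = random.choice(SYMMETRIES)
    return dict(zip(COLORS, perm))
```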
arXiv Detail & Related papers (2020-03-06T00:39:37Z)