K-level Reasoning for Zero-Shot Coordination in Hanabi
- URL: http://arxiv.org/abs/2207.07166v1
- Date: Thu, 14 Jul 2022 18:53:34 GMT
- Title: K-level Reasoning for Zero-Shot Coordination in Hanabi
- Authors: Brandon Cui, Hengyuan Hu, Luis Pineda, Jakob N. Foerster
- Abstract summary: We show that we can obtain competitive ZSC and ad-hoc teamplay performance in Hanabi.
We also introduce a new method, synchronous-k-level reasoning with a best response.
- Score: 26.38814779896388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The standard training paradigm in cooperative multi-agent settings is self-play
(SP), where the goal is to train a team of agents that works well together.
However, optimal SP policies commonly contain arbitrary conventions
("handshakes") and are not compatible with other, independently trained agents
or humans. This latter desideratum was recently formalized by Hu et al. 2020 as
the zero-shot coordination (ZSC) setting and partially addressed with their
Other-Play (OP) algorithm, which showed improved ZSC and human-AI performance
in the card game Hanabi. OP assumes access to the symmetries of the environment
and prevents agents from breaking these in a mutually incompatible way during
training. However, as the authors point out, discovering symmetries for a given
environment is a computationally hard problem. Instead, we show that through a
simple adaptation of k-level reasoning (KLR; Costa Gomes et al. 2006),
synchronously training all levels, we can obtain competitive ZSC and ad-hoc
teamplay performance in Hanabi, including when paired with a human-like proxy
bot. We also introduce a new method, synchronous-k-level reasoning with a best
response (SyKLRBR), which further improves performance over our synchronous KLR
by co-training a best response.
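The abstract describes the method only in prose. Below is a minimal, hedged sketch of the synchronous KLR / SyKLRBR training scheme on a toy one-shot coordination game rather than Hanabi; the payoff structure, hyperparameters, and the uniform mixture used for the co-trained best response are illustrative assumptions, not details taken from the paper.

```python
# Sketch: synchronous k-level reasoning with a co-trained best response
# (SyKLRBR-style), on a toy diagonal coordination game. All constants are
# illustrative assumptions, not the paper's hyperparameters or codebase.
import numpy as np

N_ACTIONS = 5
NUM_LEVELS = 3          # reasoning levels k = 1..NUM_LEVELS
LR = 0.5
ITERS = 2000

# Both players must pick the same "lever"; lever i is worth PAYOFFS[i].
PAYOFFS = np.array([1.0, 0.9, 0.8, 0.7, 0.6])

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def expected_return(pi_a, pi_b):
    # Expected payoff of two independent stochastic policies.
    return float(np.sum(PAYOFFS * pi_a * pi_b))

def policy_gradient_step(logits, partner_pi):
    # Exact gradient of the expected return w.r.t. our softmax logits.
    pi = softmax(logits)
    j = expected_return(pi, partner_pi)
    grad = pi * (PAYOFFS * partner_pi - j)
    return logits + LR * grad

# Level 0 is a fixed uniform-random policy; level k best-responds to level k-1.
# All levels are updated in the same iteration ("synchronously"), rather than
# training each level to convergence before starting the next.
logits = [np.zeros(N_ACTIONS) for _ in range(NUM_LEVELS + 1)]  # index = level
br_logits = np.zeros(N_ACTIONS)  # co-trained best response (the "BR" in SyKLRBR)

for _ in range(ITERS):
    level0 = np.full(N_ACTIONS, 1.0 / N_ACTIONS)
    policies = [level0] + [softmax(l) for l in logits[1:]]
    # Synchronous KLR update: each level responds to the *current* level below.
    for k in range(1, NUM_LEVELS + 1):
        logits[k] = policy_gradient_step(logits[k], policies[k - 1])
    # Co-trained best response against a uniform mixture of all current levels
    # (one plausible reading of the co-training; the paper's scheme may differ).
    mixture = np.mean(policies, axis=0)
    br_logits = policy_gradient_step(br_logits, mixture)

print("level-k policies:", [softmax(l).round(2) for l in logits[1:]])
print("best response   :", softmax(br_logits).round(2))
```

In this toy game the higher levels all converge on the highest-paying lever because they ground their play in a best response to random level-0 behavior, rather than breaking ties with an arbitrary convention; this is the intuition behind using KLR for zero-shot coordination.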
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z) - Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In
the Game of Hanabi [15.917861586043813]
We show that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods.
We create a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods.
arXiv Detail & Related papers (2023-08-20T14:44:50Z) - Equivariant Networks for Zero-Shot Coordination [34.95582850032728]
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner.
A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies.
We present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry for improving zero-shot coordination.
arXiv Detail & Related papers (2022-10-21T17:25:34Z) - Quasi-Equivalence Discovery for Zero-Shot Emergent Communication [63.175848843466845]
We present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC).
We show that these two factors lead to unique optimal ZSC policies in referential games.
QED can iteratively discover the symmetries in this setting and converges to the optimal ZSC policy.
arXiv Detail & Related papers (2021-03-14T23:42:37Z) - Multi-Agent Coordination in Adversarial Environments through Signal
Mediated Strategies [37.00818384785628]
Team members can coordinate their strategies before the beginning of the game, but are unable to communicate during the playing phase of the game.
In this setting, model-free RL methods are oftentimes unable to capture coordination because agents' policies are executed in a decentralized fashion.
We show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms did not.
arXiv Detail & Related papers (2021-02-09T18:44:16Z) - High-Throughput Synchronous Deep RL [132.43861715707905]
We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL).
We perform learning and rollouts concurrently, and devise a system design that avoids 'stale policies'.
We evaluate our approach on Atari games and the Google Research Football environment.
arXiv Detail & Related papers (2020-12-17T18:59:01Z) - Resolving Implicit Coordination in Multi-Agent Deep Reinforcement
Learning with Deep Q-Networks & Game Theory [0.0]
We address two major challenges of implicit coordination in deep reinforcement learning: non-stationarity and exponential growth of state-action space.
We demonstrate that knowledge of game type leads to an assumption of mirrored best responses and faster convergence than Nash-Q.
Inspired by the dueling network architecture, we learn both a single-agent and a joint-agent representation and merge them via element-wise addition (a rough sketch of this merge appears after the list below).
arXiv Detail & Related papers (2020-12-08T17:30:47Z) - Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z) - "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The Other-Play learning algorithm enhances self-play by searching for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
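As referenced in the "Resolving Implicit Coordination..." entry above, the following is a hedged PyTorch sketch of merging a single-agent and a joint-agent representation via element-wise addition. Layer sizes, class and parameter names, and the observation split are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch: dueling-style merge of per-agent and joint value streams by
# element-wise addition. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class SingleJointQNet(nn.Module):
    def __init__(self, single_obs_dim: int, joint_obs_dim: int, n_actions: int):
        super().__init__()
        # Stream conditioned only on the agent's own observation.
        self.single = nn.Sequential(
            nn.Linear(single_obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )
        # Stream conditioned on the joint (all-agent) observation.
        self.joint = nn.Sequential(
            nn.Linear(joint_obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, single_obs: torch.Tensor, joint_obs: torch.Tensor) -> torch.Tensor:
        # Element-wise addition merges the two per-action value estimates.
        return self.single(single_obs) + self.joint(joint_obs)

q_net = SingleJointQNet(single_obs_dim=8, joint_obs_dim=16, n_actions=4)
q_values = q_net(torch.randn(1, 8), torch.randn(1, 16))
print(q_values.shape)  # torch.Size([1, 4])
```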