Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
- URL: http://arxiv.org/abs/2308.10284v1
- Date: Sun, 20 Aug 2023 14:44:50 GMT
- Title: Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
- Authors: Hadi Nekoei, Xutong Zhao, Janarthanan Rajendran, Miao Liu, Sarath Chandar
- Abstract summary: We show that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods.
We create a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods.
- Score: 15.917861586043813
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing environments; agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms perform poorly when paired with agents trained with different learning methods, and that they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally define a framework based on the popular cooperative multi-agent game Hanabi to evaluate the adaptability of MARL methods. In particular, we create a diverse set of pre-trained agents and define a new metric, adaptation regret, that measures an agent's ability to efficiently adapt and improve its coordination performance when paired with a held-out pool of partners, beyond its ZSC performance. Evaluating several SOTA algorithms in our framework, we find that naive Independent Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC algorithm Off-Belief Learning (OBL). This finding raises an interesting research question: how can we design MARL algorithms with both high ZSC performance and the capability to adapt quickly to unseen partners? As a first step, we study the role of different hyper-parameters and design choices in the adaptability of current MARL algorithms. Our experiments show that two categories of hyper-parameters, those controlling training-data diversity and those controlling the optimization process, have a significant impact on the adaptability of Hanabi agents.
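To make the adaptation regret metric concrete, here is a minimal sketch of how such a quantity could be computed. The abstract does not give the paper's exact formula, so the cumulative-gap form, the reference scores, and the `evaluate`/`adapt_step` hooks below are illustrative assumptions.

```python
# Minimal sketch of an adaptation-regret-style metric. We assume regret is
# the cumulative gap between a reference (fully adapted) score and the
# learner's score over a fixed interaction budget, averaged over a held-out
# partner pool. `evaluate` and `adapt_step` are hypothetical hooks into a
# MARL training stack, not the paper's API.

def adaptation_regret(agent, partner_pool, reference_scores, budget,
                      adapt_step, evaluate):
    """Average per-step score gap while adapting to each held-out partner."""
    regrets = []
    for partner, ref in zip(partner_pool, reference_scores):
        gap = 0.0
        for _ in range(budget):
            score = evaluate(agent, partner)  # e.g., mean Hanabi return
            gap += ref - score                # shortfall vs. adapted play
            adapt_step(agent, partner)        # one round of fine-tuning
        regrets.append(gap / budget)
    return sum(regrets) / len(regrets)
```

Under this reading, an agent with strong ZSC starts with a small gap, and an agent that adapts quickly shrinks the gap early, so both contribute to low regret.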
Related papers
- Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning [18.054709749075194]
We propose a novel MARL algorithm named Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning (SCIC).
Our approach detects inter-agent causal influences in specific situations with a criterion based on causal intervention and conditional mutual information.
The resulting update links coordinated exploration with intrinsic reward distribution, which enhances overall collaboration and performance.
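As a rough illustration of the information-theoretic ingredient in this criterion, the sketch below estimates conditional mutual information I(X; Y | Z) from discrete samples; SCIC's actual criterion additionally involves causal intervention, which is not modeled here.

```python
import math
from collections import Counter

# Empirical estimator of I(X; Y | Z) from discrete (x, y, z) samples, using
# I(X;Y|Z) = sum p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ).
def conditional_mutual_information(samples):
    samples = list(samples)
    n = len(samples)
    p_xyz = Counter(samples)
    p_xz = Counter((x, z) for x, _, z in samples)
    p_yz = Counter((y, z) for _, y, z in samples)
    p_z = Counter(z for _, _, z in samples)
    cmi = 0.0
    for (x, y, z), c in p_xyz.items():
        # Counts of n cancel, leaving a ratio of raw counts.
        cmi += (c / n) * math.log((c * p_z[z]) / (p_xz[(x, z)] * p_yz[(y, z)]))
    return cmi
```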
arXiv Detail & Related papers (2023-12-15T05:09:32Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to Stackelberg equilibrium (SE) policies in repeated matrix game scenarios.
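A toy sketch of the conditional-hypernetwork idea, in the simplest form we can assume from the summary: a small network maps a leader's one-hot action to the weights of a follower's policy head, so the follower's response is conditioned on the superior agent's decision. Layer sizes and the two-layer form are illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, N_ACTIONS, N_LEADER_ACTIONS, HID = 8, 4, 3, 16

# Hypernetwork parameters (hypothetical shapes).
W1 = rng.normal(size=(HID, N_LEADER_ACTIONS)) * 0.1
W2 = rng.normal(size=(N_ACTIONS * OBS_DIM, HID)) * 0.1

def follower_logits(obs, leader_action):
    """Generate the follower's policy weights from the leader's action."""
    onehot = np.eye(N_LEADER_ACTIONS)[leader_action]
    w = (W2 @ np.tanh(W1 @ onehot)).reshape(N_ACTIONS, OBS_DIM)
    return w @ obs  # follower's action logits, conditioned on the leader

print(follower_logits(rng.normal(size=OBS_DIM), leader_action=1))
```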
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) that encode the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of rewards in partially observable environments.
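For readers unfamiliar with reward machines, here is a minimal sketch: a finite-state machine whose transitions emit rewards, so a non-Markovian reward becomes Markovian once the machine state is tracked alongside the environment state. The two-step key-then-door task is a hypothetical example, not from the paper.

```python
# Minimal reward machine: transitions map (machine_state, event_label) to
# (next_machine_state, reward); unknown labels leave the state unchanged.
class RewardMachine:
    def __init__(self, transitions, initial_state):
        self.transitions = transitions
        self.state = initial_state

    def step(self, label):
        """Advance on an observed event label; return the emitted reward."""
        self.state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0))
        return reward

rm = RewardMachine({("u0", "got_key"): ("u1", 0.0),
                    ("u1", "opened_door"): ("u2", 1.0)}, "u0")
print(rm.step("got_key"), rm.step("opened_door"))  # 0.0 1.0
```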
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning [42.540853953923495]
We introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination.
Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents.
We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound.
arXiv Detail & Related papers (2023-02-07T12:30:52Z)
- Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution [41.23036865145942]
We study the heterogeneous zero-shot coordination (ZSC) problem for the first time.
We propose a general method based on coevolution, which coevolves two populations of agents and partners through three sub-processes: pairing, updating and selection.
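A skeleton of the pairing/updating/selection loop might look as follows. The agent objects and the `fitness` and `update_toward` operators are placeholders for the paper's actual learning and evaluation procedures, and in the paper both populations coevolve, so the routine below would be applied to agents and partners alternately.

```python
import random

def evolve_one_side(agents, partners, fitness, update_toward, keep=4):
    # Pairing: match each agent with a randomly drawn partner.
    pairs = [(agent, random.choice(partners)) for agent in agents]
    # Updating: improve each agent against its sampled partner.
    agents = [update_toward(agent, partner) for agent, partner in pairs]
    # Selection: keep the fittest and refill the population from survivors.
    agents.sort(key=lambda agent: fitness(agent, partners), reverse=True)
    survivors = agents[:keep]
    return survivors + [random.choice(survivors)
                        for _ in range(len(agents) - keep)]
```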
arXiv Detail & Related papers (2022-08-09T16:16:28Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to capture the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
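As a rough sketch of a graph-based relation encoder, the snippet below performs one round of masked graph attention in which each agent aggregates teammates' features along graph edges. The dimensions and single attention round are illustrative assumptions, not RACA's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, FEAT = 4, 6
W_q, W_k, W_v = (rng.normal(size=(FEAT, FEAT)) * 0.1 for _ in range(3))

def relation_encode(features, adjacency):
    """features: (N_AGENTS, FEAT); adjacency: (N_AGENTS, N_AGENTS) 0/1 mask."""
    q, k, v = features @ W_q, features @ W_k, features @ W_v
    scores = q @ k.T / np.sqrt(FEAT)
    scores = np.where(adjacency > 0, scores, -1e9)  # keep only graph edges
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v  # relation-aware agent features

feats = rng.normal(size=(N_AGENTS, FEAT))
adj = np.ones((N_AGENTS, N_AGENTS))  # fully connected team as an example
print(relation_encode(feats, adj).shape)  # (4, 6)
```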
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
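A minimal sketch of the two ingredients named here, under assumed forms: a softmax-weighted backup as a softer alternative to the max operator, and a loss term penalizing joint action-values that deviate from a baseline. The temperature `tau`, weight `lam`, and baseline choice are illustrative, not the paper's exact scheme.

```python
import numpy as np

def softmax_backup(q_values, tau=1.0):
    """Softmax-weighted estimate of the next state's value (softer than max)."""
    w = np.exp((q_values - q_values.max()) / tau)
    w /= w.sum()
    return float(w @ q_values)

def regularized_loss(q_joint, reward, next_q, baseline, gamma=0.99, lam=0.1):
    """Squared TD error plus a penalty for straying from the baseline."""
    target = reward + gamma * softmax_backup(next_q)
    return (q_joint - target) ** 2 + lam * (q_joint - baseline) ** 2

print(regularized_loss(q_joint=3.0, reward=1.0,
                       next_q=np.array([0.5, 2.0, 1.0]), baseline=1.5))
```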
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
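To illustrate the linear decomposition behind successor features, the sketch below uses the standard form Q_w(s, a) = psi(s, a) . w and acts greedily across several tasks' value estimates (generalized policy improvement); the shapes and random parameters are placeholders, not UneVEn's full method.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, FEAT_DIM, N_TASKS = 4, 5, 3

psi = rng.normal(size=(N_ACTIONS, FEAT_DIM))    # successor features psi(s, .)
task_ws = rng.normal(size=(N_TASKS, FEAT_DIM))  # one reward vector per task

def gpi_action(psi_s, task_ws):
    """Pick the action maximising value over all known tasks' linear Qs."""
    q = psi_s @ task_ws.T  # (N_ACTIONS, N_TASKS), Q for every task
    return int(q.max(axis=1).argmax())

print(gpi_action(psi, task_ws))
```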
arXiv Detail & Related papers (2020-10-06T19:08:47Z)