On-the-fly Strategy Adaptation for ad-hoc Agent Coordination
- URL: http://arxiv.org/abs/2203.08015v1
- Date: Tue, 8 Mar 2022 02:18:11 GMT
- Title: On-the-fly Strategy Adaptation for ad-hoc Agent Coordination
- Authors: Jaleh Zand, Jack Parker-Holder, Stephen J. Roberts
- Abstract summary: Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world.
However, the vast majority of work to date has focused on the self-play paradigm, which leads agents to adopt arbitrary conventions that break down with unfamiliar partners.
This paper proposes to solve this coordination problem by adapting agent strategies on the fly, using a posterior belief over the other agents' strategies.
- Score: 21.029009561094725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training agents in cooperative settings offers the promise of AI agents able
to interact effectively with humans (and other agents) in the real world.
Multi-agent reinforcement learning (MARL) has the potential to achieve this
goal, demonstrating success in a series of challenging problems. However,
whilst these advances are significant, the vast majority of focus has been on
the self-play paradigm. This often results in a coordination problem, caused by
agents learning to make use of arbitrary conventions when playing with
themselves. This means that even the strongest self-play agents may have very
low cross-play with other agents, including other initializations of the same
algorithm. In this paper we propose to solve this problem by adapting agent
strategies on the fly, using a posterior belief over the other agents'
strategy. Concretely, we consider the problem of selecting a strategy from a
finite set of previously trained agents, to play with an unknown partner. We
propose an extension of the classic statistical technique, Gibbs sampling, to
update beliefs about other agents and obtain close to optimal ad-hoc
performance. Despite its simplicity, our method is able to achieve strong
cross-play with unseen partners in the challenging card game of Hanabi,
achieving successful ad-hoc coordination without knowledge of the partner's
strategy a priori.
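As a concrete illustration of the approach sketched in the abstract, below is a minimal, self-contained Python toy: a posterior belief over which member of a finite pool of pre-trained strategies an unknown partner is playing is updated from observed partner actions, and the pool strategy with the best expected cross-play score under that belief is selected. The categorical toy policies, the random cross-play score matrix, and all function names are illustrative assumptions rather than the authors' implementation, and the exact Bayesian update shown here stands in for the Gibbs-sampling extension the paper proposes.
```python
import numpy as np

rng = np.random.default_rng(0)

N, A = 4, 6  # pool size, number of actions in the toy game

# Toy "policies": each pool strategy is a fixed categorical action distribution.
# (In the real setting these would be previously trained agents.)
pool_policies = rng.dirichlet(np.ones(A), size=N)            # shape (N, A)

# Assumed precomputed cross-play returns: score[i, j] = expected return when
# we deploy pool strategy i against a partner playing pool strategy j.
cross_play_scores = rng.uniform(0.0, 25.0, size=(N, N))      # shape (N, N)

def update_belief(log_belief, partner_action):
    """One Bayesian update of the posterior over partner types, given an
    observed partner action (a simplification of the paper's Gibbs-sampling
    extension, exact here because the pool is small and finite)."""
    log_belief = log_belief + np.log(pool_policies[:, partner_action] + 1e-12)
    m = log_belief.max()
    return log_belief - (m + np.log(np.exp(log_belief - m).sum()))  # renormalise

def select_strategy(log_belief):
    """Choose the pool strategy with the best expected cross-play score under
    the current posterior over the partner's strategy."""
    return int(np.argmax(cross_play_scores @ np.exp(log_belief)))

# Simulated ad-hoc episode: the partner secretly plays pool strategy 2.
true_partner = 2
log_belief = np.full(N, -np.log(N))           # uniform prior over partner types
for _ in range(50):
    action = rng.choice(A, p=pool_policies[true_partner])
    log_belief = update_belief(log_belief, action)
print("posterior:", np.round(np.exp(log_belief), 3))
print("selected strategy:", select_strategy(log_belief))
```
With a small finite pool the exact update is cheap; the point of the sketch is only the structure of the loop: observe the partner, update the belief, and re-select the deployed strategy on the fly.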
Related papers
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Behavioral Differences is the Key of Ad-hoc Team Cooperation in Multiplayer Games Hanabi [3.7202899712601964]
Ad-hoc team cooperation is the problem of cooperating with other players that were not seen during the learning process.
We categorize the outcomes of ad-hoc team cooperation as Failure, Success, or Synergy.
Our results improve understanding of the key factors behind successful ad-hoc team cooperation in multi-player games.
arXiv Detail & Related papers (2023-03-12T23:25:55Z)
- Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents [120.91291581594773]
We present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes.
We propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training.
To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems.
arXiv Detail & Related papers (2022-03-16T08:22:45Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Learning Latent Representations to Influence Multi-Agent Interaction [65.44092264843538]
We propose a reinforcement learning-based framework for learning latent representations of an agent's policy.
We show that our approach outperforms the alternatives and learns to influence the other agent.
arXiv Detail & Related papers (2020-11-12T19:04:26Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Natural Emergence of Heterogeneous Strategies in Artificially Intelligent Competitive Teams [0.0]
We develop a competitive multi-agent environment called FortAttack in which two teams compete against each other.
We observe a natural emergence of heterogeneous behavior amongst homogeneous agents when such behavior can lead to the team's success.
We propose ensemble training, in which we utilize the evolved opponent strategies to train a single policy for friendly agents.
arXiv Detail & Related papers (2020-07-06T22:35:56Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi [4.777698073163644]
In Hanabi, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires an agent to adapt to its partners' strategies with no previous coordination.
This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose.
We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche.
arXiv Detail & Related papers (2020-04-28T05:03:19Z)
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The other-play (OP) learning algorithm enhances self-play by searching for strategies that are robust to the known symmetries of the underlying problem.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
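The "Other-Play" entry above turns on randomizing over known symmetries of the problem so that learned strategies do not hinge on arbitrary labels. Below is a minimal toy sketch of that symmetry-randomization idea, assuming a one-step label-matching game in which any permutation of the labels is a symmetry; the game, the function names, and the objective are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of arbitrary labels in a toy one-step matching game

def expected_return(policy_a, policy_b):
    # Both players act independently; reward is 1 iff their labels match.
    return float(policy_a @ policy_b)

def apply_symmetry(policy, perm):
    # Relabel a policy's action probabilities under a permutation of labels.
    relabelled = np.empty_like(policy)
    relabelled[perm] = policy
    return relabelled

def self_play_objective(policy):
    return expected_return(policy, policy)

def other_play_objective(policy, num_samples=200):
    # Average return against symmetry-permuted copies of the same policy,
    # which penalises conventions that hinge on an arbitrary label choice.
    return float(np.mean([
        expected_return(policy, apply_symmetry(policy, rng.permutation(K)))
        for _ in range(num_samples)
    ]))

convention = np.eye(K)[0]       # always pick label 0 (an arbitrary convention)
uniform = np.full(K, 1.0 / K)   # symmetry-invariant strategy
print(self_play_objective(convention), other_play_objective(convention))  # 1.0 vs ~0.2
print(self_play_objective(uniform), other_play_objective(uniform))        # 0.2 vs 0.2
```
Under plain self-play the arbitrary "always pick label 0" convention looks perfect (return 1.0), but once the partner's labels are permuted at random its expected return collapses to the symmetry-invariant baseline, which is exactly the brittleness that other-play-style training is meant to expose.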