Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners
- URL: http://arxiv.org/abs/2004.13291v1
- Date: Tue, 28 Apr 2020 04:24:44 GMT
- Title: Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners
- Authors: Rodrigo Canaan, Xianbo Gao, Youjin Chung, Julian Togelius, Andy Nealen
and Stefan Menzel
- Abstract summary: Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior.
In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training.
- Score: 4.4532936483984065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hanabi is a cooperative game that challenges existing AI techniques due to
its focus on modeling the mental states of other players to interpret and predict their
behavior. While there are agents that can achieve near-perfect scores in the game by
agreeing on some shared strategy, comparatively little progress has been made in ad-hoc
cooperation settings, where partners and strategies are not known in advance. In this
paper, we show that agents trained through self-play using the popular Rainbow DQN
architecture fail to cooperate well with simple rule-based agents that were not seen
during training and, conversely, when these agents are trained to play with any
individual rule-based agent, or even a mix of these agents, they fail to achieve good
self-play scores.
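As a rough illustration of the ad-hoc evaluation setting described in the abstract, the sketch below pairs an agent trained through self-play with a rule-based partner it never encountered during training and averages the resulting game scores. Everything here (the dummy environment, the agent classes, and their interfaces) is a hypothetical placeholder, not the paper's code or the actual Hanabi Learning Environment API.

```python
# Hypothetical sketch of ad-hoc (cross-play) evaluation in Hanabi: a self-play-trained
# agent is seated with a rule-based partner it never saw during training.
# All classes and interfaces below are illustrative stand-ins.

import random
import statistics


class DummyHanabiEnv:
    """Minimal stand-in for a two-player Hanabi environment (not the real HLE API)."""

    def reset(self):
        self.turn, self.score, self.player = 0, 0, 0
        return {"fireworks": self.score}            # toy observation

    def current_player(self):
        return self.player

    def legal_moves(self):
        return ["play", "discard", "hint"]          # toy move set

    def step(self, move):
        if move == "play" and random.random() < 0.5:
            self.score += 1                         # pretend the play succeeded
        self.turn += 1
        self.player = 1 - self.player
        done = self.turn >= 40 or self.score >= 25
        return {"fireworks": self.score}, self.score, done


class RuleBasedPartner:
    """Stand-in for a simple rule-based agent (e.g., fixed hint/play heuristics)."""

    def act(self, observation, legal_moves):
        return "play"                               # fixed heuristic placeholder


class SelfPlayAgent:
    """Stand-in for an agent trained through self-play (e.g., Rainbow DQN)."""

    def act(self, observation, legal_moves):
        return random.choice(legal_moves)           # placeholder policy


def cross_play_score(env, agents):
    """Play one game with a fixed seating of (possibly mismatched) agents."""
    obs, done, score = env.reset(), False, 0
    while not done:
        mover = agents[env.current_player()]
        obs, score, done = env.step(mover.act(obs, env.legal_moves()))
    return score


if __name__ == "__main__":
    env = DummyHanabiEnv()
    pairing = [SelfPlayAgent(), RuleBasedPartner()]  # ad-hoc: partner unseen in training
    scores = [cross_play_score(env, pairing) for _ in range(1000)]
    print("mean cross-play score:", statistics.mean(scores))
```

The same loop, run with two copies of the trained agent, would give the self-play baseline the paper compares against.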
Related papers
- ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Behavioral Differences is the Key of Ad-hoc Team Cooperation in Multiplayer Games Hanabi [3.7202899712601964]
Ad-hoc team cooperation is the problem of cooperating with other players that have not been seen in the learning process.
We analyze the results of ad-hoc team cooperation and classify them into Failure, Success, and Synergy.
Our results improve understanding of key factors to form successful ad-hoc team cooperation in multi-player games.
arXiv Detail & Related papers (2023-03-12T23:25:55Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Does DQN really learn? Exploring adversarial training schemes in Pong [1.0323063834827415]
We study two self-play training schemes, Chainer and Pool, and show they lead to improved agent performance in Atari Pong.
We show that training agents with Chainer or Pool leads to richer network activations with greater predictive power to estimate critical game-state features.
arXiv Detail & Related papers (2022-03-20T18:12:55Z)
- On-the-fly Strategy Adaptation for ad-hoc Agent Coordination [21.029009561094725]
Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world.
However, the vast majority of this work has focused on the self-play paradigm.
This paper proposes to solve this problem by adapting agent strategies on the fly, using a posterior belief over the other agents' strategies (a minimal belief-update sketch appears after this list).
arXiv Detail & Related papers (2022-03-08T02:18:11Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Incorporating Rivalry in Reinforcement Learning for a Competitive Game [65.2200847818153]
This study focuses on providing a novel learning mechanism based on a rivalry social impact.
Based on the concept of competitive rivalry, our analysis aims to investigate if we can change the assessment of these agents from a human perspective.
arXiv Detail & Related papers (2020-11-02T21:54:18Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Moody Learners -- Explaining Competitive Behaviour of Reinforcement Learning Agents [65.2200847818153]
In a competitive scenario, the agent not only faces a dynamic environment but is also directly affected by the opponents' actions.
Observing the agent's Q-values is a common way of explaining its behavior; however, they do not show the temporal relation between the selected actions.
arXiv Detail & Related papers (2020-07-30T11:30:42Z)
- Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi [4.777698073163644]
In Hanabi, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies with no previous coordination.
This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose.
We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche.
arXiv Detail & Related papers (2020-04-28T05:03:19Z)
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The other-play (OP) learning algorithm enhances self-play by looking for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
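As referenced in the on-the-fly strategy adaptation entry above, one generic way to maintain a posterior belief over a set of candidate partner strategies is a simple Bayesian update after each observed partner action. The candidate strategy names and their toy action likelihoods below are hypothetical placeholders, not details taken from that paper.

```python
# Hypothetical sketch: Bayesian posterior over candidate partner strategies.
# The candidate names and action likelihoods are illustrative placeholders.

CANDIDATES = ["cautious_hinter", "aggressive_player", "random_baseline"]

# P(action | strategy): toy likelihood of observing each partner action.
LIKELIHOOD = {
    "cautious_hinter":   {"hint": 0.7, "play": 0.1, "discard": 0.2},
    "aggressive_player": {"hint": 0.1, "play": 0.7, "discard": 0.2},
    "random_baseline":   {"hint": 1 / 3, "play": 1 / 3, "discard": 1 / 3},
}


def update_belief(belief, observed_action):
    """Bayes rule: reweight each strategy by how likely it was to take the action."""
    posterior = {s: belief[s] * LIKELIHOOD[s][observed_action] for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}


# Start from a uniform prior and update after each observed partner action.
belief = {s: 1.0 / len(CANDIDATES) for s in CANDIDATES}
for action in ["play", "play", "hint"]:
    belief = update_belief(belief, action)

# The adapting agent would then best-respond to the most probable strategy.
print(belief, "->", max(belief, key=belief.get))
```

In a full system the best response would itself be a pre-trained policy per candidate strategy; only the belief update is sketched here.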