Reinforcement Learning for Hanabi
- URL: http://arxiv.org/abs/2506.00458v1
- Date: Sat, 31 May 2025 08:24:16 GMT
- Title: Reinforcement Learning for Hanabi
- Authors: Nina Cohen, Kordel K. France
- Abstract summary: We explored different reinforcement learning algorithms to see which had the best performance against an agent of the same type and also against other types of agents. In the end, we found that temporal difference (TD) algorithms had better overall performance and balancing of play types compared to tabular agents.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hanabi has become a popular game for reinforcement learning (RL) research, as it is one of the few cooperative card games in which players have incomplete knowledge of the environment, presenting a challenge for an RL agent. We explored different tabular and deep reinforcement learning algorithms to see which had the best performance, both against an agent of the same type and against other types of agents. We establish that certain agents played their highest-scoring games against specific agents, while others exhibited higher scores on average by adapting to the opposing agent's behavior. We attempted to quantify the conditions under which each algorithm provides the best advantage and identified the most interesting interactions between agents of different types. In the end, we found that temporal difference (TD) algorithms had better overall performance and balancing of play types compared to tabular agents. Specifically, the tabular Expected SARSA and deep Q-learning agents showed the best performance.
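Since the abstract singles out tabular Expected SARSA as a top performer, a minimal sketch of the tabular Expected SARSA update may be useful; the epsilon-greedy behavior policy, hyperparameters, and hashed-state encoding below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from collections import defaultdict

def expected_sarsa_update(Q, state, action, reward, next_state,
                          n_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One tabular Expected SARSA step:
    Q(s,a) += alpha * (r + gamma * E[Q(s',A')] - Q(s,a)).
    Unlike SARSA, the target averages over the epsilon-greedy policy's
    action distribution in the next state instead of sampling one action."""
    q_next = np.array([Q[(next_state, a)] for a in range(n_actions)])
    greedy = int(np.argmax(q_next))
    # Probability of each next action under the epsilon-greedy behavior policy.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[greedy] += 1.0 - epsilon
    expected_q = float(np.dot(probs, q_next))
    td_target = reward + gamma * expected_q
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

# Usage: Q maps (state, action) pairs to values; states could be hashed
# Hanabi observations (an assumption -- the paper does not specify its encoding).
Q = defaultdict(float)
expected_sarsa_update(Q, state="s0", action=1, reward=1.0,
                      next_state="s1", n_actions=4)
```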
Related papers
- Bandit Social Learning: Exploration under Myopic Behavior [54.767961587919075]
We study social learning dynamics motivated by reviews on online platforms. Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
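As a rough illustration of the myopic behavior studied above, the following sketch has agents greedily following the shared empirical means of a two-armed bandit with no deliberate exploration; the arm means, optimistic initialization, and agent count are all hypothetical.

```python
import random

def myopic_social_learning(true_means=(0.3, 0.7), n_agents=200, seed=0):
    """Agents arrive sequentially, observe all previous rewards ("reviews"),
    and greedily pull the arm with the best empirical mean -- no exploration.
    This can lock all later agents onto the worse arm (a learning failure)."""
    rng = random.Random(seed)
    pulls = [1, 1]        # optimistic start: pretend each arm was tried once...
    successes = [1, 1]    # ...and succeeded, so both start with empirical mean 1.0
    for _ in range(n_agents):
        means = [successes[a] / pulls[a] for a in range(2)]
        arm = max(range(2), key=lambda a: means[a])   # purely myopic choice
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
    return pulls

# Early bad luck on the better arm can permanently divert the herd to the worse one.
print(myopic_social_learning())
```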
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
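For a single decision, the regularization described above has a well-known closed form: the policy maximizing expected value minus a KL penalty toward an anchor is a softmax reweighting of the anchor. The sketch below shows that generic form with made-up Q-values and anchor probabilities; it is not the authors' DiL-piKL implementation.

```python
import numpy as np

def kl_regularized_policy(q_values, anchor_policy, lam=1.0):
    """Closed-form maximizer of  E_pi[Q] - lam * KL(pi || anchor):
    pi(a) is proportional to anchor(a) * exp(Q(a) / lam).
    Large lam stays close to the human-like anchor; small lam chases reward."""
    logits = np.log(anchor_policy) + np.asarray(q_values, dtype=float) / lam
    logits -= logits.max()                 # subtract max for numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

# Hypothetical numbers: action 2 has the highest predicted value, but the
# imitation-learned anchor strongly prefers action 0.
q = [0.0, 0.5, 1.0]
anchor = np.array([0.7, 0.2, 0.1])
print(kl_regularized_policy(q, anchor, lam=2.0))   # interpolates between the two
```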
- Reinforcement Learning Agents in Colonel Blotto [0.0]
We focus on a specific instance of agent-based models that uses reinforcement learning (RL) to train the agent to act in its environment.
We find that the RL agent handily beats a single opponent, and still performs quite well when the number of opponents is increased.
We also analyze the strategies the RL agent has arrived at by examining the actions to which it assigns the highest and lowest Q-values (a sketch of this kind of inspection follows this entry).
arXiv Detail & Related papers (2022-04-04T16:18:01Z)
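The Q-value inspection mentioned in the entry above can be done directly on a tabular agent; here is a minimal sketch with a hypothetical Blotto-style Q-table (the troop-allocation actions and values are invented for illustration).

```python
def inspect_policy(Q, state, top_k=3):
    """Rank a state's actions by learned Q-value to see which strategies
    the agent favors (highest values) and avoids (lowest values)."""
    ranked = sorted(Q[state].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k], ranked[-top_k:]

# Hypothetical Q-table: actions are troop allocations across three battlefields.
Q = {"start": {(3, 3, 4): 0.81, (10, 0, 0): -0.42, (5, 5, 0): 0.15,
               (0, 5, 5): 0.22, (4, 4, 2): 0.67}}
best, worst = inspect_policy(Q, "start", top_k=2)
print("favored:", best)    # balanced allocations rank highest here
print("avoided:", worst)   # the all-in allocation ranks lowest
```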
- Predicting Game Engagement and Difficulty Using AI Players [3.0501851690100277]
This paper presents a novel approach to automated playtesting for the prediction of human player behavior and experience.
It has previously been demonstrated that Deep Reinforcement Learning (DRL) game-playing agents can predict both game difficulty and player engagement.
We improve this approach by enhancing DRL with Monte Carlo Tree Search (MCTS); a sketch of the MCTS selection rule follows this entry.
arXiv Detail & Related papers (2021-07-26T09:31:57Z)
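The paper above combines DRL with MCTS; the sketch below shows only the standard UCT selection rule at the heart of generic MCTS, with hypothetical visit counts, and is not the authors' system.

```python
import math

def uct_select(children, parent_visits, c=1.4):
    """UCT rule used in the MCTS selection phase: pick the child maximizing
    mean value + c * sqrt(ln(parent visits) / child visits)."""
    def score(child):
        if child["visits"] == 0:
            return float("inf")          # always try unvisited children first
        exploit = child["value"] / child["visits"]
        explore = c * math.sqrt(math.log(parent_visits) / child["visits"])
        return exploit + explore
    return max(children, key=score)

# Hypothetical statistics for three candidate moves at the root.
children = [{"move": "a", "visits": 10, "value": 6.0},
            {"move": "b", "visits": 3,  "value": 2.5},
            {"move": "c", "visits": 0,  "value": 0.0}]
print(uct_select(children, parent_visits=13)["move"])   # "c": unvisited wins
```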
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
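As a loose illustration of the surprise-based adversarial game described above, the sketch below emits a zero-sum surprise signal from a count-based observation model; the paper uses learned policies and a learned density model, so everything here is a simplified stand-in.

```python
import math
from collections import Counter

class SurpriseGame:
    """Zero-sum 'surprise' signal in the spirit of that entry: one policy is
    rewarded for reaching states its observation model finds unlikely, the
    opposing policy receives the negated reward. The count-based density
    model here is a stand-in for a learned model."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def surprise_rewards(self, state):
        # Surprise = negative log-probability under a Laplace-smoothed
        # count model. The explorer maximizes it; the controller minimizes it.
        p = (self.counts[state] + 1) / (self.total + len(self.counts) + 1)
        self.counts[state] += 1
        self.total += 1
        s = -math.log(p)
        return s, -s   # (explorer reward, controller reward)

game = SurpriseGame()
for s in ["room_a", "room_a", "room_b"]:
    print(s, game.surprise_rewards(s))  # novel states yield higher surprise
```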
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games, and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
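A minimal sketch of the decomposition idea named in that entry: each agent's Q-value is a self-term plus a teammate-driven collaborative correction. The array values are invented, and this omits CollaQ's training machinery entirely.

```python
import numpy as np

def collaq_style_q(q_alone, q_collab, alpha=1.0):
    """Reward-attribution-style decomposition in the spirit of CollaQ:
    an agent's Q is a self-term (ignoring teammates) plus a collaborative
    correction driven by the other agents. Both arrays have shape (n_actions,)."""
    return q_alone + alpha * q_collab

# Hypothetical values for a 3-action agent: alone it prefers action 0,
# but the teammate-dependent correction flips the choice to action 2.
q_alone  = np.array([1.0, 0.2, 0.5])
q_collab = np.array([-0.3, 0.1, 0.9])
q_total = collaq_style_q(q_alone, q_collab)
print(q_total, "-> chosen action:", int(np.argmax(q_total)))
```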
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
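As a sketch of the reward-gifting mechanism described above: each agent's effective reward is its environment reward plus incentives received minus incentives paid. In the paper the gift amounts come from a learned incentive function; here they are simply an input.

```python
import numpy as np

def apply_incentives(env_rewards, gifts):
    """gifts[i][j] is the reward agent i sends to agent j (chosen by a
    learned incentive function in the paper; a plain matrix here). Each
    agent's effective reward is its environment reward, plus gifts
    received, minus gifts it paid out."""
    gifts = np.asarray(gifts, dtype=float)
    received = gifts.sum(axis=0)   # column j: everything sent to agent j
    paid = gifts.sum(axis=1)       # row i: everything agent i sent
    return np.asarray(env_rewards, dtype=float) + received - paid

# Hypothetical 2-agent step: agent 0 gifts 0.5 to agent 1 to encourage
# a cooperative action that only pays off for agent 0.
print(apply_incentives(env_rewards=[1.0, 0.0],
                       gifts=[[0.0, 0.5],
                              [0.0, 0.0]]))   # -> [0.5, 0.5]
```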
- Approximate exploitability: Learning a best response in large games [31.066412349285994]
We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent.
We demonstrate the technique in several two-player zero-sum games against a variety of agents.
arXiv Detail & Related papers (2020-04-20T23:36:40Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The other-play (OP) learning algorithm enhances self-play by looking for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
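A conceptual sketch of the symmetry at the core of other-play in Hanabi: color labels are interchangeable, so OP randomly relabels them for one side during self-play to block conventions built on arbitrary labels. This shows only the relabeling step, not the full OP training loop.

```python
import random

COLORS = ["red", "green", "blue", "white", "yellow"]

def random_color_relabeling(rng):
    """Sample one symmetry of Hanabi: a permutation of the (interchangeable)
    color labels. Under other-play, one agent's observations and actions are
    passed through such a permutation during self-play, so the pair cannot
    build conventions on arbitrary labels like 'red means discard'."""
    shuffled = COLORS[:]
    rng.shuffle(shuffled)
    return dict(zip(COLORS, shuffled))

rng = random.Random(0)
relabel = random_color_relabeling(rng)
hand_hint = ["red", "red", "blue"]
print([relabel[c] for c in hand_hint])  # same hint, relabeled for the partner
```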
- Multi Type Mean Field Reinforcement Learning [26.110052366068533]
We extend mean field multiagent algorithms to multiple types.
We conduct experiments on three different testbeds for the field of many-agent reinforcement learning.
arXiv Detail & Related papers (2020-02-06T20:58:58Z)
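A minimal sketch of the multi-type mean field idea: rather than one mean action over all neighbors, compute a separate mean action per agent type. The neighbor actions and types below are hypothetical.

```python
import numpy as np

def per_type_mean_actions(actions, types, n_actions, n_types):
    """Multi-type mean field idea: instead of one mean action over all
    neighbors, compute a separate mean (average one-hot action) per agent
    type, so each type's aggregate influence is modeled independently."""
    means = np.zeros((n_types, n_actions))
    counts = np.zeros(n_types)
    for a, t in zip(actions, types):
        means[t, a] += 1.0
        counts[t] += 1.0
    counts[counts == 0] = 1.0        # avoid dividing by zero for empty types
    return means / counts[:, None]

# Hypothetical: 5 neighbors of two types choosing among 3 actions.
actions = [0, 2, 2, 1, 1]
types   = [0, 0, 0, 1, 1]
print(per_type_mean_actions(actions, types, n_actions=3, n_types=2))
# type 0 mean: [1/3, 0, 2/3]; type 1 mean: [0, 1, 0]
```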