Mastering the Game of No-Press Diplomacy via Human-Regularized
Reinforcement Learning and Planning
- URL: http://arxiv.org/abs/2210.05492v1
- Date: Tue, 11 Oct 2022 14:47:35 GMT
- Title: Mastering the Game of No-Press Diplomacy via Human-Regularized
Reinforcement Learning and Planning
- Authors: Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul
Jacob, Gabriele Farina, Alexander H Miller, Noam Brown
- Abstract summary: No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
- Score: 95.78031053296513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: No-press Diplomacy is a complex strategy game involving both cooperation and
competition that has served as a benchmark for multi-agent AI research. While
self-play reinforcement learning has resulted in numerous successes in purely
adversarial games like chess, Go, and poker, self-play alone is insufficient
for achieving optimal performance in domains involving cooperation with humans.
We address this shortcoming by first introducing a planning algorithm we call
DiL-piKL that regularizes a reward-maximizing policy toward a human
imitation-learned policy. We prove that this is a no-regret learning algorithm
under a modified utility function. We then show that DiL-piKL can be extended
into a self-play reinforcement learning algorithm we call RL-DiL-piKL that
provides a model of human play while simultaneously training an agent that
responds well to this human model. We used RL-DiL-piKL to train an agent we
name Diplodocus. In a 200-game no-press Diplomacy tournament involving 62 human
participants spanning skill levels from beginner to expert, two Diplodocus
agents both achieved a higher average score than all other participants who
played more than two games, and ranked first and third according to an Elo
ratings model.
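The abstract does not spell out the DiL-piKL update, but its core idea, pulling a reward-maximizing policy toward a human imitation-learned anchor, can be illustrated with the closed-form solution of a single-step KL-regularized best response. The sketch below is only that illustration under stated assumptions: the function name, the toy numbers, and the single-lambda formulation are hypothetical, and the actual DiL-piKL algorithm (which, per the abstract, is a no-regret procedure and is extended into self-play RL) is more involved.

```python
import numpy as np

def kl_regularized_best_response(q_values, anchor_policy, lam):
    """Maximizer of  E_{a~pi}[Q(a)] - lam * KL(pi || anchor_policy).

    Illustrative sketch only (not the paper's exact DiL-piKL update).
    The closed-form solution is pi(a) proportional to
    anchor_policy(a) * exp(Q(a) / lam).
    """
    q_values = np.asarray(q_values, dtype=float)
    anchor_policy = np.asarray(anchor_policy, dtype=float)
    logits = np.log(anchor_policy) + q_values / lam
    logits -= logits.max()          # subtract max for numerical stability
    policy = np.exp(logits)
    return policy / policy.sum()

# Toy example with three candidate actions (all values made up):
q = [1.0, 0.2, 0.9]                 # estimated action values
anchor = [0.7, 0.2, 0.1]            # human imitation-learned policy
for lam in (3.0, 0.3):
    print(lam, kl_regularized_best_response(q, anchor, lam).round(3))
```

In this sketch a large lambda keeps the policy close to the human anchor (more predictable play), while a small lambda approaches the pure reward-maximizing response; the regularization strength is what trades off strength against compatibility with human play.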
Related papers
- Reinforcing Competitive Multi-Agents for Playing So Long Sucker [0.393259574660092]
This paper examines the use of classical deep reinforcement learning (DRL) algorithms (DQN, DDQN, and Dueling DQN) in the strategy game So Long Sucker.
The study's primary goal is to teach autonomous agents the game's rules and strategies using classical DRL methods.
arXiv Detail & Related papers (2024-11-17T12:38:13Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
To further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Modeling Strong and Human-Like Gameplay with KL-Regularized Search [64.24339197581769]
We consider the task of building strong but human-like policies in multi-agent decision-making problems.
Imitation learning is effective at predicting human actions but may not match the strength of expert humans.
We show in chess and Go that applying Monte Carlo tree search regularized by the KL divergence from an imitation-learned policy produces policies that have higher human prediction accuracy and are stronger than the imitation policy.
arXiv Detail & Related papers (2021-12-14T16:52:49Z)
- No-Press Diplomacy from Scratch [26.36204634856853]
We describe an algorithm for action exploration and equilibrium approximation in games with combinatorially large action spaces.
We train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance.
We extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.
arXiv Detail & Related papers (2021-10-06T17:12:50Z)
- Learning Monopoly Gameplay: A Hybrid Model-Free Deep Reinforcement Learning and Imitation Learning Approach [31.066718635447746]
Reinforcement Learning (RL) relies on an agent interacting with an environment to maximize the cumulative sum of rewards it receives.
In the multi-player game of Monopoly, players must make several decisions every turn, some involving complex actions such as making trades.
This paper introduces a Hybrid Model-Free Deep RL (DRL) approach that is capable of playing and learning winning strategies of Monopoly.
arXiv Detail & Related papers (2021-03-01T01:40:02Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Human-Level Performance in No-Press Diplomacy via Equilibrium Search [29.858369754530905]
We describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via regret minimization.
We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and ranks in the top 2% of human players when playing anonymous games on a popular Diplomacy website.
arXiv Detail & Related papers (2020-10-06T01:28:34Z)
- Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents in order to evaluate how the agents learn to be competitive, and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)