Human-Level Performance in No-Press Diplomacy via Equilibrium Search
- URL: http://arxiv.org/abs/2010.02923v2
- Date: Mon, 3 May 2021 14:00:12 GMT
- Title: Human-Level Performance in No-Press Diplomacy via Equilibrium Search
- Authors: Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown
- Abstract summary: We describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via regret minimization.
We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and ranks in the top 2% of human players when playing anonymous games on a popular Diplomacy website.
- Score: 29.858369754530905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior AI breakthroughs in complex games have focused on either the purely
adversarial or purely cooperative settings. In contrast, Diplomacy is a game of
shifting alliances that involves both cooperation and competition. For this
reason, Diplomacy has proven to be a formidable research challenge. In this
paper we describe an agent for the no-press variant of Diplomacy that combines
supervised learning on human data with one-step lookahead search via regret
minimization. Regret minimization techniques have been behind previous AI
successes in adversarial games, most notably poker, but have not previously
been shown to be successful in large-scale games involving cooperation. We show
that our agent greatly exceeds the performance of past no-press Diplomacy bots,
is unexploitable by expert humans, and ranks in the top 2% of human players
when playing anonymous games on a popular Diplomacy website.
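The one-step lookahead search described in the abstract is driven by a regret-minimization procedure (regret matching) run over a small set of candidate actions for each power. As a rough, self-contained illustration of regret matching — a toy sketch of the subroutine, not the authors' implementation; in the actual agent candidate actions come from the supervised policy network and utilities from a learned value function over Diplomacy positions — consider a two-player zero-sum matrix game:

```python
import numpy as np

def regret_matching(payoffs, iters=10_000):
    """Toy regret matching for a two-player zero-sum matrix game.

    payoffs: (n, m) matrix of row-player utilities; the column player
    receives -payoffs. Returns the players' average strategies, which
    converge to an approximate Nash equilibrium in this setting.
    """
    n, m = payoffs.shape
    regret_row, regret_col = np.zeros(n), np.zeros(m)
    avg_row, avg_col = np.zeros(n), np.zeros(m)

    def to_strategy(regret):
        # Play actions in proportion to positive regret; uniform if none.
        pos = np.maximum(regret, 0.0)
        total = pos.sum()
        return pos / total if total > 0 else np.full(regret.size, 1.0 / regret.size)

    for _ in range(iters):
        p, q = to_strategy(regret_row), to_strategy(regret_col)
        avg_row += p
        avg_col += q
        # Expected utility of each pure action against the opponent's mix.
        u_row = payoffs @ q
        u_col = -(payoffs.T @ p)
        # Accumulate regret for not having played each pure action.
        regret_row += u_row - p @ u_row
        regret_col += u_col - q @ u_col

    return avg_row / iters, avg_col / iters

# Matching pennies: the unique equilibrium is uniform (0.5, 0.5) for both players.
payoffs = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(regret_matching(payoffs))
```

In the paper's setting the analogous loop runs over sampled candidate order sets for all seven powers, and the approximate equilibrium reached after a fixed number of iterations is what the agent acts from.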
Related papers
- More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play [28.004172388400132]
This work seeks to understand the degree to which Cicero succeeds at communication.
We run two dozen games with humans and Cicero, totaling over 200 human-player hours of competition.
While AI can consistently outplay human players, AI-Human communication is still limited because of AI's difficulty with deception and persuasion.
arXiv Detail & Related papers (2024-06-07T05:03:44Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
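A minimal sketch of the regularization idea (my simplification with made-up numbers, not the paper's DiL-piKL algorithm itself): choosing a policy that maximizes expected value minus a KL penalty toward a fixed human imitation policy has a simple closed form.

```python
import numpy as np

def kl_regularized_policy(q_values, human_policy, lam):
    """Policy maximizing E_pi[Q] - lam * KL(pi || human_policy).

    The maximizer has the closed form pi(a) ∝ human_policy(a) * exp(Q(a) / lam):
    a small lam follows the Q-values, a large lam stays close to human play.
    """
    q_values = np.asarray(q_values, dtype=float)
    logits = np.log(np.clip(human_policy, 1e-12, None)) + q_values / lam
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy example (made-up numbers): the high-value first action is upweighted,
# but the KL anchor keeps the policy from fully abandoning human-like moves.
q = [1.0, 0.0, 0.2]
human = np.array([0.1, 0.6, 0.3])
print(kl_regularized_policy(q, human, lam=0.5))
```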
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- No-Press Diplomacy from Scratch [26.36204634856853]
We describe an algorithm for action exploration and equilibrium approximation in games with combinatorial action spaces.
We train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance.
We extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.
arXiv Detail & Related papers (2021-10-06T17:12:50Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance in the StarCraft Multi-Agent Challenge.
CollaQ is evaluated on various StarCraft maps and outperforms existing state-of-the-art techniques.
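A heavily simplified sketch of the reward-attribution idea (module names and the penalty below are my own stand-ins, not the paper's code): each agent's Q-value is split into a term that depends only on its own observation and an interactive term that depends on the other agents, with a regularizer keeping the interactive term small so credit is attributed to the right component.

```python
import torch
import torch.nn as nn

class DecomposedQ(nn.Module):
    """Toy Q-network split into a self term and an interactive term:
    q_total(o_self, o_others) = q_self(o_self) + q_interactive(o_self, o_others).
    """

    def __init__(self, obs_dim, others_dim, n_actions, hidden=64):
        super().__init__()
        self.q_self = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.q_interactive = nn.Sequential(
            nn.Linear(obs_dim + others_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, o_self, o_others):
        q_alone = self.q_self(o_self)
        q_collab = self.q_interactive(torch.cat([o_self, o_others], dim=-1))
        return q_alone + q_collab, q_collab

def attribution_penalty(q_collab):
    # Regularize the interactive term toward zero so the self term carries
    # the value that does not depend on teammates (a simplified stand-in
    # for the paper's attribution loss).
    return (q_collab ** 2).mean()

# Usage sketch with dummy observations.
q_net = DecomposedQ(obs_dim=8, others_dim=16, n_actions=5)
q_total, q_collab = q_net(torch.zeros(1, 8), torch.zeros(1, 16))
loss_reg = attribution_penalty(q_collab)
```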
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state of the art, and a game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
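The overall loop can be sketched generically (an illustrative outline under an assumed `compute_best_response` helper, not the paper's exact procedure): repeatedly train an approximate best response to a mixture of previously produced policies and add it to the population.

```python
import random

def best_response_policy_iteration(initial_policy, compute_best_response,
                                   n_iterations, mixture_size=None):
    """Generic best-response policy iteration loop (illustrative outline only).

    compute_best_response(opponent_policies) is assumed to return a new policy
    trained (e.g., by RL) to approximately best-respond when the other players
    are drawn from `opponent_policies`.
    """
    population = [initial_policy]
    for _ in range(n_iterations):
        # Respond to a mixture of earlier policies rather than only the latest
        # one, which stabilizes the iteration (fictitious-play style).
        if mixture_size is None:
            opponents = population
        else:
            opponents = random.sample(population, k=min(mixture_size, len(population)))
        population.append(compute_best_response(opponents))
    return population[-1]
```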
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
- Suphx: Mastering Mahjong with Deep Reinforcement Learning [114.68233321904623]
We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques.
Suphx has demonstrated stronger performance than most top human players in terms of stable rank.
This is the first time that a computer program outperforms most top human players in Mahjong.
arXiv Detail & Related papers (2020-03-30T16:18:16Z)
- Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games [22.38765498549914]
We argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research.
Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma.
We show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances.
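As a toy illustration of why a binding contract can pay off in a many-player zero-sum game (my own example, not the paper's mechanism): in a three-player game where each attack transfers one point from target to attacker, two players who contract to both attack the third each gain in expectation relative to uncoordinated random targeting, so both have reason to accept.

```python
import random

PLAYERS = (0, 1, 2)

def payoffs(targets):
    """targets[i] is whom player i attacks; each attack moves one point
    from the target to the attacker, so the game is zero-sum."""
    score = [0, 0, 0]
    for attacker, target in enumerate(targets):
        score[attacker] += 1
        score[target] -= 1
    return score

def expected_payoffs(fixed=None, samples=50_000, seed=0):
    """Monte Carlo estimate of expected payoffs when unconstrained players
    pick targets uniformly at random; `fixed` pins some players' targets."""
    rng = random.Random(seed)
    fixed = fixed or {}
    totals = [0.0, 0.0, 0.0]
    for _ in range(samples):
        targets = []
        for p in PLAYERS:
            if p in fixed:
                targets.append(fixed[p])
            else:
                targets.append(rng.choice([q for q in PLAYERS if q != p]))
        for p, s in enumerate(payoffs(targets)):
            totals[p] += s
    return [t / samples for t in totals]

# Status quo: everyone targets uniformly at random -> everyone expects ~0.
print(expected_payoffs())
# Contract: players 0 and 1 agree to both attack player 2.
# Each signatory expects ~+0.5, so both accept; player 2 expects ~-1.
print(expected_payoffs(fixed={0: 2, 1: 2}))
```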
arXiv Detail & Related papers (2020-02-27T10:32:31Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.