No-Press Diplomacy from Scratch
- URL: http://arxiv.org/abs/2110.02924v1
- Date: Wed, 6 Oct 2021 17:12:50 GMT
- Title: No-Press Diplomacy from Scratch
- Authors: Anton Bakhtin, David Wu, Adam Lerer, Noam Brown
- Abstract summary: We describe an algorithm for action exploration and equilibrium approximation in games with combinatorial action spaces.
We train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance.
We extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.
- Score: 26.36204634856853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior AI successes in complex games have largely focused on settings with at
most hundreds of actions at each decision point. In contrast, Diplomacy is a
game with more than 10^20 possible actions per turn. Previous attempts to
address games with large branching factors, such as Diplomacy, StarCraft, and
Dota, used human data to bootstrap the policy or used handcrafted reward
shaping. In this paper, we describe an algorithm for action exploration and
equilibrium approximation in games with combinatorial action spaces. This
algorithm simultaneously performs value iteration while learning a policy
proposal network. A double oracle step is used to explore additional actions to
add to the policy proposals. At each state, the target state value and policy
for the model training are computed via an equilibrium search procedure. Using
this algorithm, we train an agent, DORA, completely from scratch for a popular
two-player variant of Diplomacy and show that it achieves superhuman
performance. Additionally, we extend our methods to full-scale no-press
Diplomacy and for the first time train an agent from scratch with no human
data. We present evidence that this agent plays a strategy that is incompatible
with human-data bootstrapped agents. This presents the first strong evidence of
multiple equilibria in Diplomacy and suggests that self-play alone may be
insufficient for achieving superhuman performance in Diplomacy.
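To make the training loop described above concrete, the sketch below shows one way the pieces could fit together, under explicit assumptions: regret matching stands in for the equilibrium search procedure, random candidate actions and a toy payoff function stand in for the learned policy proposal and value networks, and every name (regret_matching, propose_actions, best_response_action, payoff_fn) is illustrative rather than taken from the authors' implementation.

```python
"""Illustrative sketch only: regret matching as the equilibrium search,
random integers as "actions", and a toy payoff in place of the value network."""
import numpy as np

rng = np.random.default_rng(0)


def regret_matching(payoff, iters=1000):
    """Approximate equilibrium of a two-player zero-sum matrix game.

    payoff[i, j] is the payoff to player 0 when player 0 plays action i
    and player 1 plays action j. Returns (policy0, policy1, game value).
    """
    n, m = payoff.shape
    regrets0, regrets1 = np.zeros(n), np.zeros(m)
    avg0, avg1 = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        pos0, pos1 = np.maximum(regrets0, 0), np.maximum(regrets1, 0)
        p0 = pos0 / pos0.sum() if pos0.sum() > 0 else np.full(n, 1 / n)
        p1 = pos1 / pos1.sum() if pos1.sum() > 0 else np.full(m, 1 / m)
        u0 = payoff @ p1            # player 0's expected payoff per action
        u1 = -(p0 @ payoff)         # player 1's expected payoff per action
        regrets0 += u0 - p0 @ u0
        regrets1 += u1 - p1 @ u1
        avg0 += p0
        avg1 += p1
    p0, p1 = avg0 / iters, avg1 / iters
    return p0, p1, float(p0 @ payoff @ p1)


def payoff_fn(a, b):
    """Toy stand-in for evaluating a joint action with the value network."""
    return np.tanh((a - b) / 50.0)


def propose_actions(k=4):
    """Stand-in for the policy proposal network: sample k candidate actions."""
    return list(rng.integers(0, 100, size=k))


def best_response_action(payoff_of, opp_policy, opp_actions):
    """Double-oracle step (sketch): search for one extra action that is an
    approximate best response to the current restricted-game equilibrium."""
    candidates = rng.integers(0, 100, size=16)
    values = [sum(p * payoff_of(a, b) for p, b in zip(opp_policy, opp_actions))
              for a in candidates]
    return int(candidates[int(np.argmax(values))])


# One step of value iteration with a policy proposal network:
actions0, actions1 = propose_actions(), propose_actions()
payoff = np.array([[payoff_fn(a, b) for b in actions1] for a in actions0])
p0, p1, _ = regret_matching(payoff)

# Double oracle: add an approximate best response for each player, re-solve,
# and use the new equilibrium policy/value as training targets for the nets.
br0 = best_response_action(payoff_fn, p1, actions1)
br1 = best_response_action(lambda a, b: -payoff_fn(b, a), p0, actions0)
actions0.append(br0)
actions1.append(br1)
payoff = np.array([[payoff_fn(a, b) for b in actions1] for a in actions0])
target_p0, target_p1, target_value = regret_matching(payoff)
print("training-target value for player 0:", round(target_value, 3))
```

In the full method, the resulting target policy and target value would supervise the proposal and value networks, closing the value-iteration loop over self-play game states.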
Related papers
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
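The regularization idea can be sketched concretely (this is an illustration of KL-regularized reward maximization in general, not the exact DiL-piKL procedure): choose the policy that maximizes expected utility minus lambda times the KL divergence to the human-imitation anchor policy, which has the closed form pi(a) proportional to anchor(a) * exp(u(a) / lambda). The function name and example numbers below are hypothetical.

```python
import numpy as np

def kl_regularized_policy(action_utilities, anchor_policy, lam=1.0):
    """Maximizer of  E_pi[u] - lam * KL(pi || anchor)  over the simplex.

    Closed form: pi(a) proportional to anchor(a) * exp(u(a) / lam).
    Large lam keeps the policy close to the human-like anchor;
    small lam approaches pure reward maximization.
    """
    logits = np.log(anchor_policy) + np.asarray(action_utilities, dtype=float) / lam
    logits -= logits.max()                  # numerical stability
    policy = np.exp(logits)
    return policy / policy.sum()

anchor = np.array([0.7, 0.2, 0.1])           # what a human-imitation policy prefers
utilities = np.array([0.0, 0.5, 1.0])        # what maximizes reward
print(kl_regularized_policy(utilities, anchor, lam=2.0))   # stays near the anchor
print(kl_regularized_policy(utilities, anchor, lam=0.1))   # nearly argmax utility
```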
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy.
We propose four different policy fusion methods for combining pre-trained policies.
We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
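The summary does not spell out the four fusion methods, so the snippet below is only a generic illustration of fusing pre-trained policies that share an action space, using a weighted mixture and a weighted product of experts; neither is claimed to be one of the paper's methods, and all names are hypothetical.

```python
import numpy as np

def fuse_policies(action_probs, weights, method="mixture"):
    """Combine several action distributions over a shared action set.

    action_probs: (num_policies, num_actions) array of action distributions.
    """
    probs = np.asarray(action_probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if method == "mixture":            # weighted average of distributions
        fused = w @ probs
    elif method == "product":          # weighted geometric mean (product of experts)
        fused = np.exp(w @ np.log(probs + 1e-12))
    else:
        raise ValueError(method)
    return fused / fused.sum()

aggressive = [0.6, 0.3, 0.1]
defensive  = [0.1, 0.3, 0.6]
print(fuse_policies([aggressive, defensive], weights=[0.7, 0.3], method="mixture"))
print(fuse_policies([aggressive, defensive], weights=[0.7, 0.3], method="product"))
```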
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
- Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization [42.33734089361143]
We propose a technique for discovering diverse strategic policies in complex multi-agent games.
We derive a new algorithm, Reward-Randomized Policy Gradient (RPG)
RPG is able to discover multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas.
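A minimal sketch of the reward-randomization idea, under strong simplifications: each member of a population is trained on a randomly perturbed reward (a tiny REINFORCE learner on a one-step game stands in for full multi-agent policy gradient), and the resulting diverse policies are then evaluated, or in the full method fine-tuned, under the true reward. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_policy(reward_vector, steps=2000, lr=0.1):
    """Tiny softmax REINFORCE on a one-step game (stand-in for full RL training)."""
    logits = np.zeros(len(reward_vector))
    for _ in range(steps):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(len(probs), p=probs)
        grad = -probs
        grad[a] += 1.0                       # gradient of log pi(a) w.r.t. logits
        logits += lr * reward_vector[a] * grad
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

true_reward = np.array([1.0, 1.05, 0.2])     # two near-equivalent strategies

# Reward randomization: train each population member under a perturbed reward,
# which pushes different members toward different strategies.
population = [train_policy(true_reward + rng.normal(0, 0.5, size=3)) for _ in range(5)]
scores = [float(p @ true_reward) for p in population]
print(np.round(population, 2))               # distinct strategies emerge
print("best under the true reward:", np.round(population[int(np.argmax(scores))], 2))
```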
arXiv Detail & Related papers (2021-03-08T06:26:55Z)
- Human-Level Performance in No-Press Diplomacy via Equilibrium Search [29.858369754530905]
We describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via regret minimization.
We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and ranks in the top 2% of human players when playing anonymous games on a popular Diplomacy website.
arXiv Detail & Related papers (2020-10-06T01:28:34Z)
- Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
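The best-response-iteration idea in the title can be illustrated, much simplified, on a matrix game: repeatedly best-respond to the opponent's empirical average strategy and add the response to the population. This is essentially fictitious play, standing in for the paper's neural, sampled best-response procedure; the code and game below are illustrative only.

```python
import numpy as np

def best_response(payoff, opponent_mix):
    """Pure-strategy best response to a mixed opponent strategy in a matrix game."""
    return int(np.argmax(payoff @ opponent_mix))

def br_policy_iteration(payoff, iterations=50):
    """Fictitious-play-flavoured best response iteration on a zero-sum matrix game."""
    n, m = payoff.shape
    counts0, counts1 = np.ones(n), np.ones(m)
    for _ in range(iterations):
        counts0[best_response(payoff, counts1 / counts1.sum())] += 1
        counts1[best_response(-payoff.T, counts0 / counts0.sum())] += 1
    return counts0 / counts0.sum(), counts1 / counts1.sum()

# Rock-paper-scissors: the empirical averages drift toward the mixed equilibrium.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
print(br_policy_iteration(rps))
```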
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
- Deep RL Agent for a Real-Time Action Strategy Game [0.3867363075280543]
We introduce a reinforcement learning environment based on Heroic - Magic Duel, a 1 v 1 action strategy game.
Our main contribution is a deep reinforcement learning agent playing the game at a competitive level.
Our best self-play agent obtains around a 65% win rate against the existing AI and over a 50% win rate against a top human player.
arXiv Detail & Related papers (2020-02-15T01:09:56Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
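A compact sketch of a kNN episodic novelty bonus of this flavour, with loud assumptions: a fixed random projection replaces the embedding network that the paper trains with the inverse-dynamics objective, and the kernel and constants are illustrative, not the paper's.

```python
import numpy as np

class EpisodicNovelty:
    """kNN-based episodic intrinsic reward (sketch).

    States are embedded (here: a fixed random projection standing in for the
    inverse-dynamics-trained encoder) and compared with recent embeddings in
    episodic memory; rarely visited states get a larger bonus.
    """

    def __init__(self, obs_dim, embed_dim=8, k=5, eps=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(obs_dim, embed_dim)) / np.sqrt(obs_dim)
        self.memory = []
        self.k, self.eps = k, eps

    def reward(self, obs):
        emb = np.asarray(obs, dtype=float) @ self.proj
        bonus = 1.0
        if self.memory:
            dists = np.array([np.sum((emb - m) ** 2) for m in self.memory])
            knn = np.sort(dists)[: self.k]
            # Inverse-kernel similarity: many close neighbours -> small bonus.
            similarity = np.sum(self.eps / (knn + self.eps))
            bonus = 1.0 / np.sqrt(similarity + 1e-8)
        self.memory.append(emb)             # cleared at episode end in practice
        return float(bonus)

nov = EpisodicNovelty(obs_dim=4)
repeated = [1.0, 0.0, 0.0, 0.0]
for _ in range(4):
    print(round(nov.reward(repeated), 3))            # bonus shrinks as the state repeats
print(round(nov.reward([0.0, 0.0, 0.0, 9.0]), 3))    # novel state: much larger bonus
```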
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.