Learning to Play No-Press Diplomacy with Best Response Policy Iteration
- URL: http://arxiv.org/abs/2006.04635v4
- Date: Tue, 4 Jan 2022 15:11:59 GMT
- Title: Learning to Play No-Press Diplomacy with Best Response Policy Iteration
- Authors: Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian
Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat,
Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, and Yoram
Bachrach
- Abstract summary: We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
- Score: 31.367850729299665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable
progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The
purely adversarial nature of such games allows for conceptually simple and
principled application of RL methods. However, real-world settings are
many-agent, and agent interactions are complex mixtures of common-interest and
competitive aspects. We consider Diplomacy, a 7-player board game designed to
accentuate dilemmas resulting from many-agent interactions. It also features a
large combinatorial action space and simultaneous moves, which are challenging
for RL algorithms. We propose a simple yet effective approximate best response
operator, designed to handle large combinatorial action spaces and simultaneous
moves. We also introduce a family of policy iteration methods that approximate
fictitious play. With these methods, we successfully apply RL to Diplomacy: we
show that our agents convincingly outperform the previous state-of-the-art, and
game theoretic equilibrium analysis shows that the new process yields
consistent improvements.
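The abstract describes two components: an approximate best-response operator for the large, simultaneous-move action space, and a family of policy-iteration methods that approximate fictitious play. As a hedged illustration of the underlying fictitious-play structure only (not the paper's actual algorithm, which uses neural network policies, sampled best responses, and Diplomacy's combinatorial action space), here is exact fictitious play on a toy two-player zero-sum matrix game; the function names and the rock-paper-scissors payoff matrix are illustrative assumptions:

```python
# Illustrative only: exact fictitious play on a small zero-sum matrix game.
# The paper's BRPI methods replace the exact best response below with a
# learned, sampled approximation suited to Diplomacy's action space.
import numpy as np

# Rock-paper-scissors payoffs for the row player (the column player receives -A).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def best_response(expected_payoffs):
    """Pure-strategy best response: all probability mass on the best action."""
    br = np.zeros_like(expected_payoffs)
    br[np.argmax(expected_payoffs)] = 1.0
    return br

def fictitious_play(A, iterations=5000):
    n, m = A.shape
    avg_row = np.full(n, 1.0 / n)   # empirical average of row player's play
    avg_col = np.full(m, 1.0 / m)   # empirical average of column player's play
    for t in range(1, iterations + 1):
        # Each player best-responds to the opponent's historical average policy.
        br_row = best_response(A @ avg_col)
        br_col = best_response(-(avg_row @ A))
        # Fold the new best responses into the running averages.
        avg_row += (br_row - avg_row) / (t + 1)
        avg_col += (br_col - avg_col) / (t + 1)
    return avg_row, avg_col

if __name__ == "__main__":
    row, col = fictitious_play(A)
    print("approximate equilibrium:", np.round(row, 3), np.round(col, 3))
```

In the paper's setting the exact best-response step above is replaced by a learned approximation trained against opponents sampled from earlier policy iterates; the toy loop is only meant to show the improve-then-average pattern that the method scales up.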
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, OS methods' relative performance declines, suggesting that in the limit OS methods may not perform well.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z)
- ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy (a toy sketch of this kind of KL regularization appears after this list).
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Reinforcement Learning Agents in Colonel Blotto [0.0]
We focus on a specific instance of agent-based models, which uses reinforcement learning (RL) to train the agent how to act in its environment.
We find that the RL agent handily beats a single opponent, and still performs quite well when the number of opponents is increased.
We also analyze the RL agent and examine the strategies it has arrived at by looking at the actions to which it assigns the highest and lowest Q-values.
arXiv Detail & Related papers (2022-04-04T16:18:01Z)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games [0.0]
Two player zero sum simultaneous action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two player zero sum simultaneous action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
arXiv Detail & Related papers (2021-10-10T16:03:44Z)
- No-Press Diplomacy from Scratch [26.36204634856853]
We describe an algorithm for action exploration and equilibrium approximation in games with superhuman action spaces.
We train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance.
We extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.
arXiv Detail & Related papers (2021-10-06T17:12:50Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Learning Monopoly Gameplay: A Hybrid Model-Free Deep Reinforcement Learning and Imitation Learning Approach [31.066718635447746]
Reinforcement Learning (RL) relies on an agent interacting with an environment to maximize the cumulative sum of rewards received by it.
In the multi-player game of Monopoly, players have to make several decisions every turn, which involve complex actions such as making trades.
This paper introduces a Hybrid Model-Free Deep RL (DRL) approach that is capable of playing and learning winning strategies of Monopoly.
arXiv Detail & Related papers (2021-03-01T01:40:02Z)
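For the DiL-piKL entry above, which regularizes a reward-maximizing policy toward a human imitation-learned policy, the following is a rough single-decision sketch of that kind of KL-regularized objective. It is an assumption-laden illustration, not that paper's algorithm: it only uses the closed-form solution of maximizing expected value minus lambda times the KL divergence to an anchor policy, and all names and numbers are hypothetical.

```python
# Hedged sketch (not DiL-piKL itself): for a single decision, the policy that
# maximizes  E_pi[Q] - lam * KL(pi || anchor)  is  pi(a) ~ anchor(a) * exp(Q(a)/lam).
import numpy as np

def kl_regularized_policy(q_values, anchor, lam):
    """Trade off reward maximization against staying close to an anchor policy.

    lam -> 0   approaches the greedy argmax policy;
    lam -> inf approaches the anchor (e.g. a human-imitation) policy.
    """
    logits = np.log(anchor) + q_values / lam
    logits -= logits.max()          # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical action values and a hypothetical human-like anchor policy.
q = np.array([1.0, 0.5, 0.0])
anchor = np.array([0.2, 0.5, 0.3])
for lam in (0.1, 1.0, 10.0):
    print(f"lam={lam}:", np.round(kl_regularized_policy(q, anchor, lam), 3))
```

The lambda knob mirrors the trade-off that summary describes: small values chase reward, large values stay close to the human-like anchor.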