Learning to Play No-Press Diplomacy with Best Response Policy Iteration
- URL: http://arxiv.org/abs/2006.04635v4
- Date: Tue, 4 Jan 2022 15:11:59 GMT
- Title: Learning to Play No-Press Diplomacy with Best Response Policy Iteration
- Authors: Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian
Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat,
Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, and Yoram
Bachrach
- Abstract summary: We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
- Score: 31.367850729299665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable
progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The
purely adversarial nature of such games allows for conceptually simple and
principled application of RL methods. However, real-world settings are
many-agent, and agent interactions are complex mixtures of common-interest and
competitive aspects. We consider Diplomacy, a 7-player board game designed to
accentuate dilemmas resulting from many-agent interactions. It also features a
large combinatorial action space and simultaneous moves, which are challenging
for RL algorithms. We propose a simple yet effective approximate best response
operator, designed to handle large combinatorial action spaces and simultaneous
moves. We also introduce a family of policy iteration methods that approximate
fictitious play. With these methods, we successfully apply RL to Diplomacy: we
show that our agents convincingly outperform the previous state-of-the-art, and
game theoretic equilibrium analysis shows that the new process yields
consistent improvements.
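The abstract describes two components: an approximate best-response operator for the large, simultaneous-move action space, and a family of policy-iteration methods that approximate fictitious play. As a hedged illustration of the underlying fictitious-play structure only (not the paper's actual algorithm, which uses neural network policies, sampled best responses, and Diplomacy's combinatorial action space), here is exact fictitious play on a toy two-player zero-sum matrix game; the function names and the rock-paper-scissors payoff matrix are illustrative assumptions:

```python
# Illustrative only: exact fictitious play on a small zero-sum matrix game.
# The paper's BRPI methods replace the exact best response below with a
# learned, sampled approximation suited to Diplomacy's action space.
import numpy as np

# Rock-paper-scissors payoffs for the row player (the column player receives -A).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def best_response(expected_payoffs):
    """Pure-strategy best response: all probability mass on the best action."""
    br = np.zeros_like(expected_payoffs)
    br[np.argmax(expected_payoffs)] = 1.0
    return br

def fictitious_play(A, iterations=5000):
    n, m = A.shape
    avg_row = np.full(n, 1.0 / n)   # empirical average of row player's play
    avg_col = np.full(m, 1.0 / m)   # empirical average of column player's play
    for t in range(1, iterations + 1):
        # Each player best-responds to the opponent's historical average policy.
        br_row = best_response(A @ avg_col)
        br_col = best_response(-(avg_row @ A))
        # Fold the new best responses into the running averages.
        avg_row += (br_row - avg_row) / (t + 1)
        avg_col += (br_col - avg_col) / (t + 1)
    return avg_row, avg_col

if __name__ == "__main__":
    row, col = fictitious_play(A)
    print("approximate equilibrium:", np.round(row, 3), np.round(col, 3))
```

In the paper's setting the exact best-response step above is replaced by a learned approximation trained against opponents sampled from earlier policy iterates; the toy loop is only meant to show the improve-then-average pattern that the method scales up.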
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, OS methods' relative performance declines, suggesting that in the limit OS methods may not perform well.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z)
- ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy (a toy sketch of this kind of KL regularization appears after this list).
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Reinforcement Learning Agents in Colonel Blotto [0.0]
We focus on a specific instance of agent-based models, which uses reinforcement learning (RL) to train the agent how to act in its environment.
We find that the RL agent handily beats a single opponent, and still performs quite well when the number of opponents is increased.
We also analyze the RL agent and examine the strategies it has arrived at by looking at the actions to which it assigns the highest and lowest Q-values.
arXiv Detail & Related papers (2022-04-04T16:18:01Z)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games [0.0]
Two player zero sum simultaneous action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two player zero sum simultaneous action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
arXiv Detail & Related papers (2021-10-10T16:03:44Z)
- No-Press Diplomacy from Scratch [26.36204634856853]
We describe an algorithm for action exploration and equilibrium approximation in games with superhuman action spaces.
We train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance.
We extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.
arXiv Detail & Related papers (2021-10-06T17:12:50Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Learning Monopoly Gameplay: A Hybrid Model-Free Deep Reinforcement Learning and Imitation Learning Approach [31.066718635447746]
Reinforcement Learning (RL) relies on an agent interacting with an environment to maximize the cumulative sum of rewards received by it.
In the multi-player game of Monopoly, players have to make several decisions every turn, which involve complex actions such as making trades.
This paper introduces a Hybrid Model-Free Deep RL (DRL) approach that is capable of playing and learning winning strategies of Monopoly.
arXiv Detail & Related papers (2021-03-01T01:40:02Z)
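For the DiL-piKL entry above, which regularizes a reward-maximizing policy toward a human imitation-learned policy, the following is a rough single-decision sketch of that kind of KL-regularized objective. It is an assumption-laden illustration, not that paper's algorithm: it only uses the closed-form solution of maximizing expected value minus lambda times the KL divergence to an anchor policy, and all names and numbers are hypothetical.

```python
# Hedged sketch (not DiL-piKL itself): for a single decision, the policy that
# maximizes  E_pi[Q] - lam * KL(pi || anchor)  is  pi(a) ~ anchor(a) * exp(Q(a)/lam).
import numpy as np

def kl_regularized_policy(q_values, anchor, lam):
    """Trade off reward maximization against staying close to an anchor policy.

    lam -> 0   approaches the greedy argmax policy;
    lam -> inf approaches the anchor (e.g. a human-imitation) policy.
    """
    logits = np.log(anchor) + q_values / lam
    logits -= logits.max()          # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical action values and a hypothetical human-like anchor policy.
q = np.array([1.0, 0.5, 0.0])
anchor = np.array([0.2, 0.5, 0.3])
for lam in (0.1, 1.0, 10.0):
    print(f"lam={lam}:", np.round(kl_regularized_policy(q, anchor, lam), 3))
```

The lambda knob mirrors the trade-off that summary describes: small values chase reward, large values stay close to the human-like anchor.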