Mastering the Game of Stratego with Model-Free Multiagent Reinforcement
Learning
- URL: http://arxiv.org/abs/2206.15378v1
- Date: Thu, 30 Jun 2022 15:53:19 GMT
- Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov,
Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch,
Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang,
Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr
Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste
Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre,
Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis
Hassabis, Karl Tuyls
- Abstract summary: Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered.
Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome.
DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform.
- Score: 86.37438204416435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the
imperfect information game Stratego from scratch, up to a human expert level.
Stratego is one of the few iconic board games that Artificial Intelligence (AI)
has not yet mastered. This popular game has an enormous game tree on the order
of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the
additional complexity of requiring decision-making under imperfect information,
similar to Texas hold'em poker, which has a significantly smaller game tree (on
the order of $10^{164}$ nodes). Decisions in Stratego are made over a large
number of discrete actions with no obvious link between action and outcome.
Episodes are long, often with hundreds of moves before a player wins, and
situations in Stratego cannot easily be broken down into manageably sized
sub-problems as in poker. For these reasons, Stratego has been a grand
challenge for the field of AI for decades, and existing AI methods barely reach
an amateur level of play. DeepNash uses a game-theoretic, model-free deep
reinforcement learning method, without search, that learns to master Stratego
via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component
of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling'
around it, by directly modifying the underlying multi-agent learning dynamics.
DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a
yearly (2022) and all-time top-3 rank on the Gravon games platform, competing
with human expert players.
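The core idea behind R-NaD, as described above, is to modify the learning dynamics so that policies converge to an approximate Nash equilibrium instead of cycling around it: the reward is regularized toward a reference ("anchor") policy, the dynamics are run to the fixed point of the regularized game, and the anchor is then updated to that fixed point. The following is a minimal sketch of this idea on a small zero-sum matrix game (rock-paper-scissors), using entropic mirror descent with a KL pull toward the anchor; the function name, hyperparameters, and the mirror-descent inner loop are illustrative assumptions, not DeepNash's actual implementation.

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row player (zero-sum game).
# Without regularization, simple learning dynamics cycle around the
# uniform Nash equilibrium instead of converging to it.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnad_sketch(A, eta=0.2, lr=0.05, inner=3000, outer=30):
    """Illustrative regularized-dynamics loop in the spirit of R-NaD.

    eta   -- strength of the KL regularization toward the anchor policy
    inner -- mirror-descent steps to approximate the regularized fixed point
    outer -- anchor updates (each moves the anchor to the current policies)
    """
    n, m = A.shape
    # Start away from equilibrium so the dynamics actually have to converge.
    x = np.array([0.6, 0.3, 0.1])   # row player's mixed strategy
    y = np.array([0.2, 0.5, 0.3])   # column player's mixed strategy
    x_ref, y_ref = x.copy(), y.copy()
    for _ in range(outer):
        for _ in range(inner):
            # Regularized payoff gradients: game reward minus a KL-style
            # pull toward the current anchor policy.
            gx = A @ y - eta * (np.log(x) - np.log(x_ref))
            gy = -A.T @ x - eta * (np.log(y) - np.log(y_ref))
            # Entropic mirror-descent update keeps policies on the simplex.
            x = softmax(np.log(x) + lr * gx)
            y = softmax(np.log(y) + lr * gy)
        # Outer step: move the anchor to the (approximate) fixed point.
        x_ref, y_ref = x.copy(), y.copy()
    return x, y

x, y = rnad_sketch(A)
```

For rock-paper-scissors the unique Nash equilibrium is the uniform mixture, so both policies should end up near (1/3, 1/3, 1/3), and the exploitability `(A @ y).max() + (-(A.T @ x)).max()` should be close to zero. The regularization term is what breaks the cycling: it makes the regularized game strongly monotone, so each inner loop has a unique attracting fixed point.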
Related papers
- Are AlphaZero-like Agents Robust to Adversarial Perturbations? [73.13944217915089]
AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin.
We ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions.
We develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space.
arXiv Detail & Related papers (2022-11-07T18:43:25Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- DecisionHoldem: Safe Depth-Limited Solving With Diverse Opponents for Imperfect-Information Games [31.26667266662521]
DecisionHoldem is a high-level AI for heads-up no-limit Texas hold'em with safe depth-limited subgame solving.
We release the source codes and tools of DecisionHoldem to promote AI development in imperfect-information games.
arXiv Detail & Related papers (2022-01-27T15:35:49Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes)
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Online Double Oracle [20.291016382324777]
This paper proposes new learning algorithms in two-player zero-sum games where the number of pure strategies is huge or infinite.
Our method achieves a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$ in the self-play setting, where $k$ is not the size of the game.
arXiv Detail & Related papers (2021-03-13T19:48:27Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include algorithm's regret guarantees that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Suphx: Mastering Mahjong with Deep Reinforcement Learning [114.68233321904623]
We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques.
Suphx has demonstrated stronger performance than most top human players in terms of stable rank.
This is the first time that a computer program outperforms most top human players in Mahjong.
arXiv Detail & Related papers (2020-03-30T16:18:16Z)
- Deep RL Agent for a Real-Time Action Strategy Game [0.3867363075280543]
We introduce a reinforcement learning environment based on Heroic - Magic Duel, a 1v1 action strategy game.
Our main contribution is a deep reinforcement learning agent playing the game at a competitive level.
Our best self-play agent obtains around a $65\%$ win rate against the existing AI and over a $50\%$ win rate against a top human player.
arXiv Detail & Related papers (2020-02-15T01:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.