JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player
Zero-Sum Games
- URL: http://arxiv.org/abs/2308.04719v1
- Date: Wed, 9 Aug 2023 05:48:58 GMT
- Title: JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player
Zero-Sum Games
- Authors: Yang Li and Kun Xiong and Yingping Zhang and Jiangcheng Zhu and
Stephen McAleer and Wei Pan and Jun Wang and Zonghong Dai and Yaodong Yang
- Abstract summary: This paper focuses on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi.
We introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium.
We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41% win rate against human players.
- Score: 15.500508239382583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an empirical exploration of non-transitivity in
perfect-information games, specifically focusing on Xiangqi, a traditional
Chinese board game comparable in game-tree complexity to chess and shogi. By
analyzing over 10,000 records of human Xiangqi play, we highlight the existence
of both transitive and non-transitive elements within the game's strategic
structure. To address non-transitivity, we introduce the JiangJun algorithm, an
innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space
Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate
the algorithm empirically using a WeChat mini program and achieve a Master
level with a 99.41% win rate against human players. The algorithm's
effectiveness in overcoming non-transitivity is confirmed by a range of
metrics, such as relative population performance and visualization results. Our
project site is available at
https://sites.google.com/view/jiangjun-site/.
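The PSRO loop the abstract describes can be sketched on a toy non-transitive matrix game, with rock-paper-scissors standing in for Xiangqi. This is an illustrative assumption, not the paper's implementation: the MCTS best-response oracle is replaced by exhaustive enumeration of pure strategies, and the meta-Nash is solved exactly by linear programming.

```python
import numpy as np
from scipy.optimize import linprog

# Toy non-transitive game standing in for Xiangqi: rock-paper-scissors
# payoffs for the row player (illustrative only, not from the paper).
PAYOFF = np.array([[ 0.0, -1.0,  1.0],
                   [ 1.0,  0.0, -1.0],
                   [-1.0,  1.0,  0.0]])

def meta_nash(M):
    """Maximin (Nash) strategy of a zero-sum matrix game via LP:
    maximize v subject to (M^T x)_j >= v for every column j, x a distribution."""
    n = M.shape[0]
    c = np.zeros(n + 1)
    c[-1] = -1.0                                    # minimize -v == maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])       # v - (M^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # sum(x) == 1
    bounds = [(0, None)] * n + [(None, None)]       # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:n]

def psro(num_iters=5):
    """PSRO / double-oracle loop on a matrix game: keep a population of pure
    strategies and repeatedly add a best response to the current meta-Nash."""
    population = [0]                                # start from "rock" only
    for _ in range(num_iters):
        M = PAYOFF[np.ix_(population, population)]  # restricted payoff matrix
        sigma = meta_nash(M)
        values = PAYOFF[:, population] @ sigma      # payoff of each pure strategy
        best_response = int(np.argmax(values))
        if best_response not in population:
            population.append(best_response)
    return population, meta_nash(PAYOFF[np.ix_(population, population)])

population, sigma = psro()
```

Because the game is non-transitive, the population grows until it contains the full strategic cycle, after which the meta-Nash mixes uniformly over it; no single pure strategy is ever exploitable against the final mixture.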
Related papers
- Tree Search for Simultaneous Move Games via Equilibrium Approximation [13.89302587642183]
We study the class of simultaneous-move games.
Both agents know the game state with the exception of the opponent's move.
In this study we answer the question: can we take tree search algorithms trained through self-play from perfect information settings and adapt them to simultaneous move games without significant loss of performance?
arXiv Detail & Related papers (2024-06-14T21:02:35Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium-convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- On the complexity of Dark Chinese Chess [5.019685897194575]
Dark Chinese chess combines some of the most complicated aspects of board and card games.
This paper provides a complexity analysis for the game of dark Chinese chess.
arXiv Detail & Related papers (2021-12-06T13:08:53Z)
- Measuring the Non-Transitivity in Chess [19.618609913302855]
We quantify the non-transitivity in Chess through real-world data from human players.
There exists a strong connection between the degree of non-transitivity and the progression of a Chess player's rating.
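The kind of measurement this paper performs can be illustrated with a small sketch: given empirical win rates among strategies, count rock-paper-scissors-style triples in the induced "beats" relation. The win-rate matrix and the 0.5 threshold below are invented for illustration and are not taken from the paper.

```python
from itertools import combinations
import numpy as np

# Hypothetical empirical win rates among four strategies; entry W[i, j] is
# how often strategy i beats strategy j. Values are illustrative only.
W = np.array([
    [0.5, 0.7, 0.4, 0.6],
    [0.3, 0.5, 0.7, 0.6],
    [0.6, 0.3, 0.5, 0.6],
    [0.4, 0.4, 0.4, 0.5],
])

def beats(i, j):
    """i beats j when its empirical win rate against j exceeds 0.5."""
    return W[i, j] > 0.5

def count_3cycles(n):
    """Count cyclic triples: i beats j, j beats k, and k beats i (in either
    orientation). A triple is cyclic iff no strategy in it beats both others."""
    cycles = 0
    for i, j, k in combinations(range(n), 3):
        wins = [beats(i, j), beats(j, k), beats(k, i)]
        # In a 3-node tournament, the triple is a cycle exactly when all three
        # directed edges point the same way around it.
        if all(wins) or not any(wins):
            cycles += 1
    return cycles
```

Aggregating such cycle counts over a large pool of rated players is one way a "degree of non-transitivity" can be related to rating progression.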
arXiv Detail & Related papers (2021-10-22T12:15:42Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Computing Nash Equilibria in Multiplayer DAG-Structured Stochastic Games with Persistent Imperfect Information [1.7132914341329848]
We present an algorithm for approximating Nash equilibrium in multiplayer general-sum games with persistent imperfect information.
Using a new procedure, we demonstrate that our algorithm computes a strategy that closely approximates a Nash equilibrium in this game.
arXiv Detail & Related papers (2020-10-26T19:27:26Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.