Mastering Terra Mystica: Applying Self-Play to Multi-agent Cooperative
Board Games
- URL: http://arxiv.org/abs/2102.10540v1
- Date: Sun, 21 Feb 2021 07:53:34 GMT
- Title: Mastering Terra Mystica: Applying Self-Play to Multi-agent Cooperative
Board Games
- Authors: Luis Perez
- Abstract summary: In this paper, we explore and compare multiple algorithms for solving the complex strategy game of Terra Mystica.
We apply these breakthroughs to a novel state-representation of TM with the goal of creating an AI that will rival human players.
In the end, we discuss the success and shortcomings of this method by comparing against multiple baselines and typical human scores.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore and compare multiple algorithms for solving the
complex strategy game of Terra Mystica, hereafter abbreviated as TM. Previous
work in the area of super-human game-play using AI has proven effective, with
recent breakthroughs for generic algorithms in games such as Go, Chess, and
Shogi [AlphaZero]. We directly apply these breakthroughs to a novel
state-representation of TM with the goal of creating an AI that will rival
human players. Specifically, we present the initial results of applying
AlphaZero to this state-representation and briefly analyze the strategies it
develops. We call this modified algorithm, paired with our novel
state-representation, AlphaTM. In the end, we discuss the successes and
shortcomings of this method by comparing against multiple baselines and
typical human scores. All code used for this paper is available on GitHub:
https://github.com/kandluis/terrazero
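The abstract describes applying AlphaZero-style self-play to a state-representation of TM. As a rough illustration only, here is a minimal sketch of a self-play data-collection loop on a toy stand-in game; all names and the game itself are invented here, and a real AlphaZero agent would select moves via MCTS guided by a policy/value network rather than the stub policy below:

```python
import random

# Toy stand-in game: players alternately add 1 or 2 to a counter; whoever
# makes the total reach exactly 10 wins. (Hypothetical example -- the paper's
# actual TM state-representation is far richer.)

def legal_moves(state):
    total, _player = state
    return [m for m in (1, 2) if total + m <= 10]

def apply_move(state, move):
    total, player = state
    return (total + move, 1 - player)

def winner(state):
    total, player = state
    # If the total is 10, the player who just moved (the other one) has won.
    return (1 - player) if total == 10 else None

def self_play_episode(policy):
    """Play one game against itself, recording (state, move) training pairs."""
    state, history = (0, 0), []
    while winner(state) is None:
        move = policy(state)            # real AlphaZero: MCTS + network here
        history.append((state, move))
        state = apply_move(state, move)
    return history, winner(state)

random.seed(0)
history, win = self_play_episode(lambda s: random.choice(legal_moves(s)))
```

Each (state, move) pair, together with the final winner, is the kind of record that self-play training would feed back into the network as policy and value targets.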
Related papers
- AlphaZero Gomoku [9.434566356382529]
We broaden the use of AlphaZero to Gomoku, an age-old tactical board game also referred to as "Five in a Row".
Our tests demonstrate AlphaZero's versatility in adapting to games other than Go.
arXiv Detail & Related papers (2023-09-04T00:20:06Z)
- Targeted Search Control in AlphaZero for Effective Policy Improvement [93.30151539224144]
We introduce Go-Exploit, a novel search control strategy for AlphaZero.
Go-Exploit samples the start state of its self-play trajectories from an archive of states of interest.
Go-Exploit learns with a greater sample efficiency than standard AlphaZero.
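The start-state sampling described above can be sketched as follows; this is a hedged illustration with invented names and parameters, not Go-Exploit's actual interface:

```python
import random

# Hypothetical sketch: with some probability, begin a self-play trajectory
# from an archived "state of interest" rather than the game's initial state.
# The archive contents, probability, and all names are illustrative
# assumptions.
def choose_start_state(archive, initial_state, p_archive=0.8):
    """Return a start state for the next self-play game."""
    if archive and random.random() < p_archive:
        return random.choice(archive)  # revisit an interesting mid-game state
    return initial_state               # otherwise start from the beginning

random.seed(1)
starts = [choose_start_state(["mid_game_a", "mid_game_b"], "initial")
          for _ in range(100)]
```

Starting self-play from varied, informative states is what lets such an agent gather training signal from positions it would rarely reach from the opening alone.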
arXiv Detail & Related papers (2023-02-23T22:50:24Z)
- Generalised agent for solving higher board states of tic tac toe using Reinforcement Learning [0.0]
This study is aimed at providing a generalized algorithm for higher board states of tic tac toe to make precise moves in a short period.
The idea is to pose the tic tac toe game as a well-posed learning problem.
The study and its results are promising, giving a high win to draw ratio with each epoch of training.
arXiv Detail & Related papers (2022-12-23T10:58:27Z)
- Are AlphaZero-like Agents Robust to Adversarial Perturbations? [73.13944217915089]
AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin.
We ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions.
We develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space.
arXiv Detail & Related papers (2022-11-07T18:43:25Z)
- DanZero: Mastering GuanDan Game with Reinforcement Learning [121.93690719186412]
Card game AI has always been a hot topic in the research of artificial intelligence.
In this paper, we develop an AI program for a more complex card game, GuanDan.
We propose DanZero, the first AI program for GuanDan, using reinforcement learning techniques.
arXiv Detail & Related papers (2022-10-31T06:29:08Z)
- Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z)
- Combining Off and On-Policy Training in Model-Based Reinforcement Learning [77.34726150561087]
We propose a way to obtain off-policy targets using data from simulated games in MuZero.
Our results show that these targets speed up the training process and lead to faster convergence and higher rewards.
arXiv Detail & Related papers (2021-02-24T10:47:26Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.