AlphaZero-Inspired General Board Game Learning and Playing
- URL: http://arxiv.org/abs/2204.13307v1
- Date: Thu, 28 Apr 2022 07:04:14 GMT
- Title: AlphaZero-Inspired General Board Game Learning and Playing
- Authors: Johannes Scheiermann and Wolfgang Konen
- Abstract summary: Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning.
In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents.
We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era
in game learning and deep reinforcement learning. While the achievements of
AlphaGo and AlphaZero - playing Go and other complex games at superhuman level
- are truly impressive, these architectures have the drawback that they are
very complex and require high computational resources. Many researchers are
looking for methods that are similar to AlphaZero, but have lower computational
demands and are thus more easily reproducible. In this paper, we pick an
important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning
stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS
for the first time around RL n-tuple networks to create versatile agents that
at the same time keep computational demands low. We apply this new
architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and
show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In
particular, we present results that this AlphaZero-inspired agent is the first
one trained on standard hardware (no GPU or TPU) to beat the very strong
Othello program Edax up to and including level 7 (where most other algorithms
could only defeat Edax up to level 2).
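The core idea - an AlphaZero-style MCTS planning stage wrapped around a pre-trained RL agent that supplies cheap leaf-value estimates - can be illustrated with a minimal sketch. Everything below (the Game interface, the agent's value() method, the exploration constant) is an assumption for illustration, not the authors' actual code:

```python
import math

C_PUCT = 1.0  # exploration constant (assumed value)

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def mcts_search(root_state, game, agent, n_iterations=100):
    """MCTS where leaves are scored by the wrapped agent, not by rollouts."""
    root = Node(root_state)
    for _ in range(n_iterations):
        node, path = root, [root]
        # Selection: descend with a PUCT-style rule until reaching a leaf.
        while node.children:
            node = max(
                node.children.values(),
                key=lambda c: c.q()
                + C_PUCT * math.sqrt(node.visits) / (1 + c.visits),
            )
            path.append(node)
        # Expansion: add one child per legal action, if non-terminal.
        if not game.is_terminal(node.state):
            for a in game.legal_actions(node.state):
                node.children[a] = Node(game.next_state(node.state, a))
        # Evaluation: the RL agent's learned value replaces a rollout.
        value = agent.value(node.state)
        # Backpropagation; signs flip for alternating two-player games.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value
    # As in AlphaZero, act according to visit counts at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Replacing random rollouts with the wrapped agent's learned value keeps per-iteration cost low, which is what makes training on standard hardware (no GPU or TPU) plausible when the value function is a cheap n-tuple lookup rather than a deep network.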
Related papers
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning [65.36326734799587]
We present a novel subgame curriculum learning framework for zero-sum games.
It adopts an adaptive initial state distribution by resetting agents to some previously visited states.
We derive a subgame selection metric that approximates the squared distance to NE values.
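A hedged sketch of this idea, assuming a simple squared value gap as a stand-in for the paper's NE-distance metric:

```python
import random

class SubgameCurriculum:
    def __init__(self, default_init_state, p_archive=0.7):
        self.archive = []            # (state, weight) pairs of visited states
        self.default = default_init_state
        self.p_archive = p_archive   # chance of resetting into the archive

    def add(self, state, v_estimate, v_target):
        # Squared value gap as a proxy for squared distance to the NE value;
        # the small floor keeps the sampling weights well-defined.
        self.archive.append((state, (v_target - v_estimate) ** 2 + 1e-6))

    def sample_start_state(self):
        if self.archive and random.random() < self.p_archive:
            states, weights = zip(*self.archive)
            return random.choices(states, weights=weights, k=1)[0]
        return self.default  # otherwise use the game's standard start
```

States whose values are still far from their targets get revisited more often, focusing self-play where learning is incomplete.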
arXiv Detail & Related papers (2023-10-07T13:09:37Z) - AlphaZero Gomoku [9.434566356382529]
We broaden the use of AlphaZero to Gomoku, an age-old tactical board game also referred to as "Five in a Row".
Our tests demonstrate AlphaZero's versatility in adapting to games other than Go.
arXiv Detail & Related papers (2023-09-04T00:20:06Z)
- SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
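A speculative sketch of such a pipeline, with the LLM call left as a stub and the prompt wording purely illustrative (the paper's actual prompts and question-answering structure are not reproduced here):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat/completion model here")

def choose_action(paper_context: str, observation: str, actions: list[str]) -> str:
    prompt = (
        f"Game manual (distilled from the paper):\n{paper_context}\n\n"
        f"Current observation:\n{observation}\n\n"
        f"Valid actions: {', '.join(actions)}\n"
        "Think step by step about which action best advances the goals "
        "described in the manual, then end with a single action name."
    )
    answer = llm(prompt)
    # Take the last valid action mentioned, which usually follows the reasoning.
    mentioned = [a for a in actions if a in answer]
    return mentioned[-1] if mentioned else actions[0]
```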
arXiv Detail & Related papers (2023-05-24T18:14:35Z)
- Targeted Search Control in AlphaZero for Effective Policy Improvement [93.30151539224144]
We introduce Go-Exploit, a novel search control strategy for AlphaZero.
Go-Exploit samples the start state of its self-play trajectories from an archive of states of interest.
Go-Exploit learns with a greater sample efficiency than standard AlphaZero.
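A minimal sketch of the start-state sampling, with the archive policy and mixing probability as assumptions:

```python
import random

P_FROM_ARCHIVE = 0.8  # fraction of games started from the archive (assumed)
archive = []          # states of interest collected during self-play

def start_state(initial_state):
    """Pick where the next self-play game begins."""
    if archive and random.random() < P_FROM_ARCHIVE:
        return random.choice(archive)  # revisit an interesting state
    return initial_state               # or start from the standard position
```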
arXiv Detail & Related papers (2023-02-23T22:50:24Z)
- On Efficient Reinforcement Learning for Full-length Game of StarCraft II [21.768578136029987]
We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks.
On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI.
We improve our architecture to train the agent against the cheating-level AIs and achieve win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively.
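An illustrative (not the authors') sketch of the two-level control flow, with a made-up macro-action table:

```python
MACRO_ACTIONS = {  # made-up macro-action table for illustration
    "expand_base": ["build_worker", "build_worker", "build_depot"],
    "tech_up":     ["build_gas", "build_core"],
    "army_push":   ["train_unit", "train_unit", "attack"],
}

def env_step(state, command):
    raise NotImplementedError("forward the low-level command to the game here")

def act(high_level_policy, state):
    macro = high_level_policy(state)      # the learned policy picks a macro
    for command in MACRO_ACTIONS[macro]:  # which expands to low-level steps
        state = env_step(state, command)
    return state
```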
arXiv Detail & Related papers (2022-09-23T12:24:21Z)
- Neural Networks for Chess [2.055949720959582]
AlphaZero, Leela Chess Zero and Stockfish NNUE revolutionized computer chess.
This book gives a complete introduction to the technical inner workings of such engines.
arXiv Detail & Related papers (2022-09-03T22:17:16Z)
- An AlphaZero-Inspired Approach to Solving Search Problems [63.24965775030674]
We adapt the methods and techniques used in AlphaZero for solving search problems.
We describe possible representations in terms of easy-instance solvers and self-reductions.
We also describe a version of Monte Carlo tree search adapted for search problems.
arXiv Detail & Related papers (2022-07-02T23:39:45Z)
- Final Adaptation Reinforcement Learning for N-Player Games [0.0]
This paper covers n-tuple-based reinforcement learning (RL) algorithms for games.
We present new algorithms for TD-, SARSA- and Q-learning which work seamlessly on various games with an arbitrary number of players.
We add a new element called Final Adaptation RL (FARL) to all these algorithms.
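A hedged sketch of an n-tuple value function trained by TD(0) with a final-adaptation step toward the true terminal reward, in the spirit of this summary; the n-tuple layout and the exact FARL update are simplified assumptions:

```python
from collections import defaultdict

class NTupleValue:
    def __init__(self, n_tuples, alpha=0.01):
        self.n_tuples = n_tuples           # each tuple lists board-cell indices
        self.weights = defaultdict(float)  # (tuple_id, local_pattern) -> weight
        self.alpha = alpha

    def value(self, board):
        # One cheap table lookup per n-tuple, summed; no deep network needed.
        return sum(self.weights[(i, tuple(board[j] for j in t))]
                   for i, t in enumerate(self.n_tuples))

    def td_update(self, board, target):
        delta = self.alpha * (target - self.value(board))
        for i, t in enumerate(self.n_tuples):
            self.weights[(i, tuple(board[j] for j in t))] += delta

def train_episode(vf, game, policy, gamma=1.0):
    s, prev = game.initial_state(), None
    while not game.is_terminal(s):
        prev, s = s, game.next_state(s, policy(s))
        vf.td_update(prev, gamma * vf.value(s))  # standard TD(0) step
    # Final adaptation: pull the last position toward the true final reward.
    if prev is not None:
        vf.td_update(prev, game.final_reward(s))
```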
arXiv Detail & Related papers (2021-11-29T08:36:39Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
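A compact sketch of the MAP-Elites loop, with fitness, behaviour descriptors, and mutation left as placeholders for the paper's game-specific choices:

```python
import random

def map_elites(evaluate, mutate, random_params, n_evals=1000):
    """evaluate(params) -> (fitness, descriptor); descriptor indexes a cell."""
    archive = {}  # descriptor cell -> (fitness, params)
    for _ in range(n_evals):
        if archive:
            _, parent = random.choice(list(archive.values()))
            params = mutate(parent)
        else:
            params = random_params()
        fitness, cell = evaluate(params)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, params)  # keep the elite per style
    return archive  # one competitive agent per play-style cell
```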
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Warm-Start AlphaZero Self-Play Search Enhancements [5.096685900776467]
Recently, AlphaZero has achieved landmark results in deep reinforcement learning.
We propose a novel approach to deal with the cold-start problem of self-play training by employing simple search enhancements.
Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games.
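One simple warm-start enhancement consistent with this summary - hedged, since the paper evaluates several - is to score MCTS leaves with cheap random rollouts until the network has trained for a few iterations:

```python
import random

WARMUP_ITERS = 5  # iterations before trusting the network (assumed value)

def rollout_value(game, state):
    """Play random moves to the end and return the outcome."""
    while not game.is_terminal(state):
        state = game.next_state(state, random.choice(game.legal_actions(state)))
    return game.outcome(state)

def leaf_value(game, state, net, train_iter):
    if train_iter < WARMUP_ITERS:
        return rollout_value(game, state)  # warm-start evaluator
    return net.value(state)                # learned evaluator afterwards
```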
arXiv Detail & Related papers (2020-04-26T11:48:53Z)