Monte-Carlo Graph Search for AlphaZero
- URL: http://arxiv.org/abs/2012.11045v1
- Date: Sun, 20 Dec 2020 22:51:38 GMT
- Title: Monte-Carlo Graph Search for AlphaZero
- Authors: Johannes Czech, Patrick Korus, Kristian Kersting
- Abstract summary: We introduce a new, improved search algorithm for AlphaZero that generalizes the search tree to a directed acyclic graph.
In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
- Score: 15.567057178736402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The AlphaZero algorithm has been successfully applied in a range of discrete
domains, most notably board games. It utilizes a neural network that learns a
value and policy function to guide the exploration in a Monte-Carlo Tree
Search. Although many search improvements have been proposed for Monte-Carlo
Tree Search in the past, most of them refer to an older variant of the Upper
Confidence bounds for Trees algorithm that does not use a policy for planning.
We introduce a new, improved search algorithm for AlphaZero which generalizes
the search tree to a directed acyclic graph. This enables information flow
across different subtrees and greatly reduces memory consumption. Along with
Monte-Carlo Graph Search, we propose a number of further extensions, such as
the inclusion of Epsilon-greedy exploration, a revised terminal solver and the
integration of domain knowledge as constraints. In our evaluations, we use the
CrazyAra engine on chess and crazyhouse as examples to show that these changes
bring significant improvements to AlphaZero.
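As a minimal illustration of the central change (not CrazyAra's actual implementation), storing nodes in a transposition table keyed by position is what turns the search tree into a directed acyclic graph: move orders that transpose into the same position share a single node, so statistics flow across subtrees and memory is not duplicated.
```python
# Minimal sketch of the core idea behind Monte-Carlo Graph Search (MCGS).
# Nodes live in a transposition table keyed by the position, so two move
# sequences that transpose into the same position share one node. This is
# an illustration only, not CrazyAra's implementation.

class Node:
    def __init__(self, prior=0.0):
        self.prior = prior       # policy prior P(s, a) from the network
        self.visits = 0          # visit count N(s)
        self.value_sum = 0.0     # accumulated backup values

    def q_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

# Transposition table: position key -> Node, shared across all subtrees.
transposition_table = {}

def get_or_create_node(position_key, prior=0.0):
    # Returning an existing node is what generalizes the tree to a DAG:
    # a position reached via different move orders is expanded only once.
    if position_key not in transposition_table:
        transposition_table[position_key] = Node(prior)
    return transposition_table[position_key]
```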
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
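The LiteSearch summary above names two ingredients: dynamic node selection and a node-level exploration budget. The following is a hedged sketch of how a budgeted best-first search of that flavor might look; the allocation rule and the `expand`/`value` hooks are assumptions for illustration, not the paper's method.
```python
import heapq

def guided_search(root, expand, value, total_budget=100):
    # Best-first search where each node carries its own exploration budget.
    # `expand(node)` returns child nodes; `value(node)` returns a score in
    # (0, 1]. Both are hypothetical hooks standing in for the guidance model.
    counter = 0  # tie-breaker so the heap never compares nodes directly
    frontier = [(-value(root), counter, root, total_budget)]
    while frontier:
        _, _, node, budget = heapq.heappop(frontier)
        children = expand(node)
        if not children:
            return node  # terminal node: best candidate found
        for child in children:
            counter += 1
            # Assumed allocation rule: promising children inherit a larger
            # share of the parent's remaining budget; branches whose share
            # drops to zero are pruned, bounding total compute.
            share = int((budget - 1) * value(child))
            if share > 0:
                heapq.heappush(frontier, (-value(child), counter, child, share))
    return None
```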
- AlphaZeroES: Direct score maximization outperforms planning loss minimization [61.17702187957206]
Planning at execution time has been shown to dramatically improve performance for agents in both single-agent and multi-agent settings.
A family of approaches to planning at execution time are AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities.
We show that, across multiple environments, directly maximizing the episode score outperforms minimizing the planning loss.
arXiv Detail & Related papers (2024-06-12T23:00:59Z)
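The AlphaZeroES summary does not spell out the optimizer, but the method's name suggests evolution strategies; below is a generic, hedged sketch of directly maximizing episode score with evolution strategies, as a contrast to minimizing a value/policy planning loss. `run_episode` is an assumed hook, and this is not claimed to be the paper's exact method.
```python
import numpy as np

def es_maximize_score(theta, run_episode, iters=100, pop=16, sigma=0.1, lr=0.02):
    # Evolution strategies: perturb the parameters, measure episode scores,
    # and step along the score-weighted average of the perturbations.
    # `run_episode(params)` is an assumed hook returning the episode score.
    for _ in range(iters):
        noise = np.random.randn(pop, theta.size)
        scores = np.array([run_episode(theta + sigma * n) for n in noise])
        # Standardize scores so the update is invariant to reward scale.
        adv = (scores - scores.mean()) / (scores.std() + 1e-8)
        theta = theta + (lr / (pop * sigma)) * (noise.T @ adv)
    return theta
```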
- Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding.
Specifically, we use individual paths to assist the agents with the goal-reaching behavior.
We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z)
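A hedged sketch of the two ideas in the multi-agent pathfinding summary above: precomputed individual paths bias each agent toward its goal, and per-agent decomposition avoids enumerating the joint action space. All function names are illustrative assumptions, not the paper's API.
```python
def next_step(path, pos):
    # Next vertex on the agent's precomputed individual path after `pos`.
    i = path.index(pos) if pos in path else -1
    return path[i + 1] if 0 <= i < len(path) - 1 else pos

def candidate_moves(agent, state, path, legal_moves):
    # Goal-reaching bias: prefer the move that follows the individual path;
    # fall back to all legal moves only when that step is unavailable.
    step = next_step(path, state[agent])
    moves = legal_moves(agent, state)
    return [step] if step in moves else moves

def joint_action(agents, state, paths, legal_moves):
    # Decomposition: choose moves agent by agent, so the branching factor
    # grows with the sum of per-agent options, not their product.
    return {a: candidate_moves(a, state, paths[a], legal_moves)[0] for a in agents}
```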
- Nearly Optimal Steiner Trees using Graph Neural Network Assisted Monte Carlo Tree Search [9.061356032792952]
We describe an approach for computing Steiner Trees by combining a graph neural network and Monte Carlo Tree Search.
We first train a graph neural network that takes as input a partial solution and proposes a new node to be added as output.
This neural network is then used in a Monte Carlo search to compute a Steiner tree.
arXiv Detail & Related papers (2023-04-30T17:15:38Z)
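The Steiner-tree pipeline above is concrete enough to sketch: a trained GNN scores which node to add to a partial solution, and a (here heavily simplified) Monte Carlo loop samples additions from those scores. `gnn_scores` and `tree_cost` are assumed interfaces standing in for the trained network and the cost of a completed solution.
```python
import random

def mcts_steiner(graph, terminals, gnn_scores, tree_cost, simulations=100):
    # `gnn_scores(graph, partial)` -> {candidate_node: score} is the trained
    # GNN proposing the next node; `tree_cost(graph, nodes)` evaluates a
    # completed solution. Both are assumed hooks for this sketch.
    best, best_cost = None, float("inf")
    start = next(iter(terminals))
    for _ in range(simulations):
        partial = {start}                        # grow from one terminal
        while not terminals <= partial:          # until all terminals covered
            scores = gnn_scores(graph, partial)  # GNN: partial solution -> proposals
            nodes, weights = zip(*scores.items())
            partial.add(random.choices(nodes, weights=weights)[0])
        cost = tree_cost(graph, partial)
        if cost < best_cost:
            best, best_cost = set(partial), cost
    return best, best_cost
```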
- Beyond Games: A Systematic Review of Neural Monte Carlo Tree Search Applications [0.0]
We review 129 articles detailing the application of neural Monte Carlo tree search methods in domains other than games.
Our goal is to systematically assess how such methods are structured in practice and whether their success can be extended to other domains.
arXiv Detail & Related papers (2023-03-14T16:52:31Z)
- A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
arXiv Detail & Related papers (2022-03-28T21:34:16Z)
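The greedy randomized adaptive search (GRASP) framework named in the summary above can be illustrated with a minimal construction phase: repeatedly pick a node at random from a restricted candidate list of the heaviest remaining candidates, then discard its neighbors so the set stays independent. The paper's actual local search and its data structures for graphs with hundreds of millions of edges are far more involved.
```python
import random

def grasp_construct(weights, adj, alpha=0.3):
    # One GRASP construction pass for maximum weight independent set.
    # weights: {node: weight}; adj: {node: set of neighbors}.
    candidates, solution = set(weights), set()
    while candidates:
        ranked = sorted(candidates, key=lambda v: weights[v], reverse=True)
        rcl = ranked[:max(1, int(alpha * len(ranked)))]  # restricted candidate list
        v = random.choice(rcl)                           # randomized greedy pick
        solution.add(v)
        candidates -= adj[v] | {v}                       # keep the set independent
    return solution

# Example: grasp_construct({1: 5.0, 2: 3.0, 3: 4.0}, {1: {2}, 2: {1, 3}, 3: {2}})
```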
- Batch Monte Carlo Tree Search [9.114710429587479]
We build on the property that neural network inference is much faster on a batch of states to propose Monte Carlo Tree Search algorithms using batched inferences.
The transposition table contains the results of the inferences while the search tree contains the statistics of Monte Carlo Tree Search.
We also propose to analyze multiple heuristics that improve the search: the $\mu$FPU, the Virtual Mean, the Last Iteration, and the Second Move.
arXiv Detail & Related papers (2021-04-09T09:54:21Z)
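The division of labor described above (network inferences cached in a transposition table, Monte Carlo statistics kept in the search tree) lends itself to a short sketch of batched inference. `descend_to_leaf` and `net` are assumed hooks, and nodes are assumed to expose `state`, `visits`, `value_sum`, and `parent`; this is not the paper's API.
```python
def batch_mcts(root, descend_to_leaf, net, playouts=1024, batch_size=32):
    # Leaves are collected until a full batch is ready, the network is called
    # once on the whole batch, and the results are cached in a transposition
    # table that later playouts can reuse without another inference.
    table = {}
    pending = []  # leaves waiting for a batched network evaluation
    for _ in range(playouts):
        leaf = descend_to_leaf(root)
        if leaf.state in table:
            backup(leaf, table[leaf.state])  # reuse a cached inference
        else:
            pending.append(leaf)
            if len(pending) == batch_size:
                values = net([l.state for l in pending])  # one batched call
                for l, v in zip(pending, values):
                    table[l.state] = v       # transposition table stores inferences
                    backup(l, v)
                pending.clear()
    return table

def backup(leaf, value):
    # Propagate the value to the root, updating the tree's statistics.
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```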
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Monte-Carlo Tree Search as Regularized Policy Optimization [47.541849128047865]
We show that AlphaZero's search algorithms are an approximation to the solution of a specific regularized policy optimization problem.
We propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
arXiv Detail & Related papers (2020-07-24T13:01:34Z)
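The exact solution referenced above can be written down. Following the paper, the search's visit distribution approximates the maximizer of $q^{\top} y - \lambda\,\mathrm{KL}(\pi_\theta \| y)$ over distributions $y$, whose closed form is $\bar{\pi}(a) \propto \pi_\theta(a) / (\alpha - q(a))$ with the normalizing constant $\alpha$ found by bisection. A sketch of that computation (the bracketing of $\alpha$ follows the paper's analysis; treat the details as a hedged reconstruction):
```python
import numpy as np

def exact_regularized_policy(q, prior, lam, iters=50):
    # Solve pi_bar(a) = lam * prior(a) / (alpha - q(a)), with alpha chosen by
    # bisection so that pi_bar sums to one. alpha is bracketed in
    # [max(q + lam * prior), max(q) + lam], where the total mass is >= 1 at
    # the lower end and <= 1 at the upper end.
    lo = np.max(q + lam * prior)
    hi = np.max(q) + lam
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        mass = np.sum(lam * prior / (alpha - q))
        lo, hi = (alpha, hi) if mass > 1.0 else (lo, alpha)
    alpha = 0.5 * (lo + hi)
    return lam * prior / (alpha - q)

# Example: exact_regularized_policy(np.array([0.2, 0.5, 0.1]),
#                                   np.array([0.4, 0.3, 0.3]), lam=1.0)
```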
- Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
arXiv Detail & Related papers (2020-05-22T18:02:36Z)
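Of the three enhancements in the last summary, the action value normalization for unbounded rewards is the easiest to illustrate: track the minimum and maximum returns seen so far and rescale values into [0, 1] before they enter the UCT-style selection formula. A hedged sketch; the paper's exact mechanism may differ.
```python
class ValueNormalizer:
    # Min-max normalization of action values for games with potentially
    # unbounded rewards: Q-values are rescaled into [0, 1] so the exploration
    # term of the selection formula keeps a comparable scale.
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, value):
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def normalize(self, value):
        if self.hi <= self.lo:   # fewer than two distinct values observed
            return 0.5
        return (value - self.lo) / (self.hi - self.lo)
```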
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.