Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies
- URL: http://arxiv.org/abs/2601.01301v2
- Date: Fri, 09 Jan 2026 01:07:13 GMT
- Title: Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies
- Authors: Keith Frankston, Benjamin Howard
- Abstract summary: RMCTS is more than 40 times faster than MCTS-UCB when searching a single root state. We find that RMCTS-trained networks match the quality of MCTS-UCB-trained networks in roughly one-third of the training time.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". The advantage of RMCTS over AlphaZero's MCTS-UCB is speed. In RMCTS, the search tree is explored in a breadth-first manner, so that network inferences naturally occur in large batches. This significantly reduces the GPU latency cost. We find that RMCTS is often more than 40 times faster than MCTS-UCB when searching a single root state, and about 3 times faster when searching a large batch of root states. The recursion in RMCTS is based on computing optimized posterior policies at each game state in the search tree, starting from the leaves and working back up to the root. Here we use the posterior policy explored in "Monte-Carlo tree search as regularized policy optimization" (Grill et al.). Their posterior policy is the unique policy which maximizes the expected reward given estimated action rewards minus a penalty for diverging from the prior policy. The tree explored by RMCTS is not defined in an adaptive manner, as it is in MCTS-UCB. Instead, the RMCTS tree is defined by following prior network policies at each node. This is a disadvantage, but the speedup advantage is more significant, and in practice we find that RMCTS-trained networks match the quality of MCTS-UCB-trained networks in roughly one-third of the training time. We include timing and quality comparisons of RMCTS vs. MCTS-UCB for three games: Connect-4, Dots-and-Boxes, and Othello.
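The posterior policy described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the penalty is the scaled divergence lam * KL(prior || y) used by Grill et al., in which case the maximizer takes the form y(a) = lam * prior(a) / (alpha - q(a)), with alpha chosen so that y sums to one. The names `posterior_policy`, `prior`, `q`, `lam`, and `iters` are hypothetical, and the bisection solver is one convenient choice rather than the method used in the paper.

```python
from typing import List, Sequence


def posterior_policy(
    prior: Sequence[float],
    q: Sequence[float],
    lam: float,
    iters: int = 100,
) -> List[float]:
    """Sketch: maximize sum_a q[a]*y[a] - lam * KL(prior || y) over the simplex.

    Assumes prior sums to 1 and lam > 0 (both are assumptions of this sketch).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    support = [a for a, p in enumerate(prior) if p > 0]

    # The normalizer alpha lies between these two bounds:
    # at `lo` the candidate policy sums to at least 1, at `hi` to at most 1.
    lo = max(q[a] + lam * prior[a] for a in support)
    hi = max(q[a] for a in support) + lam

    # The sum is decreasing in alpha, so bisection finds the root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(lam * prior[a] / (mid - q[a]) for a in support)
        if total > 1.0:
            lo = mid
        else:
            hi = mid

    alpha = hi
    raw = [
        lam * prior[a] / (alpha - q[a]) if prior[a] > 0 else 0.0
        for a in range(len(prior))
    ]
    z = sum(raw)  # renormalize to absorb residual bisection error
    return [r / z for r in raw]


if __name__ == "__main__":
    # With equal action values the posterior stays at the prior;
    # raising one action's value shifts probability mass toward it.
    print(posterior_policy([0.5, 0.3, 0.2], [0.0, 0.0, 0.0], lam=1.0))
    print(posterior_policy([0.5, 0.3, 0.2], [1.0, 0.0, 0.0], lam=1.0))
```

In a recursive search of the kind the abstract describes, a function like this would presumably be applied at each node once its children's value estimates are available, working from the leaves toward the root. How RMCTS sets the penalty weight and combines child estimates is not stated in the abstract, so that part is left out.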
Related papers
- Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search [0.0]
Monte Carlo Tree Search (MCTS) has profoundly influenced reinforcement learning (RL). We introduce Inverse-RPO, a general methodology that systematically derives prior-based UCTs from any prior-free UCB. Experiments indicate that these variance-aware prior-based UCTs outperform PUCT across multiple benchmarks without incurring additional computational cost.
arXiv Detail & Related papers (2025-12-25T12:25:26Z)
- Parallelizing Tree Search with Twice Sequential Monte Carlo [7.863528049670872]
We present Twice Sequential Monte Carlo Tree Search (TSMCTS) as an alternative to the Monte Carlo Tree Search (MCTS) algorithm. TSMCTS is easier to parallelize and more suitable to GPU acceleration. We show that TSMCTS scales favorably with sequential compute while retaining the properties that make SMC natural to parallelize.
arXiv Detail & Related papers (2025-11-18T07:54:29Z)
- Anytime Sequential Halving in Monte-Carlo Tree Search [1.3820916757781068]
This paper proposes an anytime version of the Sequential Halving algorithm, which can be halted at any time and still return a satisfactory result.
Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1.
arXiv Detail & Related papers (2024-11-11T17:49:47Z)
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [50.45155830888697]
We develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step value to train policy and reward models.
We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget.
arXiv Detail & Related papers (2024-06-06T07:40:00Z)
- Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree.
Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation.
Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z)
- Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding.
Specifically, we use individual paths to assist the agents with the goal-reaching behavior.
We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z)
- Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions [89.89612827542972]
This paper proposes a variant of Monte-Carlo tree search (MCTS) that spends more search time on harder states and less search time on simpler states adaptively.
We evaluate the performance and computations on $9 \times 9$ Go board games and Atari games.
Experiments show that our method can achieve comparable performances to the original search algorithm while requiring less than $50\%$ search time on average.
arXiv Detail & Related papers (2022-10-23T06:39:20Z)
- Prioritized Architecture Sampling with Monto-Carlo Tree Search [54.72096546595955]
One-shot neural architecture search (NAS) methods significantly reduce the search cost by considering the whole search space as one network.
In this paper, we introduce a sampling strategy based on Monte Carlo tree search (MCTS), with the search space modeled as a Monte Carlo tree (MCT).
For a fair comparison, we construct an open-source NAS benchmark of a macro search space evaluated on CIFAR-10, namely NAS-Bench-Macro.
arXiv Detail & Related papers (2021-03-22T15:09:29Z)
- Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search [66.34387649910046]
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games.
We propose to achieve this goal by predicting the uncertainty of the current searching status and use the result to decide whether we should stop searching.
arXiv Detail & Related papers (2020-12-14T19:49:25Z)
- On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree.
How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood.
We demonstrate how proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.