Related papers: Anytime Sequential Halving in Monte-Carlo Tree Search

URL: http://arxiv.org/abs/2411.07171v1
Date: Mon, 11 Nov 2024 17:49:47 GMT
Title: Anytime Sequential Halving in Monte-Carlo Tree Search
Authors: Dominic Sagers, Mark H. M. Winands, Dennis J. N. J. Soemers
Abstract summary: This paper proposes an anytime version of the Sequential Halving algorithm, which can be halted at any time and still return a satisfactory result. Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1.
Score: 1.3820916757781068
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monte-Carlo Tree Search (MCTS) typically uses multi-armed bandit (MAB) strategies designed to minimize cumulative regret, such as UCB1, as its selection strategy. However, in the root node of the search tree, it is more sensible to minimize simple regret. Previous work has proposed using Sequential Halving as selection strategy in the root node, as, in theory, it performs better with respect to simple regret. However, Sequential Halving requires a budget of iterations to be predetermined, which is often impractical. This paper proposes an anytime version of the algorithm, which can be halted at any arbitrary time and still return a satisfactory result, while being designed such that it approximates the behavior of Sequential Halving. Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1 (and their analogues in MCTS).
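
For orientation, here is a minimal sketch of the standard fixed-budget Sequential Halving that the abstract refers to. It is not the anytime variant proposed in the paper, and it is not the authors' code. The function name `sequential_halving`, the `pull` callback, and the choice to accumulate reward statistics across rounds are assumptions made for illustration. The sketch shows why a predetermined budget is needed: the per-round pull count is derived from it.

```python
import math


def sequential_halving(pull, num_arms, budget):
    """Fixed-budget Sequential Halving; returns the index of the surviving arm.

    `pull(arm)` is a hypothetical callback that returns one reward sample.
    """
    survivors = list(range(num_arms))
    rounds = max(1, math.ceil(math.log2(num_arms)))
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split the budget evenly over rounds, then over surviving arms.
        # The floor of 1 pull means a very small budget can be exceeded.
        pulls_per_arm = max(1, budget // (len(survivors) * rounds))
        for arm in survivors:
            for _ in range(pulls_per_arm):
                totals[arm] += pull(arm)
                counts[arm] += 1
        # Keep the better half (rounded up) by empirical mean reward.
        # Assumption: means use all samples so far, not only this round's.
        survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]
```

With 8 arms and a budget of 240, each of the 3 rounds spends 80 pulls: 10 per arm on 8 arms, then 20 per arm on 4, then 40 per arm on 2. Stopping the loop early leaves more than one survivor, which is the limitation an anytime version has to address.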

Related papers

LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction [11.49169644917995]
We propose a novel probability tree state abstraction (PTSA) algorithm to improve the search efficiency of Monte Carlo Tree Search (MCTS). A general tree state abstraction with path transitivity is defined. In addition, the probability tree state abstraction is proposed for fewer mistakes during the aggregation step. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with 10%-45% search space reduction.
arXiv Detail & Related papers (2023-10-10T10:55:12Z)
Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding. Specifically, we use individual paths to assist the agents with the goal-reaching behavior. We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z)
Bayesian Decision Trees Inspired from Evolutionary Algorithms [64.80360020499555]
We propose a replacement of the Markov Chain Monte Carlo (MCMC) with an inherently parallel algorithm, the Sequential Monte Carlo (SMC). Experiments show that SMC combined with the Evolutionary Algorithms (EA) can produce more accurate results compared to MCMC in 100 times fewer iterations.
arXiv Detail & Related papers (2023-05-30T06:17:35Z)
Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning [1.6574413179773757]
We show that a more detailed theoretical understanding of MAB literature helps improve existing planning algorithms. We propose GreedyUCT-Normal, an MCTS/THTS algorithm with UCB1-Normal bandit for agile classical planning.
arXiv Detail & Related papers (2023-05-16T22:46:37Z)
Towards Correlated Sequential Rules [4.743965372344134]
High-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns. The existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. We propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM.
arXiv Detail & Related papers (2022-10-27T17:27:23Z)
TaSPM: Targeted Sequential Pattern Mining [53.234101208024335]
We propose a generic framework, namely TaSPM, based on the fast CM-SPAM algorithm. We also propose several pruning strategies to reduce meaningless operations in mining processes. Experiments show that the novel targeted mining algorithm TaSPM can achieve faster running time and less memory consumption.
arXiv Detail & Related papers (2022-02-26T17:49:47Z)
Reconstructing Sparse Signals via Greedy Monte-Carlo Search [6.660458629649825]
We propose a Monte-Carlo-based method, the greedy Monte-Carlo (GMC) search algorithm, for reconstructing sparse signals in a high-dimensional setting.
arXiv Detail & Related papers (2020-08-07T13:36:57Z)
On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree. How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. We demonstrate how proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)
Stochastic Regret Minimization in Extensive-Form Games [109.43344748069933]
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full trees. We present a new framework for developing regret minimization methods. We show extensive experiments on three games, where some variants of our methods outperform MCCFR.
arXiv Detail & Related papers (2020-02-19T23:05:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.