The Second Type of Uncertainty in Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2005.09645v1
- Date: Tue, 19 May 2020 09:10:51 GMT
- Title: The Second Type of Uncertainty in Monte Carlo Tree Search
- Authors: Thomas M Moerland, Joost Broekens, Aske Plaat, Catholijn M Jonker
- Abstract summary: Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty.
We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems.
We then introduce a new algorithm, which estimates the size of the subtree below an action, and leverages this information in the UCB formula to better direct exploration.
- Score: 2.564530030795554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte Carlo Tree Search (MCTS) efficiently balances exploration and
exploitation in tree search based on count-derived uncertainty. However, these
local visit counts ignore a second type of uncertainty induced by the size of
the subtree below an action. We first show how, due to the lack of this second
uncertainty type, MCTS may completely fail in well-known sparse exploration
problems, known from the reinforcement learning community. We then introduce a
new algorithm, which estimates the size of the subtree below an action, and
leverages this information in the UCB formula to better direct exploration.
Subsequently, we generalize these ideas by showing that loops, i.e., the
repeated occurrence of (approximately) the same state in the same trace, are
actually a special case of subtree depth variation. Testing on a variety of
tasks shows that our algorithms increase sample efficiency, especially when the
planning budget per timestep is small.
Related papers
- ETS: Efficient Tree Search for Inference-Time Scaling [61.553681244572914]
One promising approach for test-time compute scaling is search against a process reward model.
diversity of trajectories in the tree search process affects the accuracy of the search, since increasing diversity promotes more exploration.
We propose Efficient Tree Search (ETS), which promotes KV sharing by pruning redundant trajectories while maintaining necessary diverse trajectories.
arXiv Detail & Related papers (2025-02-19T09:30:38Z) - Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs)
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs)
We identify two key challenges contributing to this inefficiency: $textitover-exploration$ due to redundant states with semantically equivalent content, and $textitunder-exploration$ caused by high variance in verifier scoring.
We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
arXiv Detail & Related papers (2025-02-16T16:12:01Z) - Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization [18.25487451605638]
We derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call it Volume-MCTS
We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective.
We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
arXiv Detail & Related papers (2024-07-07T22:58:52Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Depth-Bounded Epistemic Planning [50.42592219248395]
We present a novel algorithm for planning based on dynamic epistemic logic.
The novelty is that we limit the depth of reasoning of the planning agent to an upper bound b.
We show it to be complete with respect to planning tasks having a solution within bound b of reasoning depth.
arXiv Detail & Related papers (2024-06-03T09:30:28Z) - Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the
Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree.
Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation.
Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z) - Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning [1.6574413179773757]
We show that a more detailed theoretical understanding of MAB literature helps improve existing planning algorithms.
We propose GreedyUCT-Normal, a MCTS/THTS algorithm with UCB1-Normal bandit for agile classical planning.
arXiv Detail & Related papers (2023-05-16T22:46:37Z) - Branching Time Active Inference: the theory and its generality [3.1542695050861544]
We present an alternative framework that aims to unify tree search and active inference by casting planning as a structure learning problem.
The first propagates the expected free energy forward in time, while the second propagates it backward.
Then, we demonstrate that forward and backward propagations are related to active inference and sophisticated inference, respectively, thereby clarifying the differences between those two planning strategies.
arXiv Detail & Related papers (2021-11-22T10:56:03Z) - Probabilistic DAG Search [29.47649645431227]
We develop a probabilistic framework to exploit a search space's latent structure and share information across the search tree.
We empirically find our algorithm to compare favorably to existing non-probabilistic alternatives in Tic-Tac-Toe and a feature selection application.
arXiv Detail & Related papers (2021-06-16T11:35:19Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum
Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource.
We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.