The Second Type of Uncertainty in Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2005.09645v1
- Date: Tue, 19 May 2020 09:10:51 GMT
- Title: The Second Type of Uncertainty in Monte Carlo Tree Search
- Authors: Thomas M Moerland, Joost Broekens, Aske Plaat, Catholijn M Jonker
- Abstract summary: Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty.
We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems.
We then introduce a new algorithm, which estimates the size of the subtree below an action, and leverages this information in the UCB formula to better direct exploration.
- Score: 2.564530030795554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte Carlo Tree Search (MCTS) efficiently balances exploration and
exploitation in tree search based on count-derived uncertainty. However, these
local visit counts ignore a second type of uncertainty induced by the size of
the subtree below an action. We first show how, due to the lack of this second
uncertainty type, MCTS may completely fail in well-known sparse exploration
problems, known from the reinforcement learning community. We then introduce a
new algorithm, which estimates the size of the subtree below an action, and
leverages this information in the UCB formula to better direct exploration.
Subsequently, we generalize these ideas by showing that loops, i.e., the
repeated occurrence of (approximately) the same state in the same trace, are
actually a special case of subtree depth variation. Testing on a variety of
tasks shows that our algorithms increase sample efficiency, especially when the
planning budget per timestep is small.
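The abstract does not spell out how the subtree-size estimate enters the UCB formula, so the following Python sketch is only an illustration of the general idea: standard UCT selection augmented with an extra exploration bonus for actions whose estimated subtrees are large relative to their visit counts. The `Node` bookkeeping, the `beta` weight, and the exact form of the `size_bonus` term are assumptions for illustration, not the authors' formula.

```python
import math

class Node:
    """Minimal MCTS node with per-action child statistics."""
    def __init__(self):
        self.children = {}       # action -> child Node
        self.visit_counts = {}   # action -> number of simulations through that action
        self.value_sums = {}     # action -> sum of returns backed up through that action
        self.subtree_size = {}   # action -> estimated number of nodes below that action

    def q_value(self, action):
        n = self.visit_counts[action]
        return self.value_sums[action] / n if n > 0 else 0.0


def select_action(node, c_uct=1.0, beta=1.0):
    """UCT-style action selection with an additional subtree-size bonus.

    The first bonus is the usual count-based UCB term; the `size_bonus`
    term is an illustrative way to inject the second type of uncertainty:
    an action whose estimated subtree is large but rarely visited gets a
    larger exploration boost.
    """
    total_visits = sum(node.visit_counts.values()) or 1
    total_size = sum(node.subtree_size.values()) or 1

    def score(action):
        n = node.visit_counts[action]
        if n == 0:
            return float("inf")  # always try unvisited actions first
        count_bonus = c_uct * math.sqrt(math.log(total_visits) / n)
        # Share of the estimated remaining search space under this action,
        # shrinking as the action is visited more often (assumed form).
        size_bonus = beta * (node.subtree_size[action] / total_size) / math.sqrt(n)
        return node.q_value(action) + count_bonus + size_bonus

    return max(node.children, key=score)
```

Under the paper's generalization, loops (revisits of approximately the same state within a trace) reduce the effective subtree below an action; in a sketch like this, that would correspond to not counting repeated states when updating `subtree_size`, which is likewise left as an assumption here.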
Related papers
- Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization [18.25487451605638]
We derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS.
We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective.
We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
arXiv Detail & Related papers (2024-07-07T22:58:52Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Depth-Bounded Epistemic Planning [50.42592219248395]
We present a novel algorithm for planning based on dynamic epistemic logic.
The novelty is that we limit the depth of reasoning of the planning agent to an upper bound b.
We show it to be complete with respect to planning tasks having a solution within bound b of reasoning depth.
arXiv Detail & Related papers (2024-06-03T09:30:28Z) - Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree.
Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation.
Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z) - Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning [1.6574413179773757]
We show that a more detailed theoretical understanding of the MAB literature helps improve existing planning algorithms.
We propose GreedyUCT-Normal, an MCTS/THTS algorithm with the UCB1-Normal bandit for agile classical planning (a minimal sketch of the UCB1-Normal rule appears after this list).
arXiv Detail & Related papers (2023-05-16T22:46:37Z) - Branching Time Active Inference: the theory and its generality [3.1542695050861544]
We present an alternative framework that aims to unify tree search and active inference by casting planning as a structure learning problem.
Two propagation schemes are considered: the first propagates the expected free energy forward in time, while the second propagates it backward.
Then, we demonstrate that forward and backward propagations are related to active inference and sophisticated inference, respectively, thereby clarifying the differences between those two planning strategies.
arXiv Detail & Related papers (2021-11-22T10:56:03Z) - Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z) - Probabilistic DAG Search [29.47649645431227]
We develop a probabilistic framework to exploit a search space's latent structure and share information across the search tree.
We empirically find our algorithm to compare favorably to existing non-probabilistic alternatives in Tic-Tac-Toe and a feature selection application.
arXiv Detail & Related papers (2021-06-16T11:35:19Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG).
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied in economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings because they require either intolerable running times or excessive computational resources.
We propose NeuSearcher, which leverages knowledge learned from previous instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
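As a side note on the GreedyUCT-Normal entry above: UCB1-Normal is the bandit rule of Auer et al. (2002) that replaces UCB1's fixed exploration scale with an empirical variance estimate. The sketch below shows only that generic selection rule, assuming scalar rewards per arm; it does not reproduce the GreedyUCT-Normal planner itself.

```python
import math

def ucb1_normal_select(counts, reward_sums, reward_sq_sums):
    """One step of the UCB1-Normal selection rule (Auer et al., 2002).

    counts[j]         -- number of times arm j has been played (n_j)
    reward_sums[j]    -- sum of rewards observed from arm j
    reward_sq_sums[j] -- sum of squared rewards observed from arm j (q_j)
    Returns the index of the arm to play next.
    """
    n = sum(counts)
    k = len(counts)

    # Play any arm that has not yet been sampled often enough for its
    # empirical variance estimate to be meaningful.
    threshold = max(2, math.ceil(8 * math.log(n))) if n > 1 else 2
    for j in range(k):
        if counts[j] < threshold:
            return j

    def index(j):
        n_j = counts[j]
        mean = reward_sums[j] / n_j
        # Empirical variance term (q_j - n_j * mean^2) / (n_j - 1),
        # clipped at zero to guard against floating-point noise.
        var = max((reward_sq_sums[j] - n_j * mean ** 2) / (n_j - 1), 0.0)
        return mean + math.sqrt(16 * var * math.log(n - 1) / n_j)

    return max(range(k), key=index)
```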
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.