The Second Type of Uncertainty in Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2005.09645v1
- Date: Tue, 19 May 2020 09:10:51 GMT
- Title: The Second Type of Uncertainty in Monte Carlo Tree Search
- Authors: Thomas M Moerland, Joost Broekens, Aske Plaat, Catholijn M Jonker
- Abstract summary: Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty.
We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems.
We then introduce a new algorithm, which estimates the size of the subtree below an action, and leverages this information in the UCB formula to better direct exploration.
- Score: 2.564530030795554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte Carlo Tree Search (MCTS) efficiently balances exploration and
exploitation in tree search based on count-derived uncertainty. However, these
local visit counts ignore a second type of uncertainty induced by the size of
the subtree below an action. We first show how, due to the lack of this second
uncertainty type, MCTS may completely fail in well-known sparse exploration
problems, known from the reinforcement learning community. We then introduce a
new algorithm, which estimates the size of the subtree below an action, and
leverages this information in the UCB formula to better direct exploration.
Subsequently, we generalize these ideas by showing that loops, i.e., the
repeated occurrence of (approximately) the same state in the same trace, are
actually a special case of subtree depth variation. Testing on a variety of
tasks shows that our algorithms increase sample efficiency, especially when the
planning budget per timestep is small.
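The abstract does not spell out how the subtree-size estimate enters the UCB formula, so the following Python sketch is only an illustration of the general idea: standard UCT selection augmented with an extra exploration bonus for actions whose estimated subtrees are large relative to their visit counts. The `Node` bookkeeping, the `beta` weight, and the exact form of the `size_bonus` term are assumptions for illustration, not the authors' formula.

```python
import math

class Node:
    """Minimal MCTS node with per-action child statistics."""
    def __init__(self):
        self.children = {}       # action -> child Node
        self.visit_counts = {}   # action -> number of simulations through that action
        self.value_sums = {}     # action -> sum of returns backed up through that action
        self.subtree_size = {}   # action -> estimated number of nodes below that action

    def q_value(self, action):
        n = self.visit_counts[action]
        return self.value_sums[action] / n if n > 0 else 0.0


def select_action(node, c_uct=1.0, beta=1.0):
    """UCT-style action selection with an additional subtree-size bonus.

    The first bonus is the usual count-based UCB term; the `size_bonus`
    term is an illustrative way to inject the second type of uncertainty:
    an action whose estimated subtree is large but rarely visited gets a
    larger exploration boost.
    """
    total_visits = sum(node.visit_counts.values()) or 1
    total_size = sum(node.subtree_size.values()) or 1

    def score(action):
        n = node.visit_counts[action]
        if n == 0:
            return float("inf")  # always try unvisited actions first
        count_bonus = c_uct * math.sqrt(math.log(total_visits) / n)
        # Share of the estimated remaining search space under this action,
        # shrinking as the action is visited more often (assumed form).
        size_bonus = beta * (node.subtree_size[action] / total_size) / math.sqrt(n)
        return node.q_value(action) + count_bonus + size_bonus

    return max(node.children, key=score)
```

Under the paper's generalization, loops (revisits of approximately the same state within a trace) reduce the effective subtree below an action; in a sketch like this, that would correspond to not counting repeated states when updating `subtree_size`, which is likewise left as an assumption here.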
Related papers
- Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization [18.25487451605638]
We derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS.
We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective.
We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
arXiv Detail & Related papers (2024-07-07T22:58:52Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Depth-Bounded Epistemic Planning [50.42592219248395]
We present a novel algorithm for planning based on dynamic epistemic logic.
The novelty is that we limit the depth of reasoning of the planning agent to an upper bound b.
We show it to be complete with respect to planning tasks having a solution within bound b of reasoning depth.
arXiv Detail & Related papers (2024-06-03T09:30:28Z) - Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree.
Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation.
Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z) - Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning [1.6574413179773757]
We show that a more detailed theoretical understanding of the MAB literature helps improve existing planning algorithms.
We propose GreedyUCT-Normal, an MCTS/THTS algorithm with the UCB1-Normal bandit for agile classical planning (a minimal sketch of the UCB1-Normal rule appears after this list).
arXiv Detail & Related papers (2023-05-16T22:46:37Z) - Branching Time Active Inference: the theory and its generality [3.1542695050861544]
We present an alternative framework that aims to unify tree search and active inference by casting planning as a structure learning problem.
Two propagation schemes are considered: the first propagates the expected free energy forward in time, while the second propagates it backward.
Then, we demonstrate that forward and backward propagations are related to active inference and sophisticated inference, respectively, thereby clarifying the differences between those two planning strategies.
arXiv Detail & Related papers (2021-11-22T10:56:03Z) - Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z) - Probabilistic DAG Search [29.47649645431227]
We develop a probabilistic framework to exploit a search space's latent structure and share information across the search tree.
We empirically find our algorithm to compare favorably to existing non-probabilistic alternatives in Tic-Tac-Toe and a feature selection application.
arXiv Detail & Related papers (2021-06-16T11:35:19Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG).
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied in economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings because they require either intolerable running times or excessive computational resources.
We propose NeuSearcher, which leverages knowledge learned from previous instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
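As a side note on the GreedyUCT-Normal entry above: UCB1-Normal is the bandit rule of Auer et al. (2002) that replaces UCB1's fixed exploration scale with an empirical variance estimate. The sketch below shows only that generic selection rule, assuming scalar rewards per arm; it does not reproduce the GreedyUCT-Normal planner itself.

```python
import math

def ucb1_normal_select(counts, reward_sums, reward_sq_sums):
    """One step of the UCB1-Normal selection rule (Auer et al., 2002).

    counts[j]         -- number of times arm j has been played (n_j)
    reward_sums[j]    -- sum of rewards observed from arm j
    reward_sq_sums[j] -- sum of squared rewards observed from arm j (q_j)
    Returns the index of the arm to play next.
    """
    n = sum(counts)
    k = len(counts)

    # Play any arm that has not yet been sampled often enough for its
    # empirical variance estimate to be meaningful.
    threshold = max(2, math.ceil(8 * math.log(n))) if n > 1 else 2
    for j in range(k):
        if counts[j] < threshold:
            return j

    def index(j):
        n_j = counts[j]
        mean = reward_sums[j] / n_j
        # Empirical variance term (q_j - n_j * mean^2) / (n_j - 1),
        # clipped at zero to guard against floating-point noise.
        var = max((reward_sq_sums[j] - n_j * mean ** 2) / (n_j - 1), 0.0)
        return mean + math.sqrt(16 * var * math.log(n - 1) / n_j)

    return max(range(k), key=index)
```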
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.