Related papers: Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization

URL: http://arxiv.org/abs/2407.05511v1
Date: Sun, 7 Jul 2024 22:58:52 GMT
Title: Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Authors: Liam Schramm, Abdeslam Boularias
Abstract summary: We derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
Score: 18.25487451605638
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.

Related papers

Exploring Explainable Multi-player MCTS-minimax Hybrids in Board Game Using Process Mining [3.5042452314350716]
This paper presents our ongoing investigation into potential explanations for the decision-making and behavior of Monte-Carlo Tree Search (MCTS). A weakness of MCTS is that it constructs a highly selective tree and, as a result, can miss crucial moves and fall into tactical traps. We integrate shallow minimax search into the rollout phase of multi-player MCTS and use process mining techniques to explain agents' strategies in 3v3 checkers.
arXiv Detail & Related papers (2025-03-30T05:48:53Z)
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
Developing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z)
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [65.5353313491402]
We introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code. We construct verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search. We demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines.
arXiv Detail & Related papers (2024-09-15T02:07:28Z)
LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
Monte Carlo Tree Search with Boltzmann Exploration [16.06815496704043]
We introduce Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS). Our algorithms show consistent high performance across several benchmark domains, including the game of Go.
arXiv Detail & Related papers (2024-04-11T13:25:35Z)
Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree. Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation. Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z)
Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding. Specifically, we use individual paths to assist the agents with the goal-reaching behavior. We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z)
Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search [66.34387649910046]
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games. We propose to achieve this goal by predicting the uncertainty of the current searching status and using the result to decide whether we should stop searching.
arXiv Detail & Related papers (2020-12-14T19:49:25Z)
Autonomous UAV Exploration of Dynamic Environments via Incremental Sampling and Probabilistic Roadmap [0.3867363075280543]
We propose a novel dynamic exploration planner (DEP) for exploring unknown environments using incremental sampling and Probabilistic Roadmap (PRM). Our method safely explores dynamic environments and outperforms the benchmark planners in terms of exploration time, path length, and computational time.
arXiv Detail & Related papers (2020-10-14T22:52:37Z)
Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning [12.024736761925864]
Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces. We present Broadly-Exploring-Local-policy Trees (BELT), a task-conditioned, model-based tree search. BELT is demonstrated experimentally to be able to plan long-horizon, sequential trajectories with a goal-conditioned policy and generate plans that are robust.
arXiv Detail & Related papers (2020-10-13T15:51:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.