Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
- URL: http://arxiv.org/abs/2407.05511v1
- Date: Sun, 7 Jul 2024 22:58:52 GMT
- Title: Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
- Authors: Liam Schramm, Abdeslam Boularias,
- Abstract summary: We derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call it Volume-MCTS
We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective.
We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
- Score: 18.25487451605638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call {\it Volume-MCTS}. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
Related papers
- RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [65.5353313491402]
We introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code.
We construct verbal feedback from fine-turbo code execution feedback to refine erroneous thoughts during the search.
We demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines.
arXiv Detail & Related papers (2024-09-15T02:07:28Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Monte Carlo Tree Search with Boltzmann Exploration [16.06815496704043]
We introduce Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS)
Our algorithms show consistent high performance across several benchmark domains, including the game of Go.
arXiv Detail & Related papers (2024-04-11T13:25:35Z) - Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the
Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree.
Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation.
Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z) - Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques.
Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks.
We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z) - Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding.
Specifically, we use individual paths to assist the agents with the the goal-reaching behavior.
We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z) - Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search ( CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z) - Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search [66.34387649910046]
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games.
We propose to achieve this goal by predicting the uncertainty of the current searching status and use the result to decide whether we should stop searching.
arXiv Detail & Related papers (2020-12-14T19:49:25Z) - Autonomous UAV Exploration of Dynamic Environments via Incremental
Sampling and Probabilistic Roadmap [0.3867363075280543]
We propose a novel dynamic exploration planner (DEP) for exploring unknown environments using incremental sampling and Probabilistic Roadmap (PRM)
Our method safely explores dynamic environments and outperforms the benchmark planners in terms of exploration time, path length, and computational time.
arXiv Detail & Related papers (2020-10-14T22:52:37Z) - Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning [12.024736761925864]
Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces.
We present Broadly-Exploring-Local-policy Trees (BELT), a task-conditioned, model-based tree search.
BELT is demonstrated experimentally to be able to plan long-horizon, sequential with a goal conditioned policy and generate plans that are robust.
arXiv Detail & Related papers (2020-10-13T15:51:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.