Related papers: Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation

URL: http://arxiv.org/abs/2602.14083v1
Date: Sun, 15 Feb 2026 10:24:45 GMT
Title: Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation
Authors: Weiming Zhang, Jihong Wang, Jiamu Zhou, Qingyao Li, Xinbei Ma, Congmin Zheng, Xingyu Lou, Weiwen Liu, Zhuosheng Zhang, Jun Wang, Yong Yu, Weinan Zhang
Abstract summary: Plan-MCTS is a framework that reformulates web navigation by shifting exploration to a semantic Plan Space. Plan-MCTS achieves state-of-the-art performance, surpassing current approaches with higher task effectiveness and search efficiency.
Score: 50.406803870992974
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have empowered autonomous agents to handle complex web navigation tasks. While recent studies integrate tree search to enhance long-horizon reasoning, applying these algorithms in web navigation faces two critical challenges: sparse valid paths that lead to inefficient exploration, and a noisy context that dilutes accurate state perception. To address this, we introduce Plan-MCTS, a framework that reformulates web navigation by shifting exploration to a semantic Plan Space. By decoupling strategic planning from execution grounding, it transforms sparse action space into a Dense Plan Tree for efficient exploration, and distills noisy contexts into an Abstracted Semantic History for precise state awareness. To ensure efficiency and robustness, Plan-MCTS incorporates a Dual-Gating Reward to strictly validate both physical executability and strategic alignment and Structural Refinement for on-policy repair of failed subplans. Extensive experiments on WebArena demonstrate that Plan-MCTS achieves state-of-the-art performance, surpassing current approaches with higher task effectiveness and search efficiency.
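
Illustrative sketch (not from the paper): the abstract says the Dual-Gating Reward validates both physical executability and strategic alignment, but gives no formula. One way to read "gating" is that a plan step earns a reward only when both checks pass. The names below (PlanStepOutcome, alignment, progress, alignment_threshold) and the 0.5 threshold are assumptions made for this example, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class PlanStepOutcome:
    """Hypothetical record of what happened when one plan step was executed."""
    executable: bool   # assumed gate 1: the grounded action actually ran on the page
    alignment: float   # assumed gate 2: score in [0, 1] for how well the step serves the plan
    progress: float    # assumed estimate in [0, 1] of task progress after the step


def dual_gated_reward(outcome: PlanStepOutcome, alignment_threshold: float = 0.5) -> float:
    """Return the progress estimate only if both gates pass, otherwise 0.0.

    This is a guess at the general shape of a dual-gated reward, under the
    assumptions stated above; the paper's actual formulation may differ.
    """
    if not outcome.executable:
        return 0.0  # gate 1 failed: the step could not be executed
    if outcome.alignment < alignment_threshold:
        return 0.0  # gate 2 failed: the step does not serve the plan
    return outcome.progress


if __name__ == "__main__":
    print(dual_gated_reward(PlanStepOutcome(True, 0.9, 0.7)))
    print(dual_gated_reward(PlanStepOutcome(False, 0.9, 0.7)))
    print(dual_gated_reward(PlanStepOutcome(True, 0.2, 0.7)))
```

In a tree search, a value like this would be backed up along the visited path, so a step that fails either gate contributes nothing regardless of how promising its progress estimate looks.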

Related papers

Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory [69.49061918994882]
Branch-and-Browse is a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods.
arXiv Detail & Related papers (2025-10-18T00:45:37Z)
DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping [74.34061104176554]
We propose DeepPlanner, an end-to-end RL framework that effectively enhances the planning capabilities of deep research agents. Our approach shapes token-level advantage with an entropy-based term to allocate larger updates to high entropy tokens, and selectively upweights sample-level advantages for planning-intensive rollouts.
arXiv Detail & Related papers (2025-10-14T20:47:05Z)
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking [109.09735490692202]
We propose HyperTree Planning (HTP), a novel reasoning paradigm that constructs hypertree-structured planning outlines for effective planning. Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, resulting in a 3.6 times performance improvement over o1-preview.
arXiv Detail & Related papers (2025-05-05T02:38:58Z)
AI2STOW: End-to-End Deep Reinforcement Learning to Construct Master Stowage Plans under Demand Uncertainty [0.0]
This article proposes AI2STOW, an end-to-end deep reinforcement learning model with feasibility projection and an action mask to create master plans under demand uncertainty. Our experimental results demonstrate that AI2STOW outperforms baseline methods from reinforcement learning and programming in objective performance and computational efficiency.
arXiv Detail & Related papers (2025-04-06T12:45:25Z)
Adaptive Interactive Navigation of Quadruped Robots using Large Language Models [14.14967096139099]
We present a primitive tree for task planning with large language models (LLMs). We adopt reinforcement learning to pre-train a comprehensive skill library containing versatile locomotion and interaction behaviors for motion planning. Integrated with the tree structure, the replanning mechanism allows for convenient node addition and pruning.
arXiv Detail & Related papers (2025-03-29T02:17:52Z)
SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins [78.53885607559958]
We propose SCoTT, a wireless-aware path planning framework. We show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations.
arXiv Detail & Related papers (2024-11-27T10:45:49Z)
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration [42.8636989730348]
Existing LLM-based web agents rely on rigid, expert-designed policies specific to certain states and actions. Humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. We develop WebPilot, a multi-agent system with a dual optimization strategy that improves Monte Carlo Tree Search (MCTS) to better handle complex web environments.
arXiv Detail & Related papers (2024-08-28T17:49:29Z)
Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games [6.532258098619471]
We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data. We show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal reaching rate.
arXiv Detail & Related papers (2024-03-16T03:53:55Z)
Path Planning based on 2D Object Bounding-box [8.082514573754954]
We present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios. This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras. We evaluate our model on the nuPlan planning task and observe that it performs competitively in comparison to existing vision-centric methods.
arXiv Detail & Related papers (2024-02-22T19:34:56Z)
Structurally guided task decomposition in spatial navigation tasks [7.21356271882087]
We extend an existing model of human task decomposition to explain a wide range of simple planning problems. Our results suggest that our framework can correctly predict the navigation strategies of the majority of the participants in an online experiment.
arXiv Detail & Related papers (2023-10-03T17:27:30Z)
Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images. Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them.
arXiv Detail & Related papers (2020-03-19T18:43:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.