Related papers: WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

URL: http://arxiv.org/abs/2408.15978v1
Date: Wed, 28 Aug 2024 17:49:29 GMT
Title: WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Authors: Yao Zhang, Zijian Ma, Yunpu Ma, Zhen Han, Yu Wu, Volker Tresp,
Abstract summary: Existing LLM-based web agents rely on rigid, expert-designed policies specific to certain states and actions. Humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. We develop WebPilot, a multi-agent system with a dual optimization strategy that improves Monte Carlo Tree Search (MCTS) to better handle complex web environments.
Score: 42.8636989730348
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

Related papers

Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation [50.406803870992974]
Plan-MCTS is a framework that reformulates web navigation by shifting exploration to a semantic Plan Space.<n>Plan-MCTS achieves state-of-the-art performance, surpassing current approaches with higher task effectiveness and search efficiency.
arXiv Detail & Related papers (2026-02-15T10:24:45Z)
OpAgent: Operator Agent for Web Navigation [23.928869500029432]
We develop an online interaction environment and fine-tune the Vision-Language Model (VLM) using a specialized RL pipeline.<n>We introduce a Hybrid Reward Mechanism that combines a ground-truth-agnostic WebJudge for holistic outcome assessment and a Rule-based Decision Tree (RDT) for progress reward.<n> Notably, our RL-enhanced model achieves a 38.1% success rate (pass@5) on WebArena, outperforming all existing monolithic baselines.
arXiv Detail & Related papers (2026-02-14T02:33:55Z)
Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments.<n>We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z)
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis [34.998277998052444]
We propose WebSynthesis, a novel framework for trajectory synthesis and training.<n>We show that an agent trained using WebSynthesis on a small-scale synthetic dataset achieves performance comparable to or even surpassing that of models trained on large-scale real-world data.
arXiv Detail & Related papers (2025-07-06T12:31:10Z)
WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all opensource agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z)
Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning [11.697279328699489]
We propose a Multimodal Fused Learning framework to solve the Generalized Traveling Salesman Problem (GTSP)<n>We first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations.<n>We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module.
arXiv Detail & Related papers (2025-06-20T11:51:52Z)
Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution [18.68431625184045]
GoalAct is a novel agent framework that introduces a continuously updated global planning mechanism and integrates a hierarchical execution strategy. GoalAct decomposes task execution into high-level skills, including searching, coding, writing and more. We evaluate GoalAct on LegalAgentBench, a benchmark with multiple types of legal tasks that require the use of multiple types of tools.
arXiv Detail & Related papers (2025-04-23T09:43:40Z)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents [23.1522773245956]
We introduce a novel paradigm that augments language agents with model-based planning. Our method, WebDreamer, builds on the key insight that LLMs inherently encode comprehensive knowledge about website structures and functionalities.
arXiv Detail & Related papers (2024-11-10T18:50:51Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems. In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy. Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z)
Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks. It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents [44.34340798542]
Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning. Traditional supervised pre-training on static datasets falls short in enabling autonomous agent capabilities. We propose a framework that combines guided Monte Carlo Tree Search (MCTS) search with a self-critique mechanism and iterative fine-tuning on agent interactions.
arXiv Detail & Related papers (2024-08-13T20:52:13Z)
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP) SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. Demos and codes can be found in https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z)
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies. HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
On the Multi-turn Instruction Following for Conversational Web Agents [83.51251174629084]
We introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment. We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z)
TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation [45.028795422801764]
We propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG) This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity.
arXiv Detail & Related papers (2024-02-15T18:27:37Z)
WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible. We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains. We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.