Related papers: WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

URL: http://arxiv.org/abs/2510.06587v1
Date: Wed, 08 Oct 2025 02:34:59 GMT
Title: WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks
Authors: Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao
Abstract summary: Large language model (LLM) agents are becoming competent at straightforward web tasks, but struggle with objectives that require long horizon navigation, large scale information extraction, and reasoning under constraints. We present WebDART, a general framework that enables a single LLM to handle such complex chores. WebDART lifts success rates by up to 13.7 percentage points over previous SOTA agents, while matching their performance on the easier WebArena suite.
Score: 30.48395228595732
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long horizon navigation, large scale information extraction, and reasoning under constraints. We present WebDART, a general framework that enables a single LLM to handle such complex chores. WebDART (i) dynamically decomposes each objective into three focused subtasks: navigation, information extraction, and execution, so the model concentrates on one skill at a time, and (ii) continuously replans the decomposition as new webpages are revealed, taking advantage of newly discovered filters or shortcuts and avoiding redundant exploration. Evaluated on WebChoreArena, WebDART lifts success rates by up to 13.7 percentage points over previous SOTA agents, while matching their performance on the easier WebArena suite and completing tasks with up to 14.7 fewer navigation steps.

Related papers

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks [40.13135948595863]
STRUCTUREDAGENT is a hierarchical planning framework with two core components. It produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.
arXiv Detail & Related papers (2026-03-05T15:37:06Z)
Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation [50.406803870992974]
Plan-MCTS is a framework that reformulates web navigation by shifting exploration to a semantic Plan Space. Plan-MCTS achieves state-of-the-art performance, surpassing current approaches with higher task effectiveness and search efficiency.
arXiv Detail & Related papers (2026-02-15T10:24:45Z)
Nested Browser-Use Learning for Agentic Information Seeking [60.775556172513014]
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching. We propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure.
arXiv Detail & Related papers (2025-12-29T17:59:14Z)
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks [31.201406205897143]
We introduce WebChoreArena, a new fully reproducible benchmark comprising 532 carefully curated tasks. WebChoreArena is built on top of the fully reproducible and widely adopted four WebArena simulation environments. Our experimental results demonstrate that as LLMs evolve, significant improvements in performance are observed on WebChoreArena.
arXiv Detail & Related papers (2025-06-02T17:59:45Z)
WebNav: An Intelligent Agent for Voice-Controlled Web Navigation [0.0]
WebNav is a novel agent for multi-modal web navigation. The system combines vision-based context from screenshots with a dynamic DOM-labeling browser extension.
arXiv Detail & Related papers (2025-03-18T02:33:27Z)
R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory [53.94879482534949]
Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures. Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect. Our findings suggest that a combination of memory-enhanced navigation and reflective learning promisingly advances the capabilities of web agents.
arXiv Detail & Related papers (2025-01-21T20:21:58Z)
Infogent: An Agent-Based Framework for Web Information Aggregation [59.67710556177564]
We introduce Infogent, a novel framework for web information aggregation. Experiments on different information access settings demonstrate Infogent beats an existing SOTA multi-agent search framework by 7%.
arXiv Detail & Related papers (2024-10-24T18:01:28Z)
NaviQAte: Functionality-Guided Web Application Navigation [6.0759036120654315]
NaviQAte frames web application exploration as a question-and-answer task, generating action sequences for functionalities without requiring detailed parameters. Our three-phase approach utilizes advanced large language models like GPT-4o for complex decision-making and cost-effective models, such as GPT-4o mini, for simpler tasks.
arXiv Detail & Related papers (2024-09-16T21:18:39Z)
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher [50.68599514830046]
We introduce MindSearch to mimic the human mind in web information seeking and integration. The framework can be instantiated by a simple yet effective LLM-based multi-agent framework. MindSearch demonstrates significant improvement in the response quality in terms of depth and breadth.
arXiv Detail & Related papers (2024-07-29T17:12:40Z)
AutoWebGLM: A Large Language Model-based Web Navigating Agent [33.55199326570078]
We develop the open AutoWebGLM based on ChatGLM3-6B. Inspired by human browsing patterns, we first design an HTML simplification algorithm to represent webpages. We then employ a hybrid human-AI method to build web browsing data for curriculum training.
arXiv Detail & Related papers (2024-04-04T17:58:40Z)
Easy-to-Hard Learning for Information Extraction [57.827955646831526]
Information extraction systems aim to automatically extract structured information from unstructured texts. We propose a unified easy-to-hard learning framework consisting of three stages, i.e., the easy stage, the hard stage, and the main stage. By breaking down the learning process into multiple stages, our framework helps the model acquire general IE task knowledge and improve its generalization ability.
arXiv Detail & Related papers (2023-05-16T06:04:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.