Related papers: STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

URL: http://arxiv.org/abs/2603.05294v1
Date: Thu, 05 Mar 2026 15:37:06 GMT
Title: STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
Authors: Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao
Abstract summary: STRUCTUREDAGENT is a hierarchical planning framework with two core components. It produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.
Score: 40.13135948595863
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take actions that optimize long-term objectives. However, existing web agents struggle on complex, long-horizon tasks due to limited in-context memory for tracking history, weak planning abilities, and greedy behaviors that lead to premature termination. To address these challenges, we propose STRUCTUREDAGENT, a hierarchical planning framework with two core components: (1) an online hierarchical planner that uses dynamic AND/OR trees for efficient search and (2) a structured memory module that tracks and maintains candidate solutions to improve constraint satisfaction in information-seeking tasks. The framework also produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.
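The AND/OR tree idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `solved` and `next_leaf` helpers, and the shopping task are hypothetical names chosen for illustration, and the sketch assumes the usual AND/OR semantics (an AND node needs every child solved, an OR node needs at least one).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical plan node; kind is "AND", "OR", or "LEAF"."""
    name: str
    kind: str = "LEAF"
    children: List["Node"] = field(default_factory=list)
    done: bool = False  # only meaningful for leaves


def solved(node: Node) -> bool:
    """AND needs all children solved; OR needs any one; a leaf uses its flag."""
    if node.kind == "LEAF":
        return node.done
    results = [solved(child) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)


def next_leaf(node: Node) -> Optional[Node]:
    """Depth-first search for the first leaf under a still-unsolved subtree."""
    if solved(node):
        return None  # nothing left to do below this node
    if node.kind == "LEAF":
        return node
    for child in node.children:
        leaf = next_leaf(child)
        if leaf is not None:
            return leaf
    return None


# Illustrative task: the item can be found in either of two ways (OR),
# and it must then also be added to the cart (AND).
plan = Node("buy laptop", "AND", [
    Node("find laptop", "OR", [Node("use search bar"), Node("browse category")]),
    Node("add to cart"),
])
first = next_leaf(plan)
```

In this sketch, completing either child of the OR node marks "find laptop" as solved, so the traversal skips the remaining alternative and moves on to "add to cart". Handling failed leaves, dynamic tree growth, and the structured memory module described in the abstract are outside its scope.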

Related papers

Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation [50.406803870992974]
Plan-MCTS is a framework that reformulates web navigation by shifting exploration to a semantic Plan Space. Plan-MCTS achieves state-of-the-art performance, surpassing current approaches with higher task effectiveness and search efficiency.
arXiv Detail & Related papers (2026-02-15T10:24:45Z)
H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning [3.2800662172795114]
H-AIM is a novel embodied multi-robot task planning framework. It exploits large language models (LLMs) to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions. It compiles the resulting plan into behavior trees for reactive control.
arXiv Detail & Related papers (2026-01-16T07:59:50Z)
TALM: Dynamic Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation [0.0]
Agentic code generation requires large language models capable of complex context management and multi-step reasoning. We propose TALM, a dynamic framework that integrates structured task decomposition, localized re-reasoning, and long-term memory mechanisms. Experimental results on HumanEval, BigCodeBench, and ClassEval benchmarks demonstrate that TALM consistently delivers strong reasoning performance and high token efficiency.
arXiv Detail & Related papers (2025-10-27T05:07:36Z)
Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach [1.7970227672578558]
Existing VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs. We evaluate Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time.
arXiv Detail & Related papers (2025-09-26T09:56:44Z)
HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search [85.12447821237045]
HiRA is a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks and assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems.
arXiv Detail & Related papers (2025-07-03T14:18:08Z)
RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation [59.9896841079005]
We introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. The dataset is constructed via a top-down pipeline, where GPT generates task instructions and decomposes them into subtask sequences. Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations.
arXiv Detail & Related papers (2025-06-07T06:15:49Z)
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning [11.179019629415514]
$\infty$-THOR is a new framework for long-horizon embodied tasks that advances long-context understanding in embodied AI. $\infty$-THOR provides: (1) a generation framework for scalable, reproducible, and unlimited long-horizon trajectories; (2) a novel embodied QA task, Needle(s) in the Embodied Haystack; and (3) a long-horizon dataset and benchmark suite.
arXiv Detail & Related papers (2025-05-22T17:20:38Z)
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking [109.09735490692202]
We propose HyperTree Planning (HTP), a novel reasoning paradigm that constructs hypertree-structured planning outlines for effective planning. Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, resulting in a 3.6 times performance improvement over o1-preview.
arXiv Detail & Related papers (2025-05-05T02:38:58Z)
Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation [8.180994118420053]
Nl2Hltl2Plan is a framework that translates natural language commands into hierarchical Linear Temporal Logic (LTL). First, an LLM transforms instructions into a Hierarchical Task Tree, capturing logical and temporal relations. Next, a fine-tuned LLM converts sub-tasks into flat formulas, which are aggregated into hierarchical specifications.
arXiv Detail & Related papers (2024-08-15T14:46:13Z)
Learning adaptive planning representations with natural language guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations. Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.