Related papers: BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents

URL: http://arxiv.org/abs/2601.21352v1
Date: Thu, 29 Jan 2026 07:22:50 GMT
Title: BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents
Authors: Ziyu Lu, Tengjin Weng, Yiying Yang, Yuhang Zhao, Xinxin Huang, Wenhao Jiang,
Abstract summary: Existing GUI agents struggle to recover once they follow an incorrect exploration path, often leading to task failure. We propose BEAP-Agent, a framework that supports long-range, multi-level state backtracking with dynamic task tracking and updating.
Score: 10.011001146444325
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: GUI agents are designed to automate repetitive tasks and enhance productivity. However, existing GUI agents struggle to recover once they follow an incorrect exploration path, often leading to task failure. In this work, we model GUI task execution as a depth-first search (DFS) process and propose BEAP-Agent, a DFS-based framework that supports long-range, multi-level state backtracking with dynamic task tracking and updating. The framework consists of three collaborative components: Planner, Executor, and Tracker. Together, they enable effective task exploration and execution. BEAP-Agent fills the gap in systematic backtracking mechanisms for GUI agents, offering a systematic solution for long-horizon task exploration. We conducted a systematic evaluation on the OSWorld benchmark, where BEAP-Agent achieved an accuracy of 28.2%, validating the effectiveness of the proposed method.
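
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what DFS-style exploration with backtracking could look like, not the authors' implementation. The toy screen graph `TOY_GUI`, the functions `plan`, `execute`, and `dfs`, and the fields of `Tracker` are all hypothetical names chosen for illustration; they borrow the Planner, Executor, and Tracker roles named in the abstract only loosely.

```python
from dataclasses import dataclass, field

# Hypothetical toy GUI: each screen maps action names to the next screen.
TOY_GUI = {
    "home": {"open_settings": "settings", "open_files": "files"},
    "settings": {"open_display": "display"},
    "display": {},  # dead end: the target is not reachable from here
    "files": {"open_docs": "docs"},
    "docs": {"open_report": "report"},
    "report": {},
}


@dataclass
class Tracker:
    """Records the current action path and the screens already explored."""
    path: list = field(default_factory=list)
    visited: set = field(default_factory=set)
    backtracks: int = 0


def plan(screen):
    """Stand-in Planner: propose candidate actions for a screen, in order."""
    return list(TOY_GUI[screen])


def execute(screen, action):
    """Stand-in Executor: apply an action and return the resulting screen."""
    return TOY_GUI[screen][action]


def dfs(screen, goal, tracker):
    """Depth-first exploration; returns True once the goal screen is reached."""
    if screen == goal:
        return True
    tracker.visited.add(screen)
    for action in plan(screen):
        nxt = execute(screen, action)
        if nxt in tracker.visited:
            continue  # skip screens that were already explored
        tracker.path.append(action)
        if dfs(nxt, goal, tracker):
            return True
        # Branch failed: drop the action and fall back to the earlier screen.
        tracker.path.pop()
        tracker.backtracks += 1
    return False


tracker = Tracker()
found = dfs("home", "report", tracker)
print(found, tracker.path, tracker.backtracks)
```

In this toy run the search first descends into the settings branch, hits a dead end, and backs out two levels before trying the files branch. A real GUI agent cannot simply return from a function to undo an action; restoring an earlier screen state is the hard part that the paper's backtracking mechanism addresses, and this sketch leaves it out.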

Related papers

ANCHOR: Branch-Point Data Generation for GUI Agents [52.22377425487]
End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data. We present a trajectory expansion framework Anchor that bootstraps scalable desktop supervision from a small set of verified seed demonstrations. Experiments on standard desktop benchmarks, OSWorld and WindowsAgentArena, show that models fine-tuned on our expanded corpus achieve consistent improvements.
arXiv Detail & Related papers (2026-02-06T19:55:26Z)
Lemon Agent Technical Report [12.663220335253529]
Lemon Agent is a multi-agent orchestrator-worker system built on a newly proposed AgentCortex framework. Our system integrates a hierarchical self-adaptive scheduling mechanism that operates at both the overall orchestrator layer and workers layer. By virtue of this two-tier architecture, the system achieves synergistic balance between global task coordination and local task execution.
arXiv Detail & Related papers (2026-02-06T10:09:49Z)
EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration [16.593979443102754]
We introduce EchoTrail-GUI, a novel framework designed to mimic human-like experiential learning by equipping agents with a dynamic, accessible memory. First, an agent autonomously interacts with GUI environments to build a curated database of successful task trajectories, validated by a reward model. Second, in the Memory Injection stage, upon receiving a new task, our system efficiently retrieves the most relevant past trajectories to serve as actionable "memories". Third, during GUI Task Inference, these memories are injected as in-context guidance to inform the agent's reasoning and decision-making process.
arXiv Detail & Related papers (2025-12-22T13:42:18Z)
GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents [59.107657859025586]
GUI-360$^\circ$ is a large-scale, comprehensive dataset and benchmark suite designed to advance computer-using agents (CUAs). The released corpus contains over 1.2M executed action steps across thousands of trajectories in popular Windows office applications. The dataset supports three canonical tasks, GUI grounding, screen parsing, and action prediction, and a hybrid GUI+API action space.
arXiv Detail & Related papers (2025-11-06T12:19:02Z)
Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach [1.7970227672578558]
Existing VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs. We evaluate Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time.
arXiv Detail & Related papers (2025-09-26T09:56:44Z)
Instruction Agent: Enhancing Agent with Expert Demonstration [12.67489098612846]
Graphical user interface (GUI) agents have advanced rapidly but still struggle with complex tasks involving novel UI elements, long-horizon actions, and personalized trajectories. In this work, we introduce Instruction Agent, a GUI agent that leverages expert demonstrations to solve such tasks, enabling completion of otherwise difficult tasks. Given a single demonstration, the agent extracts step-by-step instructions and executes them by strictly following the trajectory intended by the user, which avoids making mistakes during execution.
arXiv Detail & Related papers (2025-09-08T18:00:12Z)
CoAct-1: Computer-using Agents with Coding as Actions [94.99657662893338]
CoAct-1 is a novel multi-agent system that combines GUI-based control with direct programmatic execution. We evaluate our system on the challenging OSWorld benchmark, where CoAct-1 achieves a new state-of-the-art success rate of 60.76%.
arXiv Detail & Related papers (2025-08-05T21:33:36Z)
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents [88.35544552383581]
We introduce MMBench-GUI, a hierarchical benchmark for evaluating GUI automation agents across Windows, Linux, iOS, Android, and Web platforms. It comprises four levels: GUI Content Understanding, Element Grounding, Task Automation, and Task Collaboration, covering essential skills for GUI agents.
arXiv Detail & Related papers (2025-07-25T17:59:26Z)
BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism [11.786947907397131]
BacktrackAgent is a framework that incorporates a backtracking mechanism to improve task completion efficiency. We show that BacktrackAgent has achieved performance improvements in both task success rate and step accuracy on Mobile3M and Auto-UI benchmarks.
arXiv Detail & Related papers (2025-05-27T03:09:06Z)
Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems. In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy. Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.