Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
- URL: http://arxiv.org/abs/2509.22137v1
- Date: Fri, 26 Sep 2025 09:56:44 GMT
- Title: Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
- Authors: Seoyoung Lee, Seonbin Yoon, Seongbeen Lee, Hyesoo Kim, Joo Yong Sim,
- Abstract summary: Existing VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs. We evaluate Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time.
- Score: 1.7970227672578558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GUI task automation streamlines repetitive tasks, but existing LLM- or VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Their reliance on single-shot reasoning or static plans makes them fragile under UI changes or complex tasks. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs, enabling robust and adaptable GUI automation. Log2Plan constructs high-level plans by mapping user commands to a structured task dictionary, enabling consistent and generalizable automation. To support personalization and reuse, it employs a task mining approach over user behavior logs that identifies user-specific patterns. These high-level plans are then grounded into low-level action sequences by interpreting real-time GUI context, ensuring robust execution across varying interfaces. We evaluated Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time. Notably, it maintains a success rate above 60.0% even on long-horizon task sequences, highlighting its robustness in complex, multi-step workflows.
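The two-level idea described in the abstract (a command mapped to a high-level plan through a task dictionary, then grounded against the current GUI state) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries, the keyword lookup, the `Action` type, and the `plan_high_level` and `ground` functions are all hypothetical names invented for illustration, and the keyword match merely stands in for whatever command-to-task mapping the paper actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical task dictionary: task name -> ordered high-level steps.
# The entries are invented; the paper's dictionary is not reproduced here.
TASK_DICTIONARY: Dict[str, List[str]] = {
    "send_email": ["open_app:mail", "click:compose", "type:body", "click:send"],
    "save_file": ["click:file_menu", "click:save"],
}

# Hypothetical keyword lookup standing in for command-to-task mapping.
KEYWORDS: Dict[str, str] = {"email": "send_email", "save": "save_file"}


@dataclass(frozen=True)
class Action:
    kind: str                              # e.g. "click", "type", "open_app"
    target: str                            # logical element or argument name
    position: Optional[Tuple[int, int]]    # screen coordinates, if resolved


def plan_high_level(command: str) -> List[str]:
    """Level 1: map a user command to a high-level plan via the dictionary."""
    for word in command.lower().split():
        task = KEYWORDS.get(word)
        if task is not None:
            return list(TASK_DICTIONARY[task])
    raise LookupError(f"no known task matches command: {command!r}")


def ground(steps: List[str], gui_context: Dict[str, Tuple[int, int]]) -> List[Action]:
    """Level 2: resolve each step against the GUI state observed right now."""
    actions = []
    for step in steps:
        kind, target = step.split(":", 1)
        if kind == "click":
            if target not in gui_context:
                raise LookupError(f"element not visible: {target!r}")
            actions.append(Action(kind, target, gui_context[target]))
        else:
            # Non-click steps need no coordinates in this toy model.
            actions.append(Action(kind, target, None))
    return actions


if __name__ == "__main__":
    plan = plan_high_level("Please save this document")
    # The same plan is grounded against two different (made-up) layouts.
    print(ground(plan, {"file_menu": (10, 5), "save": (12, 40)}))
    print(ground(plan, {"file_menu": (300, 5), "save": (310, 60)}))
```

In this toy model the high-level plan stays fixed while only the grounding step depends on the layout, which is the separation the abstract credits for robustness across varying interfaces.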
Related papers
- STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks [40.13135948595863]
STRUCTUREDAGENT is a hierarchical planning framework with two core components. It produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.
arXiv Detail & Related papers (2026-03-05T15:37:06Z)
- BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents [10.011001146444325]
Existing GUI agents struggle to recover once they follow an incorrect exploration path, often leading to task failure. We propose BEAP-Agent, a framework that supports long-range, multi-level state backtracking with dynamic task tracking and updating.
arXiv Detail & Related papers (2026-01-29T07:22:50Z)
- MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents [88.35544552383581]
We introduce MMBench-GUI, a hierarchical benchmark for evaluating GUI automation agents across Windows, Linux, iOS, Android, and Web platforms. It comprises four levels: GUI Content Understanding, Element Grounding, Task Automation, and Task Collaboration, covering essential skills for GUI agents.
arXiv Detail & Related papers (2025-07-25T17:59:26Z)
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent [13.259836345131525]
We propose SPlanner, a plug-and-play planning module that generates execution plans to guide vision language models (VLMs) in executing tasks. SPlanner achieves a 63.8% task success rate when paired with Qwen2.5-VL-72B as the VLM, yielding a 28.8 percentage point improvement compared to using Qwen2.5-VL-72B without planning assistance.
arXiv Detail & Related papers (2025-05-20T09:45:55Z)
- Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution.
arXiv Detail & Related papers (2025-02-20T13:47:51Z)
- Dynamic Planning for LLM-based Graphical User Interface Automation [48.31532014795368]
We propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents. D-PoT involves the dynamic adjustment of planning based on the environmental feedback and execution history. Experimental results reveal that the proposed D-PoT significantly surpassed the strong GPT-4V baseline by +12.7%.
arXiv Detail & Related papers (2024-10-01T07:49:24Z)
- Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
- Learning adaptive planning representations with natural language guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations.
Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z)
- Interactive Task Planning with Language Models [89.5839216871244]
An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals and distinct tasks, even during execution. Recent large language model based approaches can allow for more open-ended planning but often require heavy prompt engineering or domain specific pretrained models. We propose a simple framework that achieves interactive task planning with language models by incorporating both high-level planning and low-level skill execution.
arXiv Detail & Related papers (2023-10-16T17:59:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.