Related papers: Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach

URL: http://arxiv.org/abs/2509.22137v1
Date: Fri, 26 Sep 2025 09:56:44 GMT
Title: Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
Authors: Seoyoung Lee, Seonbin Yoon, Seongbeen Lee, Hyesoo Kim, Joo Yong Sim
Abstract summary: Existing VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs. We evaluate Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time.
Score: 1.7970227672578558
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: GUI task automation streamlines repetitive tasks, but existing LLM or VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Their reliance on single-shot reasoning or static plans makes them fragile under UI changes or complex tasks. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs, enabling robust and adaptable GUI automation. Log2Plan constructs high-level plans by mapping user commands to a structured task dictionary, enabling consistent and generalizable automation. To support personalization and reuse, it employs a task mining approach from user behavior logs that identifies user-specific patterns. These high-level plans are then grounded into low-level action sequences by interpreting real-time GUI context, ensuring robust execution across varying interfaces. We evaluated Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time. Notably, it maintains over 60.0% success rate even on long-horizon task sequences, highlighting its robustness in complex, multi-step workflows.
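
The two-level flow described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the task dictionary contents, the n-gram mining rule, the keyword matching, and every name below (`TASK_DICTIONARY`, `mine_frequent_sequences`, `high_level_plan`, `ground`, the `btn_*` element IDs) are hypothetical stand-ins chosen only to illustrate the idea of mapping a command to a high-level plan and then grounding it against the current GUI state.

```python
from collections import Counter

# Hypothetical task dictionary: high-level task name -> ordered abstract steps.
TASK_DICTIONARY = {
    "send_report": ["open_mail", "attach_file", "send"],
    "archive_file": ["open_explorer", "move_file"],
}


def mine_frequent_sequences(logs, length=2, min_count=2):
    """Count contiguous action n-grams across sessions; keep the frequent ones."""
    counts = Counter()
    for session in logs:
        for i in range(len(session) - length + 1):
            counts[tuple(session[i:i + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}


def high_level_plan(command):
    """Map a user command to a dictionary entry by naive keyword matching."""
    text = command.lower()
    for task, steps in TASK_DICTIONARY.items():
        if all(word in text for word in task.split("_")):
            return list(steps)
    raise KeyError(f"no task matches command: {command!r}")


def ground(steps, gui_context):
    """Turn abstract steps into low-level actions using the visible GUI elements."""
    actions = []
    for step in steps:
        element = gui_context.get(step)
        if element is None:
            raise LookupError(f"no visible element for step {step!r}")
        actions.append(("click", element))
    return actions


# Toy user behavior logs (assumed format: one list of action names per session).
logs = [
    ["open_mail", "attach_file", "send"],
    ["open_mail", "attach_file", "send"],
    ["open_explorer", "move_file"],
]
frequent = mine_frequent_sequences(logs)

plan = high_level_plan("Please send the weekly report")
gui_context = {"open_mail": "btn_mail", "attach_file": "btn_attach", "send": "btn_send"}
actions = ground(plan, gui_context)

print(frequent)
print(actions)
```

Keeping the plan abstract until grounding time means a changed interface only requires a new `gui_context`, not a new plan, which is the kind of robustness to UI changes the abstract claims.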

Related papers

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks [40.13135948595863]
STRUCTUREDAGENT is a hierarchical planning framework with two core components. It produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.
arXiv Detail & Related papers (2026-03-05T15:37:06Z)
BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents [10.011001146444325]
Existing GUI agents struggle to recover once they follow an incorrect exploration path, often leading to task failure. We propose BEAP-Agent, a framework that supports long-range, multi-level state backtracking with dynamic task tracking and updating.
arXiv Detail & Related papers (2026-01-29T07:22:50Z)
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents [88.35544552383581]
We introduce MMBench-GUI, a hierarchical benchmark for evaluating GUI automation agents across Windows, Linux, iOS, Android, and Web platforms. It comprises four levels: GUI Content Understanding, Element Grounding, Task Automation, and Task Collaboration, covering essential skills for GUI agents.
arXiv Detail & Related papers (2025-07-25T17:59:26Z)
Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent [13.259836345131525]
We propose SPlanner, a plug-and-play planning module that generates execution plans to guide vision language models (VLMs) in executing tasks. SPlanner achieves a 63.8% task success rate when paired with Qwen2.5-VL-72B as the VLM, yielding a 28.8 percentage point improvement compared to using Qwen2.5-VL-72B without planning assistance.
arXiv Detail & Related papers (2025-05-20T09:45:55Z)
Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution.
arXiv Detail & Related papers (2025-02-20T13:47:51Z)
Dynamic Planning for LLM-based Graphical User Interface Automation [48.31532014795368]
We propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents. D-PoT involves the dynamic adjustment of planning based on the environmental feedback and execution history. Experimental results reveal that the proposed D-PoT significantly surpassed the strong GPT-4V baseline by +12.7%.
arXiv Detail & Related papers (2024-10-01T07:49:24Z)
Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes. CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
Learning adaptive planning representations with natural language guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations. Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z)
Interactive Task Planning with Language Models [89.5839216871244]
An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals and distinct tasks, even during execution. Recent large language model based approaches can allow for more open-ended planning but often require heavy prompt engineering or domain specific pretrained models. We propose a simple framework that achieves interactive task planning with language models by incorporating both high-level planning and low-level skill execution.
arXiv Detail & Related papers (2023-10-16T17:59:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.