CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
- URL: http://arxiv.org/abs/2503.03743v1
- Date: Wed, 05 Mar 2025 18:56:16 GMT
- Title: CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
- Authors: Yuqi Zhou, Shuai Wang, Sunhao Dai, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Jun Xu,
- Abstract summary: We propose a new mobile assistant architecture with constrained high-frequency optimized planning (CHOP)<n>Our approach overcomes the VLM's deficiency in GUI scenarios planning by using human-planned subtasks as the basis vector.<n>We evaluate our architecture in both English and Chinese contexts across 20 Apps, demonstrating significant improvements in both effectiveness and efficiency.
- Score: 18.826366389246385
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advancement of visual language models (VLMs) has enhanced mobile device operations, allowing simulated human-like actions to address user requirements. Current VLM-based mobile operating assistants can be structured into three levels: task, subtask, and action. The subtask level, linking high-level goals with low-level executable actions, is crucial for task completion but faces two challenges: ineffective subtasks that lower-level agent cannot execute and inefficient subtasks that fail to contribute to the completion of the higher-level task. These challenges stem from VLM's lack of experience in decomposing subtasks within GUI scenarios in multi-agent architecture. To address these, we propose a new mobile assistant architecture with constrained high-frequency o}ptimized planning (CHOP). Our approach overcomes the VLM's deficiency in GUI scenarios planning by using human-planned subtasks as the basis vector. We evaluate our architecture in both English and Chinese contexts across 20 Apps, demonstrating significant improvements in both effectiveness and efficiency. Our dataset and code is available at https://github.com/Yuqi-Zhou/CHOP
Related papers
- MapAgent: Trajectory-Constructed Memory-Augmented Planning for Mobile Task Automation [5.433829353194621]
MapAgent is a framework that leverages memory constructed from historical trajectories to augment current task planning.<n>We introduce a coarse-to-fine task planning approach that retrieves relevant pages from the memory database based on similarity.<n>Results in real-world scenarios demonstrate that MapAgent achieves superior performance to existing methods.
arXiv Detail & Related papers (2025-07-29T16:05:32Z) - RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation [80.20970723577818]
We introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation.<n>The dataset is constructed via a top-down pipeline, where GPT generates task instructions and decomposes them into subtask sequences.<n>Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations.
arXiv Detail & Related papers (2025-06-07T06:15:49Z) - LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks [31.3295171851909]
Real-world embodied agents face high-level goals demanding multi-step solutions.<n>Long-horizon tasks require high-level task planning and low-level motion control.<n>We introduce a new unified vision language framework for long-horizon tasks, dubbed LoHoVLA.
arXiv Detail & Related papers (2025-05-31T06:01:03Z) - Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback [12.600525101342026]
We introduce DAHLIA, a data-agnostic framework for language-conditioned long-horizon robotic manipulation.
LLMs are large language models for real-time task planning and execution.
Our framework demonstrates state-of-the-art performance across diverse long-horizon tasks, achieving strong generalization in both simulated and real-world scenarios.
arXiv Detail & Related papers (2025-03-27T20:32:58Z) - Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments [16.482992646001996]
We develop a framework for anticipating household tasks and computing an action sequence that jointly achieves these tasks.<n>We demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.
arXiv Detail & Related papers (2025-02-04T07:31:55Z) - Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience.
Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z) - MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation [23.026244256950086]
We propose MobA, a novel MLLM-based mobile assistant system.<n>A multifaceted memory module provides comprehensive memory support to enhance adaptability and efficiency.<n> Experimental results on MobBench and AndroidArena demonstrate MobA's ability to handle dynamic GUI environments and perform complex mobile task.
arXiv Detail & Related papers (2024-10-17T16:53:50Z) - Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance.
The architecture comprises three agents: planning agent, decision agent, and reflection agent.
We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z) - LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied
Agents [2.8927500190704567]
Large language models (LLMs) have recently received considerable attention as alternative solutions for task planning.
We propose a benchmark system for automatically quantifying performance of task planning for home-service embodied agents.
arXiv Detail & Related papers (2024-02-13T02:28:57Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - ADaPT: As-Needed Decomposition and Planning with Language Models [131.063805299796]
We introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT)
ADaPT explicitly plans and decomposes complex sub-tasks as-needed, when the Large Language Models is unable to execute them.
Our results demonstrate that ADaPT substantially outperforms established strong baselines.
arXiv Detail & Related papers (2023-11-08T17:59:15Z) - Plan, Eliminate, and Track -- Language Models are Good Teachers for
Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z) - Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.