Related papers: MobiAgent: A Systematic Framework for Customizable Mobile Agents

URL: http://arxiv.org/abs/2509.00531v1
Date: Sat, 30 Aug 2025 15:24:47 GMT
Title: MobiAgent: A Systematic Framework for Customizable Mobile Agents
Authors: Cheng Zhang, Erhu Feng, Xi Zhao, Yisheng Zhao, Wangbo Gong, Jiahui Sun, Dong Du, Zhichao Hua, Yubin Xia, Haibo Chen
Abstract summary: We propose MobiAgent, a comprehensive mobile agent system. It consists of the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Compared to both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.
Score: 11.72214553752663
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models continue to face significant challenges in real-world task execution, particularly in terms of accuracy and efficiency. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system comprising three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, recognizing that the capabilities of current mobile agents are still limited by the availability of high-quality data, we have developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared to both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.

Related papers

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis [70.39500621448383]
The open-world mobile manipulation task remains a challenge due to the need for generalization to open-ended instructions and environments. We propose a novel multi-modal agent architecture that maintains multi-view scene frames and agent states for decision-making and controls the robot by function calling. We highlight our fine-tuned OWMM-VLM as the first dedicated foundation model for mobile manipulators with global scene understanding, robot state tracking, and multi-modal action generation in a unified model.
arXiv Detail & Related papers (2025-06-04T17:57:44Z)
MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users [52.696186533146516]
MobileA3gent is a collaborative framework that trains mobile GUI Agents using decentralized self-sourced data. MobileA3gent achieves superior performance over traditional approaches at only 1% of the cost.
arXiv Detail & Related papers (2025-02-05T08:26:17Z)
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience. Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z)
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey [59.419801718418384]
Mobile agents are essential for automating tasks in complex and dynamic mobile environments. Recent advancements enhance real-time adaptability and multimodal interaction. We categorize these advancements into two main approaches: prompt-based methods and training-based methods.
arXiv Detail & Related papers (2024-11-04T11:50:58Z)
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation [23.026244256950086]
We propose MobA, a novel MLLM-based mobile assistant system. We introduce an adaptive planning module that incorporates a reflection mechanism for error recovery. We also present MobBench, a dataset designed for complex mobile interactions.
arXiv Detail & Related papers (2024-10-17T16:53:50Z)
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents [7.4568642040547894]
Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphic User Interfaces (GUIs). Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents. We propose an efficient and user-friendly benchmark, MobileAgentBench, designed to alleviate the burden of extensive manual testing.
arXiv Detail & Related papers (2024-06-12T13:14:50Z)
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z)
Benchmarking Mobile Device Control Agents across Diverse Configurations [19.01954948183538]
B-MoCA is a benchmark for evaluating and developing mobile device control agents. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to improve effectiveness.
arXiv Detail & Related papers (2024-04-25T14:56:32Z)
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [52.5831204440714]
We introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. It then autonomously plans and decomposes the complex operation task, and navigates the mobile Apps through operations step by step.
arXiv Detail & Related papers (2024-01-29T13:46:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.