PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration
- URL: http://arxiv.org/abs/2508.18040v1
- Date: Mon, 25 Aug 2025 13:57:02 GMT
- Title: PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration
- Authors: Xin Wang, Zhiyao Cui, Hao Li, Ya Zeng, Chenxu Wang, Ruiqi Song, Yihang Chen, Kun Shao, Qiaosheng Zhang, Jinzhuo Liu, Siyue Ren, Shuyue Hu, Zhen Wang
- Abstract summary: We introduce PerInstruct, a novel human-annotated dataset covering diverse personalized instructions across various mobile scenarios. We propose PerPilot, a plug-and-play framework powered by large language models (LLMs) that enables mobile agents to autonomously perceive, understand, and execute personalized user instructions.
- Score: 25.464268064728017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision language model (VLM)-based mobile agents show great potential for assisting users in performing instruction-driven tasks. However, these agents typically struggle with personalized instructions -- those containing ambiguous, user-specific context -- a challenge that has been largely overlooked in previous research. In this paper, we define personalized instructions and introduce PerInstruct, a novel human-annotated dataset covering diverse personalized instructions across various mobile scenarios. Furthermore, given the limited personalization capabilities of existing mobile agents, we propose PerPilot, a plug-and-play framework powered by large language models (LLMs) that enables mobile agents to autonomously perceive, understand, and execute personalized user instructions. PerPilot identifies personalized elements and autonomously completes instructions via two complementary approaches: memory-based retrieval and reasoning-based exploration. Experimental results demonstrate that PerPilot effectively handles personalized tasks with minimal user intervention and progressively improves its performance with continued use, underscoring the importance of personalization-aware reasoning for next-generation mobile agents. The dataset and code are available at: https://github.com/xinwang-nwpu/PerPilot
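The abstract's two complementary mechanisms invite a simple dispatch pattern: identify the personalized elements, try memory first, fall back to exploration, and write any discovery back so later runs hit memory. The sketch below is a hypothetical Python illustration of that flow; PersonalMemory, extract_personalized_elements, and explore_for_value are invented names, not the released PerPilot API.

```python
# A minimal sketch of the memory-then-explore dispatch the abstract describes.
# Every name here is a hypothetical stand-in, not the released PerPilot API.
import re

class PersonalMemory:
    """Key-value store of previously resolved personalized elements."""
    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def retrieve(self, element: str) -> str | None:
        return self.store.get(element)

    def update(self, element: str, value: str) -> None:
        self.store[element] = value

def extract_personalized_elements(instruction: str) -> list[str]:
    # Naive stand-in for the LLM-based detector: flag "my <phrase>" spans.
    return re.findall(r"my \w+(?: \w+)?", instruction)

def explore_for_value(element: str) -> str:
    # Stand-in for reasoning-based exploration: the real agent would open a
    # likely app (contacts, settings, ...) and read the value off the screen.
    return f"<{element} discovered via app exploration>"

def resolve(instruction: str, memory: PersonalMemory) -> str:
    """Replace each personalized element with a concrete value, preferring
    memory retrieval and falling back to exploration on a miss."""
    for element in extract_personalized_elements(instruction):
        value = memory.retrieve(element)
        if value is None:
            value = explore_for_value(element)
            memory.update(element, value)  # later runs hit memory directly
        instruction = instruction.replace(element, value)
    return instruction

memory = PersonalMemory()
print(resolve("Order my usual coffee to my home address", memory))
```

Because exploration results are written back to memory, repeated use shifts work from exploration to retrieval, which matches the abstract's claim that performance improves with continued use.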
Related papers
- Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions [44.14477000176553]
This survey provides a capability-oriented review of personalized LLM-powered agents. We organize the literature around four interdependent components: profile modeling, memory, planning, and action execution. By offering a structured framework for understanding and designing personalized LLM-powered agents, this survey charts a roadmap toward more user-aligned, adaptive, robust, and deployable agentic systems.
arXiv Detail & Related papers (2026-02-26T06:52:47Z)
- Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction [20.029487905328004]
We propose Me-Agent, a learnable and memorable personalized mobile agent. Me-Agent incorporates a two-level user habit learning approach. Me-Agent achieves state-of-the-art performance in personalization while maintaining competitive instruction execution performance.
arXiv Detail & Related papers (2026-01-28T01:44:19Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy keeps the agent aligned with user preferences in real time.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
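One plausible reading of the two modules plus test-time alignment, sketched below with invented names and an LLM-call stub (chat) rather than PersonaAgent's actual interface: refresh a preference summary from recent interactions before each action, then condition the action on it.

```python
# Hypothetical sketch of a memory module feeding an action module, with a
# test-time alignment step. Not PersonaAgent's real API.
def chat(prompt: str) -> str:
    raise NotImplementedError  # call an LLM of your choice here

class PersonalizedMemory:
    def __init__(self) -> None:
        self.interactions: list[str] = []
        self.preference_summary = ""

    def align(self) -> None:
        """Test-time alignment: re-derive the user's preferences from the
        most recent interactions so actions track preference drift."""
        recent = "\n".join(self.interactions[-10:])
        self.preference_summary = chat(f"Summarize this user's preferences:\n{recent}")

class PersonalizedAction:
    def act(self, task: str, preferences: str) -> str:
        return chat(f"User preferences: {preferences}\nComplete the task: {task}")

def handle(task: str, memory: PersonalizedMemory, actor: PersonalizedAction) -> str:
    memory.align()                                     # align before acting
    result = actor.act(task, memory.preference_summary)
    memory.interactions.append(f"{task} -> {result}")  # memory grows with use
    return result
```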
- Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance [18.820008753896623]
Embodied agents empowered by large language models (LLMs) have shown strong performance in household object rearrangement tasks. Yet, the effectiveness of embodied agents in utilizing memory for personalized assistance remains largely underexplored. We present MEMENTO, a personalized embodied agent evaluation framework designed to assess memory utilization capabilities.
arXiv Detail & Related papers (2025-05-22T08:00:10Z)
- GRACE: Generalizing Robot-Assisted Caregiving with User Functionality Embeddings [6.240250538289624]
We learn to predict a user's personalized functional range of motion (fROM) using functional assessment scores from occupational therapy. We develop a neural model that learns to embed functional assessment scores into a latent representation of the user's physical function.
arXiv Detail & Related papers (2025-01-29T18:55:07Z)
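As a concrete reading of this summary, here is a small PyTorch sketch of the encode-then-decode idea: assessment scores are embedded into a latent representation of physical function, from which a task-relevant prediction (here, per-point reachability over a workspace grid) is decoded. Layer sizes, dimensions, and the reachability target are illustrative assumptions, not GRACE's actual architecture.

```python
# Illustrative only: dimensions and the decoding target are guesses.
import torch
import torch.nn as nn

class FunctionEmbedder(nn.Module):
    def __init__(self, n_scores: int = 10, latent_dim: int = 16, n_points: int = 64):
        super().__init__()
        # Encoder: assessment scores -> latent physical-function embedding
        self.encoder = nn.Sequential(
            nn.Linear(n_scores, 32), nn.ReLU(),
            nn.Linear(32, latent_dim),
        )
        # Decoder: latent embedding -> reachability over a workspace grid
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, n_points), nn.Sigmoid(),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(scores))

model = FunctionEmbedder()
scores = torch.rand(1, 10)    # one user's normalized assessment scores
reachability = model(scores)  # predicted reach probability per grid point
```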
- Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience. Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z)
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions [57.98933460388985]
This work introduces a novel LLM-based multimodal agent framework for mobile devices. Our agent constructs a flexible action space that enhances adaptability across various applications. Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
- Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, queries user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
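The assess-inquire-refine behavior attributed to Mistral-Interact can be sketched as a short loop; the chat wrapper and prompts below are hypothetical stand-ins, not the model's real interface.

```python
# Hypothetical sketch: judge vagueness, ask the user until the intent is
# clear, then emit a refined goal. `chat` is a stub, not a real API.
def chat(prompt: str) -> str:
    raise NotImplementedError  # call your LLM of choice here

def refine_instruction(instruction: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        verdict = chat(f"Is this task unambiguous? Answer CLEAR or VAGUE: {instruction}")
        if verdict.strip().upper().startswith("CLEAR"):
            break
        question = chat(f"Ask one clarifying question about: {instruction}")
        answer = input(f"{question}\n> ")  # explicit query to the user
        instruction = f"{instruction}\n(Clarification: {answer})"
    return chat(f"Rewrite as a concrete, actionable goal: {instruction}")
```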
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [52.5831204440714]
We introduce Mobile-Agent, an autonomous multi-modal mobile device agent.
Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface.
It then autonomously plans and decomposes the complex operation task, and navigates the mobile apps through operations step by step.
arXiv Detail & Related papers (2024-01-29T13:46:37Z)
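The summary describes a classic perceive-plan-act loop: ground UI elements visually, plan one operation, execute it, repeat. A hypothetical sketch, with every helper a stand-in for the screenshot, grounding, and planning tools the abstract mentions:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "tap" | "type" | "scroll" | "done"
    target: str = "" # element description or text to type

# Hypothetical stand-ins; each would wrap a real screenshot tool,
# grounding model, and VLM planner.
def take_screenshot() -> bytes: ...
def locate_elements(screen: bytes) -> list[str]: ...
def plan_next_action(task: str, elements: list[str], history: list[Action]) -> Action: ...
def perform(action: Action) -> None: ...

def run_task(task: str, max_steps: int = 20) -> None:
    """Perceive-plan-act loop: ground UI elements, plan one step,
    execute it, and repeat until the planner signals completion."""
    history: list[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()
        elements = locate_elements(screen)  # visual + textual grounding
        action = plan_next_action(task, elements, history)
        if action.kind == "done":
            return
        perform(action)
        history.append(action)
```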
- MobileAgent: enhancing mobile control via human-machine interaction and SOP integration [0.0]
Large Language Models (LLMs) are now capable of automating mobile device operations for users.
Privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation.
We have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs.
Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks.
arXiv Detail & Related papers (2024-01-04T03:44:42Z)
- When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities [60.5609416496429]
The capabilities of large language models have improved dramatically.
Such a major leap forward in general AI capability will change how personalization is conducted.
By leveraging large language models as a general-purpose interface, personalization systems may compile user requests into plans.
arXiv Detail & Related papers (2023-07-31T02:48:56Z)
- Context-Aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants [42.25496752260081]
This paper addresses two research problems that are vital for developing effective personal mobile assistants: target apps selection and recommendation.
Here we focus on context-aware models to leverage the rich contextual information available to mobile devices.
We propose a family of context-aware neural models that take into account the sequential, temporal, and personal behavior of users.
arXiv Detail & Related papers (2021-01-09T17:07:47Z)
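To make the three signal types concrete, here is a toy hand-weighted scorer over sequential (most recent app), temporal (hour-of-day usage), and personal (overall usage) evidence. The paper's models are neural, so the features and weights below are purely illustrative.

```python
# Toy illustration of combining sequential, temporal, and personal context;
# not the paper's neural architecture.
from collections import Counter

def normalize(counter: Counter) -> dict:
    total = sum(counter.values()) or 1
    return {k: v / total for k, v in counter.items()}

def score_apps(recent_apps: list, hour: int, user_counts: Counter, hour_counts: dict):
    """Rank candidate apps by a weighted mix of sequential, temporal,
    and personal signals (weights are arbitrary for illustration)."""
    personal = normalize(user_counts)
    temporal = normalize(hour_counts.get(hour, Counter()))
    candidates = set(personal) | set(recent_apps)
    ranked = []
    for app in candidates:
        sequential = 1.0 if recent_apps and app == recent_apps[-1] else 0.0
        score = 0.5 * sequential + 0.3 * temporal.get(app, 0.0) + 0.2 * personal.get(app, 0.0)
        ranked.append((app, score))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)

# Example: a user who mostly opens Maps in the morning, but just used Messages
usage = Counter({"maps": 30, "messages": 20, "camera": 5})
morning = {8: Counter({"maps": 10, "messages": 2})}
print(score_apps(["messages"], hour=8, user_counts=usage, hour_counts=morning))
```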