PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
- URL: http://arxiv.org/abs/2601.09636v1
- Date: Wed, 14 Jan 2026 17:12:48 GMT
- Title: PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
- Authors: Yibo Lyu, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie
- Abstract summary: We highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign). PersonalAlign requires agents to leverage long-term user records as persistent context to resolve omitted preferences in vague instructions. We evaluate a range of GUI agents on AndroidIntent, including GPT-5, Qwen3-VL, and UI-TARS.
- Score: 67.68348568175718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While GUI agents have shown strong performance under explicit and completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign), a new agent task that requires agents to leverage long-term user records as persistent context to resolve omitted preferences in vague instructions and anticipate latent routines by user state for proactive assistance. To facilitate this study, we introduce AndroidIntent, a benchmark designed to evaluate agents' ability to resolve vague instructions and provide proactive suggestions through reasoning over long-term user records. We annotated 775 user-specific preferences and 215 routines from 20k long-term records across different users for evaluation. Furthermore, we introduce Hierarchical Intent Memory Agent (HIM-Agent), which maintains a continuously updating personal memory and hierarchically organizes user preferences and routines for personalization. Finally, we evaluate a range of GUI agents on AndroidIntent, including GPT-5, Qwen3-VL, and UI-TARS. The results show that HIM-Agent significantly improves execution and proactive performance by 15.7% and 7.3%, respectively.
Related papers
- Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction [20.029487905328004]
We propose Me-Agent, a learnable and memorable personalized mobile agent. Me-Agent incorporates a two-level user habit learning approach. Me-Agent achieves state-of-the-art performance in personalization while maintaining competitive instruction execution performance.
arXiv Detail & Related papers (2026-01-28T01:44:19Z)
- SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis [11.291868789244496]
We decompose human swipe gestures into quantifiable dimensions and propose an automated pipeline SwipeGen to synthesize human-like swipe interactions. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. We propose GUISwiper, a GUI agent with enhanced interaction execution capabilities.
arXiv Detail & Related papers (2026-01-26T09:35:10Z)
- Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues [28.522406727886395]
PersonalAgent is a lifelong agent designed to continuously infer and adapt to user preferences. Experiments show that PersonalAgent achieves superior performance over strong prompt-based and policy optimization baselines. Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents.
arXiv Detail & Related papers (2025-12-17T10:47:06Z)
- TOM-SWE: User Mental Modeling For Software Engineering Agents [75.28749912645127]
ToM-SWE is a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent. ToM-SWE infers user goals, constraints, and preferences from instructions and interaction history. In two software engineering benchmarks, ToM-SWE improves task success rates and user satisfaction.
arXiv Detail & Related papers (2025-10-24T16:09:51Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. Test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
- Creating General User Models from Computer Use [53.59999173952482]
This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture user knowledge and preferences.
arXiv Detail & Related papers (2025-05-16T04:00:31Z)
- SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm. We introduce SmartAgent, an agent framework perceiving cyber environments and reasoning personalized requirements. Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z)
- Identifying User Goals from UI Trajectories [19.492331502146886]
We propose a new task: goal identification from observed UI trajectories. We also introduce a novel evaluation methodology designed to assess whether two intent descriptions can be considered paraphrases. To benchmark this task, we compare the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro.
arXiv Detail & Related papers (2024-06-20T13:46:10Z)
- Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.