Related papers: PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

URL: http://arxiv.org/abs/2601.09636v1
Date: Wed, 14 Jan 2026 17:12:48 GMT
Title: PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
Authors: Yibo Lyu, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie
Abstract summary: We highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign). PersonalAlign requires agents to leverage long-term user records as persistent context to resolve omitted preferences in vague instructions. We evaluate a range of GUI agents on AndroidIntent, including GPT-5, Qwen3-VL, and UI-TARS.
Score: 67.68348568175718
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While GUI agents have shown strong performance under explicit and completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign), a new agent task that requires agents to leverage long-term user records as persistent context to resolve omitted preferences in vague instructions and anticipate latent routines by user state for proactive assistance. To facilitate this study, we introduce AndroidIntent, a benchmark designed to evaluate agents' ability in resolving vague instructions and providing proactive suggestions through reasoning over long-term user records. We annotated 775 user-specific preferences and 215 routines from 20k long-term records across different users for evaluation. Furthermore, we introduce Hierarchical Intent Memory Agent (HIM-Agent), which maintains a continuously updating personal memory and hierarchically organizes user preferences and routines for personalization. Finally, we evaluate a range of GUI agents on AndroidIntent, including GPT-5, Qwen3-VL, and UI-TARS. Further results show that HIM-Agent significantly improves execution and proactive performance by 15.7% and 7.3%, respectively.

Related papers

Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction [20.029487905328004]
We propose Me-Agent, a learnable and memorable personalized mobile agent. Me-Agent incorporates a two-level user habit learning approach. Me-Agent achieves state-of-the-art performance in personalization while maintaining competitive instruction execution performance.
arXiv Detail & Related papers (2026-01-28T01:44:19Z)
SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis [11.291868789244496]
We decompose human swipe gestures into quantifiable dimensions and propose an automated pipeline SwipeGen to synthesize human-like swipe interactions. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. We propose GUISwiper, a GUI agent with enhanced interaction execution capabilities.
arXiv Detail & Related papers (2026-01-26T09:35:10Z)
Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues [28.522406727886395]
PersonalAgent is a lifelong agent designed to continuously infer and adapt to user preferences. Experiments show that PersonalAgent achieves superior performance over strong prompt-based and policy optimization baselines. Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents.
arXiv Detail & Related papers (2025-12-17T10:47:06Z)
TOM-SWE: User Mental Modeling For Software Engineering Agents [75.28749912645127]
ToM-SWE is a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent. ToM-SWE infers user goals, constraints, and preferences from instructions and interaction history. In two software engineering benchmarks, ToM-SWE improves task success rates and user satisfaction.
arXiv Detail & Related papers (2025-10-24T16:09:51Z)
PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
Creating General User Models from Computer Use [53.59999173952482]
This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture user knowledge and preferences.
arXiv Detail & Related papers (2025-05-16T04:00:31Z)
SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm. We introduce SmartAgent, an agent framework that perceives cyber environments and reasons about personalized requirements. Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z)
Identifying User Goals from UI Trajectories [19.492331502146886]
We propose a new task: goal identification from observed UI trajectories. We also introduce a novel evaluation methodology designed to assess whether two intent descriptions can be considered paraphrases. To benchmark this task, we compare the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro.
arXiv Detail & Related papers (2024-06-20T13:46:10Z)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.