Related papers: ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

URL: http://arxiv.org/abs/2510.07988v1
Date: Thu, 09 Oct 2025 09:22:05 GMT
Title: ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
Authors: Haitao Jia, Ming He, Zimo Yin, Likang Wu, Jianping Fan, Jitao Sang,
Abstract summary: ReInAgent is a context-aware multi-agent framework to enable human-in-the-loop mobile task navigation.<n>It overcomes the limitation of existing approaches that rely on clear and static task assumptions.<n>It produces outcomes that are more closely aligned with genuine user preferences.
Score: 26.254354188188177
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mobile GUI agents exhibit substantial potential to facilitate and automate the execution of user tasks on mobile phones. However, exist mobile GUI agents predominantly privilege autonomous operation and neglect the necessity of active user engagement during task execution. This omission undermines their adaptability to information dilemmas including ambiguous, dynamically evolving, and conflicting task scenarios, leading to execution outcomes that deviate from genuine user requirements and preferences. To address these shortcomings, we propose ReInAgent, a context-aware multi-agent framework that leverages dynamic information management to enable human-in-the-loop mobile task navigation. ReInAgent integrates three specialized agents around a shared memory module: an information-managing agent for slot-based information management and proactive interaction with the user, a decision-making agent for conflict-aware planning, and a reflecting agent for task reflection and information consistency validation. Through continuous contextual information analysis and sustained user-agent collaboration, ReInAgent overcomes the limitation of existing approaches that rely on clear and static task assumptions. Consequently, it enables more adaptive and reliable mobile task navigation in complex, real-world scenarios. Experimental results demonstrate that ReInAgent effectively resolves information dilemmas and produces outcomes that are more closely aligned with genuine user preferences. Notably, on complex tasks involving information dilemmas, ReInAgent achieves a 25% higher success rate than Mobile-Agent-v2.

Related papers

AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance [64.78994124332989]
AppAgent-Pro is a proactive GUI agent system that actively integrates multi-domain information based on user instructions.<n>AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life.
arXiv Detail & Related papers (2025-08-26T05:23:24Z)
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation [36.08217588070538]
Cloud-based mobile agents powered by (multimodal) large language models ((M)LLMs) offer strong reasoning abilities but suffer from high latency and cost.<n>We propose textbfEcoAgent, an textbfEdge-textbfCloud ctextbfOllaborative multi-agent framework for mobile automation.<n>EcoAgent features a closed-loop collaboration among a cloud-based Planning Agent and two edge-based agents: the Execution Agent for action execution and the Observation Agent for verifying outcomes.
arXiv Detail & Related papers (2025-05-08T17:31:20Z)
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience.<n>Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z)
SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm.<n>We introduce SmartAgent, an agent framework perceiving cyber environments and reasoning personalized requirements.<n>Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z)
Agent S: An Open Agentic Framework that Uses Computers Like a Human [31.16046798529319]
We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI) Agent S aims to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces.
arXiv Detail & Related papers (2024-10-10T17:43:51Z)
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [52.5831204440714]
We introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. It then autonomously plans and decomposes the complex operation task, and navigates the mobile Apps through operations step by step.
arXiv Detail & Related papers (2024-01-29T13:46:37Z)
MobileAgent: enhancing mobile control via human-machine interaction and SOP integration [0.0]
Large Language Models (LLMs) are now capable of automating mobile device operations for users. Privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation. We have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs. Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks.
arXiv Detail & Related papers (2024-01-04T03:44:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.