AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance
- URL: http://arxiv.org/abs/2508.18689v2
- Date: Wed, 27 Aug 2025 04:25:35 GMT
- Title: AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance
- Authors: Yuyang Zhao, Wentao Shi, Fuli Feng, Xiangnan He
- Abstract summary: AppAgent-Pro is a proactive GUI agent system that actively integrates multi-domain information based on user instructions. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life.
- Score: 64.78994124332989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM)-based agents have demonstrated remarkable capabilities in addressing complex tasks, thereby enabling more advanced information retrieval and supporting deeper, more sophisticated human information-seeking behaviors. However, most existing agents operate in a purely reactive manner, responding passively to user instructions, which significantly constrains their effectiveness and efficiency as general-purpose platforms for information acquisition. To overcome this limitation, this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates multi-domain information based on user instructions. This approach enables the system to proactively anticipate users' underlying needs and conduct in-depth multi-domain information mining, thereby facilitating the acquisition of more comprehensive and intelligent information. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life, leading to a profound impact on human society. Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration video can be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.
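To make the workflow concrete, here is a minimal, hypothetical Python sketch of the proactive pattern the abstract describes (anticipate needs, mine several domains, integrate the findings). Every function and name below is an illustrative stand-in, not AppAgent-Pro's actual API; the real implementation is in the repository linked above.

```python
# Hypothetical sketch of a proactive multi-domain agent loop; all names
# are illustrative stand-ins, not the AppAgent-Pro codebase.
from typing import Callable

def query_llm(prompt: str) -> str:
    """Stand-in for any LLM backend; replace with a real client call."""
    return "nearby restaurants\nticket prices"  # canned output for the demo

# Domain tools the agent can invoke without being asked to.
DOMAIN_TOOLS: dict[str, Callable[[str], str]] = {
    "shopping": lambda q: f"[shopping results for {q!r}]",
    "maps":     lambda q: f"[route and venue info for {q!r}]",
    "social":   lambda q: f"[reviews and posts about {q!r}]",
}

def proactive_answer(instruction: str) -> str:
    # 1. Anticipate underlying needs beyond the literal instruction.
    needs = query_llm(
        f"User asked: {instruction}\nList likely follow-up needs, one per line."
    ).splitlines()
    # 2. Mine every domain for the instruction and each anticipated need.
    findings = [tool(q) for q in [instruction, *needs]
                for tool in DOMAIN_TOOLS.values()]
    # 3. Integrate the findings into one comprehensive answer.
    return query_llm("Integrate into one answer:\n" + "\n".join(findings))

print(proactive_answer("find a concert this weekend"))
```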
Related papers
- ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation [26.254354188188177]
ReInAgent is a context-aware multi-agent framework that enables human-in-the-loop mobile task navigation. It overcomes the limitation of existing approaches that rely on clear and static task assumptions, and it produces outcomes that are more closely aligned with genuine user preferences.
arXiv Detail & Related papers (2025-10-09T09:22:05Z) - VC-Agent: An Interactive Agent for Customized Video Dataset Collection [48.65498668743145]
We propose VC-Agent, an interactive agent that understands users' queries and feedback, and accordingly retrieves/scales up relevant video clips with minimal user input. As for agent functions, we leverage existing multi-modal large language models to connect the user's requirements with the video content. We provide a new benchmark for personalized video dataset collection, and carefully conduct a user study to verify our agent's usage in various real scenarios.
arXiv Detail & Related papers (2025-09-25T15:08:28Z) - AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications [95.42093979627703]
AgentScope supports flexible and efficient tool-based agent-environment interactions. We ground agent behaviors in the ReAct paradigm and offer advanced agent-level infrastructure. AgentScope also includes robust engineering support for developer-friendly experiences.
arXiv Detail & Related papers (2025-08-22T10:35:56Z) - OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use [101.57043903478257]
The dream of creating AI assistants as capable and versatile as the fictional J.A.R.V.I.S. from Iron Man has long captivated imaginations. With the evolution of (multi-modal) large language models ((M)LLMs), this dream is closer to reality. This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development.
arXiv Detail & Related papers (2025-08-06T14:33:45Z) - PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC [98.82146219495792]
In this paper, we propose a hierarchical agent framework named PC-Agent. From the perception perspective, we devise an Active Perception Module (APM) to overcome the limited ability of current MLLMs to perceive screenshot content. From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture.
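As a rough illustration of how interdependent subtasks might be scheduled under such a hierarchical decomposition, here is a hypothetical sketch; these classes are illustrative, not PC-Agent's actual code:

```python
# Hypothetical sketch: execute interdependent subtasks in dependency order.
from dataclasses import dataclass, field

@dataclass
class Subtask:
    description: str
    depends_on: list[int] = field(default_factory=list)  # prerequisite indices
    result: str | None = None

def run_subtasks(subtasks: list[Subtask]) -> list[str | None]:
    """Run subtasks once their prerequisites finish, passing results downstream."""
    done: set[int] = set()
    while len(done) < len(subtasks):
        progressed = False
        for i, t in enumerate(subtasks):
            if i in done or any(d not in done for d in t.depends_on):
                continue
            context = [subtasks[d].result for d in t.depends_on]
            # A worker agent would perceive the screen and act here.
            t.result = f"done({t.description}, context={context})"
            done.add(i)
            progressed = True
        if not progressed:
            raise ValueError("cyclic subtask dependencies")
    return [t.result for t in subtasks]

plan = [Subtask("open the report"), Subtask("copy the table", [0]),
        Subtask("paste into an email", [1])]
print(run_subtasks(plan))
```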
arXiv Detail & Related papers (2025-02-20T05:41:55Z) - Agent S: An Open Agentic Framework that Uses Computers Like a Human [31.16046798529319]
We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI).
Agent S aims to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces.
arXiv Detail & Related papers (2024-10-10T17:43:51Z) - AppAgent v2: Advanced Agent for Flexible Mobile Interactions [57.98933460388985]
This work introduces a novel LLM-based multimodal agent framework for mobile devices. Our agent constructs a flexible action space that enhances adaptability across various applications. Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z) - CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
We propose an agent that perceives its environment solely through screenshot images. By leveraging the reasoning capability of large language models, we eliminate the need for large-scale human demonstration data. The agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop.
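A hedged sketch of what context-aware action planning from screenshots alone could look like; the prompt wording, action format, and helper names below are illustrative assumptions, not CAAP's published prompts:

```python
# Hypothetical sketch: build a context-aware prompt from the task, the
# action history, and an attached screenshot for a vision-capable LLM.
import base64

def build_caap_prompt(task: str, history: list[str]) -> str:
    """Compose a prompt asking the model for the next UI action."""
    past = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(history)) or "(none)"
    return (
        f"Task: {task}\n"
        f"Actions so far:\n{past}\n"
        "The current screenshot is attached. "
        "Reply with exactly one next action, e.g. CLICK(x, y) or TYPE(text)."
    )

def encode_screenshot(path: str) -> str:
    """Base64-encode a screenshot for a vision-capable LLM API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

print(build_caap_prompt("log into the site", ["CLICK(120, 88)"]))
```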
arXiv Detail & Related papers (2024-06-11T05:21:20Z) - Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance.
The architecture comprises three agents: planning agent, decision agent, and reflection agent.
We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
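The three-role loop the summary describes could be sketched as follows; the `llm`, `read_screen`, and `execute` stubs are hypothetical stand-ins, not Mobile-Agent-v2's implementation:

```python
# Hypothetical sketch: planning agent drafts steps, decision agent picks
# actions, reflection agent checks outcomes. Stubs replace real models/I-O.
def llm(prompt: str) -> str:
    """Stand-in for any (multimodal) LLM call."""
    return "done"  # canned reply so the sketch runs end to end

def operate(task: str, read_screen, execute, max_steps: int = 20) -> None:
    plan = llm(f"Break this mobile task into ordered steps: {task}")
    memory: list[str] = []  # progress notes carried across steps
    for _ in range(max_steps):
        screen = read_screen()
        # Decision: pick the next concrete action for the current screen.
        action = llm(f"Plan: {plan}\nHistory: {memory}\nScreen: {screen}\nNext action?")
        execute(action)
        # Reflection: verify the result and record it for later steps.
        verdict = llm(f"After {action!r} the screen is {read_screen()!r}. done/ok/retry?")
        memory.append(f"{action} -> {verdict}")
        if verdict.startswith("done"):
            break

operate("send a photo to Alice",
        read_screen=lambda: "home screen", execute=lambda a: None)
```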
arXiv Detail & Related papers (2024-06-03T05:50:00Z) - AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitoring, zero-code programming workstation, and automatic prompt-tuning mechanism significantly lower the barriers to both development and deployment.
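For intuition, here is a toy version of message exchange as the core communication mechanism; note this is NOT the AgentScope API, only the general idea in miniature:

```python
# Toy illustration of message-exchange-based multi-agent communication.
from dataclasses import dataclass

@dataclass
class Msg:
    sender: str
    content: str

class EchoAgent:
    """Minimal agent: consumes a Msg, returns a Msg."""
    def __init__(self, name: str):
        self.name = name

    def reply(self, msg: Msg) -> Msg:
        return Msg(self.name, f"{self.name} handled: {msg.content}")

# Agents compose into pipelines by forwarding messages.
planner, executor = EchoAgent("planner"), EchoAgent("executor")
out = executor.reply(planner.reply(Msg("user", "book a table for two")))
print(out.content)  # -> executor handled: planner handled: book a table for two
```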
arXiv Detail & Related papers (2024-02-21T04:11:28Z) - MobileAgent: enhancing mobile control via human-machine interaction and SOP integration [0.0]
Large Language Models (LLMs) are now capable of automating mobile device operations for users.
Privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation.
We have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs.
Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks.
arXiv Detail & Related papers (2024-01-04T03:44:42Z) - AppAgent: Multimodal Agents as Smartphone Users [23.318925173980446]
Our framework enables the agent to operate smartphone applications through a simplified action space.
The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations.
To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks across 10 different applications.
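One plausible rendering of such a simplified action space, with a few primitives over labeled UI elements rather than raw pixel coordinates; the exact primitives and reply format here are assumptions for illustration:

```python
# Hypothetical simplified action space for smartphone control.
from dataclasses import dataclass

@dataclass
class Tap:
    element_id: int  # index of a labeled on-screen element

@dataclass
class Swipe:
    element_id: int
    direction: str   # "up" | "down" | "left" | "right"

@dataclass
class TypeText:
    text: str

def parse_action(reply: str):
    """Parse an LLM reply such as 'tap(3)' or 'text(hello)' into an action."""
    name, _, arg = reply.strip().partition("(")
    arg = arg.rstrip(")")
    if name == "tap":
        return Tap(int(arg))
    if name == "swipe":
        idx, direction = arg.split(",")
        return Swipe(int(idx), direction.strip())
    if name == "text":
        return TypeText(arg)
    raise ValueError(f"unrecognized action: {reply!r}")

print(parse_action("swipe(2, up)"))  # -> Swipe(element_id=2, direction='up')
```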
arXiv Detail & Related papers (2023-12-21T11:52:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.