AppAgent: Multimodal Agents as Smartphone Users
- URL: http://arxiv.org/abs/2312.13771v2
- Date: Fri, 22 Dec 2023 02:29:17 GMT
- Title: AppAgent: Multimodal Agents as Smartphone Users
- Authors: Chi Zhang and Zhao Yang and Jiaxuan Liu and Yucheng Han and Xin Chen
and Zebiao Huang and Bin Fu and Gang Yu
- Abstract summary: Our framework enables the agent to operate smartphone applications through a simplified action space.
The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations.
To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications.
- Score: 23.318925173980446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models (LLMs) have led to the creation
of intelligent agents capable of performing complex tasks. This paper
introduces a novel LLM-based multimodal agent framework designed to operate
smartphone applications. Our framework enables the agent to operate smartphone
applications through a simplified action space, mimicking human-like
interactions such as tapping and swiping. This novel approach bypasses the need
for system back-end access, thereby broadening its applicability across diverse
apps. Central to our agent's functionality is its innovative learning method.
The agent learns to navigate and use new apps either through autonomous
exploration or by observing human demonstrations. This process generates a
knowledge base that the agent refers to for executing complex tasks across
different applications. To demonstrate the practicality of our agent, we
conducted extensive testing over 50 tasks in 10 different applications,
including social media, email, maps, shopping, and sophisticated image editing
tools. The results affirm our agent's proficiency in handling a diverse array
of high-level tasks.
Related papers
- Very Large-Scale Multi-Agent Simulation in AgentScope [115.83581238212611]
We develop new features and components for AgentScope, a user-friendly multi-agent platform.
We propose an actor-based distributed mechanism towards great scalability and high efficiency.
We provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z) - Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z) - MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents [7.4568642040547894]
Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphic User Interfaces (GUIs)
Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents.
We propose an efficient and user-friendly benchmark, MobileAgentBench, designed to alleviate the burden of extensive manual testing.
arXiv Detail & Related papers (2024-06-12T13:14:50Z) - Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance.
The architecture comprises three agents: planning agent, decision agent, and reflection agent.
We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z) - AgentLite: A Lightweight Library for Building and Advancing
Task-Oriented LLM Agent System [91.41155892086252]
We open-source a new AI agent library, AgentLite, which simplifies research investigation into LLM agents.
AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks.
We introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility.
arXiv Detail & Related papers (2024-02-23T06:25:20Z) - AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z) - Exploring Large Language Model based Intelligent Agents: Definitions,
Methods, and Prospects [32.91556128291915]
This paper surveys current research to provide an in-depth overview of intelligent agents within single and multi-agent systems.
It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback.
We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
arXiv Detail & Related papers (2024-01-07T09:08:24Z) - AutoAgents: A Framework for Automatic Agent Generation [27.74332323317923]
AutoAgents is an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks.
Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods.
arXiv Detail & Related papers (2023-09-29T14:46:30Z) - AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement
Learning [19.990946219992992]
We introduce an RL-based framework for learning to accomplish tasks in mobile apps.
RL agents are provided with states derived from the underlying representation of on-screen elements.
We develop a platform which addresses several engineering challenges to enable an effective RL training environment.
arXiv Detail & Related papers (2021-05-31T23:02:38Z) - SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for
Autonomous Driving [96.50297622371457]
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world.
Despite more than a decade of research and development, the problem of how to interact with diverse road users in diverse scenarios remains largely unsolved.
We develop a dedicated simulation platform called SMARTS that generates diverse and competent driving interactions.
arXiv Detail & Related papers (2020-10-19T18:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.