AutoDroid: LLM-powered Task Automation in Android
- URL: http://arxiv.org/abs/2308.15272v4
- Date: Sat, 9 Mar 2024 09:38:51 GMT
- Title: AutoDroid: LLM-powered Task Automation in Android
- Authors: Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun
Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu
- Abstract summary: We introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts.
The main components include a functionality-aware UI representation method that bridges the UI with the LLM.
We evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks.
- Score: 32.241570727243534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile task automation is an attractive technique that aims to enable
voice-based hands-free user interaction with smartphones. However, existing
approaches suffer from poor scalability due to the limited language
understanding ability and the non-trivial manual efforts required from
developers or end-users. The recent advance of large language models (LLMs) in
language understanding and reasoning inspires us to rethink the problem from a
model-centric perspective, where task preparation, comprehension, and execution
are handled by a unified language model. In this work, we introduce AutoDroid,
a mobile task automation system capable of handling arbitrary tasks on any
Android application without manual efforts. The key insight is to combine the
commonsense knowledge of LLMs and domain-specific knowledge of apps through
automated dynamic analysis. The main components include a functionality-aware
UI representation method that bridges the UI with the LLM, exploration-based
memory injection techniques that augment the app-specific domain knowledge of
LLM, and a multi-granularity query optimization module that reduces the cost of
model inference. We integrate AutoDroid with off-the-shelf LLMs including
online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a
new benchmark for memory-augmented Android task automation with 158 common
tasks. The results demonstrated that AutoDroid is able to precisely generate
actions with an accuracy of 90.9%, and complete tasks with a success rate of
71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo,
benchmark suites, and source code of AutoDroid will be released at
https://autodroid-sys.github.io/.
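To make the idea of a UI representation that "bridges the UI with the LLM" more concrete, here is a minimal sketch in Python. It is not AutoDroid's implementation: the `UIElement` class, the `flatten` and `ui_to_prompt` helpers, and the prompt wording are all hypothetical names chosen for illustration. It only shows one plausible way to turn a tree of on-screen elements into a compact, HTML-like text prompt that a language model could read.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class UIElement:
    """Hypothetical, simplified stand-in for one node of an Android view tree."""
    kind: str                      # e.g. "button", "input", "text", "layout"
    text: str = ""
    clickable: bool = False
    children: List["UIElement"] = field(default_factory=list)


def flatten(root: UIElement) -> List[UIElement]:
    """Depth-first walk that keeps only nodes with text or that can be clicked."""
    kept: List[UIElement] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text or node.clickable:
            kept.append(node)
        # reversed() so that children are visited in their original order
        stack.extend(reversed(node.children))
    return kept


def ui_to_prompt(root: UIElement, task: str) -> str:
    """Render the kept elements as numbered HTML-like lines inside a prompt."""
    lines = [
        f"<{el.kind} id={i}>{escape(el.text)}</{el.kind}>"
        for i, el in enumerate(flatten(root))
    ]
    return (
        f"Task: {task}\n"
        "Current screen:\n"
        + "\n".join(lines)
        + "\nWhich element id should be acted on next?"
    )


if __name__ == "__main__":
    screen = UIElement("layout", children=[
        UIElement("text", "Contacts"),
        UIElement("button", "Add", clickable=True),
        UIElement("layout"),  # empty container, dropped by flatten()
    ])
    print(ui_to_prompt(screen, "Create a new contact"))
```

The numeric `id` attribute is an assumption of this sketch; it gives the model a short handle for naming the element it wants to act on, so the reply can be mapped back to a concrete UI action.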
Related papers
- ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents [0.0]
ClickAgent is a novel framework for building autonomous agents.
In ClickAgent, the MLLM handles reasoning and action planning, while a separate UI location model identifies the relevant UI elements on the screen.
Our evaluation was conducted on both an Android smartphone emulator and an actual Android smartphone, using the task success rate as the key metric for measuring agent performance.
arXiv Detail & Related papers (2024-10-09T14:49:02Z)
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS).
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing [17.24045904273874]
We propose DroidAgent, an autonomous GUI testing agent for Android.
It is based on Large Language Models and support mechanisms such as long- and short-term memory.
DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques.
arXiv Detail & Related papers (2023-11-15T01:59:40Z)
- Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps.
We introduce a functionality-aware memory prompting mechanism.
It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z)
- Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing [23.460051600514806]
We propose GPTDroid, which asks a large language model (LLM) to chat with mobile apps by passing GUI page information to the LLM to elicit testing scripts.
Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process.
We evaluate GPTDroid on 86 apps from Google Play. Its activity coverage is 71%, which is 32% higher than the best baseline, and it detects 36% more bugs at a faster rate than the best baseline.
arXiv Detail & Related papers (2023-05-16T13:46:52Z)
- DroidBot-GPT: GPT-powered UI Automation for Android [11.980924738484994]
DroidBot-GPT is a tool that utilizes GPT-like large language models (LLMs) to automate the interactions with Android mobile applications.
Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task.
arXiv Detail & Related papers (2023-04-14T11:31:56Z)
- AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction [75.16836697734995]
We propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS).
AutoFIS can automatically identify important feature interactions for factorization models with computational cost just equivalent to training the target model to convergence.
AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service.
arXiv Detail & Related papers (2020-03-25T06:53:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.