Related papers: AutoDroid: LLM-powered Task Automation in Android

AutoDroid: LLM-powered Task Automation in Android

URL: http://arxiv.org/abs/2308.15272v4
Date: Sat, 9 Mar 2024 09:38:51 GMT
Title: AutoDroid: LLM-powered Task Automation in Android
Authors: Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu
Abstract summary: We introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts. The main components include a functionality-aware UI representation method that bridges the UI with the LLM. We evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks.
Score: 32.241570727243534
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at url{https://autodroid-sys.github.io/}.

Related papers

AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs [54.58905728115257]
We propose the textbfAutoGUI pipeline for automatically annotating UI elements with detailed functionality descriptions at scale.<n>We construct a high-quality AutoGUI-704k dataset using the proposed pipeline, featuring diverse and detailed functionality annotations.<n>Our dataset remarkably enhances VLM's UI grounding capabilities and exhibits significant scaling effects.
arXiv Detail & Related papers (2025-02-04T03:39:59Z)
AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation [27.984521240600493]
Large language models (LLMs) have brought exciting new advances to mobile UI agents. One way to reduce the required model size is to customize a smaller domain-specific model. We propose to convert the UI task automation problem to a code generation problem.
arXiv Detail & Related papers (2024-12-24T02:54:56Z)
AutoGLM: Autonomous Foundation Agents for GUIs [51.276965515952]
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs) We have developed AutoGLM as a practical foundation agent system for real-world GUI interactions. Our evaluations demonstrate AutoGLM's effectiveness across multiple domains.
arXiv Detail & Related papers (2024-10-28T17:05:10Z)
ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents [0.0]
ClickAgent is a novel framework for building autonomous agents. In ClickAgent, the MLLM handles reasoning and action planning, while a separate UI location model identifies the relevant UI elements on the screen. Our evaluation was conducted on both an Android smartphone emulator and an actual Android smartphone, using the task success rate as the key metric for measuring agent performance.
arXiv Detail & Related papers (2024-10-09T14:49:02Z)
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts. We leverage natural language prompts and contextual information from the Robot Operating System (ROS) Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA. It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing [17.24045904273874]
We propose DroidAgent, an autonomous GUI testing agent for Android. It is based on Large Language Models and support mechanisms such as long- and short-term memory. DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques.
arXiv Detail & Related papers (2023-11-15T01:59:40Z)
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps. We introduce a functionality-aware memory prompting mechanism. It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z)
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing [23.460051600514806]
We propose GPTDroid, asking Large Language Model to chat with the mobile apps by passing the GUI page information to LLM to elicit testing scripts. Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process. We evaluate GPTDroid on 86 apps from Google Play, and its activity coverage is 71%, with 32% higher than the best baseline, and can detect 36% more bugs with faster speed than the best baseline.
arXiv Detail & Related papers (2023-05-16T13:46:52Z)
DroidBot-GPT: GPT-powered UI Automation for Android [11.980924738484994]
DroidBot-GPT is a tool that utilizes GPT-like large language models (LLMs) to automate the interactions with Android mobile applications. Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task.
arXiv Detail & Related papers (2023-04-14T11:31:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.