ReuseDroid: A VLM-empowered Android UI Test Migrator Boosted by Active Feedback
- URL: http://arxiv.org/abs/2504.02357v1
- Date: Thu, 03 Apr 2025 07:45:09 GMT
- Title: ReuseDroid: A VLM-empowered Android UI Test Migrator Boosted by Active Feedback
- Authors: Xiaolei Li, Jialun Cao, Yepang Liu, Shing-Chi Cheung, Hailong Wang,
- Abstract summary: We propose REUSEDROID, a novel multiagent framework for GUI test migration empowered by Large Vision-Language Models (VLMs). An insight of REUSEDROID is to migrate tests based only on the core logic shared across similar apps, while their entire operational logic could differ. We evaluate REUSEDROID on LinPro, a new test migration dataset that consists of migration tasks for 39 popular apps across 4 categories.
- Score: 11.624163693084446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GUI testing is an essential quality assurance process in mobile app development. However, the creation and maintenance of GUI tests for mobile apps are resource-intensive and costly. Recognizing that many apps share similar functionalities, researchers have proposed various techniques to migrate GUI tests from one app to another with similar features. For example, some techniques employ mapping-based approaches to align the GUI elements traversed by the tests of a source app to those present in the target app. Other test migration techniques leverage large language models (LLMs) by adapting the GUI tasks in source tests. However, these techniques are ineffective when the operational logic differs between the source and target apps: because these operational flows are not analyzed, the semantics of GUI elements may not be correctly inferred. In this work, we propose REUSEDROID, a novel multiagent framework for GUI test migration empowered by Large Vision-Language Models (VLMs). REUSEDROID is powered by multiple VLM-based agents, each tackling a stage of the test migration process by leveraging the relevant visual and textual information embedded in GUI pages. An insight of REUSEDROID is to migrate tests based only on the core logic shared across similar apps, while their entire operational logic could differ. We evaluate REUSEDROID on LinPro, a new test migration dataset that consists of 578 migration tasks for 39 popular apps across 4 categories. The experimental results show that REUSEDROID can successfully migrate 90.3% of the migration tasks, outperforming the best mapping-based and LLM-based baselines by 318.1% and 109.1%, respectively.
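The headline figures in the abstract can be unpacked with a little arithmetic. The sketch below is not from the paper: it assumes the 318.1% and 109.1% figures are relative improvements in success rate over each baseline, and the helper name `implied_baseline` is hypothetical. Under that assumption it back-computes the baseline success rates and the approximate number of migrated tasks; the paper itself should be consulted for the reported values.

```python
def implied_baseline(rate_pct: float, relative_gain_pct: float) -> float:
    """Return the baseline rate implied if rate_pct exceeds it by
    relative_gain_pct percent (assumed to be a relative improvement)."""
    return rate_pct / (1.0 + relative_gain_pct / 100.0)


REUSEDROID_RATE = 90.3  # success rate stated in the abstract, in percent
TOTAL_TASKS = 578       # LinPro migration tasks stated in the abstract

# Derived values; these are implications of the assumption, not reported results.
mapping_baseline = implied_baseline(REUSEDROID_RATE, 318.1)
llm_baseline = implied_baseline(REUSEDROID_RATE, 109.1)
migrated_tasks = round(REUSEDROID_RATE / 100.0 * TOTAL_TASKS)

print(f"implied mapping-based baseline: {mapping_baseline:.1f}%")
print(f"implied LLM-based baseline: {llm_baseline:.1f}%")
print(f"approximate tasks migrated: {migrated_tasks} of {TOTAL_TASKS}")
```

If the percentages were instead absolute percentage-point differences, the first figure could not hold for a 90.3% success rate, which is why the relative reading is assumed here.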
Related papers
- Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents [58.18100825673032]
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions.
In the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9, respectively).
arXiv Detail & Related papers (2025-01-21T17:48:10Z)
- Automated Test Transfer Across Android Apps Using Large Language Models [7.865081492588628]
This paper introduces an innovative technique, LLMigrate, which leverages Large Language Models (LLMs) to efficiently transfer usage-based UI tests across mobile apps. Our experimental evaluation shows LLMigrate can achieve a 97.5% success rate in automated test transfer, reducing the manual effort required to write tests from scratch by 91.1%.
arXiv Detail & Related papers (2024-11-26T23:06:09Z)
- Skill-Adaptive Imitation Learning for UI Test Reuse [13.538724823517292]
We propose a skill-adaptive imitation learning framework designed to enhance the effectiveness of UI test migration.
Results show that SAIL substantially improves the effectiveness of UI test migration, with a 149% higher success rate than state-of-the-art approaches.
arXiv Detail & Related papers (2024-09-20T08:13:04Z)
- LLM-based Abstraction and Concretization for GUI Test Migration [26.503512328876198]
GUI test migration aims to produce test cases with events and assertions to test specific functionalities of a target app.
We propose a new migration paradigm (i.e., abstraction-concretization paradigm) that first abstracts the test logic for the target functionality.
We introduce MACdroid, the first approach that migrates GUI test cases based on this paradigm.
arXiv Detail & Related papers (2024-09-08T08:46:05Z)
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents [50.39555842254652]
We introduce the Android Multi-annotation EXpo (AMEX) to advance research on AI agents in mobile scenarios.
AMEX comprises over 104K high-resolution screenshots from 110 popular mobile applications, which are annotated at multiple levels.
AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions.
arXiv Detail & Related papers (2024-07-03T17:59:58Z)
- GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding [73.9254861755974]
This paper introduces a new dataset, termed GUI-World, which features meticulously crafted Human-MLLM annotations. We evaluate the capabilities of current state-of-the-art MLLMs, including Image LLMs and Video LLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users. We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set. We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps.
We introduce a functionality-aware memory prompting mechanism.
It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z)
- Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing [23.460051600514806]
We propose GPTDroid, which asks a Large Language Model (LLM) to chat with mobile apps by passing it the GUI page information in order to elicit testing scripts.
Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process.
We evaluate GPTDroid on 86 apps from Google Play. Its activity coverage is 71%, which is 32% higher than the best baseline, and it detects 36% more bugs at a faster rate than the best baseline.
arXiv Detail & Related papers (2023-05-16T13:46:52Z)