Automated Test Transfer Across Android Apps Using Large Language Models
- URL: http://arxiv.org/abs/2411.17933v1
- Date: Tue, 26 Nov 2024 23:06:09 GMT
- Title: Automated Test Transfer Across Android Apps Using Large Language Models
- Authors: Benyamin Beyzaei, Saghar Talebipour, Ghazal Rafiei, Nenad Medvidovic, Sam Malek,
- Abstract summary: This paper introduces an innovative technique, LLMigrate, which leverages Large Language Models (LLMs) to efficiently transfer usage-based UI tests across mobile apps.
Our experimental evaluation shows LLMigrate can achieve a 97.5% success rate in automated test transfer, reducing the manual effort required to write tests from scratch by 91.1%.
- Score: 7.865081492588628
- License:
- Abstract: The pervasiveness of mobile apps in everyday life necessitates robust testing strategies to ensure quality and efficiency, especially through end-to-end usage-based tests for mobile apps' user interfaces (UIs). However, manually creating and maintaining such tests can be costly for developers. Since many apps share similar functionalities beneath diverse UIs, previous works have shown the possibility of transferring UI tests across different apps within the same domain, thereby eliminating the need for writing the tests manually. However, these methods have struggled to accommodate real-world variations, often facing limitations in scenarios where source and target apps are not very similar or fail to accurately transfer test oracles. This paper introduces an innovative technique, LLMigrate, which leverages Large Language Models (LLMs) to efficiently transfer usage-based UI tests across mobile apps. Our experimental evaluation shows LLMigrate can achieve a 97.5% success rate in automated test transfer, reducing the manual effort required to write tests from scratch by 91.1%. This represents an improvement of 9.1% in success rate and 38.2% in effort reduction compared to the best-performing prior technique, setting a new benchmark for automated test transfer.
Related papers
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z) - Skill-Adpative Imitation Learning for UI Test Reuse [13.538724823517292]
We propose a skill-adaptive imitation learning framework designed to enhance the effectiveness of UI test migration.
Results show that SAIL substantially improves the effectiveness of UI test migration, with 149% higher success rate than state-of-the-art approaches.
arXiv Detail & Related papers (2024-09-20T08:13:04Z) - Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat [8.80569452545511]
We introduce CAT to create cost-effective UI automation tests for industry apps by combining machine learning and Large Language Models.
CAT then employs machine learning techniques, with LLMs serving as a complementary, to map the target element on the UI screen.
Our evaluations on the WeChat testing dataset demonstrate the CAT's performance and cost-effectiveness, achieving 90% UI automation with $0.34 cost.
arXiv Detail & Related papers (2024-09-12T08:25:33Z) - Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge [93.4434417387526]
We propose Open Vocabulary Mobile Manipulation as a key benchmark task for robotics.
We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task.
We detail the results and methodologies used, both in simulation and real-world settings.
arXiv Detail & Related papers (2024-07-09T15:15:01Z) - Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - Align Your Prompts: Test-Time Prompting with Distribution Alignment for
Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain.
Our method improves zero-shot top- 1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
arXiv Detail & Related papers (2023-11-02T17:59:32Z) - Testing Updated Apps by Adapting Learned Models [2.362412515574206]
Continuous Adaptation of Learned Models (CALM) is an automated App testing approach that efficiently test App updates.
Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers.
Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches.
arXiv Detail & Related papers (2023-08-10T12:59:24Z) - Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based, algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv Detail & Related papers (2023-06-12T19:59:36Z) - Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI
Testing [23.460051600514806]
We propose GPTDroid, asking Large Language Model to chat with the mobile apps by passing the GUI page information to LLM to elicit testing scripts.
Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process.
We evaluate GPTDroid on 86 apps from Google Play, and its activity coverage is 71%, with 32% higher than the best baseline, and can detect 36% more bugs with faster speed than the best baseline.
arXiv Detail & Related papers (2023-05-16T13:46:52Z) - Hybrid Intelligent Testing in Simulation-Based Verification [0.0]
Several millions of tests may be required to achieve coverage goals.
Coverage-Directed Test Selection learns from coverage feedback to bias testing towards the most effective tests.
Novelty-Driven Verification learns to identify and simulate stimuli that differ from previous stimuli.
arXiv Detail & Related papers (2022-05-19T13:22:08Z) - Emerging App Issue Identification via Online Joint Sentiment-Topic
Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.