Related papers: Skill-Adpative Imitation Learning for UI Test Reuse

Skill-Adpative Imitation Learning for UI Test Reuse

URL: http://arxiv.org/abs/2409.13311v1
Date: Fri, 20 Sep 2024 08:13:04 GMT
Title: Skill-Adpative Imitation Learning for UI Test Reuse
Authors: Mengzhou Wu, Hao Wang, Jun Ren, Yuan Cao, Yuetong Li, Alex Jiang, Dezhi Ran, Yitao Hu, Wei Yang, Tao Xie
Abstract summary: We propose a skill-adaptive imitation learning framework designed to enhance the effectiveness of UI test migration. Results show that SAIL substantially improves the effectiveness of UI test migration, with a 149% higher success rate than state-of-the-art approaches.
Score: 13.538724823517292
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To alleviate the substantial cost of manually crafting user interface (UI) test cases, UI test migration aims to automatically generate test cases for a target mobile application (app) by adapting those from a source app that shares similar functionalities. Traditionally, this process has been approached as a sequential UI-event-mapping problem, where events in the source app are mapped to those in the target one based on their textual descriptions. Prior research has extensively focused on enhancing the event-mapping accuracy of NLP models. Although the advent of large language models (LLMs) with impressive NLP capabilities suggests the potential for near-perfect event-mapping, our study demonstrates that even the highly accurate event-mapping of LLMs is insufficient to address the implementation discrepancies between the source and the target apps, reducing the overall effectiveness of LLM-driven solutions for UI test migration. To address this challenge, in this paper, we propose SAIL, a skill-adaptive imitation learning framework designed to enhance the effectiveness of UI test migration through two key designs. First, SAIL leverages the source test cases as demonstrations and employs a multi-level abstraction of test cases' underlying skills, so as to extract the testing information from source test cases as the knowledge base for the subsequent test generation on the target app. Second, SAIL selectively reuses a subset of the learned skills to guide the generation of test cases for the target app with its novel context- and history-aware skill adaptation. While SAIL can be instantiated with any imitation learning technique, we utilize the in-context learning capabilities of LLMs to instantiate SAIL. Evaluation results show that SAIL substantially improves the effectiveness of UI test migration, with a 149% higher success rate than state-of-the-art approaches.
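To make the "traditional" formulation above concrete, here is a minimal sketch of sequential UI-event mapping by textual similarity. It is not SAIL's method. The Jaccard token-overlap similarity, the 0.2 threshold, and the example event strings are all illustrative assumptions.

```python
# Sketch of the *traditional* formulation only: UI test migration as
# sequential event mapping by textual similarity. Jaccard token overlap
# and the 0.2 threshold are illustrative assumptions, not SAIL's method.
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric tokens of an event's textual description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_events(source_events: list[str], target_events: list[str],
               threshold: float = 0.2) -> list[str | None]:
    """Map each source event, in order, to the most similar target event.

    A source event maps to None when no target event reaches the threshold,
    i.e. the step cannot be migrated by text matching alone.
    """
    mapping: list[str | None] = []
    for src in source_events:
        src_toks = tokens(src)
        scored = [(jaccard(src_toks, tokens(t)), t) for t in target_events]
        best_score, best = max(scored, default=(0.0, None))
        mapping.append(best if best_score >= threshold else None)
    return mapping


# Hypothetical source-app test steps and target-app UI events.
source = ["click 'Sign in' button", "type email into 'Email' field", "click 'Add to cart'"]
target = ["tap 'Log in' button", "enter text in 'Email address' field", "tap 'Basket' icon"]
print(map_events(source, target))
# → ["tap 'Log in' button", "enter text in 'Email address' field", None]
```

The third step goes unmapped because the target app words the feature differently. The abstract's stronger claim is that even a near-perfect mapper, such as an LLM, still fails when the target app implements the same functionality through different steps. A per-event mapping like this one cannot express that case.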

Related papers

LELANTE: LEveraging LLM for Automated ANdroid TEsting [6.112769800569302]
Existing testing approaches require developers to manually write scripts using tools such as Appium and Espresso to execute the corresponding test case. We introduce LELANTE, a novel framework that utilizes large language models (LLMs) to automate test case execution without requiring pre-written scripts. In experiments across 390 test cases spanning 10 popular Android applications, LELANTE achieved a 73% test execution success rate.
arXiv Detail & Related papers (2025-04-29T16:13:49Z)
Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy [14.041979999979166]
Large Language Models (LLMs) and Multi-Agent LLMs (MALLMs) introduce non-determinism unlike traditional or machine learning software. This paper presents a taxonomy for LLM test case design, informed by the research literature, our experience, and open-source tools that represent the state of practice.
arXiv Detail & Related papers (2025-03-01T13:15:56Z)
Redefining Crowdsourced Test Report Prioritization: An Innovative Approach with Large Language Model [13.980850130657208]
This paper introduces LLMPrior, a novel approach for prioritizing crowdsourced test reports using large language models (LLMs). The findings indicate that LLMPrior not only surpasses current state-of-the-art approaches in terms of performance but also proves to be more feasible, efficient, and reliable.
arXiv Detail & Related papers (2024-11-26T02:23:30Z)
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges the gap between training-required and training-free methods. We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples. We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat [8.80569452545511]
We introduce CAT to create cost-effective UI automation tests for industry apps by combining machine learning and large language models. CAT then employs machine learning techniques, with LLMs serving as a complement, to map the target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness, achieving 90% UI automation at a cost of $0.34.
arXiv Detail & Related papers (2024-09-12T08:25:33Z)
MILE: A Mutation Testing Framework of In-Context Learning Systems [5.419884861365132]
We propose a mutation testing framework designed to characterize the quality and effectiveness of test data for ICL systems. First, we propose several mutation operators specialized for ICL demonstrations, as well as corresponding mutation scores for ICL test sets. With comprehensive experiments, we showcase the effectiveness of our framework in evaluating the reliability and quality of ICL test suites.
arXiv Detail & Related papers (2024-09-07T13:51:42Z)
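As a rough illustration of mutation testing applied to ICL demonstrations, the sketch below mutates a prompt's demonstrations and scores a test set by the fraction of mutants it "kills". A mutant is killed when its prediction differs from the original prompt's prediction on at least one test input. The two operators (label flip, demo deletion), the nearest-demo stand-in for an LLM, and this kill criterion are hypothetical choices. They are not MILE's actual operators or scores.

```python
# Toy mutation testing for an in-context-learning (ICL) classifier.
# The two operators, the nearest-demo "model", and the kill criterion are
# illustrative assumptions, not MILE's actual operators or scores.
from __future__ import annotations

Demo = tuple[str, str]  # (input text, label)


def toy_icl_predict(demos: list[Demo], query: str) -> str:
    """Stand-in for an LLM: copy the label of the demo sharing the most words."""
    q = set(query.lower().split())
    return max(demos, key=lambda d: len(q & set(d[0].lower().split())))[1]


def flip_label(demos: list[Demo], i: int, labels: list[str]) -> list[Demo]:
    """Mutation operator 1: give demonstration i a different label."""
    text, label = demos[i]
    other = next(lab for lab in labels if lab != label)
    return demos[:i] + [(text, other)] + demos[i + 1:]


def delete_demo(demos: list[Demo], i: int) -> list[Demo]:
    """Mutation operator 2: drop demonstration i."""
    return demos[:i] + demos[i + 1:]


def mutation_score(demos: list[Demo], labels: list[str], tests: list[str],
                   predict=toy_icl_predict) -> float:
    """Fraction of mutants whose prediction differs from the original
    prompt's prediction on at least one test input."""
    mutants = [flip_label(demos, i, labels) for i in range(len(demos))]
    if len(demos) > 1:  # never delete the only demonstration
        mutants += [delete_demo(demos, i) for i in range(len(demos))]
    baseline = [predict(demos, x) for x in tests]
    killed = sum(
        any(predict(m, x) != b for x, b in zip(tests, baseline))
        for m in mutants
    )
    return killed / len(mutants)


demos = [("great movie", "pos"), ("terrible plot", "neg")]
print(mutation_score(demos, ["pos", "neg"], ["a great cast"]))                     # → 0.5
print(mutation_score(demos, ["pos", "neg"], ["a great cast", "terrible acting"]))  # → 1.0
```

The second, more diverse test set exercises both demonstrations and kills every mutant. This is the kind of test-set quality signal a mutation score is meant to expose.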
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration [8.378294455013284]
RETAIN (REgression Testing guided LLM migrAtIoN) is a tool designed explicitly for regression testing in LLM migrations. Our automatic evaluation and empirical user studies demonstrate that RETAIN, when compared to manual evaluation, enabled participants to identify twice as many errors, facilitated experimentation with 75% more prompts, and achieved 12% higher metric scores in a given time frame.
arXiv Detail & Related papers (2024-09-05T22:22:57Z)
Active Testing of Large Language Model via Multi-Stage Sampling [17.89896012553348]
AcTracer is an active testing framework tailored for large language models (LLMs). It strategically selects a small subset of test data to achieve a nearly optimal performance estimation. Our experiment results demonstrate that AcTracer achieves state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2024-08-07T06:17:48Z)
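The idea of estimating performance from a small, well-chosen subset can be illustrated with plain stratified sampling. This is a textbook technique used here as a stand-in, not AcTracer's algorithm. The stratum ids are assumed to be precomputed elsewhere (for example, by clustering model representations), and `is_correct` is a hypothetical callback for the expensive labeling step.

```python
# Generic stratified sampling for estimating accuracy from a small labeled
# subset. A textbook stand-in, not AcTracer's algorithm; strata ids are
# assumed to be precomputed (e.g. by clustering model representations).
import random
from collections import defaultdict
from typing import Callable, Hashable, Sequence


def stratified_accuracy(strata: Sequence[Hashable],
                        is_correct: Callable[[int], bool],
                        budget: int, seed: int = 0) -> tuple:
    """Estimate accuracy over all items while labeling about `budget` of them.

    Returns (estimate, number_of_items_labeled). Allocation is proportional
    to stratum size with at least one sample per stratum, so rounding can
    make the labeled count differ slightly from `budget`.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    n = len(strata)
    estimate, labeled = 0.0, 0
    for members in groups.values():
        k = min(len(members), max(1, round(budget * len(members) / n)))
        sample = rng.sample(members, k)  # only these items are labeled
        stratum_acc = sum(is_correct(i) for i in sample) / k
        estimate += len(members) / n * stratum_acc  # weight by stratum share
        labeled += k
    return estimate, labeled


strata = [0] * 60 + [1] * 40         # hypothetical 100-item test set, 2 strata
truth = [True] * 60 + [False] * 40   # hypothetical per-item correctness
print(stratified_accuracy(strata, lambda i: truth[i], budget=10))  # → (0.6, 10)
```

When strata are internally homogeneous, as in this contrived example, 10 labels recover the full-set accuracy exactly. The better the partition, the fewer labels an estimate of a given precision needs.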
Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain. This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation. We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models. It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios. We implement UAL in a simple fashion: adaptively setting the label smoothing value during training according to the uncertainty of individual samples. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
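A minimal sketch of per-sample label smoothing driven by uncertainty, as the UAL summary describes. The linear mapping from an uncertainty score in [0, 1] to a smoothing value in [0, 0.2] is a hypothetical choice. How uncertainty is actually estimated is left abstract, since the summary does not specify it. The smoothing itself is the standard form: (1 − ε) on the one-hot target plus ε spread uniformly over all classes.

```python
# Per-sample label smoothing driven by an uncertainty score in [0, 1].
# The linear uncertainty -> epsilon mapping and eps_max=0.2 are
# hypothetical; how uncertainty is estimated is left abstract.
import math


def uncertainty_to_eps(u: float, eps_min: float = 0.0, eps_max: float = 0.2) -> float:
    """More uncertain samples get a larger smoothing value."""
    u = min(max(u, 0.0), 1.0)
    return eps_min + (eps_max - eps_min) * u


def smoothed_target(label: int, num_classes: int, eps: float) -> list:
    """Standard label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    base = eps / num_classes
    return [base + (1.0 - eps if c == label else 0.0) for c in range(num_classes)]


def cross_entropy(logits: list, target: list) -> float:
    """Cross-entropy between a soft target and softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits))


def ual_style_loss(batch: list) -> float:
    """Mean loss over (logits, label, uncertainty) triples."""
    total = 0.0
    for logits, label, u in batch:
        eps = uncertainty_to_eps(u)
        total += cross_entropy(logits, smoothed_target(label, len(logits), eps))
    return total / len(batch)


# A confidently labeled sample (u=0.0) is trained on a hard target; an
# ambiguous one (u=1.0) on a softened target.
batch = [([2.0, 0.5, -1.0], 0, 0.0), ([0.3, 0.2, 0.1], 1, 1.0)]
print(ual_style_loss(batch))
```

With u = 0 the loss reduces to ordinary cross-entropy, so uncertain samples are the only ones whose targets change.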
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of the in-context exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
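Why ordering matters for exemplar selection can be shown with a brute-force search over ordered subsets: (a, b) and (b, a) count as different prompts. The relevance values and the position-weighted score below are hypothetical stand-ins for evaluating a prompt on validation data. A practical method such as EASE has to avoid this exhaustive search, whose size grows as n!/(n − k)!.

```python
# Brute-force illustration of ordering-aware exemplar selection: the search
# space is ordered k-subsets of the pool. The scoring function is a
# hypothetical stand-in for measuring prompt quality on validation data.
from itertools import permutations
from typing import Callable, Hashable, Sequence


def best_ordered_prompt(pool: Sequence[Hashable], k: int,
                        score: Callable[[tuple], float]) -> tuple:
    """Return the ordered k-subset of `pool` with the highest score.

    Evaluates n! / (n - k)! candidates, feasible only for tiny pools.
    """
    return max(permutations(pool, k), key=score)


# Hypothetical score: per-exemplar relevance, weighted more heavily the
# later (closer to the query) the exemplar appears in the prompt.
relevance = {"a": 0.9, "b": 0.5, "c": 0.1}


def toy_score(seq: tuple) -> float:
    return sum((pos + 1) * relevance[e] for pos, e in enumerate(seq))


print(best_ordered_prompt(["a", "b", "c"], 2, toy_score))  # → ('b', 'a')
```

Under this score, the same two exemplars give 1.9 in the order (a, b) and 2.3 in the order (b, a). An order-blind selector that only picks the set {a, b} cannot tell these prompts apart.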
OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs. Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
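The core idea in the OverPrompt summary, handling several task inputs in one LLM call, needs two pieces: packing the inputs into a single prompt, and splitting the one response back into per-input answers. This is a sketch under assumptions. The instruction wording and the `<number>: <label>` answer format are hypothetical, not OverPrompt's template, and the model call itself is omitted.

```python
# Pack several task inputs into one prompt and split the single response
# back into per-input answers. Prompt wording and the "<number>: <label>"
# answer format are assumptions, not OverPrompt's actual template.
import re


def build_batched_prompt(instruction: str, inputs: list) -> str:
    """One prompt carrying all inputs, numbered from 1."""
    lines = [instruction,
             "Answer every numbered input on its own line as '<number>: <label>'."]
    lines += [f"{i}. {text}" for i, text in enumerate(inputs, start=1)]
    return "\n".join(lines)


def parse_batched_answer(answer: str, n: int) -> list:
    """Return n labels in input order; None where the model skipped an input."""
    labels = [None] * n
    for m in re.finditer(r"^\s*(\d+)\s*[:.]\s*(.+?)\s*$", answer, flags=re.MULTILINE):
        idx = int(m.group(1)) - 1
        if 0 <= idx < n:  # ignore numbers that match no input
            labels[idx] = m.group(2)
    return labels


inputs = ["great movie", "terrible plot", "it was fine"]
print(build_batched_prompt("Classify the sentiment of each review.", inputs))
print(parse_batched_answer("1: positive\n2: negative\n", len(inputs)))
# → ['positive', 'negative', None]
```

Returning None for a missing item, rather than shifting later answers up, keeps each label aligned with its input when the model skips or merges lines. That failure mode becomes more likely as more inputs share one prompt.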
MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
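The ELECTRA objective mentioned in the MC-BERT summary, predicting whether each token was replaced by a generator, reduces to building (corrupted sequence, per-token label) pairs. The sketch illustrates ELECTRA's replaced-token detection, not MC-BERT's meta controller. The small masked-language-model generator is stubbed out by a hypothetical `generator_sample` callback. As in ELECTRA, a position where the generator happens to reproduce the original token is labeled original.

```python
# Replaced-token-detection data in the style of ELECTRA. The MLM generator
# is stubbed by a hypothetical `generator_sample(tokens, i)` callback.
import random
from typing import Callable


def rtd_example(tokens: list, mask_prob: float,
                generator_sample: Callable[[list, int], str],
                rng: random.Random) -> tuple:
    """Return (corrupted_tokens, labels) with label 1 = replaced, 0 = original."""
    corrupted, labels = [], []
    for i, tok in enumerate(tokens):
        # Positions chosen for masking are refilled by the generator.
        new_tok = generator_sample(tokens, i) if rng.random() < mask_prob else tok
        corrupted.append(new_tok)
        # Sampling the original token back counts as "original", not "replaced".
        labels.append(int(new_tok != tok))
    return corrupted, labels


toks = ["the", "chef", "cooked", "the", "meal"]
always_the = lambda seq, i: "the"  # degenerate stub generator for the demo
print(rtd_example(toks, 1.0, always_the, random.Random(0)))
# → (['the', 'the', 'the', 'the', 'the'], [0, 1, 1, 0, 1])
```

The discriminator is then trained with a binary loss over every position, not only the masked ones. ELECTRA credits this for its pre-training efficiency.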

This list is automatically generated from the titles and abstracts of the papers on this site.