Testing Updated Apps by Adapting Learned Models
- URL: http://arxiv.org/abs/2308.05549v2
- Date: Wed, 17 Apr 2024 09:21:27 GMT
- Title: Testing Updated Apps by Adapting Learned Models
- Authors: Chanh-Duc Ngo, Fabrizio Pastore, Lionel Briand
- Abstract summary: Continuous Adaptation of Learned Models (CALM) is an automated App testing approach that efficiently tests App updates.
Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers.
Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches.
- Score: 2.362412515574206
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although App updates are frequent and software engineers would like to verify updated features only, automated testing techniques verify entire Apps and thus waste resources. We present Continuous Adaptation of Learned Models (CALM), an automated App testing approach that efficiently tests App updates by adapting App models learned when automatically testing previous App versions. CALM focuses on functional testing. Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers while maximizing the percentage of updated methods and instructions exercised. Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches, for the same maximum number of App screens to be visually inspected. Further, in common update scenarios, where only a small fraction of methods are updated, CALM outperforms all competing approaches even more quickly and by a larger margin.
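As a rough illustration of the screen-budgeted coverage objective described above, the Python sketch below greedily selects test interactions that exercise the most not-yet-covered updated methods until the inspection budget is spent. All names (`Interaction`, `select_for_inspection`) are hypothetical, and the model-adaptation machinery CALM actually uses is not shown.

```python
# Hypothetical sketch of a budgeted selection step in the spirit of CALM:
# greedily choose test interactions that maximize coverage of updated
# methods while keeping the number of screens a tester must inspect
# within a budget. Illustrative names only, not CALM's API.

from dataclasses import dataclass, field


@dataclass
class Interaction:
    screen_id: str                      # the App screen this interaction reaches
    covered_updated_methods: set = field(default_factory=set)


def select_for_inspection(interactions, screen_budget):
    """Greedy max-coverage of updated methods under a screen-inspection budget."""
    selected, covered = [], set()
    remaining = list(interactions)
    while len(selected) < screen_budget and remaining:
        # Pick the interaction adding the most still-uncovered updated methods.
        best = max(remaining, key=lambda i: len(i.covered_updated_methods - covered))
        if not best.covered_updated_methods - covered:
            break                       # nothing new to gain, stop early
        selected.append(best)
        covered |= best.covered_updated_methods
        remaining.remove(best)
    return selected, covered


if __name__ == "__main__":
    candidates = [
        Interaction("login", {"auth.check", "auth.token"}),
        Interaction("settings", {"prefs.save"}),
        Interaction("home", {"auth.check"}),
    ]
    picked, methods = select_for_inspection(candidates, screen_budget=2)
    print([i.screen_id for i in picked], methods)
```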
Related papers
- Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation [67.37146712877794]
IT3A is a novel test-time adaptation method that utilizes a pre-trained generative model for multi-modal augmentation of each test sample from unknown new domains.
By combining augmented data from pre-trained vision and language models, we enhance the ability of the model to adapt to unknown new test data.
In a zero-shot setting, IT3A outperforms state-of-the-art test-time prompt tuning methods with a 5.50% increase in accuracy.
arXiv Detail & Related papers (2024-12-12T20:01:24Z)
- Automated Test Transfer Across Android Apps Using Large Language Models [7.865081492588628]
This paper introduces an innovative technique, LLMigrate, which leverages Large Language Models (LLMs) to efficiently transfer usage-based UI tests across mobile apps.
Our experimental evaluation shows LLMigrate can achieve a 97.5% success rate in automated test transfer, reducing the manual effort required to write tests from scratch by 91.1%.
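A minimal sketch of how such LLM-driven test transfer could be wired up, assuming a generic text-completion callable (`complete`); the prompt shape is illustrative, not LLMigrate's actual prompt.

```python
# Illustrative sketch of LLM-based UI test transfer in the spirit of
# LLMigrate; the prompt wording and `complete` client are assumptions.

def build_transfer_prompt(source_test: str, target_ui_dump: str) -> str:
    return (
        "You migrate UI tests between Android apps.\n"
        "Source test (app A):\n" + source_test + "\n\n"
        "Target app UI hierarchy (app B):\n" + target_ui_dump + "\n\n"
        "Rewrite the test so each step uses the matching widgets of app B. "
        "Keep the assertions semantically equivalent."
    )


def transfer_test(source_test, target_ui_dump, complete):
    """`complete` is any text-completion callable wrapping an LLM client."""
    return complete(build_transfer_prompt(source_test, target_ui_dump))


if __name__ == "__main__":
    echo = lambda prompt: prompt[:80] + "..."   # stand-in for a real LLM call
    print(transfer_test("tap('Login'); assertShown('Home')", "<hierarchy/>", echo))
```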
arXiv Detail & Related papers (2024-11-26T23:06:09Z)
- Historical Test-time Prompt Tuning for Vision Foundation Models [99.96912440427192]
HisTPT is a Historical Test-time Prompt Tuning technique that memorizes useful knowledge from previously learnt test samples.
HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks.
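A minimal sketch of the memorization idea, assuming a single FIFO bank with cosine retrieval; HisTPT itself maintains several knowledge banks and a retrieval-based prediction regularizer, which are not shown.

```python
# Hedged sketch of a bounded knowledge bank that memorizes features of past
# test samples, in the spirit of HisTPT (skeleton only: memorize/retrieve).

from collections import deque
import numpy as np


class KnowledgeBank:
    def __init__(self, capacity=256):
        self.bank = deque(maxlen=capacity)   # oldest entries are evicted

    def memorize(self, feature: np.ndarray):
        self.bank.append(feature / np.linalg.norm(feature))

    def retrieve(self, query: np.ndarray, k=5):
        """Return the k stored features most similar to the query."""
        if not self.bank:
            return []
        q = query / np.linalg.norm(query)
        sims = [float(q @ f) for f in self.bank]
        top = np.argsort(sims)[::-1][:k]
        return [self.bank[i] for i in top]
```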
arXiv Detail & Related papers (2024-10-27T06:03:15Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
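A toy sketch of the correction idea, assuming per-task running logit means are available at test time; ARC's actual retention and correction steps differ.

```python
# Hedged sketch of test-time logit re-balancing against recency bias, in the
# spirit of ARC (illustrative only; ARC's exact procedure is more involved).

import numpy as np


def debias_logits(logits: np.ndarray, class_to_task: np.ndarray,
                  task_mean_logit: np.ndarray) -> np.ndarray:
    """Subtract each task's running mean logit so classes of the most recent
    task lose their systematic head start before the argmax."""
    return logits - task_mean_logit[class_to_task]


logits = np.array([2.0, 1.5, 3.5, 3.2])    # classes 0-1: old task, 2-3: new task
class_to_task = np.array([0, 0, 1, 1])
task_mean_logit = np.array([1.0, 3.0])     # the new task's logits run higher
print(debias_logits(logits, class_to_task, task_mean_logit).argmax())  # -> 0
```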
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? [13.803180972839213]
We introduce a robust MeanShift for Test-time Augmentation (MTA).
MTA surpasses prompt-based methods without requiring their intensive training procedure.
We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency.
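A minimal numpy sketch of MeanShift mode seeking over embeddings of augmented views; MTA additionally learns per-view inlierness weights, which are omitted here.

```python
# Hedged sketch of MeanShift-style mode seeking over the embeddings of
# augmented views of one test image (MTA's inlier weighting is omitted).

import numpy as np


def mean_shift_mode(view_embeddings: np.ndarray, bandwidth=0.5, steps=10):
    """Iteratively move a point toward the densest region of the views."""
    mode = view_embeddings.mean(axis=0)
    for _ in range(steps):
        d2 = ((view_embeddings - mode) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian kernel weights
        mode = (w[:, None] * view_embeddings).sum(0) / w.sum()
    return mode                                       # robust consensus embedding
```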
arXiv Detail & Related papers (2024-05-03T17:34:02Z)
- What, How, and When Should Object Detectors Update in Continually Changing Test Domains? [34.13756022890991]
Test-time adaptation algorithms have been proposed to adapt the model online during inference on test data.
We propose a novel online adaption approach for object detection in continually changing test domains.
Our approach surpasses baselines on widely used benchmarks, achieving mAP improvements of up to 4.9 and 7.9 percentage points.
arXiv Detail & Related papers (2023-12-12T07:13:08Z)
- Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning [17.980649681325406]
We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR).
Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data.
Our model is trained with a meta-auxiliary learning approach, such that adaptation via auxiliary tasks improves the accuracy of the primary task.
arXiv Detail & Related papers (2023-08-31T06:32:11Z)
- Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
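A minimal sketch of the threshold-free idea: train a classifier on embedding-pair features instead of hand-tuning a similarity cutoff. The synthetic data and feature choice are assumptions, not WEBEMBED's pipeline.

```python
# Hedged sketch of threshold-free near-duplicate detection over page
# embeddings, in the spirit of WEBEMBED (synthetic data for illustration).

import numpy as np
from sklearn.linear_model import LogisticRegression


def pair_features(e1, e2):
    """Features describing how similar two page embeddings are."""
    cos = e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return [cos, float(np.abs(e1 - e2).mean())]


rng = np.random.default_rng(0)
# Synthetic training pairs: near-duplicates are small perturbations of a page.
dups = [(v, v + rng.normal(0, 0.05, 8)) for v in rng.normal(size=(20, 8))]
distinct = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(20)]
X = [pair_features(a, b) for a, b in dups + distinct]
y = [1] * len(dups) + [0] * len(distinct)

clf = LogisticRegression().fit(X, y)      # learned decision, no hand-set threshold
print(clf.predict([pair_features(rng.normal(size=8), rng.normal(size=8))]))
```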
arXiv Detail & Related papers (2023-06-12T19:59:36Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
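A compact PyTorch sketch of TPT's core objective: marginal entropy minimization over the most confident augmented views of a single test image, with the CLIP plumbing abstracted into a generic `logits_fn` (an assumption).

```python
# Hedged sketch of TPT's entropy-minimization step; `logits_fn` stands in
# for the CLIP forward pass conditioned on the tunable prompt parameters.

import torch


def tpt_step(logits_fn, prompt, views, lr=5e-3, keep=0.1):
    """One step: `views` is a batch of augmented images; only the most
    confident `keep` fraction of views is averaged, as in TPT.
    `prompt` must be a leaf tensor created with requires_grad=True."""
    opt = torch.optim.AdamW([prompt], lr=lr)
    probs = torch.softmax(logits_fn(views, prompt), dim=-1)   # [V, C]
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(-1)      # per-view entropy
    k = max(1, int(keep * len(views)))
    sel = probs[ent.topk(k, largest=False).indices].mean(0)   # confident views
    loss = -(sel * sel.clamp_min(1e-8).log()).sum()           # marginal entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    prompt = torch.zeros(4, requires_grad=True)               # toy prompt params
    views = torch.randn(8, 16)                                # 8 augmented views
    toy_logits = lambda v, p: v[:, :4] + p                    # stand-in for CLIP
    print(tpt_step(toy_logits, prompt, views))
```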
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews [2.66512000865131]
We study the accuracy and time efficiency of pre-trained neural language models (PTMs) for app review classification.
We set up different studies to evaluate PTMs in multiple settings.
In all cases, micro- and macro-averaged Precision, Recall, and F1-scores are used.
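These averages can be computed directly with scikit-learn; a minimal example (the review labels are illustrative):

```python
# Minimal sketch of the micro/macro metrics named above, via scikit-learn.

from sklearn.metrics import precision_recall_fscore_support

y_true = ["bug", "feature", "bug", "praise", "feature"]
y_pred = ["bug", "bug", "bug", "praise", "feature"]

for avg in ("micro", "macro"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```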
arXiv Detail & Related papers (2021-04-12T23:23:45Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.