Testing Updated Apps by Adapting Learned Models
- URL: http://arxiv.org/abs/2308.05549v2
- Date: Wed, 17 Apr 2024 09:21:27 GMT
- Title: Testing Updated Apps by Adapting Learned Models
- Authors: Chanh-Duc Ngo, Fabrizio Pastore, Lionel Briand
- Abstract summary: Continuous Adaptation of Learned Models (CALM) is an automated App testing approach that efficiently tests App updates.
Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers.
Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches.
- Score: 2.362412515574206
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although App updates are frequent and software engineers would like to verify updated features only, automated testing techniques verify entire Apps and thus waste resources. We present Continuous Adaptation of Learned Models (CALM), an automated App testing approach that efficiently tests App updates by adapting App models learned when automatically testing previous App versions. CALM focuses on functional testing. Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers while maximizing the percentage of updated methods and instructions exercised. Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches, for the same maximum number of App screens to be visually inspected. Further, in common update scenarios, where only a small fraction of methods are updated, CALM outperforms all competing approaches even more quickly and by a larger margin.
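As a rough illustration of the screen-budgeted coverage objective described above, the Python sketch below greedily selects test interactions that exercise the most not-yet-covered updated methods until the inspection budget is spent. All names (`Interaction`, `select_for_inspection`) are hypothetical, and the model-adaptation machinery CALM actually uses is not shown.

```python
# Hypothetical sketch of a budgeted selection step in the spirit of CALM:
# greedily choose test interactions that maximize coverage of updated
# methods while keeping the number of screens a tester must inspect
# within a budget. Illustrative names only, not CALM's API.

from dataclasses import dataclass, field


@dataclass
class Interaction:
    screen_id: str                      # the App screen this interaction reaches
    covered_updated_methods: set = field(default_factory=set)


def select_for_inspection(interactions, screen_budget):
    """Greedy max-coverage of updated methods under a screen-inspection budget."""
    selected, covered = [], set()
    remaining = list(interactions)
    while len(selected) < screen_budget and remaining:
        # Pick the interaction adding the most still-uncovered updated methods.
        best = max(remaining, key=lambda i: len(i.covered_updated_methods - covered))
        if not best.covered_updated_methods - covered:
            break                       # nothing new to gain, stop early
        selected.append(best)
        covered |= best.covered_updated_methods
        remaining.remove(best)
    return selected, covered


if __name__ == "__main__":
    candidates = [
        Interaction("login", {"auth.check", "auth.token"}),
        Interaction("settings", {"prefs.save"}),
        Interaction("home", {"auth.check"}),
    ]
    picked, methods = select_for_inspection(candidates, screen_budget=2)
    print([i.screen_id for i in picked], methods)
```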
Related papers
- Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation [67.37146712877794]
IT3A is a novel test-time adaptation method that utilizes a pre-trained generative model for multi-modal augmentation of each test sample from unknown new domains.
By combining augmented data from pre-trained vision and language models, we enhance the ability of the model to adapt to unknown new test data.
In a zero-shot setting, IT3A outperforms state-of-the-art test-time prompt tuning methods with a 5.50% increase in accuracy.
arXiv Detail & Related papers (2024-12-12T20:01:24Z)
- Automated Test Transfer Across Android Apps Using Large Language Models [7.865081492588628]
This paper introduces an innovative technique, LLMigrate, which leverages Large Language Models (LLMs) to efficiently transfer usage-based UI tests across mobile apps.
Our experimental evaluation shows LLMigrate can achieve a 97.5% success rate in automated test transfer, reducing the manual effort required to write tests from scratch by 91.1%.
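A minimal sketch of how such LLM-driven test transfer could be wired up, assuming a generic text-completion callable (`complete`); the prompt shape is illustrative, not LLMigrate's actual prompt.

```python
# Illustrative sketch of LLM-based UI test transfer in the spirit of
# LLMigrate; the prompt wording and `complete` client are assumptions.

def build_transfer_prompt(source_test: str, target_ui_dump: str) -> str:
    return (
        "You migrate UI tests between Android apps.\n"
        "Source test (app A):\n" + source_test + "\n\n"
        "Target app UI hierarchy (app B):\n" + target_ui_dump + "\n\n"
        "Rewrite the test so each step uses the matching widgets of app B. "
        "Keep the assertions semantically equivalent."
    )


def transfer_test(source_test, target_ui_dump, complete):
    """`complete` is any text-completion callable wrapping an LLM client."""
    return complete(build_transfer_prompt(source_test, target_ui_dump))


if __name__ == "__main__":
    echo = lambda prompt: prompt[:80] + "..."   # stand-in for a real LLM call
    print(transfer_test("tap('Login'); assertShown('Home')", "<hierarchy/>", echo))
```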
arXiv Detail & Related papers (2024-11-26T23:06:09Z)
- Historical Test-time Prompt Tuning for Vision Foundation Models [99.96912440427192]
HisTPT is a Historical Test-time Prompt Tuning technique that memorizes useful knowledge from previously learnt test samples.
HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks.
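A minimal sketch of the memorization idea, assuming a single FIFO bank with cosine retrieval; HisTPT itself maintains several knowledge banks and a retrieval-based prediction regularizer, which are not shown.

```python
# Hedged sketch of a bounded knowledge bank that memorizes features of past
# test samples, in the spirit of HisTPT (skeleton only: memorize/retrieve).

from collections import deque
import numpy as np


class KnowledgeBank:
    def __init__(self, capacity=256):
        self.bank = deque(maxlen=capacity)   # oldest entries are evicted

    def memorize(self, feature: np.ndarray):
        self.bank.append(feature / np.linalg.norm(feature))

    def retrieve(self, query: np.ndarray, k=5):
        """Return the k stored features most similar to the query."""
        if not self.bank:
            return []
        q = query / np.linalg.norm(query)
        sims = [float(q @ f) for f in self.bank]
        top = np.argsort(sims)[::-1][:k]
        return [self.bank[i] for i in top]
```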
arXiv Detail & Related papers (2024-10-27T06:03:15Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
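A toy sketch of the correction idea, assuming per-task running logit means are available at test time; ARC's actual retention and correction steps differ.

```python
# Hedged sketch of test-time logit re-balancing against recency bias, in the
# spirit of ARC (illustrative only; ARC's exact procedure is more involved).

import numpy as np


def debias_logits(logits: np.ndarray, class_to_task: np.ndarray,
                  task_mean_logit: np.ndarray) -> np.ndarray:
    """Subtract each task's running mean logit so classes of the most recent
    task lose their systematic head start before the argmax."""
    return logits - task_mean_logit[class_to_task]


logits = np.array([2.0, 1.5, 3.5, 3.2])    # classes 0-1: old task, 2-3: new task
class_to_task = np.array([0, 0, 1, 1])
task_mean_logit = np.array([1.0, 3.0])     # the new task's logits run higher
print(debias_logits(logits, class_to_task, task_mean_logit).argmax())  # -> 0
```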
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? [13.803180972839213]
We introduce a robust MeanShift for Test-time Augmentation (MTA).
MTA surpasses prompt-based methods without requiring their intensive training procedure.
We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency.
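A minimal numpy sketch of MeanShift mode seeking over embeddings of augmented views; MTA additionally learns per-view inlierness weights, which are omitted here.

```python
# Hedged sketch of MeanShift-style mode seeking over the embeddings of
# augmented views of one test image (MTA's inlier weighting is omitted).

import numpy as np


def mean_shift_mode(view_embeddings: np.ndarray, bandwidth=0.5, steps=10):
    """Iteratively move a point toward the densest region of the views."""
    mode = view_embeddings.mean(axis=0)
    for _ in range(steps):
        d2 = ((view_embeddings - mode) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian kernel weights
        mode = (w[:, None] * view_embeddings).sum(0) / w.sum()
    return mode                                       # robust consensus embedding
```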
arXiv Detail & Related papers (2024-05-03T17:34:02Z)
- What, How, and When Should Object Detectors Update in Continually Changing Test Domains? [34.13756022890991]
Test-time adaptation algorithms have been proposed to adapt the model online during inference on test data.
We propose a novel online adaption approach for object detection in continually changing test domains.
Our approach surpasses baselines on widely used benchmarks, achieving mAP improvements of up to 4.9 and 7.9 percentage points.
arXiv Detail & Related papers (2023-12-12T07:13:08Z)
- Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning [17.980649681325406]
We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR).
Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data.
Our model is trained with a meta-auxiliary learning approach, such that adaptation via auxiliary tasks improves the accuracy of the primary task.
arXiv Detail & Related papers (2023-08-31T06:32:11Z)
- Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
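A minimal sketch of the threshold-free idea: train a classifier on embedding-pair features instead of hand-tuning a similarity cutoff. The synthetic data and feature choice are assumptions, not WEBEMBED's pipeline.

```python
# Hedged sketch of threshold-free near-duplicate detection over page
# embeddings, in the spirit of WEBEMBED (synthetic data for illustration).

import numpy as np
from sklearn.linear_model import LogisticRegression


def pair_features(e1, e2):
    """Features describing how similar two page embeddings are."""
    cos = e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return [cos, float(np.abs(e1 - e2).mean())]


rng = np.random.default_rng(0)
# Synthetic training pairs: near-duplicates are small perturbations of a page.
dups = [(v, v + rng.normal(0, 0.05, 8)) for v in rng.normal(size=(20, 8))]
distinct = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(20)]
X = [pair_features(a, b) for a, b in dups + distinct]
y = [1] * len(dups) + [0] * len(distinct)

clf = LogisticRegression().fit(X, y)      # learned decision, no hand-set threshold
print(clf.predict([pair_features(rng.normal(size=8), rng.normal(size=8))]))
```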
arXiv Detail & Related papers (2023-06-12T19:59:36Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
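A compact PyTorch sketch of TPT's core objective: marginal entropy minimization over the most confident augmented views of a single test image, with the CLIP plumbing abstracted into a generic `logits_fn` (an assumption).

```python
# Hedged sketch of TPT's entropy-minimization step; `logits_fn` stands in
# for the CLIP forward pass conditioned on the tunable prompt parameters.

import torch


def tpt_step(logits_fn, prompt, views, lr=5e-3, keep=0.1):
    """One step: `views` is a batch of augmented images; only the most
    confident `keep` fraction of views is averaged, as in TPT.
    `prompt` must be a leaf tensor created with requires_grad=True."""
    opt = torch.optim.AdamW([prompt], lr=lr)
    probs = torch.softmax(logits_fn(views, prompt), dim=-1)   # [V, C]
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(-1)      # per-view entropy
    k = max(1, int(keep * len(views)))
    sel = probs[ent.topk(k, largest=False).indices].mean(0)   # confident views
    loss = -(sel * sel.clamp_min(1e-8).log()).sum()           # marginal entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    prompt = torch.zeros(4, requires_grad=True)               # toy prompt params
    views = torch.randn(8, 16)                                # 8 augmented views
    toy_logits = lambda v, p: v[:, :4] + p                    # stand-in for CLIP
    print(tpt_step(toy_logits, prompt, views))
```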
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews [2.66512000865131]
We study the accuracy and time efficiency of pre-trained neural language models (PTMs) for app review classification.
We set up different studies to evaluate PTMs in multiple settings.
In all cases, micro- and macro-averaged Precision, Recall, and F1-scores are used.
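These averages can be computed directly with scikit-learn; a minimal example (the review labels are illustrative):

```python
# Minimal sketch of the micro/macro metrics named above, via scikit-learn.

from sklearn.metrics import precision_recall_fscore_support

y_true = ["bug", "feature", "bug", "praise", "feature"]
y_pred = ["bug", "bug", "bug", "praise", "feature"]

for avg in ("micro", "macro"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```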
arXiv Detail & Related papers (2021-04-12T23:23:45Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.