Related papers: PopSweeper: Automatically Detecting and Resolving App-Blocking Pop-Ups to Assist Automated Mobile GUI Testing

URL: http://arxiv.org/abs/2412.02933v1
Date: Wed, 04 Dec 2024 01:05:44 GMT
Title: PopSweeper: Automatically Detecting and Resolving App-Blocking Pop-Ups to Assist Automated Mobile GUI Testing
Authors: Linqiang Guo, Wei Liu, Yi Wen Heng, Tse-Hsun Chen, Yang Wang
Abstract summary: PopSweeper is a tool designed to detect and resolve app-blocking pop-ups in real-time during automated GUI testing. It combines deep learning-based computer vision techniques for pop-up detection and close button localization. We evaluated PopSweeper on over 72K app screenshots and 87 top-ranked mobile apps collected from app stores.
Score: 46.18718721121415
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graphical User Interfaces (GUIs) are the primary means by which users interact with mobile applications, making them crucial to both app functionality and user experience. However, a major challenge in automated testing is the frequent appearance of app-blocking pop-ups, such as ads or system alerts, which obscure critical UI elements and disrupt test execution, often requiring manual intervention. These interruptions lead to inaccurate test results, increased testing time, and reduced reliability, particularly for stakeholders conducting large-scale app testing. To address this issue, we introduce PopSweeper, a novel tool designed to detect and resolve app-blocking pop-ups in real-time during automated GUI testing. PopSweeper combines deep learning-based computer vision techniques for pop-up detection and close button localization, allowing it to autonomously identify pop-ups and ensure uninterrupted testing. We evaluated PopSweeper on over 72K app screenshots from the RICO dataset and 87 top-ranked mobile apps collected from app stores, manually identifying 832 app-blocking pop-ups. PopSweeper achieved 91.7% precision and 93.5% recall in pop-up classification and 93.9% BoxAP with 89.2% recall in close button detection. Furthermore, end-to-end evaluations demonstrated that PopSweeper successfully resolved blockages in 87.1% of apps with minimal overhead, achieving classification and close button detection within 60 milliseconds per frame. These results highlight PopSweeper's capability to enhance the accuracy and efficiency of automated GUI testing by mitigating pop-up interruptions.
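
The two-stage flow described in the abstract (classify the frame as a pop-up, then localize the close button and tap it) can be sketched roughly as follows. This is a minimal illustration, not PopSweeper's actual implementation: the names `resolve_popup`, `is_popup`, `find_close_button`, `tap`, and the `min_score` threshold are all hypothetical, and the two models are passed in as plain callables so the sketch does not assume any particular detection library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

# (left, top, right, bottom) in screen pixels; a hypothetical convention.
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    """A candidate close button: bounding box plus detector confidence."""
    box: Box
    score: float


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer center point of a bounding box."""
    left, top, right, bottom = box
    return ((left + right) // 2, (top + bottom) // 2)


def resolve_popup(
    frame: Any,
    is_popup: Callable[[Any], bool],
    find_close_button: Callable[[Any], Optional[Detection]],
    tap: Callable[[int, int], None],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> bool:
    """Tap the close button if the frame shows a blocking pop-up.

    Returns True only when a tap was actually issued.
    """
    # Stage 1: pop-up classification on the whole screenshot.
    if not is_popup(frame):
        return False
    # Stage 2: close button localization on the same screenshot.
    detection = find_close_button(frame)
    if detection is None or detection.score < min_score:
        return False
    x, y = box_center(detection.box)
    tap(x, y)
    return True


if __name__ == "__main__":
    # Stub models standing in for the real classifier and detector.
    taps = []
    resolved = resolve_popup(
        frame="screenshot-placeholder",
        is_popup=lambda f: True,
        find_close_button=lambda f: Detection(box=(100, 20, 120, 40), score=0.9),
        tap=lambda x, y: taps.append((x, y)),
    )
    print(resolved, taps)
```

In a test harness, a function like this would presumably run on each captured frame before the test tool issues its next action, so that the per-frame cost reported in the abstract is the relevant overhead.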

Related papers

Attacking Vision-Language Computer Agents via Pop-ups [61.744008541021124]
We show that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups. This distraction leads agents to click these pop-ups instead of performing the tasks as usual.
arXiv Detail & Related papers (2024-11-04T18:56:42Z)
Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing [17.24045904273874]
We propose DroidAgent, an autonomous GUI testing agent for Android. It is based on Large Language Models and support mechanisms such as long- and short-term memory. DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques.
arXiv Detail & Related papers (2023-11-15T01:59:40Z)
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps. We introduce a functionality-aware memory prompting mechanism. It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z)
Testing Updated Apps by Adapting Learned Models [2.362412515574206]
Continuous Adaptation of Learned Models (CALM) is an automated App testing approach that efficiently tests App updates. Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers. Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches.
arXiv Detail & Related papers (2023-08-10T12:59:24Z)
Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence. We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers. Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv Detail & Related papers (2023-06-12T19:59:36Z)
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing [23.460051600514806]
We propose GPTDroid, which asks a Large Language Model to chat with mobile apps by passing the GUI page information to the LLM to elicit testing scripts. Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process. We evaluate GPTDroid on 86 apps from Google Play; its activity coverage is 71%, which is 32% higher than the best baseline, and it detects 36% more bugs at a faster speed than the best baseline.
arXiv Detail & Related papers (2023-05-16T13:46:52Z)
Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT. Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
An Empirical Study of In-App Advertising Issues Based on Large Scale App Review Analysis [67.58267006314415]
We present a large-scale analysis of ad-related user feedback from App Store and Google Play. From a statistical analysis of 36,309 ad-related reviews, we find that users care most about the number of unique ads and ad display frequency during usage. Some ad issue types are addressed more quickly by developers than other ad issues.
arXiv Detail & Related papers (2020-08-22T05:38:24Z)
Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection [71.84087757644708]
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 scanners. There are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies. We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme.
arXiv Detail & Related papers (2020-07-01T14:15:03Z)
