Vision-Based Mobile App GUI Testing: A Survey
- URL: http://arxiv.org/abs/2310.13518v1
- Date: Fri, 20 Oct 2023 14:04:04 GMT
- Title: Vision-Based Mobile App GUI Testing: A Survey
- Authors: Shengcheng Yu, Chunrong Fang, Ziyuan Tuo, Quanjun Zhang, Chunyang
Chen, Zhenyu Chen, Zhendong Su
- Abstract summary: Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 226 papers, among which 78 are vision-based studies.
- Score: 30.49909140195575
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Graphical User Interface (GUI) has become one of the most significant parts
of mobile applications (apps). It is the direct bridge between mobile apps and
end users, and it strongly shapes the end-user experience. Neglecting GUI
quality can undermine the value and effectiveness of the entire mobile app
solution. Significant research efforts have been devoted to GUI testing, one
effective method to ensure mobile app quality. By conducting rigorous GUI
testing, developers can ensure that the visual and interactive elements of the
mobile apps not only meet functional requirements but also provide a seamless
and user-friendly experience. However, traditional solutions, which rely on the
source code or layout files, face challenges in both effectiveness and
efficiency due to the gap between what they obtain and what the app GUI
actually presents. Vision-based mobile app GUI testing approaches emerged with the
development of computer vision technologies and have achieved promising
progress. In this survey paper, we provide a comprehensive investigation of the
state-of-the-art techniques on 226 papers, among which 78 are vision-based
studies. This survey covers different topics of GUI testing, such as GUI test
generation, GUI test record & replay, and GUI testing frameworks. Specifically,
the research emphasis of this survey is placed mostly on how vision-based
techniques outperform traditional solutions and have gradually taken a vital
place in the GUI testing field. Based on the investigation of existing studies,
we outline the challenges and opportunities of (vision-based) mobile app GUI
testing and propose promising research directions with the combination of
emerging techniques.
Related papers
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- Interlinking User Stories and GUI Prototyping: A Semi-Automatic LLM-based Approach [55.762798168494726]
We present a novel Large Language Model (LLM)-based approach for validating the implementation of functional NL-based requirements in a graphical user interface (GUI) prototype.
Our approach aims to detect functional user stories that are not implemented in a GUI prototype and provides recommendations for suitable GUI components directly implementing the requirements.
arXiv Detail & Related papers (2024-06-12T11:59:26Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- GUing: A Mobile GUI Search Engine using a Vision-Language Model [6.024602799136753]
This paper proposes GUing, a GUI search engine based on a vision-language model called UIClip.
We first collected app introduction images from Google Play, which usually display the most representative screenshots.
Then, we developed an automated pipeline to classify, crop, and extract the captions from these images.
We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind in GUI retrieval.
arXiv Detail & Related papers (2024-04-30T18:42:18Z)
- Gamified GUI testing with Selenium in the IntelliJ IDE: A Prototype Plugin [0.559239450391449]
This paper presents GIPGUT: a prototype of a gamification plugin for IntelliJ IDEA.
The plugin enhances testers' engagement with typically monotonous and tedious tasks through achievements, rewards, and profile customization.
The results indicate high usability and positive reception of the gamification elements.
arXiv Detail & Related papers (2024-03-14T20:11:11Z)
- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning [15.458315113767686]
We propose PIRLTest, an effective platform-independent approach for app testing.
It utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing.
PIRLTest explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs.
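The curiosity-driven, Q-guided exploration described here can be illustrated with a minimal tabular sketch. Note the assumptions: the paper uses a neural Q-network over image embeddings of app screenshots, whereas this sketch substitutes a tabular Q over hashable state IDs, a visit-count novelty reward, and a hypothetical two-screen app; none of the hyperparameters or names below come from the paper.

```python
import random
from collections import defaultdict

class CuriosityDrivenExplorer:
    """Toy Q-learning explorer: reward = novelty of the reached GUI state.

    PIRLTest uses a neural Q-network over screenshot embeddings; a tabular
    Q-table and string state IDs stand in for both here, purely for
    illustration.
    """

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = defaultdict(float)     # Q[(state, action)] -> estimated value
        self.visits = defaultdict(int)  # visit counts drive the curiosity bonus
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def curiosity_reward(self, state):
        # Less-visited states yield higher reward, pushing the agent
        # toward unexplored parts of the app.
        self.visits[state] += 1
        return 1.0 / self.visits[state]

    def choose(self, state, actions):
        # Epsilon-greedy: mostly exploit the Q-estimates, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, next_state, next_actions):
        # Standard one-step Q-learning update with the curiosity reward.
        reward = self.curiosity_reward(next_state)
        best_next = max((self.q[(next_state, a)] for a in next_actions),
                        default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

# Hypothetical two-screen app: tapping "open" on the home screen leads
# to a settings screen; "scroll" stays on the current screen.
explorer = CuriosityDrivenExplorer()
transitions = {("home", "open"): "settings", ("home", "scroll"): "home",
               ("settings", "back"): "home", ("settings", "scroll"): "settings"}
state = "home"
for _ in range(50):
    actions = [a for (s, a) in transitions if s == state]
    action = explorer.choose(state, actions)
    nxt = transitions[(state, action)]
    explorer.update(state, action, nxt, [a for (s, a) in transitions if s == nxt])
    state = nxt
```

After a few dozen steps, actions leading to rarely-visited screens accumulate the highest Q-values, so greedy action selection steers the agent toward unseen parts of the app, which is the core idea behind curiosity-driven GUI exploration.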
arXiv Detail & Related papers (2022-08-19T01:51:16Z)
- Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning [65.12226891589592]
This paper proposes a new approach to automated game validation and testing.
Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming.
arXiv Detail & Related papers (2022-08-15T11:08:44Z)
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination? [21.91118062303175]
We conduct the first large-scale empirical study of seven representative GUI element detection methods on over 50k GUI images.
This study sheds light on the technical challenges to be addressed and informs the design of new GUI element detection methods.
Our evaluation on 25,000 GUI images shows that our method significantly advances the state-of-the-art performance in GUI element detection.
arXiv Detail & Related papers (2020-08-12T06:36:33Z)
- Applied Awareness: Test-Driven GUI Development using Computer Vision and Cryptography [0.0]
Test-driven GUI development is typically impractical: it generally requires an initial implementation of the GUI to generate golden images or to construct interactive test scenarios.
We demonstrate a novel and immediately applicable approach of interpreting GUI presentation in terms of backend communications.
This focus on backend communication circumvents deficiencies in typical testing methodologies that rely on platform-dependent UI affordances or accessibility features.
arXiv Detail & Related papers (2020-06-05T22:46:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.