LLM-Guided Scenario-based GUI Testing
- URL: http://arxiv.org/abs/2506.05079v1
- Date: Thu, 05 Jun 2025 14:27:40 GMT
- Title: LLM-Guided Scenario-based GUI Testing
- Authors: Shengcheng Yu, Yuchen Ling, Chunrong Fang, Quan Zhou, Chunyang Chen, Shaomin Zhu, Zhenyu Chen,
- Abstract summary: ScenGen is a novel LLM-guided scenario-based GUI testing approach involving five agents.<n>The Observer perceives the app GUI state by extracting GUI widgets and forming GUI layouts.<n>The Executor then executes the demanding operations on the apps.
- Score: 25.180945629233786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The assurance of mobile app GUI is more and more significant. Automated GUI testing approaches of different strategies have been developed, while there are still huge gaps between the approaches and the app business logic, not taking the completion of specific testing scenarios as the exploration target, leading to the exploration missing of critical app functionalities. Learning from the manual testing, which takes testing scenarios with app business logic as the basic granularity, in this paper, we utilize the LLMs to understand the semantics presented in app GUI and how they are mapped in the testing context based on specific testing scenarios. Then, scenario-based GUI tests are generated with the guidance of multi-agent collaboration. Specifically, we propose ScenGen, a novel LLM-guided scenario-based GUI testing approach involving five agents to respectively take responsibilities of different phases of the manual testing process. The Observer perceives the app GUI state by extracting GUI widgets and forming GUI layouts, understanding the expressed semantics. Then the app GUI info is sent to the Decider to make decisions on target widgets based on the target testing scenarios. The decision-making process takes the completion of specific testing scenarios as the exploration target. The Executor then executes the demanding operations on the apps. The execution results are checked by the Supervisor on whether the generated tests are consistent with the completion target of the testing scenarios, ensuring the traceability of the test generation and execution. Furthermore, the corresponding GUI test operations are recorded to the context memory by Recorder as an important basis for further decision-making, meanwhile monitoring the runtime bug occurrences. ScenGen is evaluated and the results show that ScenGen can effectively generate scenario-based GUI tests guided by LLMs.
Related papers
- GTA1: GUI Test-time Scaling Agent [77.60727242084971]
This paper investigates the two main challenges with our GUI Test-time Scaling Agent, GTA1.<n>First, to select the most appropriate action proposal, we introduce a test-time scaling method.<n>Second, we propose a model that achieves improved accuracy when grounding the selected action proposal to its corresponding visual elements.
arXiv Detail & Related papers (2025-07-08T08:52:18Z) - MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment [63.62778707277929]
MobileGUI-RL is a scalable framework that trains GUI agent in online environment.<n>It synthesizes a curriculum of learnable tasks through self-exploration and filtering.<n>It adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards.
arXiv Detail & Related papers (2025-07-08T07:07:53Z) - Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation [83.92224427735859]
We introduce a pre-operative critic mechanism that provides effective feedback prior to the actual execution.<n>We develop a reasoning-bootstrapping based data collection pipeline to create a GUI-Critic-Train and a GUI-Critic-Test.<n>Our model offers significant advantages in critic accuracy compared to current MLLMs.
arXiv Detail & Related papers (2025-06-05T04:12:36Z) - ViMoTest: A Tool to Specify ViewModel-Based GUI Test Scenarios using Projectional Editing [0.8010120037374623]
We introduce the ViMoTest tool to test presentation logic independently of GUI frameworks.<n>We demonstrate the tool with a small JavaFX-based task manager example and generate executable code.
arXiv Detail & Related papers (2025-04-23T14:26:35Z) - ReuseDroid: A VLM-empowered Android UI Test Migrator Boosted by Active Feedback [11.624163693084446]
We propose REUSEDROID, a novel multiagent framework for GUI test migration empowered by Large Vision-Language Models (VLMs)<n>An insight of REUSEDROID is to migrate tests based only on the core logic shared across similar apps, while their entire operational logic could differ.<n>We evaluate REUSEDROID on LinPro, a new test migration dataset that consists of migration tasks for 39 popular apps across 4 categories.
arXiv Detail & Related papers (2025-04-03T07:45:09Z) - GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration [56.58744345634623]
We propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration.<n>We also introduce NovelScreenSpot, a benchmark for testing how well the data can help align GUI action grounding models to novel environments.
arXiv Detail & Related papers (2025-01-23T18:16:21Z) - GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent [24.97846085313314]
We propose a formalized and comprehensive environment to evaluate the entire process of automated GUI Testing.<n>We divide the testing process into three key subtasks: test intention generation, test task execution, and GUI defect detection.<n>It evaluates the performance of different models using three data types: real mobile applications, mobile applications with artificially injected defects, and synthetic data.
arXiv Detail & Related papers (2024-12-24T13:41:47Z) - Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism.<n>In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation.<n>Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z) - GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding [73.9254861755974]
This paper introduces a new dataset, termed GUI-World, which features meticulously crafted Human-MLLM annotations.<n>We evaluate the capabilities of current state-of-the-art MLLMs, including Image LLMs and Video LLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z) - VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z) - Practical, Automated Scenario-based Mobile App Testing [13.52057950260007]
Test scripts developed by human testers consider business logic by focusing on testing scenarios.
Due to the GUI-intensive feature of mobile apps, human testers always understand app GUI to organize test scripts for scenarios.
ScenTest tries to start automated testing by imitating human practices and integrating domain knowledge into scenario-based mobile app testing.
arXiv Detail & Related papers (2024-06-12T15:48:39Z) - Test Script Intention Generation for Mobile Application via GUI Image and Code Understanding [12.973016336177047]
Test scripts play a more important role in mobile app testing than test cases for source code.<n>TestIntention is a novel approach to infer the intention of GUI test scripts.<n>Results of all operations are combined to generate test intention for test scripts.
arXiv Detail & Related papers (2021-07-12T02:08:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.