LLM-Guided Scenario-based GUI Testing
- URL: http://arxiv.org/abs/2506.05079v3
- Date: Fri, 31 Oct 2025 07:58:52 GMT
- Title: LLM-Guided Scenario-based GUI Testing
- Authors: Shengcheng Yu, Yuchen Ling, Chunrong Fang, Quan Zhou, Yi Zhao, Chunyang Chen, Shaomin Zhu, Zhenyu Chen
- Abstract summary: This paper introduces an approach that leverages large language models to comprehend GUI semantics and contextual relevance to given scenarios. We propose ScenGen, an LLM-guided scenario-based GUI testing framework employing multi-agent collaboration to simulate and automate manual testing phases.
- Score: 22.70111721644705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The assurance of mobile app GUIs has become increasingly important, as the GUI serves as the primary medium of interaction between users and apps. Although numerous automated GUI testing approaches have been developed with diverse strategies, a substantial gap remains between these approaches and the underlying app business logic. Most existing approaches focus on general exploration rather than the completion of specific testing scenarios, often missing critical functionalities. Inspired by manual testing, which treats business logic-driven scenarios as the fundamental unit of testing, this paper introduces an approach that leverages large language models to comprehend GUI semantics and contextual relevance to given scenarios. Building on this capability, we propose ScenGen, an LLM-guided scenario-based GUI testing framework employing multi-agent collaboration to simulate and automate manual testing phases. Specifically, ScenGen integrates five agents: the Observer, Decider, Executor, Supervisor, and Recorder. The Observer perceives the app GUI state by extracting and structuring GUI widgets and layouts, interpreting semantic information. This is passed to the Decider, which makes scenario-driven decisions with LLM guidance to identify target widgets and determine actions toward fulfilling specific goals. The Executor performs these operations, while the Supervisor verifies alignment with intended scenario completion, ensuring traceability and consistency. Finally, the Recorder logs GUI operations into context memory as a knowledge base for subsequent decision-making and monitors runtime bugs. Comprehensive evaluations demonstrate that ScenGen effectively generates scenario-based GUI tests guided by LLM collaboration, achieving higher relevance to business logic and improving the completeness of automated GUI testing.
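The five agents described in the abstract form a perceive-decide-act-verify-record loop. The sketch below is a minimal Python illustration of that loop under stated assumptions, not the paper's implementation: `call_llm`, `device.dump_widgets`, `device.perform`, and the reply formats are hypothetical placeholders introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class GUIState:
    """Structured snapshot of the current screen: extracted widgets and layout."""
    widgets: list[dict]  # e.g. {"id": "btn_login", "text": "Log in", "bounds": [0, 0, 100, 40]}

@dataclass
class Action:
    widget_id: str
    action_type: str  # "click", "input", "scroll", ...
    text: str = ""

def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM client is used; returns the model's reply as text."""
    raise NotImplementedError

class Observer:
    def perceive(self, device) -> GUIState:
        # In practice: dump the view hierarchy / screenshot and extract widget semantics.
        return GUIState(widgets=device.dump_widgets())

class Decider:
    def decide(self, scenario: str, state: GUIState, memory: list[str]) -> Action:
        reply = call_llm(
            f"Scenario: {scenario}\nWidgets: {state.widgets}\nHistory: {memory}\n"
            "Reply as '<widget_id>,<action_type>' for the next step."
        )
        widget_id, action_type = [part.strip() for part in reply.split(",")[:2]]
        return Action(widget_id, action_type)

class Executor:
    def execute(self, device, action: Action) -> None:
        device.perform(action.widget_id, action.action_type, action.text)

class Supervisor:
    def verdict(self, scenario: str, before: GUIState, after: GUIState, action: Action) -> str:
        return call_llm(
            f"Scenario: {scenario}\nAction: {action}\nBefore: {before.widgets}\nAfter: {after.widgets}\n"
            "Answer exactly one of: done, on-track, off-track."
        ).strip().lower()

class Recorder:
    def __init__(self) -> None:
        self.context_memory: list[str] = []  # knowledge base reused by later decisions

    def log(self, action: Action, verdict: str) -> None:
        self.context_memory.append(f"{action.action_type}({action.widget_id}) -> {verdict}")

def run_scenario(device, scenario: str, max_steps: int = 20) -> list[str]:
    observer, decider, executor = Observer(), Decider(), Executor()
    supervisor, recorder = Supervisor(), Recorder()
    for _ in range(max_steps):
        before = observer.perceive(device)
        action = decider.decide(scenario, before, recorder.context_memory)
        executor.execute(device, action)
        after = observer.perceive(device)
        verdict = supervisor.verdict(scenario, before, after, action)
        recorder.log(action, verdict)
        if verdict == "done":
            break
    return recorder.context_memory
```

The returned context memory doubles as an operation log, mirroring the abstract's point that recorded GUI operations feed subsequent decision-making.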
Related papers
- ProBench: Benchmarking GUI Agents with Accurate Process Information [15.519853892615272]
We introduce ProBench, a comprehensive benchmark with over 200 challenging GUI tasks covering widely-used scenarios.
We extend our dataset to include Process-related Tasks and design a specialized evaluation method.
Our evaluation of advanced GUI agents reveals significant limitations in real-world GUI scenarios.
arXiv Detail & Related papers (2025-11-12T09:49:31Z)
- AUTO-Explorer: Automated Data Collection for GUI Agent [58.58097564914626]
We propose an automated data collection method with minimal annotation costs, named Auto-Explorer.
It incorporates a simple yet effective exploration mechanism that autonomously parses and explores GUI environments.
Using the data gathered, we fine-tune a multimodal large language model (MLLM) and establish a GUI element grounding testing set.
arXiv Detail & Related papers (2025-11-09T15:13:45Z)
- GUISpector: An MLLM Agent Framework for Automated Verification of Natural Language Requirements in GUI Prototypes [58.197090145723735]
We introduce a novel framework that leverages a multi-modal (M)LLM-based agent for the automated verification of NL requirements in GUI prototypes.
GUISpector extracts detailed NL feedback from the agent's verification process, providing developers with actionable insights.
We present an integrated tool that unifies these capabilities, offering an interface for supervising verification runs, inspecting agent rationales, and managing the end-to-end requirements verification process.
arXiv Detail & Related papers (2025-10-06T13:15:24Z)
- GTA1: GUI Test-time Scaling Agent [77.60727242084971]
This paper investigates the two main challenges with our GUI Test-time Scaling Agent, GTA1.
First, to select the most appropriate action proposal, we introduce a test-time scaling method.
Second, we propose a model that achieves improved accuracy when grounding the selected action proposal to its corresponding visual elements.
arXiv Detail & Related papers (2025-07-08T08:52:18Z)
- MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment [63.62778707277929]
MobileGUI-RL is a scalable framework that trains GUI agents in an online environment.
It synthesizes a curriculum of learnable tasks through self-exploration and filtering.
It adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards.
arXiv Detail & Related papers (2025-07-08T07:07:53Z)
- Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation [83.92224427735859]
We introduce a pre-operative critic mechanism that provides effective feedback prior to actual execution.
We develop a reasoning-bootstrapping-based data collection pipeline to create the GUI-Critic-Train and GUI-Critic-Test datasets.
Our model offers significant advantages in critic accuracy compared to current MLLMs.
arXiv Detail & Related papers (2025-06-05T04:12:36Z)
- ViMoTest: A Tool to Specify ViewModel-Based GUI Test Scenarios using Projectional Editing [0.8010120037374623]
We introduce the ViMoTest tool to test presentation logic independently of GUI frameworks.
We demonstrate the tool with a small JavaFX-based task manager example and generate executable code.
arXiv Detail & Related papers (2025-04-23T14:26:35Z)
- ReuseDroid: A VLM-empowered Android UI Test Migrator Boosted by Active Feedback [11.624163693084446]
We propose REUSEDROID, a novel multi-agent framework for GUI test migration empowered by Large Vision-Language Models (VLMs).
A key insight of REUSEDROID is to migrate tests based only on the core logic shared across similar apps, even though their overall operational logic may differ.
We evaluate REUSEDROID on LinPro, a new test migration dataset that consists of migration tasks for 39 popular apps across 4 categories.
arXiv Detail & Related papers (2025-04-03T07:45:09Z)
- GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration [56.58744345634623]
We propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration.
We also introduce NovelScreenSpot, a benchmark for testing how well the data can help align GUI action grounding models to novel environments.
arXiv Detail & Related papers (2025-01-23T18:16:21Z)
- GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent [24.97846085313314]
We propose a formalized and comprehensive environment to evaluate the entire process of automated GUI testing.
We divide the testing process into three key subtasks: test intention generation, test task execution, and GUI defect detection.
It evaluates the performance of different models using three data types: real mobile applications, mobile applications with artificially injected defects, and synthetic data.
arXiv Detail & Related papers (2024-12-24T13:41:47Z)
- Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism.
In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation.
Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z)
- Large Language Model-Brained GUI Agents: A Survey [42.82362907348966]
Multimodal models have ushered in a new era of GUI automation.
They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing.
These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands.
arXiv Detail & Related papers (2024-11-27T12:13:39Z)
- GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding [73.9254861755974]
This paper introduces a new dataset, termed GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including Image LLMs and Video LLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z)
- Practical, Automated Scenario-based Mobile App Testing [13.52057950260007]
Test scripts developed by human testers consider business logic by focusing on testing scenarios.
Due to the GUI-intensive nature of mobile apps, human testers rely on their understanding of the app GUI to organize test scripts around scenarios.
ScenTest starts automated testing by imitating these human practices and integrating domain knowledge into scenario-based mobile app testing.
arXiv Detail & Related papers (2024-06-12T15:48:39Z)
- ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation [30.693616802332745]
This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks.
We propose an advanced Actor-Critic framework, which incorporates a sophisticated GUI parser driven by an AI agent and is adept at handling lengthy procedural tasks.
arXiv Detail & Related papers (2023-12-20T15:28:38Z)
- You Only Look at Screens: Multimodal Chain-of-Action Agents [37.118034745972956]
Auto-GUI is a multimodal solution that directly interacts with the interface.
We propose a chain-of-action technique to help the agent decide what action to execute.
We evaluate our approach on a new device-control benchmark, AITW, with 30K unique instructions.
arXiv Detail & Related papers (2023-09-20T16:12:32Z)
- Test Script Intention Generation for Mobile Application via GUI Image and Code Understanding [12.973016336177047]
Test scripts play a more important role in mobile app testing than test cases do for source code.
TestIntention is a novel approach to infer the intention of GUI test scripts.
The results of all operations are combined to generate the test intention for test scripts.
arXiv Detail & Related papers (2021-07-12T02:08:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.