NiCro: Purely Vision-based, Non-intrusive Cross-Device and
Cross-Platform GUI Testing
- URL: http://arxiv.org/abs/2305.14611v1
- Date: Wed, 24 May 2023 01:19:05 GMT
- Title: NiCro: Purely Vision-based, Non-intrusive Cross-Device and
Cross-Platform GUI Testing
- Authors: Mulong Xie, Jiaming Ye, Zhenchang Xing, Lei Ma
- Abstract summary: We propose a non-intrusive cross-device and cross-platform system NiCro.
NiCro uses a state-of-the-art GUI widget detector to detect widgets from GUI images and then analyzes a set of comprehensive information to match the widgets across diverse devices.
At the system level, NiCro can interact with a virtual device farm and a robotic arm system to perform cross-device, cross-platform testing non-intrusively.
- Score: 19.462053492572142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To ensure app compatibility and a smooth user experience across
diverse devices and platforms, developers have to perform cross-device,
cross-platform testing of their apps, which is laborious. There is a growing
trend of using a record-and-replay approach to facilitate the testing
process. However, the graphical user interface (GUI) of an app running on
different devices and platforms differs dramatically. This complicates
record and replay, as the presence, appearance and layout of the GUI widgets
can be inconsistent between the recording phase and the replaying phase.
Existing techniques resort to instrumenting the underlying system to obtain
app metadata for widget identification and matching across devices, but such
intrusive practices are limited by the accessibility and accuracy of the
metadata on different platforms. Several recent works instead attempt to
derive the GUI information by analyzing the GUI image; nevertheless, their
performance is curbed by the preliminary visual approaches they apply and by
their failure to consider the divergence of the same GUI displayed on
different devices. To address this challenge, we propose NiCro, a
non-intrusive cross-device and cross-platform system. NiCro uses a
state-of-the-art GUI widget detector to detect widgets in GUI images and
then analyzes a set of comprehensive information to match the widgets across
diverse devices. At the system level, NiCro can interact with a virtual
device farm and a robotic arm system to perform cross-device, cross-platform
testing non-intrusively. We first evaluated NiCro by comparing its
multi-modal widget and GUI matching approach with four commonly used
matching techniques. We then examined its overall performance on eight
different devices, using it to record and replay 107 test cases of 28
popular apps and the home page to show its effectiveness.
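The pipeline the abstract outlines — detect widgets on each device's screenshot, then compare their visual and textual attributes to pair them across devices — can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration, not NiCro's actual implementation: the helper names, the OpenCV histogram comparison, the difflib text similarity, and the score weights are all assumptions.

```python
# Illustrative sketch of cross-device widget matching: pair each widget
# detected on the recording device with its most similar widget on the
# replaying device. All names and weights here are hypothetical.
from dataclasses import dataclass
from difflib import SequenceMatcher

import cv2
import numpy as np


@dataclass
class Widget:
    image: np.ndarray  # cropped widget region from a screenshot (BGR)
    text: str          # text extracted from the widget, e.g. by OCR


def visual_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Correlation of colour histograms; tolerant to resolution changes."""
    hists = []
    for img in (a, b):
        h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                         [0, 256, 0, 256, 0, 256])
        hists.append(cv2.normalize(h, h).flatten())
    return float(cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL))


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity of the widgets' extracted text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_widgets(recorded: list[Widget], replayed: list[Widget],
                  w_visual: float = 0.5, w_text: float = 0.5,
                  threshold: float = 0.6) -> dict[int, int]:
    """Greedily pair each recorded widget with its best replay candidate."""
    matches = {}
    for i, r in enumerate(recorded):
        scores = [w_visual * visual_similarity(r.image, c.image)
                  + w_text * text_similarity(r.text, c.text)
                  for c in replayed]
        if scores and max(scores) >= threshold:
            matches[i] = int(np.argmax(scores))
    return matches
```

A record-and-replay driver would then map the widget the tester touched during recording to its match on the target device and fire the same action there.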
Related papers
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents [50.39555842254652]
We introduce the Android Multi-annotation EXpo (AMEX) to advance research on AI agents in mobile scenarios.
AMEX comprises over 104K high-resolution screenshots from 110 popular mobile applications, which are annotated at multiple levels.
AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions.
arXiv Detail & Related papers (2024-07-03T17:59:58Z)
- Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model [27.97964877860671]
This paper proposes a vision-driven automated GUI testing approach to detect non-crash functional bugs with Multimodal Large Language Models.
It begins by extracting GUI text information and aligning it with screenshots to form a vision prompt, enabling the MLLM to understand the GUI context.
VisionDroid identifies 29 new bugs on Google Play, of which 19 have been confirmed and fixed.
arXiv Detail & Related papers (2024-07-03T11:58:09Z)
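To make the "vision prompt" idea in the VisionDroid summary above concrete, here is a rough, hypothetical sketch of pairing extracted GUI text with a screenshot in an OpenAI-style multimodal chat message; the message layout and the bug-hunting instruction are assumptions, not the paper's actual code.

```python
# Hypothetical sketch of assembling a "vision prompt" that aligns extracted
# GUI text with a screenshot for a multimodal LLM. The message structure
# follows the common OpenAI-style chat format, not VisionDroid's own code.
import base64


def build_vision_prompt(screenshot_path: str, gui_texts: list[str]) -> list[dict]:
    """Combine a screenshot and its extracted text into one MLLM prompt."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    text_block = "\n".join(f"- {t}" for t in gui_texts)
    instruction = (
        "Here is a screenshot of an app GUI and the text elements extracted "
        "from it:\n" + text_block + "\n"
        "Describe what this screen does and flag any behaviour that looks "
        "like a non-crash functional bug."
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]
```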
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even GPT-4o, the state-of-the-art large multimodal model, performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z)
- GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices [61.48043339441149]
GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
We developed OdysseyAgent, a multimodal cross-app navigation agent by fine-tuning the Qwen-VL model with a history resampling module.
arXiv Detail & Related papers (2024-06-12T17:44:26Z)
- Interlinking User Stories and GUI Prototyping: A Semi-Automatic LLM-based Approach [55.762798168494726]
We present a novel Large Language Model (LLM)-based approach for validating the implementation of functional NL-based requirements in a graphical user interface (GUI) prototype.
Our approach aims to detect functional user stories that are not implemented in a GUI prototype and provides recommendations for suitable GUI components directly implementing the requirements.
arXiv Detail & Related papers (2024-06-12T11:59:26Z)
- Practical Non-Intrusive GUI Exploration Testing with Visual-based Robotic Arms [14.3266199543725]
We propose a practical non-intrusive GUI testing framework with visual robotic arms.
RoboTest integrates novel GUI screen and widget detection algorithms that adapt to screens of different sizes.
We evaluate RoboTest with 20 mobile apps, with a case study on an embedded system.
arXiv Detail & Related papers (2023-12-17T09:05:39Z)
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Scene-Driven Exploration and GUI Modeling for Android Apps [13.647261033241364]
Existing GUI transition graphs for apps, such as ATG, WTG, and STG, have low transition coverage and coarse-grained granularity.
We propose SceneDroid, a scene-driven exploration approach to extracting the GUI scenes dynamically.
Compared with existing GUI modeling tools, SceneDroid improves the coverage of transition pairs by 168.74% and scene extraction by 162.42%.
arXiv Detail & Related papers (2023-08-20T10:54:25Z)
- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning [15.458315113767686]
We propose PIRLTest, an effective platform-independent approach for app testing.
It utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing.
PIRLTest explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs, as sketched below.
arXiv Detail & Related papers (2022-08-19T01:51:16Z)
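The Q-network-guided, curiosity-driven exploration mentioned in the PIRLTest summary can be sketched roughly as follows; the network shape, state encoding, and curiosity reward are illustrative assumptions, not the paper's actual design.

```python
# Rough sketch of Q-network-guided GUI exploration: a small network scores
# state-action pairs, and a curiosity bonus rewards actions that lead to
# rarely seen screens. Architecture and reward are illustrative assumptions.
from collections import Counter

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Estimates Q(s, a) from a concatenated state-action embedding."""

    def __init__(self, state_dim: int = 64, action_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


visit_counts: Counter = Counter()


def curiosity_reward(screen_id: str) -> float:
    """Higher reward for screens the explorer has rarely visited."""
    visit_counts[screen_id] += 1
    return 1.0 / visit_counts[screen_id]


def pick_action(q_net: QNetwork, state: torch.Tensor,
                actions: list[torch.Tensor]) -> int:
    """Greedily choose the action with the highest estimated Q-value."""
    with torch.no_grad():
        scores = [q_net(state, a).item() for a in actions]
    return max(range(len(actions)), key=scores.__getitem__)
```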
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.