NiCro: Purely Vision-based, Non-intrusive Cross-Device and
Cross-Platform GUI Testing
- URL: http://arxiv.org/abs/2305.14611v1
- Date: Wed, 24 May 2023 01:19:05 GMT
- Authors: Mulong Xie, Jiaming Ye, Zhenchang Xing, Lei Ma
- Abstract summary: We propose a non-intrusive cross-device and cross-platform system NiCro.
NiCro uses the state-of-the-art GUI widget detector to detect widgets from GUI images and then analyzes a set of comprehensive information to match the widgets across diverse devices.
At the system level, NiCro can interact with a virtual device farm and a robotic arm system to perform cross-device, cross-platform testing non-intrusively.
- Score: 19.462053492572142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To ensure app compatibility and a smooth user experience across diverse
devices and platforms, developers have to perform cross-device, cross-platform
testing of their apps, which is laborious. Recently, there has been an increasing
trend of using a record-and-replay approach to facilitate the testing process.
However, the graphical user interface (GUI) of an app running on different
devices and platforms differs dramatically. This complicates the record-and-replay
process, as the presence, appearance and layout of the GUI widgets in the
recording phase and the replaying phase can be inconsistent. Existing techniques
resort to instrumenting the underlying system to obtain app metadata for widget
identification and matching across devices, but such intrusive practices are
limited by the accessibility and accuracy of the metadata on different platforms.
On the other hand, several recent works attempt to derive the GUI information by
analyzing the GUI image. Nevertheless, their performance is limited by the
preliminary visual approaches they apply and by their failure to consider the
divergence of the same GUI displayed on different devices. To address this
challenge, we propose NiCro, a non-intrusive cross-device and cross-platform
system. NiCro uses the state-of-the-art GUI widget detector to detect widgets in
GUI images and then analyzes a set of comprehensive information to match the
widgets across diverse devices. At the system level, NiCro can interact with a
virtual device farm and a robotic arm system to perform cross-device,
cross-platform testing non-intrusively. We first evaluated NiCro by comparing its
multi-modal widget and GUI matching approach with four commonly used matching
techniques. We then examined its overall performance on eight different devices,
using it to record and replay 107 test cases of 28 popular apps, as well as the
home page, to show its effectiveness.
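The matching step is the core of this pipeline: each widget recorded on one device must be re-identified on a device with a different resolution, theme, or layout. Below is a minimal sketch of such multi-modal widget matching; the Widget fields, similarity weights, and threshold are illustrative assumptions, not NiCro's actual design.

```python
# A minimal sketch of multi-modal widget matching across devices, in the
# spirit of NiCro's description above but NOT its actual implementation.
# The Widget fields, similarity weights (0.2/0.5/0.3) and the 0.7 threshold
# are illustrative assumptions.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Widget:
    cls: str            # detected widget class, e.g. "Button" or "EditText"
    text: str           # OCR text inside the widget, "" if none
    bbox: tuple         # (x, y, w, h) in pixels on this device's screen
    screen: tuple       # (screen_width, screen_height) of this device

    def center(self) -> tuple:
        # Normalize the widget center to [0, 1] x [0, 1] so positions are
        # comparable across devices with different resolutions.
        x, y, w, h = self.bbox
        sw, sh = self.screen
        return ((x + w / 2) / sw, (y + h / 2) / sh)

def match_score(a: Widget, b: Widget) -> float:
    # Combine class, text and position similarity; no single visual cue is
    # reliable on its own across devices.
    cls_sim = 1.0 if a.cls == b.cls else 0.0
    text_sim = SequenceMatcher(None, a.text, b.text).ratio()
    (ax, ay), (bx, by) = a.center(), b.center()
    pos_sim = 1.0 - min(1.0, ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5)
    return 0.2 * cls_sim + 0.5 * text_sim + 0.3 * pos_sim

def match_widget(recorded, candidates, threshold=0.7):
    # Return the best-matching widget on the replay device, or None when
    # nothing is similar enough (the widget may be absent on this device).
    best = max(candidates, key=lambda c: match_score(recorded, c), default=None)
    if best is not None and match_score(recorded, best) >= threshold:
        return best
    return None
```

The design point the sketch captures is that text can be truncated, positions shift with layout, and appearance changes with themes across devices, so a weighted combination of cues is more robust than any single one.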
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop ShowUI, a vision-language-action model for the digital world.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents [50.39555842254652]
We introduce the Android Multi-annotation EXpo (AMEX) to advance research on AI agents in mobile scenarios.
AMEX comprises over 104K high-resolution screenshots from 110 popular mobile applications, which are annotated at multiple levels.
AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions.
arXiv Detail & Related papers (2024-07-03T17:59:58Z)
- Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model [27.97964877860671]
This paper proposes a vision-driven automated GUI testing approach that detects non-crash functional bugs with a Multimodal Large Language Model (MLLM).
It begins by extracting GUI text information and aligning it with screenshots to form a vision prompt, enabling the MLLM to understand the GUI context.
VisionDroid identifies 29 new bugs on Google Play, of which 19 have been confirmed and fixed.
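As an illustration of that first step, the following sketch pairs OCR-extracted GUI texts with the screenshot they came from to build a vision prompt for an MLLM. The prompt wording, the returned dict format, and the build_vision_prompt helper are hypothetical, not VisionDroid's actual API.

```python
# An illustrative sketch of the "vision prompt" step described above: pair
# OCR-extracted GUI texts with the screenshot they came from so an MLLM can
# reason about the screen. The prompt wording, the dict format and the
# build_vision_prompt helper are hypothetical, not VisionDroid's actual API.
def build_vision_prompt(ocr_items, screenshot_path):
    # ocr_items: list of (text, (x, y, w, h)) tuples from any OCR engine.
    lines = [
        "The attached screenshot shows the current app screen.",
        "Detected GUI texts and their bounding boxes:",
    ]
    for text, (x, y, w, h) in ocr_items:
        lines.append(f'- "{text}" at (x={x}, y={y}, w={w}, h={h})')
    lines.append(
        "Given the screenshot and the texts above, does this screen exhibit "
        "any non-crash functional bug? Answer yes/no and explain."
    )
    return {"image": screenshot_path, "text": "\n".join(lines)}

# Example usage with made-up OCR output:
prompt = build_vision_prompt(
    [("Sign in", (40, 900, 200, 60)), ("Forgot password?", (40, 980, 180, 40))],
    "screen_001.png",
)
```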
arXiv Detail & Related papers (2024-07-03T11:58:09Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z)
- GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices [61.48043339441149]
GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
We developed OdysseyAgent, a multimodal cross-app navigation agent by fine-tuning the Qwen-VL model with a history resampling module.
arXiv Detail & Related papers (2024-06-12T17:44:26Z)
- Practical Non-Intrusive GUI Exploration Testing with Visual-based Robotic Arms [14.3266199543725]
We propose RoboTest, a practical non-intrusive GUI testing framework with visual robotic arms.
RoboTest integrates novel GUI screen and widget detection algorithms that adapt to screens of different sizes.
We evaluate RoboTest with 20 mobile apps, with a case study on an embedded system.
arXiv Detail & Related papers (2023-12-17T09:05:39Z)
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Scene-Driven Exploration and GUI Modeling for Android Apps [13.647261033241364]
Existing GUI transition graphs extracted for apps, such as the ATG, WTG, and STG, have low transition coverage and coarse-grained granularity.
We propose SceneDroid, a scene-driven exploration approach that extracts GUI scenes dynamically.
Compared with existing GUI modeling tools, SceneDroid improves the coverage of transition pairs by 168.74% and scene extraction by 162.42%.
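For readers unfamiliar with the metric, the sketch below shows one plausible reading of "transition pair" coverage: a GUI model as a graph whose edges are (source scene, target scene) pairs, measured against a reference set of transitions. The data structures and scene names are assumptions for illustration, not SceneDroid's implementation.

```python
# One plausible reading of "transition pair" coverage, for illustration only:
# a GUI model is a graph whose edges are (source scene, target scene) pairs,
# and coverage is measured against a reference set of transitions. The data
# structures and scene names are assumptions, not SceneDroid's implementation.
def transition_pair_coverage(explored, reference):
    # Both arguments are sets of (source_scene, target_scene) pairs.
    if not reference:
        return 0.0
    return len(explored & reference) / len(reference)

# Example with made-up scenes:
reference = {("Login", "Home"), ("Home", "Settings"), ("Home", "Profile")}
explored = {("Login", "Home"), ("Home", "Settings")}
print(transition_pair_coverage(explored, reference))  # 0.666...
```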
arXiv Detail & Related papers (2023-08-20T10:54:25Z)
- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning [15.458315113767686]
We propose PIRLTest, an effective platform-independent approach for app testing.
It utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing.
PIRLTest explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs.
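To make the curiosity-driven idea concrete, here is a toy sketch of count-based curiosity combined with Q-learning. PIRLTest itself uses a Q-network over image-embedded states, so the tabular Q function, novelty bonus, and hyperparameters below are simplifying assumptions for illustration only.

```python
# A toy sketch of curiosity-driven exploration with tabular Q-learning.
# PIRLTest itself uses a Q-network over image-embedded states; the tabular
# Q function, count-based novelty bonus and hyperparameters here are
# simplifying assumptions for illustration only.
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters

Q = defaultdict(float)     # Q[(state, action)] -> estimated value
visits = defaultdict(int)  # state visit counts driving the curiosity bonus

def curiosity_reward(state):
    # Rarely visited GUI states yield a larger intrinsic reward, steering
    # the agent toward unexplored parts of the app.
    return 1.0 / (1 + visits[state])

def choose_action(state, actions):
    # Epsilon-greedy selection over the current Q estimates.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, next_state, next_actions):
    # Standard Q-learning backup, with curiosity as the intrinsic reward.
    visits[next_state] += 1
    reward = curiosity_reward(next_state)
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```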
arXiv Detail & Related papers (2022-08-19T01:51:16Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)