Scene-Driven Exploration and GUI Modeling for Android Apps
- URL: http://arxiv.org/abs/2308.10228v1
- Date: Sun, 20 Aug 2023 10:54:25 GMT
- Title: Scene-Driven Exploration and GUI Modeling for Android Apps
- Authors: Xiangyu Zhang, Lingling Fan, Sen Chen, Yucheng Su, Boyuan Li
- Abstract summary: Existing transition graphs, such as ATG, WTG, and STG, have low transition coverage and coarse-grained granularity.
We propose SceneDroid, a scene-driven exploration approach that extracts GUI scenes dynamically.
Compared with existing GUI modeling tools, SceneDroid improves the coverage of transition pairs by 168.74% and scene extraction by 162.42%.
- Score: 13.647261033241364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the competitive environment, mobile apps are usually produced under
pressure with lots of complicated functionality and UI pages. Therefore, it is
challenging for various roles to design, understand, test, and maintain these
apps. Existing transition graphs, such as ATG, WTG, and STG, have low
transition coverage and coarse-grained granularity, which limits existing
methods of graphical user interface (GUI) modeling via UI exploration.
To solve these problems, in this paper, we propose SceneDroid, a scene-driven
exploration approach to extracting the GUI scenes dynamically by integrating a
series of novel techniques including smart exploration, state fuzzing, and
indirect launching strategies. We present the GUI scenes as a scene transition
graph (SceneTG) to model the GUI of apps with high transition coverage and
fine-grained granularity. Compared with existing GUI modeling tools,
SceneDroid improves the coverage of transition pairs by 168.74% and
scene extraction by 162.42%. Apart from the effectiveness evaluation of
SceneDroid, we also illustrate the future potential of SceneDroid as a
fundamental capability to support app development, reverse engineering, and GUI
regression testing.
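As a rough illustration of the modeling idea, a scene transition graph like the paper's SceneTG can be represented as a directed graph whose nodes are GUI scenes and whose edges record the action that triggers each transition. This is a minimal hypothetical sketch; the class and field names are illustrative, not SceneDroid's actual data model:

```python
from collections import defaultdict

# Hypothetical minimal model of a scene transition graph (SceneTG):
# nodes are GUI scenes, edges are (source, action, target) transitions.
class SceneTransitionGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # scene -> [(action, target_scene)]

    def add_transition(self, source, action, target):
        self.edges[source].append((action, target))

    def transition_pairs(self):
        # Transition coverage is typically measured over (source, target) pairs.
        return {(s, t) for s, outs in self.edges.items() for _, t in outs}

stg = SceneTransitionGraph()
stg.add_transition("LoginScene", "tap:sign_in", "HomeScene")
stg.add_transition("HomeScene", "tap:settings", "SettingsScene")
print(sorted(stg.transition_pairs()))
# → [('HomeScene', 'SettingsScene'), ('LoginScene', 'HomeScene')]
```

A set of (source, target) pairs makes it easy to compare two tools' extracted graphs: the transition-pair coverage numbers above correspond to how many such pairs each tool discovers.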
Related papers
- GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration [56.58744345634623]
We propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration.
We also introduce NovelScreenSpot, a benchmark for testing how well the data can help align GUI action grounding models to novel environments.
arXiv Detail & Related papers (2025-01-23T18:16:21Z)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents [58.18100825673032]
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions.
In the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9, respectively).
arXiv Detail & Related papers (2025-01-21T17:48:10Z)
- Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism.
In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation.
Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations and grounds natural-language instructions to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUing: A Mobile GUI Search Engine using a Vision-Language Model [6.024602799136753]
This paper proposes GUing, a GUI search engine based on a vision-language model called GUIClip.
We first collected app introduction images from Google Play, which display the most representative screenshots.
Then, we developed an automated pipeline to classify, crop, and extract the captions from these images.
We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind for GUI retrieval.
arXiv Detail & Related papers (2024-04-30T18:42:18Z)
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Boosting GUI Prototyping with Diffusion Models [0.440401067183266]
Deep learning models such as Stable Diffusion have emerged as a powerful text-to-image tool.
We propose UI-Diffuser, an approach that leverages Stable Diffusion to generate mobile UIs.
Preliminary results show that UI-Diffuser provides an efficient and cost-effective way to generate mobile GUI designs.
arXiv Detail & Related papers (2023-06-09T20:08:46Z)
- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning [15.458315113767686]
We propose PIRLTest, an effective platform-independent approach for app testing.
It utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing.
PIRLTest explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs.
arXiv Detail & Related papers (2022-08-19T01:51:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.