Scene-Driven Exploration and GUI Modeling for Android Apps
- URL: http://arxiv.org/abs/2308.10228v1
- Date: Sun, 20 Aug 2023 10:54:25 GMT
- Title: Scene-Driven Exploration and GUI Modeling for Android Apps
- Authors: Xiangyu Zhang, Lingling Fan, Sen Chen, Yucheng Su, Boyuan Li
- Abstract summary: Transition graphs extracted for apps, such as ATG, WTG, and STG, have low transition coverage and coarse granularity.
We propose SceneDroid, a scene-driven exploration approach that extracts GUI scenes dynamically.
Compared with existing GUI modeling tools, SceneDroid improves the coverage of transition pairs by 168.74% and scene extraction by 162.42%.
- Score: 13.647261033241364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the competitive environment, mobile apps are usually produced under
pressure with lots of complicated functionality and UI pages. Therefore, it is
challenging for various roles to design, understand, test, and maintain these
apps. The extracted transition graphs for apps such as ATG, WTG, and STG have a
low transition coverage and coarse-grained granularity, which limits the
existing methods of graphical user interface (GUI) modeling by UI exploration.
To solve these problems, in this paper, we propose SceneDroid, a scene-driven
exploration approach to extracting the GUI scenes dynamically by integrating a
series of novel techniques including smart exploration, state fuzzing, and
indirect launching strategies. We present the GUI scenes as a scene transition
graph (SceneTG) to model the GUI of apps with high transition coverage and
fine-grained granularity. Compared with the existing GUI modeling tools,
SceneDroid has improved by 168.74% in the coverage of transition pairs and
162.42% in scene extraction. Apart from the effectiveness evaluation of
SceneDroid, we also illustrate the future potential of SceneDroid as a
fundamental capability to support app development, reverse engineering, and GUI
regression testing.
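The abstract describes a scene transition graph (SceneTG) and reports gains in transition-pair coverage, but it does not say how either is represented or computed. The following is a minimal sketch under stated assumptions, not SceneDroid's implementation: the class `SceneTransitionGraph`, the helpers `pair_coverage` and `relative_improvement`, and the toy scene names are all hypothetical, and the reported percentages are assumed here to be relative improvements over a baseline.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Transition = Tuple[str, str]  # (source scene, destination scene)


@dataclass
class SceneTransitionGraph:
    """Hypothetical directed graph: nodes are GUI scenes, edges are transitions."""
    scenes: Set[str] = field(default_factory=set)
    transitions: Set[Transition] = field(default_factory=set)

    def add_transition(self, src: str, dst: str) -> None:
        # Register both endpoint scenes and the directed edge between them.
        self.scenes.update((src, dst))
        self.transitions.add((src, dst))


def pair_coverage(graph: SceneTransitionGraph, reference: Set[Transition]) -> float:
    """Fraction of reference transition pairs that the graph contains."""
    if not reference:
        return 0.0
    return len(graph.transitions & reference) / len(reference)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (assumed metric definition)."""
    return (new - baseline) / baseline * 100.0


# Toy data, invented for illustration only.
reference_pairs = {
    ("Main", "Settings"),
    ("Main", "Login"),
    ("Login", "Main"),
    ("Settings", "About"),
}
graph = SceneTransitionGraph()
graph.add_transition("Main", "Settings")
graph.add_transition("Main", "Login")
graph.add_transition("Login", "Main")

coverage = pair_coverage(graph, reference_pairs)
gain = relative_improvement(coverage, 0.25)
print(coverage, gain)
```

Storing transitions as a set of ordered pairs keeps the coverage computation a plain set intersection; a real tool would likely attach the triggering widget or event to each edge.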
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, named ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- GUing: A Mobile GUI Search Engine using a Vision-Language Model [6.024602799136753]
This paper proposes GUing, a GUI search engine based on a vision-language model called GUIClip.
We first collected app introduction images from Google Play, which display the most representative screenshots.
Then, we developed an automated pipeline to classify, crop, and extract the captions from these images.
We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind for GUI retrieval.
arXiv Detail & Related papers (2024-04-30T18:42:18Z)
- 3D scene generation from scene graphs and self-attention [51.49886604454926]
We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans.
We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene.
arXiv Detail & Related papers (2024-04-02T12:26:17Z)
- SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models [53.961002112433576]
We introduce a large-scale scene generation framework, SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions.
Our SceneX can generate a city spanning 2.5 km × 2.5 km with delicate geometric layout and structures, drastically reducing the time cost from several weeks for professional PCG engineers to just a few hours for an ordinary user.
arXiv Detail & Related papers (2024-03-23T03:23:29Z)
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Boosting GUI Prototyping with Diffusion Models [0.440401067183266]
Deep learning models such as Stable Diffusion have emerged as a powerful text-to-image tool.
We propose UI-Diffuser, an approach that leverages Stable Diffusion to generate mobile UIs.
Preliminary results show that UI-Diffuser provides an efficient and cost-effective way to generate mobile GUI designs.
arXiv Detail & Related papers (2023-06-09T20:08:46Z)
- NiCro: Purely Vision-based, Non-intrusive Cross-Device and Cross-Platform GUI Testing [19.462053492572142]
We propose a non-intrusive cross-device and cross-platform system NiCro.
NiCro uses the state-of-the-art GUI widget detector to detect widgets from GUI images and then analyses a set of comprehensive information to match the widgets across diverse devices.
At the system level, NiCro can interact with a virtual device farm and a robotic arm system to perform cross-device, cross-platform testing non-intrusively.
arXiv Detail & Related papers (2023-05-24T01:19:05Z)
- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning [15.458315113767686]
We propose PIRLTest, an effective platform-independent approach for app testing.
It utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing.
PIRLTest explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs.
arXiv Detail & Related papers (2022-08-19T01:51:16Z)
- Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs [85.54212143154986]
Controllable scene synthesis consists of generating 3D information that satisfies underlying specifications.
Scene graphs are representations of a scene composed of objects (nodes) and inter-object relationships (edges).
We propose the first work that directly generates shapes from a scene graph in an end-to-end manner.
arXiv Detail & Related papers (2021-08-19T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.