Scene-Driven Exploration and GUI Modeling for Android Apps
- URL: http://arxiv.org/abs/2308.10228v1
- Date: Sun, 20 Aug 2023 10:54:25 GMT
- Title: Scene-Driven Exploration and GUI Modeling for Android Apps
- Authors: Xiangyu Zhang, Lingling Fan, Sen Chen, Yucheng Su, Boyuan Li
- Abstract summary: Existing transition graphs, such as ATG, WTG, and STG, have low transition coverage and coarse-grained granularity.
We propose SceneDroid, a scene-driven exploration approach that extracts GUI scenes dynamically.
Compared with existing GUI modeling tools, SceneDroid improves the coverage of transition pairs by 168.74% and scene extraction by 162.42%.
- Score: 13.647261033241364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the competitive environment, mobile apps are usually produced under
pressure with lots of complicated functionality and UI pages. Therefore, it is
challenging for various roles to design, understand, test, and maintain these
apps. Existing transition graphs, such as ATG, WTG, and STG, have low
transition coverage and coarse-grained granularity, which limits existing
methods of graphical user interface (GUI) modeling via UI exploration.
To solve these problems, in this paper, we propose SceneDroid, a scene-driven
exploration approach to extracting the GUI scenes dynamically by integrating a
series of novel techniques including smart exploration, state fuzzing, and
indirect launching strategies. We present the GUI scenes as a scene transition
graph (SceneTG) to model the GUI of apps with high transition coverage and
fine-grained granularity. Compared with existing GUI modeling tools,
SceneDroid improves the coverage of transition pairs by 168.74% and
scene extraction by 162.42%. Apart from the effectiveness evaluation of
SceneDroid, we also illustrate the future potential of SceneDroid as a
fundamental capability to support app development, reverse engineering, and GUI
regression testing.
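As a rough illustration of the modeling idea, a scene transition graph like the paper's SceneTG can be represented as a directed graph whose nodes are GUI scenes and whose edges record the action that triggers each transition. This is a minimal hypothetical sketch; the class and field names are illustrative, not SceneDroid's actual data model:

```python
from collections import defaultdict

# Hypothetical minimal model of a scene transition graph (SceneTG):
# nodes are GUI scenes, edges are (source, action, target) transitions.
class SceneTransitionGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # scene -> [(action, target_scene)]

    def add_transition(self, source, action, target):
        self.edges[source].append((action, target))

    def transition_pairs(self):
        # Transition coverage is typically measured over (source, target) pairs.
        return {(s, t) for s, outs in self.edges.items() for _, t in outs}

stg = SceneTransitionGraph()
stg.add_transition("LoginScene", "tap:sign_in", "HomeScene")
stg.add_transition("HomeScene", "tap:settings", "SettingsScene")
print(sorted(stg.transition_pairs()))
# → [('HomeScene', 'SettingsScene'), ('LoginScene', 'HomeScene')]
```

A set of (source, target) pairs makes it easy to compare two tools' extracted graphs: the transition-pair coverage numbers above correspond to how many such pairs each tool discovers.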
Related papers
- GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration [56.58744345634623]
We propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration.
We also introduce NovelScreenSpot, a benchmark for testing how well the data can help align GUI action grounding models to novel environments.
arXiv Detail & Related papers (2025-01-23T18:16:21Z)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents [58.18100825673032]
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions.
In the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9, respectively).
arXiv Detail & Related papers (2025-01-21T17:48:10Z)
- Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism.
In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation.
Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations and grounds natural-language instructions to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUing: A Mobile GUI Search Engine using a Vision-Language Model [6.024602799136753]
This paper proposes GUing, a GUI search engine based on a vision-language model called GUIClip.
We first collected app introduction images from Google Play, which display the most representative screenshots.
Then, we developed an automated pipeline to classify, crop, and extract the captions from these images.
We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind for GUI retrieval.
arXiv Detail & Related papers (2024-04-30T18:42:18Z)
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Boosting GUI Prototyping with Diffusion Models [0.440401067183266]
Deep learning models such as Stable Diffusion have emerged as a powerful text-to-image tool.
We propose UI-Diffuser, an approach that leverages Stable Diffusion to generate mobile UIs.
Preliminary results show that UI-Diffuser provides an efficient and cost-effective way to generate mobile GUI designs.
arXiv Detail & Related papers (2023-06-09T20:08:46Z)
- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning [15.458315113767686]
We propose PIRLTest, an effective platform-independent approach for app testing.
It utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing.
PIRLTest explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs.
arXiv Detail & Related papers (2022-08-19T01:51:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.