Psychologically-Inspired, Unsupervised Inference of Perceptual Groups of
GUI Widgets from GUI Images
- URL: http://arxiv.org/abs/2206.10352v2
- Date: Wed, 24 May 2023 01:18:23 GMT
- Title: Psychologically-Inspired, Unsupervised Inference of Perceptual Groups of
GUI Widgets from GUI Images
- Authors: Mulong Xie, Zhenchang Xing, Sidong Feng, Chunyang Chen, Liming Zhu,
Xiwei Xu
- Abstract summary: We present a novel unsupervised image-based method for inferring perceptual groups of GUI widgets.
The evaluation on a dataset of 1,091 GUIs collected from 772 mobile apps and 20 UI design mockups shows that our method significantly outperforms the state-of-the-art ad-hoc heuristics-based baseline.
- Score: 21.498096538797952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A Graphical User Interface (GUI) is not merely a collection of individual,
unrelated widgets; rather, it partitions discrete widgets into groups by
various visual cues, thus forming higher-order perceptual units such as tabs,
menus, cards or lists. The ability to automatically segment a GUI into perceptual
groups of widgets constitutes a fundamental component of visual intelligence to
automate GUI design, implementation and automation tasks. Although humans can
partition a GUI into meaningful perceptual groups of widgets in a highly
reliable way, perceptual grouping is still an open challenge for computational
approaches. Existing methods rely on ad-hoc heuristics or supervised machine
learning that is dependent on specific GUI implementations and runtime
information. Research in psychology and biological vision has formulated a set
of principles (i.e., Gestalt theory of perception) that describe how humans
group elements in visual scenes based on visual cues like connectivity,
similarity, proximity and continuity. These principles are domain-independent
and have been widely adopted by practitioners to structure content on GUIs to
improve aesthetic pleasantness and usability. Inspired by these principles, we
present a novel unsupervised image-based method for inferring perceptual groups
of GUI widgets. Our method requires only GUI pixel images, is independent of
GUI implementation, and does not require any training data. The evaluation on a
dataset of 1,091 GUIs collected from 772 mobile apps and 20 UI design mockups
shows that our method significantly outperforms the state-of-the-art ad-hoc
heuristics-based baseline. Our perceptual grouping method creates
opportunities for improving UI-related software engineering tasks.
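As a concrete illustration of how Gestalt cues such as proximity and similarity could be operationalized over detected widget bounding boxes, here is a minimal Python sketch. This is not the authors' implementation: the Box structure, the gap_tol and size_tol thresholds, and the helper names are illustrative assumptions, and a real pipeline would first detect widgets from the GUI pixel image and would also exploit connectivity and continuity cues.

```python
# Minimal sketch (not the paper's method): cluster GUI widget bounding boxes
# using two Gestalt cues, proximity and similarity. Boxes are assumed to come
# from an upstream widget detector; thresholds below are illustrative only.

from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x: int  # left
    y: int  # top
    w: int  # width
    h: int  # height


def similar(a: Box, b: Box, size_tol: float = 0.2) -> bool:
    """Similarity cue: widgets of near-identical size are likely peers."""
    return (abs(a.w - b.w) <= size_tol * max(a.w, b.w)
            and abs(a.h - b.h) <= size_tol * max(a.h, b.h))


def proximate(a: Box, b: Box, gap_tol: int = 20) -> bool:
    """Proximity cue: the gap is small on one axis and bounded on the other."""
    h_gap = max(b.x - (a.x + a.w), a.x - (b.x + b.w), 0)
    v_gap = max(b.y - (a.y + a.h), a.y - (b.y + b.h), 0)
    return min(h_gap, v_gap) <= gap_tol and max(h_gap, v_gap) <= 4 * gap_tol


def group_widgets(boxes: List[Box]) -> List[List[Box]]:
    """Union-find clustering: merge boxes linked by proximity and similarity."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if proximate(boxes[i], boxes[j]) and similar(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


if __name__ == "__main__":
    # Three equally sized, vertically adjacent list items plus one distant button.
    boxes = [Box(10, 10, 300, 60), Box(10, 80, 300, 60),
             Box(10, 150, 300, 60), Box(10, 400, 120, 40)]
    for group in group_widgets(boxes):
        print([(b.x, b.y) for b in group])
```

In this toy example, the three adjacent, equally sized boxes are merged into one perceptual group (e.g., a list), while the distant, differently sized button remains its own group.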
Related papers
- GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction.
Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z) - Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism.
In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation.
Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z) - Falcon-UI: Understanding GUI Before Following User Instructions [57.67308498231232]
We introduce an instruction-free GUI navigation dataset, termed Insight-UI dataset, to enhance model comprehension of GUI environments.
Insight-UI dataset is automatically generated from the Common Crawl corpus, simulating various platforms.
We develop the GUI agent model Falcon-UI, which is initially pretrained on Insight-UI dataset and subsequently fine-tuned on Android and Web GUI datasets.
arXiv Detail & Related papers (2024-12-12T15:29:36Z) - Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations and grounds natural-language instructions to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z) - GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z) - GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z) - Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces [27.84098739594353]
Graph4GUI exploits graph neural networks to capture individual elements' properties and semantic-visuo-spatial constraints in a layout.
The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task.
arXiv Detail & Related papers (2024-04-21T04:06:09Z) - GUILGET: GUI Layout GEneration with Transformer [26.457270239234383]
The goal is to support the initial step of GUI design by producing realistic and diverse GUI layouts.
GUILGET is based on transformers in order to capture the semantics of relationships between elements from GUI-AG.
Our experiments, which are conducted on the CLAY dataset, reveal that our model has the best understanding of relationships from GUI-AG.
arXiv Detail & Related papers (2023-04-18T14:27:34Z)