GUILGET: GUI Layout GEneration with Transformer
- URL: http://arxiv.org/abs/2304.09012v1
- Date: Tue, 18 Apr 2023 14:27:34 GMT
- Title: GUILGET: GUI Layout GEneration with Transformer
- Authors: Andrey Sobolevsky, Guillaume-Alexandre Bilodeau, Jinghui Cheng, Jin L.C. Guo
- Abstract summary: The goal is to support the initial step of GUI design by producing realistic and diverse GUI layouts.
GUILGET is based on transformers in order to capture the semantics of the relationships between elements in a GUI-AG.
Our experiments, conducted on the CLAY dataset, reveal that our model has the best understanding of the relationships in a GUI-AG.
- Score: 26.457270239234383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sketching out Graphical User Interface (GUI) layout is part of the pipeline
of designing a GUI and a crucial task for the success of a software
application. Arranging all components inside a GUI layout manually is a
time-consuming task. In order to assist designers, we developed a method named
GUILGET to automatically generate GUI layouts from positional constraints
represented as GUI arrangement graphs (GUI-AGs). The goal is to support the
initial step of GUI design by producing realistic and diverse GUI layouts. The
existing image layout generation techniques often cannot incorporate GUI design
constraints. Thus, GUILGET needs to adapt existing techniques to generate GUI
layouts that obey constraints specific to GUI designs. GUILGET is based on
transformers in order to capture the semantics of the relationships between
elements in a GUI-AG. Moreover, the model learns constraints through the minimization of
losses responsible for placing each component inside its parent layout, for not
letting components overlap if they are inside the same parent, and for
component alignment. Our experiments, which are conducted on the CLAY dataset,
reveal that our model has the best understanding of the relationships in a GUI-AG
and achieves the best performance on most evaluation metrics. Therefore, our
work contributes to improved GUI layout generation by proposing a novel method
that effectively accounts for the constraints on GUI elements and paves the
way for a more efficient GUI design pipeline.
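The abstract names three layout losses (parent containment, sibling non-overlap, and alignment) but does not spell out their formulations here. Below is a minimal sketch, assuming components are represented as (x1, y1, x2, y2) boxes in normalized screen coordinates; the function names and exact formulas are illustrative assumptions, not the definitions used by GUILGET.

```python
# Illustrative sketch of the three layout losses described in the abstract
# (containment, sibling non-overlap, alignment). Box format assumed:
# (x1, y1, x2, y2) in normalized screen coordinates. The exact formulations
# used by GUILGET are defined in the paper, not here.
import torch


def containment_loss(children: torch.Tensor, parents: torch.Tensor) -> torch.Tensor:
    """Penalize the extent to which each child box spills outside its parent box."""
    left   = torch.relu(parents[:, 0] - children[:, 0])
    top    = torch.relu(parents[:, 1] - children[:, 1])
    right  = torch.relu(children[:, 2] - parents[:, 2])
    bottom = torch.relu(children[:, 3] - parents[:, 3])
    return (left + top + right + bottom).mean()


def overlap_loss(siblings: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise intersection area between components sharing a parent."""
    n = siblings.shape[0]
    if n < 2:
        return siblings.new_zeros(())
    a, b = siblings.unsqueeze(0), siblings.unsqueeze(1)     # broadcast all pairs
    iw = torch.relu(torch.min(a[..., 2], b[..., 2]) - torch.max(a[..., 0], b[..., 0]))
    ih = torch.relu(torch.min(a[..., 3], b[..., 3]) - torch.max(a[..., 1], b[..., 1]))
    off_diag = 1.0 - torch.eye(n, device=siblings.device)   # ignore self-overlap
    return (iw * ih * off_diag).sum() / (n * (n - 1))


def alignment_loss(siblings: torch.Tensor) -> torch.Tensor:
    """Soft alignment: each component should share a left, center, or right
    x-coordinate with at least one sibling (the y-axis case is analogous)."""
    n = siblings.shape[0]
    if n < 2:
        return siblings.new_zeros(())
    xs = torch.stack(
        [siblings[:, 0], (siblings[:, 0] + siblings[:, 2]) / 2, siblings[:, 2]], dim=1
    )                                                        # (n, 3) alignment keys
    diffs = (xs.unsqueeze(0) - xs.unsqueeze(1)).abs()        # (n, n, 3) pairwise gaps
    diffs = diffs + torch.eye(n, device=siblings.device).unsqueeze(-1) * 1e6
    # Distance to the closest matching alignment key of any other sibling.
    return diffs.min(dim=1).values.min(dim=1).values.mean()
```

Under these assumptions, the overall layout objective would be a weighted sum of the three terms, applied to the boxes decoded by the transformer for each parent-child group in the GUI-AG.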
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model in the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z)
- Interlinking User Stories and GUI Prototyping: A Semi-Automatic LLM-based Approach [55.762798168494726]
We present a novel Large Language Model (LLM)-based approach for validating the implementation of functional NL-based requirements in a graphical user interface (GUI) prototype.
Our approach aims to detect functional user stories that are not implemented in a GUI prototype and provides recommendations for suitable GUI components directly implementing the requirements.
arXiv Detail & Related papers (2024-06-12T11:59:26Z)
- Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces [27.84098739594353]
Graph4GUI exploits graph neural networks to capture individual elements' properties and semantic-visuo-spatial constraints in a layout.
The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task.
arXiv Detail & Related papers (2024-04-21T04:06:09Z)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [17.43878828389188]
We propose a novel visual Graphical User Interface (GUI) agent, SeeClick, which only relies on screenshots for task automation.
To tackle this challenge, we propose to enhance SeeClick with GUI grounding pre-training and devise a method to automate the curation of GUI grounding data.
We have also created ScreenSpot, the first realistic GUI grounding benchmark that encompasses mobile, desktop, and web environments.
arXiv Detail & Related papers (2024-01-17T08:10:35Z)
- Psychologically-Inspired, Unsupervised Inference of Perceptual Groups of GUI Widgets from GUI Images [21.498096538797952]
We present a novel unsupervised image-based method for inferring perceptual groups of GUI widgets.
The evaluation on a dataset of 1,091 GUIs collected from 772 mobile apps and 20 UI design mockups shows that our method significantly outperforms the state-of-the-art ad-hoc heuristics-based baseline.
arXiv Detail & Related papers (2022-06-15T05:16:03Z)
- ReverseORC: Reverse Engineering of Resizable User Interface Layouts with OR-Constraints [47.164878414034234]
ReverseORC is a novel reverse engineering (RE) approach to discover diverse layout types and their dynamic resizing behaviours.
It can create specifications that replicate even some non-standard layout managers with complex dynamic layout behaviours.
It can be used to detect and fix problems in legacy UIs, extend UIs with enhanced layout behaviours, and support the creation of flexible UI layouts.
arXiv Detail & Related papers (2022-02-23T13:57:25Z)
- GUIGAN: Learning to Generate GUI Designs Using Generative Adversarial Networks [0.0]
We develop a model GUIGAN to automatically generate GUI designs.
Our model significantly outperforms the best of the baseline methods by 30.77% in Fréchet Inception Distance (FID) and 12.35% in 1-Nearest Neighbor Accuracy (1-NNA).
arXiv Detail & Related papers (2021-01-25T09:42:58Z)
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination? [21.91118062303175]
We conduct the first large-scale empirical study of seven representative GUI element detection methods on over 50k GUI images.
This study sheds light on the technical challenges to be addressed and informs the design of new GUI element detection methods.
Our evaluation on 25,000 GUI images shows that our method significantly advances the state-of-the-art performance in GUI element detection.
arXiv Detail & Related papers (2020-08-12T06:36:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.