GUILGET: GUI Layout GEneration with Transformer
- URL: http://arxiv.org/abs/2304.09012v1
- Date: Tue, 18 Apr 2023 14:27:34 GMT
- Title: GUILGET: GUI Layout GEneration with Transformer
- Authors: Andrey Sobolevsky, Guillaume-Alexandre Bilodeau, Jinghui Cheng, Jin L.C. Guo
- Abstract summary: The goal is to support the initial step of GUI design by producing realistic and diverse GUI layouts.
GUILGET is based on transformers in order to capture the semantics of the relationships between elements in a GUI-AG.
Our experiments, conducted on the CLAY dataset, reveal that our model has the best understanding of the relationships in a GUI-AG.
- Score: 26.457270239234383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sketching out Graphical User Interface (GUI) layout is part of the pipeline
of designing a GUI and a crucial task for the success of a software
application. Arranging all components inside a GUI layout manually is a
time-consuming task. In order to assist designers, we developed a method named
GUILGET to automatically generate GUI layouts from positional constraints
represented as GUI arrangement graphs (GUI-AGs). The goal is to support the
initial step of GUI design by producing realistic and diverse GUI layouts. The
existing image layout generation techniques often cannot incorporate GUI design
constraints. Thus, GUILGET needs to adapt existing techniques to generate GUI
layouts that obey constraints specific to GUI designs. GUILGET is based on
transformers in order to capture the semantics of the relationships between
elements in a GUI-AG. Moreover, the model learns constraints through the minimization of
losses responsible for placing each component inside its parent layout, for not
letting components overlap if they are inside the same parent, and for
component alignment. Our experiments, which are conducted on the CLAY dataset,
reveal that our model has the best understanding of the relationships in a GUI-AG
and achieves the best performance on most evaluation metrics. Therefore, our
work contributes to improved GUI layout generation by proposing a novel method
that effectively accounts for the constraints on GUI elements and paves the
way for a more efficient GUI design pipeline.
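The abstract names three layout losses (parent containment, sibling non-overlap, and alignment) but does not spell out their formulations here. Below is a minimal sketch, assuming components are represented as (x1, y1, x2, y2) boxes in normalized screen coordinates; the function names and exact formulas are illustrative assumptions, not the definitions used by GUILGET.

```python
# Illustrative sketch of the three layout losses described in the abstract
# (containment, sibling non-overlap, alignment). Box format assumed:
# (x1, y1, x2, y2) in normalized screen coordinates. The exact formulations
# used by GUILGET are defined in the paper, not here.
import torch


def containment_loss(children: torch.Tensor, parents: torch.Tensor) -> torch.Tensor:
    """Penalize the extent to which each child box spills outside its parent box."""
    left   = torch.relu(parents[:, 0] - children[:, 0])
    top    = torch.relu(parents[:, 1] - children[:, 1])
    right  = torch.relu(children[:, 2] - parents[:, 2])
    bottom = torch.relu(children[:, 3] - parents[:, 3])
    return (left + top + right + bottom).mean()


def overlap_loss(siblings: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise intersection area between components sharing a parent."""
    n = siblings.shape[0]
    if n < 2:
        return siblings.new_zeros(())
    a, b = siblings.unsqueeze(0), siblings.unsqueeze(1)     # broadcast all pairs
    iw = torch.relu(torch.min(a[..., 2], b[..., 2]) - torch.max(a[..., 0], b[..., 0]))
    ih = torch.relu(torch.min(a[..., 3], b[..., 3]) - torch.max(a[..., 1], b[..., 1]))
    off_diag = 1.0 - torch.eye(n, device=siblings.device)   # ignore self-overlap
    return (iw * ih * off_diag).sum() / (n * (n - 1))


def alignment_loss(siblings: torch.Tensor) -> torch.Tensor:
    """Soft alignment: each component should share a left, center, or right
    x-coordinate with at least one sibling (the y-axis case is analogous)."""
    n = siblings.shape[0]
    if n < 2:
        return siblings.new_zeros(())
    xs = torch.stack(
        [siblings[:, 0], (siblings[:, 0] + siblings[:, 2]) / 2, siblings[:, 2]], dim=1
    )                                                        # (n, 3) alignment keys
    diffs = (xs.unsqueeze(0) - xs.unsqueeze(1)).abs()        # (n, n, 3) pairwise gaps
    diffs = diffs + torch.eye(n, device=siblings.device).unsqueeze(-1) * 1e6
    # Distance to the closest matching alignment key of any other sibling.
    return diffs.min(dim=1).values.min(dim=1).values.mean()
```

Under these assumptions, the overall layout objective would be a weighted sum of the three terms, applied to the boxes decoded by the transformer for each parent-child group in the GUI-AG.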
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model in the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z)
- Interlinking User Stories and GUI Prototyping: A Semi-Automatic LLM-based Approach [55.762798168494726]
We present a novel Large Language Model (LLM)-based approach for validating the implementation of functional NL-based requirements in a graphical user interface (GUI) prototype.
Our approach aims to detect functional user stories that are not implemented in a GUI prototype and provides recommendations for suitable GUI components directly implementing the requirements.
arXiv Detail & Related papers (2024-06-12T11:59:26Z)
- Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces [27.84098739594353]
Graph4GUI exploits graph neural networks to capture individual elements' properties and semantic-visuo-spatial constraints in a layout.
The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task.
arXiv Detail & Related papers (2024-04-21T04:06:09Z)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [17.43878828389188]
We propose a novel visual Graphical User Interface (GUI) agent, SeeClick, which only relies on screenshots for task automation.
To tackle this challenge, we propose to enhance SeeClick with GUI grounding pre-training and devise a method to automate the curation of GUI grounding data.
We have also created ScreenSpot, the first realistic GUI grounding benchmark that encompasses mobile, desktop, and web environments.
arXiv Detail & Related papers (2024-01-17T08:10:35Z)
- Psychologically-Inspired, Unsupervised Inference of Perceptual Groups of GUI Widgets from GUI Images [21.498096538797952]
We present a novel unsupervised image-based method for inferring perceptual groups of GUI widgets.
The evaluation on a dataset of 1,091 GUIs collected from 772 mobile apps and 20 UI design mockups shows that our method significantly outperforms the state-of-the-art ad-hoc heuristics-based baseline.
arXiv Detail & Related papers (2022-06-15T05:16:03Z)
- ReverseORC: Reverse Engineering of Resizable User Interface Layouts with OR-Constraints [47.164878414034234]
ReverseORC is a novel reverse engineering (RE) approach to discover diverse layout types and their dynamic resizing behaviours.
It can create specifications that replicate even some non-standard layout managers with complex dynamic layout behaviours.
It can be used to detect and fix problems in legacy UIs, extend UIs with enhanced layout behaviours, and support the creation of flexible UI layouts.
arXiv Detail & Related papers (2022-02-23T13:57:25Z)
- GUIGAN: Learning to Generate GUI Designs Using Generative Adversarial Networks [0.0]
We develop a model GUIGAN to automatically generate GUI designs.
Our model significantly outperforms the best of the baseline methods by 30.77% in Fréchet Inception Distance (FID) and 12.35% in 1-Nearest Neighbor Accuracy (1-NNA).
arXiv Detail & Related papers (2021-01-25T09:42:58Z)
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination? [21.91118062303175]
We conduct the first large-scale empirical study of seven representative GUI element detection methods on over 50k GUI images.
This study sheds light on the technical challenges to be addressed and informs the design of new GUI element detection methods.
Our evaluation on 25,000 GUI images shows that our method significantly advances the state-of-the-art performance in GUI element detection.
arXiv Detail & Related papers (2020-08-12T06:36:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.