EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with
Multimodal Learning
- URL: http://arxiv.org/abs/2309.09867v1
- Date: Mon, 18 Sep 2023 15:28:12 GMT
- Title: EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with
Multimodal Learning
- Authors: Liuqing Chen, Yunnong Chen, Shuhong Xiao, Yaxuan Song, Lingyun Sun,
Yankun Zhen, Tingting Zhou, Yanfang Chang
- Abstract summary: Grouping fragmented elements can greatly improve the readability and maintainability of the generated code.
Current methods employ a two-stage strategy that introduces hand-crafted rules to group fragmented elements.
We propose EGFE, a novel method for automatic end-to-end grouping of fragmented elements via UI sequence prediction.
- Score: 10.885275494978478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When translating UI design prototypes to code in industry,
automatically generating code from design prototypes can expedite application
development and GUI iteration. However, in design prototypes without strict
design specifications, UI components may be composed of fragmented elements.
Grouping these fragmented elements can greatly improve the readability and
maintainability of the generated code. Current methods employ a two-stage
strategy that introduces hand-crafted rules to group fragmented elements.
Unfortunately, these methods perform poorly on visually overlapping and tiny
UI elements. In this study, we propose EGFE, a novel method for automatic
end-to-end grouping of fragmented elements via UI sequence prediction. To
facilitate UI understanding, we construct a Transformer encoder that models
the relationships between UI elements through multi-modal representation
learning. An evaluation on a dataset of 4606 UI prototypes collected from
professional UI designers shows that our method outperforms state-of-the-art
baselines in precision (by 29.75%), recall (by 31.07%), and F1-score (by
30.39%) at an edit distance threshold of 4. In addition, we conduct an
empirical study to assess the improvement of the generated front-end code.
The results demonstrate the effectiveness of our method in a real software
engineering application. Our end-to-end fragmented-element grouping method
creates opportunities for improving UI-related software engineering tasks.
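
To make the UI-sequence-prediction framing concrete, here is a minimal
sketch: each fragmented element becomes one token fusing visual, textual, and
positional features, a Transformer encoder models inter-element
relationships, and a per-element tag marks group boundaries. The feature
dimensions, the BIO-style tag set, and every class and layer name below are
illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class FragmentGrouper(nn.Module):
    """Hypothetical sketch: tag each UI element in a design sequence so that
    runs of B/I tags can be merged into one component (not the EGFE code)."""

    def __init__(self, d_model=256, n_tags=3):  # B(egin), I(nside), O(utside)
        super().__init__()
        self.img_proj = nn.Linear(2048, d_model)  # visual crop features
        self.txt_proj = nn.Linear(768, d_model)   # text embeddings
        self.box_proj = nn.Linear(4, d_model)     # normalized (x, y, w, h)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.tagger = nn.Linear(d_model, n_tags)  # per-element tag logits

    def forward(self, img_feats, txt_feats, boxes):
        # Fuse the three modalities into one token per UI element, then let
        # self-attention model relationships across the whole design sequence.
        x = self.img_proj(img_feats) + self.txt_proj(txt_feats) + self.box_proj(boxes)
        return self.tagger(self.encoder(x))  # (batch, seq_len, n_tags)

# Toy usage: 2 prototypes of 10 elements each; groups decode from B/I runs.
model = FragmentGrouper()
logits = model(torch.randn(2, 10, 2048), torch.randn(2, 10, 768), torch.rand(2, 10, 4))
tags = logits.argmax(-1)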
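
The reported precision, recall, and F1-score are computed at an edit distance
threshold of 4. The abstract does not spell out the matching protocol, so the
sketch below assumes one plausible reading: a predicted group's tag sequence
counts as a hit if it is within the threshold number of edits of a still
unmatched ground-truth group.

def edit_distance(a, b):
    # Levenshtein distance between two tag sequences (single-row DP).
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def metrics_at_threshold(pred_groups, true_groups, threshold=4):
    # Greedy one-to-one matching of predicted groups to ground-truth groups.
    unmatched, hits = list(true_groups), 0
    for p in pred_groups:
        for r in unmatched:
            if edit_distance(p, r) <= threshold:
                unmatched.remove(r)
                hits += 1
                break
    precision = hits / len(pred_groups) if pred_groups else 0.0
    recall = hits / len(true_groups) if true_groups else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1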
Related papers
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two modeling techniques to reduce the computation required to process multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and a local action taken.
We train on our newly constructed mobile app dataset, OpenApp, the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface [10.80156450091773]
Existing studies on UI element grouping mainly focus on a single UI-related software engineering task, and their groupings vary in appearance and function.
We propose semantic component groups that pack adjacent text and non-text elements with similar semantics.
To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD.
arXiv Detail & Related papers (2024-03-08T01:52:44Z)
- Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid the adversarial examples that arise when optimizing directly through a learned surrogate model (a sketch of this test-time optimization follows the related-papers list).
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
- UI Layers Group Detector: Grouping UI Layers via Text Fusion and Box Attention [7.614630088064978]
We propose a vision-based method that automatically detects image layers (i.e., basic shapes and visual elements) and text layers that present the same semantic meaning.
We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost the detection performance.
arXiv Detail & Related papers (2022-12-07T03:50:20Z)
- ReverseORC: Reverse Engineering of Resizable User Interface Layouts with OR-Constraints [47.164878414034234]
ReverseORC is a novel reverse engineering (RE) approach to discover diverse layout types and their dynamic resizing behaviours.
It can create specifications that replicate even some non-standard layout managers with complex dynamic layout behaviours.
It can be used to detect and fix problems in legacy UIs, extend UIs with enhanced layout behaviours, and support the creation of flexible UI layouts.
arXiv Detail & Related papers (2022-02-23T13:57:25Z)
- Creating User Interface Mock-ups from High-Level Text Descriptions with Deep-Learning Models [19.63933191791183]
We introduce three deep-learning techniques to create low-fidelity UI mock-ups from a natural language phrase.
We quantitatively and qualitatively compare and contrast each method's ability in suggesting coherent, diverse and relevant UI design mock-ups.
arXiv Detail & Related papers (2021-10-14T23:48:46Z)
- UIBert: Learning Generic Multimodal Representations for UI Understanding [12.931540149350633]
We introduce a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data.
Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other.
We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI (a sketch of one such alignment objective follows the list).
We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
arXiv Detail & Related papers (2021-07-29T03:51:36Z)
- Magic Layouts: Structural Prior for Component Detection in User Interface Designs [28.394160581239174]
We present Magic Layouts, a method for parsing screenshots or hand-drawn sketches of user interface (UI) layouts.
Our core contribution is to extend existing detectors to exploit a learned structural prior for UI designs.
We demonstrate our method within the context of an interactive application for rapidly acquiring digital prototypes of user experience (UX) designs.
arXiv Detail & Related papers (2021-06-14T17:20:36Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
This paper introduces VINS, a visual search framework that takes a UI image as input and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for UI detection and high performance in querying similar UI designs.
arXiv Detail & Related papers (2021-02-10T01:46:33Z)
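
For the Compositional Generative Inverse Design entry above, a minimal sketch
of test-time optimization over a learned energy function follows, under the
assumption that energy_fn is a trained scalar-valued model (the paper further
composes multiple learned energies, which this sketch could mimic by summing
energy_fn terms); the function name and hyperparameters are hypothetical.

import torch

def inverse_design(energy_fn, x_init, steps=200, lr=1e-2):
    # Descend the learned energy so the optimized design stays plausible
    # instead of exploiting a forward surrogate's blind spots.
    x = x_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        energy_fn(x).backward()  # low energy = plausible, high-scoring design
        optimizer.step()
    return x.detach()

# Toy usage with a quadratic stand-in energy whose minimum is at 1.0:
design = inverse_design(lambda x: ((x - 1.0) ** 2).sum(), torch.zeros(4))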
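
For the UIBert entry above, one way to read the self-alignment intuition is
as a contrastive objective in which the image and text features of the same
UI component score higher together than mismatched pairs. The sketch below is
an illustrative symmetric InfoNCE loss, not one of the paper's five actual
pretraining tasks.

import torch
import torch.nn.functional as F

def self_alignment_loss(img_feats, txt_feats, temperature=0.07):
    # img_feats, txt_feats: (n_components, d); row i describes one component.
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.T / temperature   # pairwise similarity scores
    targets = torch.arange(img.size(0))  # component i aligns with text i
    # Predict text from image and image from text, then average.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2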