Magic Layouts: Structural Prior for Component Detection in User
Interface Designs
- URL: http://arxiv.org/abs/2106.07615v1
- Date: Mon, 14 Jun 2021 17:20:36 GMT
- Title: Magic Layouts: Structural Prior for Component Detection in User
Interface Designs
- Authors: Dipu Manandhar, Hailin Jin, John Collomosse
- Abstract summary: We present Magic Layouts; a method for parsing screenshots or hand-drawn sketches of user interface (UI) layouts.
Our core contribution is to extend existing detectors to exploit a learned structural prior for UI designs.
We demonstrate this within the context of an interactive application for rapidly acquiring digital prototypes of user experience (UX) designs.
- Score: 28.394160581239174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Magic Layouts; a method for parsing screenshots or hand-drawn
sketches of user interface (UI) layouts. Our core contribution is to extend
existing detectors to exploit a learned structural prior for UI designs,
enabling robust detection of UI components: buttons, text boxes, and similar.
Specifically we learn a prior over mobile UI layouts, encoding common spatial
co-occurrence relationships between different UI components. Conditioning
region proposals using this prior leads to performance gains on UI layout
parsing for both hand-drawn UIs and app screenshots, which we demonstrate
within the context of an interactive application for rapidly acquiring digital
prototypes of user experience (UX) designs.
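
The abstract describes conditioning region proposals on a learned structural prior that encodes spatial co-occurrence between UI component classes. The sketch below is not the authors' implementation; it is a minimal illustration of how such a prior could re-weight a detector's per-class scores. The class set, the prior matrix, the "directly below" relation, and the blending rule are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the Magic Layouts method): re-score detector
# proposals using a hypothetical spatial co-occurrence prior over UI classes.
import numpy as np

CLASSES = ["button", "text_box", "image", "checkbox"]  # hypothetical label set

# Hypothetical prior: PRIOR_BELOW[i, j] = relative frequency with which class j
# appears directly below class i in training layouts (rows sum to 1).
PRIOR_BELOW = np.array([
    [0.10, 0.50, 0.20, 0.20],   # below a button
    [0.40, 0.20, 0.20, 0.20],   # below a text box
    [0.20, 0.40, 0.30, 0.10],   # below an image
    [0.30, 0.30, 0.20, 0.20],   # below a checkbox
])

def rescore(proposals, alpha=0.3):
    """Blend each proposal's class scores with the prior implied by the
    nearest proposal above it.

    proposals: list of dicts with 'box' = (x, y, w, h) and
               'scores' = per-class softmax scores (np.ndarray).
    alpha: weight of the structural prior (an assumed, untuned value).
    """
    # Sort proposals top-to-bottom so "the component above" is well defined.
    order = sorted(range(len(proposals)), key=lambda i: proposals[i]["box"][1])
    rescored = [p["scores"].copy() for p in proposals]
    for rank, i in enumerate(order[1:], start=1):
        j = order[rank - 1]                      # nearest proposal above
        ctx_class = int(np.argmax(rescored[j]))  # its current best label
        blended = (1 - alpha) * rescored[i] + alpha * PRIOR_BELOW[ctx_class]
        rescored[i] = blended / blended.sum()    # renormalize to a distribution
    return rescored

if __name__ == "__main__":
    props = [
        {"box": (10, 20, 100, 30), "scores": np.array([0.7, 0.1, 0.1, 0.1])},
        {"box": (10, 60, 100, 30), "scores": np.array([0.3, 0.3, 0.2, 0.2])},
    ]
    for scores in rescore(props):
        print(CLASSES[int(np.argmax(scores))], np.round(scores, 2))
```

In this toy fusion rule, an ambiguous second proposal is nudged toward the label most often observed below the component above it; the paper instead learns the prior from mobile UI layouts and applies it within the detector's proposal stage.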
Related papers
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface [10.80156450091773]
Existing studies on UI element grouping mainly focus on a single UI-related software engineering task, and their groups vary in appearance and function.
We propose our semantic component groups that pack adjacent text and non-text elements with similar semantics.
To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD.
arXiv Detail & Related papers (2024-03-08T01:52:44Z)
- EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning [10.885275494978478]
Grouping fragmented elements can greatly improve the readability and maintainability of the generated code.
Current methods employ a two-stage strategy that introduces hand-crafted rules to group fragmented elements.
We propose EGFE, a novel method for automatic End-to-end Grouping of Fragmented Elements via UI sequence prediction.
arXiv Detail & Related papers (2023-09-18T15:28:12Z)
- From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces [66.85108822706489]
This paper focuses on creating agents that interact with the digital world using the same conceptual interface that humans commonly use.
It is possible for such agents to outperform human crowdworkers on the MiniWoB++ benchmark of GUI-based instruction-following tasks.
arXiv Detail & Related papers (2023-05-31T23:39:18Z)
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging pre-defined elements within the spatial space of a given canvas.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
- UI Layers Group Detector: Grouping UI Layers via Text Fusion and Box Attention [7.614630088064978]
We propose a vision-based method that automatically detects images (i.e., basic shapes and visual elements) and text layers that present the same semantic meanings.
We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost the detection performance.
arXiv Detail & Related papers (2022-12-07T03:50:20Z)
- ReverseORC: Reverse Engineering of Resizable User Interface Layouts with OR-Constraints [47.164878414034234]
ReverseORC is a novel reverse engineering (RE) approach to discover diverse layout types and their dynamic resizing behaviours.
It can create specifications that replicate even some non-standard layout managers with complex dynamic layout behaviours.
It can be used to detect and fix problems in legacy UIs, extend UIs with enhanced layout behaviours, and support the creation of flexible UI layouts.
arXiv Detail & Related papers (2022-02-23T13:57:25Z)
- UIBert: Learning Generic Multimodal Representations for UI Understanding [12.931540149350633]
We introduce a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data.
Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other.
We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI.
We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
arXiv Detail & Related papers (2021-07-29T03:51:36Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
This paper introduces VINS, a visual search framework that takes a UI image as input and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for the UI detection and high performance in querying similar UI designs.
arXiv Detail & Related papers (2021-02-10T01:46:33Z)