UI Layers Group Detector: Grouping UI Layers via Text Fusion and Box
Attention
- URL: http://arxiv.org/abs/2212.03440v1
- Date: Wed, 7 Dec 2022 03:50:20 GMT
- Title: UI Layers Group Detector: Grouping UI Layers via Text Fusion and Box
Attention
- Authors: Shuhong Xiao, Tingting Zhou, Yunnong Chen, Dengming Zhang, Liuqing
Chen, Lingyun Sun, Shiyu Yue
- Abstract summary: We propose a vision-based method that automatically detects image layers (i.e., basic shapes and visual elements) and text layers that present the same semantic meaning.
We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost the detection performance.
- Score: 7.614630088064978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graphical User Interfaces (GUIs) are in great demand with the popularization and prosperity of mobile apps. Automatic UI code generation from a UI design draft dramatically simplifies the development process. However, the nested layer structure in the design draft affects the quality and usability of the generated code. Few existing GUI automation techniques detect and group nested layers to improve the accessibility of the generated code. In this paper, we propose the UI Layers Group Detector, a vision-based method that automatically detects image layers (i.e., basic shapes and visual elements) and text layers that present the same semantic meaning. We propose two plug-in components, text fusion and box attention, that utilize text information from design drafts as prior information for group localization. We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost detection performance. Experiments show that the proposed method achieves decent accuracy on layer grouping.
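
The abstract only names the box attention component without detailing it, so the following is a minimal sketch, under the assumption that text-layer bounding boxes are rasterized into a mask and used to gate detector features; all module names, shapes, and parameters here are illustrative and not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's exact method) of using text-layer
# bounding boxes from a design draft as prior information for group
# localization: boxes are rasterized into a binary mask and fused with
# backbone image features through an attention-style gating module.
import torch
import torch.nn as nn


def boxes_to_mask(boxes, height, width):
    """Rasterize normalized [x1, y1, x2, y2] text boxes into a binary mask."""
    mask = torch.zeros(1, 1, height, width)
    for x1, y1, x2, y2 in boxes:
        mask[..., int(y1 * height):int(y2 * height),
                  int(x1 * width):int(x2 * width)] = 1.0
    return mask


class BoxAttentionFusion(nn.Module):
    """Gate image features with a text-box mask (hypothetical box attention)."""

    def __init__(self, channels: int):
        super().__init__()
        self.mask_proj = nn.Conv2d(1, channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor, box_mask: torch.Tensor) -> torch.Tensor:
        # Resize the box mask to the feature resolution and project it to feature channels.
        mask = nn.functional.interpolate(box_mask, size=feats.shape[-2:], mode="nearest")
        mask_feats = self.mask_proj(mask)
        # Attention-style gating: features are re-weighted by the text-box prior.
        attn = self.gate(torch.cat([feats, mask_feats], dim=1))
        return feats * attn + feats


if __name__ == "__main__":
    feats = torch.randn(1, 256, 64, 64)                    # backbone feature map
    boxes = [(0.1, 0.1, 0.4, 0.2), (0.5, 0.6, 0.9, 0.7)]   # normalized text boxes
    mask = boxes_to_mask(boxes, 512, 512)
    fused = BoxAttentionFusion(256)(feats, mask)
    print(fused.shape)  # torch.Size([1, 256, 64, 64])
```

The residual connection keeps the original features intact, so the box prior can only modulate, not replace, the visual evidence; this is one plausible design choice, not a claim about the authors' architecture.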
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface [10.80156450091773]
Existing studies on UI elements grouping mainly focus on a single UI-related software engineering task, and their groups vary in appearance and function.
We propose our semantic component groups that pack adjacent text and non-text elements with similar semantics.
To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD.
arXiv Detail & Related papers (2024-03-08T01:52:44Z)
- EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning [10.885275494978478]
Grouping fragmented elements can greatly improve the readability and maintainability of the generated code.
Current methods employ a two-stage strategy that introduces hand-crafted rules to group fragmented elements.
We propose EGFE, a novel method for automatically End-to-end Grouping Fragmented Elements via UI sequence prediction.
arXiv Detail & Related papers (2023-09-18T15:28:12Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Adaptively Clustering Neighbor Elements for Image-Text Generation [78.82346492527425]
We propose a novel Transformer-based image-to-text generation model termed ACF.
ACF adaptively clusters vision patches into object regions and language words into phrases to implicitly learn object-phrase alignments.
Experiment results demonstrate the effectiveness of ACF, which outperforms most SOTA captioning and VQA models.
arXiv Detail & Related papers (2023-01-05T08:37:36Z)
- ULDGNN: A Fragmented UI Layer Detector Based on Graph Neural Networks [7.614630088064978]
Fragmented layers can degrade the quality of the generated code if they are all involved in code generation without being merged into a whole component.
In this paper, we propose a pipeline to merge fragmented layers automatically.
Our approach can retrieve most fragmented layers in UI design drafts, and achieve 87% accuracy in the detection task.
arXiv Detail & Related papers (2022-08-13T14:14:37Z)
- UI Layers Merger: Merging UI layers via Visual Learning and Boundary Prior [7.251022347055101]
Fragmented layers inevitably appear in UI design drafts, which greatly reduces the quality of code generation.
We propose UI Layers Merger (UILM), a vision-based method, which can automatically detect and merge fragmented layers into UI components.
arXiv Detail & Related papers (2022-06-18T16:09:28Z)
- GroupViT: Semantic Segmentation Emerges from Text Supervision [82.02467579704091]
Grouping and recognition are important components of visual scene understanding.
We propose a hierarchical Grouping Vision Transformer (GroupViT).
GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner.
arXiv Detail & Related papers (2022-02-22T18:56:04Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
This paper introduces VINS, a visual search framework, that takes as input a UI image and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for the UI detection and high performance in querying similar UI designs.
arXiv Detail & Related papers (2021-02-10T01:46:33Z)