MLLM-Based UI2Code Automation Guided by UI Layout Information
- URL: http://arxiv.org/abs/2506.10376v1
- Date: Thu, 12 Jun 2025 06:04:16 GMT
- Title: MLLM-Based UI2Code Automation Guided by UI Layout Information
- Authors: Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao, et al.
- Abstract summary: We propose a novel MLLM-based framework that generates UI code from real-world webpage images and includes three key modules. For evaluation, we build Snap2Code, a new benchmark dataset covering 350 real-world websites.
- Score: 17.177322441575196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Converting user interfaces into code (UI2Code) is a crucial yet time-consuming and labor-intensive step in website development. Automating UI2Code is essential to streamline this task and improve development efficiency. Deep learning-based methods exist for the task; however, they rely heavily on large amounts of labeled training data and struggle to generalize to real-world, unseen webpage designs. The advent of Multimodal Large Language Models (MLLMs) offers the potential to alleviate this issue, but MLLMs struggle to comprehend the complex layouts in UIs and to generate accurate code with the layout preserved. To address these issues, we propose LayoutCoder, a novel MLLM-based framework that generates UI code from real-world webpage images and comprises three key modules: (1) Element Relation Construction, which captures the UI layout by identifying and grouping components with similar structures; (2) UI Layout Parsing, which generates UI layout trees to guide the subsequent code generation process; and (3) Layout-Guided Code Fusion, which produces accurate code with the layout preserved. For evaluation, besides the popular Design2Code dataset, we build Snap2Code, a new benchmark covering 350 real-world websites and divided into seen and unseen parts to mitigate data leakage. Extensive evaluation shows that LayoutCoder outperforms state-of-the-art approaches: compared with the best-performing baseline, it improves the BLEU score by 10.14% and the CLIP score by 3.95% on average across all datasets.
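The abstract describes the layout-tree idea only at a high level. As a rough illustration of how a parsed layout tree might guide an MLLM's code generation, here is a minimal Python sketch; the names (`LayoutNode`, `layout_to_outline`, `build_prompt`), the node tags, and the prompt wording are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class LayoutNode:
    """A node in a UI layout tree: a region of the screenshot plus its children."""
    tag: str                          # hypothetical labels, e.g. "row", "column", "leaf"
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in screenshot pixels
    children: list["LayoutNode"] = field(default_factory=list)

def layout_to_outline(node: LayoutNode, depth: int = 0) -> str:
    """Serialize the layout tree into an indented outline that can be embedded
    in the prompt to constrain the structure of the generated code."""
    lines = ["  " * depth + f"- {node.tag} {node.bbox}"]
    for child in node.children:
        lines.append(layout_to_outline(child, depth + 1))
    return "\n".join(lines)

def build_prompt(screenshot_path: str, layout: LayoutNode) -> str:
    """Fuse the parsed layout with the code-generation instruction."""
    return (
        f"Generate HTML/CSS reproducing the screenshot at {screenshot_path}. "
        "Follow this layout tree exactly:\n" + layout_to_outline(layout)
    )

# Example: a header row above a sidebar and a main-content column.
page = LayoutNode("column", (0, 0, 1280, 2000), [
    LayoutNode("row", (0, 0, 1280, 120)),           # header
    LayoutNode("row", (0, 120, 1280, 1880), [       # body
        LayoutNode("leaf", (0, 120, 320, 1880)),    # sidebar
        LayoutNode("leaf", (320, 120, 960, 1880)),  # main content
    ]),
])
print(build_prompt("snapshot.png", page))
```

Serializing the tree into an explicit outline is one plausible way to make the layout constraint legible to the model; the paper's Layout-Guided Code Fusion module may well use a different representation.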
Related papers
- LaTCoder: Converting Webpage Design to Code with Layout-as-Thought [27.815304610123754]
We propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies, absolute positioning and an MLLM-based method, followed by dynamic selection to determine the optimal output.
arXiv Detail & Related papers (2025-08-05T15:28:48Z) - DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models [17.348284143568282]
DesignCoder is a novel hierarchy-aware and self-correcting automated code generation framework. We introduce UI Grouping Chains, which enhance MLLMs' capability to understand and predict complex nested UI hierarchies. We also incorporate a self-correction mechanism to improve the model's ability to identify and rectify errors in the generated code.
arXiv Detail & Related papers (2025-06-16T16:20:43Z) - Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [86.15347226865826]
We design a new end-to-end object-aware lifting approach, named Unified-Lift. We augment each Gaussian point with an additional Gaussian-level feature learned using a contrastive loss to encode instance information. We conduct experiments on three benchmarks: LERF-Masked, Replica, and Messy Rooms.
arXiv Detail & Related papers (2025-03-18T08:42:23Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost of processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach [51.522121376987634]
We propose DCGen, a divide-and-conquer-based approach that automates the translation of webpage designs to UI code (a minimal sketch of the recursive division idea appears after this list). We show that DCGen achieves up to a 15% improvement in visual similarity and 8% in code similarity for large input images. Human evaluations show that DCGen can help developers implement webpages significantly faster and with results more similar to the UI designs.
arXiv Detail & Related papers (2024-06-24T07:58:36Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs [49.91550773480978]
This paper introduces WebCode2M, a new dataset comprising 2.56 million instances, each containing a design image along with the corresponding webpage code and layout details. To validate the effectiveness of WebCode2M, we introduce a baseline model based on the Vision Transformer (ViT), named WebCoder, and establish a benchmark for fair comparison. The benchmarking results demonstrate that our dataset significantly improves the ability of MLLMs to generate code from webpage designs.
arXiv Detail & Related papers (2024-04-09T15:05:48Z) - LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models [84.16541551923221]
We propose a model that treats layout generation as a code generation task to enhance semantic information.
We develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules.
We attain significant state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2023-09-18T06:35:10Z) - Sketch2FullStack: Generating Skeleton Code of Full Stack Website and Application from Sketch using Deep Learning and Computer Vision [2.422788410602121]
Designing a large website and converting it to code typically requires a team of experienced developers. Automating this process would save valuable resources and speed up the overall development process.
arXiv Detail & Related papers (2022-11-26T16:32:13Z)
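Both LaTCoder and DCGen above rely on dividing a screenshot into blocks that are translated independently and then reassembled. Below is a minimal sketch of the division step only, assuming Pillow for image handling and a naive midline split; the real systems segment along detected visual boundaries rather than midlines, and the `max_side` threshold is an arbitrary assumption.

```python
from PIL import Image

def divide(image: Image.Image, max_side: int = 512) -> list[Image.Image]:
    """Recursively halve a screenshot along its longer axis until every
    block fits within max_side pixels, so each block can be translated
    to code independently before the fragments are reassembled."""
    w, h = image.size
    if max(w, h) <= max_side:
        return [image]
    if w >= h:
        # Split into left and right halves; crop takes (left, upper, right, lower).
        halves = [image.crop((0, 0, w // 2, h)), image.crop((w // 2, 0, w, h))]
    else:
        # Split into top and bottom halves.
        halves = [image.crop((0, 0, w, h // 2)), image.crop((0, h // 2, w, h))]
    return [block for half in halves for block in divide(half, max_side)]
```

Each returned block would then be prompted separately, and the generated fragments assembled, for example via absolute positioning as in LaTCoder, or stitched back along the original split lines as in DCGen's divide-and-conquer scheme.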