LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
- URL: http://arxiv.org/abs/2508.03560v1
- Date: Tue, 05 Aug 2025 15:28:48 GMT
- Title: LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
- Authors: Yi Gui, Zhen Li, Zhongyi Zhang, Guohao Wang, Tianpeng Lv, Gaoyang Jiang, Yi Liu, Dongping Chen, Yao Wan, Hongyu Zhang, Wenbin Jiang, Xuanhua Shi, Hai Jin
- Abstract summary: We propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output.
- Score: 27.815304610123754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
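To make the divide-generate-assemble pipeline concrete, here is a minimal Python sketch of the steps the abstract describes. It is an illustrative reconstruction, not the authors' released code: `call_mllm`, the `Block` fields, and the prompt wording are all assumptions, and the MLLM-based assembly and dynamic-selection steps are only noted in comments.

```python
# Illustrative sketch of the LaTCoder pipeline from the abstract:
# (1) divide the design into blocks, (2) generate code per block with
# CoT prompting, (3) assemble via absolute positioning. The MLLM-based
# assembly and the dynamic selection between the two outputs are omitted.
from dataclasses import dataclass

@dataclass
class Block:
    x: int        # left offset of the block in the full design (px)
    y: int        # top offset (px)
    w: int        # block width (px)
    h: int        # block height (px)
    image: bytes  # cropped screenshot of this block

def call_mllm(prompt: str, image: bytes) -> str:
    """Placeholder for an MLLM call (e.g., GPT-4o or DeepSeek-VL2)."""
    raise NotImplementedError

def generate_block_code(block: Block) -> str:
    # CoT-style prompt: describe the layout first, then emit code.
    prompt = (
        "Think step by step: first describe the visual layout of this "
        "UI region, then write self-contained HTML/CSS reproducing it."
    )
    return call_mllm(prompt, block.image)

def assemble_absolute(blocks: list[Block]) -> str:
    # Absolute-positioning assembly: pin each block's generated code at
    # its original coordinates so the global layout is preserved.
    divs = [
        f'<div style="position:absolute; left:{b.x}px; top:{b.y}px; '
        f'width:{b.w}px; height:{b.h}px;">{generate_block_code(b)}</div>'
        for b in blocks
    ]
    return "<body>\n" + "\n".join(divs) + "\n</body>"
```

Per the abstract, dynamic selection would then compare this absolutely positioned page against the MLLM-assembled alternative and keep whichever output better matches the design.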
Related papers
- MLLM-Based UI2Code Automation Guided by UI Layout Information [17.177322441575196]
We propose a novel MLLM-based framework that generates UI code from real-world webpage images and comprises three key modules. For evaluation, we build Snap2Code, a new benchmark dataset of 350 real-world websites.
arXiv Detail & Related papers (2025-06-12T06:04:16Z)
- UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs [43.006316221657904]
This paper proposes a novel approach to automating the synthesis of User Interfaces (UIs) via hierarchical code generation from webpage designs. The core idea of UICopilot is to decompose the generation process into two stages: first generating the coarse-grained HTML structure, followed by the generation of fine-grained code. Experimental results demonstrate that UICopilot significantly outperforms existing baselines in both automatic and human evaluations.
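The coarse-to-fine decomposition can be pictured as two chained prompts. The sketch below is a hedged illustration of the idea only; the prompts and the stubbed `call_mllm` are assumptions, not UICopilot's actual interface.

```python
# Illustrative two-stage generation in the spirit of UICopilot's
# coarse-to-fine decomposition; prompts and the stub are hypothetical.
def call_mllm(prompt: str, image: bytes) -> str:
    """Placeholder for an MLLM API call; swap in a real client."""
    raise NotImplementedError

def generate_ui(design_image: bytes) -> str:
    # Stage 1: coarse-grained HTML structure (container hierarchy only).
    skeleton = call_mllm(
        "Emit only the top-level HTML container hierarchy for this design, "
        "with placeholder comments instead of content.",
        design_image,
    )
    # Stage 2: fine-grained code, conditioned on the skeleton.
    return call_mllm(
        "Fill in this HTML skeleton with complete, styled markup that "
        "matches the design:\n" + skeleton,
        design_image,
    )
```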
arXiv Detail & Related papers (2024-11-05T17:40:03Z)
- Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping [57.024913536420264]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task. We present the first systematic investigation of MLLMs in generating interactive webpages.
arXiv Detail & Related papers (2024-06-24T07:58:36Z)
- Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach [51.522121376987634]
We propose DCGen, a divide-and-conquer-based approach to automate the translation of webpage designs to UI code. We show that DCGen achieves up to a 15% improvement in visual similarity and 8% in code similarity for large input images. Human evaluations show that DCGen helps developers implement webpages significantly faster and with greater similarity to the UI designs.
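The divide-and-conquer loop might look roughly like the following; `split_image` and `call_mllm` are hypothetical stand-ins, and DCGen's actual separator detection and merging differ.

```python
# Rough sketch of a divide-and-conquer screenshot-to-code loop in the
# spirit of DCGen; both helpers below are hypothetical placeholders.
def split_image(image: bytes) -> list[bytes]:
    """Placeholder: split a screenshot along visual separators."""
    raise NotImplementedError

def call_mllm(prompt: str, image: bytes) -> str:
    """Placeholder for an MLLM API call."""
    raise NotImplementedError

def translate(image: bytes, depth: int = 0, max_depth: int = 2) -> str:
    parts = split_image(image)
    if depth >= max_depth or len(parts) <= 1:
        # Leaf region: small enough to translate directly.
        return call_mllm("Write HTML/CSS reproducing this region.", image)
    # Translate each sub-region, then ask the model to merge the
    # fragments back into the parent region's code.
    fragments = [translate(p, depth + 1, max_depth) for p in parts]
    return call_mllm(
        "Combine these HTML fragments into one page matching this design:\n"
        + "\n".join(fragments),
        image,
    )
```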
arXiv Detail & Related papers (2024-05-08T11:32:50Z)
- Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes [13.005027924553012]
We introduce Prototype2Code, which achieves end-to-end front-end code generation aligned with business demands.
For Prototype2Code, we incorporate design linting into the workflow, addressing the detection of fragmented elements and perceptual groups.
By optimizing the hierarchical structure and intelligently recognizing UI element types, Prototype2Code generates code that is more readable and structurally clearer.
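As a toy illustration of the linting idea, one such check could flag fragmented elements by bounding-box proximity; the data model and the 4-pixel tolerance below are assumptions, not Prototype2Code's actual rules.

```python
# Toy lint check: flag two prototype elements as fragments of one
# perceptual group when their boxes, expanded by `gap` pixels, overlap.
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    x: int
    y: int
    w: int
    h: int

def are_fragments(a: Element, b: Element, gap: int = 4) -> bool:
    # Boxes overlap (with tolerance) iff neither lies strictly beyond
    # the other on either axis.
    return not (
        a.x + a.w + gap < b.x or b.x + b.w + gap < a.x or
        a.y + a.h + gap < b.y or b.y + b.h + gap < a.y
    )
```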
arXiv Detail & Related papers (2024-04-09T15:05:48Z)
- WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs [49.91550773480978]
This paper introduces WebCode2M, a new dataset comprising 2.56 million instances, each containing a design image along with the corresponding webpage code and layout details. To validate the effectiveness of WebCode2M, we introduce a baseline model based on the Vision Transformer (ViT), named WebCoder, and establish a benchmark for fair comparison. The benchmarking results demonstrate that our dataset significantly improves the ability of MLLMs to generate code from webpage designs.
arXiv Detail & Related papers (2024-03-05T17:56:27Z)
- Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering [74.99736967448423]
We construct Design2Code, the first real-world benchmark for this task. We manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics. Our fine-grained breakdown metrics indicate that models mostly lag in recalling visual elements from the input webpages and generating correct layout designs.
arXiv Detail & Related papers (2023-10-06T15:05:05Z)
- Coding by Design: GPT-4 empowers Agile Model Driven Development [0.03683202928838613]
This research offers an Agile Model-Driven Development (MDD) approach that enhances code auto-generation using OpenAI's GPT-4.
Our work emphasizes "Agility" as a significant contribution to the current MDD method, particularly when the model undergoes changes or needs deployment in a different programming language.
Ultimately, leveraging GPT-4, our last layer auto-generates code in both Java and Python.
arXiv Detail & Related papers (2023-09-18T06:35:10Z)
- LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models [84.16541551923221]
We propose a model that treats layout generation as a code generation task to enhance semantic information.
We develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules.
We attain significant state-of-the-art performance on multiple datasets.
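The layout-as-code idea can be pictured as serializing each layout element into an HTML tag whose unknown attributes become mask tokens for the model to complete. The template and the `<M>` mask token below are assumptions about the general idea, not the paper's exact format.

```python
# Illustrative layout-as-code serialization in the spirit of LayoutNUWA:
# known boxes are written out as styled tags; missing attributes become
# mask tokens ("<M>") for a language model to fill in.
def element_to_code(category: str,
                    bbox: tuple[int, int, int, int] | None) -> str:
    if bbox is None:
        # Masked element: the model must predict all four coordinates.
        return (f'<div class="{category}" style="left:<M>px; top:<M>px; '
                f'width:<M>px; height:<M>px;"></div>')
    x, y, w, h = bbox
    return (f'<div class="{category}" style="left:{x}px; top:{y}px; '
            f'width:{w}px; height:{h}px;"></div>')

def layout_to_code(elements) -> str:
    # `elements` is an iterable of (category, bbox-or-None) pairs.
    body = "\n".join(element_to_code(c, b) for c, b in elements)
    return f"<html>\n<body>\n{body}\n</body>\n</html>"
```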
arXiv Detail & Related papers (2023-09-18T06:35:10Z)
- MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC 2012.
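For context, a student network in such a setup is typically trained with a soft-target distillation loss like the PyTorch sketch below. The label generator and its meta-learning update are omitted; the temperature and weighting are illustrative defaults, not the paper's settings.

```python
# Minimal soft-target distillation loss: KL divergence against the
# generator's (temperature-scaled) soft targets plus cross-entropy
# against the ground-truth labels.
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 soft_targets: torch.Tensor,
                 hard_labels: torch.Tensor,
                 T: float = 4.0,
                 alpha: float = 0.5) -> torch.Tensor:
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(soft_targets / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients, as is standard in distillation
    ce = F.cross_entropy(student_logits, hard_labels)
    return alpha * kd + (1 - alpha) * ce
```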
arXiv Detail & Related papers (2020-08-27T13:04:27Z)