DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models
- URL: http://arxiv.org/abs/2506.13663v1
- Date: Mon, 16 Jun 2025 16:20:43 GMT
- Title: DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models
- Authors: Yunnong Chen, Shixian Ding, YingYing Zhang, Wenkai Chen, Jinzhou Du, Lingyun Sun, Liuqing Chen,
- Abstract summary: DesignCoder is a novel hierarchical-aware and self-correcting automated code generation framework.<n>We introduce UI Grouping Chains, which enhance MLLMs' capability to understand and predict complex nested UI hierarchies.<n>We also incorporate a self-correction mechanism to improve the model's ability to identify and rectify errors in the generated code.
- Score: 17.348284143568282
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal large language models (MLLMs) have streamlined front-end interface development by automating code generation. However, these models also introduce challenges in ensuring code quality. Existing approaches struggle to maintain both visual consistency and functional completeness in the generated components. Moreover, they lack mechanisms to assess the fidelity and correctness of the rendered pages. To address these issues, we propose DesignCoder, a novel hierarchical-aware and self-correcting automated code generation framework. Specifically, we introduce UI Grouping Chains, which enhance MLLMs' capability to understand and predict complex nested UI hierarchies. Subsequently, DesignCoder employs a hierarchical divide-and-conquer approach to generate front-end code. Finally, we incorporate a self-correction mechanism to improve the model's ability to identify and rectify errors in the generated code. Extensive evaluations on a dataset of UI mockups collected from both open-source communities and industry projects demonstrate that DesignCoder outperforms state-of-the-art baselines in React Native, a widely adopted UI framework. Our method achieves a 37.63%, 9.52%, 12.82% performance increase in visual similarity metrics (MSE, CLIP, SSIM) and significantly improves code structure similarity in terms of TreeBLEU, Container Match, and Tree Edit Distance by 30.19%, 29.31%, 24.67%. Furthermore, we conducted a user study with professional developers to assess the quality and practicality of the generated code. Results indicate that DesignCoder aligns with industry best practices, demonstrating high usability, readability, and maintainability. Our approach provides an efficient and practical solution for agile front-end development, enabling development teams to focus more on core functionality and product innovation.
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition.<n>It shifts away from linear patching by generating multiple diverse implementation designs.<n>Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z) - VSA:Visual-Structural Alignment for UI-to-Code [29.15071743239679]
We propose bfVSA (VSA), a multi-stage paradigm designed to synthesize organized assets through visual-text alignment.<n>Our framework yields a substantial improvement in code modularity and architectural consistency over state-of-the-art benchmarks.
arXiv Detail & Related papers (2025-12-23T03:55:45Z) - PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models [4.847776029952831]
PSD2Code is a novel multi-modal approach that leverages PSD file parsing and asset alignment to generate production-ready React+SCSS code.<n>Our method introduces a ParseAlign pipeline that extracts hierarchical structures, layer properties, and metadata from PSD files.<n>The system employs a constraint-based alignment strategy that ensures consistency between generated elements and design resources.
arXiv Detail & Related papers (2025-11-06T03:20:27Z) - Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling [7.753074942497876]
We introduce CodeProjectEval, a project-level code generation dataset built from 18 real-world repositories with 12.7 files and 2,388.6 lines of code per task.<n>We propose ProjectGen, a multi-agent framework that decomposes projects into architecture design, skeleton generation, and code filling stages.<n>Experiments show that ProjectGen achieves state-of-the-art performance, passing 52/124 test cases on the small-scale project-level code generation dataset DevBench.
arXiv Detail & Related papers (2025-11-05T12:12:35Z) - JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence [48.39202336809688]
We introduce a complete synthesis toolkit to efficiently produce a large-scale, high-quality corpus spanning from standard charts to complex interactive web UIs and code-driven animations.<n>This powers the training of our models, JanusCoder and JanusCoderV, which establish a visual-programmatic interface for generating code from textual instructions, visual inputs, or a combination of both.<n>Our 7B to 14B scale models approaching or even exceeding the performance of commercial models.
arXiv Detail & Related papers (2025-10-27T17:13:49Z) - Code Aesthetics with Agentic Reward Feedback [84.67242022647002]
Large Language Models (LLMs) have become valuable assistants for developers in code-related tasks.<n>LLMs struggle with visually-oriented coding tasks, often producing suboptimal aesthetics.<n>We introduce a new pipeline to enhance the aesthetic quality of LLM-generated code.
arXiv Detail & Related papers (2025-10-27T12:32:33Z) - VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models [82.05514464090172]
Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding.<n>However, their ability to generate code from multimodal inputs remains limited.<n>We introduce VisCodex, a unified framework that seamlessly merges vision and coding language models.
arXiv Detail & Related papers (2025-08-13T17:00:44Z) - ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents [35.10813247827737]
We introduce a modular multi-agent framework that performs user interface-to-code generation in three interpretable stages.<n>The framework improves robustness, interpretability, and fidelity over end-to-end black-box methods.<n>Our approach achieves state-of-the-art performance in layout accuracy, structural coherence, and code correctness.
arXiv Detail & Related papers (2025-07-30T16:41:21Z) - Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation [72.44384066166147]
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains.<n>Existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures.<n>We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch.
arXiv Detail & Related papers (2025-07-24T09:17:41Z) - ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation [51.297873393639456]
ArtifactsBench is a framework for automated visual code generation evaluation.<n>Our framework renders each generated artifact and captures its dynamic behavior through temporal screenshots.<n>We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading Large Language Models.
arXiv Detail & Related papers (2025-07-07T12:53:00Z) - DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation [31.237236649603123]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in automated front-end engineering.<n>DesignBench is a benchmark for assessing MLLMs' capabilities in automated front-end engineering.
arXiv Detail & Related papers (2025-06-06T17:21:21Z) - UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding [84.87802580670579]
We introduce UniToken, an auto-regressive generation model that encodes visual inputs through a combination of discrete and continuous representations.<n>Our unified visual encoding framework captures both high-level semantics and low-level details, delivering multidimensional information.
arXiv Detail & Related papers (2025-04-06T09:20:49Z) - Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present emphHarmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder.<n>Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z) - Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.<n>However, improvement is plateauing due to the exhaustion of readily available high-quality data.<n>We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z) - EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
Existing methods for code generation use code snippets as seed data.<n>We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features.<n>Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z) - See-Saw Generative Mechanism for Scalable Recursive Code Generation with Generative AI [0.0]
This paper introduces the See-Saw generative mechanism, a novel methodology for dynamic and iterative code generation.
The proposed approach alternates between main code updates and dependency generation to ensure alignment and functionality.
The mechanism ensures that all code components are synchronized and functional, enabling scalable and efficient project generation.
arXiv Detail & Related papers (2024-11-16T18:54:56Z) - CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - Bridging Design and Development with Automated Declarative UI Code Generation [18.940075474582564]
Declarative UI frameworks have gained widespread adoption in mobile app development, offering benefits such as improved code readability and easier maintenance.
Recent advancements in multimodal large language models (MLLMs) have shown promise in directly generating mobile app code from user interface (UI) designs.
We propose DeclarUI, an automated approach that synergizes computer vision (CV), MLLMs, and iterative compiler-driven optimization to generate and refine declarative UI code from designs.
arXiv Detail & Related papers (2024-09-18T03:04:12Z) - ALaRM: Align Language Models via Hierarchical Rewards Modeling [41.79125107279527]
We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback.
The framework addresses the limitations of current alignment approaches, by integrating holistic rewards with aspect-specific rewards.
We validate our approach through applications in long-form question answering and machine translation tasks.
arXiv Detail & Related papers (2024-03-11T14:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.