Multimodal graph representation learning for website generation based on visual sketch
- URL: http://arxiv.org/abs/2504.18729v1
- Date: Fri, 25 Apr 2025 22:48:10 GMT
- Title: Multimodal graph representation learning for website generation based on visual sketch
- Authors: Tung D. Vu, Chung Hoang, Truong-Son Hy
- Abstract summary: The Design2Code problem involves converting digital designs into functional source code. Traditional approaches often struggle to accurately interpret the intricate visual details and structural relationships inherent in webpage designs. We propose a novel method that leverages multimodal graph representation learning to address these challenges.
- Score: 1.515687944002438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. Traditional approaches often struggle with accurately interpreting the intricate visual details and structural relationships inherent in webpage designs, leading to limitations in automation and efficiency. In this paper, we propose a novel method that leverages multimodal graph representation learning to address these challenges. By integrating both visual and structural information from design sketches, our approach enhances the accuracy and efficiency of code generation, particularly in producing semantically correct and structurally sound HTML code. Extensive evaluation demonstrates significant improvements of multimodal graph learning over existing techniques in both accuracy and efficiency, highlighting the potential of our method to revolutionize design-to-code automation. Code available at https://github.com/HySonLab/Design2Code
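As a rough, hedged illustration of the idea (not the authors' implementation; see the linked repository for that), the sketch below builds a toy layout graph whose nodes carry visual features for detected UI elements and whose edges encode structural relations such as containment, then applies one round of message passing. All class and variable names are assumptions.

```python
# Illustrative sketch of a multimodal layout-graph encoder.
# Everything here is an assumption, not the paper's actual model.
import torch
import torch.nn as nn

class LayoutGraphEncoder(nn.Module):
    """One round of mean-aggregation message passing over a layout graph.

    Nodes are detected UI elements whose features x come from a visual
    backbone (e.g., crops of the sketch); edge_index holds structural
    relations such as containment or adjacency.
    """
    def __init__(self, dim_in: int, dim_hidden: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_hidden)
        self.update = nn.Linear(2 * dim_hidden, dim_hidden)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.proj(x))                      # (N, H) node states
        src, dst = edge_index                             # (2, E) directed edges
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, h[src])                    # sum incoming messages
        deg = torch.bincount(dst, minlength=h.size(0)).clamp(min=1)
        agg = agg / deg.unsqueeze(-1)                     # mean over neighbors
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))

# Toy usage: 4 UI elements with 16-d visual features, 3 containment edges.
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 0, 1], [1, 2, 3]])         # parent -> child
print(LayoutGraphEncoder(16, 32)(x, edge_index).shape)    # torch.Size([4, 32])
```

A decoder over these node embeddings would then emit the HTML tree; that step is omitted here.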
Related papers
- Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning [16.22363384653305]
We introduce Chart2Code, a novel iterative dual preference learning framework for chart-to-code generation.
We find that Chart2Code consistently improves out-of-distribution chart-to-code generation quality.
Our framework paves the way for future advancements in chart comprehension (a generic preference-loss sketch follows this entry).
arXiv Detail & Related papers (2025-04-03T07:51:20Z)
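The Chart2Code entry above centers on preference learning, but its exact objective is not given here. For orientation, the sketch below implements the standard direct preference optimization (DPO) loss over sequence log-probabilities; the "dual" and "iterative" aspects of the framework are omitted, and all tensor values are toys.

```python
# Standard DPO preference loss over preferred (w) vs. rejected (l)
# samples; a reference point only, not Chart2Code's exact objective.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """logp_* are summed token log-probs of each sequence under the
    policy; ref_logp_* are the same under the frozen reference model."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# Toy batch of 3 preference pairs.
logp_w = torch.tensor([-10.0, -12.0, -9.0])
logp_l = torch.tensor([-11.0, -11.5, -13.0])
ref_w = torch.tensor([-10.5, -12.0, -9.5])
ref_l = torch.tensor([-10.8, -11.0, -12.5])
print(dpo_loss(logp_w, logp_l, ref_w, ref_l))  # scalar training loss
```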
- Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization [56.17811386955609]
Graph-structured challenges are inherently difficult due to their nonlinear and intricate nature.
In this study, we propose transforming graphs into images to preserve their higher-order structural features accurately.
By combining this image-based paradigm, powered by multimodal large language models, with simple search techniques, we aim to develop a novel and effective framework (a minimal rendering sketch follows this entry).
arXiv Detail & Related papers (2025-01-21T08:28:10Z)
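A minimal sketch of the graphs-as-images step described in the entry above: render a combinatorial instance to a raster that a multimodal LLM can ingest. The layout and rendering choices are assumptions, not the paper's pipeline.

```python
# Hedged sketch: rasterize a graph so a multimodal LLM can "see" its
# structure. Layout, size, and colors are illustrative choices.
import networkx as nx
import matplotlib
matplotlib.use("Agg")                      # headless rendering
import matplotlib.pyplot as plt

G = nx.petersen_graph()                    # toy combinatorial instance
pos = nx.spring_layout(G, seed=0)          # deterministic 2D layout
nx.draw(G, pos, with_labels=True, node_color="lightgray")
plt.savefig("graph.png", dpi=150)          # image to attach to an MLLM prompt
plt.close()
```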
- Design-o-meter: Towards Evaluating and Refining Graphic Designs [11.416650723712968]
We introduce Design-o-meter, a data-driven methodology to quantify the goodness of graphic designs.
To the best of our knowledge, Design-o-meter is the first approach that scores and refines designs in a unified framework.
arXiv Detail & Related papers (2024-11-22T14:17:46Z)
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We develop an automated text-to-poster system that generates editable posters based on users' design intentions (an illustrative layout schema follows this entry).
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
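An illustrative example of the structured-text (JSON) layout representation the PosterLLaVa entry above mentions; the field names and schema are assumptions, not the paper's actual format.

```python
# Hypothetical JSON-serializable layout a layout-generation model could
# be instruction-tuned to emit. Schema and field names are assumptions.
import json

layout = {
    "canvas": {"width": 600, "height": 800},
    "elements": [
        {"type": "title", "bbox": [40, 40, 560, 120], "text": "Summer Sale"},
        {"type": "image", "bbox": [40, 160, 560, 560]},
        {"type": "button", "bbox": [200, 600, 400, 660], "text": "Shop now"},
    ],
}
print(json.dumps(layout, indent=2))   # the target string for generation
```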
- CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning [61.21923643289266]
Chain of Manipulations is a mechanism that enables Vision-Language Models to solve problems step-by-step with evidence.
After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) actively without involving external tools.
Our trained model, CogCoM, achieves state-of-the-art performance across 9 benchmarks from 4 categories.
arXiv Detail & Related papers (2024-02-06T18:43:48Z)
- Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples.
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that composing the learned diffusion model at test time allows our method to design initial states and boundary shapes (a toy energy-minimization sketch follows this entry).
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
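A toy sketch of design-by-energy-minimization in the spirit of the entry above: gradient-descend a design vector on an energy function. The quadratic energy stands in for the learned, composed diffusion-model energy and is purely illustrative.

```python
# Hedged sketch: optimize design variables against a learned energy.
import torch

def energy(z: torch.Tensor) -> torch.Tensor:
    # Stand-in for a learned energy E(z); a real system would use the
    # composed diffusion-model energies the entry above describes.
    target = torch.tensor([1.0, -2.0, 0.5])
    return ((z - target) ** 2).sum()

z = torch.zeros(3, requires_grad=True)       # initial design variables
opt = torch.optim.Adam([z], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    e = energy(z)
    e.backward()                             # dE/dz via autograd
    opt.step()
print(z.detach())                            # approaches the low-energy design
```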
- A Comprehensive End-to-End Computer Vision Framework for Restoration and Recognition of Low-Quality Engineering Drawings [19.375278164300987]
This paper focuses on restoring and recognizing low-quality engineering drawings.
An end-to-end framework is proposed to improve the quality of the drawings and identify the graphical symbols on them.
Experiments on real-world electrical diagrams show that the proposed framework achieves an accuracy of 98.98% and a recall of 99.33%.
arXiv Detail & Related papers (2023-12-21T07:22:25Z)
- HAT-GAE: Self-Supervised Graph Auto-encoders with Hierarchical Adaptive Masking and Trainable Corruption [0.76146285961466]
We propose a novel auto-encoder model for graph representation learning.
Our model incorporates a hierarchical adaptive masking mechanism that incrementally increases the difficulty of training (see the masking-schedule sketch after this entry).
We demonstrate the superiority of our proposed method over state-of-the-art graph representation learning models.
arXiv Detail & Related papers (2023-01-28T02:43:54Z)
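A minimal sketch of the incremental-masking idea from the HAT-GAE entry above, assuming a linear schedule over epochs; the schedule, ratios, and zero mask token are illustrative choices, not the paper's.

```python
# Curriculum-style node masking: mask more of the graph as training
# progresses, raising reconstruction difficulty. All values are toys.
import torch

def mask_nodes(x: torch.Tensor, epoch: int, max_epoch: int,
               r_min: float = 0.1, r_max: float = 0.7):
    """Mask a growing fraction of node features as training progresses."""
    ratio = r_min + (r_max - r_min) * epoch / max(max_epoch - 1, 1)
    n_mask = max(1, int(ratio * x.size(0)))
    idx = torch.randperm(x.size(0))[:n_mask]
    x_masked = x.clone()
    x_masked[idx] = 0.0              # zero stands in for a learned mask token
    return x_masked, idx

# The masked fraction ramps from 10% to 70% across 50 epochs.
x = torch.randn(100, 32)
for epoch in (0, 25, 49):
    _, idx = mask_nodes(x, epoch, max_epoch=50)
    print(epoch, len(idx))           # 10, then ~40, then 70 masked nodes
```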
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn insecure patterns from code corpora.
Because parsed code naturally admits a graph structure, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program (a minimal graph-construction sketch follows this entry).
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
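The entry above builds a GNN over graphs obtained by parsing code. As a hedged illustration of the graph-construction step only (not the paper's disaggregated representation or its GNN), the sketch below extracts parent-child edges from a Python AST; an encoder like the one sketched near the top of this page could then consume the result.

```python
# Turn source code into (node_types, edge_list) via its AST.
# Python's ast module stands in for the paper's parser; node features
# here are just type names, kept deliberately simple.
import ast

def code_to_graph(source: str):
    """Return node type names and parent->child edge index pairs."""
    tree = ast.parse(source)
    nodes, edges, index = [], [], {}
    for node in ast.walk(tree):                 # first pass: number nodes
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    for node in ast.walk(tree):                 # second pass: collect edges
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

nodes, edges = code_to_graph("def f(x):\n    return x * 2\n")
print(nodes)   # e.g. ['Module', 'FunctionDef', 'arguments', ...]
print(edges)   # parent -> child index pairs, ready for a GNN encoder
```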