WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs
- URL: http://arxiv.org/abs/2404.06369v2
- Date: Sat, 22 Feb 2025 05:15:18 GMT
- Title: WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs
- Authors: Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Bohua Chen, Dongping Chen, Siyuan Wu, Xing Zhou, Wenbin Jiang, Hai Jin, Xiangliang Zhang
- Abstract summary: This paper introduces WebCode2M, a new dataset comprising 2.56 million instances, each containing a design image along with the corresponding webpage code and layout details. To validate the effectiveness of WebCode2M, we introduce a baseline model based on the Vision Transformer (ViT), named WebCoder, and establish a benchmark for fair comparison. The benchmarking results demonstrate that our dataset significantly improves the ability of MLLMs to generate code from webpage designs.
- Score: 49.91550773480978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically generating webpage code from webpage designs can significantly reduce the workload of front-end developers, and recent Multimodal Large Language Models (MLLMs) have shown promising potential in this area. However, our investigation reveals that most existing MLLMs are constrained by the absence of high-quality, large-scale, real-world datasets, resulting in inadequate performance in automated webpage code generation. To fill this gap, this paper introduces WebCode2M, a new dataset comprising 2.56 million instances, each containing a design image along with the corresponding webpage code and layout details. Sourced from real-world web resources, WebCode2M offers a rich and valuable dataset for webpage code generation across a variety of applications. The dataset quality is ensured by a scoring model that filters out instances with aesthetic deficiencies or other incomplete elements. To validate the effectiveness of WebCode2M, we introduce a baseline model based on the Vision Transformer (ViT), named WebCoder, and establish a benchmark for fair comparison. Additionally, we introduce a new metric, TreeBLEU, to measure the structural hierarchy recall. The benchmarking results demonstrate that our dataset significantly improves the ability of MLLMs to generate code from webpage designs, confirming its effectiveness and usability for future applications in front-end design tools. Finally, we highlight several practical challenges introduced by our dataset, calling for further research. The code and dataset are publicly available at our project homepage: https://webcode2m.github.io.
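The TreeBLEU metric rewards recovering the reference page's DOM hierarchy. As a rough illustration of the idea, the sketch below computes a structural recall over 1-height subtrees (a parent tag plus its ordered child tags); the paper's exact matching rules may differ.

```python
# A minimal sketch of a TreeBLEU-style structural recall metric, assuming
# the unit of comparison is a 1-height subtree: (parent_tag, child_tags).
from collections import Counter
from bs4 import BeautifulSoup
from bs4.element import Tag

def one_height_subtrees(page_html: str) -> list[tuple]:
    """Collect (parent_tag, (child_tags, ...)) for every element with children."""
    soup = BeautifulSoup(page_html, "html.parser")
    subtrees = []
    for node in soup.find_all(True):
        children = tuple(c.name for c in node.children if isinstance(c, Tag))
        if children:
            subtrees.append((node.name, children))
    return subtrees

def tree_bleu(generated_html: str, reference_html: str) -> float:
    """Fraction of the reference's 1-height subtrees recovered in the output."""
    ref = Counter(one_height_subtrees(reference_html))
    gen = Counter(one_height_subtrees(generated_html))
    return sum((gen & ref).values()) / sum(ref.values()) if ref else 0.0

print(tree_bleu("<div><p></p><a></a></div>", "<div><p></p><a></a></div>"))  # 1.0
```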
Related papers
- MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs [50.274447094978996]
The Multi-Page Resource-Aware Webpage (MRWeb) generation task transforms UI designs into multi-page, functional web UIs with internal/external navigation, image loading, and backend routing.
Our study applies existing methods to the MRWeb problem using a newly curated dataset of 500 websites (300 synthetic, 200 real-world). Specifically, we identify the best metric to evaluate the similarity of the web UI, assess the impact of the resource list on MRWeb generation, analyze MLLM limitations, and evaluate the effectiveness of the MRWeb tool in real-world scenarios.
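To make the setting concrete, a resource list of the kind this task consumes might pair each design element with the asset or navigation target it must resolve to; the example below is purely illustrative and its field names are hypothetical.

```python
# A hypothetical resource list for multi-page, resource-aware generation:
# each entry ties a design element to an asset, an internal route, or an
# external URL that the generated code must wire up.
resource_list = [
    {"type": "image", "description": "hero banner", "src": "assets/hero.png"},
    {"type": "internal_link", "text": "Pricing", "target": "pricing.html"},
    {"type": "external_link", "text": "Docs", "target": "https://example.com/docs"},
]
```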
arXiv Detail & Related papers (2024-12-19T15:02:33Z)
- RedPajama: an Open Dataset for Training Large Language Models [80.74772646989423]
We identify three core data-related challenges that must be addressed to advance open-source language models.
These include (1) transparency in model development, including the data curation process, (2) access to large quantities of high-quality data, and (3) availability of artifacts and metadata for dataset curation and analysis.
We release RedPajama-V1, an open reproduction of the LLaMA training dataset, and RedPajama-V2, a massive web-only dataset consisting of raw, unfiltered text data together with quality signals and metadata.
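Because RedPajama-V2 pairs raw text with per-document quality signals, a downstream user can filter rather than re-crawl. A hedged sketch of that pattern, with illustrative signal names rather than the dataset's actual schema:

```python
# Filtering a RedPajama-V2-style JSONL corpus by its quality signals.
# The signal names here are illustrative; consult the dataset's
# documentation for the real schema.
import json

def keep(doc: dict) -> bool:
    sig = doc.get("quality_signals", {})
    return sig.get("word_count", 0) >= 50 and sig.get("non_alpha_fraction", 1.0) < 0.3

kept = []
with open("documents.jsonl") as f:
    for line in f:
        doc = json.loads(line)
        if keep(doc):
            kept.append(doc)
```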
arXiv Detail & Related papers (2024-11-19T09:35:28Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
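A schematic of that curation loop, where `call_mllm` is a hypothetical stand-in for whichever multimodal LLM endpoint is used:

```python
# Illustrative LLM-driven label curation: verify a noisy web label, then
# attach a rationale. Not the authors' implementation.
def call_mllm(image_path: str, prompt: str) -> str:
    raise NotImplementedError("plug in a multimodal LLM client here")

def curate(image_path: str, candidate_label: str) -> dict | None:
    verdict = call_mllm(image_path, f"Does this image depict '{candidate_label}'? Answer yes or no.")
    if not verdict.strip().lower().startswith("yes"):
        return None  # discard labels the model rejects
    rationale = call_mllm(image_path, f"Briefly explain why this image depicts '{candidate_label}'.")
    return {"image": image_path, "label": candidate_label, "rationale": rationale}
```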
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data [15.801018643716437]
This paper aims to enhance the GUI understanding and interacting capabilities of large vision-language models (LVLMs) through a data-driven approach.
We propose EDGE, a general data synthesis framework that automatically generates large-scale, multi-granularity training data from webpages across the Web.
Our approach significantly reduces the dependence on manual annotations, empowering researchers to harness the vast public resources available on the Web to advance their work.
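The general flavor of such webpage-driven synthesis can be sketched with an off-the-shelf renderer; the snippet below uses Playwright as one possible backend and is not the authors' pipeline:

```python
# Harvesting grounded GUI annotations (screenshot plus element boxes and
# text) from a live page, in the spirit of web-derived data synthesis.
from playwright.sync_api import sync_playwright

def harvest(url: str) -> list[dict]:
    samples = []
    with sync_playwright() as p:
        page = p.chromium.launch().new_page()
        page.goto(url)
        page.screenshot(path="page.png")
        for el in page.query_selector_all("a, button, input, select"):
            box = el.bounding_box()  # None for elements that are not rendered
            if box:
                samples.append({"bbox": box, "text": el.inner_text()})
    return samples
```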
arXiv Detail & Related papers (2024-10-25T10:46:17Z)
- Data Formulator 2: Iteratively Creating Rich Visualizations with AI [65.48447317310442]
We present Data Formulator 2, an LLM-powered visualization system to address these challenges.
With Data Formulator 2, users describe their visualization intent with blended UI and natural language inputs, and data transformations are delegated to AI.
To support iteration, Data Formulator 2 lets users navigate their iteration history and reuse previous designs towards new ones so that they don't need to start from scratch every time.
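As a toy example of what delegating transformations to AI produces, an intent like "average price per category" might be compiled into code such as the following (illustrative, not Data Formulator 2's actual output):

```python
# A transformation of the kind an AI backend might synthesize from a
# natural-language intent like "average price per category".
import pandas as pd

df = pd.DataFrame({"category": ["a", "a", "b"], "price": [3.0, 5.0, 4.0]})
result = df.groupby("category", as_index=False)["price"].mean()
print(result)  # one averaged row per category
```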
arXiv Detail & Related papers (2024-08-28T20:12:17Z)
- WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation [24.99791278208309]
We introduce Web Rendering Parameters Generation (WebRPG), a new task that aims to automate the generation of visual presentations for web pages based on their HTML code.
We present baseline models that utilize a VAE to manage the numerous elements and rendering parameters, along with a custom HTML embedding for capturing essential semantic and hierarchical information from HTML.
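For intuition, a rendering-parameter record of the kind such a model predicts per element might look as follows; the fields shown are hypothetical, not the paper's actual parameter set:

```python
# Illustrative per-element rendering parameters a WebRPG-style model
# might generate alongside the HTML structure.
params = {
    "element": "div#hero",
    "width": 960, "height": 320,           # box geometry in pixels
    "font_size": 24, "font_weight": 700,   # typography
    "color": (255, 255, 255),              # foreground RGB
    "background": (16, 24, 48),            # background RGB
}
```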
arXiv Detail & Related papers (2024-07-22T09:35:43Z)
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs [112.89665642941814]
Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio.
However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code.
We propose a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning.
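A webpage-to-code instruction-tuning instance in such a dataset plausibly pairs a screenshot with a generation instruction and target HTML; the schema below is hypothetical:

```python
# An illustrative instruction-tuning instance; the released dataset
# defines its own schema.
instance = {
    "image": "screenshots/000123.png",
    "instruction": "Generate the HTML code that reproduces this webpage screenshot.",
    "response": "<html><body><h1>Hello, world</h1></body></html>",
}
```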
arXiv Detail & Related papers (2024-06-28T17:59:46Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
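The structured-text representation might resemble the JSON below, expressed here as a Python literal; the field names are hypothetical:

```python
# An illustrative JSON-style layout of the kind an LLM-based layout
# generator could emit: a canvas plus typed, positioned elements.
layout = {
    "canvas": {"width": 1024, "height": 1448},
    "elements": [
        {"type": "title", "bbox": [64, 80, 896, 160]},    # [x, y, w, h]
        {"type": "image", "bbox": [64, 280, 896, 620]},
        {"type": "text",  "bbox": [64, 940, 896, 360]},
    ],
}
```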
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
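A toy rendition of the idea: an LLM proposes an extraction rule once from a sample page, and the resulting scraper is reused across similar pages. `ask_llm` is a hypothetical stand-in, and this is not the paper's two-stage algorithm:

```python
# LLM-generated scraping in miniature: derive an XPath from one sample
# page, then reuse it as a plain function on further pages.
from lxml import html

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def build_scraper(sample_page: str, field: str):
    xpath = ask_llm(f"Give one XPath that extracts the {field} from this page:\n{sample_page}")
    def scrape(page_source: str) -> list:
        return html.fromstring(page_source).xpath(xpath)
    return scrape
```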
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
- Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering [74.99736967448423]
We construct Design2Code, the first real-world benchmark for this task.
We manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics.
Our fine-grained break-down metrics indicate that models mostly lag in recalling visual elements from the input webpages and generating correct layout designs.
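One of the simpler fine-grained checks can be approximated as text-block recall: what fraction of the reference page's visible text survives in the generated page? The sketch below is a crude stand-in for the benchmark's actual metrics:

```python
# A crude text-recall metric: share of the reference page's visible text
# blocks that also appear in the generated page.
from bs4 import BeautifulSoup

def text_blocks(page_html: str) -> set[str]:
    soup = BeautifulSoup(page_html, "html.parser")
    return set(soup.stripped_strings)

def text_recall(generated_html: str, reference_html: str) -> float:
    ref = text_blocks(reference_html)
    return len(ref & text_blocks(generated_html)) / len(ref) if ref else 0.0
```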
arXiv Detail & Related papers (2024-03-05T17:56:27Z)
- Sketch2FullStack: Generating Skeleton Code of Full Stack Website and Application from Sketch using Deep Learning and Computer Vision [2.422788410602121]
Designing a large website and then converting it to code typically requires a team of experienced developers. Automating this process would save valuable resources and speed up the overall development cycle.
arXiv Detail & Related papers (2022-11-26T16:32:13Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)