WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
- URL: http://arxiv.org/abs/2407.15502v1
- Date: Mon, 22 Jul 2024 09:35:43 GMT
- Title: WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
- Authors: Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu, Jiajun Bu, Qi Zheng, Cong Yao
- Abstract summary: We introduce Web Rendering Parameters Generation (WebRPG), a new task that aims to automate the generation of the visual presentation of web pages from their HTML code.
We present baseline models that use a VAE to manage the numerous elements and rendering parameters, along with a custom HTML embedding that captures essential semantic and hierarchical information from HTML.
- Score: 24.99791278208309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of the content creation revolution propelled by advances in generative models, the field of web design remains unexplored despite its critical role in modern digital communication. The web design process is complex and often time-consuming, especially for those with limited expertise. In this paper, we introduce Web Rendering Parameters Generation (WebRPG), a new task that aims to automate the generation of the visual presentation of web pages from their HTML code. WebRPG would contribute to a faster web development workflow. Since no benchmark exists for this task, we develop a new dataset for WebRPG through an automated pipeline. Moreover, we present baseline models that use a VAE to manage the numerous elements and rendering parameters, along with a custom HTML embedding that captures essential semantic and hierarchical information from HTML. Extensive experiments, including quantitative evaluations customized for this task, are conducted to evaluate the quality of the generated results.
Related papers
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs [112.89665642941814]
Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio.
Current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code.
We propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning.
arXiv Detail & Related papers (2024-06-28T17:59:46Z)
- AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation [55.86438100985539]
We introduce a crawler generation task for vertical information web pages.
We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding.
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent [33.55199326570078]
AutoWebGLM is an automated web navigation agent built upon ChatGLM3-6B.
Inspired by human browsing patterns, we design an HTML simplification algorithm to represent webpages.
For testing, we establish a bilingual benchmark -- AutoWebBench -- for real-world web browsing tasks.
arXiv Detail & Related papers (2024-04-04T17:58:40Z)
- Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding [22.00873805952277]
WebLM is a multimodal pre-training network designed to address the limitations of solely modeling text and structure modalities of HTML in webpages.
We propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively.
Empirical results demonstrate that the pre-trained WebLM significantly surpasses previous state-of-the-art pre-trained models across several webpage understanding tasks.
arXiv Detail & Related papers (2024-02-28T11:50:36Z)
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks [93.85005277463802]
VisualWebArena is a benchmark designed to assess the performance of multimodal web agents on realistic tasks.
To perform on this benchmark, agents need to accurately process image-text inputs, interpret natural language instructions, and execute actions on websites to accomplish user-defined objectives.
arXiv Detail & Related papers (2024-01-24T18:35:21Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [81.5482196644596]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- WebFormer: The Web-page Transformer for Structure Information Extraction [44.46531405460861]
Structure information extraction refers to the task of extracting structured text fields from web pages.
Recent natural language models with sequence modeling have demonstrated state-of-the-art performance on web information extraction.
We introduce WebFormer, a Web-page transFormer model for structure information extraction from web documents.
arXiv Detail & Related papers (2022-02-01T04:44:02Z)
- The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models [51.39011092347136]
We introduce the Klarna Product Page dataset, a collection of webpages that surpasses existing datasets in richness and variety.
We empirically benchmark a range of Graph Neural Networks (GNNs) on the web element nomination task.
Second, we introduce a training refinement procedure that involves identifying a small number of relevant elements from each page.
Third, we introduce the Challenge Nomination Training Procedure, a novel training approach that further boosts nomination accuracy.
arXiv Detail & Related papers (2021-11-03T12:13:52Z)
- CoVA: Context-aware Visual Attention for Webpage Information Extraction [65.11609398029783]
We propose to reformulate WIE as a context-aware Webpage Object Detection task.
We develop a Context-aware Visual Attention-based (CoVA) detection pipeline which combines appearance features with syntactical structure from the DOM tree.
We show that the proposed CoVA approach is a new challenging baseline which improves upon prior state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T00:21:46Z)
- Boilerplate Removal using a Neural Sequence Labeling Model [4.056234173482691]
We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input.
This allows us to present a browser extension which highlights the content of arbitrary web pages directly within the browser using our model.
arXiv Detail & Related papers (2020-04-22T08:06:59Z)