Fugu-MT Paper Translation (Summary): FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Paper Summary: FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

arxiv url: http://arxiv.org/abs/2505.17399v2
Date: Mon, 26 May 2025 11:15:36 GMT
Status: Translation complete
Last updated in system: 2025-05-27 19:27:26.837013
Title: FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
Authors: Haoyu Sun, Huichen Will Wang, Jiawei Gu, Linjie Li, Yu Cheng
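Abstract summary: FullFront is a benchmark designed to evaluate Multimodal Large Language Models (MLLMs). FullFront employs a novel two-stage process that transforms real-world webpages into clean, standardized HTML.
Reference score (attention score computed by this site): 27.208918000210797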
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Front-end engineering involves a complex workflow where engineers conceptualize designs, translate them into code, and iteratively refine the implementation. While recent benchmarks primarily focus on converting visual designs to code, we present FullFront, a benchmark designed to evaluate Multimodal Large Language Models (MLLMs) across the full front-end development pipeline. FullFront assesses three fundamental tasks that map directly to the front-end engineering pipeline: Webpage Design (conceptualization phase), Webpage Perception QA (comprehension of visual organization and elements), and Webpage Code Generation (implementation phase). Unlike existing benchmarks that use either scraped websites with bloated code or oversimplified LLM-generated HTML, FullFront employs a novel, two-stage process to transform real-world webpages into clean, standardized HTML while maintaining diverse visual designs and avoiding copyright issues. Extensive testing of state-of-the-art MLLMs reveals significant limitations in page perception, code generation (particularly for image handling and layout), and interaction implementation. Our results quantitatively demonstrate performance disparities across models and tasks, and highlight a substantial gap between current MLLM capabilities and human expert performance in front-end engineering. The FullFront benchmark and code are available at https://github.com/Mikivishy/FullFront.

Related Papers