Fugu-MT Paper Translation (Summary): MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Paper summary: MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

arxiv url: http://arxiv.org/abs/2604.15309v1
Date: Thu, 16 Apr 2026 17:59:49 GMT
Status: Translation complete
System update date: 2026-04-17 21:29:32.05081
Title: MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
Title (reference translation): MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
Authors: Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo
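Abstract summary: MM-WebAgent is a hierarchical agentic framework for multimodal webpage generation. It coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection.
Reference score (independently computed attention level): 99.19991374550729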
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
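Abstract (reference translation): The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, because elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments show that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code and data: https://aka.ms/mm-webagent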
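
Illustrative sketch (not the authors' implementation): the abstract describes hierarchical planning followed by iterative self-reflection, and the toy Python below shows one way such a loop could be organized. Every name here (`PagePlan`, `Section`, `Element`, `plan_page`, `generate_element`, `reflect`, `build_page`) and the style strings are hypothetical, and the generation step is a stub standing in for a real AIGC tool call. The first pass deliberately ignores the global style to mimic elements generated in isolation, so that the reflection loop has something to repair.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str        # e.g. "image", "video", "chart"
    prompt: str      # what the element should depict
    style: str = ""  # style used when the element was last generated


@dataclass
class Section:
    name: str
    elements: list = field(default_factory=list)


@dataclass
class PagePlan:
    style: str       # global style every element should follow
    sections: list   # list of Section objects


def plan_page(request: str) -> PagePlan:
    """Global planning stub: returns a fixed layout and one global style."""
    return PagePlan(
        style="minimal-blue",
        sections=[
            Section("hero", [Element("image", f"hero banner for {request}")]),
            Section("stats", [Element("chart", f"key figures for {request}")]),
        ],
    )


def generate_element(element: Element, style: str) -> None:
    """Local generation stub: a real system would call an AIGC tool here."""
    element.style = style


def reflect(plan: PagePlan) -> list:
    """Self-reflection stub: list elements whose style differs from the plan."""
    return [
        el for section in plan.sections for el in section.elements
        if el.style != plan.style
    ]


def build_page(request: str, max_rounds: int = 3) -> PagePlan:
    plan = plan_page(request)
    # First pass: each element is generated without the global style.
    for section in plan.sections:
        for el in section.elements:
            generate_element(el, style=f"default-{el.kind}")
    # Reflection rounds: regenerate only the elements flagged as inconsistent.
    for _ in range(max_rounds):
        issues = reflect(plan)
        if not issues:
            break
        for el in issues:
            generate_element(el, style=plan.style)
    return plan
```

Calling `build_page("a coffee shop")` returns a plan in which every element carries the plan's global style. The sketch only models style consistency; the layout and integration checks mentioned in the abstract would need their own reflection criteria.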

