Fugu-MT Paper Translation (Abstract): Figma2Code: Automating Multimodal Design to Code in the Wild

Paper overview: Figma2Code: Automating Multimodal Design to Code in the Wild

arxiv url: http://arxiv.org/abs/2604.13648v1
Date: Wed, 15 Apr 2026 09:17:09 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.465031
Title: Figma2Code: Automating Multimodal Design to Code in the Wild
Authors: Yi Gui, Jiawan Zhang, Yina Wang, Tianran Ma, Yao Wan, Shilin He, Dongping Chen, Zhou Zhao, Wenbin Jiang, Xuanhua Shi, Hai Jin, Philip S Yu
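Abstract summary: We introduce Figma2Code, a new task that advances design-to-code into a multimodal setting. We collect paired design images and their corresponding metadata files from the Figma community. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases.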
Reference score (independently computed attention measure): 85.29510079067464
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready User Interface (UI) code remains tedious and costly. While recent work has explored automating this process with Multimodal Large Language Models (MLLMs), existing approaches typically rely solely on design images. As a result, they must infer complex UI details from images alone, often leading to degraded results. In real-world development workflows, however, design mockups are usually delivered as Figma files, a widely used tool for front-end design, that embed rich multimodal information (e.g., metadata and assets) essential for generating high-quality UI. To bridge this gap, we introduce Figma2Code, a new task that advances design-to-code into a multimodal setting and aims to automate design-to-code in the wild. Specifically, we collect paired design images and their corresponding metadata files from the Figma community. We then apply a series of processing operations, including rule-based filtering, human- and MLLM-based annotation and screening, and metadata refinement. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases. Using this dataset, we benchmark ten state-of-the-art open-source and proprietary MLLMs. Our results show that while proprietary models achieve superior visual fidelity, they remain limited in layout responsiveness and code maintainability. Further experiments across modalities and ablation studies corroborate this limitation, partly due to models' tendency to directly map primitive visual attributes from Figma metadata.
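To make the abstract's closing observation concrete, the following minimal sketch shows what "directly mapping primitive visual attributes" from metadata could look like: each node's pixel coordinates and fill color are copied straight into absolutely positioned CSS. This is an illustration only, not the paper's pipeline or any benchmarked model's output. The node shape (`absoluteBoundingBox`, `fills`) is assumed to resemble fields exposed by the Figma REST API, and `node_to_css` and the sample data are hypothetical.

```python
def node_to_css(node, parent_box=None):
    """Naively translate one Figma-style node into absolutely positioned CSS.

    Assumption: `node` is a dict with an `absoluteBoundingBox` entry and an
    optional `fills` list, loosely modeled on Figma file metadata.
    """
    box = node["absoluteBoundingBox"]
    # Coordinates are made relative to the parent frame, if one is given.
    origin_x = parent_box["x"] if parent_box else 0
    origin_y = parent_box["y"] if parent_box else 0

    rules = {
        "position": "absolute",
        "left": f"{box['x'] - origin_x:g}px",
        "top": f"{box['y'] - origin_y:g}px",
        "width": f"{box['width']:g}px",
        "height": f"{box['height']:g}px",
    }

    # Copy the first solid fill, converting 0..1 channels to 0..255.
    fills = node.get("fills", [])
    if fills and fills[0].get("type") == "SOLID":
        color = fills[0]["color"]
        r, g, b = (round(color[channel] * 255) for channel in "rgb")
        rules["background-color"] = f"rgb({r}, {g}, {b})"

    return "; ".join(f"{prop}: {value}" for prop, value in rules.items())


# Hypothetical sample data: a button inside a frame.
frame_box = {"x": 100, "y": 0, "width": 400, "height": 300}
button = {
    "absoluteBoundingBox": {"x": 120, "y": 40, "width": 200, "height": 48},
    "fills": [{"type": "SOLID", "color": {"r": 1.0, "g": 0.0, "b": 0.0}}],
}
button_css = node_to_css(button, frame_box)
```

Output of this kind can reproduce a design pixel for pixel, yet the hard-coded offsets and sizes do not adapt to other viewport widths, which is consistent with the gap the abstract reports between visual fidelity and layout responsiveness or maintainability.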


Related papers