Fugu-MT Paper Translation (Abstract): CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design

Paper overview: CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design

arxiv url: http://arxiv.org/abs/2511.20737v2
Date: Thu, 27 Nov 2025 06:30:58 GMT
Status: Translation complete
Last updated in system: 2025-12-01 13:46:31.800846
Title: CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design
Title (reference translation): CANVAS: A Vision-Language Model Benchmark for Tool-Based User Interface Design
Authors: Daeheon Jeong, Seoyeon Byun, Kihoon Son, Dae Hyun Kim, Juho Kim
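Abstract summary: This paper introduces CANVAS, a benchmark for VLMs on tool-based user interface design. The benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs. Results suggest that leading models exhibit more strategic tool invocations, improving design quality.
Reference score (attention score computed independently by the site): 20.69770605071827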
License: http://creativecommons.org/licenses/by/4.0/
Abstract: User interface (UI) design is an iterative process in which designers progressively refine their work with design software such as Figma or Sketch. Recent advances in vision language models (VLMs) with tool invocation suggest these models can operate design software to edit a UI design through iteration. Understanding and enhancing this capacity is important, as it highlights VLMs' potential to collaborate with designers within conventional software. However, as no existing benchmark evaluates tool-based design performance, the capacity remains unknown. To address this, we introduce CANVAS, a benchmark for VLMs on tool-based user interface design. Our benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs across 30 function-based categories (e.g., onboarding, messaging). In each task, a VLM updates the design step-by-step through context-based tool invocations (e.g., create a rectangle as a button background), linked to design software. Specifically, CANVAS incorporates two task types: (i) design replication evaluates the ability to reproduce a whole UI screen; (ii) design modification evaluates the ability to modify a specific part of an existing screen. Results suggest that leading models exhibit more strategic tool invocations, improving design quality. Furthermore, we identify common error patterns models exhibit, guiding future work in enhancing tool-based design capabilities.
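Abstract (reference translation): User interface (UI) design is an iterative process in which designers progressively refine their work in design software such as Figma or Sketch. Recent advances in vision language models (VLMs) with tool invocation suggest that these models can operate design software to edit a UI design through iteration. Understanding and strengthening this capacity is important because it highlights the potential for VLMs to collaborate with designers inside conventional software. However, because no existing benchmark evaluates tool-based design performance, this capacity remains unknown. To address this, we introduce CANVAS, a benchmark for VLMs on tool-based user interface design. The benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs across 30 function-based categories (e.g., onboarding, messaging). In each task, a VLM updates the design step by step through context-based tool invocations (e.g., creating a rectangle as a button background) linked to design software. Specifically, CANVAS includes two task types: (i) design replication evaluates the ability to reproduce a whole UI screen; (ii) design modification evaluates the ability to modify a specific part of an existing screen. Results suggest that leading models exhibit more strategic tool invocations, improving design quality. Furthermore, we identify common error patterns that models exhibit, to guide future work on improving tool-based design capabilities.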

Related papers