CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design
- URL: http://arxiv.org/abs/2511.20737v2
- Date: Thu, 27 Nov 2025 06:30:58 GMT
- Title: CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design
- Authors: Daeheon Jeong, Seoyeon Byun, Kihoon Son, Dae Hyun Kim, Juho Kim
- Abstract summary: We introduce CANVAS, a benchmark for VLMs on tool-based user interface design. Our benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs. Results suggest that leading models exhibit more strategic tool invocations, improving design quality.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: User interface (UI) design is an iterative process in which designers progressively refine their work with design software such as Figma or Sketch. Recent advances in vision-language models (VLMs) with tool invocation suggest that these models can operate design software to edit a UI design iteratively. Understanding and enhancing this capacity is important, as it highlights VLMs' potential to collaborate with designers within conventional software. However, as no existing benchmark evaluates tool-based design performance, this capacity remains unknown. To address this, we introduce CANVAS, a benchmark for VLMs on tool-based user interface design. Our benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs across 30 function-based categories (e.g., onboarding, messaging). In each task, a VLM updates the design step by step through context-based tool invocations (e.g., create a rectangle as a button background) linked to design software. Specifically, CANVAS incorporates two task types: (i) design replication evaluates the ability to reproduce a whole UI screen; (ii) design modification evaluates the ability to modify a specific part of an existing screen. Results suggest that leading models make more strategic tool invocations, improving design quality. Furthermore, we identify common error patterns the models exhibit, guiding future work on enhancing tool-based design capabilities.
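To make the evaluation setup concrete, the sketch below shows one way such a step-by-step tool-invocation loop could look, with the model emitting one JSON tool call per turn against a mutable canvas. The tool names (create_rectangle, set_text, done) and the call format are illustrative assumptions, not CANVAS's actual tool schema or software bridge.

```python
# Minimal sketch of a tool-based design loop; the tool set and the
# model interface are hypothetical, not CANVAS's real ones.
import json
from dataclasses import dataclass, field

@dataclass
class Canvas:
    elements: list = field(default_factory=list)

    def create_rectangle(self, x, y, w, h, fill):
        # Hypothetical tool: add a rectangle (e.g., a button background).
        self.elements.append({"type": "rect", "x": x, "y": y,
                              "w": w, "h": h, "fill": fill})

    def set_text(self, index, text):
        # Hypothetical tool: attach text to an existing element.
        self.elements[index]["text"] = text

TOOLS = {"create_rectangle": Canvas.create_rectangle,
         "set_text": Canvas.set_text}

def run_task(model, canvas, task_prompt, max_steps=20):
    """Ask the model for one tool call per step until it signals completion."""
    for _ in range(max_steps):
        # The model sees the task and the current design state each turn.
        reply = model(task_prompt, state=canvas.elements)  # returns a JSON string
        call = json.loads(reply)
        if call.get("tool") == "done":  # model declares the design finished
            break
        TOOLS[call["tool"]](canvas, **call["args"])
    return canvas
```

Under this framing, a design-replication task would start from an empty canvas and compare the final state against the reference screen, while a design-modification task would start from an existing state and score only the requested change.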
Related papers
- Computer-Use Agents as Judges for Generative User Interface [142.75272102498806]
Computer-Use Agents (CUAs) are becoming increasingly capable of autonomously operating digital environments through Graphical User Interfaces (GUIs). Most GUIs remain designed primarily for humans, forcing agents to adopt human-oriented behaviors that are unnecessary for efficient task execution. This raises a fundamental question: can CUAs act as judges that assist a Coder in automatic GUI design?
arXiv Detail & Related papers (2025-11-19T16:00:02Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models [24.54628448382394]
DesignRepair is a novel dual-stream, design-guideline-aware system that examines and repairs design quality issues from both the code and the rendered page. Our evaluations validated the efficacy and utility of our approach, demonstrating significant improvements in adherence to design guidelines, accessibility, and user experience metrics.
arXiv Detail & Related papers (2024-11-03T15:25:47Z) - Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping [55.98643055756135]
We introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes.
We analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs.
A user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception.
arXiv Detail & Related papers (2024-10-21T17:39:49Z) - Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources.
We introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts.
Our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, achieving a 12% higher mIoU on Crello.
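Since the entry reports layout quality as mIoU, here is a minimal sketch of mean IoU over paired layout boxes given as (x, y, w, h); the actual Crello evaluation protocol (element matching, coordinate normalization) may differ, and the one-to-one pairing assumed below is a simplification.

```python
# Sketch of mean IoU over paired layout boxes; protocol details are assumed.
def iou(a, b):
    # Convert (x, y, w, h) to corner coordinates.
    ax0, ay0, ax1, ay1 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx0, by0, bx1, by1 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix = max(0.0, min(ax1, bx1) - max(ax0, bx0))  # overlap width
    iy = max(0.0, min(ay1, by1) - max(ay0, by0))  # overlap height
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def mean_iou(pred_boxes, ref_boxes):
    # Assumes a one-to-one correspondence between predicted and reference elements.
    assert len(pred_boxes) == len(ref_boxes)
    return sum(iou(p, r) for p, r in zip(pred_boxes, ref_boxes)) / len(ref_boxes)
```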
arXiv Detail & Related papers (2024-04-23T17:58:33Z) - From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design [5.268919870502001]
This paper presents a comprehensive evaluation of vision-language models (VLMs) across a spectrum of engineering design tasks. Specifically, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems.
arXiv Detail & Related papers (2023-11-21T15:20:48Z) - How Can Large Language Models Help Humans in Design and Manufacturing? [28.28959612862582]
Large Language Models (LLMs), including GPT-4, provide exciting new opportunities for generative design.
We scrutinize the utility of LLMs in tasks such as: converting a text-based prompt into a design specification, transforming a design into manufacturing instructions, producing a design space and design variations, computing the performance of a design, and searching for designs predicated on performance.
By exposing these limitations, we aspire to catalyze the continued improvement and progression of these models.
arXiv Detail & Related papers (2023-07-25T17:30:38Z) - PLay: Parametrically Conditioned Layout Generation using Latent Diffusion [18.130461065261354]
We build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines.
Our method outperforms prior works across three datasets on metrics including FID and FD-VG, and in a user study.
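For intuition on what conditioned latent generation involves, the sketch below shows a generic DDPM-style ancestral sampling loop in which a denoiser is conditioned on guideline embeddings; PLay's actual latent space, architecture, and guidance scheme are not reproduced here, and `denoiser` is a hypothetical model.

```python
# Generic conditional diffusion sampling sketch, not PLay's implementation.
import torch

@torch.no_grad()
def sample(denoiser, guidelines, steps=1000, latent_dim=64):
    """denoiser(z_t, t, cond) is assumed to predict the noise component."""
    betas = torch.linspace(1e-4, 0.02, steps)       # linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    z = torch.randn(guidelines.shape[0], latent_dim)  # start from pure noise
    for t in reversed(range(steps)):
        eps_hat = denoiser(z, torch.full((z.shape[0],), t), guidelines)
        # Posterior mean of z_{t-1} given the predicted noise.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (z - coef * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise
    return z  # decode z into vector-graphic layout parameters downstream
```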
arXiv Detail & Related papers (2023-01-27T04:22:27Z) - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z) - Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-model-based Conceptual Design of Pedestrian Bridges [52.77024349608834]
This paper provides a performance-driven design exploration framework to augment the human designer through a Conditional Variational Autoencoder (CVAE). The CVAE is trained on 18,000 synthetically generated instances of a pedestrian bridge in Switzerland.
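As a reminder of the mechanism involved, here is a compact CVAE sketch in which both encoder and decoder condition on a performance vector c; the bridge parameterization, network sizes, and loss weighting are illustrative assumptions rather than the cited framework's design.

```python
# Compact CVAE sketch with performance conditioning; all sizes are illustrative.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim, c_dim, z_dim=8, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x, c):
        # Encode the design x together with its performance attributes c.
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        x_hat = self.dec(torch.cat([z, c], dim=-1))  # decode under the same c
        return x_hat, mu, logvar

def loss(x, x_hat, mu, logvar):
    rec = ((x - x_hat) ** 2).sum(dim=-1).mean()  # reconstruction term
    kld = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=-1)).mean()
    return rec + kld
```

At exploration time, a designer would fix c to target performance values and sample z from a standard normal to obtain candidate designs.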
arXiv Detail & Related papers (2022-11-29T17:28:31Z) - Material Prediction for Design Automation Using Graph Representation Learning [5.181429907321226]
We introduce a graph representation learning framework that supports the material prediction of bodies in assemblies.
We formulate the material selection task as a node-level prediction task over the assembly graph representation of CAD models and tackle it using Graph Neural Networks (GNNs).
The proposed framework can scale to large datasets and incorporate designers' knowledge into the learning process.
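To ground the node-level formulation, the sketch below runs one round of mean-aggregation message passing over an assembly graph and scores a material class per body; the layer design and weight shapes are illustrative assumptions, not the paper's exact GNN.

```python
# Sketch of material prediction as node classification over an assembly graph.
import numpy as np

def message_passing(node_feats, adj, w_self, w_neigh):
    """One GNN layer: combine each body's features with its neighbors' mean."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)   # avoid divide-by-zero
    neigh = adj @ node_feats / deg                        # mean over adjacent bodies
    return np.maximum(node_feats @ w_self + neigh @ w_neigh, 0)  # ReLU

def predict_materials(node_feats, adj, w_self, w_neigh, w_out):
    h = message_passing(node_feats, adj, w_self, w_neigh)
    logits = h @ w_out            # one score per candidate material class
    return logits.argmax(axis=1)  # predicted material label per body
```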
arXiv Detail & Related papers (2022-09-26T15:49:35Z)