UIClip: A Data-driven Model for Assessing User Interface Design
- URL: http://arxiv.org/abs/2404.12500v1
- Date: Thu, 18 Apr 2024 20:43:08 GMT
- Title: UIClip: A Data-driven Model for Assessing User Interface Design
- Authors: Jason Wu, Yi-Hao Peng, Amanda Li, Amanda Swearngin, Jeffrey P. Bigham, Jeffrey Nichols
- Abstract summary: We develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a user interface.
We show how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality.
- Score: 20.66914084220734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In our paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and natural language description. To train UIClip, we used a combination of automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on the dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baselines to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with ground-truth rankings. Finally, we present three example applications that demonstrate how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.
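The scoring interface implied by the abstract is CLIP-like: embed the screenshot and a textual description, then compare them. Below is a minimal sketch of that kind of query, assuming a generic CLIP checkpoint and prompt wording as stand-ins; the released UIClip weights and prompt format may differ.

```python
# Minimal sketch of CLIP-style UI scoring: embed a screenshot and candidate
# descriptions, then treat image-text similarity as a relevance/quality score.
# The checkpoint name and prompt wording are assumptions, not the released
# UIClip artifacts.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")  # stand-in backbone
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

screenshot = Image.open("login_screen.png")  # hypothetical UI screenshot
descriptions = [
    "ui screenshot. well-designed. a login screen for a banking app",
    "ui screenshot. poorly designed. a login screen for a banking app",
]

inputs = processor(text=descriptions, images=screenshot, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)

# Comparing the two prompts yields a rough "good vs. bad design" preference.
print(f"P(well-designed) = {probs[0].item():.3f}, P(poorly designed) = {probs[1].item():.3f}")
```

Swapping the stand-in checkpoint for a UI-tuned one and extending the prompt list would yield per-property scores in the same way.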
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, named ShowUI, which features several innovations.
ShowUI, a lightweight 2B-parameter model trained on 256K data samples, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
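Zero-shot screenshot grounding is usually scored by whether the predicted click point falls inside the target element's bounding box. A model-agnostic sketch of that metric follows; the `query_model` callable and the "[x, y]" answer format are assumptions, not ShowUI's actual interface.

```python
# Sketch of a grounding-accuracy check: a prediction counts as correct when the
# predicted click point lands inside the ground-truth element bounding box.
# `query_model` and the "[x, y]" answer format are hypothetical stand-ins.
import re

def parse_point(answer: str):
    """Pull the first '[x, y]' coordinate pair out of a model response."""
    match = re.search(r"\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\]", answer)
    return (float(match.group(1)), float(match.group(2))) if match else None

def grounding_accuracy(samples, query_model) -> float:
    hits = 0
    for screenshot, instruction, (x1, y1, x2, y2) in samples:
        answer = query_model(screenshot, f"Return the click point for: {instruction}")
        point = parse_point(answer)
        if point and x1 <= point[0] <= x2 and y1 <= point[1] <= y2:
            hits += 1
    return hits / len(samples)
```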
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool [21.639682821138663]
We introduce an innovative system, AutoGameUI, for efficiently constructing cohesive user interfaces in game development.
We propose a two-stage multimodal learning pipeline to obtain comprehensive representations of both UI and UX designs and to establish their correspondences.
Through these correspondences, a cohesive user interface is automatically constructed from pairwise designs.
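One plausible way to turn such correspondences into an assembled interface is bipartite matching over element embeddings. The sketch below shows that idea; the encoders and the matching criterion are assumptions, not AutoGameUI's actual pipeline.

```python
# Sketch of establishing correspondences between UI and UX design elements by
# maximizing embedding similarity with bipartite matching. Encoders and the
# matching criterion are assumptions, not the paper's actual method.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_ui_to_ux(ui_embs: np.ndarray, ux_embs: np.ndarray):
    # ui_embs: (N, D) element embeddings from the UI design
    # ux_embs: (M, D) element embeddings from the UX (wireframe) design
    ui = ui_embs / np.linalg.norm(ui_embs, axis=1, keepdims=True)
    ux = ux_embs / np.linalg.norm(ux_embs, axis=1, keepdims=True)
    sim = ui @ ux.T                                   # cosine similarity matrix
    rows, cols = linear_sum_assignment(-sim)          # maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))    # (ui_idx, ux_idx) pairs
```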
arXiv Detail & Related papers (2024-11-06T07:16:54Z)
- UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset [10.427243347670965]
We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs.
We apply this dataset to achieve a 55% performance gain in LLM-generated UI feedback.
We discuss future applications of this dataset, including training a reward model for generative UI techniques.
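One simple way such a critique dataset can lift LLM-generated UI feedback is few-shot prompting: retrieve a handful of human critiques and prepend them as examples. A rough sketch under that assumption; the field names and the `llm` callable are hypothetical.

```python
# Sketch of few-shot prompting with human critiques: prepend a few rated
# critique examples so the LLM imitates their specificity. Field names and
# the `llm` callable are hypothetical.
def build_feedback_prompt(target_description: str, critique_examples: list) -> str:
    shots = []
    for ex in critique_examples[:3]:  # a few high-quality retrieved examples
        shots.append(
            f"UI: {ex['ui_description']}\n"
            f"Rating: {ex['quality_rating']}/10\n"
            f"Critique: {ex['critique']}"
        )
    return (
        "You are a UI design reviewer. Follow the style of these examples.\n\n"
        + "\n\n".join(shots)
        + f"\n\nUI: {target_description}\nCritique:"
    )

# feedback = llm(build_feedback_prompt(description, retrieved_examples))
```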
arXiv Detail & Related papers (2024-07-11T20:18:19Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and a local action taken on it.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
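The objective can be read as conditional captioning: encode the current screen and the action, then train a decoder to describe the resulting screen. A schematic training step under that reading; `image_encoder`, `action_text_encoder`, and `caption_decoder` are placeholder modules, not the paper's released components.

```python
# Schematic of a "foresight" captioning step: condition a text decoder on the
# current screen and the action taken, and supervise it with the description
# of the *next* screen. All modules and batch keys are placeholders.
import torch
import torch.nn.functional as F

def foresight_step(image_encoder, action_text_encoder, caption_decoder, batch, optimizer):
    screen_feats = image_encoder(batch["current_screenshot"])    # (B, N, D) patch features
    action_feats = action_text_encoder(batch["action_text"])     # (B, M, D), e.g. "tap 'Add to cart'"
    context = torch.cat([screen_feats, action_feats], dim=1)     # conditioning sequence

    # Teacher-forced prediction of the next screen's global description.
    target_ids = batch["next_screen_description_ids"]            # (B, T) token ids
    logits = caption_decoder(target_ids[:, :-1], context)        # (B, T-1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           target_ids[:, 1:].reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```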
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- I-Design: Personalized LLM Interior Designer [57.00412237555167]
I-Design is a personalized interior designer that allows users to generate and visualize their design goals through natural language communication.
I-Design starts with a team of large language model agents that engage in dialogues and logical reasoning with one another.
The final design is then constructed in 3D by retrieving and integrating assets from an existing object database.
arXiv Detail & Related papers (2024-04-03T16:17:53Z)
- ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations [13.939350184164017]
Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language.
We adapt a recipe for generating paired text-image training data for VLMs to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM).
We generate a dataset of 335K conversational examples paired with UIs that cover Q&A, UI descriptions, and planning, and use it to fine-tune a conversational VLM for UI tasks.
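A rough sketch of that kind of pipeline: run a pixel-based detector to get an element list, then prompt an LLM to turn the structured description into conversational Q&A. `detect_elements` and `call_llm` are hypothetical stand-ins, not the paper's tooling.

```python
# Sketch of LLM-driven UI conversation generation: convert detector output into
# a textual scene description, then prompt an LLM for Q&A about the screen.
# `detect_elements` and `call_llm` are hypothetical stand-ins.
import json

def generate_ui_conversations(screenshot_path: str, detect_elements, call_llm):
    elements = detect_elements(screenshot_path)  # e.g. [{"type": "button", "text": "Sign in", "bbox": [...]}]
    scene = "\n".join(
        f"- {e['type']}: '{e.get('text', '')}' at {e['bbox']}" for e in elements
    )
    prompt = (
        "Here is a list of elements detected on a mobile UI screen:\n"
        f"{scene}\n\n"
        "Write 3 question-answer pairs a user might ask about this screen, "
        'as a JSON list of {"question": ..., "answer": ...} objects.'
    )
    return json.loads(call_llm(prompt))
```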
arXiv Detail & Related papers (2023-10-07T16:32:34Z)
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence, which usually incorporates external knowledge when recognizing novel categories of objects.
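In soft prompt tuning, learnable context vectors are optimized in place of hand-written prompt words; a knowledge-aware variant can additionally attach embeddings of external knowledge text per class. The sketch below illustrates that general idea, not KAPT's exact architecture.

```python
# Sketch of knowledge-aware prompt tuning: learnable context vectors are
# prepended to each class name's token embeddings, and a per-class external-
# knowledge embedding is appended. Illustrative only, not KAPT's exact design.
import torch
import torch.nn as nn

class KnowledgePrompt(nn.Module):
    def __init__(self, num_ctx: int, embed_dim: int):
        super().__init__()
        # Learnable "soft prompt" context, shared across classes.
        self.ctx = nn.Parameter(torch.randn(num_ctx, embed_dim) * 0.02)

    def forward(self, class_token_embs: torch.Tensor, knowledge_embs: torch.Tensor):
        # class_token_embs: (C, L, D) token embeddings of class names
        # knowledge_embs:   (C, K, D) embeddings of external knowledge text
        num_classes = class_token_embs.size(0)
        ctx = self.ctx.unsqueeze(0).expand(num_classes, -1, -1)  # (C, num_ctx, D)
        # Prompt per class = [learned context | class name | knowledge tokens]
        return torch.cat([ctx, class_token_embs, knowledge_embs], dim=1)
```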
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- UIBert: Learning Generic Multimodal Representations for UI Understanding [12.931540149350633]
We introduce a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data.
Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other.
We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI.
We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
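The self-alignment intuition lends itself to a contrastive formulation: within a batch of UI components, each component's image embedding should match its own text embedding rather than any other's. A minimal InfoNCE-style sketch of that idea (illustrative only, not UIBert's exact five pretraining tasks):

```python
# Minimal sketch of image-text self-alignment for UI components: an InfoNCE
# loss where each component's image embedding must pick out its own text
# embedding among all components in the batch.
import torch
import torch.nn.functional as F

def self_alignment_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    # image_emb, text_emb: (B, D) embeddings of the same B UI components.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Symmetric cross-entropy: image-to-text and text-to-image retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```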
arXiv Detail & Related papers (2021-07-29T03:51:36Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
This paper introduces VINS, a visual search framework that takes as input a UI image and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for UI detection and high performance in querying similar UI designs.
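At query time, visual UI search reduces to nearest-neighbor lookup over screenshot embeddings. A compact sketch using cosine similarity; the `embed` function stands in for whatever encoder the framework uses.

```python
# Sketch of visual UI retrieval: embed the query screenshot and rank a gallery
# of design examples by cosine similarity. `embed` is a placeholder encoder.
import numpy as np

def search_similar_uis(query_image, gallery_images, embed, top_k: int = 5):
    query_vec = embed(query_image)                               # (D,)
    gallery = np.stack([embed(img) for img in gallery_images])   # (N, D)

    # Cosine similarity between the query and every gallery UI.
    sims = gallery @ query_vec / (
        np.linalg.norm(gallery, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    ranked = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in ranked]
```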
arXiv Detail & Related papers (2021-02-10T01:46:33Z)
- User-Guided Domain Adaptation for Rapid Annotation from User Interactions: A Study on Pathological Liver Segmentation [49.96706092808873]
Mask-based annotation of medical images, especially for 3D data, is a bottleneck in developing reliable machine learning models.
We propose the user-guided domain adaptation (UGDA) framework, which uses prediction-based adversarial domain adaptation (PADA) to model the combined distribution of user interactions (UIs) and mask predictions.
We show that UGDA can retain state-of-the-art performance even when only seeing a fraction of the available UIs.
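In a PADA-style setup, a discriminator tries to tell source-domain (interaction, prediction) pairs from target-domain ones while the segmenter learns to fool it. A schematic of those losses, with all modules and batch keys as placeholders rather than the paper's implementation:

```python
# Schematic of prediction-based adversarial domain adaptation (PADA-style):
# a discriminator distinguishes source vs. target (user-interaction, mask)
# pairs, and the segmenter is trained to fool it. All modules are placeholders.
import torch
import torch.nn.functional as F

def pada_losses(segmenter, discriminator, src_batch, tgt_batch, adv_weight=0.01):
    # Predict masks; concatenate user interactions (e.g. extreme points) with
    # the predictions so the discriminator sees their joint distribution.
    src_pred = segmenter(src_batch["image"], src_batch["interactions"])
    tgt_pred = segmenter(tgt_batch["image"], tgt_batch["interactions"])
    src_joint = torch.cat([src_batch["interactions"], src_pred], dim=1)
    tgt_joint = torch.cat([tgt_batch["interactions"], tgt_pred], dim=1)

    # Discriminator loss: source joint -> 1, target joint -> 0.
    d_src = discriminator(src_joint.detach())
    d_tgt = discriminator(tgt_joint.detach())
    d_loss = 0.5 * (F.binary_cross_entropy_with_logits(d_src, torch.ones_like(d_src)) +
                    F.binary_cross_entropy_with_logits(d_tgt, torch.zeros_like(d_tgt)))

    # Segmenter loss: supervised on source, plus an adversarial term that
    # encourages target predictions to look indistinguishable from source ones.
    sup_loss = F.binary_cross_entropy_with_logits(src_pred, src_batch["mask"])
    d_tgt_fool = discriminator(tgt_joint)
    adv_loss = F.binary_cross_entropy_with_logits(d_tgt_fool, torch.ones_like(d_tgt_fool))
    return d_loss, sup_loss + adv_weight * adv_loss
```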
arXiv Detail & Related papers (2020-09-05T04:24:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.