Computational Approaches for App-to-App Retrieval and Design Consistency Check
- URL: http://arxiv.org/abs/2309.10328v1
- Date: Tue, 19 Sep 2023 05:21:22 GMT
- Title: Computational Approaches for App-to-App Retrieval and Design Consistency Check
- Authors: Seokhyeon Park, Wonjae Kim, Young-Ho Kim, Jinwook Seo
- Abstract summary: Current approaches rely on machine learning models trained on small-sized mobile UI datasets to extract semantic vectors.
We employ visual models trained with large web-scale images to test whether they could extract a UI representation in a zero-shot way.
We also use mathematically founded methods to enable app-to-app retrieval and design consistency analysis.
- Score: 19.689603972238583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extracting semantic representations from mobile user interfaces (UI) and
using the representations for designers' decision-making processes have shown
the potential to be effective computational design support tools. Current
approaches rely on machine learning models trained on small-sized mobile UI
datasets to extract semantic vectors and use screenshot-to-screenshot
comparison to retrieve similar-looking UIs given query screenshots. However,
the usability of these methods is limited because they are often not
open-sourced and have complex training pipelines for practitioners to follow,
and are unable to perform screenshot set-to-set (i.e., app-to-app) retrieval.
To this end, we (1) employ visual models trained with large web-scale images
and test whether they could extract a UI representation in a zero-shot way and
outperform existing specialized models, and (2) use mathematically founded
methods to enable app-to-app retrieval and design consistency analysis. Our
experiments show that our methods not only improve upon previous retrieval
models but also enable multiple new applications.
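To make the abstract's two ingredients concrete, here is a minimal Python sketch (not the authors' released code): it extracts zero-shot screenshot embeddings from a web-scale pretrained visual model and compares two apps as sets of embeddings. The CLIP checkpoint, the Fréchet (2-Wasserstein-between-Gaussians) set distance, and all function names are illustrative assumptions; the paper's exact formulation may differ.

```python
# Sketch of (1) zero-shot UI screenshot embeddings from a web-scale pretrained model
# and (2) app-to-app (set-to-set) retrieval plus a simple design-consistency score.
# Assumptions: CLIP as the visual backbone, Frechet distance as the set-to-set measure.

import numpy as np
import torch
from PIL import Image
from scipy.linalg import sqrtm
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_screenshots(image_paths):
    """Return an (N, D) array of zero-shot embeddings for one app's screenshots."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize for cosine geometry
    return feats.cpu().numpy()

def frechet_app_distance(emb_a, emb_b):
    """Illustrative 'mathematically founded' set-to-set distance: fit a Gaussian to
    each app's embedding set and compute the Frechet (2-Wasserstein) distance."""
    mu_a, mu_b = emb_a.mean(0), emb_b.mean(0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop numerical imaginary residue
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b - 2.0 * covmean))

def rank_apps(query_embs, corpus):
    """App-to-app retrieval: rank candidate apps (name -> embedding set) by distance."""
    scores = {name: frechet_app_distance(query_embs, embs) for name, embs in corpus.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

def design_consistency(embs):
    """Illustrative within-app consistency score: mean pairwise cosine similarity
    of an app's screenshot embeddings (higher = more visually consistent)."""
    sims = embs @ embs.T
    n = len(embs)
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
```

Under these assumptions, retrieval reduces to ranking candidate apps by `frechet_app_distance` against the query app's embedding set, and the same embeddings yield a within-app consistency score without any UI-specific training.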
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- Customize Your Own Paired Data via Few-shot Way [14.193031218059646]
Some supervised methods require huge amounts of paired training data, which greatly limits their usage.
Other unsupervised methods take full advantage of large-scale pre-trained priors, and are therefore strictly restricted to the domains the priors were trained on, performing poorly in out-of-distribution cases.
In our proposed framework, a novel few-shot learning mechanism based on the directional transformations among samples is introduced and expands the learnable space exponentially.
arXiv Detail & Related papers (2024-05-21T04:21:35Z)
- Generalized User Representations for Transfer Learning [6.953653891411339]
We present a novel framework for user representation in large-scale recommender systems.
Our approach employs a two-stage methodology combining representation learning and transfer learning.
We show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches.
arXiv Detail & Related papers (2024-03-01T15:05:21Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Meta-training with Demonstration Retrieval for Efficient Few-shot Learning [11.723856248352007]
Large language models show impressive results on few-shot NLP tasks.
These models are memory and computation-intensive.
We propose meta-training with demonstration retrieval.
arXiv Detail & Related papers (2023-06-30T20:16:22Z)
- Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus [9.401663915424008]
We propose a vision-language model that only takes the screenshot of the UI and a region of interest on the screen as the input.
Our experiments show that our model obtains SoTA results on several representative UI tasks and outperforms previous methods.
arXiv Detail & Related papers (2022-09-29T16:45:43Z)
- Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models [116.25562358482962]
State-of-the-art neural language models can be used to solve ad-hoc language tasks without the need for supervised training.
PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts.
arXiv Detail & Related papers (2022-08-16T17:17:53Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- How to Design Sample and Computationally Efficient VQA Models [53.65668097847456]
We find that representing the text as probabilistic programs and images as object-level scene graphs best satisfy these desiderata.
We extend existing models to leverage these soft programs and scene graphs to train on question answer pairs in an end-to-end manner.
arXiv Detail & Related papers (2021-03-22T01:48:16Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)