Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at
Scale
- URL: http://arxiv.org/abs/2201.04100v2
- Date: Thu, 13 Jan 2022 17:53:31 GMT
- Title: Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at
Scale
- Authors: Gang Li, Gilles Baechler, Manuel Tragut, Yang Li
- Abstract summary: We propose a deep learning pipeline for denoising user interface (UI) layouts.
Our pipeline annotates the raw layout by removing incorrect nodes and assigning a semantically meaningful type to each node.
Our deep models achieve high accuracy with F1 scores of 82.7% for detecting layout objects that do not have a valid visual representation.
- Score: 7.6774030932546315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The layout of a mobile screen is a critical data source for UI design
research and semantic understanding of the screen. However, UI layouts in
existing datasets are often noisy, have mismatches with their visual
representation, or consist of generic or app-specific types that are difficult
to analyze and model. In this paper, we propose the CLAY pipeline that uses a
deep learning approach for denoising UI layouts, allowing us to automatically
improve existing mobile UI layout datasets at scale. Our pipeline takes both
the screenshot and the raw UI layout, and annotates the raw layout by removing
incorrect nodes and assigning a semantically meaningful type to each node. To
experiment with our data-cleaning pipeline, we create the CLAY dataset of
59,555 human-annotated screen layouts, based on screenshots and raw layouts
from Rico, a public mobile UI corpus. Our deep models achieve high accuracy
with F1 scores of 82.7% for detecting layout objects that do not have a valid
visual representation and 85.9% for recognizing object types, which
significantly outperforms a heuristic baseline. Our work lays a foundation for
creating large-scale, high-quality UI layout datasets for data-driven mobile UI
research and reduces the need for manual labeling efforts that are prohibitively
expensive.
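To make the pipeline's two stages concrete, below is a minimal sketch of how such a denoising pass could be organized. The LayoutNode fields and the validity_model / type_model interfaces are hypothetical stand-ins for the paper's deep models, assumed here for illustration; this is not the authors' implementation.

```python
# Illustrative sketch of a CLAY-style layout-denoising pass (hypothetical
# interfaces, not the released code). Each raw layout node is first screened
# for a valid visual counterpart, then assigned a semantically meaningful type.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LayoutNode:
    bbox: tuple                          # (left, top, right, bottom) in screen pixels
    raw_class: str                       # generic or app-specific class from the view hierarchy
    semantic_type: Optional[str] = None  # filled in by the pipeline

def denoise_layout(screenshot, nodes: List[LayoutNode],
                   validity_model, type_model,
                   validity_threshold: float = 0.5) -> List[LayoutNode]:
    """Drop nodes without a valid visual representation, then label the rest.

    `validity_model` and `type_model` are assumed to score (screenshot, node)
    pairs; they stand in for the paper's learned models.
    """
    cleaned = []
    for node in nodes:
        # Stage 1: remove objects that do not correspond to anything visible.
        if validity_model.score(screenshot, node) < validity_threshold:
            continue
        # Stage 2: map the raw class to a semantically meaningful type.
        node.semantic_type = type_model.predict(screenshot, node)
        cleaned.append(node)
    return cleaned
```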
Related papers
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, namely ShowUI.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- Towards Better Semantic Understanding of Mobile Interfaces [7.756895821262432]
We release a human-annotated dataset with approximately 500k unique annotations aimed at increasing the understanding of the functionality of UI elements.
This dataset augments images and view hierarchies from RICO, a large dataset of mobile UIs.
We also release models using image-only and multimodal inputs; we experiment with various architectures and study the benefits of using multimodal inputs on the new dataset.
arXiv Detail & Related papers (2022-10-06T03:48:54Z)
- Multimodal Icon Annotation For Mobile Applications [11.342641993269693]
We propose a novel deep learning based multi-modal approach that combines the benefits of both pixel and view hierarchy features.
To demonstrate its utility, we create a high-quality UI dataset by manually annotating the 29 most commonly used icons in Rico.
arXiv Detail & Related papers (2021-07-09T13:57:37Z)
- Vision-Language Navigation with Random Environmental Mixup [112.94609558723518]
Vision-language Navigation (VLN) tasks require an agent to navigate step-by-step while perceiving the visual observations and comprehending a natural language instruction.
Previous works have proposed various data augmentation methods to reduce data bias.
We propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments.
arXiv Detail & Related papers (2021-06-15T04:34:26Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
This paper introduces VINS, a visual search framework that takes a UI image as input and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for UI detection and high performance in querying similar UI designs.
arXiv Detail & Related papers (2021-02-10T01:46:33Z)
- Understanding Visual Saliency in Mobile User Interfaces [31.278845008743698]
We present findings from a controlled study with 30 participants and 193 mobile UIs.
Results speak to the role of expectations in guiding where users look.
We release the first annotated dataset for investigating visual saliency in mobile UIs.
arXiv Detail & Related papers (2021-01-22T15:45:13Z)
- LAMBERT: Layout-Aware (Language) Modeling for information extraction [2.5907188217412456]
We introduce a new approach to the problem of understanding documents where non-trivial layout influences the local semantics.
We modify the Transformer encoder architecture in a way that allows it to use layout features obtained from an OCR system (a generic sketch of this idea follows this list).
We show that our model achieves superior performance on datasets consisting of visually rich documents.
arXiv Detail & Related papers (2020-02-19T09:48:39Z)
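For the LAMBERT entry above, the following is a generic sketch of the broader idea of injecting OCR-derived layout features into a Transformer encoder by embedding binned bounding-box coordinates and adding them to token embeddings. The module names, binning scheme, and dimensions are assumptions for illustration and do not reproduce LAMBERT's actual encoder modification.

```python
# Generic layout-aware embedding sketch (assumption-based illustration):
# binned bounding-box coordinates from OCR are embedded and added to the
# usual token embeddings before a standard Transformer encoder.
import torch
import torch.nn as nn

class LayoutAwareEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 768, coord_bins: int = 1000):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        # One embedding table per normalized box coordinate (x0, y0, x1, y1).
        self.coord_embs = nn.ModuleList(
            nn.Embedding(coord_bins, hidden) for _ in range(4)
        )

    def forward(self, token_ids: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq); boxes: (batch, seq, 4) with integer-binned coords.
        x = self.token_emb(token_ids)
        for i, emb in enumerate(self.coord_embs):
            x = x + emb(boxes[..., i])
        return x  # fed to a standard Transformer encoder downstream

# Tiny usage example with random inputs.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
emb = LayoutAwareEmbedding(vocab_size=30522)
tokens = torch.randint(0, 30522, (1, 16))
boxes = torch.randint(0, 1000, (1, 16, 4))
out = encoder(emb(tokens, boxes))
```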