DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
- URL: http://arxiv.org/abs/2410.00201v1
- Date: Mon, 30 Sep 2024 19:55:54 GMT
- Title: DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
- Authors: Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Amanda Xin Yue Li, Jeffrey Bigham, Amy Pavel
- Abstract summary: We present a method to generate synthetic, structured visuals with target labels using code generation.
Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples.
- Score: 18.05133277269579
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
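As a rough illustration of the code-generation idea described in the abstract (not the paper's released pipeline), the sketch below prompts a code-generation model to produce an HTML slide whose elements carry semantic role tags, then reads those tags back as ground-truth labels. The `call_llm` helper, the prompt wording, and the `data-role` attribute scheme are assumptions made for illustration only.

```python
# Minimal sketch: synthetic structured visuals with "built-in" labels,
# obtained by generating the visual as code and parsing labels from it.
from html.parser import HTMLParser


def call_llm(prompt: str) -> str:
    """Stand-in for a real code-generation model call (assumed, not the
    paper's API); returns a canned slide so the sketch runs end to end."""
    return (
        "<section>"
        "<h1 data-role='title'>Photosynthesis</h1>"
        "<p data-role='body_text'>Plants convert light into energy.</p>"
        "<img data-role='image' src='leaf.png'>"
        "<figcaption data-role='caption'>A leaf absorbing light.</figcaption>"
        "</section>"
    )


PROMPT = (
    "Write a self-contained HTML slide about photosynthesis. "
    "Give every element a data-role attribute set to one of: "
    "title, body_text, image, caption."
)


class LabelExtractor(HTMLParser):
    """Collects (tag, data-role) pairs -- the labels come with the markup."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-role" in attrs:
            self.labels.append((tag, attrs["data-role"]))


html_source = call_llm(PROMPT)      # synthetic visual, expressed as code
extractor = LabelExtractor()
extractor.feed(html_source)
print(extractor.labels)             # [('h1', 'title'), ('p', 'body_text'), ...]
```

Because the labels live in the generated markup itself, scaling such a dataset is mainly a matter of varying the prompt, which is one plausible reading of the "built-in labels" claim in the abstract.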
Related papers
- Generative Timelines for Instructed Visual Assembly [106.80501761556606]
The objective of this work is to manipulate visual timelines (e.g. a video) through natural language instructions.
We propose the Timeline Assembler, a generative model trained to perform instructed visual assembly tasks.
arXiv Detail & Related papers (2024-11-19T07:26:30Z)
- An Interactive Human-Machine Learning Interface for Collecting and Learning from Complex Annotations [45.23526921041318]
We aim to relax the expectation that human annotators adapt to the constraints imposed by traditional labels by allowing extra flexibility in the form in which supervision information is collected.
For this, we propose a human-machine learning interface for binary classification tasks which enables human annotators to utilise counterfactual examples to complement standard binary labels as annotations for a dataset.
arXiv Detail & Related papers (2024-03-28T11:57:06Z)
- Thinking Like an Annotator: Generation of Dataset Labeling Instructions [59.603239753484345]
We introduce a new task, Labeling Instruction Generation, to address the lack of publicly available labeling instructions.
We take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples.
This framework acts as a proxy to human annotators that can help to both generate a final labeling instruction set and evaluate its quality.
arXiv Detail & Related papers (2023-06-24T18:32:48Z)
- What and How of Machine Learning Transparency: Building Bespoke Explainability Tools with Interoperable Algorithmic Components [77.87794937143511]
This paper introduces a collection of hands-on training materials for explaining data-driven predictive models.
These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
arXiv Detail & Related papers (2022-09-08T13:33:25Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Rich Semantics Improve Few-shot Learning [49.11659525563236]
We show that by using 'class-level' language descriptions, which can be acquired with minimal annotation cost, we can improve few-shot learning performance.
We develop a Transformer based forward and backward encoding mechanism to relate visual and semantic tokens.
arXiv Detail & Related papers (2021-04-26T16:48:27Z)
- Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image Classification [18.299463254965264]
We propose a novel zero-shot learning approach, GAN-CST, based on class knowledge overlay to visual feature learning.
The proposed model delivers superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-26T06:34:35Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.