Towards Automatic Parsing of Structured Visual Content through the Use
of Synthetic Data
- URL: http://arxiv.org/abs/2204.14136v1
- Date: Fri, 29 Apr 2022 14:44:52 GMT
- Title: Towards Automatic Parsing of Structured Visual Content through the Use
of Synthetic Data
- Authors: Lukas Scholch, Jonas Steinhauser, Maximilian Beichter, Constantin
Seibold, Kailun Yang, Merlin Knäble, Thorsten Schwarz, Alexander Mädche,
and Rainer Stiefelhagen
- Abstract summary: We propose a synthetic dataset, containing Structured Visual Content (SVCs) in the form of images and ground truths.
We show the usage of this dataset by an application that automatically extracts a graph representation from an SVC image.
Our dataset enables the development of strong models for the interpretation of SVCs while skipping the time-consuming dense data annotation.
- Score: 65.68384124394699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured Visual Content (SVC) such as graphs, flow charts, or the like are
used by authors to illustrate various concepts. While such depictions allow the
average reader to better understand the contents, images containing SVCs are
typically not machine-readable. This, in turn, not only hinders automated
knowledge aggregation, but also the perception of displayed information for
visually impaired people. In this work, we propose a synthetic dataset,
containing SVCs in the form of images as well as ground truths. We show the
usage of this dataset by an application that automatically extracts a graph
representation from an SVC image. This is done by training a model via common
supervised learning methods. As there currently exist no large-scale public
datasets for the detailed analysis of SVC, we propose the Synthetic SVC (SSVC)
dataset comprising 12,000 images with respective bounding box annotations and
detailed graph representations. Our dataset enables the development of strong
models for the interpretation of SVCs while skipping the time-consuming dense
data annotation. We evaluate our model on both synthetic and manually annotated
data and show the transferability of synthetic to real via various metrics,
given the presented application. Here, we show that this proof of concept
is feasible to some extent and lay down a solid baseline for this task. We
discuss the limitations of our approach for further improvements. Our utilized
metrics can be used as a tool for future comparisons in this domain. To enable
further research on this task, the dataset is publicly available at
https://bit.ly/3jN1pJJ
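As a rough illustration of what "bounding box annotations and detailed graph representations" could look like in code, the following Python sketch models an SVC as nodes with boxes plus directed edges, matches predicted nodes to ground-truth nodes by intersection-over-union, and reports how many ground-truth edges were recovered. The class names, field names, the greedy matching, and the 0.5 IoU threshold are all assumptions made for illustration; they are not taken from the SSVC dataset format or from the metrics used in the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A detected shape; box is (x_min, y_min, x_max, y_max) in pixels."""
    node_id: int
    box: tuple


@dataclass
class SVCGraph:
    """Hypothetical graph representation of one SVC image."""
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # directed (source_id, target_id) pairs


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_nodes(pred, truth, threshold=0.5):
    """Greedily map predicted node ids to ground-truth node ids by best IoU."""
    mapping, used = {}, set()
    for p in pred.nodes:
        best = max(
            (t for t in truth.nodes if t.node_id not in used),
            key=lambda t: iou(p.box, t.box),
            default=None,
        )
        if best is not None and iou(p.box, best.box) >= threshold:
            mapping[p.node_id] = best.node_id
            used.add(best.node_id)
    return mapping


def edge_recall(pred, truth, threshold=0.5):
    """Fraction of ground-truth edges recovered after node matching."""
    mapping = match_nodes(pred, truth, threshold)
    mapped = {
        (mapping[s], mapping[t])
        for s, t in pred.edges
        if s in mapping and t in mapping
    }
    return len(mapped & truth.edges) / len(truth.edges) if truth.edges else 1.0
```

A prediction would be scored with `edge_recall(predicted_graph, ground_truth_graph)`. Greedy matching is used here only to keep the sketch short; an optimal assignment would be a reasonable alternative.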
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models [31.69213233651326]
We introduce the novel task of Visual Data-Type Identification.
An extensive zero-shot evaluation of 39 vision-language models (VLMs) shows a nuanced performance landscape.
arXiv Detail & Related papers (2023-10-12T17:59:30Z)
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
- Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address these issues.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention [7.237370981736913]
We propose a framework to teach any existing convolutional neural network to generate text descriptions about its own latent representations at the filter level.
We show that our method can generate novel descriptions for learned filters beyond the set of categories defined in the training dataset.
We also demonstrate a novel application of our method for unsupervised dataset bias analysis.
arXiv Detail & Related papers (2022-04-10T04:57:56Z)
- Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking [30.155625852894797]
We propose a browser-based benchmarking tool for researchers and challenge organizers.
Our tool helps test generalization capabilities of models across multiple datasets.
Interactive filtering facilitates discovery of problematic behavior.
arXiv Detail & Related papers (2021-10-11T11:08:35Z)