Synthesis in Style: Semantic Segmentation of Historical Documents using
Synthetic Data
- URL: http://arxiv.org/abs/2107.06777v1
- Date: Wed, 14 Jul 2021 15:36:47 GMT
- Authors: Christian Bartz, Hendrik Rätz, Haojin Yang, Joseph Bethge, Christoph Meinel
- Score: 12.704529528199062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the most pressing problems in the automated analysis of historical
documents is the availability of annotated training data. In this paper, we
propose a novel method for the synthesis of training data for semantic
segmentation of document images. We utilize clusters found in intermediate
features of a StyleGAN generator for the synthesis of RGB and label images at
the same time. Our model can be applied to any dataset of scanned documents
without the need for manual annotation of individual images, as each model is
custom-fit to the dataset. In our experiments, we show that models trained on
our synthetic data can reach competitive performance on open benchmark datasets
for line segmentation.
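The core idea described in the abstract, clustering per-pixel feature vectors from an intermediate generator layer and reading the cluster assignments off as a label image aligned with the generated RGB image, can be sketched as follows. This is an illustrative sketch, not the authors' code: the feature shapes, the cluster count, and the plain k-means routine are assumptions standing in for the real StyleGAN activations and clustering setup.

```python
# Illustrative sketch (assumed setup, not the authors' implementation):
# cluster per-pixel feature vectors from an intermediate generator layer,
# then use the cluster IDs as a segmentation label map.
import numpy as np

def kmeans_labels(points: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means; returns a cluster index for each row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def features_to_label_map(features: np.ndarray, n_clusters: int) -> np.ndarray:
    """features: (C, H, W) activations; returns an (H, W) integer label map."""
    c, h, w = features.shape
    pixels = features.reshape(c, h * w).T  # one C-dimensional vector per pixel
    return kmeans_labels(pixels, n_clusters).reshape(h, w)

# Random stand-in for real StyleGAN activations:
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(32, 16, 16)).astype(np.float32)
label_map = features_to_label_map(fake_features, n_clusters=3)
print(label_map.shape)  # (16, 16)
```

In the paper's setting, the label map produced this way is paired with the RGB image synthesized from the same latent code, giving an image/annotation pair with no manual labeling.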
Related papers
- The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better [39.57368843211441]
Every synthetic image ultimately originates from the upstream data used to train the generator.
We compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion against finetuning on targeted real images retrieved directly from LAION-2B.
Synthetic data underperforms the retrieved real data; our analysis suggests this is partly due to generator artifacts and inaccurate task-relevant visual details in the synthetic images.
arXiv Detail & Related papers (2024-06-07T18:04:21Z)
- SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training? [57.42016037768947]
We present SynthCLIP, a CLIP model trained on entirely synthetic text-image pairs.
We generate synthetic datasets of images and corresponding captions at scale, with no human intervention.
arXiv Detail & Related papers (2024-02-02T18:59:58Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative for training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis [16.284895792639137]
This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout.
Given a spatial layout (bounding boxes with object categories) specified by the user, the proposed DocSynth model learns to generate a set of realistic document images.
The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.
arXiv Detail & Related papers (2021-07-06T14:24:30Z)
- On the use of automatically generated synthetic image datasets for benchmarking face recognition [2.0196229393131726]
Recent advances in Generative Adversarial Networks (GANs) that synthesize realistic face images provide a pathway to replacing real datasets with synthetic datasets.
Benchmarking results on the synthetic dataset are a good substitute, often providing error rates and system rankings similar to benchmarking on the real dataset.
arXiv Detail & Related papers (2021-06-08T09:54:02Z)
- Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs [2.3808546906079178]
We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images.
We demonstrate a high-quality synthesis that makes it possible to generate large labeled historical document datasets with precise ground truth.
arXiv Detail & Related papers (2021-03-15T09:39:17Z)
- Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks [58.0240970093372]
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data.
The proposed approach achieves cutting-edge results without the need to train the models on real annotated data of human body parts.
arXiv Detail & Related papers (2021-02-02T12:26:50Z)
- docExtractor: An off-the-shelf historical document element extraction [18.828438308738495]
We present docExtractor, a generic approach for extracting visual elements such as text lines or illustrations from historical documents.
We demonstrate that it provides high-quality performance as an off-the-shelf system across a wide variety of datasets.
We introduce a new public dataset dubbed IlluHisDoc dedicated to the fine evaluation of illustration segmentation in historical documents.
arXiv Detail & Related papers (2020-12-15T10:19:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.