DocSynth: A Layout Guided Approach for Controllable Document Image
Synthesis
- URL: http://arxiv.org/abs/2107.02638v1
- Date: Tue, 6 Jul 2021 14:24:30 GMT
- Title: DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis
- Authors: Sanket Biswas, Pau Riba, Josep Lladós and Umapada Pal
- Abstract summary: This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout.
In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images.
The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.
- Score: 16.284895792639137
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite significant progress on current state-of-the-art image generation
models, synthesis of document images containing multiple and complex object
layouts is a challenging task. This paper presents a novel approach, called
DocSynth, to automatically synthesize document images based on a given layout.
In this work, given a spatial layout (bounding boxes with object categories) as
a reference by the user, our proposed DocSynth model learns to generate a set
of realistic document images consistent with the defined layout. Also, this
framework has been adapted to this work as a superior baseline model for
creating synthetic document image datasets for augmenting real data during
training for document layout analysis tasks. Different sets of learning
objectives have also been used to improve the model performance.
Quantitatively, we also compare the generated results of our model with real
data using standard evaluation metrics. The results highlight that our model
can successfully generate realistic and diverse document images with multiple
objects. We also present a comprehensive qualitative analysis summary of the
different scopes of synthetic image generation tasks. Lastly, to our knowledge
this is the first work of its kind.
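The abstract describes the model input as a spatial layout, that is, bounding boxes with object categories. The following is a minimal sketch of how such a layout could be represented and turned into per-category binary masks. It is not the authors' implementation: the category names, the `LayoutBox` type, the normalized coordinate convention, and the `rasterize_layout` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical category names; the abstract only says "object categories".
CATEGORIES = ("text", "title", "list", "table", "figure")


@dataclass(frozen=True)
class LayoutBox:
    """One layout element: a category plus a bounding box.

    Coordinates are assumed to be normalized to [0, 1] as
    (left, top, right, bottom); this convention is an assumption.
    """
    category: str
    x0: float
    y0: float
    x1: float
    y1: float


def rasterize_layout(boxes, height, width):
    """Return one binary mask (height x width nested lists) per category."""
    masks = [[[0] * width for _ in range(height)] for _ in CATEGORIES]
    for box in boxes:
        channel = CATEGORIES.index(box.category)
        # Scale normalized coordinates to pixel indices.
        top, bottom = round(box.y0 * height), round(box.y1 * height)
        left, right = round(box.x0 * width), round(box.x1 * width)
        for row in range(top, bottom):
            for col in range(left, right):
                masks[channel][row][col] = 1
    return masks


# A toy page layout: a title, a text block, and a figure.
layout = [
    LayoutBox("title", 0.1, 0.05, 0.9, 0.15),
    LayoutBox("text", 0.1, 0.2, 0.9, 0.6),
    LayoutBox("figure", 0.2, 0.65, 0.8, 0.95),
]
masks = rasterize_layout(layout, height=128, width=128)
```

In a layout-conditioned generator, masks of this kind (or the raw boxes and labels) would be the conditioning signal; how DocSynth actually encodes the layout is not stated in the abstract.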
Related papers
- AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks [23.041812897803034]
We propose AnySynth, a unified framework capable of generating arbitrary types of synthetic data.
We have validated our framework's performance across various tasks, including Few-shot Object Detection, Cross-domain Object Detection, Zero-shot Image Retrieval, and Multi-modal Image Perception and Grounding.
arXiv Detail & Related papers (2024-11-24T04:49:07Z)
- SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding [23.910783272007407]
This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU).
Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset.
Our experiments, conducted using the Donut model, demonstrate that models trained with SynthDoc's data achieve superior performance in pre-training read tasks and maintain robustness in downstream tasks, despite language inconsistencies.
arXiv Detail & Related papers (2024-08-27T03:31:24Z)
- DocSynthv2: A Practical Autoregressive Modeling for Document Generation [43.84027661517748]
This paper proposes a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model.
Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches.
arXiv Detail & Related papers (2024-06-12T16:00:16Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Composer: Creative and Controllable Image Synthesis with Composable Conditions [57.78533372393828]
Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.
This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.
arXiv Detail & Related papers (2023-02-20T05:48:41Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements.
We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
arXiv Detail & Related papers (2021-11-11T01:58:44Z)
- Synthesis in Style: Semantic Segmentation of Historical Documents using Synthetic Data [12.704529528199062]
We propose a novel method for the synthesis of training data for semantic segmentation of document images.
We utilize clusters found in intermediate features of a StyleGAN generator for the synthesis of RGB and label images.
Our model can be applied to any dataset of scanned documents without the need for manual annotation of individual images.
arXiv Detail & Related papers (2021-07-14T15:36:47Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.