DocSynth: A Layout Guided Approach for Controllable Document Image
Synthesis
- URL: http://arxiv.org/abs/2107.02638v1
- Date: Tue, 6 Jul 2021 14:24:30 GMT
- Title: DocSynth: A Layout Guided Approach for Controllable Document Image
Synthesis
- Authors: Sanket Biswas, Pau Riba, Josep Lladós and Umapada Pal
- Abstract summary: This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout.
In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images.
The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.
- Score: 16.284895792639137
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite significant progress on current state-of-the-art image generation
models, synthesis of document images containing multiple and complex object
layouts is a challenging task. This paper presents a novel approach, called
DocSynth, to automatically synthesize document images based on a given layout.
In this work, given a spatial layout (bounding boxes with object categories) as
a reference by the user, our proposed DocSynth model learns to generate a set
of realistic document images consistent with the defined layout. The
framework also serves as a strong baseline for creating synthetic document
image datasets to augment real data when training models for document layout
analysis tasks. Different learning objectives have also been used to improve
the model performance.
Quantitatively, we also compare the generated results of our model with real
data using standard evaluation metrics. The results highlight that our model
can successfully generate realistic and diverse document images with multiple
objects. We also present a comprehensive qualitative analysis of the
different scopes of synthetic image generation tasks. Lastly, to our
knowledge, this is the first work of its kind.
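The layout conditioning described in the abstract (bounding boxes with object categories) can be illustrated with a minimal sketch. The class names, fields, and helper below are hypothetical, chosen for illustration; they are not taken from the paper:

```python
from dataclasses import dataclass

# Hypothetical layout specification: each element is an object category
# plus a bounding box in normalized [0, 1] page coordinates.
@dataclass
class LayoutBox:
    category: str  # e.g. "title", "paragraph", "figure", "table"
    x0: float
    y0: float
    x1: float
    y1: float

    def is_valid(self) -> bool:
        # A box must lie inside the page and have positive area.
        return 0.0 <= self.x0 < self.x1 <= 1.0 and 0.0 <= self.y0 < self.y1 <= 1.0

def validate_layout(boxes: list[LayoutBox]) -> list[LayoutBox]:
    """Keep only well-formed boxes; a layout-guided generator would
    condition image synthesis on a layout of roughly this shape."""
    return [b for b in boxes if b.is_valid()]

layout = [
    LayoutBox("title", 0.1, 0.05, 0.9, 0.12),
    LayoutBox("paragraph", 0.1, 0.15, 0.9, 0.55),
    LayoutBox("figure", 0.2, 0.60, 0.8, 0.90),
    LayoutBox("table", 0.5, 0.40, 0.3, 0.70),  # invalid: x1 < x0
]
print(len(validate_layout(layout)))  # prints 3
```

The validated layout would then be rasterized or embedded and fed to the generator; the exact conditioning mechanism is model-specific.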
Related papers
- SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding [23.910783272007407]
This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU)
Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset.
Our experiments, conducted using the Donut model, demonstrate that models trained with SynthDoc's data achieve superior performance in pre-training read tasks and maintain robustness in downstream tasks, despite language inconsistencies.
arXiv Detail & Related papers (2024-08-27T03:31:24Z)
- DocSynthv2: A Practical Autoregressive Modeling for Document Generation [43.84027661517748]
This paper proposes a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model.
Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches.
arXiv Detail & Related papers (2024-06-12T16:00:16Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Composer: Creative and Controllable Image Synthesis with Composable Conditions [57.78533372393828]
Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.
This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.
arXiv Detail & Related papers (2023-02-20T05:48:41Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements.
We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
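A generator of this kind pairs each rendered page with machine-readable labels for element positions, extents, and categories. A hypothetical sketch of the annotation side is below; the function name and record format are illustrative (loosely COCO-style), not taken from the paper:

```python
def make_annotation(page_id: int, elements) -> dict:
    """Build a COCO-style label record for one synthetic page.

    elements: iterable of (category, x, y, width, height) tuples
    in pixel coordinates, as a synthetic generator would emit them
    alongside the rendered image.
    """
    return {
        "image_id": page_id,
        "annotations": [
            {"category": cat, "bbox": [x, y, w, h], "area": w * h}
            for cat, x, y, w, h in elements
        ],
    }

record = make_annotation(
    1,
    [("text", 50, 40, 500, 120), ("figure", 60, 200, 480, 300)],
)
print(record["annotations"][0]["area"])  # prints 60000
```

Because the labels are produced by construction rather than by human annotators, a layout detector can train on such records with no manual annotation cost.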
arXiv Detail & Related papers (2021-11-11T01:58:44Z)
- Synthesis in Style: Semantic Segmentation of Historical Documents using Synthetic Data [12.704529528199062]
We propose a novel method for the synthesis of training data for semantic segmentation of document images.
We utilize clusters found in intermediate features of a StyleGAN generator for the synthesis of RGB and label images.
Our model can be applied to any dataset of scanned documents without the need for manual annotation of individual images.
arXiv Detail & Related papers (2021-07-14T15:36:47Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.