Latent Diffusion for Guided Document Table Generation
- URL: http://arxiv.org/abs/2408.09800v1
- Date: Mon, 19 Aug 2024 08:46:16 GMT
- Title: Latent Diffusion for Guided Document Table Generation
- Authors: Syed Jawwad Haider Hamdani, Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed,
- Abstract summary: This research paper introduces a novel approach for generating annotated images for table structure.
The proposed method aims to enhance the quality of synthetic data used for training object detection models.
Experimental results demonstrate that the introduced approach significantly improves the quality of synthetic data for training.
- Score: 4.891597567642704
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Obtaining annotated table structure data for complex tables is a challenging task due to the inherent diversity and complexity of real-world document layouts. The scarcity of publicly available datasets with comprehensive annotations for intricate table structures hinders the development and evaluation of models designed for such scenarios. This research paper introduces a novel approach for generating annotated images for table structure by leveraging conditioned mask images of rows and columns through the application of latent diffusion models. The proposed method aims to enhance the quality of synthetic data used for training object detection models. Specifically, the study employs a conditioning mechanism to guide the generation of complex document table images, ensuring a realistic representation of table layouts. To evaluate the effectiveness of the generated data, we employ the popular YOLOv5 object detection model for training. The generated table images serve as valuable training samples, enriching the dataset with diverse table structures. The model is subsequently tested on the challenging pubtables-1m testset, a benchmark for table structure recognition in complex document layouts. Experimental results demonstrate that the introduced approach significantly improves the quality of synthetic data for training, leading to YOLOv5 models with enhanced performance. The mean Average Precision (mAP) values obtained on the pubtables-1m testset showcase results closely aligned with state-of-the-art methods. Furthermore, low FID results obtained on the synthetic data further validate the efficacy of the proposed methodology in generating annotated images for table structure.
Related papers
- Enhancing Table Representations with LLM-powered Synthetic Data Generation [0.565395466029518]
We formulate a clear definition of table similarity in the context of data transformation activities within data-driven enterprises.
We propose a novel synthetic data generation pipeline that harnesses the code generation and data manipulation capabilities of Large Language Models.
We demonstrate that the synthetic data generated by our pipeline aligns with our proposed definition of table similarity and significantly enhances table representations.
arXiv Detail & Related papers (2024-11-04T19:54:07Z) - Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs [67.47600679176963]
RDBs store vast amounts of rich, informative data spread across interconnected tables.
The progress of predictive machine learning models falls behind advances in other domains such as computer vision or natural language processing.
We explore a class of baseline models predicated on converting multi-table datasets into graphs.
We assemble a diverse collection of large-scale RDB datasets and (ii) coincident predictive tasks.
arXiv Detail & Related papers (2024-04-28T15:04:54Z) - Synthesizing Realistic Data for Table Recognition [4.500373384879752]
We propose a novel method for synthesizing annotation data specifically designed for table recognition.
By leveraging the structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset.
We have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data.
arXiv Detail & Related papers (2024-04-17T06:36:17Z) - Images in Discrete Choice Modeling: Addressing Data Isomorphism in
Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR)
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - Table Detection in the Wild: A Novel Diverse Table Detection Dataset and
Method [1.3814823347690746]
We introduce a diverse large-scale dataset for table detection with more than seven thousand samples.
We also present baseline results using a convolutional neural network-based method to detect table structure in documents.
arXiv Detail & Related papers (2022-08-31T14:20:30Z) - Data augmentation on graphs for table type classification [1.1859913430860336]
We address the classification of tables using a Graph Neural Network, exploiting the table structure for the message passing algorithm in use.
We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representation.
arXiv Detail & Related papers (2022-08-23T21:54:46Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-pairs over high-free tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z) - TableNet: Deep Learning model for end-to-end Table detection and Tabular
data extraction from Scanned Document Images [18.016832803961165]
We propose a novel end-to-end deep learning model for both table detection and structure recognition.
TableNet exploits the interdependence between the twin tasks of table detection and table structure recognition.
The proposed model and extraction approach was evaluated on the publicly available ICDAR 2013 and Marmot Table datasets.
arXiv Detail & Related papers (2020-01-06T10:25:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.