Related papers: Latent Diffusion for Guided Document Table Generation

URL: http://arxiv.org/abs/2408.09800v1
Date: Mon, 19 Aug 2024 08:46:16 GMT
Title: Latent Diffusion for Guided Document Table Generation
Authors: Syed Jawwad Haider Hamdani, Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed
Abstract summary: This research paper introduces a novel approach for generating annotated images for table structure. The proposed method aims to enhance the quality of synthetic data used for training object detection models. Experimental results demonstrate that the introduced approach significantly improves the quality of synthetic data for training.
Score: 4.891597567642704
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Obtaining annotated table structure data for complex tables is a challenging task due to the inherent diversity and complexity of real-world document layouts. The scarcity of publicly available datasets with comprehensive annotations for intricate table structures hinders the development and evaluation of models designed for such scenarios. This research paper introduces a novel approach for generating annotated images for table structure by leveraging conditioned mask images of rows and columns through the application of latent diffusion models. The proposed method aims to enhance the quality of synthetic data used for training object detection models. Specifically, the study employs a conditioning mechanism to guide the generation of complex document table images, ensuring a realistic representation of table layouts. To evaluate the effectiveness of the generated data, we employ the popular YOLOv5 object detection model for training. The generated table images serve as valuable training samples, enriching the dataset with diverse table structures. The model is subsequently tested on the challenging pubtables-1m testset, a benchmark for table structure recognition in complex document layouts. Experimental results demonstrate that the introduced approach significantly improves the quality of synthetic data for training, leading to YOLOv5 models with enhanced performance. The mean Average Precision (mAP) values obtained on the pubtables-1m testset showcase results closely aligned with state-of-the-art methods. Furthermore, low FID results obtained on the synthetic data further validate the efficacy of the proposed methodology in generating annotated images for table structure.

Related papers

Same Content, Different Representations: A Controlled Study for Table QA [15.896655757672441]
Table Question Answering (Table QA) in real-world settings must operate over both structured databases and semi-structured tables containing textual fields. Existing benchmarks are tied to fixed data formats and have not systematically examined how representation itself affects model performance. We present the first controlled study that isolates the role of table representation by holding content constant while varying structure.
arXiv Detail & Related papers (2025-09-26T22:33:19Z)
TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization [2.1067477213933503]
TabGLM (Tabular Graph Language Model) is a novel multi-modal architecture designed to model both structural and semantic information from a table. It transforms each row of a table into a fully connected graph and serialized text, which are encoded using a graph neural network (GNN) and a text encoder, respectively. Evaluations across 25 benchmark datasets demonstrate substantial performance gains.
arXiv Detail & Related papers (2025-02-26T05:32:45Z)
Enhancing Table Representations with LLM-powered Synthetic Data Generation [0.565395466029518]
We formulate a clear definition of table similarity in the context of data transformation activities within data-driven enterprises. We propose a novel synthetic data generation pipeline that harnesses the code generation and data manipulation capabilities of Large Language Models. We demonstrate that the synthetic data generated by our pipeline aligns with our proposed definition of table similarity and significantly enhances table representations.
arXiv Detail & Related papers (2024-11-04T19:54:07Z)
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data. TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data. We introduce MMTabQA, a new dataset designed for this purpose. Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. We identify model weaknesses by testing the model using the counterfactual image dataset. We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs [67.47600679176963]
RDBs store vast amounts of rich, informative data spread across interconnected tables. The progress of predictive machine learning models on RDBs falls behind advances in other domains such as computer vision or natural language processing. We explore a class of baseline models predicated on converting multi-table datasets into graphs. We assemble (i) a diverse collection of large-scale RDB datasets and (ii) coincident predictive tasks.
arXiv Detail & Related papers (2024-04-28T15:04:54Z)
Synthesizing Realistic Data for Table Recognition [4.500373384879752]
We propose a novel method for synthesizing annotation data specifically designed for table recognition. By leveraging the structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset. We have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data.
arXiv Detail & Related papers (2024-04-17T06:36:17Z)
Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning. We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR). It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method [1.3814823347690746]
We introduce a diverse large-scale dataset for table detection with more than seven thousand samples. We also present baseline results using a convolutional neural network-based method to detect table structure in documents.
arXiv Detail & Related papers (2022-08-31T14:20:30Z)
Data augmentation on graphs for table type classification [1.1859913430860336]
We address the classification of tables using a Graph Neural Network, exploiting the table structure for the message passing algorithm in use. We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representation.
arXiv Detail & Related papers (2022-08-23T21:54:46Z)
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar. To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images [18.016832803961165]
We propose a novel end-to-end deep learning model for both table detection and structure recognition. TableNet exploits the interdependence between the twin tasks of table detection and table structure recognition. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets.
arXiv Detail & Related papers (2020-01-06T10:25:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.