Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method
- URL: http://arxiv.org/abs/2209.09207v2
- Date: Thu, 30 Nov 2023 07:55:36 GMT
- Title: Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method
- Authors: Mrinal Haloi, Shashank Shekhar, Nikhil Fande, Siddhant Swaroop Dash, Sanjay G
- Abstract summary: We introduce a diverse large-scale dataset for table detection with more than seven thousand samples.
We also present baseline results using a convolutional neural network-based method to detect table structure in documents.
- Score: 1.3814823347690746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent deep learning approaches to table detection have achieved
outstanding performance and proved effective in identifying document layouts.
Currently available table detection benchmarks have many limitations,
including a lack of sample diversity, simple table structures, a lack of
training cases, and low sample quality. In this paper, we introduce a diverse
large-scale dataset for table detection with more than seven thousand samples
containing a wide variety of table structures collected from many diverse
sources. In addition, we present baseline results using a
convolutional neural network-based method to detect table structure in
documents. Experimental results show the superiority of applying convolutional
deep learning methods over classical computer vision-based methods. The
introduction of this diverse table detection dataset will enable the community
to develop high-throughput deep learning methods for understanding document
layout and tabular data processing. The dataset is available at
https://www.kaggle.com/datasets/mrinalim/stdw-dataset and
https://huggingface.co/datasets/n3011/STDW
Related papers
- Latent Diffusion for Guided Document Table Generation [4.891597567642704]
This research paper introduces a novel approach for generating annotated images for table structure.
The proposed method aims to enhance the quality of synthetic data used for training object detection models.
Experimental results demonstrate that the introduced approach significantly improves the quality of synthetic data for training.
arXiv Detail & Related papers (2024-08-19T08:46:16Z)
- A Closer Look at Deep Learning on Tabular Data [52.50778536274327]
Tabular data is prevalent across various domains in machine learning.
Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones.
arXiv Detail & Related papers (2024-07-01T04:24:07Z)
- An Automatic Prompt Generation System for Tabular Data Tasks [3.117741687220381]
Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts.
This paper presents an innovative auto-prompt generation system suitable for multiple LLMs, with minimal training.
arXiv Detail & Related papers (2024-05-09T08:32:55Z)
- Towards End-to-End Semi-Supervised Table Detection with Deformable Transformer [11.648151981111436]
Table detection is the task of classifying and localizing table objects within document images.
Many semi-supervised approaches have been introduced to reduce the need for a substantial amount of labeled data.
This paper presents a novel end-to-end semi-supervised table detection method that employs the deformable transformer for detecting table objects.
arXiv Detail & Related papers (2023-05-04T12:15:15Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z)
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables [64.0903766169603]
We propose a framework for few-shot semi-supervised learning, coined Self-generated Tasks from UNlabeled Tables (STUNT).
Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label.
We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks.
arXiv Detail & Related papers (2023-03-02T02:37:54Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents [1.1859913430860336]
The main contribution of this work is to tackle the problem of table extraction, exploiting Graph Neural Networks.
We experimentally evaluated the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.
arXiv Detail & Related papers (2022-08-23T21:36:01Z)
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding [137.3719377780593]
A new design (named Detection Hub) is dataset-aware and category-aligned.
It mitigates the dataset inconsistency and provides coherent guidance for the detector to learn across multiple datasets.
The categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embedding.
arXiv Detail & Related papers (2022-06-07T17:59:44Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
- TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images [18.016832803961165]
We propose a novel end-to-end deep learning model for both table detection and structure recognition.
TableNet exploits the interdependence between the twin tasks of table detection and table structure recognition.
The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets.
arXiv Detail & Related papers (2020-01-06T10:25:32Z)