TableZa -- A classical Computer Vision approach to Tabular Extraction
- URL: http://arxiv.org/abs/2105.09137v1
- Date: Wed, 19 May 2021 13:55:33 GMT
- Title: TableZa -- A classical Computer Vision approach to Tabular Extraction
- Authors: Saumya Banthia, Anantha Sharma, Ravi Mangipudi
- Abstract summary: We discuss an approach for Tabular Data Extraction in the realm of document comprehension.
Given the different kinds of tabular formats often found across documents, we discuss a novel approach using Computer Vision.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-aided Tabular Data Extraction has always been a very challenging and
error-prone task because it demands both Spectral and Spatial Sanity of data.
In this paper we discuss an approach for Tabular Data Extraction in the realm
of document comprehension. Given the different kinds of tabular formats
that are often found across various documents, we discuss a novel approach
using Computer Vision for extraction of tabular data from images or vector
PDF(s) converted to image(s).
Related papers
- Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format.
We first craft Wiki-SS, a dataset of 1.3M Wikipedia web page screenshots, as the corpus for answering questions from the Natural Questions dataset.
In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
arXiv Detail & Related papers (2024-06-17T06:27:35Z)
- An Interactive Interface for Novel Class Discovery in Tabular Data [54.11148718494725]
Novel Class Discovery (NCD) is the problem of trying to discover novel classes in an unlabeled set, given a labeled set of different but related classes.
The majority of NCD methods proposed so far only deal with image data.
This interface allows a domain expert to easily run state-of-the-art algorithms for NCD in tabular data.
arXiv Detail & Related papers (2023-06-22T14:32:53Z)
- A Method for Discovering Novel Classes in Tabular Data [54.11148718494725]
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes.
We show a way to extract knowledge from already known classes to guide the discovery process of novel classes in heterogeneous data.
arXiv Detail & Related papers (2022-09-02T11:45:24Z)
- Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents [1.1859913430860336]
The main contribution of this work is to tackle the problem of table extraction, exploiting Graph Neural Networks.
We experimentally evaluated the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.
arXiv Detail & Related papers (2022-08-23T21:36:01Z)
- Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated.
Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction.
To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
arXiv Detail & Related papers (2021-12-16T01:19:37Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
- TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables [1.4115224153549193]
This paper presents TabLeX, a large-scale benchmark dataset comprising table images generated from scientific articles.
To facilitate the development of robust table IE tools, TabLeX contains images in different aspect ratios and in a variety of fonts.
Our analysis sheds light on the shortcomings of current state-of-the-art table extraction models and shows that they fail on even simple table images.
arXiv Detail & Related papers (2021-05-12T05:13:38Z)
- Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images [0.6299766708197884]
This research proposes automated table detection and tabular data extraction from financial PDF documents.
We propose a method that consists of three main processes, including detecting table areas with a Faster R-CNN (Region-based Convolutional Neural Network) model.
The excellent table detection performance of the detection model is obtained from our customized dataset.
arXiv Detail & Related papers (2021-02-20T08:21:17Z)
- GFTE: Graph-based Financial Table Extraction [66.26206038522339]
In the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) and images.
We publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds.
We propose a novel graph-based convolutional network model named GFTE as a baseline for future comparison.
arXiv Detail & Related papers (2020-03-17T07:10:05Z)
- TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images [18.016832803961165]
We propose a novel end-to-end deep learning model for both table detection and structure recognition.
TableNet exploits the interdependence between the twin tasks of table detection and table structure recognition.
The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets.
arXiv Detail & Related papers (2020-01-06T10:25:32Z)