Tab.IAIS: Flexible Table Recognition and Semantic Interpretation System
- URL: http://arxiv.org/abs/2105.11879v1
- Date: Tue, 25 May 2021 12:31:02 GMT
- Title: Tab.IAIS: Flexible Table Recognition and Semantic Interpretation System
- Authors: Marcin Namysl, Alexander M. Esser, Sven Behnke, Joachim Köhler
- Abstract summary: We develop two rule-based algorithms that perform the complete table recognition process and support the most frequent table formats.
To incorporate the extraction of semantic information into the table recognition process, we develop a graph-based table interpretation method.
Our table recognition approach achieves results competitive with state-of-the-art approaches.
- Score: 84.39812458417246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table extraction is an important but still unsolved problem. In this paper,
we introduce a flexible end-to-end table extraction system. We develop two
rule-based algorithms that perform the complete table recognition process and
support the most frequent table formats found in the scientific literature.
Moreover, to incorporate the extraction of semantic information into the table
recognition process, we develop a graph-based table interpretation method. We
conduct extensive experiments on the challenging table recognition benchmarks
ICDAR 2013 and ICDAR 2019. Our table recognition approach achieves results
competitive with state-of-the-art approaches. Moreover, our complete
information extraction system exhibited a high F1 score of 0.7380, proving the
utility of our approach.
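The abstract reports its end-to-end result as an F1 score. As a reminder of what that number measures, here is a minimal sketch of F1 as the harmonic mean of precision and recall. The function name `f1_score` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact matching criteria for counting a correct extraction are not described here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        # Precision or recall is undefined; report 0.0 by convention.
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example = f1_score(true_positives=8, false_positives=2, false_negatives=4)
print(round(example, 4))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is a common single-number summary for extraction tasks.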
Related papers
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- Synthesizing Realistic Data for Table Recognition [4.500373384879752]
We propose a novel method for synthesizing annotation data specifically designed for table recognition.
By leveraging the structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset.
We have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data.
arXiv Detail & Related papers (2024-04-17T06:36:17Z)
- TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content [39.34067105360439]
We propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition.
Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR).
Our proposed approach achieves an IOU of 0.96 and an OCR Accuracy of 78%, showcasing a remarkable improvement of approximately 25% in the OCR Accuracy compared to the previous Table Transformer approach.
arXiv Detail & Related papers (2024-04-16T06:24:53Z)
- A Review On Table Recognition Based On Deep Learning [0.0]
Table recognition is the task of using a computer to automatically understand tables.
The development of deep learning techniques has brought a new paradigm to this field.
This review mainly discusses the table recognition problem from five aspects.
arXiv Detail & Related papers (2023-12-08T02:58:00Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z)
- Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA [85.17249272519626]
An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
arXiv Detail & Related papers (2022-10-11T07:04:39Z)
- Generating Table Vector Representations [11.092714216647245]
This paper is an evaluation of methods for table-to-class annotation.
We provide a formal definition for table classification as a machine learning task.
arXiv Detail & Related papers (2021-10-28T14:05:21Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
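One entry in the list above (TC-OCR) reports detection quality as an IOU value. For readers unfamiliar with the metric, the following is a minimal sketch of intersection over union for two axis-aligned boxes. The `Box` layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for this illustration; the listed paper may compute the metric over different regions or with a different convention.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two hypothetical 10x10 boxes that overlap on half their width.
overlap = iou((0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0))
print(round(overlap, 4))
```

A value of 1.0 means the predicted and reference regions coincide exactly, and 0.0 means they do not overlap at all.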
This list is automatically generated from the titles and abstracts of the papers on this site.