UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition
- URL: http://arxiv.org/abs/2409.13148v1
- Date: Fri, 20 Sep 2024 01:26:32 GMT
- Title: UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition
- Authors: Zhenrong Zhang, Shuhang Liu, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Yu Hu,
- Abstract summary: We introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model.
UniTabNet employs a divide-and-conquer'' strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure.
- Score: 55.153629718464565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the digital era, table structure recognition technology is a critical tool for processing and analyzing large volumes of tabular data. Previous methods primarily focus on visual aspects of table structure recovery but often fail to effectively comprehend the textual semantics within tables, particularly for descriptive textual cells. In this paper, we introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model. UniTabNet employs a ``divide-and-conquer'' strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure. We further enhance our framework with the Vision Guider, which directs the model's focus towards pertinent areas, thereby boosting prediction accuracy. Additionally, we introduce the Language Guider to refine the model's capability to understand textual semantics in table images. Evaluated on prominent table structure datasets such as PubTabNet, PubTables1M, WTW, and iFLYTAB, UniTabNet achieves a new state-of-the-art performance, demonstrating the efficacy of our approach. The code will also be made publicly available.
Related papers
- Rethinking Image-based Table Recognition Using Weakly Supervised Methods [3.9993134366218857]
We propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or) code-level annotations of table images.
To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available table image-based dataset built from Wikipedia.
arXiv Detail & Related papers (2023-03-14T06:03:57Z) - Towards Table-to-Text Generation with Pretrained Language Model: A Table
Structure Understanding and Text Deliberating Approach [60.03002572791552]
We propose a table structure understanding and text deliberating approach, namely TASD.
Specifically, we devise a three-layered multi-head attention network to realize the table-structure-aware text generation model.
Our approach can generate faithful and fluent descriptive texts for different types of tables.
arXiv Detail & Related papers (2023-01-05T14:03:26Z) - A Graph Representation of Semi-structured Data for Web Question
Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-pairs over high-free tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z) - Towards Faithful Neural Table-to-Text Generation with Content-Matching
Constraints [63.84063384518667]
We propose a novel Transformer-based generation framework to achieve the goal.
Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss.
To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
arXiv Detail & Related papers (2020-05-03T02:54:26Z) - Global Table Extractor (GTE): A Framework for Joint Table Identification
and Cell Structure Recognition Using Visual Context [11.99452212008243]
We present a vision-guided systematic framework for joint table detection and cell structured recognition.
With GTE-Table, we invent a new penalty based on the natural cell containment constraint of tables to train our table network.
We use this to enhance PubTabNet with cell labels and create FinTabNet, real-world and complex scientific and financial datasets.
arXiv Detail & Related papers (2020-05-01T20:14:49Z) - GFTE: Graph-based Financial Table Extraction [66.26206038522339]
In financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) and images.
We publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds.
We propose a novel graph-based convolutional network model named GFTE as a baseline for future comparison.
arXiv Detail & Related papers (2020-03-17T07:10:05Z) - Identifying Table Structure in Documents using Conditional Generative
Adversarial Networks [0.0]
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised skeleton' table form.
We then deriving latent table structure using xy-cut projection and Genetic Algorithm optimisation.
arXiv Detail & Related papers (2020-01-13T20:42:40Z) - TableNet: Deep Learning model for end-to-end Table detection and Tabular
data extraction from Scanned Document Images [18.016832803961165]
We propose a novel end-to-end deep learning model for both table detection and structure recognition.
TableNet exploits the interdependence between the twin tasks of table detection and table structure recognition.
The proposed model and extraction approach was evaluated on the publicly available ICDAR 2013 and Marmot Table datasets.
arXiv Detail & Related papers (2020-01-06T10:25:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.