TDeLTA: A Light-weight and Robust Table Detection Method based on
Learning Text Arrangement
- URL: http://arxiv.org/abs/2312.11043v1
- Date: Mon, 18 Dec 2023 09:18:43 GMT
- Title: TDeLTA: A Light-weight and Robust Table Detection Method based on
Learning Text Arrangement
- Authors: Yang Fan, Xiangping Wu, Qingcai Chen, Heng Li, Yan Huang, Zhixiang
Cai, Qitian Wu
- Abstract summary: We propose a novel, lightweight, and robust Table Detection method based on Learning Text Arrangement, namely TDeLTA.
To locate tables precisely, we design a text-classification task, classifying the text blocks into 4 categories according to their semantic roles in the tables.
Compared to several state-of-the-art methods, TDeLTA achieves competitive results with only 3.1M model parameters on large-scale public datasets.
- Score: 34.73880086005418
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The diversity of tables makes table detection a great challenge, leading
existing models to become increasingly cumbersome and complex. Despite achieving high
performance, they often overfit to the table styles in the training set and suffer
from significant performance degradation when encountering out-of-distribution
tables in other domains. To tackle this problem, we start from the essence of
the table, which is a set of text arranged in rows and columns. Based on this,
we propose a novel, lightweight, and robust Table Detection method based on
Learning Text Arrangement, namely TDeLTA. TDeLTA takes text blocks as
input and models their arrangement with a sequential encoder and an
attention module. To locate the tables precisely, we design a
text-classification task, classifying the text blocks into 4 categories
according to their semantic roles in the tables. Experiments are conducted on
text blocks parsed from PDFs as well as on those extracted by open-source OCR
tools. Compared to several state-of-the-art methods, TDeLTA achieves
competitive results with only 3.1M model parameters on large-scale public
datasets. Moreover, when faced with cross-domain data under the zero-shot
setting, TDeLTA outperforms the baselines by a large margin of nearly 7%,
demonstrating the strong robustness and transferability of the proposed model.
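As a rough illustration of the pipeline the abstract describes, the sketch below shows one way such a text-arrangement classifier could be wired up. This is not the authors' released code: the input featurization (normalized bounding boxes), layer choices, and dimensions are all assumptions.

```python
# Hypothetical sketch of a TDeLTA-style text-arrangement classifier.
# Input featurization, dimensions, and layers are illustrative assumptions.
import torch
import torch.nn as nn

class TextArrangementClassifier(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2, n_classes=4):
        super().__init__()
        self.embed = nn.Linear(4, d_model)  # project (x0, y0, x1, y1) per block
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # sequential encoder
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)               # 4 semantic roles

    def forward(self, boxes):                # boxes: (batch, n_blocks, 4)
        h = self.encoder(self.embed(boxes))  # model the block sequence
        h, _ = self.attn(h, h, h)            # attend across all blocks
        return self.head(h)                  # (batch, n_blocks, n_classes)

model = TextArrangementClassifier()
logits = model(torch.rand(1, 50, 4))         # 50 text blocks from one page
roles = logits.argmax(dim=-1)                # predicted semantic role per block
```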
Related papers
- TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers three distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
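A toy sketch of how those three stages could compose; the sampling heuristic, note-based augmentation, and markdown serialization are illustrative assumptions, not TAP4LLM's actual API.

```python
# Toy sketch of TAP4LLM's three stages on a tiny table; the heuristics
# and serialization format are illustrative assumptions, not the real API.
def sample_rows(table, query, k=2):
    # (1) table sampling: keep the k rows sharing the most tokens with the query
    terms = set(query.lower().split())
    overlap = lambda row: len(terms & set(" ".join(map(str, row)).lower().split()))
    return sorted(table["rows"], key=overlap, reverse=True)[:k]

def augment(rows, notes):
    # (2) table augmentation: attach external knowledge (here, a note per row)
    return [row + [notes.get(row[0], "")] for row in rows]

def to_markdown(header, rows):
    # (3) packing & serialization: render a format the LLM can read
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(map(str, r)) + " |" for r in rows]
    return "\n".join(lines)

table = {"header": ["team", "wins"], "rows": [["A", 10], ["B", 7], ["C", 3]]}
rows = augment(sample_rows(table, "how many wins for team A"), {"A": "league leader"})
print(to_markdown(table["header"] + ["note"], rows))
```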
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - PixT3: Pixel-based Table-To-Text Generation [66.96636025277536]
We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations.
Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive with, and in some settings superior to, generators that operate solely on text.
arXiv Detail & Related papers (2023-11-16T11:32:47Z) - HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation [7.69801337810352]
We conduct parameter-efficient fine-tuning on the LLaMA2 model.
Our approach involves injecting reasoning information into the input by emphasizing table-specific row data.
On both the FetaQA and QTSumm datasets, our approach achieves state-of-the-art results.
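One plausible way to "emphasize table-specific row data" in the serialized input; the <hl> tag convention here is an assumption, not HeLM's exact format.

```python
# Hypothetical serialization that highlights evidence rows for the LLM;
# the <hl>...</hl> tag convention is an assumption, not HeLM's exact format.
def serialize_with_highlights(header, rows, evidence_idx):
    out = ["columns: " + ", ".join(header)]
    for i, row in enumerate(rows):
        line = " | ".join(map(str, row))
        out.append(f"<hl>{line}</hl>" if i in evidence_idx else line)
    return "\n".join(out)

print(serialize_with_highlights(
    ["player", "points"], [["Ann", 31], ["Bo", 12]], evidence_idx={0}))
```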
arXiv Detail & Related papers (2023-11-15T12:02:52Z) - ReTAG: Reasoning Aware Table to Analytic Text Generation [12.603569641254417]
ReTAG is a table and reasoning aware model that uses vector-quantization to infuse different types of analytical reasoning into the output.
We extend the ToTTo and InfoTabs datasets with the reasoning categories used in each reference sentence, and open-source 35.6K analytical and 55.9K descriptive instances.
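For intuition, a generic nearest-neighbor vector-quantization lookup is sketched below; how ReTAG actually maps reasoning types to codebook entries is not shown here.

```python
# Generic vector-quantization lookup: snap an encoder output to its
# nearest codebook entry; the 8 "reasoning codes" here are illustrative.
import numpy as np

codebook = np.random.rand(8, 16)                       # 8 codes, 16-d each
z = np.random.rand(16)                                 # encoder output for one example
code_id = int(((codebook - z) ** 2).sum(axis=1).argmin())
quantized = codebook[code_id]                          # discrete reasoning embedding
```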
arXiv Detail & Related papers (2023-05-19T17:03:09Z) - SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the instance-level discrimination problem for table separation lines and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
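In its generic (CondInst-style) form, conditional convolution amounts to predicting instance-specific kernels and applying them to shared features; the minimal sketch below illustrates that general mechanism, not SEMv2's actual architecture.

```python
# Generic conditional convolution: a controller predicts a per-instance
# kernel that is convolved with shared features. Not SEMv2's architecture.
import torch
import torch.nn.functional as F

feat = torch.rand(1, 8, 32, 32)           # shared feature map for one image
controller = torch.nn.Linear(8, 8)        # predicts a 1x1 conv kernel per instance
inst_embed = torch.rand(1, 8)             # embedding of one separation-line instance
w = controller(inst_embed).view(1, 8, 1, 1)
mask = torch.sigmoid(F.conv2d(feat, w))   # instance-specific line mask (1, 1, 32, 32)
```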
arXiv Detail & Related papers (2023-03-08T05:15:01Z) - Bridge the Gap between Language models and Tabular Understanding [99.88470271644894]
The table pretrain-then-finetune paradigm has been proposed and rapidly adopted following the success of pre-training in the natural language domain.
Despite the promising findings, there is an input gap between pre-training and fine-tuning phases.
We propose UTP, an approach that dynamically supports three types of multi-modal inputs: table-text, table, and text.
arXiv Detail & Related papers (2023-02-16T15:16:55Z) - Towards Table-to-Text Generation with Pretrained Language Model: A Table
Structure Understanding and Text Deliberating Approach [60.03002572791552]
We propose a table structure understanding and text deliberating approach, namely TASD.
Specifically, we devise a three-layered multi-head attention network to realize the table-structure-aware text generation model.
Our approach can generate faithful and fluent descriptive texts for different types of tables.
arXiv Detail & Related papers (2023-01-05T14:03:26Z) - Learning Better Representation for Tables by Self-Supervised Tasks [23.69766883380125]
We propose two self-supervised tasks, Number Ordering and Significance Ordering, to help learn better table representations.
We test our methods on the widely used ROTOWIRE dataset, which consists of NBA game statistics and related news.
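A toy construction of a number-ordering supervision signal in the spirit of the paper; the exact task formulation is an assumption.

```python
# Toy target in the spirit of Number Ordering: rank each numeric cell
# within its column; the paper's exact formulation is an assumption here.
def number_ordering_labels(column):
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0] * len(column)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks  # the model is trained to predict these ranks

print(number_ordering_labels([23, 7, 31, 12]))  # -> [2, 0, 3, 1]
```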
arXiv Detail & Related papers (2020-10-15T09:03:38Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
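A minimal illustration of sampling aligned question-SQL pairs from a toy synchronous grammar; GraPPa's actual grammar and sampling procedure are far richer.

```python
# Toy synchronous rules: each template yields an aligned (question, SQL) pair.
import random

templates = [
    ("how many {col} values are there in {table}?",
     "SELECT COUNT({col}) FROM {table}"),
    ("what is the largest {col} in {table}?",
     "SELECT MAX({col}) FROM {table}"),
]

def sample_pair(table, columns):
    question, sql = random.choice(templates)
    col = random.choice(columns)
    return question.format(col=col, table=table), sql.format(col=col, table=table)

print(sample_pair("matches", ["score", "attendance"]))
```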
arXiv Detail & Related papers (2020-09-29T08:17:58Z)