TRUST: An Accurate and End-to-End Table structure Recognizer Using
Splitting-based Transformers
- URL: http://arxiv.org/abs/2208.14687v1
- Date: Wed, 31 Aug 2022 08:33:36 GMT
- Title: TRUST: An Accurate and End-to-End Table structure Recognizer Using
Splitting-based Transformers
- Authors: Zengyuan Guo, Yuechen Yu, Pengyuan Lv, Chengquan Zhang, Haojie Li,
Zhihui Wang, Kun Yao, Jingtuo Liu, Jingdong Wang
- Abstract summary: We propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST.
Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation.
We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, on which our method achieves new state-of-the-art results.
- Score: 56.56591337457137
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Table structure recognition is a crucial part of document image analysis
domain. Its difficulty lies in the need to parse the physical coordinates and
logical indices of each cell at the same time. However, it is difficult for existing
methods to achieve both goals, especially when the table splitting
lines are blurred or tilted. In this paper, we propose an accurate and
end-to-end transformer-based table structure recognition method, referred to as
TRUST. Transformers are suitable for table structure recognition because of
their global computations, perfect memory, and parallel computation. By
introducing a novel Transformer-based Query-based Splitting Module and a
Vertex-based Merging Module, the table structure recognition problem is
decoupled into two joint optimization sub-tasks: multi-oriented table
row/column splitting and table grid merging. The Query-based Splitting Module
learns strong context information from long dependencies via Transformer
networks, accurately predicts the multi-oriented table row/column separators,
and obtains the basic grids of the table accordingly. The Vertex-based Merging
Module is capable of aggregating local contextual information between adjacent
basic grids, providing the ability to merge basic grids that belong to the same
spanning cell accurately. We conduct experiments on several popular benchmarks,
including PubTabNet and SynthTable, on which our method achieves new state-of-the-art
results. In particular, TRUST runs at 10 FPS on PubTabNet, surpassing the
previous methods by a large margin.
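To make the split-and-merge decomposition concrete, the following is a minimal, framework-free Python sketch of the two sub-tasks. It is not the authors' implementation: the separator positions and pairwise merge decisions are taken as given inputs (in TRUST they would come from the Query-based Splitting Module and the Vertex-based Merging Module), and all function names here are hypothetical.

```python
# Minimal, illustrative sketch of a split-and-merge table structure pipeline.
# NOT the authors' code: separator predictions and merge decisions are assumed
# to be given; in TRUST they are produced by the Query-based Splitting Module
# and the Vertex-based Merging Module, respectively.

from itertools import product


def build_basic_grids(row_seps, col_seps):
    """Split step: separator coordinates define a lattice of basic grids.

    row_seps, col_seps: sorted pixel coordinates of predicted separators,
    including the outer table borders.
    Returns a dict mapping (row_idx, col_idx) -> (y0, x0, y1, x1).
    """
    grids = {}
    for r, c in product(range(len(row_seps) - 1), range(len(col_seps) - 1)):
        grids[(r, c)] = (row_seps[r], col_seps[c], row_seps[r + 1], col_seps[c + 1])
    return grids


def merge_spanning_cells(grids, merge_pairs):
    """Merge step: union-find over basic grids that share a spanning cell.

    merge_pairs: iterable of ((r1, c1), (r2, c2)) adjacent grids predicted to merge.
    Returns a list of cell bounding boxes after merging.
    """
    parent = {g: g for g in grids}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in merge_pairs:
        parent[find(a)] = find(b)

    cells = {}
    for g, (y0, x0, y1, x1) in grids.items():
        root = find(g)
        if root not in cells:
            cells[root] = [y0, x0, y1, x1]
        else:
            box = cells[root]
            box[0], box[1] = min(box[0], y0), min(box[1], x0)
            box[2], box[3] = max(box[2], y1), max(box[3], x1)
    return list(cells.values())


if __name__ == "__main__":
    # A 2x3 table whose first row is a single cell spanning all three columns.
    grids = build_basic_grids(row_seps=[0, 40, 80], col_seps=[0, 100, 200, 300])
    cells = merge_spanning_cells(grids, merge_pairs=[((0, 0), (0, 1)), ((0, 1), (0, 2))])
    print(sorted(cells))  # header cell spans the full width; second row keeps 3 cells
```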
Related papers
- SEMv3: A Fast and Robust Approach to Table Separation Line Detection [48.75713662571455]
Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image.
"Split-and-merge" paradigm is a pivotal approach to parse table structure, where the table separation line detection is crucial.
We propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines.
arXiv Detail & Related papers (2024-05-20T08:13:46Z)
- ClusterTabNet: Supervised clustering method for table detection and table structure recognition [0.0]
We present a novel deep-learning-based method to cluster words in documents, which we apply to detect and recognize tables given the OCR output.
We interpret table structure bottom-up as a graph of relations between pairs of words and use a transformer encoder model to predict its adjacency matrix; a minimal decoding sketch for this adjacency-matrix view appears after this list.
Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.
arXiv Detail & Related papers (2024-02-12T09:10:24Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z)
- TSRFormer: Table Structure Recognition with Transformers [15.708108572696064]
We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
We propose a new two-stage DETR-based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR).
We achieve state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW.
arXiv Detail & Related papers (2022-08-09T17:36:13Z)
- Split, embed and merge: An accurate table structure recognizer [42.579215135672094]
We introduce Split, Embed and Merge (SEM) as an accurate table structure recognizer.
SEM can achieve an average F-Measure of 96.9% on the SciTSR dataset.
arXiv Detail & Related papers (2021-07-12T06:26:19Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z)
- LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment [54.768354427967296]
Table structure recognition is a challenging task due to the various structures and complicated cell spanning relations.
We propose the framework of Local and Global Pyramid Mask Alignment, which adopts the soft pyramid mask learning mechanism in both the local and global feature maps.
A pyramid mask re-scoring module is then integrated to compromise the local and global information and refine the predicted boundaries.
arXiv Detail & Related papers (2021-05-13T12:24:12Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
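For contrast with the splitting-based sketch above, the snippet below illustrates the adjacency-matrix formulation mentioned in the ClusterTabNet entry: OCR words are graph nodes, a model predicts which pairs are related, and structure is recovered by connected components. The matrix is hand-made stand-in data (not a model prediction), and the names are hypothetical.

```python
# Illustrative decoding of a predicted word-pair adjacency matrix into groups
# (e.g. "same row" relations), as described in the ClusterTabNet entry above.
# The matrix below is hand-made stand-in data, not the output of any model.

def connected_components(adj):
    """adj: symmetric 0/1 matrix over words; returns lists of word indices per group."""
    n = len(adj)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            group.append(i)
            stack.extend(j for j in range(n) if adj[i][j] and j not in seen)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    words = ["Item", "Qty", "Apple", "3", "Pear", "5"]
    # Hypothetical "same row" predictions for a 3-row, 2-column table.
    same_row = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1, 1],
    ]
    for idxs in connected_components(same_row):
        print([words[i] for i in idxs])  # -> [Item, Qty], [Apple, 3], [Pear, 5]
```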
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences arising from their use.