Robust Table Structure Recognition with Dynamic Queries Enhanced
Detection Transformer
- URL: http://arxiv.org/abs/2303.11615v2
- Date: Wed, 12 Jul 2023 09:05:17 GMT
- Title: Robust Table Structure Recognition with Dynamic Queries Enhanced
Detection Transformer
- Authors: Jiawei Wang, Weihong Lin, Chixiang Ma, Mingze Li, Zheng Sun, Lei Sun,
Qiang Huo
- Abstract summary: We present a new table structure recognition approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
With these new techniques, our TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet, WTW and FinTabNet.
- Score: 15.708108572696062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new table structure recognition (TSR) approach, called
TSRFormer, to robustly recognize the structures of complex tables with
geometrical distortions from various table images. Unlike previous methods, we
formulate table separation line prediction as a line regression problem instead
of an image segmentation problem and propose a new two-stage dynamic queries
enhanced DETR based separation line regression approach, named DQ-DETR, to
predict separation lines from table images directly. Compared to vanilla DETR,
we propose three improvements in DQ-DETR to make the two-stage DETR framework
work efficiently and effectively for the separation line prediction task: 1) A
new query design, named Dynamic Query, to decouple a single line query into
separable point queries, which could intuitively improve the localization
accuracy for regression tasks; 2) A dynamic queries based progressive line
regression approach that progressively regresses points on the line, which
further enhances localization accuracy for distorted tables; 3) A
prior-enhanced matching strategy to solve the slow convergence issue of DETR.
After separation line prediction, a simple relation network based cell merging
module is used to recover spanning cells. With these new techniques, our
TSRFormer achieves state-of-the-art performance on several benchmark datasets,
including SciTSR, PubTabNet, WTW and FinTabNet. Furthermore, we have validated
the robustness and high localization accuracy of our approach to tables with
complex structures, borderless cells, large blank spaces, empty or spanning
cells as well as distorted or even curved shapes on a more challenging
real-world in-house dataset.
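
The split-then-merge idea described in the abstract (predict separation lines, then merge grid cells to recover spanning cells) can be illustrated with a minimal sketch. This is not the paper's method: it assumes straight, axis-aligned separators given as plain coordinates, and the merge pairs are supplied by hand, whereas TSRFormer regresses possibly curved lines with DQ-DETR and predicts merges with a relation network. The function names `grid_cells` and `merge_cells` are hypothetical.

```python
from itertools import product


def grid_cells(row_seps, col_seps):
    """Return {(row, col): (x0, y0, x1, y1)} for the grid cut by the separators.

    Assumption: separators are straight lines given as sorted y (rows) and
    x (columns) coordinates, with the table borders included.
    """
    cells = {}
    for r, c in product(range(len(row_seps) - 1), range(len(col_seps) - 1)):
        cells[(r, c)] = (col_seps[c], row_seps[r], col_seps[c + 1], row_seps[r + 1])
    return cells


def merge_cells(cells, merge_pairs):
    """Union the grid cells linked by merge_pairs; return merged bounding boxes.

    merge_pairs stands in for the output of a merging module; here the pairs
    are hand-written rather than predicted.
    """
    parent = {key: key for key in cells}

    def find(key):
        # Union-find lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in merge_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for key, (x0, y0, x1, y1) in cells.items():
        root = find(key)
        if root in groups:
            gx0, gy0, gx1, gy1 = groups[root]
            groups[root] = (min(gx0, x0), min(gy0, y0), max(gx1, x1), max(gy1, y1))
        else:
            groups[root] = (x0, y0, x1, y1)
    return sorted(groups.values())


# A 2x3 grid whose first two cells in the top row form one spanning cell.
row_seps = [0, 20, 40]
col_seps = [0, 50, 100, 150]
cells = grid_cells(row_seps, col_seps)
merged = merge_cells(cells, [((0, 0), (0, 1))])
print(len(cells), len(merged))
```

In this toy setting the six grid cells collapse to five boxes, with the top-left box spanning two columns. Handling distorted or curved tables would require representing each separator as a sequence of points rather than a single coordinate.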
Related papers
- SEMv3: A Fast and Robust Approach to Table Separation Line Detection [48.75713662571455]
Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image.
The "split-and-merge" paradigm is a pivotal approach to parsing table structure, in which table separation line detection is crucial.
We propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines.
arXiv Detail & Related papers (2024-05-20T08:13:46Z)
- LORE++: Logical Location Regression Network for Table Structure
Recognition with Pre-training [45.80561537971478]
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.
We model TSR as a logical location regression problem and propose a new TSR framework called LORE.
Our proposed LORE is conceptually simpler, easier to train, and more accurate than other paradigms of TSR.
arXiv Detail & Related papers (2024-01-03T03:14:55Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z)
- LORE: Logical Location Regression Network for Table Structure
Recognition [24.45544796305824]
Table structure recognition aims at extracting tables in images into machine-understandable formats.
Recent methods solve this problem by predicting the adjacency relations of detected cell boxes.
We propose a new TSR framework called LORE, standing for LOgical location REgression network.
arXiv Detail & Related papers (2023-03-07T08:42:46Z)
- Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation [103.90033029330527]
Few-Shot Instance Segmentation (FSIS) requires detecting and segmenting novel classes with limited support examples.
We introduce a unified framework, Reference Twice (RefT), to exploit the relationship between support and query features for FSIS.
arXiv Detail & Related papers (2023-01-03T15:33:48Z)
- Learning Cross-view Geo-localization Embeddings via Dynamic Weighted
Decorrelation Regularization [52.493240055559916]
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform.
Existing methods usually focus on optimizing the distance between one embedding and the others in the feature space.
In this paper, we argue that low redundancy is also important, as it motivates the model to mine more diverse patterns.
arXiv Detail & Related papers (2022-11-10T02:13:10Z)
- TRUST: An Accurate and End-to-End Table structure Recognizer Using
Splitting-based Transformers [56.56591337457137]
We propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST.
Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation.
We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, and our method achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-08-31T08:33:36Z)
- TSRFormer: Table Structure Recognition with Transformers [15.708108572696064]
We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
We propose a new two-stage DETR based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR).
We achieve state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW.
arXiv Detail & Related papers (2022-08-09T17:36:13Z)