LORE++: Logical Location Regression Network for Table Structure
Recognition with Pre-training
- URL: http://arxiv.org/abs/2401.01522v1
- Date: Wed, 3 Jan 2024 03:14:55 GMT
- Title: LORE++: Logical Location Regression Network for Table Structure
Recognition with Pre-training
- Authors: Rujiao Long and Hangdi Xing and Zhibo Yang and Qi Zheng and Zhi Yu and
Cong Yao and Fei Huang
- Abstract summary: Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.
We model TSR as a logical location regression problem and propose a new TSR framework called LORE.
Our proposed LORE is conceptually simpler, easier to train, and more accurate than other paradigms of TSR.
- Score: 45.80561537971478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table structure recognition (TSR) aims at extracting tables in images into
machine-understandable formats. Recent methods solve this problem by predicting
the adjacency relations of detected cell boxes or learning to directly generate
the corresponding markup sequences from the table images. However, existing
approaches either rely on additional heuristic rules to recover the table
structures or struggle to capture long-range dependencies within tables,
resulting in increased complexity. In this paper, we propose an
alternative paradigm. We model TSR as a logical location regression problem and
propose a new TSR framework called LORE, standing for LOgical location
REgression network, which for the first time regresses logical location as well
as spatial location of table cells in a unified network. Our proposed LORE is
conceptually simpler, easier to train, and more accurate than other paradigms
of TSR. Moreover, inspired by the remarkable success of pre-trained models on a
number of computer vision and natural language processing tasks, we propose two
pre-training tasks to enrich the spatial and logical representations at the
feature level of LORE, resulting in an upgraded version called LORE++. The
incorporation of pre-training gives LORE++ substantial gains in accuracy,
generalization, and few-shot capability over its predecessor. Experiments on
standard benchmarks against methods of previous paradigms demonstrate the
superiority of LORE++ and highlight the promise of the logical location
regression paradigm for TSR.
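
To make the logical location regression idea above concrete, here is a minimal sketch of a network head that regresses both the spatial and the logical location of detected table cells. This is an illustration only, not the authors' LORE/LORE++ implementation: the feature dimension, head architecture, and the choice of (row start, row end, column start, column end) as logical regression targets are assumptions made for the example.

```python
# Minimal sketch of joint spatial + logical location regression for table cells.
# NOT the authors' LORE/LORE++ implementation: feature sizes, head design, and the
# (row_start, row_end, col_start, col_end) target parameterization are assumptions.
import torch
import torch.nn as nn

class CellLocationRegressor(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        # Spatial head: regresses the corner-box coordinates of each cell.
        self.spatial_head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 8),   # (x1, y1, ..., x4, y4)
        )
        # Logical head: regresses row/column start and end indices of each cell.
        self.logical_head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),   # (row_start, row_end, col_start, col_end)
        )

    def forward(self, cell_feats: torch.Tensor):
        # cell_feats: (num_cells, feat_dim) features pooled from a shared backbone.
        spatial = self.spatial_head(cell_feats)         # continuous box coordinates
        logical = self.logical_head(cell_feats).relu()  # non-negative logical indices
        return spatial, logical

# Toy usage: 10 detected cells with 256-d features.
model = CellLocationRegressor()
boxes, logic = model(torch.randn(10, 256))
rows_cols = logic.round().long()     # integer row/column spans recovered at inference
print(boxes.shape, rows_cols.shape)  # torch.Size([10, 8]) torch.Size([10, 4])
```

In such a setup one would typically supervise both heads jointly (e.g., regression losses against annotated boxes and logical indices) and round the regressed logical values at inference to recover each cell's row and column span.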
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z) - Towards Unified Token Learning for Vision-Language Tracking [65.96561538356315]
We present a vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.
Our proposed framework serializes language description and bounding box into a sequence of discrete tokens.
In this new design paradigm, all token queries are required to perceive the desired target and directly predict spatial coordinates of the target (a toy coordinate-to-token sketch follows this list).
arXiv Detail & Related papers (2023-08-27T13:17:34Z) - Robust Table Structure Recognition with Dynamic Queries Enhanced
Detection Transformer [15.708108572696062]
We present a new table structure recognition approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
With these new techniques, our TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet, WTW and FinTabNet.
arXiv Detail & Related papers (2023-03-21T06:20:49Z) - LORE: Logical Location Regression Network for Table Structure
Recognition [24.45544796305824]
Table structure recognition aims at extracting tables in images into machine-understandable formats.
Recent methods solve this problem by predicting the adjacency relations of detected cell boxes.
We propose a new TSR framework called LORE, standing for LOgical location REgression network.
arXiv Detail & Related papers (2023-03-07T08:42:46Z) - RRSR:Reciprocal Reference-based Image Super-Resolution with Progressive
Feature Alignment and Selection [66.08293086254851]
We propose a reciprocal learning framework to reinforce the learning of a RefSR network.
The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection.
We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm.
arXiv Detail & Related papers (2022-11-08T12:39:35Z) - TSRFormer: Table Structure Recognition with Transformers [15.708108572696064]
We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
We propose a new two-stage DETR-based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR).
We achieve state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW.
arXiv Detail & Related papers (2022-08-09T17:36:13Z) - VQ-T: RNN Transducers using Vector-Quantized Prediction Network States [52.48566999668521]
We propose to use vector-quantized long short-term memory units in the prediction network of RNN transducers.
By training the discrete representation jointly with the ASR network, hypotheses can be actively merged for lattice generation.
Our experiments on the Switchboard corpus show that the proposed VQ RNN transducers improve ASR performance over transducers with regular prediction networks.
arXiv Detail & Related papers (2022-08-03T02:45:52Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
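
As referenced in the MMTrack entry above, here is a toy sketch of serializing bounding-box coordinates into discrete tokens, in the spirit of token-generation formulations of localization. The bin count, helper names, and vocabulary layout are illustrative assumptions, not the paper's actual tokenizer.

```python
# Illustrative quantization of a bounding box into discrete tokens, in the spirit of
# token-generation trackers such as MMTrack. num_bins and the helpers are assumptions.
def box_to_tokens(box, img_w, img_h, num_bins=1000):
    """Map (x1, y1, x2, y2) in pixels to four integer tokens in [0, num_bins - 1]."""
    x1, y1, x2, y2 = box
    norm = (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
    return [min(int(v * num_bins), num_bins - 1) for v in norm]

def tokens_to_box(tokens, img_w, img_h, num_bins=1000):
    """Invert the quantization back to approximate pixel coordinates."""
    x1, y1, x2, y2 = ((t + 0.5) / num_bins for t in tokens)
    return (x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h)

tokens = box_to_tokens((120, 80, 360, 240), img_w=640, img_h=480)
print(tokens, tokens_to_box(tokens, 640, 480))
```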
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and accepts no responsibility for any consequences arising from its use.