TABLET: Table Structure Recognition using Encoder-only Transformers
- URL: http://arxiv.org/abs/2506.07015v1
- Date: Sun, 08 Jun 2025 06:34:15 GMT
- Title: TABLET: Table Structure Recognition using Encoder-only Transformers
- Authors: Qiyu Hou, Jun Wang
- Abstract summary: We propose a novel Split-Merge-based top-down model optimized for large, densely populated tables. Our approach formulates row and column splitting as sequence labeling tasks, utilizing dual Transformer encoders to capture feature interactions. Our method reduces resolution loss and computational complexity, achieving high accuracy while maintaining fast processing speed.
- Score: 5.525467421201709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the challenges of table structure recognition, we propose a novel Split-Merge-based top-down model optimized for large, densely populated tables. Our approach formulates row and column splitting as sequence labeling tasks, utilizing dual Transformer encoders to capture feature interactions. The merging process is framed as a grid cell classification task, leveraging an additional Transformer encoder to ensure accurate and coherent merging. By eliminating unstable bounding box predictions, our method reduces resolution loss and computational complexity, achieving high accuracy while maintaining fast processing speed. Extensive experiments on FinTabNet and PubTabNet demonstrate the superiority of our model over existing approaches, particularly in real-world applications. Our method offers a robust, scalable, and efficient solution for large-scale table recognition, making it well-suited for industrial deployment.
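The pipeline in the abstract can be pictured with a short PyTorch sketch. This is a minimal illustration rather than the authors' implementation: the CNN backbone, feature dimensions, pooling choices, and the three merge classes (merge-right, merge-down, keep) are assumptions; only the overall flow (row and column splitting as sequence labeling with dual Transformer encoders, followed by merging as grid-cell classification with a third encoder) comes from the abstract.

```python
# Minimal sketch of a Split-Merge style table-structure model (not the authors'
# implementation). The backbone, feature sizes, and head names are assumptions;
# only the overall flow (row/column split as sequence labeling, merge as
# grid-cell classification) follows the abstract.
import torch
import torch.nn as nn


class SplitMergeTSR(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8, n_layers: int = 3):
        super().__init__()
        # Assumed lightweight CNN backbone producing a d_model-channel feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        enc_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Dual encoders: one over row features, one over column features.
        self.row_encoder = nn.TransformerEncoder(enc_layer(), n_layers)
        self.col_encoder = nn.TransformerEncoder(enc_layer(), n_layers)
        # Sequence-labeling heads: per-row / per-column "separator or not" logits.
        self.row_head = nn.Linear(d_model, 2)
        self.col_head = nn.Linear(d_model, 2)
        # Merge stage: classify each grid cell (merge-right / merge-down / keep).
        self.merge_encoder = nn.TransformerEncoder(enc_layer(), n_layers)
        self.merge_head = nn.Linear(d_model, 3)

    def forward(self, image: torch.Tensor):
        feat = self.backbone(image)                 # (B, C, H, W)
        row_seq = feat.mean(dim=3).transpose(1, 2)  # (B, H, C), pooled over width
        col_seq = feat.mean(dim=2).transpose(1, 2)  # (B, W, C), pooled over height
        row_logits = self.row_head(self.row_encoder(row_seq))  # (B, H, 2)
        col_logits = self.col_head(self.col_encoder(col_seq))  # (B, W, 2)
        # Grid features: one token per (row, column) cell via an outer sum of pooled features.
        grid = row_seq.unsqueeze(2) + col_seq.unsqueeze(1)      # (B, H, W, C)
        b, h, w, c = grid.shape
        merge_logits = self.merge_head(self.merge_encoder(grid.reshape(b, h * w, c)))
        return row_logits, col_logits, merge_logits.reshape(b, h, w, 3)


if __name__ == "__main__":
    model = SplitMergeTSR()
    rows, cols, merges = model(torch.randn(1, 3, 256, 256))
    print(rows.shape, cols.shape, merges.shape)
```

Running the sketch on a 256x256 input prints the per-row and per-column separator logits and the grid of merge logits; in practice the separator logits would first be thresholded into a cell grid before the merge stage, a step this sketch omits for brevity.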
Related papers
- DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization [0.6825805890534123]
We introduce DETQUS (Decomposition-Enhanced Transformers for QUery-focused Summarization), a system designed to improve summarization accuracy. We employ a large language model to reduce table size, retaining only query-relevant columns while preserving essential information. Our approach, equipped with the table-based QA model Omnitab, achieves a ROUGE-L score of 0.4437, outperforming the previous state-of-the-art REFACTOR model (ROUGE-L: 0.422).
arXiv Detail & Related papers (2025-03-07T21:11:35Z)
- Mixture of Attention Yields Accurate Results for Tabular Data [21.410818837489973]
We propose MAYA, an encoder-decoder transformer-based framework. In the encoder, we design a Mixture of Attention (MOA) that constructs multiple parallel attention branches. We employ collaborative learning with a dynamic consistency weight constraint to produce more robust representations.
arXiv Detail & Related papers (2025-02-18T03:43:42Z)
- RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration [73.69415797389195]
We propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment.
Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers.
Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes.
arXiv Detail & Related papers (2023-03-22T08:47:37Z)
- TRUST: An Accurate and End-to-End Table structure Recognizer Using Splitting-based Transformers [56.56591337457137]
We propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST.
Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation.
We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, and our method achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-08-31T08:33:36Z)
- TSRFormer: Table Structure Recognition with Transformers [15.708108572696064]
We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
We propose a new two-stage, DETR-based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR).
We achieve state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW.
arXiv Detail & Related papers (2022-08-09T17:36:13Z)
- Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation [50.321277476317974]
We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output.
Experiments on ADE20K and Cityscapes datasets demonstrate the superiority of the proposed dual-flattening transformer architecture.
arXiv Detail & Related papers (2022-01-22T22:38:15Z)
- Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Linformer and Informer reduce this quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on a theoretical analysis of these approximations, we propose Skeinformer to accelerate self-attention and further improve the accuracy of the matrix approximation to self-attention.
arXiv Detail & Related papers (2021-12-10T06:58:05Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision into a joint embedding space, a.k.a. dual encoders, is attractive because retrieval then scales to large datasets.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection [116.09532365093659]
We introduce the Cascade Transformer, a technique to adapt transformer-based models into a cascade of rankers.
When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy.
arXiv Detail & Related papers (2020-05-05T23:32:01Z)
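The cascade idea in the last entry can be sketched in a few lines of PyTorch. This is a hedged illustration, not the paper's method: the encoder configuration, the per-stage scoring heads, and the fixed keep ratio are assumptions; the recoverable idea is that shallow partial stacks score every candidate cheaply and prune the lowest-scoring ones before deeper, more expensive layers run.

```python
# Illustrative sketch of a cascade-of-rankers scheme (not the Cascade Transformer
# authors' implementation). Encoder sizes, scoring heads, and the keep ratio are
# assumptions: early stages score all candidates cheaply and drop the lowest-
# scoring ones before deeper layers run.
import torch
import torch.nn as nn


class CascadeRanker(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4,
                 stage_layers=(2, 2, 2), keep_ratio: float = 0.5):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Each stage is a partial stack of encoder layers.
        self.stages = nn.ModuleList(
            nn.TransformerEncoder(layer(), n) for n in stage_layers
        )
        # One lightweight scoring head per stage (scores the first, CLS-like token).
        self.heads = nn.ModuleList(nn.Linear(d_model, 1) for _ in stage_layers)
        self.keep_ratio = keep_ratio

    @torch.no_grad()
    def rank(self, candidates: torch.Tensor):
        """candidates: (N, seq_len, d_model) embedded question/answer pairs."""
        idx = torch.arange(candidates.size(0))
        x, scores = candidates, None
        for i, (stage, head) in enumerate(zip(self.stages, self.heads)):
            x = stage(x)
            scores = head(x[:, 0, :]).squeeze(-1)   # one score per surviving candidate
            if i < len(self.stages) - 1:            # prune after every stage but the last
                keep = max(1, int(self.keep_ratio * x.size(0)))
                top = scores.topk(keep).indices
                x, idx = x[top], idx[top]
        return idx, scores  # surviving candidate indices and their final scores


if __name__ == "__main__":
    ranker = CascadeRanker()
    survivors, final_scores = ranker.rank(torch.randn(16, 32, 128))
    print(survivors.tolist(), final_scores.shape)
```

With three stages and a 0.5 keep ratio, only a quarter of the candidates reach the final stage, which is where the computational savings come from in spirit; the 37% figure quoted above is the paper's own measurement, not something this sketch reproduces.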
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.