Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer
- URL: http://arxiv.org/abs/2405.00187v1
- Date: Tue, 30 Apr 2024 20:25:57 GMT
- Title: Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer
- Authors: Tahira Shehzadi, Shalini Sarode, Didier Stricker, Muhammad Zeshan Afzal,
- Abstract summary: Table detection within document images is a crucial task in document processing, involving the identification and localization of tables.
Recent strides in deep learning have substantially improved the accuracy of this task, but it still relies on large labeled datasets for effective training.
We introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features.
- Score: 12.042768320132694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still heavily relies on large labeled datasets for effective training. Several semi-supervised approaches have emerged to overcome this challenge, often employing CNN-based detectors with anchor proposals and post-processing techniques like non-maximal suppression (NMS). However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms. Previous research has focused on two key areas to improve transformer-based detectors: refining the quality of object queries and optimizing attention mechanisms. However, increasing object queries can introduce redundancy, while adjustments to the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features. Our approach demonstrates remarkable reductions in false positives and substantial enhancements in table detection performance, particularly in complex documents characterized by diverse table structures. This work provides more efficient and accurate table detection in semi-supervised settings.
Related papers
- Better Sampling, towards Better End-to-end Small Object Detection [7.7473020808686694]
Small object detection remains unsatisfactory due to limited characteristics and high density and mutual overlap.
We propose methods enhancing sampling within an end-to-end framework.
Our model demonstrates a significant enhancement, achieving a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset.
arXiv Detail & Related papers (2024-05-17T04:37:44Z) - End-to-End Semi-Supervised approach with Modulated Object Queries for Table Detection in Documents [12.042768320132694]
This study presents an innovative transformer-based semi-supervised table detector.
It improves the quality of pseudo-labels through a novel matching strategy.
It achieves new state-of-the-art results, with a mAP of 95.7% and 97.9% on TableBank (word) and PubLaynet with 30% label data.
arXiv Detail & Related papers (2024-05-08T11:24:57Z) - Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection [12.417754433715903]
We introduce Sparse Semi-DETR, a novel transformer-based, end-to-end semi-supervised object detection solution.
Sparse Semi-DETR incorporates a Query Refinement Module to enhance the quality of object queries, significantly improving detection capabilities for small and partially obscured objects.
On the MS-COCO and Pascal VOC object detection benchmarks, Sparse Semi-DETR achieves a significant improvement over current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T10:22:23Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - Decoupled DETR: Spatially Disentangling Localization and Classification
for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement in MSCOCO datasets compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Towards End-to-End Semi-Supervised Table Detection with Deformable
Transformer [11.648151981111436]
Table detection is the task of classifying and localizing table objects within document images.
Many semi-supervised approaches are introduced to mitigate the need for a substantial amount of label data.
This paper presents a novel end-to-end semi-supervised table detection method that employs the deformable transformer for detecting table objects.
arXiv Detail & Related papers (2023-05-04T12:15:15Z) - AntPivot: Livestream Highlight Detection via Hierarchical Attention
Mechanism [64.70568612993416]
We formulate a new task Livestream Highlight Detection, discuss and analyze the difficulties listed above and propose a novel architecture AntPivot to solve this problem.
We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss.
arXiv Detail & Related papers (2020-05-26T17:06:38Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.