Related papers: Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer

URL: http://arxiv.org/abs/2405.00187v1
Date: Tue, 30 Apr 2024 20:25:57 GMT
Title: Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer
Authors: Tahira Shehzadi, Shalini Sarode, Didier Stricker, Muhammad Zeshan Afzal
Abstract summary: Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still relies on large labeled datasets for effective training. We introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features.
Score: 12.042768320132694
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still heavily relies on large labeled datasets for effective training. Several semi-supervised approaches have emerged to overcome this challenge, often employing CNN-based detectors with anchor proposals and post-processing techniques like non-maximal suppression (NMS). However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms. Previous research has focused on two key areas to improve transformer-based detectors: refining the quality of object queries and optimizing attention mechanisms. However, increasing object queries can introduce redundancy, while adjustments to the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features. Our approach demonstrates remarkable reductions in false positives and substantial enhancements in table detection performance, particularly in complex documents characterized by diverse table structures. This work provides more efficient and accurate table detection in semi-supervised settings.

Related papers

Better Sampling, towards Better End-to-end Small Object Detection [7.7473020808686694]
Small object detection remains unsatisfactory because small targets offer limited features, appear at high density, and overlap one another. We propose methods that enhance sampling within an end-to-end framework. Our model achieves a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset.
arXiv Detail & Related papers (2024-05-17T04:37:44Z)
End-to-End Semi-Supervised approach with Modulated Object Queries for Table Detection in Documents [12.042768320132694]
This study presents an innovative transformer-based semi-supervised table detector. It improves the quality of pseudo-labels through a novel matching strategy. It achieves new state-of-the-art results, with mAPs of 95.7% and 97.9% on TableBank (word) and PubLayNet, respectively, with 30% labeled data.
arXiv Detail & Related papers (2024-05-08T11:24:57Z)
Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection [12.417754433715903]
We introduce Sparse Semi-DETR, a novel transformer-based, end-to-end semi-supervised object detection solution. Sparse Semi-DETR incorporates a Query Refinement Module to enhance the quality of object queries, significantly improving detection capabilities for small and partially obscured objects. On the MS-COCO and Pascal VOC object detection benchmarks, Sparse Semi-DETR achieves a significant improvement over current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T10:22:23Z)
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions. Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
RQFormer: Rotated Query Transformer for End-to-End Oriented Object Detection [26.37802649901314]
Oriented object detection presents a challenging task due to the presence of object instances with multiple orientations, varying scales, and dense distributions. We propose an end-to-end oriented detector called the Rotated Query Transformer, which integrates two key technologies. Experiments conducted on four remote sensing datasets and one scene text dataset demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-11-29T13:43:17Z)
Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process. We demonstrate that our approach achieves a significant improvement in MSCOCO datasets compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z)
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention to undisciplined over-smoothing. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z)
Towards End-to-End Semi-Supervised Table Detection with Deformable Transformer [11.648151981111436]
Table detection is the task of classifying and localizing table objects within document images. Many semi-supervised approaches have been introduced to mitigate the need for a substantial amount of labeled data. This paper presents a novel end-to-end semi-supervised table detection method that employs the deformable transformer for detecting table objects.
arXiv Detail & Related papers (2023-05-04T12:15:15Z)
AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism [64.70568612993416]
We formulate a new task Livestream Highlight Detection, discuss and analyze the difficulties listed above and propose a novel architecture AntPivot to solve this problem. We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z)
End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision. We present a simple yet effective approach, termed FairMOT, based on the anchor-free object detection architecture CenterNet. The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.