Bridging the Performance Gap between DETR and R-CNN for Graphical Object
Detection in Document Images
- URL: http://arxiv.org/abs/2306.13526v1
- Date: Fri, 23 Jun 2023 14:46:03 GMT
- Title: Bridging the Performance Gap between DETR and R-CNN for Graphical Object
Detection in Document Images
- Authors: Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki
and Muhammad Zeshan Afzal
- Abstract summary: This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection.
We modify object queries in different ways, using points, anchor boxes and adding positive and negative noise to the anchors to boost performance.
We evaluate our approach on the four graphical datasets: PubTables, TableBank, NTable and PubLaynet.
- Score: 11.648151981111436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper takes an important step in bridging the performance gap between
DETR and R-CNN for graphical object detection. Existing graphical object
detection approaches have enjoyed recent enhancements in CNN-based object
detection methods, achieving remarkable progress. Recently, Transformer-based
detectors have considerably boosted the generic object detection performance,
eliminating the need for hand-crafted features or post-processing steps such as
Non-Maximum Suppression (NMS) using object queries. However, the effectiveness
of such enhanced transformer-based detection algorithms has yet to be verified
for the problem of graphical object detection. Essentially, inspired by the
latest advancements in the DETR, we employ the existing detection transformer
with few modifications for graphical object detection. We modify object queries
in different ways, using points, anchor boxes and adding positive and negative
noise to the anchors to boost performance. These modifications allow for better
handling of objects with varying sizes and aspect ratios, more robustness to
small variations in object positions and sizes, and improved image
discrimination between objects and non-objects. We evaluate our approach on the
four graphical datasets: PubTables, TableBank, NTable and PubLaynet. Upon
integrating query modifications in the DETR, we outperform prior works and
achieve new state-of-the-art results with mAP of 96.9%, 95.7% and 99.3%
on TableBank, PubLaynet and PubTables, respectively. The results from extensive
ablations show that transformer-based methods are more effective for document
analysis, as they are in other applications. We hope this study draws more
attention to the research of using detection transformers in document image
analysis.
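As a rough illustration of the query modification described in the abstract, the sketch below perturbs one anchor box to produce a "positive" (lightly noised) and a "negative" (more heavily noised) query. The abstract does not give the exact noise scheme, so everything here is an assumption: the function name `noised_anchors`, the `noise_scale` parameter, the normalized (cx, cy, w, h) box format, and the choice of noise ranges, which loosely follow the contrastive denoising idea used in some DETR variants. It is not the authors' implementation.

```python
import random


def noised_anchors(box, noise_scale=0.4, rng=None):
    """Return (positive, negative) noised copies of a normalized (cx, cy, w, h) box.

    Hypothetical helper: the noise ranges are illustrative assumptions,
    not values taken from the paper.
    """
    rng = rng or random.Random()
    cx, cy, w, h = box

    def jitter(lo, hi):
        # Shift the centre by a fraction of the half-size and rescale the
        # width and height, with a random sign per coordinate.
        out = []
        for value, extent in ((cx, w / 2), (cy, h / 2), (w, w), (h, h)):
            sign = rng.choice((-1.0, 1.0))
            delta = sign * rng.uniform(lo, hi) * noise_scale * extent
            # Keep every coordinate inside the normalized image range.
            out.append(min(max(value + delta, 0.0), 1.0))
        return tuple(out)

    positive = jitter(0.0, 1.0)  # small shift: decoder should recover the box
    negative = jitter(1.0, 2.0)  # larger shift: decoder should predict "no object"
    return positive, negative


if __name__ == "__main__":
    pos, neg = noised_anchors((0.5, 0.5, 0.2, 0.1), rng=random.Random(0))
    print("positive query:", pos)
    print("negative query:", neg)
```

In a training setup of this kind, positive queries would be supervised to reconstruct the original box, while negative queries would be supervised toward the background class. This is one plausible reading of how noise could improve discrimination between objects and non-objects.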
Related papers
- Small Object Detection by DETR via Information Augmentation and Adaptive
Feature Fusion [4.9860018132769985]
The RT-DETR model performs well in real-time object detection, but performs poorly in small object detection accuracy.
We propose an adaptive feature fusion algorithm that assigns learnable parameters to each feature map from different levels.
This enhances the model's ability to capture object features at different scales, thereby improving the accuracy of detecting small objects.
arXiv Detail & Related papers (2024-01-16T00:01:23Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
arXiv Detail & Related papers (2023-11-14T10:07:52Z)
- Transformers in Small Object Detection: A Benchmark and Survey of
State-of-the-Art [34.077422623505804]
Transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset.
Small objects have been identified as one of the most challenging object types in detection frameworks.
This survey presents a taxonomy of over 60 research studies on developed transformers for the task of small object detection.
arXiv Detail & Related papers (2023-09-10T00:08:29Z)
- Object Detection with Transformers: A Review [11.255962936937744]
This paper provides a comprehensive review of 21 recently proposed advancements in the original DETR model.
We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures.
We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
arXiv Detail & Related papers (2023-06-07T16:13:38Z)
- Transformation-Invariant Network for Few-Shot Object Detection in Remote
Sensing Images [15.251042369061024]
Object detection in remote sensing images relies on a large amount of labeled data for training.
Scale and orientation variations of objects in remote sensing images pose significant challenges to existing FSOD methods.
We propose integrating a feature pyramid network and utilizing prototype features to enhance query features.
arXiv Detail & Related papers (2023-03-13T02:21:38Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- Decoupled Adaptation for Cross-Domain Object Detection [69.5852335091519]
Cross-domain object detection is more challenging than object classification.
D-adapt achieves state-of-the-art results on four cross-domain object detection tasks.
arXiv Detail & Related papers (2021-10-06T08:43:59Z)
- You Better Look Twice: a new perspective for designing accurate
detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from background using a very lite first-stage.
Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
arXiv Detail & Related papers (2021-03-31T13:55:56Z)