Object Detection with Transformers: A Review
- URL: http://arxiv.org/abs/2306.04670v3
- Date: Mon, 10 Jul 2023 16:41:15 GMT
- Title: Object Detection with Transformers: A Review
- Authors: Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker and Muhammad
Zeshan Afzal
- Abstract summary: This paper provides a comprehensive review of 21 recently proposed advancements in the original DETR model.
We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures.
We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
- Score: 11.255962936937744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The astounding performance of transformers in natural language processing
(NLP) has motivated researchers to explore their applications in computer
vision tasks. DEtection TRansformer (DETR) introduces transformers to object
detection by reframing detection as a set prediction problem, thereby
eliminating the need for proposal generation and post-processing steps.
Despite initially competitive performance, DETR suffered from slow training
convergence and ineffective detection of smaller objects. However, numerous
improvements have since been proposed to address these issues, yielding
substantial gains and enabling DETR to exhibit state-of-the-art
performance. To our knowledge, this is the first paper to provide a
comprehensive review of 21 recently proposed advancements in the original DETR
model. We dive into both the foundational modules of DETR and its recent
enhancements, such as modifications to the backbone structure, query design
strategies, and refinements to attention mechanisms. Moreover, we conduct a
comparative analysis across various detection transformers, evaluating their
performance and network architectures. We hope that this study will ignite
further interest among researchers in addressing the existing challenges and
exploring the application of transformers in the object detection domain.
Readers interested in the ongoing developments in detection transformers can
refer to our website at:
https://github.com/mindgarage-shan/trans_object_detection_survey
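The set-prediction reformulation at the heart of DETR pairs each learned query with at most one ground-truth object via bipartite (Hungarian) matching before the loss is computed. A minimal sketch of that matching step follows; the cost weights, toy inputs, and function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_boxes, pred_logits, gt_boxes, gt_labels, box_weight=5.0):
    """One-to-one matching between predicted queries and ground-truth objects.

    The cost combines the negative class probability with an L1 box distance,
    mirroring DETR's set-prediction matching (weights are illustrative).
    Returns a list of (query_index, ground_truth_index) pairs.
    """
    # Softmax over class logits, per query.
    probs = np.exp(pred_logits) / np.exp(pred_logits).sum(axis=1, keepdims=True)
    class_cost = -probs[:, gt_labels]                               # (num_queries, num_gt)
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    cost = class_cost + box_weight * box_cost
    query_idx, gt_idx = linear_sum_assignment(cost)                 # Hungarian algorithm
    return list(zip(query_idx.tolist(), gt_idx.tolist()))

# Toy example: 3 queries, 2 ground-truth objects (cx, cy, w, h in [0, 1]).
pred_boxes = np.array([[0.5, 0.5, 0.2, 0.2],
                       [0.1, 0.1, 0.3, 0.3],
                       [0.9, 0.9, 0.1, 0.1]])
pred_logits = np.array([[2.0, 0.1], [0.1, 2.0], [1.0, 1.0]])
gt_boxes = np.array([[0.5, 0.5, 0.2, 0.2],
                     [0.1, 0.1, 0.3, 0.3]])
gt_labels = np.array([0, 1])
print(match_predictions(pred_boxes, pred_logits, gt_boxes, gt_labels))  # [(0, 0), (1, 1)]
```

Unmatched queries (here, query 2) are supervised toward a "no object" class, which is what removes the need for non-maximum suppression.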
Related papers
- Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art [34.077422623505804]
Transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset.
Small objects have been identified as one of the most challenging object types in detection frameworks.
This survey presents a taxonomy of over 60 research studies on developed transformers for the task of small object detection.
arXiv Detail & Related papers (2023-09-10T00:08:29Z)
- Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images [11.648151981111436]
This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection.
We modify object queries in different ways, using points, anchor boxes and adding positive and negative noise to the anchors to boost performance.
We evaluate our approach on four graphical object detection datasets: PubTables, TableBank, NTable, and PubLayNet.
arXiv Detail & Related papers (2023-06-23T14:46:03Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing.
Our survey identifies the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Vision Transformers for Action Recognition: A Survey [41.69370782177517]
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have proven the efficacy of transformers beyond the image domain to solve numerous video-related tasks.
Human action recognition is receiving special attention from the research community due to its widespread applications.
arXiv Detail & Related papers (2022-09-13T02:57:05Z)
- Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection [119.93025368028083]
We design a novel Transformer-style Human-Object Interaction (HOI) detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP).
STIP decomposes HOI set prediction into two phases: interaction proposal generation is performed first, followed by transforming the non-parametric interaction proposals into HOI predictions via a structure-aware Transformer.
The structure-aware Transformer upgrades the vanilla Transformer by additionally encoding the holistic semantic structure among interaction proposals as well as the local spatial structure of the human/object within each interaction proposal, so as to strengthen HOI predictions.
arXiv Detail & Related papers (2022-06-13T16:21:08Z)
- Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence [17.854940064699985]
We propose a transformer architecture with a mitigatory self-attention mechanism.
Miti-DETR forwards the inputs of each attention layer to that layer's outputs, so that the "non-attention" information participates in attention propagation.
Miti-DETR significantly enhances average detection precision and convergence speed relative to existing DETR-based models.
arXiv Detail & Related papers (2021-12-26T03:23:59Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
- Rethinking Transformer-based Set Prediction for Object Detection [57.7208561353529]
Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
arXiv Detail & Related papers (2020-11-21T21:59:42Z)
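The "mitigatory self-attention" idea summarized for Miti-DETR above amounts to carrying each attention layer's raw input forward alongside its attended output. A minimal NumPy sketch of one such layer; the layer sizes, function names, and the exact combination rule are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_input_path(x, wq, wk, wv):
    """Self-attention layer that adds its raw input to the attended output,
    so the "non-attention" signal keeps propagating through the stack
    (a sketch of the Miti-DETR idea, not the paper's exact mechanism)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (tokens, tokens)
    return attn @ v + x                             # attended output + reserved input path

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))                    # 4 tokens, dimension 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention_with_input_path(tokens, wq, wk, wv)
print(out.shape)  # (4, 8)
```

Because the input is added back unconditionally, early layers cannot fully attenuate the original token features, which is the intuition behind the reported convergence speed-up.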
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.