Rethinking Transformer-based Set Prediction for Object Detection
- URL: http://arxiv.org/abs/2011.10881v2
- Date: Tue, 12 Oct 2021 06:09:03 GMT
- Title: Rethinking Transformer-based Set Prediction for Object Detection
- Authors: Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani
- Abstract summary: Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
- Score: 57.7208561353529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DETR is a recently proposed Transformer-based method which views object
detection as a set prediction problem and achieves state-of-the-art performance
but demands extra-long training time to converge. In this paper, we investigate
the causes of the optimization difficulty in the training of DETR. Our
examinations reveal several factors contributing to the slow convergence of
DETR, primarily the issues with the Hungarian loss and the Transformer
cross-attention mechanism. To overcome these issues we propose two solutions,
namely, TSP-FCOS (Transformer-based Set Prediction with FCOS) and TSP-RCNN
(Transformer-based Set Prediction with RCNN). Experimental results show that
the proposed methods not only converge much faster than the original DETR, but
also significantly outperform DETR and other baselines in terms of detection
accuracy.
Related papers
- MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public.
ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance.
This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z) - Cal-DETR: Calibrated Detection Transformer [67.75361289429013]
We propose a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO.
We develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits.
Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections.
arXiv Detail & Related papers (2023-11-06T22:13:10Z) - Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking for the bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z) - Object Detection with Transformers: A Review [11.255962936937744]
This paper provides a comprehensive review of 21 recently proposed advancements in the original DETR model.
We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures.
We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
arXiv Detail & Related papers (2023-06-07T16:13:38Z) - Align-DETR: Improving DETR with Simple IoU-aware BCE loss [32.13866392998818]
We propose a metric, recall of best-regressed samples, to quantitively evaluate the misalignment problem.
The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision.
To overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism.
arXiv Detail & Related papers (2023-04-15T10:24:51Z) - Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z) - UP-DETR: Unsupervised Pre-training for Object Detection with
Transformers [11.251593386108189]
We propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR)
Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder.
UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation.
arXiv Detail & Related papers (2020-11-18T05:16:11Z) - Deformable DETR: Deformable Transformers for End-to-End Object Detection [41.050320861408046]
DETR suffers from slow convergence and limited feature spatial resolution.
We propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference.
Deformable DETR can achieve better performance than DETR with 10 times less training epochs.
arXiv Detail & Related papers (2020-10-08T17:59:21Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.