Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale
Feature Fusion
- URL: http://arxiv.org/abs/2207.14172v1
- Date: Thu, 28 Jul 2022 15:34:29 GMT
- Title: Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale
Feature Fusion
- Authors: Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Jiaxing Huang, Kaiwen Cui,
Shijian Lu, Eric P. Xing
- Abstract summary: The proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.
DETR suffers from slow training convergence, which hinders its applicability to various detection tasks.
We design Semantic-Aligned-Matching DETR++ to accelerate DETR's convergence and improve detection performance.
- Score: 95.7732308775325
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The recently proposed DEtection TRansformer (DETR) has established a fully
end-to-end paradigm for object detection. However, DETR suffers from slow
training convergence, which hinders its applicability to various detection
tasks. We observe that DETR's slow convergence is largely attributed to the
difficulty in matching object queries to relevant regions due to the unaligned
semantics between object queries and encoded image features. With this
observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to
accelerate DETR's convergence and improve detection performance. The core of
SAM-DETR++ is a plug-and-play module that projects object queries and encoded
image features into the same feature embedding space, where each object query
can be easily matched to relevant regions with similar semantics. Besides,
SAM-DETR++ searches for multiple representative keypoints and exploits their
features for semantic-aligned matching with enhanced representation capacity.
Furthermore, SAM-DETR++ can effectively fuse multi-scale features in a
coarse-to-fine manner on the basis of the designed semantic-aligned matching.
Extensive experiments show that the proposed SAM-DETR++ achieves superior
convergence speed and competitive detection accuracy. Additionally, as a
plug-and-play method, SAM-DETR++ can complement existing DETR convergence
solutions with even better performance, achieving 44.8% AP with merely 12
training epochs and 49.1% AP with 50 training epochs on COCO val2017 with
ResNet-50. Code is available at https://github.com/ZhangGongjie/SAM-DETR.
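To illustrate the core idea described in the abstract, below is a minimal PyTorch-style sketch of semantic-aligned matching. The module name, tensor shapes, and the single-scale, single-keypoint simplification are assumptions for illustration only; this is not the authors' released implementation (see the repository linked above for the official code).

```python
# Minimal sketch of semantic-aligned matching (hypothetical names/shapes),
# not the official SAM-DETR++ implementation.
import torch
import torch.nn as nn


class SemanticAlignedMatching(nn.Module):
    """Project object queries and encoded image features into a shared
    embedding space, then re-sample each query from spatial regions with
    similar semantics (single-scale, single-keypoint simplification)."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)    # queries -> shared space
        self.feature_proj = nn.Linear(d_model, d_model)  # image features -> shared space

    def forward(self, queries: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # queries: (num_queries, batch, d_model) -- decoder object queries
        # memory:  (h*w, batch, d_model)         -- flattened encoded image features
        q = self.query_proj(queries)   # align query semantics
        k = self.feature_proj(memory)  # align image-feature semantics
        # Similarity between every query and every spatial location.
        attn = torch.einsum("qbd,hbd->qbh", q, k) / q.shape[-1] ** 0.5
        attn = attn.softmax(dim=-1)
        # New query embeddings gathered from semantically similar regions.
        return torch.einsum("qbh,hbd->qbd", attn, memory)


if __name__ == "__main__":
    sam = SemanticAlignedMatching(d_model=256)
    queries = torch.randn(100, 2, 256)  # 100 object queries, batch of 2
    memory = torch.randn(625, 2, 256)   # 25x25 feature map, flattened
    print(sam(queries, memory).shape)   # torch.Size([100, 2, 256])
```

In the full method, this matching is repeated with multiple representative keypoints per query and applied across feature scales in a coarse-to-fine manner; the sketch keeps only the shared-space projection and similarity-based matching step.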
Related papers
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection [26.03892270020559]
We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-16T13:17:07Z)
- Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework for semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
- Pair DETR: Contrastive Learning Speeds Up DETR Training [0.6491645162078056]
We present a simple approach to address the main problem of DETR: slow convergence.
We detect an object bounding box as a pair of keypoints, the top-left corner and the center, using two decoders.
Experiments show that Pair DETR can converge at least 10x faster than the original DETR and 1.5x faster than Conditional DETR during training.
arXiv Detail & Related papers (2022-10-29T03:02:49Z)
- DETRs with Hybrid Matching [21.63116788914251]
One-to-one set matching is a key design for DETR to establish its end-to-end capability.
We propose a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training.
arXiv Detail & Related papers (2022-07-26T17:52:14Z)
- Accelerating DETR Convergence via Semantic-Aligned Matching [50.3633635846255]
This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR's convergence without sacrificing its accuracy.
It explicitly searches salient points with the most discriminative features for semantic-aligned matching, which further speeds up the convergence and boosts detection accuracy as well.
arXiv Detail & Related papers (2022-03-14T06:50:51Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
- Conditional DETR for Fast Training Convergence [76.95358216461524]
We present a conditional cross-attention mechanism for fast DETR training.
Our approach is motivated by the observation that the cross-attention in DETR relies heavily on the content embeddings for localizing the four extremities.
We show that conditional DETR converges 6.7x faster for the backbones R50 and R101 and 10x faster for stronger backbones DC5-R50 and DC5-R101.
arXiv Detail & Related papers (2021-08-13T10:07:46Z)
- Fast Convergence of DETR with Spatially Modulated Co-Attention [83.19863907905666]
We propose a simple yet effective scheme for improving the Detection Transformer framework, namely Spatially Modulated Co-Attention (SMCA) mechanism.
Our proposed SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder.
Our fully-fledged SMCA can achieve better performance compared to DETR with a dilated convolution-based backbone.
arXiv Detail & Related papers (2021-08-05T06:53:19Z)