Detection Transformer with Stable Matching
- URL: http://arxiv.org/abs/2304.04742v1
- Date: Mon, 10 Apr 2023 17:55:37 GMT
- Title: Detection Transformer with Stable Matching
- Authors: Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng
Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang
- Abstract summary: We show that the most important design is to use, and only use, positional metrics to supervise the classification scores of positive examples.
Under this principle, we propose two simple yet effective modifications that integrate positional metrics into DETR's classification loss and matching cost.
We achieve 50.4 and 51.5 AP on the COCO detection benchmark with a ResNet-50 backbone under 12-epoch and 24-epoch training settings.
- Score: 48.963171068785435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is concerned with the matching stability problem across different
decoder layers in DEtection TRansformers (DETR). We point out that the unstable
matching in DETR is caused by a multi-optimization path problem, which is
highlighted by the one-to-one matching design in DETR. To address this problem,
we show that the most important design is to use, and only use, positional
metrics (like IoU) to supervise the classification scores of positive examples.
Under this principle, we propose two simple yet effective modifications that
integrate positional metrics into DETR's classification loss and matching cost,
named the position-supervised loss and position-modulated cost. We verify our
methods on several DETR variants, and they show consistent improvements over
the baselines. By integrating our methods with DINO, we achieve 50.4 and 51.5
AP on the COCO detection benchmark with a ResNet-50 backbone under 12-epoch and
24-epoch training settings, setting a new record under the same setting. We
achieve 63.8 AP on COCO detection test-dev with a Swin-Large backbone. Our code
will be made available at https://github.com/IDEA-Research/Stable-DINO.
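As a rough illustration of the principle above, the sketch below replaces the hard positive classification target of 1 with the predicted box's IoU against its matched ground-truth box, so a query must both localize well and score high to keep the loss low. This is a simplified BCE variant, not the paper's exact formulation (which builds on a focal-style loss); all function and variable names here are my own.

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def position_supervised_loss(scores, pred_boxes, gt_boxes, matching):
    """BCE where each positive query's classification target is its IoU
    with the matched ground-truth box rather than a hard label of 1."""
    eps = 1e-8
    loss = 0.0
    for q, s in enumerate(scores):
        if q in matching:  # positive query: soft IoU target
            t = iou(pred_boxes[q], gt_boxes[matching[q]])
            loss += -(t * np.log(s + eps) + (1 - t) * np.log(1 - s + eps))
        else:              # negative query: plain background BCE
            loss += -np.log(1 - s + eps)
    return loss / max(len(matching), 1)
```

Under this target, a confidently scored but poorly localized positive incurs a larger loss than a well-localized one, which is the correlation between classification score and localization quality that the paper's position-modulated matching cost also encourages.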
Related papers
- Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking of its bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
- End-to-End Lane detection with One-to-Several Transformer [6.79236957488334]
O2SFormer converges 12.5x faster than DETR for the ResNet18 backbone.
O2SFormer with ResNet50 backbone achieves 77.83% F1 score on CULane dataset, outperforming existing Transformer-based and CNN-based detectors.
arXiv Detail & Related papers (2023-05-01T06:07:11Z)
- Align-DETR: Improving DETR with Simple IoU-aware BCE loss [32.13866392998818]
We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.
The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision.
To overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism.
arXiv Detail & Related papers (2023-04-15T10:24:51Z)
- Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection [28.40887130075552]
Inconsistent pseudo-targets undermine the training of an accurate detector.
They inject noise into the student's training, leading to severe overfitting.
We propose a systematic solution, termed ConsistentTeacher, to reduce the inconsistency.
arXiv Detail & Related papers (2022-09-04T10:21:02Z)
- Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion [95.7732308775325]
The DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.
DETR suffers from slow training convergence, which hinders its applicability to various detection tasks.
We design Semantic-Aligned-Matching DETR++ to accelerate DETR's convergence and improve detection performance.
arXiv Detail & Related papers (2022-07-28T15:34:29Z)
- DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR [37.61768722607528]
We present a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer).
This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer.
It leads to the best performance on MS-COCO benchmark among the DETR-like detection models under the same setting.
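The layer-by-layer anchor update can be sketched as follows. This is a minimal illustration, not DAB-DETR's implementation: `layer_heads` stands in for the decoder layers' box-offset prediction heads (hypothetical callables), and the update in inverse-sigmoid space mirrors the common DETR-style iterative box refinement.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inverse_sigmoid(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def refine_anchor_boxes(anchors, layer_heads):
    """Each decoder layer predicts an offset in inverse-sigmoid space;
    the refined (normalized) box coordinates become the query for the
    next layer."""
    boxes = np.asarray(anchors, dtype=float)  # (num_queries, 4): cx, cy, w, h in [0, 1]
    for head in layer_heads:
        boxes = sigmoid(inverse_sigmoid(boxes) + head(boxes))
    return boxes
```

Updating in inverse-sigmoid space keeps the normalized coordinates inside (0, 1) after every layer while leaving the per-layer offsets unconstrained.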
arXiv Detail & Related papers (2022-01-28T18:51:09Z)
- Fast Convergence of DETR with Spatially Modulated Co-Attention [83.19863907905666]
We propose a simple yet effective scheme for improving the Detection Transformer framework, namely Spatially Modulated Co-Attention (SMCA) mechanism.
Our proposed SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder.
Our fully-fledged SMCA can achieve better performance compared to DETR with a dilated convolution-based backbone.
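The spatial modulation idea can be sketched as reweighting each query's attention map with a Gaussian prior around its predicted box center. SMCA itself adds a log-Gaussian term to the attention logits before the softmax; the post-softmax reweighting below is an illustrative simplification, and all names are my own.

```python
import numpy as np

def smca_modulate(attn, centers, sigma, H, W):
    """Reweight each query's attention over an H x W feature grid with a
    Gaussian centered on that query's predicted box center, then
    renormalize so each map still sums to 1."""
    ys, xs = np.mgrid[0:H, 0:W]
    out = np.empty_like(attn, dtype=float)
    for q, (cx, cy) in enumerate(centers):
        gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        weighted = attn[q].reshape(H, W) * gauss
        out[q] = (weighted / weighted.sum()).ravel()
    return out
```

The prior concentrates each query's attention near its current box estimate, which is the mechanism credited with the faster convergence reported above.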
arXiv Detail & Related papers (2021-08-05T06:53:19Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.