Align-DETR: Improving DETR with Simple IoU-aware BCE loss
- URL: http://arxiv.org/abs/2304.07527v1
- Date: Sat, 15 Apr 2023 10:24:51 GMT
- Title: Align-DETR: Improving DETR with Simple IoU-aware BCE loss
- Authors: Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang and Di
Huang
- Abstract summary: We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.
The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision.
To overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism.
- Score: 32.13866392998818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DETR has set up a simple end-to-end pipeline for object detection by
formulating this task as a set prediction problem, showing promising potential.
However, despite the significant progress in improving DETR, this paper
identifies a problem of misalignment in the output distribution, which prevents
the best-regressed samples from being assigned with high confidence, hindering
the model's accuracy. We propose a metric, recall of best-regressed samples, to
quantitatively evaluate the misalignment problem. Observing its importance, we
propose a novel Align-DETR that incorporates a localization precision-aware
classification loss in optimization. The proposed loss, IA-BCE, guides the
training of DETR to build a strong correlation between classification score and
localization precision. We also adopt a mixed-matching strategy to give
DETR-based detectors faster training convergence while keeping an end-to-end
scheme. Moreover, to overcome the dramatic decrease in sample
quality induced by the sparsity of queries, we introduce a prime sample
weighting mechanism to suppress the interference of unimportant samples.
Extensive experiments are conducted with very competitive results reported. In
particular, it delivers 46.0% AP (+3.8%) on the DAB-DETR baseline with the
ResNet-50 backbone and reaches a new SOTA performance of 50.2% AP in the 1x
setting on the COCO validation set when employing the strong baseline DINO. Our
code is available at https://github.com/FelixCaae/AlignDETR.
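The core idea, coupling the classification target to localization quality, can be made concrete with a short sketch. The snippet below is a hypothetical reading of IA-BCE, assuming a TOOD-style fused target t = s^α · u^(1−α) for positive queries, a focal-style down-weight s^γ for negatives, and an exponentially decaying weight over each ground truth's matched queries as the prime sample weighting; the exact formulation and constants are in the paper and the linked repository.

```python
# Hypothetical sketch of an IoU-aware BCE (IA-BCE) classification loss.
# Assumptions (not taken verbatim from the paper): positives use a fused
# soft target t = s**alpha * u**(1 - alpha), negatives get a focal-style
# down-weight s**gamma, and queries matched to one GT are re-weighted by
# an exponentially decaying function of their IoU rank ("prime sample
# weighting"). See https://github.com/FelixCaae/AlignDETR for the exact recipe.
import torch
import torch.nn.functional as F


def ia_bce_loss(scores, ious, pos_mask, alpha=0.25, gamma=2.0):
    """scores:   (N,) predicted class probabilities (after sigmoid).
    ious:     (N,) IoU of each predicted box with its assigned GT (0 for negatives).
    pos_mask: (N,) bool, True for queries matched to a ground-truth box."""
    s = scores.clamp(1e-6, 1 - 1e-6)
    # Soft target couples the classification score with localization quality.
    t = torch.zeros_like(s)
    t[pos_mask] = s[pos_mask].detach() ** alpha * ious[pos_mask] ** (1 - alpha)
    # Plain BCE on positives, focal-style down-weighting on negatives.
    w = torch.where(pos_mask, torch.ones_like(s), s.detach() ** gamma)
    return (w * F.binary_cross_entropy(s, t, reduction="none")).sum()


def prime_sample_weights(ious_per_gt, tau=1.5):
    """Down-weight lower-quality duplicates among the k queries matched to
    one GT under mixed (one-to-many) matching. ious_per_gt: (k,)."""
    rank = ious_per_gt.argsort(descending=True).argsort().float()
    return torch.exp(-rank / tau)
```

Because the target t rises with IoU, a query can only earn a high classification score by also regressing a tight box, which is exactly the score-precision correlation the abstract describes.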
Related papers
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection [26.03892270020559]
We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-16T13:17:07Z)
- Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement [19.277560848076984]
Two-stage selection strategies result in scale bias and redundancy due to a mismatch between selected queries and objects.
We propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries.
The proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets.
arXiv Detail & Related papers (2024-03-24T13:01:57Z)
- Theoretically Achieving Continuous Representation of Oriented Bounding Boxes [64.15627958879053]
This paper endeavors to completely solve the issue of discontinuity in Oriented Bounding Box representation.
We propose a novel representation method called Continuous OBB (COBB) which can be readily integrated into existing detectors.
For fairness and transparency of experiments, we have developed a modularized benchmark for oriented object detection (OOD) evaluation based on JDet, the detection toolbox of the open-source deep learning framework Jittor.
arXiv Detail & Related papers (2024-02-29T09:27:40Z)
- End-to-End Lane detection with One-to-Several Transformer [6.79236957488334]
O2SFormer converges 12.5x faster than DETR for the ResNet18 backbone.
O2SFormer with ResNet50 backbone achieves 77.83% F1 score on CULane dataset, outperforming existing Transformer-based and CNN-based detectors.
arXiv Detail & Related papers (2023-05-01T06:07:11Z)
- Detection Transformer with Stable Matching [48.963171068785435]
We show that the most important design is to use and only use positional metrics to supervise classification scores of positive examples.
Under the principle, we propose two simple yet effective modifications by integrating positional metrics to DETR's classification loss and matching cost.
We achieve 50.4 and 51.5 AP on the COCO detection benchmark using ResNet-50 backbones under 12 epochs and 24 epochs training settings.
arXiv Detail & Related papers (2023-04-10T17:55:37Z)
- DETRs with Hybrid Matching [21.63116788914251]
One-to-one set matching is a key design for DETR to establish its end-to-end capability.
We propose a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training (see the sketch after this list).
arXiv Detail & Related papers (2022-07-26T17:52:14Z)
- Accelerating DETR Convergence via Semantic-Aligned Matching [50.3633635846255]
This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR's convergence without sacrificing its accuracy.
It explicitly searches salient points with the most discriminative features for semantic-aligned matching, which further speeds up the convergence and boosts detection accuracy as well.
arXiv Detail & Related papers (2022-03-14T06:50:51Z)
- Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors relies on many conjunctions that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)
- Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z)
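Both Align-DETR's mixed matching and the hybrid matching scheme in the list above share one mechanism: each ground-truth box supervises several queries during training, while inference keeps only the one-to-one predictions. The sketch below is illustrative only, assuming a Hungarian primary assignment plus a top-k-by-cost auxiliary assignment; neither the cost definition nor k is taken from either paper.

```python
# Illustrative sketch of one-to-many ("mixed"/"hybrid") matching layered on
# the standard Hungarian one-to-one assignment. The cost matrix and k are
# assumptions for illustration, not the exact recipe of either paper.
import torch
from scipy.optimize import linear_sum_assignment


def one_to_many_assign(cost, k=3):
    """cost: (num_queries, num_gts) matching cost (lower is better).
    Returns (primary, auxiliary) lists of (query_idx, gt_idx) pairs."""
    q_idx, g_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    primary = list(zip(q_idx.tolist(), g_idx.tolist()))
    taken = set(q_idx.tolist())
    auxiliary = []
    for g in range(cost.shape[1]):
        # Each GT also supervises its k-1 next-cheapest unmatched queries.
        order = torch.argsort(cost[:, g])
        extra = [q.item() for q in order if q.item() not in taken][: k - 1]
        auxiliary += [(q, g) for q in extra]
        taken.update(extra)
    return primary, auxiliary
```

Only the primary pairs are used at inference, so the end-to-end, NMS-free property is preserved; the auxiliary pairs merely densify supervision during training, which is what speeds up convergence.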
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.