Hausdorff Distance Matching with Adaptive Query Denoising for Rotated
Detection Transformer
- URL: http://arxiv.org/abs/2305.07598v4
- Date: Wed, 29 Nov 2023 08:56:29 GMT
- Authors: Hakjin Lee, Minki Song, Jamyoung Koo, Junghoon Seo
- Abstract summary: The application of DETR in detecting rotated objects has demonstrated suboptimal performance relative to established oriented object detectors.
We introduce a Hausdorff distance-based cost for Hungarian matching, which more accurately quantifies the discrepancy between predictions and ground truths.
We propose an adaptive query denoising technique, employing Hungarian matching to selectively filter out superfluous noised queries that no longer contribute to model improvement.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Detection Transformer (DETR) has come to play a pivotal role in object
detection tasks, setting new performance benchmarks due to its end-to-end
design and scalability. Despite its advancements, the application of DETR in
detecting rotated objects has demonstrated suboptimal performance relative to
established oriented object detectors. Our analysis identifies a key
limitation: the L1 cost used in Hungarian Matching leads to duplicate
predictions due to the square-like problem in oriented object detection,
thereby obstructing the training process of the detector. We introduce a
Hausdorff distance-based cost for Hungarian matching, which more accurately
quantifies the discrepancy between predictions and ground truths. Moreover, we
note that a static denoising approach hampers the training of rotated DETR,
particularly when the detector's predictions surpass the quality of noised
ground truths. We propose an adaptive query denoising technique, employing
Hungarian matching to selectively filter out superfluous noised queries that no
longer contribute to model improvement. Our proposed modifications to DETR have
resulted in superior performance, surpassing previous rotated DETR models and
other alternatives. This is evidenced by our model's state-of-the-art
achievements in benchmarks such as DOTA-v1.0/v1.5/v2.0, and DIOR-R.
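The contrast between an L1 matching cost and a Hausdorff-based one can be illustrated with a small sketch. It is not the authors' implementation. It assumes boxes are given as `(cx, cy, w, h, theta)` with `theta` in radians, approximates the Hausdorff distance using only the four box corners, and replaces Hungarian matching with a brute-force search over assignments, which gives the same optimum for tiny inputs. All function names are hypothetical.

```python
import math
from itertools import permutations


def corners(cx, cy, w, h, theta):
    """Four corner points of a rotated box (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    offsets = ((w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def l1_cost(box_a, box_b):
    """Plain L1 distance on the raw (cx, cy, w, h, theta) parameters."""
    return sum(abs(x - y) for x, y in zip(box_a, box_b))


def hausdorff_cost(box_a, box_b):
    """Corner-based Hausdorff cost (a simplification, see the lead-in)."""
    return hausdorff(corners(*box_a), corners(*box_b))


def match(preds, gts, cost_fn):
    """Brute-force optimal one-to-one assignment, standing in for Hungarian
    matching. Returns a list where entry g is the prediction index for gts[g]."""
    def total(perm):
        return sum(cost_fn(preds[p], gts[g]) for g, p in enumerate(perm))
    return list(min(permutations(range(len(preds)), len(gts)), key=total))


# A square ground truth and two predictions: one slightly shifted, and one that
# covers exactly the same region but is parameterized with a 90-degree rotation.
gt = (0.0, 0.0, 10.0, 10.0, 0.0)
preds = [(0.5, 0.5, 10.0, 10.0, 0.0), (0.0, 0.0, 10.0, 10.0, math.pi / 2)]

l1_choice = match(preds, [gt], l1_cost)                # picks the shifted box
hausdorff_choice = match(preds, [gt], hausdorff_cost)  # picks the identical region
```

In this toy case the L1 cost penalizes the angle difference of a box that occupies the same region as the ground truth, so it prefers the shifted prediction, whereas the corner-based Hausdorff cost is approximately zero for the rotated square. This is one way to picture the square-like problem the abstract refers to.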
Related papers
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection [26.03892270020559]
We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-16T13:17:07Z)
- TD$^2$-Net: Toward Denoising and Debiasing for Dynamic Scene Graph Generation [76.24766055944554]
We introduce a network named TD$^2$-Net that aims at denoising and debiasing for dynamic SGG.
TD$^2$-Net outperforms the second-best competitors by 12.7% on mean-Recall@10 for predicate classification.
arXiv Detail & Related papers (2024-01-23T04:17:42Z)
- Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking for the bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Analyzing and Mitigating Interference in Neural Architecture Search [96.60805562853153]
We investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators.
Inspired by these two observations, we propose two approaches to mitigate the interference.
Our searched architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 scores and ELECTRA$_{\rm base}$ by 1.6 and 1.1 scores on the dev and test sets of the GLUE benchmark.
arXiv Detail & Related papers (2021-08-29T11:07:46Z)
- Which to Match? Selecting Consistent GT-Proposal Assignment for Pedestrian Detection [23.92066492219922]
The fixed Intersection over Union (IoU) based assignment-regression manner still limits their performance.
We introduce a geometry-sensitive search algorithm as a new assignment and regression metric.
Specifically, we boost the MR-FPPI under R$_{75}$ by 8.8% on the Citypersons dataset.
arXiv Detail & Related papers (2021-03-18T08:54:51Z)
- SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of a one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it presents state-of-the-art results and a real-time speed of 20 FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
- A Systematic Evaluation of Object Detection Networks for Scientific Plots [17.882932963813985]
We train and compare the accuracy of various SOTA object detection networks on the PlotQA dataset.
At the standard IOU setting of 0.5, most networks perform well with mAP scores greater than 80% in detecting the relatively simple objects in plots.
However, the performance drops drastically when evaluated at a stricter IOU of 0.9 with the best model giving a mAP of 35.70%.
arXiv Detail & Related papers (2020-07-05T05:30:53Z)
- Detection in Crowded Scenes: One Proposal, Multiple Predictions [79.28850977968833]
We propose a proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.
The key to our approach is to let each proposal predict a set of correlated instances rather than a single one, as in previous proposal-based frameworks.
Our detector can obtain 4.9% AP gains on the challenging CrowdHuman dataset and 1.0% $\text{MR}^{-2}$ improvements on the CityPersons dataset.
arXiv Detail & Related papers (2020-03-20T09:48:53Z)