SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based
Transformer Detector for Fast Model Convergency
- URL: http://arxiv.org/abs/2211.02006v1
- Date: Thu, 3 Nov 2022 17:20:55 GMT
- Title: SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based
Transformer Detector for Fast Model Convergency
- Authors: Yang Liu, Yao Zhang, Yixin Wang, Yang Zhang, Jiang Tian, Zhongchao
Shi, Jianping Fan, Zhiqiang He
- Abstract summary: DETR-based approaches apply a central-concept spatial prior to accelerate Transformer detector convergency.
We propose SAlient Point-based DETR (SAP-DETR) by treating object detection as a transformation from salient points to instance objects.
Our experiments demonstrate that SAP-DETR achieves 1.4 times faster convergence with competitive performance.
- Score: 40.04140037952051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the dominant DETR-based approaches apply a central-concept spatial
prior to accelerate Transformer detector convergency. These methods gradually
refine the reference points to the center of target objects and imbue object
queries with the updated central reference information for spatially
conditional attention. However, centralizing reference points may severely
deteriorate queries' saliency and confuse detectors due to the indiscriminative
spatial prior. To bridge the gap between the reference points of salient
queries and Transformer detectors, we propose SAlient Point-based DETR
(SAP-DETR) by treating object detection as a transformation from salient points
to instance objects. In SAP-DETR, we explicitly initialize a query-specific
reference point for each object query, gradually aggregate them into an
instance object, and then predict the distance from each side of the bounding
box to these points. By rapidly attending to the query-specific reference region
and other conditional extreme regions from the image features, SAP-DETR can
effectively bridge the gap between the salient point and the query-based
Transformer detector with a significant convergency speedup. Our extensive
experiments have demonstrated that SAP-DETR achieves 1.4 times faster
convergence with competitive performance. Under the standard training scheme,
SAP-DETR stably promotes the SOTA approaches by 1.0 AP. Based on ResNet-DC-101,
SAP-DETR achieves 46.9 AP.
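The abstract describes parameterizing each box by a query-specific reference point plus the predicted distance from that point to each side of the bounding box. The following is a minimal sketch of that parameterization, not the authors' implementation; the function name and example values are illustrative.

```python
def box_from_salient_point(px, py, left, top, right, bottom):
    """Recover (x1, y1, x2, y2) from a salient reference point (px, py)
    and its predicted distances to the left/top/right/bottom box sides."""
    return (px - left, py - top, px + right, py + bottom)

# Example: a reference point at (50, 40) with per-side distances 10/5/20/15
# yields the box (40, 35, 70, 55).
print(box_from_salient_point(50, 40, 10, 5, 20, 15))
```

Because each query keeps its own reference point rather than collapsing to the object center, different queries attached to the same object remain spatially distinct.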
Related papers
- OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images [26.37802649901314]
Oriented object detection in remote sensing images is a challenging task because objects are distributed in multiple orientations.
We propose an end-to-end transformer-based oriented object detector consisting of three dedicated modules to address these issues.
Compared with previous end-to-end detectors, the OrientedFormer gains 1.16 and 1.21 AP$_{50}$ on DIOR-R and DOTA-v1.0 respectively, while reducing training epochs from 3$\times$ to 1$\times$.
arXiv Detail & Related papers (2024-09-29T10:36:33Z)
- SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network [32.7318504162588]
Hyperspectral target detection (HTD) aims to identify materials based on spectral information in hyperspectral imagery and can detect point targets.
Existing HTD methods are developed based on per-pixel binary classification, which limits the feature representation capability for point targets.
We propose the first specialized network for hyperspectral multi-class point object detection, SpecDETR.
We develop a simulated hyperSpectral Point Object Detection benchmark termed SPOD, and for the first time, evaluate and compare the performance of current object detection networks and HTD methods on hyperspectral multi-class point object detection.
arXiv Detail & Related papers (2024-05-16T14:45:06Z)
- Cascade-DETR: Delving into High-Quality Universal Object Detection [99.62131881419143]
We introduce Cascade-DETR for high-quality universal object detection.
We propose the Cascade Attention layer, which explicitly integrates object-centric information into the detection decoder.
Lastly, we introduce a universal object detection benchmark, UDB10, that contains 10 datasets from diverse domains.
arXiv Detail & Related papers (2023-07-20T17:11:20Z)
- Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection [35.54153749138406]
We propose a Time-rEversed diffusioN tEnsor Transformer (TENET) that captures multi-way feature occurrences that are highly discriminative.
We also propose a Transformer Relation Head (TRH) equipped with higher-order representations, which encodes correlations between query regions and the entire support set.
Our model achieves state-of-the-art results on PASCAL VOC, FSOD, and COCO.
arXiv Detail & Related papers (2022-10-30T17:40:12Z)
- Pair DETR: Contrastive Learning Speeds Up DETR Training [0.6491645162078056]
We present a simple approach to address the main problem of DETR, the slow convergence.
We detect an object bounding box as a pair of keypoints, the top-left corner and the center, using two decoders.
Experiments show that Pair DETR can converge at least 10x faster than original DETR and 1.5x faster than Conditional DETR during training.
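The Pair DETR summary above encodes a box as a pair of keypoints, the top-left corner and the center. A minimal sketch (not the authors' code) of recovering the full box from such a pair, assuming axis-aligned boxes:

```python
def box_from_corner_center(x1, y1, cx, cy):
    """Recover (x1, y1, x2, y2) from a top-left corner (x1, y1) and the
    box center (cx, cy): the bottom-right corner mirrors the top-left
    corner through the center."""
    return (x1, y1, 2 * cx - x1, 2 * cy - y1)

# Example: corner (10, 20) and center (30, 40) give the box (10, 20, 50, 60).
print(box_from_corner_center(10, 20, 30, 40))
```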
arXiv Detail & Related papers (2022-10-29T03:02:49Z)
- Robust Change Detection Based on Neural Descriptor Fields [53.111397800478294]
We develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results.
By associating objects via shape code similarity and comparing local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noises.
arXiv Detail & Related papers (2022-08-01T17:45:36Z)
- Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion [95.7732308775325]
The proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.
DETR suffers from slow training convergence, which hinders its applicability to various detection tasks.
We design Semantic-Aligned-Matching DETR++ to accelerate DETR's convergence and improve detection performance.
arXiv Detail & Related papers (2022-07-28T15:34:29Z)
- Oriented Object Detection with Transformer [51.634913687632604]
We implement Oriented Object DEtection with TRansformer ($\bf O2DETR$) based on an end-to-end network.
We design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution.
Our $\rm O2DETR$ can be another new benchmark in the field of oriented object detection, which achieves up to 3.85 mAP improvement over Faster R-CNN and RetinaNet.
arXiv Detail & Related papers (2021-06-06T14:57:17Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
- Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos [2.4923006485141284]
This paper addresses the problem of exploiting temporal information in available videos to improve object classification.
We propose a two-stage object detector called FANet based on short-term aggregation of detection features.
arXiv Detail & Related papers (2020-04-01T13:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.