Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images
- URL: http://arxiv.org/abs/2311.17629v3
- Date: Mon, 19 Aug 2024 04:18:23 GMT
- Title: Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images
- Authors: Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wenliang Du, Rui Yao, Abdulmotaleb El Saddik,
- Abstract summary: We propose an end-to-end oriented detector equipped with an efficient decoder.
Rotated RoI attention and Selective Distinct Queries (SDQ) are proposed.
Our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.
- Score: 26.37802649901314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object instances in remote sensing images often distribute with multi-orientations, varying scales, and dense distribution. These issues bring challenges to end-to-end oriented object detectors including multi-scale features alignment and a large number of queries. To address these limitations, we propose an end-to-end oriented detector equipped with an efficient decoder, which incorporates two technologies, Rotated RoI attention (RRoI attention) and Selective Distinct Queries (SDQ). Specifically, RRoI attention effectively focuses on oriented regions of interest through a cross-attention mechanism and aligns multi-scale features. SDQ collects queries from intermediate decoder layers and then filters similar queries to obtain distinct queries. The proposed SDQ can facilitate the optimization of one-to-one label assignment, without introducing redundant initial queries or extra auxiliary branches. Extensive experiments on five datasets demonstrate the effectiveness of our method. Notably, our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.
Related papers
- Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction.
Renormalized connection (RC) on the KDN enables synergistic focusing'' of multi-scale features.
RCs extend the multi-level feature's divide-and-conquer'' mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z) - Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement [19.277560848076984]
Two-stage selection strategies result in scale bias and redundancy due to mismatch between selected queries and objects.
We propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries.
The proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets.
arXiv Detail & Related papers (2024-03-24T13:01:57Z) - PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection [26.843891792018447]
We present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to better handle the sub-tasks in two-stage FGOD methods.
An anchor-free Quality Oriented Proposal Network (QOPN) is proposed with dynamic label assignment and attention-based decomposition.
A novel Adaptive Recognition Loss (ARL) offers guidance for the R-CNN head to focus on high-quality proposals.
arXiv Detail & Related papers (2023-12-16T18:04:56Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z) - CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented
Object Detection in Remote Sensing Images [0.9462808515258465]
In this paper, we discuss the role of discriminative features in object detection.
We then propose a Critical Feature Capturing Network (CFC-Net) to improve detection accuracy.
We show that our method achieves superior detection performance compared with many state-of-the-art approaches.
arXiv Detail & Related papers (2021-01-18T02:31:09Z) - MRDet: A Multi-Head Network for Accurate Oriented Object Detection in
Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z) - AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection [8.39479809973967]
Few-shot object detection (FSOD) aims at learning a detector that can fast adapt to previously unseen objects with scarce examples.
Existing methods solve this problem by performing subtasks of classification and localization utilizing a shared component.
We present that a general few-shot detector should consider the explicit decomposition of two subtasks, as well as leveraging information from both of them to enhance feature representations.
arXiv Detail & Related papers (2020-11-30T10:21:32Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.