Improving Detection in Aerial Images by Capturing Inter-Object Relationships
- URL: http://arxiv.org/abs/2404.04140v1
- Date: Fri, 5 Apr 2024 14:39:13 GMT
- Title: Improving Detection in Aerial Images by Capturing Inter-Object Relationships
- Authors: Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng,
- Abstract summary: spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships.
In this work, we introduce a transformer-based approach to capture these inter-object relationships.
Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized.
- Score: 7.729994373861261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, the detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach to capture these inter-object relationships to refine classification and regression outcomes for detected objects. Building on two-stage detectors, we tokenize the region of interest (RoI) proposals to be processed by a transformer encoder. Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized. Experimental results demonstrate that the proposed method achieves consistent performance improvement on three benchmarks including DOTA-v1.0, DOTA-v1.5, and HRSC 2016, especially ranking first on both DOTA-v1.5 and HRSC 2016. Specifically, our new method has an increase of 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC 2016, respectively, compared to the baselines.
Related papers
- OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images [26.37802649901314]
Oriented object detection in remote sensing images is a challenging task due to objects being distributed in multi-orientation.
We propose an end-to-end transformer-based oriented object detector consisting of three dedicated modules to address these issues.
Compared with previous end-to-end detectors, the OrientedFormer gains 1.16 and 1.21 AP$_50$ on DIOR-R and DOTA-v1.0 respectively, while reducing training epochs from 3$times$ to 1$times$.
arXiv Detail & Related papers (2024-09-29T10:36:33Z) - Relation DETR: Exploring Explicit Position Relation Prior for Object Detection [26.03892270020559]
We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer)
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-16T13:17:07Z) - SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects from aerial images are usually arbitrary orientations, small scales, and aggregation.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z) - Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer [4.137346786534721]
We introduce a Hausdorff distance-based cost for bipartite matching, which more accurately quantifies the discrepancy between predictions and ground truths.
We propose an adaptive query denoising method that employs bipartite matching to selectively eliminate noised queries that detract from model improvement.
arXiv Detail & Related papers (2023-05-12T16:42:54Z) - SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework.
Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
arXiv Detail & Related papers (2023-04-10T11:10:42Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Exploring Consistency in Cross-Domain Transformer for Domain Adaptive
Semantic Segmentation [51.10389829070684]
Domain gap can cause discrepancies in self-attention.
Due to this gap, the transformer attends to spurious regions or pixels, which deteriorates accuracy on the target domain.
We propose adaptation on attention maps with cross-domain attention layers.
arXiv Detail & Related papers (2022-11-27T02:40:33Z) - Anchor Retouching via Model Interaction for Robust Object Detection in
Aerial Images [15.404024559652534]
We present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator.
Our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training.
arXiv Detail & Related papers (2021-12-13T14:37:20Z) - MRDet: A Multi-Head Network for Accurate Oriented Object Detection in
Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z) - DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.