RelationRS: Relationship Representation Network for Object Detection in
Aerial Images
- URL: http://arxiv.org/abs/2110.06730v1
- Date: Wed, 13 Oct 2021 14:02:33 GMT
- Title: RelationRS: Relationship Representation Network for Object Detection in
Aerial Images
- Authors: Zhiming Liu, Xuefei Zhang, Chongyang Liu, Hao Wang, Chao Sun, Bin Li,
Weifeng Sun, Pu Huang, Qingjun Li, Yu Liu, Haipeng Kuang, Jihong Xiu
- Abstract summary: We propose a relationship representation network for object detection in aerial images (RelationRS).
The dual relationship module learns the potential relationship between features of different scales and the relationship between different scenes from different patches in the same iteration.
The bridging visual representations module (BVR) is introduced into the field of aerial images to improve the object detection effect in images with complex backgrounds.
- Score: 15.269897893563417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection is a basic and important task in the field of aerial image
processing and has gained much attention in computer vision. However, previous
aerial image object detection approaches make insufficient use of the scene
semantic information shared between different regions of large-scale aerial images. In
addition, complex background and scale changes make it difficult to improve
detection accuracy. To address these issues, we propose a relationship
representation network for object detection in aerial images (RelationRS): 1)
Firstly, multi-scale features are fused and enhanced by a dual relationship
module (DRM) with conditional convolution. The dual relationship module learns
the potential relationship between features of different scales and learns the
relationship between different scenes from different patches in the same
iteration. In addition, the dual relationship module dynamically generates
parameters to guide the fusion of multi-scale features. 2) Secondly, the
bridging visual representations module (BVR) is introduced into the field of
aerial images to improve the object detection effect in images with complex
backgrounds. Experiments with a publicly available object detection dataset for
aerial images demonstrate that the proposed RelationRS achieves
state-of-the-art detection performance.
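The core idea of the dual relationship module — a scene descriptor pooled from multiple scales that dynamically generates the convolution weights used to fuse those scales — can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the shapes, the nearest-neighbour upsampling, and the parameter-generating matrix `w_gen` are all hypothetical stand-ins for components the paper learns end to end.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4  # channels and spatial size of the finer scale (hypothetical)

def global_pool(feat):
    """Global average pool: (C, H, W) -> (C,) scene descriptor."""
    return feat.mean(axis=(1, 2))

def upsample2x(feat):
    """Nearest-neighbour upsampling to match the finer scale."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def conditional_fuse(f_fine, f_coarse, w_gen):
    """Fuse two scales with dynamically generated 1x1-conv weights.

    The kernel is not a fixed parameter: it is produced from the pooled
    scene descriptor, so different patches (scenes) get different fusion
    weights -- the conditional-convolution idea behind the DRM.
    """
    desc = np.concatenate([global_pool(f_fine), global_pool(f_coarse)])  # (2C,)
    kernel = (desc @ w_gen).reshape(C, 2 * C)                # (C, 2C), scene-dependent
    stacked = np.concatenate([f_fine, upsample2x(f_coarse)], axis=0)     # (2C, H, W)
    return np.einsum('oc,chw->ohw', kernel, stacked)         # (C, H, W)

f_fine = rng.standard_normal((C, H, W))
f_coarse = rng.standard_normal((C, H // 2, W // 2))
w_gen = rng.standard_normal((2 * C, 2 * C * C)) * 0.1  # stand-in for a learned generator
fused = conditional_fuse(f_fine, f_coarse, w_gen)
print(fused.shape)  # (8, 4, 4)
```

Because the kernel depends on the pooled descriptor, two patches with different scene statistics are fused with different weights, which is what lets scene context guide the multi-scale fusion.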
Related papers
- Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing [11.626527403157922]
We present the Pattern Integration and Enhancement Vision Transformer (PIEViT), a novel self-supervised learning framework for remote sensing imagery.
PIEViT enhances the representation of internal patch features, providing significant improvements over existing self-supervised baselines.
It achieves excellent results in object detection, land cover classification, and change detection, underscoring its robustness, generalization, and transferability for remote sensing image interpretation tasks.
arXiv Detail & Related papers (2024-11-09T07:06:31Z)
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust object detection around the clock by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z)
- Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been shown to enhance accuracy by combining data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks.
The DAGAN framework achieves 85.01% mean IoU and 91.48% mean F1 score on the LEVIR dataset, outperforming advanced methods.
arXiv Detail & Related papers (2023-10-03T08:26:27Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)
- Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT)
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.