Related papers: OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images

OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images

URL: http://arxiv.org/abs/2503.06146v2
Date: Fri, 21 Mar 2025 06:47:18 GMT
Title: OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Authors: Ziyue Huang, Yongchao Feng, Shuai Yang, Ziqi Liu, Qingjie Liu, Yunhong Wang,
Abstract summary: We propose OpenRSD, a universal open-prompt RS object detection framework.<n>OpenRSD supports multimodal prompts and integrates multi-task detection heads to balance accuracy and real-time requirements.<n>Compared to YOLO-World, OpenRSD exhibits an 8.7% higher average precision and achieves an inference speed of 20.8 FPS.
Score: 45.40710102095654
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Remote sensing object detection has made significant progress, but most studies still focus on closed-set detection, limiting generalization across diverse datasets. Open-vocabulary object detection (OVD) provides a solution by leveraging multimodal associations between text prompts and visual features. However, existing OVD methods for remote sensing (RS) images are constrained by small-scale datasets and fail to address the unique challenges of remote sensing interpretation, include oriented object detection and the need for both high precision and real-time performance in diverse scenarios. To tackle these challenges, we propose OpenRSD, a universal open-prompt RS object detection framework. OpenRSD supports multimodal prompts and integrates multi-task detection heads to balance accuracy and real-time requirements. Additionally, we design a multi-stage training pipeline to enhance the generalization of model. Evaluated on seven public datasets, OpenRSD demonstrates superior performance in oriented and horizontal bounding box detection, with real-time inference capabilities suitable for large-scale RS image analysis. Compared to YOLO-World, OpenRSD exhibits an 8.7\% higher average precision and achieves an inference speed of 20.8 FPS. Codes and models will be released.

Related papers

RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning [15.670921552151775]
RingMo-Agent is designed to handle multi-modal and multi-platform data.<n>It is supported by a large-scale vision-language dataset named RS-VL3M.<n>It proves effective in both visual understanding and sophisticated analytical tasks.
arXiv Detail & Related papers (2025-07-28T12:39:33Z)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images [3.305346506291318]
We introduce RS-TinyNet, a multi-stage feature fusion and enhancement model specifically tailored for tiny object detection in various RS scenarios.<n> RS-TinyNet comes with two novel designs: tiny object saliency modeling and feature integrity reconstruction.<n>Our experiments show that RS-TinyNet surpasses existing state-of-the-art (SOTA) detectors by 4.0% AP and 6.5% AP75.
arXiv Detail & Related papers (2025-07-17T13:34:21Z)
AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization.<n>AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection [53.2590751089607]
Real-IAD D3 is a high-precision multimodal dataset that incorporates an additional pseudo3D modality generated through photometric stereo. We introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality. Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance.
arXiv Detail & Related papers (2025-04-19T08:05:47Z)
OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing [57.050679160659705]
We introduce textbfOpenEarthSensing (OES), a large-scale fine-grained benchmark for open-world remote sensing.<n>OES includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world.
arXiv Detail & Related papers (2025-02-28T02:49:52Z)
LWGANet: A Lightweight Group Attention Backbone for Remote Sensing Visual Tasks [20.924609707499915]
This article introduces LWGANet, a specialized lightweight backbone network tailored for RS visual tasks.<n>LWGA module, tailored for RS imagery, adeptly harnesses redundant features to extract a wide range of spatial information.<n>The results confirm LWGANet's widespread applicability and its ability to maintain an optimal balance between high performance and low complexity.
arXiv Detail & Related papers (2025-01-17T08:56:17Z)
Generalization-Enhanced Few-Shot Object Detection in Remote Sensing [22.411751110592842]
Few-shot object detection (FSOD) targets object detection challenges in data-limited conditions.<n>We propose the Generalization-Enhanced Few-Shot Object Detection (GE-FSOD) model to improve the generalization capability in remote sensing tasks.<n>Our model introduces three key innovations: the Cross-Level Fusion Pyramid Attention Network (CFPAN), the Multi-Stage Refinement Region Proposal Network (MRRPN), and the Generalized Classification Loss (GCL)
arXiv Detail & Related papers (2025-01-05T08:12:25Z)
PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives. To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD. All the images are finely annotated in pixel-level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust even full-day object detection by fusing the complementary information of infrared and visible images. We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges. Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z)
Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities. We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage. By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem. In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images. The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection [7.062058947498447]
salient visual cues appear in various scales and resolutions of RGB images due to semantic gaps at different feature levels. Similar salient patterns are available in cross-modal depth images as well as multi-scale versions. We devise attention based fusion module (ABF) to address on cross-modal correlation.
arXiv Detail & Related papers (2022-01-24T10:33:00Z)
MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors. To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network. Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.