Related papers: SDCoNet: Saliency-Driven Multi-Task Collaborative Network for Remote Sensing Object Detection

URL: http://arxiv.org/abs/2601.12507v1
Date: Sun, 18 Jan 2026 17:36:48 GMT
Title: SDCoNet: Saliency-Driven Multi-Task Collaborative Network for Remote Sensing Object Detection
Authors: Ruo Qi, Linhui Dai, Yusong Qin, Chaolei Yang, Yanshan Li
Abstract summary: In remote sensing images, complex backgrounds, weak object signals, and small object scales make accurate detection particularly challenging. A common strategy is to integrate single-image super-resolution (SR) before detection. We propose a Saliency-Driven multi-task Collaborative Network (SDCoNet) that couples SR and detection through implicit feature sharing.
Score: 7.016133328153285
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In remote sensing images, complex backgrounds, weak object signals, and small object scales make accurate detection particularly challenging, especially under low-quality imaging conditions. A common strategy is to integrate single-image super-resolution (SR) before detection; however, such serial pipelines often suffer from misaligned optimization objectives, feature redundancy, and a lack of effective interaction between SR and detection. To address these issues, we propose a Saliency-Driven multi-task Collaborative Network (SDCoNet) that couples SR and detection through implicit feature sharing while preserving task specificity. SDCoNet employs a Swin Transformer-based shared encoder, where hierarchical window-shifted self-attention supports cross-task feature collaboration and adaptively balances the trade-off between texture refinement and semantic representation. In addition, a multi-scale saliency prediction module produces importance scores to select key tokens, enabling focused attention on weak object regions, suppression of background clutter, and suppression of adverse features introduced by multi-task coupling. Furthermore, a gradient routing strategy is introduced to mitigate optimization conflicts. It first stabilizes detection semantics and subsequently routes SR gradients along a detection-oriented direction, enabling the framework to guide the SR branch to generate high-frequency details that are explicitly beneficial for detection. Experiments on public datasets, including NWPU VHR-10-Split, DOTAv1.5-Split, and HRSSD-Split, demonstrate that the proposed method, while maintaining competitive computational efficiency, significantly outperforms existing mainstream algorithms in small object detection on low-quality remote sensing images. Our code is available at https://github.com/qiruo-ya/SDCoNet.
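
Illustrative sketch (not the authors' code): the abstract names two mechanisms, saliency-based key-token selection and gradient routing, without giving their details. The pure-Python example below shows one plausible reading of each under stated assumptions. The function names `select_key_tokens` and `route_sr_gradient`, the `keep_ratio` parameter, and the top-k selection rule are all hypothetical. The gradient step uses a PCGrad-style projection as a stand-in; the paper's actual routing strategy may differ.

```python
import math
from typing import List, Sequence, Tuple


def select_key_tokens(
    tokens: Sequence[str], scores: Sequence[float], keep_ratio: float = 0.5
) -> Tuple[List[int], List[str]]:
    """Keep the highest-scoring tokens, preserving their original order.

    Assumption: "select key tokens" is modeled as plain top-k by score.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(len(tokens) * keep_ratio))
    # Rank indices by descending score, then restore positional order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return kept, [tokens[i] for i in kept]


def route_sr_gradient(g_sr: Sequence[float], g_det: Sequence[float]) -> List[float]:
    """Remove the part of the SR gradient that opposes the detection gradient.

    Assumption: PCGrad-style projection, used here only as a stand-in for
    the "detection-oriented" routing described in the abstract.
    """
    dot = sum(s * d for s, d in zip(g_sr, g_det))
    norm2 = sum(d * d for d in g_det)
    if dot >= 0.0 or norm2 == 0.0:
        # No conflict (or no detection signal): leave the SR gradient as-is.
        return list(g_sr)
    coef = dot / norm2
    return [s - coef * d for s, d in zip(g_sr, g_det)]


if __name__ == "__main__":
    idx, kept = select_key_tokens(["a", "b", "c", "d"], [0.1, 0.9, 0.3, 0.8])
    print(idx, kept)
    print(route_sr_gradient([1.0, -1.0], [0.0, 1.0]))
```

In this sketch the selection keeps tokens in positional order so that downstream window-based attention would still see a spatially consistent sequence, and the projection only alters the SR gradient when it conflicts with the detection gradient.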

Related papers

SMR-Net: Robot Snap Detection Based on Multi-Scale Features and Self-Attention Network [0.0]
Traditional visual methods suffer from poor robustness and large localization errors when handling complex scenarios. This paper proposes SMR-Net, a self-attention-based multi-scale object detection algorithm. Experimental results on Type A and Type B snap datasets show SMR-Net outperforms traditional Faster R-CNN significantly.
arXiv Detail & Related papers (2026-03-01T10:28:01Z)
DCCS-Det: Directional Context and Cross-Scale-Aware Detector for Infrared Small Target [4.318503966844226]
Infrared small target detection (IRSTD) is critical for applications like remote sensing and surveillance. We propose DCCS-Det, a novel detector that incorporates a Dual-stream Saliency Enhancement (DSE) block and a Latent-aware Semantic Extraction and Aggregation (LaSEA) module. Experiments show that DCCS-Det achieves state-of-the-art detection accuracy with competitive efficiency across multiple datasets.
arXiv Detail & Related papers (2026-01-23T03:53:59Z)
LSFDNet: A Single-Stage Fusion and Detection Network for Ships Using SWIR and LWIR [16.16208006025223]
Short-wave infrared (SWIR) and long-wave infrared (LWIR) are used in ship detection. We propose a novel single-stage image fusion detection algorithm called LSFDNet. This algorithm leverages feature interaction between the image fusion and object detection subtask networks. We validated the superiority of our proposed single-stage fusion detection algorithm on two datasets.
arXiv Detail & Related papers (2025-07-28T07:13:55Z)
SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection [0.18641315013048293]
Current CNN-based infrared small target detection methods overlook the heterogeneity between shallow and deep features. The dependency relationships and fusion mechanisms fail to fully exploit the complementarity of multilevel features. This paper proposes a shallow-deep synergistic detection network (SDS-Net) that efficiently models multilevel feature representations.
arXiv Detail & Related papers (2025-06-06T12:44:41Z)
RRCANet: Recurrent Reusable-Convolution Attention Network for Infrared Small Target Detection [20.301470710894005]
Infrared small target detection is a challenging task due to its unique characteristics. Recent CNN-based methods have achieved promising performance with heavy feature extraction and fusion modules. We propose a recurrent reusable-convolution attention network (RRCA-Net) for infrared small target detection.
arXiv Detail & Related papers (2025-06-03T03:18:17Z)
Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction. Renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features. RCs extend the multi-level feature's "divide-and-conquer" mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task. Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images. We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
RRNet: Relational Reasoning Network with Parallel Multi-scale Attention for Salient Object Detection in Optical Remote Sensing Images [82.1679766706423]
Salient object detection (SOD) for optical remote sensing images (RSIs) aims at locating and extracting visually distinctive objects/regions from the optical RSIs. We propose a relational reasoning network with parallel multi-scale attention for SOD in optical RSIs. Our proposed RRNet outperforms the existing state-of-the-art SOD competitors both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-10-27T07:18:32Z)
Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture. We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions. Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
Resolution Adaptive Networks for Efficient Inference [53.04907454606711]
We propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs. In RANet, the input images are first routed to a lightweight sub-network that efficiently extracts low-resolution representations. High-resolution paths in the network maintain the capability to recognize the "hard" samples.
arXiv Detail & Related papers (2020-03-16T16:54:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.