Related papers: GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images

URL: http://arxiv.org/abs/2508.10542v3
Date: Tue, 09 Sep 2025 09:43:46 GMT
Title: GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images
Authors: Mengyu Ren, Yutong Li, Hua Li, Runmin Cong, Sam Kwong
Abstract summary: We propose a graph-enhanced contextual and regional perception network (GCRPNet). It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
Score: 68.33481681452675
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Salient object detection (SOD) in optical remote sensing images (ORSIs) faces numerous challenges, including significant variations in target scales and low contrast between targets and the background. Existing methods based on vision transformer (ViT) and convolutional neural network (CNN) architectures aim to leverage both global and local features, but the difficulty of effectively integrating these heterogeneous features limits their overall performance. To overcome these limitations, we propose a graph-enhanced contextual and regional perception network (GCRPNet), which builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. Specifically, we employ the visual state space (VSS) encoder to extract multi-scale features. To further achieve deep guidance and enhancement of these features, we first design a difference-similarity guided hierarchical graph attention module (DS-HGAM). This module strengthens cross-layer interaction capabilities between features of different scales while enhancing the model's structural perception, allowing it to distinguish between foreground and background more effectively. Then, we design the LEVSS block as the decoder of GCRPNet. This module integrates our proposed adaptive scanning strategy and multi-granularity collaborative attention enhancement module (MCAEM). It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information and enhancing Mamba's local modeling capability. Extensive experimental results demonstrate that the proposed model achieves state-of-the-art performance, validating its effectiveness and superiority.
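
The abstract does not spell out how the adaptive scanning strategy chooses its patches, so the following is only a minimal sketch of the general idea behind patch-wise scanning, not the authors' method. It contrasts a plain row-by-row scan with a fixed patch-by-patch scan when a 2D feature map is flattened into the 1D sequence that a state space model consumes. The function names `raster_scan` and `patch_scan`, the fixed patch size, and the toy 4x4 map are all assumptions made for illustration.

```python
from typing import List


def raster_scan(fmap: List[List[int]]) -> List[int]:
    """Flatten a 2D map row by row (plain row-major order)."""
    return [value for row in fmap for value in row]


def patch_scan(fmap: List[List[int]], patch: int) -> List[int]:
    """Flatten a 2D map patch by patch.

    Non-overlapping patch x patch windows are visited in row-major order,
    and the cells inside each window are also visited in row-major order,
    so spatial neighbours inside a region stay adjacent in the sequence.
    """
    height, width = len(fmap), len(fmap[0])
    if height % patch or width % patch:
        raise ValueError("map size must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    sequence.append(fmap[r][c])
    return sequence


# Toy 4x4 "feature map" whose values are the row-major cell indices.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]

print(raster_scan(fmap))    # row-major order: 0, 1, 2, ..., 15
print(patch_scan(fmap, 2))  # 2x2 patches: 0, 1, 4, 5, then 2, 3, 6, 7, ...
```

In the raster order, vertically adjacent cells such as 0 and 4 end up four positions apart in the sequence; in the patch order they are two positions apart, which is one way to keep local region information close together for a sequential model.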

Related papers

RS-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing Images [17.648922817109224]
RS-ISRefiner is a novel click-based IIS framework tailored for remote sensing images. It consistently outperforms state-of-the-art IIS methods in terms of segmentation accuracy, efficiency and interaction cost.
arXiv Detail & Related papers (2025-11-30T04:12:43Z)
VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. The predominance of large-portion, homogeneous but useless oceanic backgrounds can dilute the feature representation responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
Spatial-Geometry Enhanced 3D Dynamic Snake Convolutional Neural Network for Hyperspectral Image Classification [12.168520751389622]
Deep neural networks face several challenges in hyperspectral image classification. These include complex and sparse ground object distributions, small clustered structures, and elongated multi-branch features. This paper proposes a Spatial-Geometry Enhanced 3D Dynamic Snake Network (SG-DSCNet) based on an improved 3D-DenseNet model.
arXiv Detail & Related papers (2025-04-06T12:21:39Z)
Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [8.443350618722564]
This paper proposes an improved Unet model combined with an attention mechanism. It introduces channel attention and spatial attention modules and enhances the model's ability to focus on important features. The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively.
arXiv Detail & Related papers (2025-02-06T06:51:23Z)
Threshold Attention Network for Semantic Segmentation of Remote Sensing Images [3.5449012582104795]
The self-attention mechanism (SA) is an effective approach for designing segmentation networks. We propose a novel threshold attention mechanism (TAM) for semantic segmentation. Based on TAM, we present a threshold attention network (TANet) for semantic segmentation.
arXiv Detail & Related papers (2025-01-14T10:09:55Z)
Brain-Inspired Stepwise Patch Merging for Vision Transformers [6.108377966393714]
We propose Stepwise Patch Merging (SPM), which enhances the subsequent attention mechanism's ability to 'see' better. The code has been released at https://github.com/Yonghao-Yu/StepwisePatchMerging.
arXiv Detail & Related papers (2024-09-11T03:04:46Z)
Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising. Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
SENetV2: Aggregated dense layer for channelwise and global representations [0.0]
We introduce a novel aggregated multilayer perceptron, a multi-branch dense layer, within the Squeeze residual module. This fusion enhances the network's ability to capture channel-wise patterns and global knowledge. We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures.
arXiv Detail & Related papers (2023-11-17T14:10:57Z)
Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD). Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement [58.075720488942125]
Underwater image enhancement has become an attractive topic as a significant technology in marine engineering and aquatic robotics. We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model. We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z)
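
Several entries in the list above report segmentation quality with standard metrics; the Optimized Unet entry, for example, quotes mIoU and pixel accuracy (PA). As a minimal sketch of how these two metrics are commonly computed from a class confusion matrix, the example below uses hypothetical helper names (`pixel_accuracy`, `mean_iou`) and made-up counts. It is not taken from any of the listed papers, and skipping classes that never occur is one common convention assumed here.

```python
from typing import List


def pixel_accuracy(conf: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the true class.

    conf[i][j] counts pixels of true class i predicted as class j.
    """
    total = sum(sum(row) for row in conf)
    correct = sum(conf[k][k] for k in range(len(conf)))
    return correct / total


def mean_iou(conf: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN).

    Classes that appear in neither the ground truth nor the predictions
    are skipped (an assumed convention; implementations differ on this).
    """
    n = len(conf)
    ious = []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                       # true k, predicted other
        fp = sum(conf[i][k] for i in range(n)) - tp  # predicted k, true other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Made-up two-class confusion matrix (rows: true class, columns: predicted).
conf = [[50, 10],
        [5, 35]]

print(f"PA   = {pixel_accuracy(conf):.4f}")
print(f"mIoU = {mean_iou(conf):.4f}")
```

For these counts, PA is 85/100 and the per-class IoU values are 50/65 and 35/50, which shows why mIoU is usually lower than PA: every misclassified pixel is penalised in two classes at once.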

This list is automatically generated from the titles and abstracts of the papers on this site.