A Global-Local Cross-Attention Network for Ultra-high Resolution Remote Sensing Image Semantic Segmentation
- URL: http://arxiv.org/abs/2506.19406v1
- Date: Tue, 24 Jun 2025 08:20:08 GMT
- Title: A Global-Local Cross-Attention Network for Ultra-high Resolution Remote Sensing Image Semantic Segmentation
- Authors: Chen Yi, Shan LianLei,
- Abstract summary: GLCANet is a lightweight segmentation framework designed for UHR remote sensing imagery.<n>A self-attention mechanism enhances long-range dependencies, refines global features, and preserves local details for better semantic consistency.<n>A masked cross-attention mechanism also adaptively fuses global-local features, selectively enhancing fine-grained details while exploiting global context to improve segmentation accuracy.
- Score: 1.833928124984226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid development of ultra-high resolution (UHR) remote sensing technology, the demand for accurate and efficient semantic segmentation has increased significantly. However, existing methods face challenges in computational efficiency and multi-scale feature fusion. To address these issues, we propose GLCANet (Global-Local Cross-Attention Network), a lightweight segmentation framework designed for UHR remote sensing imagery.GLCANet employs a dual-stream architecture to efficiently fuse global semantics and local details while minimizing GPU usage. A self-attention mechanism enhances long-range dependencies, refines global features, and preserves local details for better semantic consistency. A masked cross-attention mechanism also adaptively fuses global-local features, selectively enhancing fine-grained details while exploiting global context to improve segmentation accuracy. Experimental results show that GLCANet outperforms state-of-the-art methods regarding accuracy and computational efficiency. The model effectively processes large, high-resolution images with a small memory footprint, providing a promising solution for real-world remote sensing applications.
Related papers
- Referring Remote Sensing Image Segmentation with Cross-view Semantics Interaction Network [65.01521002836611]
We propose a paralleled yet unified segmentation framework Cross-view Semantics Interaction Network (CSINet) to solve the limitations.<n>Motivated by human behavior in observing targets of interest, the network orchestrates visual cues from remote and close distances to conduct synergistic prediction.<n>In its every encoding stage, a Cross-View Window-attention module (CVWin) is utilized to supplement global and local semantics into close-view and remote-view branch features.
arXiv Detail & Related papers (2025-08-02T11:57:56Z) - HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation [74.1872891313184]
HRSeg is an efficient model with high-resolution fine-grained perception.<n>It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE)
arXiv Detail & Related papers (2025-07-17T08:09:31Z) - MGDFIS: Multi-scale Global-detail Feature Integration Strategy for Small Object Detection [10.135137525886098]
Small object detection in UAV imagery is crucial for applications such as search-and-rescue, traffic monitoring, and environmental surveillance.<n>Existing multi-scale fusion methods help, but add computational burden and blur fine details.<n>We propose a unified fusion framework that tightly couples global context with local detail to boost detection performance.
arXiv Detail & Related papers (2025-06-15T02:54:25Z) - Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention [54.42902794496325]
Linear attention, a variant of softmax attention, demonstrates promise in global context modeling.<n>We propose Rank Enhanced Linear Attention (RELA), a simple yet effective method that enriches feature representations by integrating a lightweight depthwise convolution.<n>Building upon RELA, we propose an efficient and effective image restoration Transformer, named LAformer.
arXiv Detail & Related papers (2025-05-22T02:57:23Z) - Semantic-Guided Global-Local Collaborative Networks for Lightweight Image Super-Resolution [9.666827340439669]
Single-Image Super-Resolution (SISR) plays a pivotal role in enhancing the accuracy and reliability of measurement systems.<n>We propose a Semantic-Guided Global-Local Collaborative Network (SGGLC-Net) for lightweight SISR.
arXiv Detail & Related papers (2025-03-20T11:43:55Z) - HRDecoder: High-Resolution Decoder Network for Fundus Image Lesion Segmentation [12.606794661369959]
We propose HRDecoder, a simple High-Resolution Decoder network for fundus lesion segmentation.<n>It integrates a high-resolution representation learning module to capture fine-grained local features and a high-resolution fusion module to fuse multi-scale predictions.<n>Our method effectively improves the overall segmentation accuracy of fundus lesions while consuming reasonable memory and computational overhead, and maintaining satisfying inference speed.
arXiv Detail & Related papers (2024-11-06T15:13:31Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.<n>Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.<n>We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - CSRNet: Cascaded Selective Resolution Network for Real-time Semantic
Segmentation [18.63596070055678]
We propose a light Cascaded Selective Resolution Network (CSRNet) to improve the performance of real-time segmentation.
The proposed network builds a three-stage segmentation system, which integrates feature information from low resolution to high resolution.
Experiments on two well-known datasets demonstrate that the proposed CSRNet effectively improves the performance for real-time segmentation.
arXiv Detail & Related papers (2021-06-08T14:22:09Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z) - Hybrid Multiple Attention Network for Semantic Segmentation in Aerial
Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundant and improve the efficiency of self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.