Related papers: EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy

URL: http://arxiv.org/abs/2505.17665v1
Date: Fri, 23 May 2025 09:30:45 GMT
Title: EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy
Authors: Yichun Yu, Yuqing Lan, Zhihuan Xing, Xiaoyi Yang, Tingyue Tang, Dan Yu
Abstract summary: We propose a novel approach, Region-Aware Proxy Network (RAPNet), which consists of two components: Contextual Region Attention (CRA) and Global Class Refinement (GCR). RAPNet operates at the region level for more flexible segmentation. Experiments on three public datasets show that RAPNet outperforms state-of-the-art methods, achieving superior multi-class segmentation accuracy.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: High-resolution remote sensing (HRRS) image segmentation is challenging due to complex spatial layouts and diverse object appearances. While CNNs excel at capturing local features, they struggle with long-range dependencies, whereas Transformers can model global context but often neglect local details and are computationally expensive. We propose a novel approach, Region-Aware Proxy Network (RAPNet), which consists of two components: Contextual Region Attention (CRA) and Global Class Refinement (GCR). Unlike traditional methods that rely on grid-based layouts, RAPNet operates at the region level for more flexible segmentation. The CRA module uses a Transformer to capture region-level contextual dependencies, generating a Semantic Region Mask (SRM). The GCR module learns a global class attention map to refine multi-class information, combining the SRM and attention map for accurate segmentation. Experiments on three public datasets show that RAPNet outperforms state-of-the-art methods, achieving superior multi-class segmentation accuracy.
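The abstract describes RAPNet only at block level, so the following is a minimal, hypothetical NumPy reading of its data flow, not the authors' implementation. Assumptions: region features are already given (how regions are formed is not specified here); single-head self-attention stands in for the Transformer in CRA; a linear classifier over the contextualized regions stands in for the Semantic Region Mask; learned class embeddings attending over all regions stand in for the GCR global class attention map; and the fusion rule (elementwise product, renormalized per region) is our own guess. The names `rapnet_style_head`, `params` and its weight keys are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def region_self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product self-attention over region tokens (R, D).
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) @ v


def rapnet_style_head(region_feats, params):
    """Hypothetical RAPNet-style head (an assumption, not the paper's code).

    region_feats: (R, D), one feature vector per region.
    Returns (R, C) class probabilities per region.
    """
    # CRA stand-in: add region-level context with self-attention (residual).
    ctx = region_feats + region_self_attention(
        region_feats, params["wq"], params["wk"], params["wv"])
    # SRM stand-in: per-region class probabilities.
    srm = softmax(ctx @ params["w_cls"], axis=-1)                 # (R, C)
    # GCR stand-in: each learned class embedding attends over all regions,
    # giving a global class attention map (each class column sums to 1).
    class_attn = softmax(params["class_emb"] @ ctx.T, axis=-1).T  # (R, C)
    # Assumed fusion: elementwise product, renormalized per region.
    fused = srm * class_attn
    return fused / fused.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
R, D, C = 16, 32, 6  # regions, feature width, classes (illustrative sizes)
shapes = {"wq": (D, D), "wk": (D, D), "wv": (D, D),
          "w_cls": (D, C), "class_emb": (C, D)}
params = {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}
probs = rapnet_style_head(rng.normal(size=(R, D)), params)
print(probs.shape)  # (16, 6)
```

Because both factors are strictly positive, the product favors a class for a region only when the local evidence (the SRM stand-in) and the global class attention agree, which is one plausible way to "combine" the two maps.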

Related papers

Region-based Cluster Discrimination for Visual Representation Learning
Region-Aware Cluster Discrimination (RICE) is a novel method that enhances region-level visual and OCR capabilities. RICE consistently outperforms previous methods on tasks including segmentation, dense detection, and visual perception.
arXiv (2025-07-26T17:47:09Z)
A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images
We propose a novel Shape Guided Transformer Network (SGTN) to accurately extract objects at the instance level. Inspired by the global contextual modeling capacity of the self-attention mechanism, we propose an effective transformer encoder termed LSwin. Our SGTN achieves the highest average precision (AP) scores on two single-class public datasets.
arXiv (2024-12-31T09:25:41Z)
LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery
LOGCAN++ is a semantic segmentation model customized for remote sensing images. It is made up of a Global Class Awareness (GCA) module and several Local Class Awareness (LCA) modules. The LCA modules generate local class representations as intermediate perceptual elements to indirectly associate pixels with the global class representations.
arXiv (2024-06-24T10:12:03Z)
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions. Our proposed ELGC-Net achieves new state-of-the-art performance on remote sensing change detection benchmarks. We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv (2024-03-26T17:46:25Z)
R-MAE: Regions Meet Masked Autoencoders
We explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions.
arXiv (2023-06-08T17:56:46Z)
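As a rough illustration of the input side of masked region autoencoding, the sketch below patchifies a binary region map and randomly hides most of its patches, MAE-style. It covers only the masking step, not R-MAE's encoder or decoder. Assumptions: the region map is a hand-made rectangle (the summary does not say where regions come from), the 0.75 mask ratio is borrowed from common MAE practice, and the helper names `patchify` and `mask_region_patches` are hypothetical.

```python
import numpy as np


def patchify(region_map, patch):
    # (H, W) -> (num_patches, patch * patch), patches in row-major grid order.
    # Assumes H and W are divisible by `patch`.
    h, w = region_map.shape
    x = region_map.reshape(h // patch, patch, w // patch, patch)
    return x.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def mask_region_patches(region_map, patch, mask_ratio, rng):
    """Randomly hide a fraction of the patches of a binary region map.

    Returns (visible, visible_idx, targets, masked_idx): an encoder would see
    `visible`, and a decoder would be trained to reconstruct `targets`.
    """
    patches = patchify(region_map, patch)
    n = len(patches)
    n_keep = int(round(n * (1.0 - mask_ratio)))
    order = rng.permutation(n)
    visible_idx = np.sort(order[:n_keep])
    masked_idx = np.sort(order[n_keep:])
    return patches[visible_idx], visible_idx, patches[masked_idx], masked_idx


region_map = np.zeros((16, 16), dtype=np.uint8)
region_map[4:12, 2:10] = 1  # one rectangular region (illustrative)
visible, vis_idx, targets, msk_idx = mask_region_patches(
    region_map, patch=4, mask_ratio=0.75, rng=np.random.default_rng(0))
print(visible.shape, targets.shape)  # (4, 16) (12, 16)
```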
LoG-CAN: local-global Class-aware Network for semantic segmentation of remote sensing images
We present LoG-CAN, a multi-scale semantic segmentation network for remote sensing images with a global class-aware (GCA) module and local class-aware (LCA) modules. Specifically, the GCA module captures global class representations for class-wise context modeling to circumvent background interference; the LCA modules generate local class representations as intermediate aware elements, indirectly associating pixels with global class representations to reduce variance within a class.
arXiv (2023-03-14T09:44:29Z)
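The idea of local class representations acting as intermediaries between pixels and global class representations can be sketched as follows. This is a hypothetical NumPy illustration, not LoG-CAN's code. Assumptions: class representations are soft class centers (features averaged with coarse class probabilities as weights, the pooling used by object-contextual representation heads); consecutive runs of `window` flattened pixels stand in for local spatial windows; and the two-hop attention (local reps attend to global reps, then pixels attend to local reps) is our reading of "indirectly associating". The function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def class_centers(feats, probs):
    # Soft class centers: per-class average of features, weighted by class probability.
    w = probs / probs.sum(axis=0, keepdims=True)  # (N, C), each column sums to 1
    return w.T @ feats                              # (C, D)


def local_global_class_context(feats, coarse_logits, window):
    """Hypothetical LoG-CAN-style class context (an assumption, not the paper's code).

    feats: (N, D) flattened pixel features; coarse_logits: (N, C).
    Consecutive runs of `window` pixels stand in for local spatial windows.
    Returns (N, D) class context for every pixel.
    """
    probs = softmax(coarse_logits, axis=1)
    global_reps = class_centers(feats, probs)             # GCA stand-in, (C, D)
    scale = np.sqrt(feats.shape[1])
    out = np.empty_like(feats)
    for start in range(0, len(feats), window):
        sl = slice(start, start + window)
        local_reps = class_centers(feats[sl], probs[sl])  # LCA stand-in, (C, D)
        # Local class reps attend to the global class reps ...
        local_ctx = softmax(local_reps @ global_reps.T / scale, axis=1) @ global_reps
        # ... and pixels attend to local reps, reaching the global reps indirectly.
        out[sl] = softmax(feats[sl] @ local_reps.T / scale, axis=1) @ local_ctx
    return out


rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16))
ctx = local_global_class_context(feats, rng.normal(size=(64, 5)), window=16)
print(ctx.shape)  # (64, 16)
```

A quick sanity check of the design: with a single class, every attention weight is 1, so each pixel's context collapses to the global mean feature.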
Semantic Segmentation by Early Region Proxy
We present a novel and efficient modeling approach that starts by interpreting the image as a tessellation of learnable regions. To model region-wise context, we exploit a Transformer to encode regions in a sequence-to-sequence manner. Semantic segmentation is now carried out as per-region prediction on top of the encoded region embeddings.
arXiv (2022-03-26T10:48:32Z)
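Per-region prediction still has to be mapped back to pixels. A minimal sketch of one way to do that, under stated assumptions: the image starts as a square-grid tessellation, each pixel holds soft affinities to regions (fixed here, learnable in a real model), and pixel logits are the affinity-weighted mix of region logits. For brevity every pixel may mix over every region; the helper names `grid_tessellation` and `regions_to_pixels` are hypothetical, and this is not presented as the paper's exact formulation.

```python
import numpy as np


def grid_tessellation(h, w, patch):
    # Initial tessellation: pixel (i, j) belongs to square region (i // patch, j // patch).
    # Assumes h and w are divisible by `patch`.
    rows, cols = np.divmod(np.arange(h * w), w)
    return (rows // patch) * (w // patch) + (cols // patch)  # (h*w,) region index


def regions_to_pixels(assign_logits, region_logits):
    """Turn per-region predictions into per-pixel predictions.

    assign_logits: (P, R) pixel-to-region affinities (learnable in a real model).
    region_logits: (R, C) class logits, predicted once per region.
    Returns (P, C) pixel logits as an affinity-weighted mix of region logits.
    """
    a = np.exp(assign_logits - assign_logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)  # each pixel's weights over regions sum to 1
    return a @ region_logits


h = w = 8
idx = grid_tessellation(h, w, patch=4)  # 4 regions of 4x4 pixels
assign_logits = 20.0 * np.eye(4)[idx]    # sharp affinity to the pixel's own region
region_logits = np.array([[3.0, 0.0, 0.0], [0.0, 3.0, 0.0],
                          [0.0, 0.0, 3.0], [3.0, 0.0, 0.0]])
pixel_pred = regions_to_pixels(assign_logits, region_logits).argmax(axis=1)
print(pixel_pred.reshape(h, w)[0])  # [0 0 0 0 1 1 1 1]
```

The expensive classifier runs once per region (4 times here) rather than once per pixel (64 times), which is where the efficiency of region-level prediction comes from.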
PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis
We propose a novel framework named Point Relation-Aware Network (PRA-Net). It is composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. Experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation have verified the effectiveness of PRA-Net.
arXiv (2021-12-09T13:24:43Z)
Global Aggregation then Local Distribution for Scene Parsing
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks. Our approach allows us to set a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-stuff.
arXiv (2021-07-28T03:46:57Z)
CSRNet: Cascaded Selective Resolution Network for Real-time Semantic Segmentation
We propose a lightweight Cascaded Selective Resolution Network (CSRNet) to improve the performance of real-time segmentation. The proposed network builds a three-stage segmentation system, which integrates feature information from low resolution to high resolution. Experiments on two well-known datasets demonstrate that the proposed CSRNet effectively improves the performance for real-time segmentation.
arXiv (2021-06-08T14:22:09Z)
LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts
We present a novel Local-Region-Context Network (LRC-Net) to learn discriminative features on point clouds. LRC-Net encodes fine-grained contexts inside and among local regions simultaneously. Results show LRC-Net is competitive with state-of-the-art methods in shape classification and shape segmentation applications.
arXiv (2020-03-18T14:34:08Z)
