Related papers: Prototype-Based Information Compensation Network for Multi-Source Remote Sensing Data Classification

URL: http://arxiv.org/abs/2505.04003v1
Date: Tue, 06 May 2025 22:30:23 GMT
Title: Prototype-Based Information Compensation Network for Multi-Source Remote Sensing Data Classification
Authors: Feng Gao, Sheng Liu, Chuanzheng Gong, Xiaowei Zhou, Jiayi Wang, Junyu Dong, Qian Du
Abstract summary: Multi-source remote sensing data joint classification aims to provide accuracy and reliability of land cover classification. Existing methods confront two challenges: inter-frequency multi-source feature coupling and inconsistency of complementary information exploration. We present a Prototype-based Information Compensation Network (PICNet) for land cover classification based on HSI and SAR/LiDAR data.
Score: 56.065032039986725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-source remote sensing data joint classification aims to provide accuracy and reliability of land cover classification by leveraging the complementary information from multiple data sources. Existing methods confront two challenges: inter-frequency multi-source feature coupling and inconsistency of complementary information exploration. To solve these issues, we present a Prototype-based Information Compensation Network (PICNet) for land cover classification based on HSI and SAR/LiDAR data. Specifically, we first design a frequency interaction module to enhance the inter-frequency coupling in multi-source feature extraction. The multi-source features are first decoupled into high- and low-frequency components. Then, these features are recoupled to achieve efficient inter-frequency communication. Afterward, we design a prototype-based information compensation module to model the global multi-source complementary information. Two sets of learnable modality prototypes are introduced to represent the global modality information of multi-source data. Subsequently, cross-modal feature integration and alignment are achieved through cross-attention computation between the modality-specific prototype vectors and the raw feature representations. Extensive experiments on three public datasets demonstrate the significant superiority of our PICNet over state-of-the-art methods. The codes are available at https://github.com/oucailab/PICNet.
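The two ideas named in the abstract, splitting features into high- and low-frequency parts and letting features attend to modality prototypes, can be illustrated with a minimal pure-Python sketch. This is not the PICNet implementation (see the authors' repository for that). Everything here is an assumption made for illustration: the function names `split_frequencies` and `prototype_cross_attention`, the moving-average low-pass filter, the choice of features as queries and prototypes as keys and values, the residual addition, and the absence of learned projection matrices.

```python
import math


def split_frequencies(signal, window=3):
    """Split a 1-D feature sequence into low- and high-frequency parts.

    Assumed filter: a moving average with edge replication acts as the
    low-pass; the high-frequency part is the residual, so low + high
    reconstructs the input up to floating-point rounding.
    """
    half = window // 2
    n = len(signal)
    low = []
    for i in range(n):
        # Clamp indices at the borders (edge replication).
        neighbours = [signal[min(max(j, 0), n - 1)]
                      for j in range(i - half, i + half + 1)]
        low.append(sum(neighbours) / len(neighbours))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prototype_cross_attention(features, prototypes):
    """Let each feature vector attend over a set of prototype vectors.

    Assumption: features are the queries, prototypes serve as both keys
    and values, and the attended prototype mix is added back to the
    feature as a residual. In a trained model the prototypes would be
    learnable parameters; here they are plain lists.
    """
    dim = len(prototypes[0])
    scale = math.sqrt(dim)
    compensated = []
    for f in features:
        weights = softmax([dot(f, p) / scale for p in prototypes])
        mix = [sum(w * p[d] for w, p in zip(weights, prototypes))
               for d in range(dim)]
        compensated.append([fd + md for fd, md in zip(f, mix)])
    return compensated


# Hypothetical toy data: two HSI feature vectors and three prototypes
# standing in for the other modality (SAR or LiDAR).
hsi_features = [[1.0, 0.0], [0.0, 1.0]]
other_modality_prototypes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
compensated = prototype_cross_attention(hsi_features, other_modality_prototypes)
low, high = split_frequencies([1.0, 2.0, 4.0, 8.0])
```

In this sketch each feature keeps its own content and receives a weighted summary of the prototypes, which is one simple way to read "information compensation". The actual module design, including how the frequency components are recoupled, is described only at a high level in the abstract and may differ substantially.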

Related papers

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples. We introduce a multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality. We propose a simple yet effective Test-time Adaptive Cross-modal (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN). PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
Hierarchical Attention and Parallel Filter Fusion Network for Multi-Source Data Classification [33.26466989592473]
We propose a hierarchical attention and parallel filter fusion network for multi-source data classification. Our proposed method achieves 91.44% and 80.51% of overall accuracy (OA) on the respective datasets.
arXiv Detail & Related papers (2024-08-22T23:14:22Z)
Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification [25.254816993934746]
Multimodal Informative ViT (MIVit) is a system with an innovative information aggregate-distributing mechanism. MIVit reduces redundancy in the empirical distribution of each modality's separate and fused features. Our results show that MIVit's bidirectional aggregate-distributing mechanism is highly effective.
arXiv Detail & Related papers (2024-01-06T09:53:33Z)
Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input. We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
A3CLNN: Spatial, Spectral and Multiscale Attention ConvLSTM Neural Network for Multisource Remote Sensing Data Classification [24.006660419933727]
We propose a new approach to exploit the complementarity of two data sources: hyperspectral images (HSIs) and light detection and ranging (LiDAR) data. We develop a new dual-channel spatial, spectral and multiscale attention convolutional long short-term memory neural network (called dual-channel A3CLNN) for feature extraction and classification.
arXiv Detail & Related papers (2022-04-09T12:43:32Z)
Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks [20.66034058363032]
We propose a new multi-modality network for land cover mapping of multi-modal remote sensing data based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU). The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism. The GFU module utilizes a novel gating mechanism for early merging of features, thereby diminishing hidden redundancies and noise.
arXiv Detail & Related papers (2021-11-06T10:01:01Z)
MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection. In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features. In the second stage, we use a region of interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z)
Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection. Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps. Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet. X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.