Related papers: X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data

URL: http://arxiv.org/abs/2006.13806v2
Date: Sat, 11 Jul 2020 18:26:47 GMT
Title: X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data
Authors: Danfeng Hong, Naoto Yokoya, Gui-Song Xia, Jocelyn Chanussot, Xiao Xiang Zhu
Abstract summary: We propose a novel cross-modal deep-learning framework called X-ModalNet. X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
Score: 69.37597254841052
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. A large amount of multi-modal earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling the parsing of global urban scenes through remote sensing imagery. However, their ability to identify materials (pixel-wise classification) remains limited, due to the noisy collection environment and poor discriminative information as well as the limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: self-adversarial module, interactive learning module, and label propagation module, by learning to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.

Related papers

Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network in constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing. It is designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z)
Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion [9.098711843118629]
This paper introduces a state space model (SSM) and proposes a novel hybrid semantic segmentation network based on vision Mamba (CVMH-UNet). This method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to fully capture global information from multiple directions. By incorporating convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, this approach facilitates a comprehensive analysis of both global and local features.
arXiv Detail & Related papers (2024-10-08T02:17:38Z)
SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image [21.596874679058327]
SwiMDiff is a novel self-supervised pre-training framework for remote sensing images. It recalibrates labels to recognize data from the same scene as false negatives. It seamlessly integrates contrastive learning (CL) with a diffusion model.
arXiv Detail & Related papers (2024-01-10T11:55:58Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
Multi-Spectral Image Classification with Ultra-Lean Complex-Valued Models [28.798100220715686]
Multi-spectral imagery is invaluable for remote sensing due to different spectral signatures exhibited by materials. We apply complex-valued co-domain symmetric models to classify real-valued MSI images. Our work is the first to demonstrate the value of complex-valued deep learning on real-valued MSI data.
arXiv Detail & Related papers (2022-11-21T19:01:53Z)
Multi-Content Complementation Network for Salient Object Detection in Optical Remote Sensing Images [108.79667788962425]
Salient object detection in optical remote sensing images (RSI-SOD) remains a challenging emerging topic. We propose a novel Multi-Content Complementation Network (MCCNet) to explore the complementarity of multiple content for RSI-SOD. In MCCM, we consider multiple types of features that are critical to RSI-SOD, including foreground features, edge features, background features, and global image-level features.
arXiv Detail & Related papers (2021-12-02T04:46:40Z)
Remote Sensing Image Scene Classification with Self-Supervised Paradigm under Limited Labeled Samples [11.025191332244919]
We introduce a new self-supervised learning (SSL) mechanism to obtain a high-performance pre-training model for RSI scene classification from large unlabeled data. Experiments on three commonly used RSI scene classification datasets demonstrated that this new learning paradigm outperforms the traditional dominant ImageNet pre-trained model. The insights distilled from our studies can help to foster the development of SSL in the remote sensing community.
arXiv Detail & Related papers (2020-10-02T09:27:19Z)
Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of HSI. Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z)
Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting. HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information. Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.