More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing
Imagery Classification
- URL: http://arxiv.org/abs/2008.05457v1
- Date: Wed, 12 Aug 2020 17:45:25 GMT
- Title: More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing
Imagery Classification
- Authors: Danfeng Hong and Lianru Gao and Naoto Yokoya and Jing Yao and Jocelyn
Chanussot and Qian Du and Bing Zhang
- Abstract summary: We develop a general multimodal deep learning (MDL) framework for remote sensing imagery classification.
In particular, we present different fusion strategies and show how to train the deep networks and build the network architecture.
Our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs).
- Score: 43.35966675372692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification and identification of the materials lying over or beneath
the Earth's surface have long been a fundamental but challenging research topic in
geoscience and remote sensing (RS), and have garnered growing interest owing to
recent advances in deep learning techniques. Although deep networks have been
successfully applied to single-modality-dominated classification tasks, their
performance inevitably hits a bottleneck in complex scenes that need to be finely
classified, due to the limited diversity of information. In this work, we provide a
baseline solution to this difficulty by developing a general multimodal deep
learning (MDL) framework. In particular, we investigate a special case of
multi-modality learning (MML), namely cross-modality learning (CML), which arises
widely in RS image classification applications. By focusing on "what", "where", and
"how" to fuse, we present different fusion strategies and show how to train the deep
networks and build the network architecture. Specifically, five fusion architectures
are introduced and developed, and further unified within our MDL framework. More
significantly, our framework is not limited to pixel-wise classification tasks but
is also applicable to spatial information modeling with convolutional neural
networks (CNNs). To validate the effectiveness and superiority of the MDL framework,
extensive experiments under both the MML and CML settings are conducted on two
different multimodal RS datasets. Furthermore, the codes and datasets will be made
available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS
community.
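The abstract does not spell out the five fusion architectures, so the snippet below is only a minimal, hypothetical sketch of the general idea: two modality streams that can be fused at the input level ("early"), at the feature level ("middle"), or at the decision level ("late") for pixel-wise classification. All layer sizes, the 144/21 input dimensions, and the class count are illustrative assumptions rather than values from the paper; the linear layers could be swapped for 2-D convolutions when spatial information is modeled with CNNs, as the abstract mentions.

```python
# Minimal sketch (not the authors' code): two-stream fusion of per-pixel
# spectral vectors from two modalities, with three common fusion points.
import torch
import torch.nn as nn


class TwoStreamFusionNet(nn.Module):
    def __init__(self, dim_a, dim_b, n_classes, hidden=64, mode="middle"):
        super().__init__()
        self.mode = mode
        if mode == "early":
            # Concatenate raw inputs, then use one shared encoder.
            self.encoder = nn.Sequential(nn.Linear(dim_a + dim_b, hidden), nn.ReLU())
            self.head = nn.Linear(hidden, n_classes)
        elif mode == "middle":
            # Encode each modality separately, fuse the latent features.
            self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
            self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
            self.head = nn.Linear(2 * hidden, n_classes)
        elif mode == "late":
            # One classifier per modality, fuse at the decision level.
            self.net_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_classes))
            self.net_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_classes))
        else:
            raise ValueError(f"unknown fusion mode: {mode}")

    def forward(self, x_a, x_b):
        if self.mode == "early":
            return self.head(self.encoder(torch.cat([x_a, x_b], dim=-1)))
        if self.mode == "middle":
            return self.head(torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1))
        # Late fusion: average the per-modality logits.
        return 0.5 * (self.net_a(x_a) + self.net_b(x_b))


# Illustrative usage: e.g. a 144-band hyperspectral pixel fused with a
# 21-dimensional auxiliary feature vector (both sizes are placeholders).
model = TwoStreamFusionNet(dim_a=144, dim_b=21, n_classes=15, mode="middle")
logits = model(torch.randn(8, 144), torch.randn(8, 21))  # shape (8, 15)
```

In the cross-modality learning (CML) setting mentioned above, one could, for example, train both streams jointly but feed only the available modality (zeroing or dropping the other) at test time; the exact CML mechanism used in the paper is not described in the abstract.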
Related papers
- When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning [36.6581535146878]
Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge.
Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs.
We propose Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs.
arXiv Detail & Related papers (2024-10-11T13:24:57Z)
- AMANet: Advancing SAR Ship Detection with Adaptive Multi-Hierarchical Attention Network [0.5437298646956507]
A novel adaptive multi-hierarchical attention module (AMAM) is proposed to learn multi-scale features and adaptively aggregate salient features from various feature layers.
We first fuse information from adjacent feature layers to enhance the detection of smaller targets, thereby achieving multi-scale feature enhancement.
Finally, we present a novel adaptive multi-hierarchical attention network (AMANet) by embedding the AMAM between the backbone network and the feature pyramid network.
arXiv Detail & Related papers (2024-01-24T03:56:33Z)
- General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z)
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably over the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z)
- Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
- Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only robust training with noisy video data, but also scaling up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z)
- Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)
- Multi-Subspace Neural Network for Image Recognition [33.61205842747625]
In image classification tasks, feature extraction is always a major issue, and intra-class variability increases the difficulty of designing the extractors.
Recently, deep learning has drawn lots of attention on automatically learning features from data.
In this study, we propose a multi-subspace neural network (MSNN) that integrates a key component of convolutional neural networks (CNNs), the receptive field, with the subspace concept.
arXiv Detail & Related papers (2020-06-17T02:55:34Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI (a minimal sketch of this weight-sharing idea follows this list).
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
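The "Unpaired Multi-modal Segmentation via Knowledge Distillation" entry above mentions sharing all convolutional kernels across CT and MRI. One common way to realize such weight sharing is a shared convolution with modality-specific normalization statistics; the sketch below assumes that design for illustration and is not the paper's actual code.

```python
# Hypothetical sketch: one set of convolutional kernels used by both modalities,
# with separate BatchNorm layers so each modality keeps its own statistics.
import torch
import torch.nn as nn


class SharedKernelBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # shared weights
        self.norms = nn.ModuleDict({
            "ct": nn.BatchNorm2d(out_ch),   # CT-specific statistics
            "mri": nn.BatchNorm2d(out_ch),  # MRI-specific statistics
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, modality):
        # The same kernels process both modalities; only normalization differs.
        return self.act(self.norms[modality](self.conv(x)))


block = SharedKernelBlock(in_ch=1, out_ch=16)
ct_feat = block(torch.randn(2, 1, 64, 64), "ct")    # batch of CT slices
mri_feat = block(torch.randn(2, 1, 64, 64), "mri")  # batch of MRI slices
```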