Multi-modal land cover mapping of remote sensing images using pyramid
attention and gated fusion networks
- URL: http://arxiv.org/abs/2111.03845v1
- Date: Sat, 6 Nov 2021 10:01:01 GMT
- Title: Multi-modal land cover mapping of remote sensing images using pyramid
attention and gated fusion networks
- Authors: Qinghui Liu, Michael Kampffmeyer, Robert Jenssen and Arnt-Børre Salberg
- Abstract summary: We propose a new multi-modality network for land cover mapping of multi-modal remote sensing data based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU).
The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism.
The GFU module utilizes a novel gating mechanism for early merging of features, thereby diminishing hidden redundancies and noise.
- Score: 20.66034058363032
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modality data is becoming readily available in remote sensing (RS) and
can provide complementary information about the Earth's surface. Effective
fusion of multi-modal information is thus important for various applications in
RS, but also very challenging due to large domain differences, noise, and
redundancies. There is a lack of effective and scalable fusion techniques for
bridging multiple modality encoders and fully exploiting complementary
information. To this end, we propose a new multi-modality network (MultiModNet)
for land cover mapping of multi-modal remote sensing data based on a novel
pyramid attention fusion (PAF) module and a gated fusion unit (GFU). The PAF
module is designed to efficiently obtain rich fine-grained contextual
representations from each modality with a built-in cross-level and cross-view
attention fusion mechanism, and the GFU module utilizes a novel gating
mechanism for early merging of features, thereby diminishing hidden
redundancies and noise. This enables supplementary modalities to effectively
extract the most valuable and complementary information for late feature
fusion. Extensive experiments on two representative RS benchmark datasets
demonstrate the effectiveness, robustness, and superiority of the MultiModNet
for multi-modal land cover classification.
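The abstract describes the GFU only at a high level. As a minimal, hedged sketch of what a gated early fusion of two modality feature maps can look like (assuming PyTorch, equal-sized feature maps, and a learned sigmoid gate; none of these details are confirmed by the abstract, and this is not the paper's exact GFU):

```python
import torch
import torch.nn as nn


class GatedFusionUnit(nn.Module):
    """Illustrative gated fusion of two modality feature maps.

    A generic sketch, not the exact GFU from MultiModNet: a learned sigmoid
    gate decides, per location and channel, how much of the supplementary
    modality to merge into the primary one, suppressing redundant or noisy
    responses before late fusion.
    """

    def __init__(self, channels: int):
        super().__init__()
        # The gate is computed from the concatenated features of both modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, primary: torch.Tensor, supplementary: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([primary, supplementary], dim=1))
        # Gated early merge of the supplementary modality into the primary one.
        return primary + g * supplementary


if __name__ == "__main__":
    # Example: fuse optical and elevation-derived features of shape (B, C, H, W).
    optical = torch.randn(2, 64, 128, 128)
    dsm = torch.randn(2, 64, 128, 128)
    fused = GatedFusionUnit(64)(optical, dsm)
    print(fused.shape)  # torch.Size([2, 64, 128, 128])
```

The additive, gated form is just one plausible reading of "gated early merging"; the actual PAF and GFU designs are specified in the paper itself.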
Related papers
- Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples.
We introduce a cost-free multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality.
We propose a simple yet effective Test-time Adaptive Cross-modal Seg (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
- Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding [51.96911650437978]
Multi-modal fusion has played a vital role in multi-modal scene understanding.
Most existing methods focus on cross-modal fusion involving two modalities, often overlooking more complex multi-modal fusion.
We propose a relational Part-Whole Fusion (PWRF) framework for multi-modal scene understanding.
arXiv Detail & Related papers (2024-10-19T02:27:30Z)
- MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation [8.443065903814821]
This study introduces a novel Multimodal Adapter-based Network (MANet) for multimodal remote sensing semantic segmentation.
At the core of this approach is the development of a Multimodal Adapter (MMAdapter), which fine-tunes SAM's image encoder to effectively leverage the model's general knowledge for multimodal data.
This work not only introduces a novel network for multimodal fusion, but also demonstrates, for the first time, SAM's powerful generalization capabilities with Digital Surface Model (DSM) data.
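The MMAdapter's internal design is not spelled out above. Purely as an illustration of the general adapter pattern it refers to (a small trainable bottleneck applied on top of a frozen encoder's token features), here is a hedged sketch with assumed shapes and names, not MANet's actual module:

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter (illustrative only, not MANet's MMAdapter).

    A small trainable module applied to the token features of a frozen,
    pretrained image encoder (e.g. SAM's), so the backbone can be adapted to
    new modalities without updating its own weights.
    """

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        hidden = max(dim // reduction, 1)
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Residual form: the adapter learns a small correction to frozen features.
        return tokens + self.up(self.act(self.down(tokens)))


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 768)        # (batch, tokens, embed_dim); assumed shapes
    adapted = BottleneckAdapter(768)(tokens)
    print(adapted.shape)                     # torch.Size([2, 196, 768])
```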
arXiv Detail & Related papers (2024-10-15T00:52:16Z)
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images.
We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy.
Our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z)
- LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing [25.016421338677816]
Current methods often process only two types of data, missing out on the rich information that additional modalities can provide.
We propose a novel Lightweight Multimodal data Fusion Network (LMFNet).
LMFNet accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer.
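The entry above only names a weight-sharing, multi-branch vision transformer; below is a loose sketch of the weight-sharing idea (a single shared encoder applied to each modality branch, with an assumed convolutional encoder and 1x1 merge standing in for LMFNet's actual components):

```python
import torch
import torch.nn as nn


class SharedBranchFusion(nn.Module):
    """Illustrative weight-sharing multi-branch fusion (not LMFNet itself).

    The same encoder processes every modality (e.g. RGB, NirRG, DSM), and the
    per-branch feature maps are merged with a lightweight 1x1 convolution.
    """

    def __init__(self, in_channels: int = 3, dim: int = 64, num_modalities: int = 3):
        super().__init__()
        # Shared weights: one encoder is reused for all modality branches.
        self.shared_encoder = nn.Sequential(
            nn.Conv2d(in_channels, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.merge = nn.Conv2d(num_modalities * dim, dim, kernel_size=1)

    def forward(self, modalities):
        # `modalities` is a list of tensors of shape (B, in_channels, H, W).
        feats = [self.shared_encoder(m) for m in modalities]
        return self.merge(torch.cat(feats, dim=1))


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 64, 64)
    nirrg = torch.randn(1, 3, 64, 64)
    dsm = torch.randn(1, 3, 64, 64)   # single-band DSM assumed replicated to 3 channels
    print(SharedBranchFusion()([rgb, nirrg, dsm]).shape)  # torch.Size([1, 64, 64, 64])
```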
arXiv Detail & Related papers (2024-04-21T13:29:42Z)
- Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification [25.254816993934746]
Multimodal Informative ViT (MIVit) is a system with an innovative information aggregate-distributing mechanism.
MIVit reduces redundancy in the empirical distribution of each modality's separate and fused features.
Our results show that MIVit's bidirectional aggregate-distributing mechanism is highly effective.
arXiv Detail & Related papers (2024-01-06T09:53:33Z)
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably over the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
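Transformer-based fusion is only named above, not specified. One common pattern it could take is cross-attention from RGB tokens to depth tokens, sketched here under assumed token shapes (a generic illustration, not this paper's architecture):

```python
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of RGB and depth tokens.

    A common transformer fusion pattern (not necessarily the one used in the
    paper above): RGB tokens attend to depth tokens, and the attended result
    is added back with a residual connection and layer norm.
    """

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(query=rgb, key=depth, value=depth)
        return self.norm(rgb + attended)


if __name__ == "__main__":
    rgb = torch.randn(2, 196, 256)    # (batch, tokens, embed_dim); assumed shapes
    depth = torch.randn(2, 196, 256)
    print(CrossModalAttentionFusion(256)(rgb, depth).shape)  # torch.Size([2, 196, 256])
```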
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- Efficient Human Pose Estimation by Learning Deeply Aggregated Representations [67.24496300046255]
We propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations.
Our networks could achieve comparable or even better accuracy with much smaller model complexity.
arXiv Detail & Related papers (2020-12-13T10:58:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.