A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification
- URL: http://arxiv.org/abs/2504.16665v1
- Date: Wed, 23 Apr 2025 12:34:32 GMT
- Title: A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification
- Authors: Wenping Ma, Boyou Xue, Mengru Ma, Chuang Chen, Hekai Zhang, Hao Zhu,
- Abstract summary: Multispectral (MS) and panchromatic (PAN) images describe the same land surface. To separate their shared information from each modality's respective advantages and reduce feature redundancy in the fusion stage, this paper introduces a diff-attention aware state space fusion model (DAS2F-Model) for multimodal remote sensing image classification.
- Score: 5.381099682416992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multispectral (MS) and panchromatic (PAN) images describe the same land surface, so they not only have their own advantages but also share a lot of similar information. To separate this shared information from each modality's respective advantages and reduce feature redundancy in the fusion stage, this paper introduces a diff-attention aware state space fusion model (DAS2F-Model) for multimodal remote sensing image classification. Based on the selective state space model, a cross-modal diff-attention module (CMDA-Module) is designed to extract and separate the common features and the respective dominant features of MS and PAN images. Within this module, a space preserving visual mamba (SPVM) block retains image spatial features and captures local features by reasonably optimizing visual mamba's input. Because the separated features carry large semantic differences that simple fusion operations struggle to integrate effectively, an attention-aware linear fusion module (AALF-Module) is proposed. It performs pixel-wise linear fusion by computing influence coefficients, which allows features with large semantic differences to be fused while keeping the feature size unchanged. Empirical evaluations indicate that the presented method achieves better results than alternative approaches. The relevant code can be found at: https://github.com/AVKSKVL/DAS-F-Model
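The abstract describes the AALF-Module as a pixel-wise linear fusion driven by influence coefficients. The PyTorch sketch below is only an illustrative approximation of that idea, not the authors' released implementation (see the linked repository for that): the module name, the 1x1-convolution head used to predict the coefficients, and the convex-combination form are all assumptions.

```python
import torch
import torch.nn as nn


class AttentionAwareLinearFusion(nn.Module):
    """Sketch of pixel-wise linear fusion with learned influence coefficients,
    in the spirit of the AALF-Module. Layer names and the exact coefficient
    computation are assumptions, not the paper's implementation."""

    def __init__(self, channels: int):
        super().__init__()
        # Small head that predicts a per-pixel coefficient alpha in [0, 1]
        # from the concatenated MS/PAN features (assumed design).
        self.coeff = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_ms: torch.Tensor, feat_pan: torch.Tensor) -> torch.Tensor:
        # feat_ms, feat_pan: (B, C, H, W) features with large semantic differences.
        alpha = self.coeff(torch.cat([feat_ms, feat_pan], dim=1))  # (B, 1, H, W)
        # Pixel-wise linear combination; output has the same size as each input.
        return alpha * feat_ms + (1.0 - alpha) * feat_pan


if __name__ == "__main__":
    fuse = AttentionAwareLinearFusion(channels=64)
    ms = torch.randn(2, 64, 32, 32)
    pan = torch.randn(2, 64, 32, 32)
    print(fuse(ms, pan).shape)  # torch.Size([2, 64, 32, 32])
```

Because each pixel receives a single scalar weight and the two inputs are combined convexly, the fused output keeps the same feature size as the inputs, consistent with the abstract's claim about the AALF-Module.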
Related papers
- BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion [11.306367018981678]
This paper proposes an unaligned medical image fusion method called Bidirectional Stepwise Feature Alignment and Fusion. For feature alignment, BSFA-F employs a bidirectional stepwise alignment deformation field prediction strategy. Experimental results across multiple datasets demonstrate the effectiveness of the method.
arXiv Detail & Related papers (2024-12-11T02:56:23Z) - Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information from different modalities effectively improves cross-modality object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z) - Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z) - Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement [39.06589186472675]
We propose a spatial-temporal fusion (STF) module to model dense pairwise relationships among multi-frame features.
Besides, we propose a novel memory-augmented refinement (MAR) module to tackle difficult predictions among semantic boundaries.
arXiv Detail & Related papers (2023-01-10T07:57:05Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z) - Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various data integrity levels.
arXiv Detail & Related papers (2021-07-04T09:28:18Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which matters for many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization [36.11674066519118]
We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works that directly adapt the objective function of a vanilla variational auto-encoder, we show how camera localization can be accurately estimated.
arXiv Detail & Related papers (2020-03-12T14:52:10Z)