Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model
- URL: http://arxiv.org/abs/2503.18378v1
- Date: Mon, 24 Mar 2025 06:25:44 GMT
- Title: Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model
- Authors: Tianpei Zhang, Yiming Zhu, Jufeng Zhao, Guangmang Cui, Yuchen Zheng
- Abstract summary: We propose Wavelet-Mamba, which integrates wavelet transform with the state-space model (SSM). The Wavelet-SSM module incorporates wavelet-based frequency-domain feature extraction and global information extraction through SSM. Our method achieves both visually compelling results and superior performance compared to current state-of-the-art methods.
- Score: 8.392891463947661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning techniques have revolutionized infrared and visible image fusion (IVIF), showing remarkable efficacy in complex scenarios. However, current methods do not fully combine frequency-domain features with global semantic information, which results in suboptimal extraction of global features across modalities and insufficient preservation of local texture details. To address these issues, we propose Wavelet-Mamba (W-Mamba), which integrates the wavelet transform with the state-space model (SSM). Specifically, we introduce a Wavelet-SSM module, which incorporates wavelet-based frequency-domain feature extraction and global information extraction through the SSM, thereby effectively capturing both global and local features. Additionally, we propose a cross-modal feature attention modulation, which facilitates efficient interaction and fusion between different modalities. The experimental results indicate that our method achieves both visually compelling results and superior performance compared to current state-of-the-art methods. Our code is available at https://github.com/Lmmh058/W-Mamba.
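To make the pipeline concrete, below is a minimal, self-contained sketch of the idea as described in the abstract: decompose each modality with a 2D wavelet transform, run a simple linear state-space recurrence over the low-frequency band as a stand-in for the Mamba selective scan, gate the two modalities' low-frequency coefficients, and reconstruct with the inverse transform. All function names, the fixed decay parameter, and the energy-based gate are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Illustrative sketch of a wavelet-domain SSM fusion, NOT the W-Mamba code.
import numpy as np
import pywt


def ssm_scan(x, decay=0.9):
    """Linear state-space recurrence h_t = decay * h_{t-1} + x_t.

    Iterates over the rows of `x`, so each output row aggregates context from
    all rows above it (column-wise). A crude stand-in for Mamba's selective
    scan, which propagates global context at linear cost in sequence length.
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = decay * h + xt
        out[t] = h
    return out


def wavelet_ssm(img, wavelet="haar"):
    """Hypothetical Wavelet-SSM block: DWT, then a scan over the low-freq band."""
    cA, (cH, cV, cD) = pywt.dwt2(img, wavelet)
    # Inject global context into the approximation (low-frequency) band only;
    # high-frequency detail bands keep the local texture untouched.
    return ssm_scan(cA), (cH, cV, cD)


def fuse(ir, vis, wavelet="haar"):
    """Toy cross-modal fusion: energy gate on low-freq, max-abs on details."""
    (aI, dI), (aV, dV) = wavelet_ssm(ir, wavelet), wavelet_ssm(vis, wavelet)
    # Energy-based softmax gate: a crude proxy for the paper's cross-modal
    # feature attention modulation.
    wI = np.exp(np.abs(aI)) / (np.exp(np.abs(aI)) + np.exp(np.abs(aV)))
    cA = wI * aI + (1.0 - wI) * aV
    # Keep the stronger detail coefficient per position to preserve texture.
    details = tuple(np.where(np.abs(x) > np.abs(y), x, y) for x, y in zip(dI, dV))
    return pywt.idwt2((cA, details), wavelet)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir, vis = rng.random((64, 64)), rng.random((64, 64))
    print(fuse(ir, vis).shape)  # (64, 64)
```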
Related papers
- FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection [7.246630480680039]
Camouflaged Object Detection (COD) is challenging due to the strong similarity between camouflaged objects and their surroundings. Existing methods mainly rely on local spatial features and fail to capture global information. To address this, the Frequency-Assisted Mamba-Like Linear Attention Network (FMNet) is proposed.
arXiv Detail & Related papers (2025-03-14T02:55:19Z)
- DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation [4.391439322050918]
We introduce a novel state-space architecture for diffusion models. We harness spatial and frequency information to enhance the inductive bias towards local features in input images.
arXiv Detail & Related papers (2024-11-06T18:59:17Z)
- SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion [11.46957526079837]
Infrared and visible image fusion aims to generate fused images with prominent targets and rich texture details.
This paper proposes an efficient Spatial-Frequency Domain Fusion network for infrared and visible image fusion.
Our method produces fused images with significant advantages in various fusion metrics and visual effects.
arXiv Detail & Related papers (2024-10-30T09:17:23Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves percentage of correct classification (PCC) scores of 98.33%, 96.65%, and 96.62% on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Effectively fusing information from different modalities improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
- FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model [35.57157248152558]
Current deep learning (DL) methods typically employ convolutional neural networks (CNNs) or Transformers for feature extraction and information integration.
We propose FusionMamba, an innovative method for efficient remote sensing image fusion.
arXiv Detail & Related papers (2024-04-11T17:29:56Z)
- A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
- RF-ULM: Ultrasound Localization Microscopy Learned from Radio-Frequency Wavefronts [7.652037892439504]
Delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) channel data. The rich contextual information embedded within RF wavefronts offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios.
We propose to directly localize scatterers in RF channel data using learned feature channel shuffling, non-maximum suppression, and a semi-global convolutional block.
arXiv Detail & Related papers (2023-10-02T18:41:23Z)
- Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion [59.19469551774703]
Infrared and visible image fusion aims to integrate comprehensive information from multiple sources to achieve superior performance on various practical tasks.
We propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts.
Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate.
arXiv Detail & Related papers (2023-02-02T20:06:58Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.