CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration
- URL: http://arxiv.org/abs/2603.02560v1
- Date: Tue, 03 Mar 2026 03:27:05 GMT
- Title: CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration
- Authors: Huichun Liu, Xiaosong Li, Zhuangfan Huang, Tao Ye, Yang Liu, Haishu Tan
- Abstract summary: Multimodal Image Fusion (MMIF) integrates complementary information from various modalities to produce clearer and more informative fused images. Existing adverse weather fusion methods only tackle single types of degradation such as haze, rain, or snow, and fail when multiple degradations coexist. We propose Compound Adverse Weather Mamba, the first end-to-end framework that jointly performs image fusion and compound weather restoration with unified shared weights.
- Score: 8.400835004298624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Image Fusion (MMIF) integrates complementary information from various modalities to produce clearer and more informative fused images. MMIF under adverse weather is particularly crucial in autonomous driving and UAV monitoring applications. However, existing adverse weather fusion methods generally tackle only a single type of degradation, such as haze, rain, or snow, and fail when multiple degradations coexist (e.g., haze+rain, rain+snow). To address this challenge, we propose Compound Adverse Weather Mamba (CAWM-Mamba), the first end-to-end framework that jointly performs image fusion and compound weather restoration with unified shared weights. Our network contains three key components: (1) a Weather-Aware Preprocess Module (WAPM) that enhances degraded visible features and extracts global weather embeddings; (2) a Cross-modal Feature Interaction Module (CFIM) that aligns heterogeneous modalities and exchanges complementary features across them; and (3) a Wavelet Space State Block (WSSB) that leverages wavelet-domain decomposition to decouple multi-frequency degradations. WSSB includes Freq-SSM, a module that models anisotropic high-frequency degradation without redundancy, and a unified degradation representation mechanism to further improve generalization across complex compound weather conditions. Extensive experiments on the AWMM-100K benchmark and three standard fusion datasets demonstrate that CAWM-Mamba consistently outperforms state-of-the-art methods in both compound and single-weather scenarios. In addition, our fusion results excel in downstream tasks covering semantic segmentation and object detection, confirming its practical value in real-world adverse-weather perception. The source code will be available at https://github.com/Feecuin/CAWM-Mamba.
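Since the code is not yet released, the following is a minimal, hypothetical PyTorch sketch of the pipeline the abstract describes, not the authors' implementation: plain convolutions stand in for the Mamba and Freq-SSM blocks, a single-level Haar transform stands in for the wavelet decomposition, and all channel widths and the FiLM-style conditioning on the weather embedding are assumptions made purely for illustration.

```python
# Hypothetical sketch of the CAWM-Mamba pipeline from the abstract; the real
# WAPM/CFIM/WSSB internals (Mamba scans, Freq-SSM) are not public, so simple
# convolutions are used as placeholders throughout.
import torch
import torch.nn as nn


def haar_dwt(x):
    """Single-level 2D Haar split of (B, C, H, W) into LL, LH, HL, HH bands."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    return (a + b + c + d) / 2, (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2


def haar_idwt(ll, lh, hl, hh):
    """Exact inverse of haar_dwt."""
    B, C, H, W = ll.shape
    out = ll.new_zeros(B, C, 2 * H, 2 * W)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


class WAPM(nn.Module):
    """Weather-Aware Preprocess Module (stand-in): enhances degraded visible
    features and pools a global weather embedding."""
    def __init__(self, ch=32, emb=64):
        super().__init__()
        self.enhance = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.embed = nn.Linear(ch, emb)

    def forward(self, vis):
        feat = self.enhance(vis)
        return feat, self.embed(feat.mean(dim=(2, 3)))  # features, weather embedding


class CFIM(nn.Module):
    """Cross-modal Feature Interaction Module (stand-in): gated exchange of
    complementary infrared/visible features."""
    def __init__(self, ch=32):
        super().__init__()
        self.ir_enc = nn.Conv2d(1, ch, 3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, vis_feat, ir):
        ir_feat = self.ir_enc(ir)
        g = self.gate(torch.cat([vis_feat, ir_feat], dim=1))
        return g * vis_feat + (1 - g) * ir_feat


class WSSB(nn.Module):
    """Wavelet Space State Block (stand-in): processes low- and high-frequency
    sub-bands separately, conditioned on the weather embedding (FiLM-style)."""
    def __init__(self, ch=32, emb=64):
        super().__init__()
        self.low = nn.Conv2d(ch, ch, 3, padding=1)                     # structure / illumination
        self.high = nn.Conv2d(3 * ch, 3 * ch, 3, padding=1, groups=3)  # streak-like degradations
        self.film = nn.Linear(emb, 2 * ch)

    def forward(self, x, weather):
        ll, lh, hl, hh = haar_dwt(x)
        scale, shift = self.film(weather).chunk(2, dim=1)
        ll = self.low(ll) * scale[..., None, None] + shift[..., None, None]
        lh, hl, hh = self.high(torch.cat([lh, hl, hh], dim=1)).chunk(3, dim=1)
        return haar_idwt(ll, lh, hl, hh)


class CAWMMambaSketch(nn.Module):
    """End-to-end sketch: restore and fuse a visible/infrared pair."""
    def __init__(self, ch=32, emb=64):
        super().__init__()
        self.wapm, self.cfim, self.wssb = WAPM(ch, emb), CFIM(ch), WSSB(ch, emb)
        self.head = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, vis, ir):
        vis_feat, weather = self.wapm(vis)
        fused = self.cfim(vis_feat, ir)
        fused = fused + self.wssb(fused, weather)  # residual restoration branch
        return torch.sigmoid(self.head(fused))


if __name__ == "__main__":
    vis, ir = torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128)
    print(CAWMMambaSketch()(vis, ir).shape)  # torch.Size([1, 3, 128, 128])
```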
Related papers
- Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining [95.00432497331583]
Multi-Prior Hierarchical Mamba (MPHM) network for image deraining.
MPHM integrates macro-semantic textual priors (CLIP) for task-level semantic guidance and micro-structural visual priors (DINOv2) for scene-aware structural information.
Experiments demonstrate MPHM's state-of-the-art performance, achieving a 0.57 dB PSNR gain on the Rain200H dataset.
arXiv Detail & Related papers (2025-11-17T08:08:59Z) - MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics [8.783211177601045]
Infrared and visible image fusion aims to integrate complementary multi-modal information into a single fused result.
We propose MdaIF, a one-stop degradation-aware image fusion framework for multi-degradation scenarios driven by a large language model.
To adaptively extract diverse weather-aware degradation knowledge and scene feature representations, we employ a pre-trained vision-language model (VLM) in our framework.
arXiv Detail & Related papers (2025-11-16T09:43:12Z) - Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion [64.5037956060757]
Multi-Modal Image Fusion (MMIF) aims to integrate complementary image information from different modalities to produce informative images.
We propose a novel framework named Spatial-Frequency Enhanced Mamba Fusion (SFMFusion) for MMIF.
Our method achieves better results than most state-of-the-art methods on six MMIF datasets.
arXiv Detail & Related papers (2025-11-10T00:44:49Z) - CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture [2.2124180701409233]
Images used in real-world applications such as image or video retrieval, outdoor surveillance, and autonomous driving suffer from poor weather conditions.
This work focuses on developing an efficient solution for multiple adverse weather removal using a unified quaternion neural architecture called CMAWRNet.
It is based on a novel texture-structure decomposition block, a novel lightweight encoder-decoder quaternion transformer architecture, and an attentive fusion block with low-light correction.
arXiv Detail & Related papers (2025-05-03T18:02:19Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [92.4205087439928]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.
We propose the Self-supervised Transfer (PST) and the Frequency-Decoupled Fusion module (FreDF).
PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models, effectively mitigating data scarcity.
FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
This combined approach enables FUSE to construct a universal image-event framework that only requires lightweight decoder adaptation for target datasets.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.
Traditional encoder-decoder architectures may not adequately account for robustness against noise originating from defective modalities.
We propose ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z) - CFMW: Cross-modality Fusion Mamba for Robust Object Detection under Adverse Weather [15.472015859766069]
We propose the Cross-modality Fusion Mamba with Weather-removal (CFMW) to augment stability and cost-effectiveness under adverse weather conditions.
CFMW is able to reconstruct visual features affected by adverse weather, enriching the representation of image details.
To bridge the gap in relevant datasets, we construct a new Severe Weather Visible-Infrared (SWVI) dataset.
arXiv Detail & Related papers (2024-04-25T02:54:11Z) - All-weather Multi-Modality Image Fusion: Unified Framework and 100k Benchmark [42.49073228252726]
Multi-modality image fusion (MMIF) combines complementary information from different image modalities to provide a more comprehensive and objective interpretation of scenes.
Existing MMIF methods lack the ability to resist different weather interferences in real-world scenes, preventing them from being useful in practical applications such as autonomous driving.
We propose an all-weather MMIF model to achieve effective multi-tasking in this context.
Experimental results in both real-world and synthetic scenes show that the proposed algorithm excels in detail recovery and multi-modality feature extraction.
arXiv Detail & Related papers (2024-02-03T09:02:46Z) - Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of hazy images.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum.
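As context for this two-stage design, the amplitude/phase split that the stages operate on can be written directly with torch.fft. The snippet below is only a minimal illustration of that decomposition (haze perturbs mainly the amplitude spectrum, while scene structure is carried by the phase), not MITNet's network, and the function names are placeholders.

```python
# Minimal illustration (not MITNet itself) of splitting an image into its FFT
# amplitude and phase spectra and recombining them.
import torch

def amp_phase_split(img: torch.Tensor):
    """Return (amplitude, phase) of a (B, C, H, W) image via 2D FFT."""
    spec = torch.fft.fft2(img)
    return spec.abs(), spec.angle()

def amp_phase_merge(amp: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    """Recombine amplitude and phase into a real image."""
    spec = torch.polar(amp, phase)  # amp * exp(i * phase)
    return torch.fft.ifft2(spec).real

if __name__ == "__main__":
    hazy = torch.rand(1, 3, 64, 64)
    amp, phase = amp_phase_split(hazy)
    # Stage 1 would predict a clean amplitude from `amp`; stage 2 would then
    # refine the phase. Recombining the original values reconstructs the input.
    restored = amp_phase_merge(amp, phase)
    print(torch.allclose(restored, hazy, atol=1e-5))  # True
```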
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Pay "Attention" to Adverse Weather: Weather-aware Attention-based Object Detection [5.816506391882502]
This paper proposes a Global-Local Attention (GLA) framework to adaptively fuse the multi-modality sensing streams.
Specifically, GLA integrates an early-stage fusion via a local attention network and a late-stage fusion via a global attention network to deal with both local and global information.
Experimental results demonstrate the superior performance of the proposed GLA compared with state-of-the-art fusion approaches.
arXiv Detail & Related papers (2022-04-22T16:32:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.