Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
- URL: http://arxiv.org/abs/2302.01392v2
- Date: Thu, 23 Mar 2023 07:15:53 GMT
- Title: Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
- Authors: Yiming Sun, Bing Cao, Pengfei Zhu, Qinghua Hu
- Abstract summary: Infrared and visible image fusion aims to integrate comprehensive information from multiple sources to achieve superior performance on various practical tasks.
We propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts.
Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate.
- Score: 59.19469551774703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared and visible image fusion aims to integrate comprehensive information
from multiple sources to achieve superior performance on various practical
tasks, such as detection, over that of a single modality. However, most
existing methods directly combine the texture details and object contrast of
different modalities, ignoring the dynamic changes in reality, which diminishes
the visible texture in good lighting conditions and the infrared contrast in
low lighting conditions. To fill this gap, we propose a dynamic image fusion
framework with a multi-modal gated mixture of local-to-global experts, termed
MoE-Fusion, to dynamically extract effective and comprehensive information from
the respective modalities. Our model consists of a Mixture of Local Experts
(MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate. The
MoLE performs specialized learning of multi-modal local features, prompting the
fused images to retain the local information in a sample-adaptive manner, while
the MoGE focuses on the global information that complements the fused image
with overall texture detail and contrast. Extensive experiments show that our
MoE-Fusion outperforms state-of-the-art methods in preserving multi-modal image
texture and contrast through the local-to-global dynamic learning paradigm, and
also achieves superior performance on detection tasks. Our code will be
available at https://github.com/SunYM2020/MoE-Fusion.
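
The local-to-global expert structure described above can be made concrete with a minimal sketch. The snippet below is an illustrative PyTorch approximation, not the authors' implementation: the module names (GatedMixtureFusion, ConvExpert), the use of dilated convolutions as a stand-in for global-context experts, and the pooled multi-modal gate are all assumptions made for clarity. The released code at https://github.com/SunYM2020/MoE-Fusion defines the actual MoLE, MoGE, and gate designs.

```python
# Illustrative sketch of a multi-modal gated mixture of local and global
# experts for infrared-visible feature fusion. Hypothetical module names;
# not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvExpert(nn.Module):
    """A small convolutional expert operating on concatenated IR/VIS features."""
    def __init__(self, channels: int, dilation: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class GatedMixtureFusion(nn.Module):
    """Local experts (small receptive field) and global experts (dilated
    convolutions as a stand-in for global context), weighted per sample by
    a gate driven by both modalities."""
    def __init__(self, channels: int = 64, n_local: int = 2, n_global: int = 2):
        super().__init__()
        self.local_experts = nn.ModuleList(
            ConvExpert(channels, dilation=1) for _ in range(n_local))
        self.global_experts = nn.ModuleList(
            ConvExpert(channels, dilation=4) for _ in range(n_global))
        n_experts = n_local + n_global
        # Multi-modal gate: pooled IR+VIS features -> one weight per expert.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(2 * channels, n_experts),
        )

    def forward(self, feat_ir, feat_vis):
        x = torch.cat([feat_ir, feat_vis], dim=1)            # (B, 2C, H, W)
        weights = F.softmax(self.gate(x), dim=1)             # (B, n_experts)
        outputs = [e(x) for e in self.local_experts] + \
                  [e(x) for e in self.global_experts]        # each (B, C, H, W)
        stacked = torch.stack(outputs, dim=1)                 # (B, n_experts, C, H, W)
        fused = (weights[:, :, None, None, None] * stacked).sum(dim=1)
        return fused                                          # sample-adaptive fused feature


if __name__ == "__main__":
    ir = torch.randn(1, 64, 128, 128)    # stand-in for infrared encoder features
    vis = torch.randn(1, 64, 128, 128)   # stand-in for visible encoder features
    print(GatedMixtureFusion()(ir, vis).shape)  # torch.Size([1, 64, 128, 128])
```

In this sketch the softmax gate re-weights every expert per input, which mirrors the sample-adaptive behaviour the abstract describes: in principle the gate can up-weight texture-oriented local experts in well-lit scenes and contrast-oriented global experts in low light.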
Related papers
- MATCNN: Infrared and Visible Image Fusion Method Based on Multi-scale CNN with Attention Transformer [21.603763071331667]
This paper proposes a novel cross-modal image fusion approach based on a multi-scale convolutional neural network with an attention Transformer (MATCNN).
MATCNN utilizes the multi-scale fusion module (MSFM) to extract local features at different scales and employs the global feature extraction module (GFEM) to extract global features.
An information mask is used to label pertinent details within the images, aiming to increase the proportion of significant infrared information and visible background texture preserved in the fused images.
arXiv Detail & Related papers (2025-02-04T03:09:54Z)
- From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification [11.324518300593983]
Current VI-ReID methods focus on cross-modality matching, but real-world applications often involve mixed galleries containing both visible (V) and infrared (I) images.
Such mixed galleries are challenging because images from the same modality may have lower domain gaps yet correspond to different identities.
This paper introduces a novel mixed-modal ReID setting, where galleries contain data from both modalities.
arXiv Detail & Related papers (2025-01-23T01:28:05Z)
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
- Task-Customized Mixture of Adapters for General Image Fusion [51.8742437521891]
General image fusion aims at integrating important information from multi-source images.
We propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model.
arXiv Detail & Related papers (2024-03-19T07:02:08Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP than existing methods, reaching state-of-the-art performance.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
- Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion [5.417493475406649]
Multi-modal image fusion (MMIF) integrates valuable information from different modality images into a fused one.
This paper proposes a MMIF framework for joint focused integration and modalities information extraction.
The proposed algorithm surpasses state-of-the-art methods in both visual perception and quantitative evaluation.
arXiv Detail & Related papers (2023-11-03T12:58:39Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.