DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion
- URL: http://arxiv.org/abs/2409.10080v1
- Date: Mon, 16 Sep 2024 08:37:09 GMT
- Title: DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion
- Authors: Yuchen Guo, Ruoxiang Xu, Rongcheng Li, Zhenghao Wu, Weifeng Su,
- Abstract summary: Two-phase discriminative autoencoder framework, DAE-Fuse, generates sharp and natural fused images.
Experiments on public infrared-visible, medical image fusion, and downstream object detection datasets demonstrate our method's superiority and generalizability.
- Score: 10.713089596405053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modality image fusion aims to integrate complementary data information from different imaging modalities into a single image. Existing methods often generate either blurry fused images that lose fine-grained semantic information or unnatural fused images that appear perceptually cropped from the inputs. In this work, we propose a novel two-phase discriminative autoencoder framework, termed DAE-Fuse, that generates sharp and natural fused images. In the adversarial feature extraction phase, we introduce two discriminative blocks into the encoder-decoder architecture, providing an additional adversarial loss to better guide feature extraction by reconstructing the source images. While the two discriminative blocks are adapted in the attention-guided cross-modality fusion phase to distinguish the structural differences between the fused output and the source inputs, injecting more naturalness into the results. Extensive experiments on public infrared-visible, medical image fusion, and downstream object detection datasets demonstrate our method's superiority and generalizability in both quantitative and qualitative evaluations.
Related papers
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z) - A Multi-scale Information Integration Framework for Infrared and Visible
Image Fusion [50.84746752058516]
Infrared and visible image fusion aims at generating a fused image containing intensity and detail information of source images.
Existing methods mostly adopt a simple weight in the loss function to decide the information retention of each modality.
We propose a multi-scale dual attention (MDA) framework for infrared and visible image fusion.
arXiv Detail & Related papers (2023-12-07T14:40:05Z) - Multi-modal Medical Neurological Image Fusion using Wavelet Pooled Edge
Preserving Autoencoder [3.3828292731430545]
This paper presents an end-to-end unsupervised fusion model for multimodal medical images based on an edge-preserving dense autoencoder network.
In the proposed model, feature extraction is improved by using wavelet decomposition-based attention pooling of feature maps.
The proposed model is trained on a variety of medical image pairs which helps in capturing the intensity distributions of the source images.
arXiv Detail & Related papers (2023-10-18T11:59:35Z) - AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential
Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through frequency-guided attention mechanism.
The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
arXiv Detail & Related papers (2023-10-09T07:10:30Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal
Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature
Ensemble for Multi-modality Image Fusion [72.8898811120795]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion.
Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z) - Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image
Retrieval [55.21569389894215]
We propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discarding them.
Our framework first maps paired datapoints from the individual photo and sketch modalities to fused representations that unify information from both modalities.
We then decouple the input space of the aforementioned modality fusion network into independent encoders of the individual modalities via contrastive and relational cross-modal knowledge distillation.
arXiv Detail & Related papers (2022-10-19T11:50:14Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.