Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention
- URL: http://arxiv.org/abs/2210.09847v1
- Date: Tue, 18 Oct 2022 13:30:52 GMT
- Title: Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention
- Authors: Yu Yuan and Jiaqi Wu and Zhongliang Jing and Henry Leung and Han Pan
- Abstract summary: We present a hybrid model consisting of a convolutional encoder and a Transformer-based decoder to fuse multimodal images.
A branch fusion module is designed to adaptively fuse the features of the two branches.
- Score: 12.167049432063132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fusion of images taken by heterogeneous sensors helps to enrich the
information and improve the quality of imaging. In this article, we present a
hybrid model consisting of a convolutional encoder and a Transformer-based
decoder to fuse multimodal images. In the encoder, a non-local cross-modal
attention block is proposed to capture both local and global dependencies of
multiple source images. A branch fusion module is designed to adaptively fuse
the features of the two branches. We embed a Transformer module with linear
complexity in the decoder to enhance the reconstruction capability of the
proposed network. Qualitative and quantitative experiments demonstrate the
effectiveness of the proposed method by comparing it with existing
state-of-the-art fusion models. The source code of our work is available at
https://github.com/pandayuanyu/HCFusion.
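To make the encoder-side ideas concrete, below is a minimal PyTorch sketch of a non-local cross-modal attention block (queries from one modality attend over keys and values from the other across all spatial positions, capturing global dependencies) together with an adaptive two-branch fusion module. Everything here, including the module names, the softmax gating, and the residual connection, is an illustrative assumption rather than the authors' implementation; see the linked repository for the real code.

```python
import torch
import torch.nn as nn


class NonLocalCrossModalAttention(nn.Module):
    """Hypothetical non-local cross-modal attention: queries from modality A
    attend over keys/values from modality B across all spatial positions."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.query = nn.Conv2d(channels, inter, kernel_size=1)
        self.key = nn.Conv2d(channels, inter, kernel_size=1)
        self.value = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x_a.shape
        q = self.query(x_a).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x_b).flatten(2)                     # (B, C', HW)
        v = self.value(x_b).flatten(2).transpose(1, 2)   # (B, HW, C')
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (B, HW, HW)
        ctx = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        # Residual connection keeps the branch's local convolutional features.
        return x_a + self.out(ctx)


class BranchFusion(nn.Module):
    """Hypothetical adaptive fusion of two branch feature maps via learned,
    spatially varying softmax weights."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(torch.cat([f_a, f_b], dim=1)), dim=1)
        return w[:, 0:1] * f_a + w[:, 1:2] * f_b
```

For example, with infrared and visible feature maps of shape (B, 64, H, W), one attention block per direction followed by BranchFusion(64) would yield a single fused map; the actual network's wiring may differ.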
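The abstract also mentions a Transformer module with linear complexity in the decoder. One common way to achieve linear complexity is kernelized linear attention, sketched below with the elu + 1 feature map of Katharopoulos et al.; this shows the general idea only and is not taken from the paper.

```python
import torch
import torch.nn.functional as F


def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """q, k, v: (B, N, C). Cost is O(N * C^2) in sequence length N,
    versus O(N^2 * C) for standard softmax attention."""
    q = F.elu(q) + 1.0                                # positive feature map phi(q)
    k = F.elu(k) + 1.0                                # phi(k)
    kv = torch.einsum("bnc,bnd->bcd", k, v)           # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bnc,bc->bn", q, k.sum(dim=1)) + eps)  # normaliser
    return torch.einsum("bnc,bcd,bn->bnd", q, kv, z)
```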
Related papers
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion [15.79138560700532]
We propose a dual-branch image fusion network called Tmamba.
It consists of a linear Transformer branch and a Mamba branch, providing global modeling capability while maintaining linear complexity.
Experiments show that our Tmamba achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2024-09-05T03:42:11Z)
- A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
- Effective Image Tampering Localization via Enhanced Transformer and Co-attention Fusion [5.691973573807887]
We propose an effective image tampering localization network (EITLNet) based on a two-branch enhanced transformer encoder.
The features extracted from RGB and noise streams are fused effectively by the coordinate attention-based fusion module.
arXiv Detail & Related papers (2023-09-17T15:43:06Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, existing methods neglect that Transformers need to incorporate contextual information to extract features dynamically.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning [5.849513679510834]
Image fusion integrates complementary information from multiple source images to produce a single, richer image.
Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural image datasets.
We propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features (a minimal sketch of this idea follows this list).
arXiv Detail & Related papers (2022-01-19T07:30:44Z)
- TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning [5.926203312586108]
We propose TransMEF, a transformer-based multi-exposure image fusion framework.
The framework is based on an encoder-decoder network, which can be trained on large natural image datasets.
arXiv Detail & Related papers (2021-12-02T07:43:42Z)
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)
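As promised above, here is a minimal sketch of the destruction-reconstruction self-supervised scheme summarised for TransFuse and TransMEF: corrupt a natural image, then train an encoder-decoder to restore it, so the network learns fusion-relevant features without task-specific data. The block masking and L1 loss below are illustrative assumptions, not the papers' exact recipe.

```python
import torch
import torch.nn.functional as F


def random_block_mask(x: torch.Tensor, patch: int = 32, drop: float = 0.3) -> torch.Tensor:
    """Zero out a random fraction of patches (assumes H, W divisible by patch)."""
    b, _, h, w = x.shape
    keep = (torch.rand(b, 1, h // patch, w // patch, device=x.device) > drop).float()
    return x * F.interpolate(keep, size=(h, w), mode="nearest")


def train_step(model, images, optimizer):
    corrupted = random_block_mask(images)     # "destruction"
    recon = model(corrupted)                  # "reconstruction"
    loss = F.l1_loss(recon, images)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```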
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.