Image Fusion Transformer
- URL: http://arxiv.org/abs/2107.09011v2
- Date: Tue, 20 Jul 2021 15:34:03 GMT
- Title: Image Fusion Transformer
- Authors: Vibashan VS, Jeya Maria Jose Valanarasu, Poojan Oza and Vishal M.
Patel
- Abstract summary: In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
- Score: 75.71025138448287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In image fusion, images obtained from different sensors are fused to generate
a single image with enhanced information. In recent years, state-of-the-art
methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful
features for image fusion. Specifically, CNN-based methods perform image fusion
by fusing local features. However, they do not consider long-range dependencies
that are present in the image. Transformer-based models are designed to
overcome this by modeling the long-range dependencies with the help of the
self-attention mechanism. This motivates us to propose a novel Image Fusion
Transformer (IFT) where we develop a transformer-based multi-scale fusion
strategy that attends to both local and long-range information (or global
context). The proposed method follows a two-stage training approach. In the
first stage, we train an auto-encoder to extract deep features at multiple
scales. In the second stage, multi-scale features are fused using a
Spatio-Transformer (ST) fusion strategy. Each ST fusion block consists of
a CNN branch and a transformer branch, which capture local and long-range features,
respectively. Extensive experiments on multiple benchmark datasets show that
respectively. Extensive experiments on multiple benchmark datasets show that
the proposed method performs better than many competitive fusion algorithms.
Furthermore, we show the effectiveness of the proposed ST fusion strategy with
an ablation analysis. The source code is available at:
https://github.com/Vibashan/Image-Fusion-Transformer.
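
To make the ST fusion idea concrete, below is a minimal PyTorch-style sketch of a fusion block with a CNN branch (local features) and a self-attention branch (global context). The class name, layer widths, and the simple concatenation used to combine the two branches are illustrative assumptions, not the authors' exact architecture; see the repository linked above for the official implementation.

import torch
import torch.nn as nn


class STFusionBlock(nn.Module):
    """Sketch of a Spatio-Transformer fusion block: fuses deep features of two
    source images via a local (CNN) branch and a global (self-attention) branch."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        # CNN branch: local features of the concatenated inputs.
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Transformer branch: self-attention over spatial tokens for long-range context.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Output projection after combining the two branches (assumed design).
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) encoder features of the two source images.
        x = torch.cat([feat_a, feat_b], dim=1)           # (B, 2C, H, W)
        local = self.cnn_branch(x)                       # local features

        tokens = self.proj(x)                            # (B, C, H, W)
        b, c, h, w = tokens.shape
        seq = self.norm(tokens.flatten(2).transpose(1, 2))  # (B, H*W, C)
        glob, _ = self.attn(seq, seq, seq)               # global context
        glob = glob.transpose(1, 2).reshape(b, c, h, w)

        # Fuse local and global features into a single representation.
        return self.out(torch.cat([local, glob], dim=1))


# Usage sketch: fuse one scale of features produced by the pre-trained auto-encoder.
if __name__ == "__main__":
    fa = torch.randn(1, 64, 32, 32)   # e.g. infrared-image features (hypothetical size)
    fb = torch.randn(1, 64, 32, 32)   # e.g. visible-image features
    fused = STFusionBlock(64)(fa, fb)
    print(fused.shape)                # torch.Size([1, 64, 32, 32])

In the paper's two-stage setup, blocks like this would be applied at each encoder scale in the second stage, with the fused multi-scale features passed to the decoder trained in the first stage.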
Related papers
- Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion [15.79138560700532]
We propose a dual-branch image fusion network called Tmamba.
It consists of a linear Transformer branch and a Mamba branch, which provide global modeling capability while maintaining linear complexity.
Experiments show that our Tmamba achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2024-09-05T03:42:11Z) - FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
The integration of data from diverse sensor modalities constitutes a prevalent methodology within the ambit of autonomous driving scenarios.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z) - Fusion Transformer with Object Mask Guidance for Image Forgery Analysis [9.468075384561947]
We introduce OMG-Fuser, a fusion transformer-based network designed to extract information from various forensic signals.
Our approach can operate with an arbitrary number of forensic signals and leverages object information for their analysis.
Our model is robust against traditional and novel forgery attacks and can be expanded with new signals without training from scratch.
arXiv Detail & Related papers (2024-03-18T20:20:13Z) - FuseFormer: A Transformer for Visual and Thermal Image Fusion [3.6064695344878093]
We propose a novel methodology for the image fusion problem that mitigates the limitations associated with using classical evaluation metrics as loss functions.
Our approach integrates a transformer-based multi-scale fusion strategy that adeptly addresses local and global context information.
Our proposed method, along with the novel loss function definition, demonstrates superior performance compared to other competitive fusion algorithms.
arXiv Detail & Related papers (2024-02-01T19:40:39Z) - Transformer Fusion with Optimal Transport [25.022849817421964]
Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities.
This paper presents a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components.
arXiv Detail & Related papers (2023-10-09T13:40:31Z) - Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - A Task-guided, Implicitly-searched and Meta-initialized Deep Model for
Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local
Cross-modal Attention [12.167049432063132]
We present a hybrid model consisting of a convolutional encoder and a Transformer-based decoder to fuse multimodal images.
A branch fusion module is designed to adaptively fuse the features of the two branches.
arXiv Detail & Related papers (2022-10-18T13:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all information) and is not responsible for any consequences.