TransFuse: A Unified Transformer-based Image Fusion Framework using
Self-supervised Learning
- URL: http://arxiv.org/abs/2201.07451v1
- Date: Wed, 19 Jan 2022 07:30:44 GMT
- Title: TransFuse: A Unified Transformer-based Image Fusion Framework using
Self-supervised Learning
- Authors: Linhao Qu, Shaolei Liu, Manning Wang, Shiman Li, Siqi Yin, Qin Qiao,
Zhijian Song
- Abstract summary: Image fusion integrates complementary information from multiple source images to improve the richness of a single image.
Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural image datasets.
We propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features.
- Score: 5.849513679510834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image fusion is a technique that integrates complementary
information from multiple source images to improve the richness of a single
image. Due to insufficient task-specific training data and corresponding ground
truth, most existing end-to-end image fusion methods easily fall into
overfitting or tedious parameter optimization processes. Two-stage methods
avoid the need for large amounts of task-specific training data by training an
encoder-decoder network on large natural image datasets and using the
extracted features for fusion, but the domain gap between natural images and
different fusion tasks results in limited performance. In this study, we design
a novel encoder-decoder based image fusion framework and propose a
destruction-reconstruction based self-supervised training scheme to encourage
the network to learn task-specific features. Specifically, we propose three
destruction-reconstruction self-supervised auxiliary tasks for multi-modal
image fusion, multi-exposure image fusion and multi-focus image fusion based on
pixel intensity non-linear transformation, brightness transformation and noise
transformation, respectively. In order to encourage different fusion tasks to
promote each other and increase the generalizability of the trained network, we
integrate the three self-supervised auxiliary tasks by randomly choosing one of
them to destroy a natural image in model training. In addition, we design a new
encoder that combines CNN and Transformer for feature extraction, so that the
trained model can exploit both local and global information. Extensive
experiments on multi-modal image fusion, multi-exposure image fusion and
multi-focus image fusion tasks demonstrate that our proposed method achieves
state-of-the-art performance in both subjective and objective evaluations.
The code will be publicly available soon.
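To make the destruction step concrete, the sketch below illustrates the three self-supervised auxiliary tasks named in the abstract (pixel intensity non-linear transformation, brightness transformation, and noise transformation) and the random choice among them during training. The concrete transform forms and parameter ranges are assumptions made for illustration, not the settings used in the paper.

```python
# A minimal sketch (not the paper's exact implementation) of the three
# destruction tasks and the random selection among them. Transform forms and
# parameter ranges below are illustrative assumptions only.
import numpy as np


def nonlinear_intensity_destroy(img, gamma_range=(0.4, 2.5), rng=None):
    """Pixel-intensity non-linear transformation, illustrated as a random gamma curve."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = rng.uniform(*gamma_range)
    return np.clip(img, 0.0, 1.0) ** gamma


def brightness_destroy(img, shift_range=(-0.3, 0.3), rng=None):
    """Brightness transformation, illustrated as a random additive brightness shift."""
    rng = np.random.default_rng() if rng is None else rng
    shift = rng.uniform(*shift_range)
    return np.clip(img + shift, 0.0, 1.0)


def noise_destroy(img, sigma_range=(0.01, 0.1), rng=None):
    """Noise transformation, illustrated as additive Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.uniform(*sigma_range)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def destroy(img, rng=None):
    """Randomly pick one of the three auxiliary tasks to destroy a natural image."""
    rng = np.random.default_rng() if rng is None else rng
    transforms = [nonlinear_intensity_destroy, brightness_destroy, noise_destroy]
    return transforms[rng.integers(len(transforms))](img, rng=rng)


if __name__ == "__main__":
    # Stand-in for a grayscale natural image with values in [0, 1].
    clean = np.random.default_rng(0).random((256, 256))
    corrupted = destroy(clean)  # (corrupted, clean) forms one self-supervised sample
    print(corrupted.shape, float(corrupted.min()), float(corrupted.max()))
```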
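The abstract also describes an encoder that combines a CNN with a Transformer so the trained model can exploit both local and global information. Below is a minimal PyTorch sketch of that general idea; the branch widths, patch size, depth, and the concatenation-based merge are assumptions for illustration and do not reproduce the paper's actual architecture (positional embeddings are omitted for brevity).

```python
# A minimal sketch of an encoder combining a CNN branch (local detail) with a
# Transformer branch (global context). All layer sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNTransformerEncoder(nn.Module):
    def __init__(self, in_ch=1, cnn_ch=32, embed_dim=64, patch=8, depth=2, heads=4):
        super().__init__()
        # CNN branch: stacked 3x3 convolutions capture local structure.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, cnn_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(cnn_ch, cnn_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch: non-overlapping patches become tokens and
        # self-attention models long-range dependencies between them.
        self.patch_embed = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # 1x1 convolution merges the concatenated local and global features.
        self.fuse = nn.Conv2d(cnn_ch + embed_dim, cnn_ch, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        local_feat = self.cnn(x)                       # (B, cnn_ch, H, W)
        tokens = self.patch_embed(x)                   # (B, embed_dim, H/p, W/p)
        gh, gw = tokens.shape[-2:]
        tokens = tokens.flatten(2).transpose(1, 2)     # (B, N, embed_dim)
        tokens = self.transformer(tokens)              # global context per patch
        global_feat = tokens.transpose(1, 2).reshape(b, -1, gh, gw)
        global_feat = F.interpolate(global_feat, size=(h, w),
                                    mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    enc = CNNTransformerEncoder()
    feat = enc(torch.randn(2, 1, 256, 256))
    print(feat.shape)  # torch.Size([2, 32, 256, 256])
```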
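Finally, a schematic training step for the destruction-reconstruction scheme: a natural image is destroyed by a randomly chosen auxiliary task and the encoder-decoder is trained to reconstruct the original. This reuses the destroy() and CNNTransformerEncoder sketches above; the one-layer decoder and the MSE objective are placeholders, not the paper's actual decoder or loss.

```python
# Schematic self-supervised training step: destroy, reconstruct, minimize
# reconstruction error. Depends on the destroy() and CNNTransformerEncoder
# sketches above; decoder and loss are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = CNNTransformerEncoder()
decoder = nn.Conv2d(32, 1, 3, padding=1)  # placeholder reconstruction head
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4
)


def train_step(clean_batch):
    """One step: destroy each image, reconstruct it, and minimize the MSE."""
    # clean_batch: (B, 1, H, W) natural images with values in [0, 1], on CPU.
    destroyed = torch.stack([
        torch.from_numpy(destroy(img.squeeze(0).numpy())).unsqueeze(0).float()
        for img in clean_batch
    ])
    recon = decoder(encoder(destroyed))
    loss = F.mse_loss(recon, clean_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```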
Related papers
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z) - MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion [32.38584862347954]
A lightweight Distilled Mini-Model with a Dynamic Refresh strategy (MMDRFuse) is proposed to achieve this objective.
To pursue model parsimony, an extremely small convolutional network with a total of 113 trainable parameters (0.44 KB) is obtained.
Experiments on several public datasets demonstrate that our method exhibits promising advantages in terms of model efficiency and complexity.
arXiv Detail & Related papers (2024-08-28T08:52:33Z) - A Task-guided, Implicitly-searched and Meta-initialized Deep Model for
Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z) - Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z) - LRRNet: A Novel Representation Learning Guided Fusion Network for
Infrared and Visible Images [98.36300655482196]
We formulate the fusion task mathematically, and establish a connection between its optimal solution and the network architecture that can implement it.
In particular, we adopt a learnable representation approach to the fusion task, in which the construction of the fusion network architecture is guided by the optimisation algorithm producing the learnable model.
Based on this novel network architecture, an end-to-end lightweight fusion network is constructed to fuse infrared and visible light images.
arXiv Detail & Related papers (2023-04-11T12:11:23Z) - CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature
Ensemble for Multi-modality Image Fusion [72.8898811120795]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion.
Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z) - Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z) - TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework
using Self-Supervised Multi-Task Learning [5.926203312586108]
We propose TransMEF, a transformer-based multi-exposure image fusion framework.
The framework is based on an encoder-decoder network, which can be trained on large natural image datasets.
arXiv Detail & Related papers (2021-12-02T07:43:42Z) - Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a dense-connected multi-stream fusion network is proposed to effectively fuse the information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-11-04T06:12:24Z)