FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
- URL: http://arxiv.org/abs/2209.11277v1
- Date: Thu, 22 Sep 2022 19:06:55 GMT
- Title: FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
- Authors: Fabian Duffhauss, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann
- Abstract summary: We present a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks.
Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images.
- Score: 16.64908104831795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sensor fusion can significantly improve the performance of many computer
vision tasks. However, traditional fusion approaches are either not data-driven,
and thus can neither exploit prior knowledge nor find regularities in a given
dataset, or they are restricted to a single application. We overcome this shortcoming by
presenting a novel deep hierarchical variational autoencoder called FusionVAE
that can serve as a basis for many fusion tasks. Our approach is able to
generate diverse image samples that are conditioned on multiple noisy,
occluded, or only partially visible input images. We derive and optimize a
variational lower bound for the conditional log-likelihood of FusionVAE. In
order to assess the fusion capabilities of our model thoroughly, we created
three novel datasets for image fusion based on popular computer vision
datasets. In our experiments, we show that FusionVAE learns a representation of
aggregated information that is relevant to fusion tasks. The results
demonstrate that our approach outperforms traditional methods significantly.
Furthermore, we present the advantages and disadvantages of different design
choices.
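The variational lower bound mentioned above is not spelled out in this summary. As a rough sketch, a conditional ELBO of the kind the abstract describes, written for a single latent variable z with input image set X and fusion target y, would take the following form (the paper's hierarchical model stacks several latent groups, which this sketch omits):

```latex
% Sketch of a conditional evidence lower bound (ELBO); X denotes the
% noisy/occluded input images, y the fused output image, and z a single
% latent variable (FusionVAE's hierarchical model uses several).
\log p_\theta(y \mid X)
  \;\ge\; \mathbb{E}_{q_\phi(z \mid y, X)}\big[\log p_\theta(y \mid z, X)\big]
  \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid y, X) \,\big\|\, p_\theta(z \mid X)\big)
```

The first term rewards reconstructions of the target given the latent code and the inputs; the KL term keeps the approximate posterior close to a prior that is itself conditioned on the inputs.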
Related papers
- Test-Time Dynamic Image Fusion [45.551196908423606]
In this paper, we give our solution from a generalization perspective.
We decompose the fused image into multiple components corresponding to its source data.
We prove that reducing the generalization error hinges on a negative correlation between the RD-based fusion weight and the uni-source reconstruction loss.
arXiv Detail & Related papers (2024-11-05T06:23:44Z)
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning [5.849513679510834]
Image fusion is a technique for integrating complementary information from multiple source images to enrich a single image.
Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural image datasets.
We propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features.
arXiv Detail & Related papers (2022-01-19T07:30:44Z)
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities (see the sketch after this list).
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
- VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization [46.607930208613574]
We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works that directly adapt the objective function of a vanilla variational autoencoder, we show how camera localization can be estimated accurately.
arXiv Detail & Related papers (2020-03-12T14:52:10Z)
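The Bayesian fusion entry above describes a non-learned late fusion of detections from different modalities. As a minimal sketch of that general idea (not the paper's actual implementation), the following Python snippet fuses the class posteriors of two matched detections under a conditional-independence assumption; the function name, the uniform class prior, and the example numbers are all illustrative assumptions.

```python
import numpy as np

def fuse_posteriors(posteriors, prior=None):
    """Fuse per-modality class posteriors for one matched detection.

    Assumes modalities are conditionally independent given the true class,
    so p(c | x1, ..., xn) is proportional to prod_i p(c | xi) / p(c)^(n-1).
    `posteriors` is a list of length-K probability vectors; `prior`
    defaults to uniform, a simplifying assumption.
    """
    posteriors = [np.asarray(p, dtype=float) for p in posteriors]
    k = posteriors[0].shape[0]
    prior = np.full(k, 1.0 / k) if prior is None else np.asarray(prior, dtype=float)
    fused = posteriors[0].copy()
    for p in posteriors[1:]:
        fused *= p / prior  # divide by the prior once per extra modality
    return fused / fused.sum()  # renormalize to a probability vector

# Example: an RGB and a thermal detection of the same object over three
# hypothetical classes (person / cyclist / background); values are made up.
p_rgb = [0.6, 0.3, 0.1]
p_thermal = [0.7, 0.1, 0.2]
print(fuse_posteriors([p_rgb, p_thermal]))  # the shared top class dominates
```

In practice such a late-fusion step would run after matching boxes across modalities (e.g. by IoU) and could be extended to fuse box coordinates as well; the sketch above only covers the class-score part.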