Explicit and implicit models in infrared and visible image fusion
- URL: http://arxiv.org/abs/2206.09581v1
- Date: Mon, 20 Jun 2022 06:05:09 GMT
- Title: Explicit and implicit models in infrared and visible image fusion
- Authors: Zixuan Wang, Bin Sun
- Abstract summary: This paper discusses the limitations of deep learning models in image fusion and the corresponding optimization strategies.
Ten models were screened for comparison experiments on 21 test sets.
The qualitative and quantitative results show that implicit models have a more comprehensive ability to learn image features.
- Score: 5.842112272932475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared and visible images, as multi-modal image pairs, show significant
differences in the expression of the same scene. The image fusion task is faced
with two problems: one is to maintain the unique features between different
modalities, and the other is to maintain features at various levels like local
and global features. This paper discusses the limitations of deep learning
models in image fusion and the corresponding optimization strategies. We divide
models into explicit models, which are built on artificially designed structures
and constraints, and implicit models, which adaptively learn high-level features
or establish global pixel associations. Ten models were screened for comparison
experiments on 21 test sets. The qualitative and quantitative results show that
implicit models have a more comprehensive ability to learn image features, but
their stability still needs to be improved. Based on the advantages and
limitations of existing algorithms, we discuss the main open problems of
multi-modal image fusion and future research directions.
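To make the explicit/implicit distinction concrete, here is a minimal Python sketch contrasting a hand-designed fusion rule with a small learned network. It is an illustration only: the names (explicit_fusion, ImplicitFusion) and the weighted-average rule are assumptions chosen for this example, not any of the ten models the paper compares, and the CNN is shown untrained.

```python
# Illustrative sketch only: generic stand-ins for the two categories discussed
# in the abstract, not re-implementations of the benchmarked models.
import numpy as np
import torch
import torch.nn as nn

def explicit_fusion(ir: np.ndarray, vis: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Explicit model: a hand-designed rule (a fixed weighted average) decides
    how infrared and visible pixels are combined."""
    assert ir.shape == vis.shape
    return alpha * ir + (1.0 - alpha) * vis

class ImplicitFusion(nn.Module):
    """Implicit model: a small CNN whose fusion behaviour would be learned from
    data rather than specified by hand (weights here are untrained)."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        x = torch.cat([ir, vis], dim=1)      # stack the two modalities channel-wise
        return self.decode(self.encode(x))   # learned mapping to a fused image

# Toy usage on a random 64x64 registered grayscale pair.
ir = np.random.rand(64, 64).astype(np.float32)
vis = np.random.rand(64, 64).astype(np.float32)
fused_rule = explicit_fusion(ir, vis)
with torch.no_grad():
    fused_learned = ImplicitFusion()(
        torch.from_numpy(ir)[None, None], torch.from_numpy(vis)[None, None]
    )
```

The contrast mirrors the paper's taxonomy: the hand-designed rule is transparent and stable but fixed, while the network can only capture high-level or global structure after it has been trained on data.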
Related papers
- Progressive Compositionality In Text-to-Image Generative Models [33.18510121342558]
We propose EvoGen, a new curriculum for contrastive learning of diffusion models.
In this work, we leverage large language models (LLMs) to compose realistic, complex scenarios.
We also harness Visual-Question Answering (VQA) systems alongside diffusion models to automatically curate a contrastive dataset, ConPair.
arXiv Detail & Related papers (2024-10-22T05:59:29Z) - Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z) - Advanced Multimodal Deep Learning Architecture for Image-Text Matching [33.8315200009152]
Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship.
We introduce an advanced multimodal deep learning architecture, which combines the high-level abstract representation ability of deep neural networks for visual information with the advantages of natural language processing models for text semantic understanding.
Experiments show that compared with existing image-text matching models, the optimized new model has significantly improved performance on a series of benchmark data sets.
arXiv Detail & Related papers (2024-06-13T08:32:24Z) - Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z) - CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models [48.10798436003449]
Images produced by text-to-image diffusion models might not always faithfully represent the semantic intent of the provided text prompt.
Our work introduces a novel perspective by tackling this challenge in a contrastive context.
We conduct extensive experiments across a wide variety of scenarios, each involving unique combinations of objects, attributes, and scenes.
arXiv Detail & Related papers (2023-12-11T01:42:15Z) - Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z) - UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z) - A Better Loss for Visual-Textual Grounding [74.81353762517979]
Given a textual phrase and an image, the visual grounding problem is defined as the task of locating the content of the image referenced by the sentence.
It is a challenging task that has several real-world applications in human-computer interaction, image-text reference resolution, and video-text reference resolution.
We propose a model that is able to achieve a higher accuracy than state-of-the-art models thanks to the adoption of a more effective loss function.
arXiv Detail & Related papers (2021-08-11T16:26:54Z) - Dependent Multi-Task Learning with Causal Intervention for Image Captioning [10.6405791176668]
In this paper, we propose a dependent multi-task learning framework with causal intervention (DMTCI).
Firstly, we introduce an intermediate task, bag-of-categories generation, before the final task, image captioning.
Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders.
Finally, we use a multi-agent reinforcement learning strategy to enable end-to-end training and reduce the inter-task error accumulations.
arXiv Detail & Related papers (2021-05-18T14:57:33Z)