From Text to Pixels: A Context-Aware Semantic Synergy Solution for
Infrared and Visible Image Fusion
- URL: http://arxiv.org/abs/2401.00421v1
- Date: Sun, 31 Dec 2023 08:13:47 GMT
- Title: From Text to Pixels: A Context-Aware Semantic Synergy Solution for
Infrared and Visible Image Fusion
- Authors: Xingyuan Li, Yang Zou, Jinyuan Liu, Zhiying Jiang, Long Ma, Xin Fan,
Risheng Liu
- Abstract summary: We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP than existing methods, setting a new state of the art.
- Score: 66.33467192279514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid progression of deep learning technologies, multi-modality
image fusion has become increasingly prevalent in object detection tasks.
Despite its popularity, the inherent disparities in how different sources
depict scene content make fusion a challenging problem. Current fusion
methodologies identify shared characteristics between the two modalities and
integrate them within this shared domain using either iterative optimization or
deep learning architectures, which often neglect the intricate semantic
relationships between modalities, resulting in a superficial understanding of
inter-modal connections and, consequently, suboptimal fusion outcomes. To
address this, we introduce a text-guided multi-modality image fusion method
that leverages the high-level semantics from textual descriptions to integrate
semantics from infrared and visible images. This method capitalizes on the
complementary characteristics of diverse modalities, bolstering both the
accuracy and robustness of object detection. The codebook is utilized to
enhance a streamlined and concise depiction of the fused intra- and
inter-domain dynamics, fine-tuned for optimal performance in detection tasks.
We present a bilevel optimization strategy that establishes a nexus between the
joint problem of fusion and detection, optimizing both processes concurrently.
Furthermore, we introduce the first dataset of paired infrared and visible
images accompanied by text prompts, paving the way for future research.
Extensive experiments on several datasets demonstrate that our method not only
produces visually superior fusion results but also achieves a higher detection
mAP than existing methods, setting a new state of the art.
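To make the pipeline concrete, below is a minimal PyTorch-style sketch of the general idea: a text embedding conditions the fusion of infrared and visible images, and a detection loss on the fused output is optimized jointly with the fusion loss. This is not the authors' implementation; the TextEncoder, TextGuidedFusion, FiLM-style conditioning, stand-in detector, loss terms, and weighting are illustrative placeholders, the paper's codebook representation is omitted, and the bilevel optimization is relaxed to a single weighted objective.

```python
# Hypothetical sketch of text-guided infrared/visible fusion trained jointly
# with a detection loss. NOT the authors' released code: all modules and
# hyperparameters below are placeholders for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Placeholder for a pretrained text encoder (e.g., a CLIP-like model)."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
    def forward(self, token_ids):                  # (B, T) token ids
        return self.embed(token_ids).mean(dim=1)   # (B, dim) mean-pooled prompt embedding

class TextGuidedFusion(nn.Module):
    """Fuses IR and visible features, modulated (FiLM-style) by the text embedding."""
    def __init__(self, text_dim=128, feat=32):
        super().__init__()
        self.enc_ir  = nn.Conv2d(1, feat, 3, padding=1)
        self.enc_vis = nn.Conv2d(3, feat, 3, padding=1)
        self.film = nn.Linear(text_dim, 2 * feat)   # per-channel scale and shift
        self.dec = nn.Conv2d(2 * feat, 3, 3, padding=1)
    def forward(self, ir, vis, text_emb):
        f_ir, f_vis = F.relu(self.enc_ir(ir)), F.relu(self.enc_vis(vis))
        scale, shift = self.film(text_emb).chunk(2, dim=1)
        f_vis = f_vis * (1 + scale[..., None, None]) + shift[..., None, None]
        return torch.sigmoid(self.dec(torch.cat([f_ir, f_vis], dim=1)))  # fused RGB image

# --- one joint training step (bilevel objective relaxed to a weighted sum) ---
text_enc, fuser = TextEncoder(), TextGuidedFusion()
detector = nn.Conv2d(3, 1, 1)   # stand-in for a real detection head
params = list(text_enc.parameters()) + list(fuser.parameters()) + list(detector.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

ir   = torch.rand(2, 1, 64, 64)               # infrared image batch
vis  = torch.rand(2, 3, 64, 64)               # visible image batch
toks = torch.randint(0, 1000, (2, 8))         # tokenized text prompts (placeholder)
obj_mask = torch.rand(2, 1, 64, 64).round()   # toy detection target

opt.zero_grad()
fused = fuser(ir, vis, text_enc(toks))
fusion_loss = F.l1_loss(fused, vis) + F.l1_loss(fused.mean(1, keepdim=True), ir)
det_loss = F.binary_cross_entropy_with_logits(detector(fused), obj_mask)
loss = fusion_loss + 0.5 * det_loss           # detection feedback guides fusion
loss.backward()
opt.step()
```

In the paper's formulation, fusion and detection are coupled through a bilevel optimization rather than the simple weighted sum used in this sketch.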
Related papers
- Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model [30.739879255847946]
Existing multi-modal image fusion methods fail to address the compound degradations present in source images.
This study proposes a novel interactive multi-modal image fusion framework based on the text-modulated diffusion model, called Text-DiFuse.
arXiv Detail & Related papers (2024-10-31T13:10:50Z)
- Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion [26.809259323430368]
We introduce a novel approach, termed Text-IF, that leverages a semantic text guidance image fusion model for degradation-aware and interactive image fusion.
Text-IF supports all-in-one degradation-aware processing of infrared and visible images and offers interactive, flexible fusion outcomes.
In this way, Text-IF achieves not only multi-modal image fusion, but also multi-modal information fusion.
arXiv Detail & Related papers (2024-03-25T03:06:45Z)
- A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
- An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) renders hard-to-find objects apparent, while multimodal salient object detection (SOD) accurately delineates the spatial location of objects within the image.
arXiv Detail & Related papers (2023-05-17T06:48:35Z)
- Breaking Free from Fusion Rule: A Fully Semantic-driven Infrared and Visible Image Fusion [51.22863068854784]
Infrared and visible image fusion plays a vital role in the field of computer vision.
Previous approaches make efforts to design various fusion rules in the loss functions.
We develop a semantic-level fusion network to sufficiently utilize the semantic guidance.
arXiv Detail & Related papers (2022-11-22T13:59:59Z)
- CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion [72.8898811120795]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion.
Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse within this common space using either iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
- Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, where the input consists of a source image plus text describing certain modifications to this image, and the goal is to retrieve the desired image.
Our method narrows the modality gap between the text modality and the image modality by maximizing the mutual information between their representations, which are not exactly semantically identical (see the sketch after this list).
arXiv Detail & Related papers (2021-03-10T13:08:09Z)
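For the mutual-information-based alignment mentioned in the last entry above, a common surrogate objective is the InfoNCE bound. The sketch below is a generic illustration under that assumption, not the cited paper's code; the function name, temperature, and toy embeddings are hypothetical.

```python
# Hypothetical sketch: maximizing mutual information between text and image
# embeddings via the symmetric InfoNCE bound (a common surrogate objective).
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Matched (image, text) pairs are positives; all other pairs in the batch are negatives."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                      # (B, B) similarity logits
    targets = torch.arange(img.size(0), device=img.device)    # diagonal entries are positives
    # Minimizing this loss tightens a lower bound on the mutual information
    # between the two embedding spaces.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with embeddings from any image / text encoder pair.
loss = info_nce(torch.randn(4, 256, requires_grad=True), torch.randn(4, 256, requires_grad=True))
loss.backward()
```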
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.