Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption
- URL: http://arxiv.org/abs/2501.10761v1
- Date: Sat, 18 Jan 2025 13:17:34 GMT
- Title: Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption
- Authors: Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, Risheng Liu,
- Abstract summary: Infrared-visible image fusion is a critical task in computer vision.
There is a lack of recent comprehensive surveys that address this rapidly expanding domain.
We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods.
- Score: 65.06388526722186
- License:
- Abstract: Infrared-visible image fusion (IVIF) is a critical task in computer vision, aimed at integrating the unique features of both infrared and visible spectra into a unified representation. Since 2018, the field has entered the deep learning era, with an increasing variety of approaches introducing a range of networks and loss functions to enhance visual performance. However, challenges such as data compatibility, perception accuracy, and efficiency remain. Unfortunately, there is a lack of recent comprehensive surveys that address this rapidly expanding domain. This paper fills that gap by providing a thorough survey covering a broad range of topics. We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods, from visual enhancement strategies to data compatibility and task adaptability. We also present a detailed analysis of these approaches, accompanied by a lookup table clarifying their core ideas. Furthermore, we summarize performance comparisons, both quantitatively and qualitatively, focusing on registration, fusion, and subsequent high-level tasks. Beyond technical analysis, we discuss potential future directions and open issues in this area. For further details, visit our GitHub repository: https://github.com/RollingPlain/IVIF_ZOO.
Related papers
- Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition [10.086473917830112]
Image-to-point cloud cross-modal Visual Place Recognition (VPR) is a challenging task where the query is an RGB image, and the database samples are LiDAR point clouds.
We propose an innovative initial retrieval + re-rank method that effectively combines information from range (or RGB) images and Bird's Eye View (BEV) images.
arXiv Detail & Related papers (2025-02-17T12:29:26Z) - Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness [47.68358935792437]
Chart question answering (CQA) is a crucial area of Visual Language Understanding.
Current Visual Language Models (VLMs) in this field remain under-explored.
This paper evaluates state-of-the-art VLMs on comprehensive datasets.
arXiv Detail & Related papers (2024-07-15T20:29:24Z) - DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with
Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust even full-day object detection by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z) - From Text to Pixels: A Context-Aware Semantic Synergy Solution for
Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z) - Learning a Graph Neural Network with Cross Modality Interaction for
Image Fusion [23.296468921842948]
Infrared and visible image fusion has gradually proved to be a vital fork in the field of multi-modality imaging technologies.
We propose an interactive graph neural network (GNN)-based architecture between cross modality for fusion, called IGNet.
Our IGNet can generate visually appealing fused images while scoring averagely 2.59% mAP@.5 and 7.77% mIoU higher in detection and segmentation.
arXiv Detail & Related papers (2023-08-07T02:25:06Z) - An Interactively Reinforced Paradigm for Joint Infrared-Visible Image
Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) enables hard-to-find objects apparent.
multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z) - CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion [68.78897015832113]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion.
Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z) - Learnable Graph Convolutional Network and Feature Fusion for Multi-view
Learning [30.74535386745822]
This paper proposes a joint deep learning framework called Learnable Graph Convolutional Network and Feature Fusion (LGCN-FF)
It consists of two stages: feature fusion network and learnable graph convolutional network.
The proposed LGCN-FF is validated to be superior to various state-of-the-art methods in multi-view semi-supervised classification.
arXiv Detail & Related papers (2022-11-16T19:07:12Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.