CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond
- URL: http://arxiv.org/abs/2502.14493v1
- Date: Thu, 20 Feb 2025 12:19:30 GMT
- Title: CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond
- Authors: Yukai Shi, Cidan Shi, Zhipeng Weng, Yin Tian, Xiaoyu Xian, Liang Lin
- Abstract summary: Infrared and visible image fusion (IVIF) is increasingly applied in critical fields such as video surveillance and autonomous driving systems. We propose an infrared-visible fusion framework based on Multi-View Augmentation. Our approach significantly enhances the reliability and stability of IVIF tasks in practical applications.
- Score: 45.996901339560566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Infrared and visible image fusion (IVIF) is increasingly applied in critical fields such as video surveillance and autonomous driving systems. Significant progress has been made in deep learning-based fusion methods. However, these models frequently encounter out-of-distribution (OOD) scenes in real-world applications, which severely impact their performance and reliability. Therefore, addressing the challenge of OOD data is crucial for the safe deployment of these models in open-world environments. Unlike existing research, our focus is on the challenges posed by OOD data in real-world applications and on enhancing the robustness and generalization of models. In this paper, we propose an infrared-visible fusion framework based on Multi-View Augmentation. For external data augmentation, Top-k Selective Vision Alignment is employed to mitigate distribution shifts between datasets by performing RGB-wise transformations on visible images. This strategy effectively introduces augmented samples, enhancing the adaptability of the model to complex real-world scenarios. Additionally, for internal data augmentation, self-supervised learning is established using Weak-Aggressive Augmentation. This enables the model to learn more robust and general feature representations during the fusion process, thereby improving robustness and generalization. Extensive experiments demonstrate that the proposed method exhibits superior performance and robustness across various conditions and environments. Our approach significantly enhances the reliability and stability of IVIF tasks in practical applications.
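The two augmentation strategies described in the abstract are straightforward to prototype. Below is a minimal, illustrative sketch of both ideas; all function and variable names are hypothetical, and the actual CrossFuse implementation may differ (e.g., in how cross-sensor similarity is measured or which augmentations are used). The first function aligns a visible image toward its top-k most similar reference images via per-channel (RGB-wise) statistics matching; the second builds a weak/aggressive view pair for a self-supervised consistency objective.

```python
import numpy as np

def topk_vision_alignment(vis_img, reference_pool, k=3):
    """Hypothetical sketch of Top-k Selective Vision Alignment.

    vis_img:        HxWx3 float array in [0, 1] (visible image).
    reference_pool: list of HxWx3 float arrays from another distribution.
    Selects the k references whose RGB histograms are closest to vis_img,
    then shifts vis_img's per-channel mean/std toward the averaged
    statistics of those top-k references.
    """
    def rgb_hist(img, bins=32):
        return np.concatenate([
            np.histogram(img[..., c], bins=bins, range=(0, 1), density=True)[0]
            for c in range(3)
        ])

    query = rgb_hist(vis_img)
    dists = [np.linalg.norm(query - rgb_hist(ref)) for ref in reference_pool]
    topk = [reference_pool[i] for i in np.argsort(dists)[:k]]

    # RGB-wise (per-channel) statistics transfer toward the top-k style.
    ref_mean = np.mean([r.mean(axis=(0, 1)) for r in topk], axis=0)
    ref_std = np.mean([r.std(axis=(0, 1)) for r in topk], axis=0)
    src_mean = vis_img.mean(axis=(0, 1))
    src_std = vis_img.std(axis=(0, 1)) + 1e-6
    aligned = (vis_img - src_mean) / src_std * ref_std + ref_mean
    return np.clip(aligned, 0.0, 1.0)

def weak_aggressive_pair(img, rng):
    """Hypothetical Weak-Aggressive Augmentation pair: the fusion
    network is trained to produce consistent outputs for both views."""
    weak = img[:, ::-1] if rng.random() < 0.5 else img    # light: horizontal flip
    noise = rng.normal(0.0, 0.1, size=img.shape)          # heavy: additive noise
    gamma = rng.uniform(0.5, 2.0)                         # heavy: strong tone shift
    aggressive = np.clip(img ** gamma + noise, 0.0, 1.0)
    return weak, aggressive
```

In training, the fused outputs of the weak and aggressive views would be compared with a consistency loss (e.g., L1), which is the self-supervised signal the abstract describes.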
Related papers
- Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption [65.06388526722186]
Infrared-visible image fusion is a critical task in computer vision.
There is a lack of recent comprehensive surveys that address this rapidly expanding domain.
We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods.
arXiv Detail & Related papers (2025-01-18T13:17:34Z)
- Hierarchical Information Flow for Generalized Efficient Image Restoration [108.83750852785582]
We propose a hierarchical information flow mechanism for image restoration, dubbed Hi-IR.
Hi-IR constructs a hierarchical information tree representing the degraded image across three levels.
Experiments on seven common image restoration tasks demonstrate Hi-IR's effectiveness and generalizability.
arXiv Detail & Related papers (2024-11-27T18:30:08Z)
- WTCL-Dehaze: Rethinking Real-world Image Dehazing via Wavelet Transform and Contrastive Learning [17.129068060454255]
Single image dehazing is essential for applications such as autonomous driving and surveillance.
We propose an enhanced semi-supervised dehazing network that integrates Contrastive Loss and Discrete Wavelet Transform.
Our proposed algorithm achieves superior performance and improved robustness compared to state-of-the-art single image dehazing methods.
arXiv Detail & Related papers (2024-10-07T05:36:11Z)
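As a rough illustration of the two components named in the WTCL-Dehaze entry above, the sketch below decomposes an image with a discrete wavelet transform (via PyWavelets) and computes a simple contrastive regularization that pulls the dehazed output toward the clear image and pushes it away from the hazy input. This is a generic reconstruction under common conventions, not the paper's actual code.

```python
import numpy as np
import pywt

def dwt_subbands(img_gray):
    """One-level 2-D Haar DWT: returns approximation and detail sub-bands."""
    cA, (cH, cV, cD) = pywt.dwt2(img_gray, "haar")
    return cA, cH, cV, cD

def contrastive_dehaze_loss(dehazed, clear, hazy, margin=1.0):
    """Toy contrastive term: the dehazed output should be closer to the
    clear image (positive) than to the hazy input (negative)."""
    d_pos = np.mean(np.abs(dehazed - clear))
    d_neg = np.mean(np.abs(dehazed - hazy))
    return max(0.0, d_pos - d_neg + margin)
```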
- DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion [10.713089596405053]
We propose DAE-Fuse, a novel two-phase discriminative autoencoder framework that generates sharp and natural fused images.
We pioneer the extension of image fusion techniques from static images to the video domain.
DAE-Fuse achieves state-of-the-art performance on multiple benchmarks, with superior generalizability to tasks like medical image fusion.
arXiv Detail & Related papers (2024-09-16T08:37:09Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
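The background-augmentation idea above can be prototyped with an off-the-shelf inpainting diffusion model: keep the annotated objects, mask everything else, and let the model synthesize a new background. The snippet below uses Hugging Face diffusers as one plausible realization; the paper may use a different model or prompting scheme, and the prompt and model ID here are only examples.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def augment_background(image, boxes, prompt="a rainy city street"):
    """Replace the background while preserving annotated objects.

    image: PIL image (resized to 512x512 for the pipeline).
    boxes: list of (x0, y0, x1, y1) object boxes, assumed to be given
           in the resized 512x512 coordinate frame.
    White mask pixels are repainted by the model; object regions stay
    black, so the detector's box labels remain valid.
    """
    image = image.resize((512, 512))
    mask = np.full((512, 512), 255, dtype=np.uint8)  # repaint everything...
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 0                       # ...except the objects
    return pipe(prompt=prompt,
                image=image,
                mask_image=Image.fromarray(mask)).images[0]
```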
- Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching [16.13886663417327]
We introduce a novel framework known as MIAS-LCEC, provide an open-source versatile calibration toolbox, and publish three real-world datasets.
The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, developed based on a state-of-the-art (SoTA) LVM.
arXiv Detail & Related papers (2024-04-28T06:25:56Z)
- OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System [7.1083241462091165]
We introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images.
A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network.
Our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
arXiv Detail & Related papers (2024-03-18T07:41:39Z)
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z)
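The RBSR entry describes fusing cues frame by frame with a recurrent network; a minimal PyTorch skeleton of that pattern (with hypothetical layer sizes, not the paper's architecture) looks like this:

```python
import torch
import torch.nn as nn

class RecurrentBurstFusion(nn.Module):
    """Toy recurrent fusion: a hidden state aggregates cues one
    low-resolution frame at a time, then an upsampler produces HR."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, burst):                 # burst: (B, T, 3, H, W)
        b, t, _, h, w = burst.shape
        state = torch.zeros(b, self.encode.out_channels, h, w,
                            device=burst.device)
        for i in range(t):                    # fuse cues frame by frame
            feat = torch.relu(self.encode(burst[:, i]))
            state = torch.relu(self.fuse(torch.cat([feat, state], dim=1)))
        return self.upsample(state)           # (B, 3, scale*H, scale*W)
```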
- Does Thermal data make the detection systems more reliable? [1.2891210250935146]
We propose a comprehensive detection system based on a multimodal-collaborative framework.
This framework learns from both RGB (from visible-light cameras) and thermal (from infrared cameras) data.
Our empirical results show that while the improvement in accuracy is nominal, the value lies in challenging and extremely difficult edge cases.
arXiv Detail & Related papers (2021-11-09T15:04:34Z)
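The RGB-thermal entry follows the common two-stream pattern: a separate encoder per modality whose features are fused before the detection head. A schematic PyTorch sketch is given below; the channel counts and the simple concatenation fusion are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class RGBThermalBackbone(nn.Module):
    """Schematic multimodal backbone: RGB (3-channel) and thermal
    (1-channel) streams are encoded separately, then fused."""
    def __init__(self, channels=64):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
        self.rgb_stream = stream(3)
        self.thermal_stream = stream(1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # 1x1 fusion

    def forward(self, rgb, thermal):
        f = torch.cat([self.rgb_stream(rgb),
                       self.thermal_stream(thermal)], dim=1)
        return self.fuse(f)   # feed into any detection head
```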