Registration-Free Hybrid Learning Empowers Simple Multimodal Imaging
System for High-quality Fusion Detection
- URL: http://arxiv.org/abs/2307.03425v1
- Date: Fri, 7 Jul 2023 07:11:37 GMT
- Title: Registration-Free Hybrid Learning Empowers Simple Multimodal Imaging
System for High-quality Fusion Detection
- Authors: Yinghan Guan, Haoran Dai, Zekuan Yu, Shouyu Wang and Yuanjie Gu
- Abstract summary: We propose IA-VFDnet, a CNN-Transformer hybrid learning framework with a unified high-quality multimodal feature matching module (AKM) and a fusion module (WDAF).
AKM and WDAF work in synergy to perform high-quality infrared-aware visible fusion detection.
Experiments on the M3FD dataset validate the superiority of the proposed method.
- Score: 1.9249287163937976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal fusion detection places high demands on the imaging system
and on image pre-processing, and either a high-quality pre-registration system
or image registration processing is costly. Unfortunately, existing fusion
methods are designed for registered source images, so they cannot achieve
satisfactory performance when fusing inhomogeneous features, i.e., pairs of
features at the same spatial location that express different semantic
information. We therefore propose IA-VFDnet, a CNN-Transformer hybrid learning
framework with a unified high-quality multimodal feature matching module (AKM)
and a fusion module (WDAF), in which AKM and WDAF work in synergy to perform
high-quality infrared-aware visible fusion detection, applicable to smoke and
wildfire detection. Experiments on the M3FD dataset validate the superiority of
the proposed method, with IA-VFDnet achieving better detection performance than
other state-of-the-art methods under conventional registered conditions. In
addition, the first unregistered multimodal smoke and wildfire detection
benchmark is made openly available with this letter.
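
To make the advertised dataflow concrete, below is a minimal, hypothetical PyTorch sketch of a registration-free fusion-detection pipeline in this spirit. The abstract does not specify the internals of AKM or WDAF, so a cross-attention matcher stands in for the matching module and a plain convolution for the fusion module; all module names, shapes, and the two-class head are illustrative assumptions, not the paper's design.

```python
# Hypothetical sketch of a registration-free fusion-detection pipeline.
# Cross-attention stands in for the matching module (AKM) and a conv for
# the fusion module (WDAF); neither reproduces IA-VFDnet's actual design.
import torch
import torch.nn as nn

class ToyFeatureMatcher(nn.Module):
    """Aligns infrared features to the visible feature grid via
    cross-attention instead of explicit image registration."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis_feat, ir_feat):
        # Flatten spatial grids to token sequences: (B, H*W, C)
        b, c, h, w = vis_feat.shape
        q = vis_feat.flatten(2).transpose(1, 2)
        kv = ir_feat.flatten(2).transpose(1, 2)
        aligned, _ = self.attn(q, kv, kv)  # IR tokens re-indexed to visible grid
        return aligned.transpose(1, 2).reshape(b, c, h, w)

class ToyFusionDetector(nn.Module):
    """CNN encoder per modality, attention-based matching, conv fusion,
    and a single-scale detection head (class logits + box offsets)."""
    def __init__(self, dim=64, num_classes=2):  # e.g. smoke / wildfire
        super().__init__()
        self.enc_vis = nn.Sequential(nn.Conv2d(3, dim, 3, 2, 1), nn.ReLU())
        self.enc_ir = nn.Sequential(nn.Conv2d(1, dim, 3, 2, 1), nn.ReLU())
        self.match = ToyFeatureMatcher(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 3, 1, 1)
        self.head = nn.Conv2d(dim, num_classes + 4, 1)

    def forward(self, vis, ir):
        fv, fi = self.enc_vis(vis), self.enc_ir(ir)
        fi_aligned = self.match(fv, fi)  # registration-free alignment
        fused = self.fuse(torch.cat([fv, fi_aligned], dim=1))
        return self.head(fused)

preds = ToyFusionDetector()(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(preds.shape)  # torch.Size([1, 6, 32, 32])
```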
Related papers
- SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion [65.80051636480836]
This paper proposes a conditional diffusion model guided by the Segment Anything Model (SAM) to achieve high-fidelity and semantically-aware image fusion.
The framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks as a condition to drive the diffusion model's coarse-to-fine denoising generation.
Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations. (A mask-conditioning sketch appears after this list.)
arXiv Detail & Related papers (2025-08-07T10:58:52Z)
- Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection [4.677326790094539]
This study proposes Attention Fusion Reverse Distillation (AFRD) to handle multiple inputs in multi-lighting image anomaly detection (MLIAD).
Experiments on Eyecandies demonstrate that AFRD achieves superior MLIAD performance compared with other alternatives.
arXiv Detail & Related papers (2024-06-07T01:26:37Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to Stable Diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion [5.417493475406649]
Multi-modal image fusion (MMIF) integrates valuable information from different modality images into a fused one.
This paper proposes an MMIF framework for joint focused integration and modality information extraction.
The proposed algorithm surpasses state-of-the-art methods in both visual perception and quantitative evaluation.
arXiv Detail & Related papers (2023-11-03T12:58:39Z)
- Multimodal Transformer Using Cross-Channel Attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been shown to enhance detection accuracy by combining data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid- or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques. (A cross-channel attention sketch appears after this list.)
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images. (An equivariance-loss sketch appears after this list.)
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in that common space via either iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation of the joint fusion-and-detection problem, then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- A Robust Multimodal Remote Sensing Image Registration Method and System Using Steerable Filters with First- and Second-order Gradients [7.813406811407584]
Co-registration of multimodal remote sensing images is still an ongoing challenge because of nonlinear radiometric differences (NRD) and significant geometric distortions.
In this paper, a robust matching method based on steerable filters is proposed, consisting of two critical steps. (A gradient-based steerable-filter sketch appears after this list.)
The performance of the proposed matching method has been evaluated with many different kinds of multimodal RS images.
arXiv Detail & Related papers (2022-02-27T12:22:42Z)
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses bounding box detections from different modalities. (A probabilistic score-fusion sketch appears after this list.)
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
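
A few of the entries above describe mechanisms concrete enough to sketch. First, SGDFuse conditions a diffusion model's denoising on SAM masks. The hypothetical PyTorch sketch below shows only the conditioning pattern, a semantic mask concatenated as an extra input channel to a toy denoiser, and does not reproduce SGDFuse's actual architecture or its SAM integration.

```python
# Hypothetical sketch: mask-conditioned denoising in the style of SGDFuse.
# The toy denoiser predicts noise from a preliminary fused image plus a
# SAM-style semantic mask supplied as a condition channel (an assumption).
import torch
import torch.nn as nn

class MaskConditionedDenoiser(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Input: 1-channel noisy fused image + 1-channel semantic mask
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, 1, 1),
        )

    def forward(self, noisy_fused, sam_mask):
        # Concatenate the mask as a condition channel; output predicted noise
        return self.net(torch.cat([noisy_fused, sam_mask], dim=1))

x = torch.randn(1, 1, 64, 64)    # stage-1 preliminary fusion (stand-in)
mask = torch.rand(1, 1, 64, 64)  # stand-in for a SAM semantic mask
eps_hat = MaskConditionedDenoiser()(x, mask)
```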
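The cross-channel attention entry fuses modalities at the early stage by modeling relationships between channels. A hypothetical squeeze-and-excitation-style gate over the concatenated input channels illustrates one way to do this; the paper's actual attention design is not reproduced here.

```python
# Hypothetical early fusion: per-channel attention over the stacked
# RGB + infrared input channels (SE-style gate, an assumption).
import torch
import torch.nn as nn

class EarlyCrossChannelFusion(nn.Module):
    def __init__(self, rgb_ch=3, ir_ch=1, r=2):
        super().__init__()
        c = rgb_ch + ir_ch
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // r), nn.ReLU(),
            nn.Linear(c // r, c), nn.Sigmoid(),
        )

    def forward(self, rgb, ir):
        x = torch.cat([rgb, ir], dim=1)               # early channel-wise stack
        w = self.gate(x).unsqueeze(-1).unsqueeze(-1)  # per-channel importance
        return x * w                                  # reweighted multimodal input

fused = EarlyCrossChannelFusion()(torch.randn(2, 3, 64, 64),
                                  torch.randn(2, 1, 64, 64))
```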
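EMMA's training signal comes from the prior that imaging responses are equivariant to certain transformations, so fusing transformed inputs should match transforming the fused output. A hypothetical self-supervised loss, with a 90-degree rotation standing in for the paper's transformation family:

```python
# Hypothetical equivariance loss: || T(fuse(ir, vis)) - fuse(T(ir), T(vis)) ||^2
# Rotation is an assumed stand-in for EMMA's transformation family.
import torch

def equivariance_loss(fuse, ir, vis):
    """fuse: callable mapping (ir, vis) -> fused image."""
    t = lambda img: torch.rot90(img, 1, dims=(-2, -1))
    fused_then_t = t(fuse(ir, vis))
    t_then_fused = fuse(t(ir), t(vis))
    return torch.mean((fused_then_t - t_then_fused) ** 2)

toy_fuse = lambda a, b: 0.5 * (a + b)  # placeholder fusion operator
loss = equivariance_loss(toy_fuse,
                         torch.randn(1, 1, 32, 32),
                         torch.randn(1, 1, 32, 32))
# Zero for this toy operator, since averaging commutes with rotation;
# a learned fuser is penalized whenever it breaks equivariance.
```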
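The steerable-filter registration entry builds matching features from first- and second-order image gradients. A hypothetical NumPy sketch of a first-order response steered to angle theta, G_theta = cos(theta)*Gx + sin(theta)*Gy, and an orientation map built from it; the paper's full two-step matching pipeline is not reproduced.

```python
# Hypothetical steered first-order gradient response and orientation map.
import numpy as np

def steered_gradient_response(img, theta):
    gy, gx = np.gradient(img.astype(float))          # first-order derivatives
    return np.cos(theta) * gx + np.sin(theta) * gy   # response steered to theta

def orientation_map(img, n_orient=8):
    """Index of the strongest steered response per pixel. Orientation-based
    features are more robust to nonlinear radiometric differences (NRD)
    than raw intensities, which is the motivation cited in the abstract."""
    angles = np.linspace(0, np.pi, n_orient, endpoint=False)
    stack = np.stack([np.abs(steered_gradient_response(img, a))
                      for a in angles])
    return stack.argmax(axis=0)

omap = orientation_map(np.random.rand(64, 64))
```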
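Finally, the Bayesian-fusion entry combines per-box class scores from RGB and thermal detectors. Under a conditional-independence assumption, posteriors multiply and are renormalized; box matching across modalities and calibration details are simplified away in this sketch and are assumptions, not the paper's full procedure.

```python
# Hypothetical naive-Bayes fusion of class posteriors from two modalities:
# p(c | x_rgb, x_thermal) proportional to p(c | x_rgb) * p(c | x_thermal) / p(c)
import numpy as np

def fuse_class_posteriors(p_rgb, p_thermal, prior):
    """All arrays have shape (num_classes,) and sum to 1."""
    fused = p_rgb * p_thermal / np.clip(prior, 1e-8, None)
    return fused / fused.sum()

p = fuse_class_posteriors(np.array([0.7, 0.2, 0.1]),
                          np.array([0.6, 0.3, 0.1]),
                          np.array([1/3, 1/3, 1/3]))
print(p)  # posterior sharpens toward the class both detectors agree on
```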