CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions
- URL: http://arxiv.org/abs/2404.16302v1
- Date: Thu, 25 Apr 2024 02:54:11 GMT
- Title: CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions
- Authors: Haoyuan Li, Qi Hu, You Yao, Kailun Yang, Peng Chen
- Abstract summary: Cross-modality images that integrate visible and infrared spectral cues can provide richer information for object detection.
Existing visible-infrared object detection methods degrade severely under adverse weather conditions.
We introduce a new task: visible-infrared object detection under adverse weather conditions.
- Score: 17.216501433862014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modality images that integrate visible and infrared spectral cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods degrade severely under adverse weather. This failure stems from the pronounced sensitivity of visible images to environmental perturbations such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to improve detection accuracy in adverse weather. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW mines more essential pedestrian features during cross-modality fusion, transfers efficiently to rarer scenarios, and remains practical on platforms with limited computing power. To the best of our knowledge, this is the first study to integrate both Diffusion and Mamba modules in cross-modality object detection, expanding the practical application of this type of model through higher accuracy and a more advanced architecture. Extensive experiments on both well-recognized and self-created datasets conclusively demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.
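The abstract describes a two-stage design: the Weather Removal Diffusion Model (WRDM) first restores the weather-degraded visible image, and the Cross-modality Fusion Mamba (CFM) module then fuses visible and infrared features ahead of detection. As a minimal sketch of that data flow only, the PyTorch wrapper below wires hypothetical `WRDM`, `CFM`, backbone, and detection-head modules together; all module internals and interfaces are assumptions, not the authors' released code.

```python
# Hypothetical sketch of the CFMW pipeline described in the abstract:
# restore the weather-degraded visible image, then fuse visible and
# infrared features before a detection head. Module internals are
# assumptions; only the overall data flow follows the abstract.
import torch
import torch.nn as nn


class CFMWPipeline(nn.Module):
    def __init__(self, wrdm: nn.Module, rgb_backbone: nn.Module,
                 ir_backbone: nn.Module, cfm: nn.Module, head: nn.Module):
        super().__init__()
        self.wrdm = wrdm            # Weather Removal Diffusion Model (restoration)
        self.rgb_backbone = rgb_backbone
        self.ir_backbone = ir_backbone
        self.cfm = cfm              # Cross-modality Fusion Mamba block
        self.head = head            # detection head (e.g., YOLO-style)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        clean_rgb = self.wrdm(rgb)            # remove rain/haze/snow degradation
        f_rgb = self.rgb_backbone(clean_rgb)  # visible-spectrum features
        f_ir = self.ir_backbone(ir)           # infrared features
        fused = self.cfm(f_rgb, f_ir)         # cross-modality fusion
        return self.head(fused)               # boxes / class scores
```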
Related papers
- ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions [1.7537812081430004]
We propose ContextualFusion, a technique that incorporates domain knowledge about how cameras and lidars behave differently across lighting and weather variations into 3D object detection models.
Our approach yields an mAP improvement of 6.2% over state-of-the-art methods on our context-balanced synthetic dataset.
Our method enhances state-of-the-art 3D object detection performance at night on the real-world NuScenes dataset with a significant mAP improvement of 11.7%.
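The summary states only that knowledge of how cameras and lidars behave under different lighting and weather is injected into the detector. One minimal, purely illustrative way to express that idea is a context-conditioned gate that reweights each modality's features before fusion; the layer layout below is an assumption, not the ContextualFusion architecture.

```python
# Illustrative context-gated fusion (an assumption, not the paper's design):
# a small MLP maps a context vector (e.g., an encoded lighting/weather state)
# to per-modality weights that rescale camera and lidar feature maps.
import torch
import torch.nn as nn


class ContextGatedFusion(nn.Module):
    def __init__(self, ctx_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(ctx_dim, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Softmax(dim=-1))  # weights for (camera, lidar)

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor,
                ctx: torch.Tensor) -> torch.Tensor:
        w = self.gate(ctx)                     # (B, 2)
        w_cam = w[:, 0].view(-1, 1, 1, 1)      # broadcast over (C, H, W)
        w_lidar = w[:, 1].view(-1, 1, 1, 1)
        return w_cam * cam_feat + w_lidar * lidar_feat
```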
arXiv Detail & Related papers (2024-04-23T06:37:54Z)
- Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images [49.75771095302775]
We propose an Adaptive Multi-scale Fusion network (AMFusion) for infrared and visible images.
First, we separately fuse spatial and semantic features from infrared and visible images, where the former are used to adjust the light distribution.
Second, we utilize detection features extracted by a pre-trained backbone to guide the fusion of semantic features.
Third, we propose a new illumination loss to constrain the fused image to normal light intensity.
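The illumination loss is only named above. A toy version, assuming the goal is to keep the fused image near a "normal" brightness level, could simply penalize the deviation of mean luminance from a target value; the luma weights and target are illustrative assumptions, not AMFusion's actual loss.

```python
# Toy illumination loss: push the fused image's mean luminance toward a
# "normal light" target. The BT.601 luma weights and the target value are
# assumptions for illustration, not the loss used in AMFusion.
import torch


def illumination_loss(fused_rgb: torch.Tensor, target_luma: float = 0.5) -> torch.Tensor:
    # fused_rgb: (B, 3, H, W) with values in [0, 1]
    weights = torch.tensor([0.299, 0.587, 0.114], device=fused_rgb.device)
    luma = (fused_rgb * weights.view(1, 3, 1, 1)).sum(dim=1)   # (B, H, W)
    return (luma.mean(dim=(1, 2)) - target_luma).abs().mean()
```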
arXiv Detail & Related papers (2024-03-02T03:52:07Z)
- Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection [6.388430091498446]
We propose two new radar preprocessing techniques to better align radar and camera data.
We also introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection.
Our approach outperforms current state-of-the-art radar-camera fusion-based object detectors on the nuScenes dataset.
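The attention-fusion block itself is not detailed in this summary. The sketch below shows a generic cross-modality attention step in which camera feature tokens attend to radar tokens via standard multi-head attention; it illustrates the general mechanism, not the MCAF-Net design.

```python
# Generic cross-modality attention (illustration only, not MCAF-Net):
# camera feature tokens query radar feature tokens, and the attended
# radar context is added back to the camera stream.
import torch
import torch.nn as nn


class CrossModalityAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_tokens: torch.Tensor, radar_tokens: torch.Tensor):
        # cam_tokens: (B, N_cam, dim), radar_tokens: (B, N_radar, dim)
        ctx, _ = self.attn(query=cam_tokens, key=radar_tokens, value=radar_tokens)
        return self.norm(cam_tokens + ctx)
```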
arXiv Detail & Related papers (2023-07-17T09:26:13Z)
- Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion.
To better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM).
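Read literally, the paradigm generates a pseudo modality to bridge the appearance gap, registers the misaligned pair within one modality, and then fuses the aligned images. The placeholder function below only mirrors that data flow; every callable name is hypothetical.

```python
# Hypothetical data flow for a generation-registration-fusion paradigm
# (placeholder callables; not the paper's implementation).
def fuse_misaligned(visible, infrared, generator, registrar, warp, fusion_module):
    pseudo_ir = generator(visible)             # translate visible -> pseudo-infrared
    flow = registrar(pseudo_ir, infrared)      # estimate deformation within one modality
    aligned_ir = warp(infrared, flow)          # resample infrared into the visible frame
    return fusion_module(visible, aligned_ir)  # e.g., an interaction fusion module
```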
arXiv Detail & Related papers (2022-05-24T07:51:57Z)
- ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple the independent detection and description constraints of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
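The mutual weighting strategy is only named here. One simplified reading, sketched below, lets detection confidence weight the description loss and description quality weight the detection loss per keypoint; this is an interpretation for illustration, not ReDFeat's actual objective.

```python
# Simplified "mutual weighting" of detection and description losses
# (an interpretation for illustration, not ReDFeat's exact objective).
import torch


def mutually_weighted_loss(det_score: torch.Tensor,
                           det_loss: torch.Tensor,
                           desc_loss: torch.Tensor) -> torch.Tensor:
    # det_score: per-keypoint detection confidence in [0, 1], shape (N,)
    # det_loss, desc_loss: per-keypoint losses, shape (N,)
    w_desc = det_score.detach()              # confident detections emphasize description
    w_det = torch.exp(-desc_loss.detach())   # well-described points emphasize detection
    return (w_det * det_loss + w_desc * desc_loss).mean()
```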
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
- Pay "Attention" to Adverse Weather: Weather-aware Attention-based Object Detection [5.816506391882502]
This paper proposes a Global-Local Attention (GLA) framework to adaptively fuse the multi-modality sensing streams.
Specifically, GLA integrates an early-stage fusion via a local attention network and a late-stage fusion via a global attention network to deal with both local and global information.
Experimental results demonstrate the superior performance of the proposed GLA compared with state-of-the-art fusion approaches.
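The summary describes local attention for early-stage fusion and global attention for late-stage fusion. The block below only mirrors that two-stage layout with generic per-pixel gating followed by global channel reweighting; the specific layers are assumptions rather than the GLA implementation.

```python
# Two-stage fusion mirroring the described local (early) + global (late)
# attention idea; layer choices are illustrative assumptions, not GLA itself.
import torch
import torch.nn as nn


class LocalGlobalFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # early stage: per-location (local) gating over the two modalities
        self.local_gate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # late stage: global channel attention (squeeze-and-excitation style)
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        early = torch.sigmoid(self.local_gate(torch.cat([feat_a, feat_b], dim=1)))
        fused = early * feat_a + (1.0 - early) * feat_b   # local, per-pixel mixing
        return fused * self.global_gate(fused)            # global reweighting
```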
arXiv Detail & Related papers (2022-04-22T16:32:34Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the problem of fusing infrared and visible images, which appear very different, for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in that common space, either by iterative optimization or by deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, which is then unrolled into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
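The bilevel structure named above can be written generically as an outer detection objective constrained by an inner fusion sub-problem; the symbols below are placeholders and do not reproduce TarDAL's exact losses.

```latex
% Generic bilevel fusion-detection formulation (placeholder symbols):
% the outer problem tunes the detector, the inner problem produces the fused image.
\min_{\theta_d}\; \mathcal{L}_{\mathrm{det}}\!\big(D_{\theta_d}(u^{*}),\, y\big)
\quad \text{s.t.} \quad
u^{*} \in \arg\min_{u}\; \mathcal{L}_{\mathrm{fuse}}\big(u,\, x_{\mathrm{ir}},\, x_{\mathrm{vis}}\big)
```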
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Fusion Detection via Distance-Decay IoU and weighted Dempster-Shafer Evidence Theory [0.0]
A fast multi-source fusion detection framework is proposed in this paper.
A novel distance-decay intersection over union is employed to encode the shape properties of the targets.
The weighted Dempster-Shafer evidence theory is utilized to combine the optical and synthetic aperture radar detections.
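Dempster's rule of combination for two sources is standard and is illustrated below on a simple frame {target, clutter}; the paper's weighting scheme and its distance-decay IoU are not reproduced, so treat this only as a reference implementation of the evidence-combination step.

```python
# Dempster's rule of combination for two basic probability assignments (BPAs).
# Standard textbook formulation; the paper's weighted variant is not reproduced.
from itertools import product


def combine_dempster(m1: dict, m2: dict) -> dict:
    """m1, m2 map frozenset hypotheses (subsets of the frame) to masses summing to 1."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb            # mass assigned to contradictory hypotheses
    norm = 1.0 - conflict
    return {h: v / norm for h, v in combined.items()}


# Example: optical and SAR detectors expressing belief that a box is a true target.
frame = frozenset({"target", "clutter"})
optical = {frozenset({"target"}): 0.7, frame: 0.3}   # 0.3 left uncommitted
sar = {frozenset({"target"}): 0.6, frozenset({"clutter"}): 0.1, frame: 0.3}
print(combine_dempster(optical, sar))
```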
arXiv Detail & Related papers (2021-12-06T13:46:39Z)
- Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection [60.89616629421904]
Lidar-based object detectors are critical parts of the 3D perception pipeline in autonomous navigation systems such as self-driving cars.
They are sensitive to adverse weather conditions such as rain, snow, and fog due to reduced signal-to-noise ratio (SNR) and signal-to-background ratio (SBR).
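The summary explains the degradation (reduced SNR and SBR) rather than the augmentation itself. The snippet below only illustrates the basic physics such simulations typically build on, a Beer-Lambert range-dependent attenuation of return power with an assumed extinction coefficient; it is not the LISA model.

```python
# Beer-Lambert style attenuation of a lidar return as a function of range:
# a common physical ingredient of adverse-weather simulation, shown with an
# assumed extinction coefficient; this is not the LISA augmentation itself.
import numpy as np


def attenuated_return_power(p_clear: np.ndarray, ranges_m: np.ndarray,
                            extinction_per_m: float = 0.005) -> np.ndarray:
    # Two-way transmission loss through the scattering medium (e.g., fog).
    return p_clear * np.exp(-2.0 * extinction_per_m * ranges_m)


# Example: a return at 100 m with alpha = 0.005 1/m keeps ~37% of its clear-air power.
print(attenuated_return_power(np.array([1.0]), np.array([100.0])))
```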
arXiv Detail & Related papers (2021-07-14T21:10:47Z)
- Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.