Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios
- URL: http://arxiv.org/abs/2508.10704v1
- Date: Thu, 14 Aug 2025 14:48:21 GMT
- Title: Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios
- Authors: Zhanwen Liu, Yujing Sun, Yang Wang, Nan Yang, Shengbo Eben Li, Xiangmo Zhao
- Abstract summary: The dynamic range of conventional RGB cameras reduces global contrast and causes loss of high-frequency details. We propose a motion cue fusion network (MCFNet) that achieves optimal cross-modal feature fusion under challenging lighting. MCFNet significantly outperforms existing methods in various poor-lighting and fast-moving traffic scenarios.
- Score: 23.41380544271609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dynamic range limitation of conventional RGB cameras reduces global contrast and causes loss of high-frequency details such as textures and edges in complex traffic environments (e.g., nighttime driving, tunnels), hindering discriminative feature extraction and degrading frame-based object detection. To address this, we integrate a bio-inspired event camera with an RGB camera to provide high dynamic range information and propose a motion cue fusion network (MCFNet), which achieves optimal spatiotemporal alignment and adaptive cross-modal feature fusion under challenging lighting. Specifically, an event correction module (ECM) temporally aligns asynchronous event streams with image frames via optical-flow-based warping, jointly optimized with the detection network to learn task-aware event representations. The event dynamic upsampling module (EDUM) enhances the spatial resolution of event frames to match image structures, ensuring precise spatiotemporal alignment. The cross-modal mamba fusion module (CMM) uses adaptive feature fusion with a novel interlaced scanning mechanism, effectively integrating complementary information for robust detection. Experiments conducted on the DSEC-Det and PKU-DAVIS-SOD datasets demonstrate that MCFNet significantly outperforms existing methods in various poor-lighting and fast-moving traffic scenarios. Notably, on the DSEC-Det dataset, MCFNet achieves a remarkable improvement, surpassing the best existing methods by 7.4% in mAP50 and 1.7% in mAP. The code is available at https://github.com/Charm11492/MCFNet.
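To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the three stages named in the abstract: flow-based event correction (ECM), event upsampling to the image resolution (EDUM), and channel-interlaced fusion in the spirit of the CMM's interlaced scanning. The layer choices, shapes, bilinear upsampling, and 1x1 mixing convolution are illustrative assumptions rather than the authors' implementation; the official code is at the repository linked above.

```python
# Hypothetical sketch of the RGB-event fusion stages described in the abstract.
# Module names follow the paper (ECM, EDUM, CMM-style fusion); all internals
# below are simplified assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EventCorrectionModule(nn.Module):
    """Warp event features toward the RGB frame using a predicted optical flow."""

    def __init__(self, channels: int):
        super().__init__()
        # Assumed flow head: predicts a 2-channel flow field from both modalities.
        self.flow_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, event_feat, rgb_feat):
        flow = self.flow_head(torch.cat([event_feat, rgb_feat], dim=1))
        b, _, h, w = event_feat.shape
        # Build a normalized sampling grid shifted by the predicted flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=event_feat.device),
            torch.linspace(-1, 1, w, device=event_feat.device),
            indexing="ij",
        )
        base_grid = torch.stack((xs, ys), dim=-1).expand(b, -1, -1, -1)
        offset = flow.permute(0, 2, 3, 1) / torch.tensor([w / 2, h / 2], device=flow.device)
        return F.grid_sample(event_feat, base_grid + offset, align_corners=True)


class EventDynamicUpsampling(nn.Module):
    """Upsample event features to the RGB resolution (bilinear stand-in)."""

    def forward(self, event_feat, rgb_feat):
        return F.interpolate(event_feat, size=rgb_feat.shape[-2:],
                             mode="bilinear", align_corners=False)


class InterlacedFusion(nn.Module):
    """Interleave RGB and event channels before a shared mixing convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, event_feat):
        b, c, h, w = rgb_feat.shape
        interlaced = torch.stack((rgb_feat, event_feat), dim=2).reshape(b, 2 * c, h, w)
        return self.mix(interlaced)


if __name__ == "__main__":
    rgb = torch.randn(1, 32, 64, 64)      # RGB backbone features
    events = torch.randn(1, 32, 32, 32)   # lower-resolution event features
    ecm, edum, fuse = EventCorrectionModule(32), EventDynamicUpsampling(), InterlacedFusion(32)
    aligned = edum(events, rgb)           # EDUM: match spatial resolution
    corrected = ecm(aligned, rgb)         # ECM: flow-based alignment to the frame
    fused = fuse(rgb, corrected)          # interlaced cross-modal fusion
    print(fused.shape)                    # torch.Size([1, 32, 64, 64])
```

The interleaving step mirrors the interlaced-scanning idea only at a high level: RGB and event channels are alternated at matching spatial positions before a shared mixing layer, so the fusion operator always sees both modalities side by side.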
Related papers
- Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking [51.31378940976401]
Existing RGB-Event tracking approaches fail to fully exploit the unique advantages of event cameras. We propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method.
arXiv Detail & Related papers (2026-01-03T01:10:17Z) - MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation [17.515348703686232]
We propose MambaSeg, a novel dual-branch semantic segmentation framework that employs parallel Mamba encoders to efficiently model RGB images and event streams. MambaSeg achieves state-of-the-art segmentation performance while significantly reducing computational cost, showcasing its promise for efficient, scalable, and robust multimodal perception.
arXiv Detail & Related papers (2025-12-30T14:09:17Z) - Inter-event Interval Microscopy for Event Cameras [52.05337480169517]
Event cameras, innovative bio-inspired sensors, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity. We achieve event-to-intensity conversion using a static event camera for both static and dynamic scenes in fluorescence microscopy. We have collected the IEIMat dataset under various scenes, including high-dynamic-range and high-speed scenarios.
arXiv Detail & Related papers (2025-04-07T11:05:13Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [92.4205087439928]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose the Self-supervised Transfer (PST) and the Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models, effectively mitigating data scarcity. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches. This combined approach enables FUSE to construct a universal image-event framework that only requires lightweight decoder adaptation for target datasets.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Spatially-guided Temporal Aggregation for Robust Event-RGB Optical Flow Estimation [47.75348821902489]
Current optical flow methods exploit the stable appearance of frame (or RGB) data to establish robust correspondences across time. Event cameras, on the other hand, provide high-temporal-resolution motion cues and excel in challenging scenarios. This study introduces a novel approach that uses a spatially dense modality to guide the aggregation of the temporally dense event modality.
arXiv Detail & Related papers (2025-01-01T13:40:09Z) - Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset [20.729414075628814]
We propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet).
MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism.
arXiv Detail & Related papers (2023-12-22T05:22:33Z) - Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion [29.117211261620934]
Traffic object detection under variable illumination is challenging due to the information loss caused by the limited dynamic range of conventional frame-based cameras.
We propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream.
Our proposed SFNet can overcome the perceptual boundaries of conventional cameras and outperform the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95.
arXiv Detail & Related papers (2023-11-01T10:59:57Z) - Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic-range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet), incorporating both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z) - Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness changes of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation (a minimal sketch of this representation appears after this list).
Long memory is encoded in the hidden state of adaptive convLSTMs, while short memory is modeled by computing the spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - RGB-Event Fusion for Moving Object Detection in Autonomous Driving [3.5397758597664306]
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving.
Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects.
We propose RENet, a novel RGB-Event fusion Network that jointly exploits the two complementary modalities to achieve more robust MOD.
arXiv Detail & Related papers (2022-09-17T12:59:08Z)
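The pillar-style event representation mentioned in the Dual Memory Aggregation Network entry above can be illustrated with a small, self-contained sketch: events are binned over an x-y-t grid, separately per polarity, producing a dense tensor that a frame-based detector can consume. The binning scheme, array shapes, and function name below are illustrative assumptions rather than that paper's exact implementation.

```python
# Hypothetical sketch of a pillar-style event representation: events are binned
# into an x-y-t grid, separately per polarity, yielding a dense 4D count tensor.
import numpy as np


def events_to_voxel_grid(events: np.ndarray, num_bins: int, height: int, width: int) -> np.ndarray:
    """events: (N, 4) array of (x, y, t, polarity) with polarity in {-1, +1}.

    Returns a (2, num_bins, height, width) count tensor: channel 0 holds
    positive events, channel 1 negative events.
    """
    grid = np.zeros((2, num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    # Normalize timestamps into [0, 1] and clip the last event into the final bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    bins = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    pol_idx = (p < 0).astype(int)  # 0 -> positive polarity, 1 -> negative
    np.add.at(grid, (pol_idx, bins, y, x), 1.0)
    return grid


# Example: three synthetic events on a 4x4 sensor, binned into 2 temporal slices.
ev = np.array([[0, 0, 0.00, +1],
               [1, 2, 0.40, -1],
               [3, 3, 0.90, +1]])
print(events_to_voxel_grid(ev, num_bins=2, height=4, width=4).shape)  # (2, 2, 4, 4)
```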