Related papers: Beyond RGB and Events: Enhancing Object Detection under Adverse Lighting with Monocular Normal Maps

URL: http://arxiv.org/abs/2508.02127v1
Date: Mon, 04 Aug 2025 07:19:20 GMT
Title: Beyond RGB and Events: Enhancing Object Detection under Adverse Lighting with Monocular Normal Maps
Authors: Mingjie Liu, Hanqing Liu, Chuang Zhu
Abstract summary: We introduce NRE-Net, a novel multi-modal detection framework. It fuses three complementary modalities: monocularly predicted surface normal maps, RGB images, and event streams. NRE-Net significantly outperforms state-of-the-art methods.
Score: 6.240947520777607
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate object detection under adverse lighting conditions is critical for real-world applications such as autonomous driving. Although neuromorphic event cameras have been introduced to handle these scenarios, adverse lighting often induces distracting reflections from tunnel walls or road surfaces, which frequently lead to false obstacle detections. However, neither RGB nor event data alone is robust enough to address these complexities, and mitigating these issues without additional sensors remains underexplored. To overcome these challenges, we propose leveraging normal maps, directly predicted from monocular RGB images, as robust geometric cues to suppress false positives and enhance detection accuracy. We introduce NRE-Net, a novel multi-modal detection framework that effectively fuses three complementary modalities: monocularly predicted surface normal maps, RGB images, and event streams. To optimize the fusion process, our framework incorporates two key modules: the Adaptive Dual-stream Fusion Module (ADFM), which integrates RGB and normal map features, and the Event-modality Aware Fusion Module (EAFM), which adapts to the high dynamic range characteristics of event data. Extensive evaluations on the DSEC-Det-sub and PKU-DAVIS-SOD datasets demonstrate that NRE-Net significantly outperforms state-of-the-art methods. Our approach achieves mAP50 improvements of 7.9% and 6.1% over frame-based approaches (e.g., YOLOX), while surpassing the fusion-based SFNet by 2.7% on the DSEC-Det-sub dataset and SODFormer by 7.1% on the PKU-DAVIS-SOD dataset.

Related papers

Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective [54.91271106816616]
Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency. We propose a Speed-Accuracy Tradeoff Network (SATNet) for Lightweight RGB-D SOD from three fundamental perspectives. Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps. For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities. For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework.
arXiv Detail & Related papers (2025-05-07T19:37:20Z)
FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset [20.729414075628814]
We propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet). MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism.
arXiv Detail & Related papers (2023-12-22T05:22:33Z)
Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera [8.673063170884591]
EOLO is a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities. Our EOLO framework is built on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events.
arXiv Detail & Related papers (2023-09-17T15:14:01Z)
RGB-Event Fusion for Moving Object Detection in Autonomous Driving [3.5397758597664306]
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving. Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects. We propose RENet, a novel RGB-Event fusion Network, that jointly exploits the two complementary modalities to achieve more robust MOD.
arXiv Detail & Related papers (2022-09-17T12:59:08Z)
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion. We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair. In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD. Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR. Fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework. We propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task. Our proposed UTA network is depth-free for inference and runs in real time at 43 FPS.
arXiv Detail & Related papers (2021-09-08T04:27:29Z)
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, a task relevant to many applications such as autonomous driving and robotics. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF). We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection [104.50425501764806]
We introduce a large-scale dataset to enable versatile applications for light field saliency detection. We present an asymmetrical two-stream model consisting of the Focal stream and RGB stream. Experiments demonstrate that our Focal stream achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-30T11:53:27Z)
RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD). The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from the RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.