Related papers: SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather

SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather

URL: http://arxiv.org/abs/2508.16408v1
Date: Fri, 22 Aug 2025 14:20:46 GMT
Title: SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
Authors: Edoardo Palladin, Roland Dietze, Praveen Narayanan, Mario Bijelic, Felix Heide,
Abstract summary: We introduce a novel multi-sensor fusion approach tailored to adverse weather conditions.<n>In addition to fusing RGB and LiDAR sensors, our sensor fusion stack is also capable of learning from NIR camera and radar modalities.<n>Our method improves the reliability of multimodal sensor fusion in autonomous vehicles under challenging weather conditions.
Score: 33.428611256223256
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal sensor fusion is an essential capability for autonomous robots, enabling object detection and decision-making in the presence of failing or uncertain inputs. While recent fusion methods excel in normal environmental conditions, these approaches fail in adverse weather, e.g., heavy fog, snow, or obstructions due to soiling. We introduce a novel multi-sensor fusion approach tailored to adverse weather conditions. In addition to fusing RGB and LiDAR sensors, which are employed in recent autonomous driving literature, our sensor fusion stack is also capable of learning from NIR gated camera and radar modalities to tackle low light and inclement weather. We fuse multimodal sensor data through attentive, depth-based blending schemes, with learned refinement on the Bird's Eye View (BEV) plane to combine image and range features effectively. Our detections are predicted by a transformer decoder that weighs modalities based on distance and visibility. We demonstrate that our method improves the reliability of multimodal sensor fusion in autonomous vehicles under challenging weather conditions, bridging the gap between ideal conditions and real-world edge cases. Our approach improves average precision by 17.2 AP compared to the next best method for vulnerable pedestrians in long distances and challenging foggy scenes. Our project page is available at https://light.princeton.edu/samfusion/

Related papers

MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving [9.184945917823047]
We present a simple but effective multi-stage sampling fusion (MSSF) network based on 4D radar and camera. MSSF achieves a 7.0% and 4.0% improvement in 3D mean average precision on the View-of-Delft (VoD) and TJ4DRadset datasets. It even surpasses classical LiDAR-based methods on the VoD dataset.
arXiv Detail & Related papers (2024-11-22T15:45:23Z)
CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes.<n>Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token.<n>Our model significantly improves robustness and accuracy, especially in adverse-condition scenarios.
arXiv Detail & Related papers (2024-10-14T17:56:20Z)
ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions [1.7537812081430004]
We propose a technique called ContextualFusion to incorporate the domain knowledge about cameras and lidars behaving differently across lighting and weather variations into 3D object detection models. Our approach yields an mAP improvement of 6.2% over state-of-the-art methods on our context-balanced synthetic dataset. Our method enhances state-of-the-art 3D objection performance at night on the real-world NuScenes dataset with a significant mAP improvement of 11.7%.
arXiv Detail & Related papers (2024-04-23T06:37:54Z)
Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline. Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combing their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme. Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTecD-3 AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
Pay "Attention" to Adverse Weather: Weather-aware Attention-based Object Detection [5.816506391882502]
This paper proposes a Global-Local Attention (GLA) framework to adaptively fuse the multi-modality sensing streams. Specifically, GLA integrates an early-stage fusion via a local attention network and a late-stage fusion via a global attention network to deal with both local and global information. Experimental results demonstrate the superior performance of the proposed GLA compared with state-of-the-art fusion approaches.
arXiv Detail & Related papers (2022-04-22T16:32:34Z)
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. TransFusion achieves state-of-the-art performance on large-scale datasets. We extend the proposed method to the 3D tracking task and achieve the 1st place in the leaderboard of nuScenes tracking.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
HydraFusion: Context-Aware Selective Sensor Fusion for Robust and Efficient Autonomous Vehicle Perception [9.975955132759385]
Techniques to fuse sensor data from camera, radar, and lidar sensors have been proposed to improve autonomous vehicle (AV) perception. Existing methods are insufficiently robust in difficult driving contexts due to rigidity in their fusion implementations. We propose HydraFusion: a selective sensor fusion framework that learns to identify the current driving context and fuses the best combination of sensors.
arXiv Detail & Related papers (2022-01-17T22:19:53Z)
Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination. Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities. We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.