BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation
- URL: http://arxiv.org/abs/2503.03280v1
- Date: Wed, 05 Mar 2025 09:03:46 GMT
- Title: BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation
- Authors: Hiep Truong Cong, Ajay Kumar Sigatapu, Arindam Das, Yashwanth Sharma, Venkatesh Satagopan, Ganesh Sistu, Ciaran Eising
- Abstract summary: We introduce BEVMOSNet, the first end-to-end multimodal fusion leveraging cameras, LiDAR, and radar to precisely predict the moving objects in bird's-eye-view (BEV). We show an overall improvement in IoU score of 36.59% compared to the vision-based unimodal baseline BEV-MoSeg.
- Score: 3.613463012025065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate motion understanding of the dynamic objects within the scene in bird's-eye-view (BEV) is critical to ensure a reliable obstacle avoidance system and smooth path planning for autonomous vehicles. However, this task has received relatively limited exploration compared to object detection and segmentation, with only a few recent vision-based approaches presenting preliminary findings that significantly deteriorate in low-light, nighttime, and adverse weather conditions such as rain. Conversely, LiDAR and radar sensors remain almost unaffected in these scenarios, and radar provides key velocity information about the objects. Therefore, we introduce BEVMOSNet, to our knowledge the first end-to-end multimodal fusion leveraging cameras, LiDAR, and radar to precisely predict moving objects in BEV. In addition, we perform an in-depth analysis to determine the optimal strategy for deformable cross-attention-guided sensor fusion for cross-sensor knowledge sharing in BEV. Evaluating BEVMOSNet on the nuScenes dataset, we show an overall improvement in IoU score of 36.59% over the vision-based unimodal baseline BEV-MoSeg (Sigatapu et al., 2023), and of 2.35% over the multimodal SimpleBEV (Harley et al., 2022) extended for the motion segmentation task, establishing this method as the state-of-the-art in BEV motion segmentation.
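The deformable cross-attention fusion strategy analyzed in the abstract can be illustrated with a short sketch: one modality's BEV grid acts as the query, each query cell predicts sparse sampling offsets into another modality's BEV grid, and the sampled features are aggregated with learned weights before residual fusion. The following single-head PyTorch sketch is a minimal illustration under assumed shapes and module names, not the actual BEVMOSNet implementation:

```python
# Minimal single-head deformable cross-attention for BEV sensor fusion.
# Shapes, module names, and the normalized-offset convention are assumptions;
# the real BEVMOSNet architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableBEVCrossAttention(nn.Module):
    """One BEV modality (query) attends to sparse points in another (key/value)."""

    def __init__(self, channels: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # Each query cell predicts (dx, dy) offsets and a weight per sampling point.
        self.offset_head = nn.Conv2d(channels, 2 * num_points, kernel_size=1)
        self.weight_head = nn.Conv2d(channels, num_points, kernel_size=1)
        self.value_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.out_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, query_bev: torch.Tensor, kv_bev: torch.Tensor) -> torch.Tensor:
        b, c, h, w = query_bev.shape
        value = self.value_proj(kv_bev)
        offsets = self.offset_head(query_bev).view(b, self.num_points, 2, h, w)
        weights = self.weight_head(query_bev).softmax(dim=1)  # (b, P, h, w)

        # Base sampling grid in normalized [-1, 1] coordinates (grid_sample order: x, y).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=query_bev.device),
            torch.linspace(-1, 1, w, device=query_bev.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)  # (h, w, 2)

        sampled = query_bev.new_zeros(b, c, h, w)
        for p in range(self.num_points):
            # Offsets are predicted directly in normalized units for simplicity.
            grid = base.unsqueeze(0) + offsets[:, p].permute(0, 2, 3, 1)
            feat = F.grid_sample(value, grid, align_corners=False)
            sampled = sampled + weights[:, p : p + 1] * feat

        # Residual fusion back into the query modality's BEV features.
        return query_bev + self.out_proj(sampled)


# Usage: fuse LiDAR/radar BEV features into camera BEV features.
fusion = DeformableBEVCrossAttention(channels=64)
cam_bev = torch.randn(2, 64, 200, 200)    # camera-derived BEV grid
lidar_bev = torch.randn(2, 64, 200, 200)  # LiDAR/radar-derived BEV grid
fused = fusion(cam_bev, lidar_bev)        # (2, 64, 200, 200)
```

Real implementations typically use multiple heads and positional encodings; the residual connection here simply preserves the query modality's features when the sampled cross-sensor evidence is uninformative.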
Related papers
- AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection [5.5967570276373655]
Bird's-eye view (BEV) object detection has become important for advanced automotive 3D radar-based perception systems.
In this paper, we introduce AttentiveGRU, a novel attention-based recurrent approach tailored for radar constraints.
arXiv Detail & Related papers (2025-04-01T09:10:47Z) - RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection. RaCFormer achieves superior results of 64.9% mAP and 70.2% NDS on nuScenes, even outperforming several LiDAR-based detectors.
arXiv Detail & Related papers (2024-12-17T09:47:48Z) - OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation [57.2213693781672]
Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems.
We propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance.
Our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation.
arXiv Detail & Related papers (2024-07-18T03:48:22Z) - BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight [30.45553559416835]
We propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception, up to 200m.
Our approach is applicable to common BEV architectures and can achieve excellent results by incorporating information derived from SD maps.
arXiv Detail & Related papers (2024-07-11T14:15:48Z) - Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms.
We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction.
Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z) - BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation [22.870994478494566]
We introduce BEVCar, a novel approach for joint BEV object and map segmentation.
The core novelty of our approach lies in first learning a point-based encoding of raw radar data.
We show that incorporating radar information significantly enhances robustness in challenging environmental conditions.
arXiv Detail & Related papers (2024-03-18T13:14:46Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection [47.7933708173225]
Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection.
This paper introduces a "modernized" dense BEV framework dubbed BEVNeXt.
On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks.
arXiv Detail & Related papers (2023-12-04T07:35:02Z) - Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the nuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z) - OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection [29.530177591608297]
Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost.
Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm.
We propose an Object-Centric query-BEV detector, OCBEV, which captures the temporal and spatial cues of moving targets more effectively.
arXiv Detail & Related papers (2023-06-02T17:59:48Z) - MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation [104.12419434114365]
In real-world applications, sensor corruptions and failures lead to inferior performance.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior art by a large margin on both full and corrupted modalities.
arXiv Detail & Related papers (2023-04-19T16:37:17Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
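The FCOS-LiDAR entry above hinges on the range-view (RV) representation: LiDAR points are spherically projected into a dense 2D image so that standard 2D convolutions apply. A minimal sketch of that projection follows; the image resolution and vertical field-of-view bounds are illustrative assumptions, not the paper's configuration:

```python
# Sketch: project an (N, 3) LiDAR point cloud into a range-view image.
# Resolution and FOV bounds below are assumed for illustration only.
import numpy as np


def pointcloud_to_range_image(points: np.ndarray,
                              height: int = 64,
                              width: int = 1024,
                              fov_up_deg: float = 10.0,
                              fov_down_deg: float = -30.0) -> np.ndarray:
    """Spherically project xyz points into an (height, width) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points[:, :3], axis=1)

    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(rng, 1e-6), -1.0, 1.0))

    fov_up = np.deg2rad(fov_up_deg)
    fov = fov_up - np.deg2rad(fov_down_deg)

    # Map angles to pixel coordinates.
    u = ((1.0 - (yaw + np.pi) / (2.0 * np.pi)) * width).astype(int) % width
    v = np.clip((fov_up - pitch) / fov * height, 0, height - 1).astype(int)

    image = np.zeros((height, width), dtype=np.float32)
    # Keep the nearest return when several points fall into one pixel:
    order = np.argsort(-rng)                # write far points first ...
    image[v[order], u[order]] = rng[order]  # ... so near points overwrite them
    return image


rv = pointcloud_to_range_image(np.random.randn(10000, 3) * 20.0)
print(rv.shape)  # (64, 1024), ready for standard 2D convolutions
```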