OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction
- URL: http://arxiv.org/abs/2403.01644v4
- Date: Thu, 9 May 2024 05:20:54 GMT
- Title: OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction
- Authors: Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall
- Abstract summary: This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy.
By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction.
- Score: 11.33083039877258
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A comprehensive understanding of 3D scenes is crucial in autonomous vehicles (AVs), and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction heavily rely on surround-view camera images, making them susceptible to changes in lighting and weather conditions. This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes and SemanticKITTI datasets, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OccFusion.
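To make the fusion idea concrete, here is a minimal PyTorch sketch: per-sensor voxel feature volumes (camera, lidar, radar) are concatenated and passed through a small 3D convolutional head that outputs per-voxel semantic occupancy logits. All module names, channel sizes, and the concat-then-conv merge are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal multi-sensor occupancy fusion sketch (illustrative only; shapes,
# module names, and the concat-then-conv merge are assumptions, not the
# paper's actual architecture).
import torch
import torch.nn as nn

class NaiveOccFusion(nn.Module):
    def __init__(self, cam_c=64, lidar_c=64, radar_c=32, out_c=64, n_classes=17):
        super().__init__()
        # Fuse per-modality voxel features by channel concatenation + 3D conv.
        self.fuse = nn.Sequential(
            nn.Conv3d(cam_c + lidar_c + radar_c, out_c, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Per-voxel semantic occupancy logits.
        self.head = nn.Conv3d(out_c, n_classes, kernel_size=1)

    def forward(self, cam_vox, lidar_vox, radar_vox):
        # Each input: (B, C_mod, X, Y, Z) voxel feature volume, already lifted
        # from its sensor (camera 2D->3D lifting, lidar/radar voxelization).
        x = torch.cat([cam_vox, lidar_vox, radar_vox], dim=1)
        return self.head(self.fuse(x))  # (B, n_classes, X, Y, Z)

B, X, Y, Z = 1, 50, 50, 8  # small grid for a smoke test
model = NaiveOccFusion()
logits = model(torch.randn(B, 64, X, Y, Z),
               torch.randn(B, 64, X, Y, Z),
               torch.randn(B, 32, X, Y, Z))
print(logits.shape)  # torch.Size([1, 17, 50, 50, 8])
```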
Related papers
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on the fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV).
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
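A minimal sketch of fusing in both views with query-level aggregation follows; the layer choices, shapes, and attention setup are assumptions for illustration, not ProFusion3D's actual design.

```python
# Sketch of two-view (BEV + PV) fusion with query-level aggregation
# (illustrative; layer choices and shapes are assumptions).
import torch
import torch.nn as nn

class TwoViewFusion(nn.Module):
    def __init__(self, c=64, n_queries=100):
        super().__init__()
        self.bev_fuse = nn.Conv2d(2 * c, c, 3, padding=1)  # cam+lidar in BEV
        self.pv_fuse = nn.Conv2d(2 * c, c, 3, padding=1)   # cam+lidar in PV
        self.queries = nn.Parameter(torch.randn(n_queries, c))
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)

    def forward(self, cam_bev, lidar_bev, cam_pv, lidar_pv):
        bev = self.bev_fuse(torch.cat([cam_bev, lidar_bev], 1))  # (B, C, H, W)
        pv = self.pv_fuse(torch.cat([cam_pv, lidar_pv], 1))
        # Flatten both views into one token set; object queries attend to it.
        tokens = torch.cat([bev.flatten(2), pv.flatten(2)], dim=2).transpose(1, 2)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        fused, _ = self.attn(q, tokens, tokens)
        return fused  # (B, n_queries, C) object-level features

m = TwoViewFusion()
out = m(*(torch.randn(1, 64, 32, 32) for _ in range(4)))
print(out.shape)  # torch.Size([1, 100, 64])
```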
arXiv Detail & Related papers (2024-10-09T22:57:47Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction [5.285847977231642]
3D occupancy prediction based on multi-sensor fusion is crucial for a reliable autonomous driving system.
Previous fusion-based 3D occupancy prediction methods relied on depth estimation to process 2D image features.
We propose OccFusion, a depth-estimation-free multi-modal fusion framework.
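One common depth-free lifting scheme, shown here as a sketch rather than the paper's exact formulation, projects each 3D voxel center into the image and bilinearly samples 2D features there; the projection matrix and grid below are illustrative assumptions.

```python
# Depth-estimation-free 2D->3D lifting sketch: project voxel centers into
# the image and sample features there (a common scheme; the projection
# matrix and grid here are illustrative assumptions).
import torch
import torch.nn.functional as F

def lift_by_projection(img_feat, voxel_xyz, cam_proj, img_hw):
    # img_feat: (1, C, Hf, Wf) 2D feature map; voxel_xyz: (N, 3) voxel centers
    # cam_proj: (3, 4) camera projection matrix; img_hw: original (H, W).
    N = voxel_xyz.shape[0]
    homo = torch.cat([voxel_xyz, torch.ones(N, 1)], dim=1)       # (N, 4)
    uvw = homo @ cam_proj.T                                      # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-5)                # pixel coords
    # Normalize to [-1, 1] for grid_sample.
    H, W = img_hw
    grid = torch.stack([uv[:, 0] / W * 2 - 1, uv[:, 1] / H * 2 - 1], dim=-1)
    grid = grid.view(1, 1, N, 2)
    feats = F.grid_sample(img_feat, grid, align_corners=False)   # (1, C, 1, N)
    valid = (uvw[:, 2] > 0) & (grid.abs() <= 1).all(-1).view(-1)  # in front, in frame
    return feats.view(img_feat.shape[1], N).T * valid[:, None]   # (N, C)

feat = lift_by_projection(torch.randn(1, 64, 56, 100),
                          torch.rand(1000, 3) * 50,
                          torch.eye(3, 4), (448, 800))
print(feat.shape)  # torch.Size([1000, 64])
```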
arXiv Detail & Related papers (2024-03-08T14:07:37Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
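The "infinite perceptive range" is typically handled by contracting unbounded coordinates into a bounded volume. Below is a sketch of the mip-NeRF 360 style contraction often used for this purpose; OccNeRF's exact parameterization may differ.

```python
# Unbounded-scene contraction sketch (the mip-NeRF 360 style mapping;
# OccNeRF's exact parameterization may differ).
import torch

def contract(x, eps=1e-8):
    # Maps R^3 into a ball of radius 2: identity inside the unit ball,
    # and (2 - 1/||x||) * x/||x|| outside, so an infinite range fits in
    # a bounded grid that NeRF-style samplers can cover.
    norm = x.norm(dim=-1, keepdim=True).clamp(min=eps)
    contracted = (2 - 1 / norm) * (x / norm)
    return torch.where(norm <= 1, x, contracted)

pts = torch.tensor([[0.5, 0.0, 0.0], [10.0, 0.0, 0.0], [1000.0, 0.0, 0.0]])
print(contract(pts))  # far points approach radius 2 without overflowing
```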
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby aiding the development of more robust autonomous driving systems.
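As a rough sketch of what such a combination can look like (the head designs below are illustrative assumptions, not SOGDet's actual architecture), a shared BEV backbone can feed both a detection head and a semantic occupancy head:

```python
# Sketch of a shared-backbone model with detection and semantic-occupancy
# heads (illustrative head designs; not SOGDet's actual architecture).
import torch
import torch.nn as nn

class DetOccModel(nn.Module):
    def __init__(self, c=64, n_classes=10, n_occ=17, z_bins=8):
        super().__init__()
        self.backbone = nn.Conv2d(c, c, 3, padding=1)    # shared BEV features
        self.det_head = nn.Conv2d(c, n_classes + 7, 1)   # cls + box parameters
        self.occ_head = nn.Conv2d(c, n_occ * z_bins, 1)  # per-pillar occupancy
        self.z_bins, self.n_occ = z_bins, n_occ

    def forward(self, bev):
        f = self.backbone(bev)
        det = self.det_head(f)                           # (B, n_classes+7, H, W)
        occ = self.occ_head(f)
        B, _, H, W = occ.shape
        occ = occ.view(B, self.n_occ, self.z_bins, H, W)  # (B, n_occ, Z, H, W)
        return det, occ

det, occ = DetOccModel()(torch.randn(1, 64, 128, 128))
print(det.shape, occ.shape)
```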
arXiv Detail & Related papers (2023-08-26T07:38:21Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection [15.641616738865276]
We propose FusionPainting, a general multimodal fusion framework that fuses 2D RGB images and 3D point clouds at the semantic level to boost 3D object detection.
Specifically, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector.
The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark.
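A generic "painting" step, the core of this kind of semantic-level fusion, can be sketched as follows: each lidar point is projected into the segmented image and the per-pixel class scores are appended to it. The projection setup is an illustrative assumption, and FusionPainting's adaptive attention fusion is not reproduced here.

```python
# "Painting" sketch: append 2D semantic scores to lidar points by projecting
# each point into the segmented image (a generic point-painting scheme;
# FusionPainting's adaptive attention fusion is not reproduced here).
import numpy as np

def paint_points(points, sem_scores, cam_proj):
    # points: (N, 3) lidar xyz; sem_scores: (K, H, W) per-pixel class scores
    # cam_proj: (3, 4) projection matrix mapping lidar frame to pixels.
    K, H, W = sem_scores.shape
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    uvw = homo @ cam_proj.T
    z = np.clip(uvw[:, 2:3], 1e-5, None)
    u, v = (uvw[:, 0] / z[:, 0]).astype(int), (uvw[:, 1] / z[:, 0]).astype(int)
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    painted = np.zeros((len(points), K), dtype=sem_scores.dtype)
    painted[valid] = sem_scores[:, v[valid], u[valid]].T         # gather scores
    return np.concatenate([points, painted], axis=1)             # (N, 3 + K)

pts = paint_points(np.random.rand(100, 3) * 30,
                   np.random.rand(5, 448, 800).astype(np.float32),
                   np.eye(3, 4))
print(pts.shape)  # (100, 8)
```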
arXiv Detail & Related papers (2021-06-23T14:53:22Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LiDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LiDAR feature maps at different levels of resolution.
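A simplified rendition of the continuous-fusion idea: each BEV cell gathers image features from its k nearest lidar points and fuses them with an MLP over features plus geometric offsets. The value of k, the offset encoding, and the MLP are illustrative assumptions, not the paper's exact layer.

```python
# Continuous-fusion-style sketch: each BEV cell gathers image features from
# its k nearest lidar points and fuses them with an MLP (a simplified
# rendition; k, offsets, and the MLP are illustrative assumptions).
import torch
import torch.nn as nn

def continuous_fuse(bev_xy, pt_xy, pt_img_feat, mlp, k=4):
    # bev_xy: (M, 2) BEV cell centers; pt_xy: (N, 2) lidar points (ground plane)
    # pt_img_feat: (N, C) image features already sampled at each point.
    dist = torch.cdist(bev_xy, pt_xy)                   # (M, N) pairwise distances
    knn = dist.topk(k, largest=False).indices           # (M, k) nearest points
    feats = pt_img_feat[knn]                            # (M, k, C)
    offsets = pt_xy[knn] - bev_xy[:, None, :]           # (M, k, 2) geometry cue
    fused = mlp(torch.cat([feats, offsets], dim=-1))    # (M, k, C)
    return fused.sum(dim=1)                             # (M, C) per-BEV-cell

C = 32
mlp = nn.Sequential(nn.Linear(C + 2, C), nn.ReLU())
out = continuous_fuse(torch.rand(100, 2) * 50, torch.rand(500, 2) * 50,
                      torch.randn(500, C), mlp)
print(out.shape)  # torch.Size([100, 32])
```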
arXiv Detail & Related papers (2020-12-20T18:43:41Z)