MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies
- URL: http://arxiv.org/abs/2501.15384v1
- Date: Sun, 26 Jan 2025 03:51:56 GMT
- Title: MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies
- Authors: Long Yang, Lianqing Zheng, Wenjin Ai, Minghao Liu, Sen Li, Qunshu Lin, Shengyu Yan, Jie Bai, Zhixiong Ma, Xichan Zhu
- Abstract summary: We propose MetaOcc, a novel multi-modal occupancy prediction framework. We first design a height self-attention module for effective 3D feature extraction from sparse radar points. Finally, we develop a semi-supervised training procedure leveraging an open-set segmentor and geometric constraints for pseudo-label generation.
- Score: 10.662778683303726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D occupancy prediction is crucial for autonomous driving perception. Fusing 4D radar and cameras offers a potential route to robust occupancy prediction in severe weather at low cost, but achieving effective multi-modal feature fusion while reducing annotation costs remains a significant challenge. In this work, we propose MetaOcc, a novel multi-modal occupancy prediction framework that fuses surround-view cameras and 4D radar for comprehensive environmental perception. We first design a height self-attention module for effective 3D feature extraction from sparse radar points. Then, a local-global fusion mechanism is proposed to adaptively capture modality contributions while handling spatio-temporal misalignments. A temporal alignment and fusion module is employed to further aggregate historical features. Furthermore, we develop a semi-supervised training procedure that leverages an open-set segmentor and geometric constraints for pseudo-label generation, enabling robust perception with limited annotations. Extensive experiments on the OmniHD-Scenes dataset demonstrate that MetaOcc achieves state-of-the-art performance, surpassing previous methods by significant margins. Notably, as the first semi-supervised 4D radar and camera fusion-based occupancy prediction approach, MetaOcc maintains 92.5% of the fully supervised performance while using only 50% of ground-truth annotations, establishing a new benchmark for multi-modal 3D occupancy prediction. Code and data are available at https://github.com/LucasYang567/MetaOcc.
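The abstract describes the local-global fusion mechanism only at a high level. As a rough illustration, the following is a minimal PyTorch sketch of what an adaptive, gated fusion of radar and camera BEV features could look like; the module name `LocalGlobalFusion`, the gate designs, and all tensor shapes are assumptions made here for illustration, not the released MetaOcc implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch of adaptive local-global fusion of radar/camera BEV features.
# All names and design choices are illustrative assumptions, not MetaOcc's code.
import torch
import torch.nn as nn


class LocalGlobalFusion(nn.Module):
    """Weights the two modalities per BEV location (local gate) and
    per channel (global gate) before combining them."""

    def __init__(self, channels: int):
        super().__init__()
        # Local gate: per-pixel modality weights from concatenated features.
        self.local_gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=1),
        )
        # Global gate: per-channel modality weights from pooled statistics.
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, radar_bev: torch.Tensor, cam_bev: torch.Tensor) -> torch.Tensor:
        # radar_bev, cam_bev: (B, C, H, W) features in a shared BEV grid.
        x = torch.cat([radar_bev, cam_bev], dim=1)
        w_local = torch.softmax(self.local_gate(x), dim=1)   # (B, 2, H, W)
        w_global = self.global_gate(x)                       # (B, 2C, 1, 1)
        g_radar, g_cam = torch.chunk(w_global, 2, dim=1)     # (B, C, 1, 1) each
        fused = (w_local[:, 0:1] * g_radar * radar_bev
                 + w_local[:, 1:2] * g_cam * cam_bev)
        return fused


# Usage: fuse 128-channel BEV maps from the two modalities on a 200x200 grid.
fuser = LocalGlobalFusion(channels=128)
fused = fuser(torch.randn(1, 128, 200, 200), torch.randn(1, 128, 200, 200))
```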
Related papers
- NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective [54.91271106816616]
Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency. We propose a Speed-Accuracy Tradeoff Network (SATNet) for Lightweight RGB-D SOD from three fundamental perspectives. Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps. For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities. For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework.
arXiv Detail & Related papers (2025-05-07T19:37:20Z) - ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving [7.037019489455008]
We propose a 3D object detection method, termed ZFusion, which fuses the 4D radar and vision modalities.
The FP-DDCA fuser packs Transformer blocks to interactively fuse multi-modal features at different scales.
Experiments show that ZFusion achieves state-of-the-art mAP in the region of interest.
arXiv Detail & Related papers (2025-04-04T13:29:32Z) - MinkOcc: Towards real-time label-efficient semantic occupancy prediction [8.239334282982623]
MinkOcc is a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs.
It reduces reliance on manual labeling by 90% while maintaining competitive accuracy.
We aim to extend MinkOcc beyond curated datasets, enabling broader real-world deployment of 3D semantic occupancy prediction in autonomous driving.
arXiv Detail & Related papers (2025-04-03T04:31:56Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - L2RDaS: Synthesizing 4D Radar Tensors for Model Generalization via Dataset Expansion [6.605694475813286]
We propose LiDAR-to-4D radar data synthesis (L2RDaS), a framework that synthesizes spatially informative 4D radar tensors from LiDAR data available in autonomous driving datasets. L2RDaS integrates a modified U-Net architecture to effectively capture spatial information and an object information supplement (OBIS) module to enhance reflection fidelity. L2RDaS improves model generalization by expanding real datasets with synthetic radar tensors, achieving an average increase of 4.25% in $AP_{BEV}$ and 2.87% in $AP_{3D}$.
arXiv Detail & Related papers (2025-03-05T16:16:46Z) - RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance.
Radar suffers from noise and positional ambiguity.
We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z) - Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar [13.438311878715536]
3D multi-object tracking (MOT) is vital for autonomous vehicles, yet LiDAR and camera-based methods degrade in adverse weather.
We propose Bayes-4DRTrack, a 4D Radar-based MOT framework that adopts a transformer-based motion prediction network.
arXiv Detail & Related papers (2025-02-03T13:49:21Z) - Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception [9.76463525667238]
We propose Doracamom, the first framework that fuses multi-view cameras and 4D radar for joint 3D object detection and semantic occupancy prediction. Code and models will be publicly available.
arXiv Detail & Related papers (2025-01-26T04:24:07Z) - MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation [8.113965240054506]
We propose MR-Occ, a novel approach for camera-LiDAR fusion-based 3D semantic occupancy prediction. HVFR improves performance by enhancing features for critical voxels, reducing computational cost. MOD introduces an 'occluded' class to better handle regions obscured from the sensor view, improving accuracy. PVF-Net leverages densified LiDAR features to effectively fuse camera and LiDAR data through a deformable attention mechanism.
arXiv Detail & Related papers (2024-12-29T14:39:21Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our proposed architecture, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera [53.20087549782785]
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
arXiv Detail & Related papers (2024-10-14T19:14:49Z) - RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar [15.776076554141687]
The 3D occupancy-based perception pipeline has significantly advanced autonomous driving.
Current methods rely on LiDAR or camera inputs for 3D occupancy prediction.
We introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction.
arXiv Detail & Related papers (2024-05-22T21:48:17Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion [2.911052912709637]
Four-dimensional (4D) radar-visual odometry (4DRVO) integrates complementary information from 4D radar and cameras.
4DRVO may exhibit significant tracking errors owing to the sparsity of 4D radar point clouds.
We present 4DRVO-Net, a method for 4D radar-visual odometry.
arXiv Detail & Related papers (2023-08-12T14:00:09Z) - DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z) - HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
arXiv Detail & Related papers (2022-12-15T11:15:14Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module enriches the semantic information of point features with image features in a cascaded bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.