FB-OCC: 3D Occupancy Prediction based on Forward-Backward View
Transformation
- URL: http://arxiv.org/abs/2307.01492v1
- Date: Tue, 4 Jul 2023 05:55:54 GMT
- Title: FB-OCC: 3D Occupancy Prediction based on Forward-Backward View
Transformation
- Authors: Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan
Kautz, Jose M. Alvarez
- Abstract summary: FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye-view perception design using forward-backward projection.
Its designs and optimizations result in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking 1st in the challenge track.
- Score: 79.41536932037822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report summarizes the winning solution for the 3D Occupancy
Prediction Challenge, held in conjunction with the CVPR 2023 Workshop
on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric
Autonomous Driving. Our proposed solution FB-OCC builds upon FB-BEV, a
cutting-edge camera-based bird's-eye view perception design using
forward-backward projection. On top of FB-BEV, we further study novel designs
and optimization tailored to the 3D occupancy prediction task, including joint
depth-semantic pre-training, joint voxel-BEV representation, model scaling up,
and effective post-processing strategies. These designs and optimizations result
in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking
1st in the challenge track. Code and models will be released at:
https://github.com/NVlabs/FB-BEV.
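The "forward-backward projection" named in the abstract pairs a depth-based forward projection (lifting image features into 3D) with a query-based backward projection (sampling image features for each 3D cell). As a rough, hypothetical illustration of the backward direction only — a generic sketch with an assumed pinhole intrinsics matrix `K`, a single camera, and nearest-neighbor sampling, not the authors' implementation:

```python
import numpy as np

def backward_project(voxel_centers, K, feat_map):
    """Sample an image feature for each 3D voxel center by projecting it
    into the image plane with intrinsics K. Points are assumed to be in
    the camera frame (no extrinsics); sampling is nearest-neighbor and
    points behind the camera get a zero feature."""
    H, W, C = feat_map.shape
    out = np.zeros((len(voxel_centers), C))
    uvw = (K @ voxel_centers.T).T            # homogeneous pixel coords, (N, 3)
    z = uvw[:, 2]
    valid = z > 1e-6                         # keep points in front of the camera
    u = np.round(uvw[valid, 0] / z[valid]).astype(int)
    v = np.round(uvw[valid, 1] / z[valid]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)[inside]
    out[idx] = feat_map[v[inside], u[inside]]
    return out
```

A real system would use learned features, camera extrinsics, multiple views, and bilinear sampling; this only shows the geometric core of querying image features from 3D positions.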
Related papers
- AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction [56.72301849123049]
We present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ dataset challenge at CVPR 2024.
Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling.
Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth.
arXiv Detail & Related papers (2024-07-01T16:32:15Z)
- End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation [34.070813293944944]
We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD).
Our motivation stems from the observation that current E2EAD models still mimic the modular architecture in typical driving stacks.
Our UAD achieves 38.7% relative improvements over UniAD on the average collision rate in nuScenes and surpasses VAD by 41.32 points on the driving score in CARLA's Town05 Long benchmark.
arXiv Detail & Related papers (2024-06-25T16:12:52Z)
- BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection [47.74067616658986]
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain.
Inspired by this insight, we propose a novel voxel pooling strategy, dubbed BEVSpread, to reduce such error.
BEVSpread significantly improves the performance of existing frustum-based BEV methods.
arXiv Detail & Related papers (2024-06-13T03:33:36Z)
- OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks [75.10231099007494]
We introduce a self-supervised pretraining method, called OccFeat, for Bird's-Eye-View (BEV) segmentation networks.
With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks.
Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios.
arXiv Detail & Related papers (2024-04-22T09:43:03Z)
- OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction [32.17406995216123]
"occTransformer" is used for the 3D occupancy prediction track in the autonomous driving challenge at CVPR 2023.
Our method builds upon the strong baseline BEVFormer and improves its performance through several simple yet effective techniques.
Using these methods, our solution achieved a 49.23 mIoU on the 3D occupancy prediction track in the autonomous driving challenge.
arXiv Detail & Related papers (2024-02-28T08:03:34Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird's-eye-view (BEV) perception paradigms have made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z)
- UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering [27.712689811093362]
We present our solution, named UniOcc, for the Vision-Centric 3D occupancy prediction track.
Our solution achieves 51.27% mIoU on the official leaderboard with a single model, placing 3rd in this challenge.
arXiv Detail & Related papers (2023-06-15T13:23:57Z)
- BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)
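Several entries above report mIoU scores (e.g. 54.19% for FB-OCC, 49.23 for OccTransformer, 51.27% for UniOcc). As a generic sketch of how per-class IoU is typically averaged into mIoU over semantic labels — illustrative only, not any benchmark's official evaluation code:

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=None):
    """Mean Intersection-over-Union: per-class IoU averaged over classes.
    pred and gt are integer label arrays of the same shape; classes with
    an empty union (absent from both) are skipped rather than counted."""
    ious = []
    for c in range(num_classes):
        if c == ignore_index:
            continue
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

Benchmarks differ in details such as which classes are ignored and whether IoU is accumulated over the whole dataset before averaging; the sketch shows only the core formula.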
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.