OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic
Occupancy Perception
- URL: http://arxiv.org/abs/2303.03991v1
- Date: Tue, 7 Mar 2023 15:43:39 GMT
- Title: OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic
Occupancy Perception
- Authors: Xiaofeng Wang, Zheng Zhu, Wenbo Xu, Yunpeng Zhang, Yi Wei, Xu Chi, Yun
Ye, Dalong Du, Jiwen Lu, Xingang Wang
- Abstract summary: We propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.
We extend the large-scale nuScenes dataset with dense semantic occupancy annotations.
Considering the complexity of surrounding occupancy perception, we propose the Cascade Occupancy Network (CONet) to refine the coarse prediction.
- Score: 73.05425657479704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic occupancy perception is essential for autonomous driving, as
automated vehicles require a fine-grained perception of the 3D urban
structures. However, existing relevant benchmarks lack diversity in urban
scenes, and they only evaluate front-view predictions. Towards a comprehensive
benchmarking of surrounding perception algorithms, we propose OpenOccupancy,
which is the first surrounding semantic occupancy perception benchmark. In the
OpenOccupancy benchmark, we extend the large-scale nuScenes dataset with dense
semantic occupancy annotations. Previous annotations rely on superimposing
LiDAR points, so some occupancy labels are missed due to sparse LiDAR
channels. To mitigate this problem, we introduce the Augmenting And
Purifying (AAP) pipeline, which densifies the annotations by roughly 2x
and involves ~4000 human hours of labeling. Besides, camera-based,
LiDAR-based, and multi-modal baselines are established for the
OpenOccupancy benchmark. Furthermore, since the complexity of surrounding
occupancy perception lies in the computational burden of high-resolution
3D predictions, we propose the Cascade Occupancy Network (CONet) to refine
the coarse prediction, which improves performance by ~30% relative to the
baseline. We hope the OpenOccupancy benchmark will boost the development
of surrounding occupancy perception algorithms.
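The cascade design is easy to picture: a cheap head labels a coarse voxel grid, and only the voxels it marks as occupied are re-classified at full resolution. Below is a minimal PyTorch-style sketch of this coarse-to-fine idea; the module structure, shapes, and the free-class convention are illustrative assumptions, not CONet's actual code.

```python
# Hypothetical sketch of a coarse-to-fine cascade for semantic occupancy.
# Shapes and modules are illustrative; the real CONet differs in detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeOccupancy(nn.Module):
    def __init__(self, feat_dim=64, num_classes=17, scale=2):
        super().__init__()
        self.scale = scale
        # Stage 1: cheap head on a low-resolution voxel grid.
        self.coarse_head = nn.Conv3d(feat_dim, num_classes, kernel_size=1)
        # Stage 2: refinement head applied only where stage 1 sees geometry.
        self.fine_head = nn.Conv3d(feat_dim, num_classes, kernel_size=1)

    def forward(self, coarse_feat, fine_feat):
        # coarse_feat: (B, C, X, Y, Z); fine_feat: (B, C, sX, sY, sZ)
        coarse_logits = self.coarse_head(coarse_feat)
        # Occupied = any non-empty class beats the "free" class (index 0).
        occupied = coarse_logits.argmax(dim=1, keepdim=True) > 0
        # Upsample the coarse occupancy mask to the fine resolution.
        mask = F.interpolate(occupied.float(), scale_factor=self.scale,
                             mode="nearest").bool()
        fine_logits = self.fine_head(fine_feat)
        # Refine only flagged voxels; everything else stays "free".
        free = torch.zeros_like(fine_logits)
        free[:, 0] = 1.0  # one-hot logits for the free class
        return torch.where(mask, fine_logits, free), coarse_logits

model = CascadeOccupancy()
coarse = torch.randn(1, 64, 32, 32, 4)
fine = torch.randn(1, 64, 64, 64, 8)
fine_logits, coarse_logits = model(coarse, fine)
print(fine_logits.shape)  # torch.Size([1, 17, 64, 64, 8])
```

The saving comes from the second stage doing dense work only where the first stage found geometry, which is a small fraction of a mostly empty driving scene.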
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely temporal architecture, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
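One way to read "a set of learnable queries" predicting locations and classes is a DETR-style set-prediction head. The toy head below sketches that reading; the query count, feature shapes, and output heads are hypothetical, not OPUS's architecture.

```python
# Toy set-prediction head: each learnable query regresses a 3D point and a
# semantic class, DETR-style. Illustrative only; not OPUS's actual code.
import torch
import torch.nn as nn

class SparseSetHead(nn.Module):
    def __init__(self, num_queries=600, dim=256, num_classes=17):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2)
        self.point_head = nn.Linear(dim, 3)        # (x, y, z) in [0, 1]
        self.class_head = nn.Linear(dim, num_classes)

    def forward(self, image_feats):
        # image_feats: (B, N_tokens, dim) flattened multi-view features.
        q = self.queries.weight.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        q = self.decoder(q, image_feats)
        return self.point_head(q).sigmoid(), self.class_head(q)

head = SparseSetHead()
feats = torch.randn(2, 1024, 256)
points, logits = head(feats)
print(points.shape, logits.shape)  # (2, 600, 3) (2, 600, 17)
```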
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- Fully Sparse 3D Occupancy Prediction [37.265473869812816]
Occupancy prediction plays a pivotal role in autonomous driving.
Previous methods typically construct dense 3D volumes, neglecting the inherent sparsity of the scene and suffering from high computational costs.
We introduce a novel fully sparse occupancy network, termed SparseOcc.
SparseOcc initially reconstructs a sparse 3D representation from camera-only inputs and subsequently predicts semantic/instance occupancy from the 3D sparse representation by sparse queries.
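The motivation for full sparsity is easy to quantify: a dense feature volume at perception resolution is enormous, while storing only occupied voxels as coordinate/feature pairs is not. A back-of-the-envelope sketch, with made-up grid sizes and occupancy ratio:

```python
# Why sparsity pays off: store only occupied voxels as (coords, features)
# instead of a dense grid. All numbers here are illustrative assumptions.
import numpy as np

X, Y, Z, C = 512, 512, 40, 32          # dense grid and feature channels
occupancy_ratio = 0.05                  # driving scenes are mostly empty

dense_bytes = X * Y * Z * C * 4         # float32 dense feature volume
n_occ = int(X * Y * Z * occupancy_ratio)
sparse_bytes = n_occ * (3 * 4 + C * 4)  # int32 coords + float32 features

print(f"dense:  {dense_bytes / 2**20:.0f} MiB")   # ~1280 MiB
print(f"sparse: {sparse_bytes / 2**20:.0f} MiB")  # ~70 MiB

# A sparse scene is just parallel arrays: voxel indices and their features.
coords = np.random.randint(0, 512, size=(n_occ, 3), dtype=np.int32)
feats = np.random.randn(n_occ, C).astype(np.float32)
```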
arXiv Detail & Related papers (2023-12-28T16:54:53Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
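Handling the cameras' unbounded range typically means contracting far-away space into a bounded volume before sampling. The sketch below uses a mip-NeRF 360-style contraction as an illustration; whether OccNeRF's parameterization takes exactly this form is an assumption.

```python
# Unbounded-scene contraction in the spirit of mip-NeRF 360: points inside
# the unit ball keep their coordinates; points outside are squashed so all
# of R^3 maps into a ball of radius 2. An illustration of "reorganizing
# the sampling range", not necessarily OccNeRF's exact parameterization.
import numpy as np

def contract(x: np.ndarray) -> np.ndarray:
    # x: (N, 3) points in world space; eps guards the origin.
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)
    squashed = (2.0 - 1.0 / norm) * (x / norm)
    return np.where(norm <= 1.0, x, squashed)

pts = np.array([[0.5, 0.0, 0.0], [10.0, 0.0, 0.0], [1e6, 0.0, 0.0]])
print(contract(pts))  # far points land just inside radius 2
```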
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
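The core operation is projecting each LiDAR point into cylindrical coordinates and pooling its features onto the three TPV planes. A minimal NumPy sketch follows; the grid sizes are invented, and plain max-pooling stands in for the paper's spatial group pooling.

```python
# Sketch: scatter LiDAR point features into a cylindrical tri-perspective
# view. Points are binned in (rho, phi, z); each plane keeps the max
# feature per cell. Grid sizes and ranges are illustrative assumptions.
import numpy as np

def cylindrical_tpv(points, feats, R=64, P=128, H=16,
                    r_max=50.0, z_min=-3.0, z_max=5.0):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)  # [-pi, pi]
    # Quantize each cylindrical coordinate to its grid index.
    ri = np.clip((rho / r_max * R).astype(int), 0, R - 1)
    pi_ = np.clip(((phi + np.pi) / (2 * np.pi) * P).astype(int), 0, P - 1)
    zi = np.clip(((z - z_min) / (z_max - z_min) * H).astype(int), 0, H - 1)

    C = feats.shape[1]
    planes = {"rho_phi": np.zeros((R, P, C)),
              "rho_z": np.zeros((R, H, C)),
              "phi_z": np.zeros((P, H, C))}
    for n in range(len(points)):  # max-pool point features onto each plane
        planes["rho_phi"][ri[n], pi_[n]] = np.maximum(planes["rho_phi"][ri[n], pi_[n]], feats[n])
        planes["rho_z"][ri[n], zi[n]] = np.maximum(planes["rho_z"][ri[n], zi[n]], feats[n])
        planes["phi_z"][pi_[n], zi[n]] = np.maximum(planes["phi_z"][pi_[n], zi[n]], feats[n])
    return planes

pts = np.random.randn(1000, 3) * [20, 20, 1.5]
planes = cylindrical_tpv(pts, np.random.rand(1000, 8))
print({k: v.shape for k, v in planes.items()})
```

The cylindrical grid matches the radial density of LiDAR returns: near cells are small where points are dense, far cells are large where points are sparse.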
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- Scene as Occupancy [66.43673774733307]
OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
arXiv Detail & Related papers (2023-06-05T13:01:38Z)
- Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving [34.368848580725576]
We develop a label generation pipeline that produces dense, visibility-aware labels for any given scene.
This pipeline comprises three stages: voxel densification, occlusion reasoning, and image-guided voxel refinement.
We propose a new model, dubbed Coarse-to-Fine Occupancy (CTF-Occ) network, which demonstrates superior performance on the Occ3D benchmarks.
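The occlusion-reasoning stage can be pictured as ray casting: voxels between the sensor and a LiDAR return are observed-free, the return voxel is occupied, and everything behind stays unobserved. A simplified sketch under those assumptions:

```python
# Sketch of the occlusion-reasoning idea: walk each LiDAR ray through the
# grid; voxels before the return are observed-free, the return voxel is
# occupied, everything behind stays unobserved. Grid parameters are
# invented; a real pipeline would also resolve conflicts between rays
# (e.g. occupied wins over free).
import numpy as np

FREE, OCCUPIED, UNOBSERVED = 0, 1, 2

def ray_visibility(origin, hits, grid_shape, voxel_size, n_steps=256):
    vis = np.full(grid_shape, UNOBSERVED, dtype=np.uint8)
    for hit in hits:
        # Sample points from the sensor to the return along the ray.
        ts = np.linspace(0.0, 1.0, n_steps)[:, None]
        pts = origin + ts * (hit - origin)
        idx = np.floor(pts / voxel_size).astype(int)
        idx = idx[np.all((idx >= 0) & (idx < grid_shape), axis=1)]
        if len(idx) == 0:
            continue
        vis[idx[:-1, 0], idx[:-1, 1], idx[:-1, 2]] = FREE  # along the ray
        vis[tuple(idx[-1])] = OCCUPIED                     # the return itself
    return vis

grid = ray_visibility(origin=np.array([3.2, 3.2, 1.0]),
                      hits=np.random.rand(100, 3) * [6.4, 6.4, 3.2],
                      grid_shape=(16, 16, 8), voxel_size=0.4)
print((grid == FREE).sum(), (grid == OCCUPIED).sum())
```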
arXiv Detail & Related papers (2023-04-27T17:40:08Z)
- A Simple Framework for 3D Occupancy Estimation in Autonomous Driving [16.605853706182696]
We present a CNN-based framework designed to reveal several key factors for 3D occupancy estimation.
We also explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation and 3D reconstruction.
arXiv Detail & Related papers (2023-03-17T15:57:14Z)
- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark [23.872360763782037]
ASAP is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.
We propose an annotation-extending pipeline to generate high-frame-rate labels for the 12Hz raw images.
In the ASAP benchmark, comprehensive experimental results reveal that model rankings change under different constraints.
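A simple way to picture the annotation-extending idea: interpolate object poses between sparse keyframe labels to synthesize labels at the sensor frame rate. The toy sketch below assumes a single persistent object and linear motion; a real pipeline tracks objects and handles occlusion.

```python
# Sketch of extending keyframe labels to high-frame-rate labels: linearly
# interpolate a box's center and yaw between 2 Hz keyframes to obtain
# 12 Hz pseudo-labels. Illustrative only; not the ASAP pipeline itself.
import numpy as np

def interpolate_track(times_kf, centers_kf, yaws_kf, fps=12.0):
    t_hi = np.arange(times_kf[0], times_kf[-1], 1.0 / fps)
    centers = np.stack([np.interp(t_hi, times_kf, centers_kf[:, i])
                        for i in range(3)], axis=1)
    # Unwrap yaw so interpolation doesn't jump across the +/- pi boundary.
    yaws = np.interp(t_hi, times_kf, np.unwrap(yaws_kf))
    return t_hi, centers, yaws

t_kf = np.array([0.0, 0.5, 1.0])                # 2 Hz keyframes
c_kf = np.array([[0, 0, 0], [1, 0.2, 0], [2, 0.5, 0]], dtype=float)
y_kf = np.array([0.0, 0.3, 0.6])
t, c, y = interpolate_track(t_kf, c_kf, y_kf)
print(len(t), c.shape)  # 12 samples at 12 Hz over 1 s
```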
arXiv Detail & Related papers (2022-12-17T16:32:15Z)