Scene as Occupancy
- URL: http://arxiv.org/abs/2306.02851v3
- Date: Mon, 26 Jun 2023 12:42:31 GMT
- Title: Scene as Occupancy
- Authors: Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng,
Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
- Abstract summary: OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
- Score: 66.43673774733307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A human driver can easily describe a complex traffic scene through
the visual system. Such precise perception is essential for the driver's
planning. To achieve this, a geometry-aware representation that quantizes the
physical 3D scene into a structured grid map with semantic labels per cell,
termed 3D Occupancy, is desirable. Compared to the bounding-box form, a key
insight behind occupancy is that it can capture the fine-grained details of
critical obstacles in the scene and thereby facilitate subsequent tasks. Prior
and concurrent literature mainly concentrates on a single scene completion
task, whereas we argue that this occupancy representation holds potential for
broader impact. In this paper, we propose OccNet, a multi-view vision-centric
pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
At the core of OccNet is a general occupancy embedding that represents the 3D
physical world. Such a descriptor can be applied to a wide span of driving
tasks, including detection, segmentation, and planning. To validate the
effectiveness of this new representation and our proposed algorithm, we propose
OpenOcc, the first dense, high-quality 3D occupancy benchmark built on top of
nuScenes. Empirical experiments show evident performance gains across multiple
tasks; e.g., motion planning sees a collision rate reduction of 15%-58%,
demonstrating the superiority of our method.
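To make the representation concrete, below is a minimal sketch of how labeled 3D points can be quantized into the kind of semantic occupancy grid the abstract describes. This is not OccNet's actual pipeline (which is vision-centric and learned); the function name, FREE label convention, range, and resolution are illustrative assumptions.

```python
# A minimal sketch (assumptions, not OccNet's implementation): quantize
# labeled 3D points into a dense semantic occupancy grid.
import numpy as np

FREE = 0  # label 0 reserved for empty cells (an assumption for this sketch)

def points_to_occupancy(points, labels, pc_range, voxel_size):
    """Quantize (N, 3) points with per-point semantic labels into a grid.

    pc_range:   (x_min, y_min, z_min, x_max, y_max, z_max) in meters
    voxel_size: edge length of each cubic cell in meters
    Returns an (X, Y, Z) int array; each cell holds a semantic label or FREE.
    """
    lo = np.asarray(pc_range[:3])
    hi = np.asarray(pc_range[3:])
    shape = np.ceil((hi - lo) / voxel_size).astype(int)

    grid = np.full(shape, FREE, dtype=np.int32)
    idx = np.floor((points - lo) / voxel_size).astype(int)
    # Keep only points that fall inside the grid bounds.
    ok = np.all((idx >= 0) & (idx < shape), axis=1)
    idx, labels = idx[ok], labels[ok]
    # If several points land in one cell, the last write wins here; a real
    # pipeline would aggregate (e.g., majority vote) per cell.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = labels
    return grid

# Example: a 102.4 m x 102.4 m x 8 m volume at 0.4 m resolution.
pts = np.random.uniform([-51.2, -51.2, -5.0], [51.2, 51.2, 3.0], size=(1000, 3))
lbl = np.random.randint(1, 17, size=1000)  # e.g., 16 semantic classes
occ = points_to_occupancy(pts, lbl, (-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), 0.4)
print(occ.shape)  # (256, 256, 20)
```

Unlike a bounding box, each occupied cell carries its own label, which is what lets the representation capture fine-grained obstacle geometry.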
Related papers
- SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction.
Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories.
arXiv Detail & Related papers (2025-01-28T03:41:24Z)
- Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection [54.78470057491049]
Occupancy has emerged as a promising alternative for 3D scene perception.
We introduce object-centric occupancy as a supplement to object bboxes.
We show that our occupancy features significantly enhance the detection results of state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2024-12-06T16:12:38Z)
- ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers [9.271932084757646]
3D occupancy represents the entire scene, without distinguishing foreground from background, by quantizing the physical space into a grid map.
We propose a learning-first view attention mechanism for effective multi-view feature aggregation.
We present FlowOcc3D, a benchmark built on top of existing high-quality datasets.
arXiv Detail & Related papers (2024-05-07T13:15:07Z)
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation [45.39981876226129]
We study camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.
We introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate semantic information from multi-frame and multi-view images.
Our approach achieves new state-of-the-art results for camera-based segmentation and panoptic segmentation on the nuScenes dataset.
arXiv Detail & Related papers (2023-06-16T17:59:33Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver simultaneously estimates scale-aware depth, visual odometry (VO), and bird's-eye-view (BEV) layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes, and KITTI show the superiority of JPerceiver over existing methods on all three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)