Scene as Occupancy
- URL: http://arxiv.org/abs/2306.02851v3
- Date: Mon, 26 Jun 2023 12:42:31 GMT
- Title: Scene as Occupancy
- Authors: Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng,
Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
- Abstract summary: OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
- Score: 66.43673774733307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human driver can easily describe the complex traffic scene by visual system.
Such an ability of precise perception is essential for driver's planning. To
achieve this, a geometry-aware representation that quantizes the physical 3D
scene into structured grid map with semantic labels per cell, termed as 3D
Occupancy, would be desirable. Compared to the form of bounding box, a key
insight behind occupancy is that it could capture the fine-grained details of
critical obstacles in the scene, and thereby facilitate subsequent tasks. Prior
or concurrent literature mainly concentrate on a single scene completion task,
where we might argue that the potential of this occupancy representation might
obsess broader impact. In this paper, we propose OccNet, a multi-view
vision-centric pipeline with a cascade and temporal voxel decoder to
reconstruct 3D occupancy. At the core of OccNet is a general occupancy
embedding to represent the 3D physical world. Such a descriptor can be applied
to a wide span of driving tasks, including detection, segmentation, and
planning. To validate the effectiveness of this new representation and our
proposed algorithm, we propose OpenOcc, the first dense high-quality 3D
occupancy benchmark built on top of nuScenes. Empirical experiments show
evident performance gains across multiple tasks; e.g., motion planning
sees a collision rate reduction of 15%-58%, demonstrating the
superiority of our method.
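To make the representation concrete, below is a minimal sketch of what such a semantic occupancy grid looks like as a data structure and how a downstream planner might query it. The grid extents, resolution, class IDs, and helper names are hypothetical placeholders chosen for illustration, not the actual OpenOcc or OccNet configuration.

```python
import numpy as np

# Hypothetical grid configuration (not the actual OpenOcc settings):
# a 100 m x 100 m x 8 m volume around the ego vehicle at 0.5 m resolution.
VOXEL_SIZE = 0.5                      # metres per cell
RANGE = np.array([[-50.0, 50.0],      # x: behind / ahead of ego
                  [-50.0, 50.0],      # y: left / right of ego
                  [-3.0, 5.0]])       # z: below / above ground
GRID_SHAPE = tuple(((RANGE[:, 1] - RANGE[:, 0]) / VOXEL_SIZE).astype(int))

FREE, VEHICLE, PEDESTRIAN, BARRIER = 0, 1, 2, 3   # placeholder class IDs

# One semantic label per cell: this dense grid *is* the "3D Occupancy".
occupancy = np.full(GRID_SHAPE, FREE, dtype=np.uint8)

def world_to_voxel(points_xyz: np.ndarray) -> np.ndarray:
    """Map metric (x, y, z) points to integer voxel indices."""
    return ((points_xyz - RANGE[:, 0]) / VOXEL_SIZE).astype(int)

# Mark the cells covered by a perceived pedestrian (toy example).
ped_points = np.array([[10.2, -3.1, 0.4], [10.4, -3.0, 1.2]])
idx = world_to_voxel(ped_points)
occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = PEDESTRIAN

# Downstream tasks can consume the same grid, e.g. a planner may collapse
# it to a bird's-eye-view obstacle map and check a candidate trajectory.
bev_obstacles = (occupancy != FREE).any(axis=2)
traj_voxels = world_to_voxel(np.array([[5.0, 0.0, 0.0], [10.3, -3.0, 0.0]]))
collision = bev_obstacles[traj_voxels[:, 0], traj_voxels[:, 1]].any()
print(f"candidate trajectory collides: {collision}")
```

OccNet predicts a grid of this form from multi-view images rather than rasterizing known objects; the sketch only illustrates the resulting data structure and why tasks such as collision checking can query it directly.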
Related papers
- PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving [15.441175735210791]
Vision-centric occupancy networks represent the surrounding environment with uniform voxels carrying semantics.
Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise semantic prediction.
arXiv Detail & Related papers (2024-06-11T07:51:26Z)
- ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers [9.271932084757646]
3D occupancy represents the entire scene, without distinguishing between foreground and background, by quantizing the physical space into a grid map.
We propose a learning-first view attention mechanism for effective multi-view feature aggregation.
We present FlowOcc3D, a benchmark built on top of existing high-quality datasets.
arXiv Detail & Related papers (2024-05-07T13:15:07Z)
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into the 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation [45.39981876226129]
We study camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.
We introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate semantic information from multi-frame and multi-view images (a generic sketch of this kind of voxel-feature lifting appears after this list).
Our approach achieves new state-of-the-art results for camera-based segmentation and panoptic segmentation on the nuScenes dataset.
arXiv Detail & Related papers (2023-06-16T17:59:33Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
- A Simple Framework for 3D Occupancy Estimation in Autonomous Driving [16.605853706182696]
We present a CNN-based framework designed to reveal several key factors for 3D occupancy estimation.
We also explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation and 3D reconstruction.
arXiv Detail & Related papers (2023-03-17T15:57:14Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes, and KITTI show the superiority of JPerceiver over existing methods on all three of the above tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We devise a new geometry-based strategy to embed depth information into a low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
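Several entries above (e.g., ViewFormer and PanoOcc), like OccNet itself, build their occupancy representation by lifting multi-view image features into a voxel grid before decoding. The sketch below illustrates one common form of this lifting: projecting voxel centres into each camera with known intrinsics/extrinsics and bilinearly sampling image features. It is a generic illustration under assumed tensor shapes and a hypothetical function name, not the specific OccNet, ViewFormer, or PanoOcc implementation.

```python
import torch
import torch.nn.functional as F

def lift_image_features_to_voxels(feats, intrinsics, extrinsics, voxel_centers):
    """Aggregate multi-view image features onto a voxel grid by projection.

    feats:         (N_cam, C, H, W)  image feature maps
    intrinsics:    (N_cam, 3, 3)     pinhole camera matrices
    extrinsics:    (N_cam, 4, 4)     world -> camera transforms
    voxel_centers: (N_vox, 3)        voxel centre coordinates in the world frame
    returns:       (N_vox, C)        mean feature over the cameras that see each voxel
    """
    n_cam, C, H, W = feats.shape
    n_vox = voxel_centers.shape[0]
    homo = torch.cat([voxel_centers, torch.ones(n_vox, 1)], dim=1)   # (N_vox, 4)

    voxel_feats = torch.zeros(n_vox, C)
    hit_count = torch.zeros(n_vox, 1)
    for cam in range(n_cam):
        cam_pts = (extrinsics[cam] @ homo.T).T[:, :3]                # world -> camera
        in_front = cam_pts[:, 2] > 0.1                               # drop points behind camera
        pix = (intrinsics[cam] @ cam_pts.T).T                        # camera -> image plane
        pix = pix[:, :2] / pix[:, 2:3].clamp(min=1e-5)               # perspective divide
        # Normalise pixel coordinates to [-1, 1] for grid_sample.
        grid = torch.stack([pix[:, 0] / (W - 1) * 2 - 1,
                            pix[:, 1] / (H - 1) * 2 - 1], dim=-1)
        in_image = (grid.abs() <= 1).all(dim=-1) & in_front
        sampled = F.grid_sample(feats[cam:cam + 1],
                                grid.view(1, 1, n_vox, 2),
                                align_corners=True)                  # (1, C, 1, N_vox)
        sampled = sampled[0, :, 0].T                                 # (N_vox, C)
        voxel_feats += sampled * in_image.unsqueeze(-1)
        hit_count += in_image.unsqueeze(-1).float()
    return voxel_feats / hit_count.clamp(min=1)
```

Each voxel then carries an image-derived descriptor; an occupancy or semantic head (or, in OccNet's case, a cascade and temporal voxel decoder) can be applied on top of the resulting grid.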
This list is automatically generated from the titles and abstracts of the papers in this site.