PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
- URL: http://arxiv.org/abs/2306.10013v1
- Date: Fri, 16 Jun 2023 17:59:33 GMT
- Title: PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
- Authors: Yuqi Wang, Yuntao Chen, Xingyu Liao, Lue Fan and Zhaoxiang Zhang
- Abstract summary: We study camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.
We introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate spatiotemporal information from multi-frame and multi-view images.
Our approach achieves new state-of-the-art results for camera-based semantic segmentation and panoptic segmentation on the nuScenes dataset.
- Score: 45.39981876226129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comprehensive modeling of the surrounding 3D world is key to the success of
autonomous driving. However, existing perception tasks like object detection,
road structure segmentation, depth & elevation estimation, and open-set object
localization each only focus on a small facet of the holistic 3D scene
understanding task. This divide-and-conquer strategy simplifies the algorithm
development procedure at the cost of losing an end-to-end unified solution to
the problem. In this work, we address this limitation by studying camera-based
3D panoptic segmentation, aiming to achieve a unified occupancy representation
for camera-only 3D scene understanding. To achieve this, we introduce a novel
method called PanoOcc, which utilizes voxel queries to aggregate spatiotemporal
information from multi-frame and multi-view images in a coarse-to-fine scheme,
integrating feature learning and scene representation into a unified occupancy
representation. We have conducted extensive ablation studies to verify the
effectiveness and efficiency of the proposed method. Our approach achieves new
state-of-the-art results for camera-based semantic segmentation and panoptic
segmentation on the nuScenes dataset. Furthermore, our method can be easily
extended to dense occupancy prediction and has shown promising performance on
the Occ3D benchmark. The code will be released at
https://github.com/Robertwyq/PanoOcc.
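The abstract's core mechanism, voxel queries that gather features from multi-view images, rests on a standard geometric step: projecting each voxel's 3D center into every camera to find where image features could be sampled. The sketch below illustrates only that step, assuming pinhole cameras with known intrinsics `K` and world-to-camera extrinsics `R`, `t`. All names (`Camera`, `voxel_centers`, `project`, `reference_points`) are hypothetical; this is not PanoOcc's released code, and it omits the learned feature aggregation, multi-frame fusion, and coarse-to-fine refinement the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


@dataclass
class Camera:
    """Hypothetical pinhole camera: intrinsics K, world-to-camera rotation R and
    translation t (x_cam = R @ x_world + t), plus image size in pixels."""
    K: Mat3
    R: Mat3
    t: Vec3
    width: int
    height: int


def voxel_centers(origin: Vec3, size: float, dims: Tuple[int, int, int]) -> List[Vec3]:
    """World-space centers of a regular nx x ny x nz grid; one voxel query per cell."""
    ox, oy, oz = origin
    nx, ny, nz = dims
    return [
        (ox + (i + 0.5) * size, oy + (j + 0.5) * size, oz + (k + 0.5) * size)
        for i in range(nx)
        for j in range(ny)
        for k in range(nz)
    ]


def project(cam: Camera, p: Vec3) -> Optional[Tuple[float, float]]:
    """Project world point p to pixel (u, v); None if behind the camera or off-image."""
    # Camera frame: x_c = R p + t
    xc = [sum(cam.R[r][c] * p[c] for c in range(3)) + cam.t[r] for r in range(3)]
    if xc[2] <= 1e-6:  # on or behind the camera plane
        return None
    # Homogeneous pixel coordinates: K x_c, then perspective division by depth
    uvw = [sum(cam.K[r][c] * xc[c] for c in range(3)) for r in range(3)]
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    if 0 <= u < cam.width and 0 <= v < cam.height:
        return (u, v)
    return None


def reference_points(centers: List[Vec3], cams: List[Camera]) -> List[List[Tuple[int, float, float]]]:
    """For each voxel query, list the (camera index, u, v) locations where image
    features could be sampled; an empty list means no camera sees that voxel."""
    hits = []
    for c in centers:
        per_voxel = []
        for ci, cam in enumerate(cams):
            uv = project(cam, c)
            if uv is not None:
                per_voxel.append((ci, uv[0], uv[1]))
        hits.append(per_voxel)
    return hits


if __name__ == "__main__":
    K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
    front = Camera(K, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], (0.0, 0.0, 0.0), 100, 100)
    # Rotated 180 degrees about the y-axis, so this camera faces -z
    back = Camera(K, [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]], (0.0, 0.0, 0.0), 100, 100)
    centers = voxel_centers((-0.5, -0.5, 1.0), 1.0, (1, 1, 2))
    print(reference_points(centers, [front, back]))  # → [[(0, 50.0, 50.0)], [(0, 50.0, 50.0)]]
```

In this toy setup both voxels lie on the front camera's optical axis, so each projects to the principal point, while the rear-facing camera sees neither. In a full model each hit would index into that camera's feature map, and the sampled features would be combined by a learned module rather than used directly.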
Related papers
- PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving [15.441175735210791]
Vision-centric occupancy networks represent the surrounding environment as uniform voxels carrying semantic labels.
Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise semantic prediction.
(arXiv, 2024-06-11T07:51:26Z)
- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint consistency.
(arXiv, 2024-05-30T04:14:58Z)
- Scene as Occupancy [66.43673774733307]
OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
(arXiv, 2023-06-05T13:01:38Z)
- A Simple Baseline for Supervised Surround-view Depth Estimation [25.81521612343612]
We propose S3Depth, a Simple Baseline for Supervised Surround-view Depth Estimation.
We employ a global-to-local feature extraction module that combines a CNN with transformer layers for enriched representations.
Our method outperforms existing state-of-the-art methods on both the DDAD and nuScenes datasets.
(arXiv, 2023-03-14T10:06:19Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
(arXiv, 2022-12-15T14:18:47Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the nuScenes 3D object detection benchmark to demonstrate the effectiveness of SimMOD.
(arXiv, 2022-08-22T03:38:01Z)
- 3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning [11.599633757222406]
Recently, end-to-end approaches based on convolutional neural networks have been studied extensively, with the aim of matching or even exceeding traditional 3D-geometry-based methods.
In this work, we propose a compact network for absolute camera pose regression.
Inspired by these traditional methods, we also introduce a 3D scene geometry-aware constraint that exploits all available information, including motion, depth, and image content.
(arXiv, 2020-05-13T04:15:14Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose a new geometry-based strategy that embeds depth information into a low-resolution voxel representation.
Our proposed geometric embedding outperforms the depth feature learning used in conventional SSC frameworks.
(arXiv, 2020-03-31T09:33:46Z)
- A Robotic 3D Perception System for Operating Room Environment Awareness [3.830091185868436]
We describe a 3D multi-view perception system for the da Vinci surgical system to enable operating room (OR) scene understanding and context awareness.
Based on this architecture, a multi-view 3D scene semantic segmentation algorithm is created.
Our proposed architecture has an acceptable registration error (3.3% ± 1.4% of the object-camera distance) and can robustly improve scene segmentation performance.
(arXiv, 2020-03-20T20:27:06Z)
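The semantic scene completion entry in the list embeds depth information into a low-resolution voxel representation. As a generic illustration of the underlying geometric step only (not that paper's method), the sketch below back-projects a depth map through an assumed pinhole camera and records which voxels are occupied. Assumptions: intrinsics `fx`, `fy`, `cx`, `cy` in pixels, depth in metres, pixel indices used directly as image coordinates, and points kept in the camera frame; the function name `depth_to_voxels` is hypothetical.

```python
import math
from typing import List, Set, Tuple


def depth_to_voxels(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    voxel_size: float,
) -> Set[Tuple[int, int, int]]:
    """Back-project each valid depth pixel into the camera frame and return the
    indices (i, j, k) of the cubic voxels (edge length voxel_size) it falls into."""
    occupied: Set[Tuple[int, int, int]] = set()
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # non-positive depth is treated as "no measurement"
                continue
            # Pinhole back-projection of pixel (u, v) at depth z
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantize to a voxel index; floor keeps negative coordinates consistent
            occupied.add((math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size)))
    return occupied


if __name__ == "__main__":
    depth = [[2.0, 0.0], [2.0, 4.0]]  # toy 2x2 depth map; 0.0 marks a missing value
    print(sorted(depth_to_voxels(depth, 1.0, 1.0, 0.0, 0.0, voxel_size=1.0)))  # → [(0, 0, 2), (0, 2, 2), (4, 4, 4)]
    print(sorted(depth_to_voxels(depth, 1.0, 1.0, 0.0, 0.0, voxel_size=2.0)))  # → [(0, 0, 1), (0, 1, 1), (2, 2, 2)]
```

Increasing `voxel_size` lowers the grid resolution and merges nearby points into fewer cells, which is the memory-versus-detail trade-off any low-resolution voxel representation makes.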
This list is automatically generated from the titles and abstracts of the papers on this site.