Related papers: PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving

URL: http://arxiv.org/abs/2406.07037v1
Date: Tue, 11 Jun 2024 07:51:26 GMT
Title: PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving
Authors: Yining Shi, Jiusi Li, Kun Jiang, Ke Wang, Yunlong Wang, Mengmeng Yang, Diange Yang
Abstract summary: Vision-centric occupancy networks represent the surrounding environment with uniform voxels carrying semantic labels. Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise semantic prediction.
Score: 15.441175735210791
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-centric occupancy networks, which represent the surrounding environment with uniform voxels carrying semantic labels, have become a new trend for safe driving in camera-only autonomous driving perception systems, because they can detect obstacles regardless of their shape and occlusion. Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise semantic prediction. They often suffer from inconsistent predictions within a single object and mixed predictions across adjacent objects. These confusions may harm the safety of downstream planning modules. To this end, we investigate panoptic segmentation in 3D voxel scenarios and propose an instance-aware occupancy network, PanoSSC. We predict foreground objects and backgrounds separately and merge both in post-processing. For foreground instance grouping, we propose a novel 3D instance mask decoder that can efficiently extract individual objects. We unify geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation in the PanoSSC framework and propose new metrics for evaluating panoptic voxels. Extensive experiments show that our method achieves competitive results on the SemanticKITTI semantic scene completion benchmark.
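The abstract describes predicting foreground instances and background semantics separately, merging them in post-processing, and scoring the result with panoptic voxel metrics. The Python sketch below illustrates one plausible version of both steps, not the authors' code. Assumptions: voxel grids are flattened to lists, label 0 means free space, foreground ("thing") voxels survive only if an instance mask claims them, later masks win where masks overlap, and the score is the standard class-averaged panoptic quality (PQ) applied to voxels rather than the new metrics PanoSSC proposes. `merge_panoptic` and `panoptic_quality` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

EMPTY = 0  # assumed label for free (unoccupied) voxels
Label = Tuple[int, int]  # (class_id, instance_id); instance_id 0 = no instance


def merge_panoptic(
    semantics: List[int],
    instance_masks: List[Tuple[int, List[bool]]],
    thing_classes: Set[int],
) -> List[Label]:
    """Merge background semantics with foreground instance masks, voxel by voxel."""
    # Keep 'stuff' labels from the semantic branch; clear 'thing' labels so that
    # foreground voxels are only kept if some instance mask claims them.
    panoptic = [(EMPTY, 0) if c in thing_classes else (c, 0) for c in semantics]
    for inst_id, (cls, mask) in enumerate(instance_masks, start=1):
        for v, on in enumerate(mask):
            if on:  # later masks overwrite earlier ones where they overlap
                panoptic[v] = (cls, inst_id)
    return panoptic


def panoptic_quality(pred: List[Label], gt: List[Label]) -> float:
    """Class-averaged panoptic quality (PQ) over voxel segments."""

    def segments(labels: List[Label]) -> Dict[Label, Set[int]]:
        segs: Dict[Label, Set[int]] = {}
        for v, key in enumerate(labels):
            if key[0] != EMPTY:  # free space is not a segment
                segs.setdefault(key, set()).add(v)
        return segs

    p_segs, g_segs = segments(pred), segments(gt)
    classes = {k[0] for k in p_segs} | {k[0] for k in g_segs}
    scores = []
    for c in sorted(classes):
        p_c = [s for k, s in p_segs.items() if k[0] == c]
        g_c = [s for k, s in g_segs.items() if k[0] == c]
        iou_sum, tp = 0.0, 0
        for g in g_c:
            for p in p_c:
                iou = len(g & p) / len(g | p)
                if iou > 0.5:  # IoU > 0.5 makes the match unique
                    iou_sum += iou
                    tp += 1
                    break
        fp, fn = len(p_c) - tp, len(g_c) - tp
        scores.append(iou_sum / (tp + 0.5 * fp + 0.5 * fn))
    return sum(scores) / len(scores) if scores else 0.0
```

Because segments within one labeling are disjoint, the IoU > 0.5 rule guarantees that each predicted segment matches at most one ground-truth segment, so false positives and false negatives can be counted as unmatched leftovers.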

Related papers

SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction. Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves an mIoU of 15.45% across 81 indoor categories.
arXiv (2025-01-28T03:41:24Z)
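SliceOcc reports mean intersection-over-union (mIoU) across 81 categories. As a reminder of how such a number is usually computed, here is a minimal Python sketch over flattened voxel labels. The `ignore_index` value of 255 and the choice to average only over classes that appear in either the prediction or the ground truth are assumptions; benchmarks differ on both, and SliceOcc's exact protocol is not stated here.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:  # unlabeled voxels are excluded from scoring
            continue
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where the truth was g
            fn[g] += 1
    # Per-class IoU = TP / (TP + FP + FN), skipping classes never seen.
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(num_classes) if tp[c] + fp[c] + fn[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0
```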
Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction [24.82894136068243]
PanopticRecon++ is an end-to-end method that formulates panoptic reconstruction from a novel cross-attention perspective. This perspective models the relationship between 3D instances (as queries) and the scene's 3D embedding field (as keys) through their attention map. PanopticRecon++ shows competitive 3D and 2D segmentation and reconstruction performance on both simulated and real-world datasets.
arXiv (2025-01-02T07:37:09Z)
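PanopticRecon++ treats 3D instances as queries and the scene's 3D embedding field as keys, reading instance structure off the attention map. A minimal sketch of that idea follows, assuming scaled dot-product scores and, unlike standard attention, a softmax over the instance queries at each 3D point so that every column reads as a per-point instance distribution. The normalization axis, the helper names `attention_instance_masks` and `assign_instances`, and the hard argmax assignment are all assumptions, not the paper's formulation.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention_instance_masks(queries: Matrix, embeddings: Matrix) -> Matrix:
    """K x N soft masks: at each 3D point, a softmax over the K instance queries."""
    scale = 1.0 / math.sqrt(len(queries[0]))  # scaled dot-product scores
    attn = [[0.0] * len(embeddings) for _ in queries]
    for n, e in enumerate(embeddings):
        logits = [scale * sum(qi * ei for qi, ei in zip(q, e)) for q in queries]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        for k, ex in enumerate(exps):
            attn[k][n] = ex / total
    return attn


def assign_instances(attn: Matrix) -> List[int]:
    """Hard instance id per point: the query with the largest attention weight."""
    num_queries, num_points = len(attn), len(attn[0])
    return [max(range(num_queries), key=lambda k: attn[k][n])
            for n in range(num_points)]
```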
Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection [54.78470057491049]
Occupancy has emerged as a promising alternative for 3D scene perception. We introduce object-centric occupancy as a supplement to object bounding boxes. We show that our occupancy features significantly enhance the detection results of state-of-the-art 3D object detectors.
arXiv (2024-12-06T16:12:38Z)
WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction [9.639795825672023]
Off-road environments are rich in geometric information, which makes them well suited to 3D semantic occupancy prediction. We introduce WildOcc, the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction. We also propose a ground-truth generation pipeline that uses coarse-to-fine reconstruction to achieve more realistic results.
arXiv (2024-10-21T09:02:40Z)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision. We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv (2023-12-14T18:58:52Z)
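OccNeRF trains occupancy networks without 3D supervision by rendering the reconstructed field back into the cameras. The sketch below shows the standard NeRF-style quadrature that turns per-sample densities along one ray into rendering weights and an expected depth. It assumes plain densities as the field quantity and a very large final interval; it is not OccNeRF's own occupancy parameterization or sampling strategy.

```python
import math
from typing import List, Tuple


def render_depth(sigmas: List[float], ts: List[float]) -> Tuple[float, List[float]]:
    """Expected depth along one ray from per-sample densities (NeRF quadrature).

    sigmas[i] is the density at distance ts[i] (ts sorted ascending); the last
    interval is treated as effectively unbounded.
    """
    weights: List[float] = []
    transmittance = 1.0  # probability the ray reaches sample i unoccluded
    for i, sigma in enumerate(sigmas):
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else 1e10
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of interval i
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    depth = sum(w * t for w, t in zip(weights, ts))
    return depth, weights
```

The weights sum to at most one; comparing the rendered depth (or rendered semantics, with per-sample class scores in place of `ts`) against 2D observations is what lets such a field be trained from images alone.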
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation [45.39981876226129]
We study camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding. We introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate semantic information from multi-frame and multi-view images. Our approach achieves new state-of-the-art results for camera-based segmentation and panoptic segmentation on the nuScenes dataset.
arXiv (2023-06-16T17:59:33Z)
Scene as Occupancy [66.43673774733307]
OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
arXiv (2023-06-05T13:01:38Z)
Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene. The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss. To the best of our knowledge, RFP is the first unsupervised approach to 3D scene object segmentation for neural radiance fields (NeRF).
arXiv (2022-10-02T11:14:23Z)
Towards Panoptic 3D Parsing for Single Image in the Wild [35.98539308998578]
This paper presents an integrated system that performs holistic image segmentation, object detection, instance segmentation, depth estimation, and object instance 3D reconstruction for indoor and outdoor scenes from a single RGB image. Our proposed panoptic 3D parsing framework points to a promising direction in computer vision. It can support various applications, including autonomous driving, mapping, robotics, design, computer graphics, human-computer interaction, and augmented reality.
arXiv (2021-11-04T17:45:04Z)
Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective. We reconstruct an interactive scene from an RGB-D data stream. The reconstruction replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv (2021-03-30T05:56:58Z)
Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) that infers the covariance of state estimates. We validate the method's robustness on complex quadruped robot dynamics, and the method can be applied to most robotic platforms.
arXiv (2020-07-28T07:34:30Z)
Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [40.34710686994996]
3D object detection has emerged as an important task in autonomous driving scenarios. Previous works process 3D point clouds using either projection-based or voxel-based models. We propose the Stereo RGB and Deeper LIDAR framework, which can utilize semantic and spatial information simultaneously.
arXiv (2020-06-09T11:19:24Z)
3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation. We devise a new geometry-based strategy to embed depth information with a low-resolution voxel representation. Our proposed geometric embedding works better than the depth-feature learning used in conventional SSC frameworks.
arXiv (2020-03-31T09:33:46Z)
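SSC starts from a single-view observation, often a depth map, and reasons about a voxel grid. As a hedged illustration of the most basic way to bring depth into a low-resolution voxel representation, the sketch below back-projects each pixel with a pinhole camera model and marks the voxel it lands in. The intrinsics (`fx`, `fy`, `cx`, `cy`), `voxel_size`, the camera-frame grid `origin`, and the helper name `depth_to_voxels` are assumptions; the paper's geometric embedding is not reproduced here.

```python
import math
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def depth_to_voxels(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    voxel_size: float,
    origin: Tuple[float, float, float] = (0.0, 0.0, 0.0),
) -> Set[Voxel]:
    """Back-project a depth map (camera frame, z forward) into occupied voxel indices."""
    ox, oy, oz = origin
    occupied: Set[Voxel] = set()
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # skip missing or invalid depth
                continue
            # Pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            occupied.add((
                math.floor((x - ox) / voxel_size),
                math.floor((y - oy) / voxel_size),
                math.floor((z - oz) / voxel_size),
            ))
    return occupied
```

A larger `voxel_size` yields the coarser grid the abstract refers to; only visible surfaces are marked, which is exactly the gap that scene completion must fill.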

This list is automatically generated from the titles and abstracts of the papers on this site.