Camera-based 3D Semantic Scene Completion with Sparse Guidance Network
- URL: http://arxiv.org/abs/2312.05752v1
- Date: Sun, 10 Dec 2023 04:17:27 GMT
- Title: Camera-based 3D Semantic Scene Completion with Sparse Guidance Network
- Authors: Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Xiangrui Zhao,
Jongwon Ra, Laijian Li, Yong Liu
- Abstract summary: Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations.
We propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene.
- Score: 20.876048262597255
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic scene completion (SSC) aims to predict the semantic occupancy of
each voxel in the entire 3D scene from limited observations, which is an
emerging and critical task for autonomous driving. Recently, many studies have
turned to camera-based SSC solutions due to the richer visual cues and
cost-effectiveness of cameras. However, existing methods usually rely on
sophisticated and heavy 3D models to directly process the lifted 3D features
that are not discriminative enough for clear segmentation boundaries. In this
paper, we adopt the dense-sparse-dense design and propose an end-to-end
camera-based SSC framework, termed SGN, to diffuse semantics from the semantic-
and occupancy-aware seed voxels to the whole scene based on geometry prior and
occupancy information. By designing hybrid guidance (sparse semantic and
geometry guidance) and effective voxel aggregation for spatial occupancy and
geometry priors, we enhance the feature separation between different categories
and expedite the convergence of semantic diffusion. Extensive experimental
results on the SemanticKITTI dataset demonstrate the superiority of our SGN
over existing state-of-the-art methods.
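The dense-sparse-dense idea in the abstract can be illustrated with a toy sketch (hypothetical, not SGN's actual implementation): a few "seed" voxels carry semantic labels, and those labels are then spread to the remaining occupied voxels. Here the diffusion step is approximated by a simple nearest-seed assignment; SGN itself uses learned hybrid guidance rather than a geometric lookup.

```python
import numpy as np

def diffuse_semantics(occupancy, seed_coords, seed_labels):
    """Toy 'semantic diffusion': give every occupied voxel the label of
    its nearest labeled seed voxel (illustrative stand-in only)."""
    occ = np.argwhere(occupancy)                  # (N, 3) occupied voxel coords
    seeds = np.asarray(seed_coords, dtype=float)  # (M, 3) seed voxel coords
    # Squared Euclidean distance from every occupied voxel to every seed.
    d2 = ((occ[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)                   # index of the closest seed
    labels = np.zeros(occupancy.shape, dtype=int) # 0 = empty / unlabeled
    labels[tuple(occ.T)] = np.asarray(seed_labels)[nearest]
    return labels

# A 4x4x4 scene with two occupied blobs and one labeled seed in each.
occ = np.zeros((4, 4, 4), dtype=bool)
occ[0, 0, 0] = occ[0, 0, 1] = True   # blob A
occ[3, 3, 3] = occ[3, 3, 2] = True   # blob B
out = diffuse_semantics(occ, [(0, 0, 0), (3, 3, 3)], [1, 2])
print(out[0, 0, 1], out[3, 3, 2])    # each blob inherits its seed's label
```

The sketch only conveys the sparse-to-dense direction of information flow; the paper's contribution is precisely that the diffusion is driven by learned semantic and geometry guidance rather than distance.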
Related papers
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion [0.4662017507844857]
DepthSSC is an advanced method for semantic scene completion solely based on monocular cameras.
It mitigates spatial misalignment and distortion issues observed in prior methods.
It demonstrates its effectiveness in capturing intricate 3D structural details and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z)
- S4C: Self-Supervised Semantic Scene Completion with Neural Fields [54.35865716337547]
3D semantic scene understanding is a fundamental challenge in computer vision.
Current methods for SSC are generally trained on 3D ground truth based on aggregated LiDAR scans.
Our work presents the first self-supervised approach to SSC called S4C that does not rely on 3D ground truth data.
arXiv Detail & Related papers (2023-10-11T14:19:05Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
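As a rough sketch of the cylindrical tri-perspective view (hypothetical names and binning; not PointOcc's actual code), each point (x, y, z) is converted to cylindrical coordinates (rho, theta, z) and per-point features are max-pooled onto the three coordinate planes:

```python
import numpy as np

def cylindrical_tpv(points, feats, bins=(8, 8, 4)):
    """Scatter per-point features onto three cylindrical planes
    (rho-theta, theta-z, rho-z) with max pooling -- a toy stand-in
    for PointOcc's spatial group pooling."""
    x, y, z = points.T
    rho = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)                       # in [-pi, pi]
    # Discretize each cylindrical coordinate into a fixed number of bins.
    r_i = np.clip((rho / rho.max() * bins[0]).astype(int), 0, bins[0] - 1)
    t_i = ((theta + np.pi) / (2 * np.pi) * bins[1]).astype(int) % bins[1]
    z_i = np.clip(((z - z.min()) / (z.max() - z.min() + 1e-9)
                   * bins[2]).astype(int), 0, bins[2] - 1)
    C = feats.shape[1]
    planes = [np.zeros((bins[0], bins[1], C)),     # rho-theta plane
              np.zeros((bins[1], bins[2], C)),     # theta-z plane
              np.zeros((bins[0], bins[2], C))]     # rho-z plane
    for a, b, p in ((r_i, t_i, 0), (t_i, z_i, 1), (r_i, z_i, 2)):
        for i in range(len(points)):               # max-pool per cell
            planes[p][a[i], b[i]] = np.maximum(planes[p][a[i], b[i]], feats[i])
    return planes

rng = np.random.default_rng(0)
pts = rng.standard_normal((100, 3))    # toy LiDAR-like point cloud
feats = rng.random((100, 16))          # per-point features
rt, tz, rz = cylindrical_tpv(pts, feats)
```

The cylindrical parameterization matches the radial distance distribution of LiDAR returns better than a Cartesian grid, which is the motivation the summary above alludes to.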
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion [17.459062337718677]
We propose to solve outdoor SSC from the perspective of representation separation and BEV fusion.
We present the network, named SSC-RS, which uses separate branches with deep supervision to explicitly disentangle the learning procedure of the semantic and geometric representations.
A BEV fusion network equipped with the proposed Adaptive Representation Fusion (ARF) module is presented to aggregate the multi-scale features effectively and efficiently.
arXiv Detail & Related papers (2023-06-27T10:02:45Z)
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation [45.39981876226129]
We study camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.
We introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate semantic information from multi-frame and multi-view images.
Our approach achieves new state-of-the-art results for camera-based segmentation and panoptic segmentation on the nuScenes dataset.
arXiv Detail & Related papers (2023-06-16T17:59:33Z)
- SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views [89.8436375840446]
SSCBench is a benchmark that integrates scenes from widely used automotive datasets.
We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap.
We have unified semantic labels across diverse datasets to simplify cross-domain generalization testing.
arXiv Detail & Related papers (2023-06-15T09:56:33Z)
- 3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning [11.599633757222406]
Recently, end-to-end approaches based on convolutional neural networks have been widely studied to match or even exceed traditional 3D-geometry-based methods.
In this work, we propose a compact network for absolute camera pose regression.
Inspired by those traditional methods, a 3D scene geometry-aware constraint is also introduced by exploiting all available information, including motion, depth, and image content.
arXiv Detail & Related papers (2020-05-13T04:15:14Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning from habitual SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.