3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior
- URL: http://arxiv.org/abs/2003.14052v1
- Date: Tue, 31 Mar 2020 09:33:46 GMT
- Title: 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior
- Authors: Xiaokang Chen, Kwan-Yee Lin, Chen Qian, Gang Zeng and Hongsheng Li
- Abstract summary: The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose a new geometry-based strategy to embed depth information into a low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning of conventional SSC frameworks.
- Score: 50.73148041205675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation. Since the computational cost generally grows explosively with voxel resolution, most current state-of-the-art methods have to tailor their frameworks to a low-resolution representation, sacrificing detail in their predictions. Voxel resolution thus becomes one of the crucial difficulties behind the performance bottleneck.
In this paper, we devise a new geometry-based strategy to embed depth information into a low-resolution voxel representation that can still encode sufficient geometric information, e.g., room layout and objects' sizes and shapes, to infer the invisible areas of the scene with structure-preserving details. To this end, we first propose a novel 3D sketch-aware feature embedding that explicitly encodes geometric information both effectively and efficiently. With the 3D sketch in hand, we further devise a simple yet effective semantic scene completion framework that incorporates a lightweight 3D Sketch Hallucination module to guide the inference of occupancy and semantic labels via a semi-supervised structure prior learning strategy. We demonstrate that our proposed geometric embedding works better than the depth feature learning of conventional SSC frameworks. Our final model consistently surpasses the state of the art on three public benchmarks while requiring 3D volumes of only 60 x 36 x 60 resolution for both input and output. The code and supplementary material will be available at https://charlesCXK.github.io.
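To make the geometric embedding idea concrete, below is a minimal, hypothetical sketch of one plausible reading of the "3D sketch": the set of occupied voxels lying on geometric/semantic boundaries of a low-resolution labeled volume. Only the 60 x 36 x 60 grid size comes from the paper; the 6-neighborhood boundary test, the function name extract_3d_sketch, and the NumPy implementation are our assumptions, not the authors' released code.

```python
import numpy as np

def extract_3d_sketch(labels: np.ndarray, empty_label: int = 0) -> np.ndarray:
    """Mark occupied voxels whose 6-connected neighborhood contains a different label.

    A hypothetical stand-in for the paper's 3D sketch; the boundary criterion
    is an assumption. Note that np.roll wraps at the volume borders; a real
    implementation would pad the volume instead.
    """
    sketch = np.zeros(labels.shape, dtype=bool)
    occupied = labels != empty_label
    for axis in range(3):
        for shift in (1, -1):
            neighbor = np.roll(labels, shift, axis=axis)
            sketch |= occupied & (neighbor != labels)
    return sketch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy volume at the paper's 60 x 36 x 60 resolution; label 0 = empty space.
    volume = rng.integers(0, 12, size=(60, 36, 60))
    print(extract_3d_sketch(volume).sum(), "sketch voxels")
```

Because the sketch keeps only boundary voxels, it is far sparser than the full semantic volume, which is one plausible reason such an embedding can preserve structure at low resolution.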
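In the same hedged spirit, the "lightweight 3D Sketch Hallucination module" could plausibly be a small 3D convolutional head that predicts per-voxel sketch probabilities from depth-derived voxel features, trained with binary cross-entropy against a sketch mask like the one above. The layer widths, feature channel count, class name SketchHallucination, and training target here are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchHallucination(nn.Module):
    """Hypothetical lightweight head: voxel features -> per-voxel sketch logits."""

    def __init__(self, in_channels: int = 32, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1),
            nn.BatchNorm3d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, 1, kernel_size=1),  # one logit per voxel
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, 60, 36, 60) -> logits: (B, 1, 60, 36, 60)
        return self.net(feats)

if __name__ == "__main__":
    model = SketchHallucination()
    feats = torch.randn(2, 32, 60, 36, 60)             # stand-in depth-derived features
    logits = model(feats)
    target = (torch.rand_like(logits) > 0.9).float()   # stand-in sketch mask
    loss = F.binary_cross_entropy_with_logits(logits, target)
    print(logits.shape, float(loss))
```

The hallucinated sketch would then act as a structure prior guiding occupancy and semantic-label prediction; how the semi-supervised supervision is scheduled is not specified by the abstract, so it is omitted here.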
Related papers
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive octree structure is developed that stores semantics and adaptively represents an object's occupancy according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion [0.4662017507844857]
DepthSSC is a method for semantic scene completion based solely on monocular camera input.
It mitigates spatial misalignment and distortion issues observed in prior methods.
It demonstrates its effectiveness in capturing intricate 3D structural details and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z)
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
arXiv Detail & Related papers (2023-09-28T17:53:38Z)
- Scene as Occupancy [66.43673774733307]
OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
arXiv Detail & Related papers (2023-06-05T13:01:38Z)
- Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving [34.368848580725576]
We develop a label generation pipeline that produces dense, visibility-aware labels for any given scene.
This pipeline comprises three stages: voxel densification, occlusion reasoning, and image-guided voxel refinement.
We propose a new model, dubbed Coarse-to-Fine Occupancy (CTF-Occ) network, which demonstrates superior performance on the Occ3D benchmarks.
arXiv Detail & Related papers (2023-04-27T17:40:08Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images [44.223070672713455]
Man-made environments commonly consist of volumetric primitives such as cuboids or cylinders.
Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects.
We propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids.
arXiv Detail & Related papers (2021-05-05T13:36:00Z)