Semantic Scene Completion via Integrating Instances and Scene
in-the-Loop
- URL: http://arxiv.org/abs/2104.03640v1
- Date: Thu, 8 Apr 2021 09:50:30 GMT
- Title: Semantic Scene Completion via Integrating Instances and Scene
in-the-Loop
- Authors: Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang,
Hongsheng Li
- Abstract summary: Semantic Scene Completion aims at reconstructing a complete 3D scene with precise voxel-wise semantics from a single-view depth or RGBD image.
We present the Scene-Instance-Scene Network (SISNet), which takes advantage of both instance- and scene-level semantic information.
Our method is capable of inferring fine-grained shape details as well as nearby objects whose semantic categories are easily mixed-up.
- Score: 73.11401855935726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic Scene Completion aims at reconstructing a complete 3D scene with
precise voxel-wise semantics from a single-view depth or RGBD image. It is a
crucial but challenging problem for indoor scene understanding. In this work,
we present a novel framework named Scene-Instance-Scene Network
(\textit{SISNet}), which takes advantage of both instance- and scene-level
semantic information. Our method is capable of inferring fine-grained shape
details as well as nearby objects whose semantic categories are easily
mixed-up. The key insight is that we decouple the instances from a coarsely
completed semantic scene instead of a raw input image to guide the
reconstruction of instances and the overall scene. SISNet conducts iterative
scene-to-instance (SI) and instance-to-scene (IS) semantic completion.
Specifically, the SI step encodes each object's surrounding context to
effectively decouple instances from the scene, and each instance can be
voxelized at a higher resolution to capture finer details. With IS,
fine-grained instance information can be integrated back into the 3D scene,
leading to more accurate semantic scene completion. Through this iterative
mechanism, scene and instance completion benefit each other to achieve higher
completion accuracy. Extensive experiments show that our proposed method
consistently outperforms state-of-the-art methods on the real NYU and NYUCAD
datasets and the synthetic SUNCG-RGBD dataset. The code and the supplementary
material will be available at \url{https://github.com/yjcaimeow/SISNet}.
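
A minimal PyTorch sketch of the scene-instance-scene loop described above. All names here (`TinySSCNet`, `sisnet_loop`, `extract_instances`, `fuse_instances`) are hypothetical placeholders, not the authors' code; the real SI/IS steps perform context-aware instance decoupling and high-resolution voxelization as the abstract describes.

```python
# Hypothetical sketch of SISNet's iterative SI/IS loop (not the authors' code).
import torch
import torch.nn as nn

class TinySSCNet(nn.Module):
    """Stand-in for a voxel-wise semantic scene completion backbone."""
    def __init__(self, in_ch=1, num_classes=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, num_classes, 1),
        )

    def forward(self, x):
        return self.net(x)

def sisnet_loop(vox, scene_net, instance_net, extract_instances,
                fuse_instances, num_iters=2):
    """Scene -> Instance -> Scene refinement.

    extract_instances: SI step -- decouple per-object volumes (with context)
                       from the coarsely completed scene, not the raw input.
    fuse_instances:    IS step -- write refined instances back into the scene.
    """
    scene = scene_net(vox)                               # coarse semantic scene
    for _ in range(num_iters):
        instances = extract_instances(scene)             # SI: crop instances
        refined = [instance_net(i) for i in instances]   # fine shape details
        scene = fuse_instances(scene, refined)           # IS: integrate back
    return scene

# Trivial smoke test with placeholder SI/IS functions.
scene_net = TinySSCNet(in_ch=1, num_classes=12)
instance_net = nn.Conv3d(12, 12, 3, padding=1)           # stand-in refiner
out = sisnet_loop(
    torch.rand(1, 1, 32, 32, 32), scene_net, instance_net,
    extract_instances=lambda s: [s],       # whole scene as one "instance"
    fuse_instances=lambda s, r: r[0],
)
print(out.shape)  # torch.Size([1, 12, 32, 32, 32])
```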
Related papers
- Set-the-Scene: Global-Local Training for Generating Controllable NeRF
Scenes [68.14127205949073]
We propose a novel Global-Local training framework for synthesizing a 3D scene using object proxies.
We show that using proxies allows a wide variety of editing options, such as adjusting the placement of each independent object.
Our results show that Set-the-Scene offers a powerful solution for scene synthesis and manipulation.
arXiv Detail & Related papers (2023-03-23T17:17:29Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for
Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries.
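
As a rough illustration of this zero-shot querying, the sketch below scores per-point 3D features against CLIP text embeddings. The random `point_feats` are a stand-in for the dense per-point features OpenScene actually predicts from 3D geometry, and the query set is arbitrary; it assumes OpenAI's `clip` package.

```python
# Illustrative open-vocabulary point labeling in CLIP space (not OpenScene's code).
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

model, _ = clip.load("ViT-B/32", device="cpu")
queries = ["a chair", "a table", "a sofa", "the floor"]
with torch.no_grad():
    text_feats = model.encode_text(clip.tokenize(queries))   # (4, 512)
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# Stand-in for per-point features co-embedded in CLIP space by the 3D model.
point_feats = torch.randn(10000, 512)
point_feats = point_feats / point_feats.norm(dim=-1, keepdim=True)

# Cosine similarity against every query gives zero-shot per-point labels.
labels = (point_feats @ text_feats.T).argmax(dim=-1)         # (10000,)
print(labels.bincount())
```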
arXiv Detail & Related papers (2022-11-28T18:58:36Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline that operates on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences are finally obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- Mix3D: Out-of-Context Data Augmentation for 3D Scenes [33.939743149673696]
We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes.
In experiments, we show that models trained with Mix3D profit from a significant performance boost on indoor (ScanNet, S3DIS) and outdoor datasets.
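
The core augmentation is simple enough to sketch: take the union of two labeled scenes so objects appear outside their usual context. The centering step below is one plausible way to make the scenes overlap, not necessarily the paper's exact recipe.

```python
# Minimal Mix3D-style augmentation sketch: union of two labeled point clouds.
import numpy as np

def mix3d(points_a, labels_a, points_b, labels_b):
    """points_*: (N, 3) xyz arrays; labels_*: (N,) integer class ids."""
    # Center each scene so the two overlap in space (out-of-context mixing).
    pa = points_a - points_a.mean(axis=0, keepdims=True)
    pb = points_b - points_b.mean(axis=0, keepdims=True)
    points = np.concatenate([pa, pb], axis=0)
    labels = np.concatenate([labels_a, labels_b], axis=0)
    return points, labels

scene_a = np.random.rand(5000, 3) * 10
scene_b = np.random.rand(4000, 3) * 10
pts, lbl = mix3d(scene_a, np.zeros(5000, int), scene_b, np.ones(4000, int))
print(pts.shape, lbl.shape)  # (9000, 3) (9000,)
```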
arXiv Detail & Related papers (2021-10-05T17:57:45Z)
- Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds [9.489733900529204]
We propose an end-to-end semantic segmentation-assisted scene completion network.
The network takes a raw point cloud as input, and merges the features from the segmentation branch into the completion branch hierarchically.
Our method achieves competitive performance on the SemanticKITTI dataset with low latency.
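
A minimal dense-voxel sketch of the hierarchical merging idea: at each level, segmentation-branch features are concatenated into the completion branch. The real network operates on raw LiDAR point clouds; the module names and dense 3D convolutions here are simplifying assumptions.

```python
# Sketch of hierarchical segmentation-to-completion feature fusion (illustrative).
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Merge segmentation-branch features into the completion branch."""
    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Conv3d(2 * ch, ch, 1)

    def forward(self, comp_feat, seg_feat):
        return torch.relu(self.fuse(torch.cat([comp_feat, seg_feat], dim=1)))

class TwoBranchSSC(nn.Module):
    """Completion branch that absorbs segmentation features at every level."""
    def __init__(self, ch=16, num_classes=20, levels=3):
        super().__init__()
        self.stem = nn.Conv3d(1, ch, 3, padding=1)
        self.seg_blocks = nn.ModuleList(
            nn.Conv3d(ch, ch, 3, padding=1) for _ in range(levels))
        self.comp_blocks = nn.ModuleList(
            nn.Conv3d(ch, ch, 3, padding=1) for _ in range(levels))
        self.fusions = nn.ModuleList(FusionBlock(ch) for _ in range(levels))
        self.head = nn.Conv3d(ch, num_classes, 1)

    def forward(self, x):
        seg = comp = torch.relu(self.stem(x))
        for s_blk, c_blk, fuse in zip(self.seg_blocks, self.comp_blocks,
                                      self.fusions):
            seg = torch.relu(s_blk(seg))               # segmentation branch
            comp = fuse(torch.relu(c_blk(comp)), seg)  # hierarchical fusion
        return self.head(comp)

net = TwoBranchSSC()
print(net(torch.rand(1, 1, 16, 16, 16)).shape)  # torch.Size([1, 20, 16, 16, 16])
```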
arXiv Detail & Related papers (2021-09-23T15:55:45Z)
- Semantic Scene Completion using Local Deep Implicit Functions on LiDAR
Data [4.355440821669468]
We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion.
We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization.
Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene.
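
To make the "decoded into a dense 3D description" step concrete, here is a generic sketch of sampling a continuous implicit function on a voxel grid. The tiny MLP is a stand-in for the paper's local deep implicit functions, which are conditioned on features from the LiDAR scan.

```python
# Generic sketch: decode a continuous implicit function into a dense volume.
import torch
import torch.nn as nn

# Stand-in for a (local) deep implicit function f(x, y, z) -> class logits.
implicit_fn = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 20),  # 20 semantic classes, including "empty"
)

# No fixed spatial discretization: pick any resolution at decode time.
axis = torch.linspace(-1.0, 1.0, 64)
grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
with torch.no_grad():
    logits = implicit_fn(grid.reshape(-1, 3)).reshape(64, 64, 64, 20)
labels = logits.argmax(dim=-1)  # dense (64, 64, 64) semantic volume
print(labels.shape)
```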
arXiv Detail & Related papers (2020-11-18T07:39:13Z)
- Semantic Implicit Neural Scene Representations With Semi-Supervised
Training [47.61092265963234]
We show that implicit neural scene representations can be leveraged to perform per-point semantic segmentation.
Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks.
We explore two novel applications for this semantically aware implicit neural scene representation.
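
A sketch of the semi-supervised recipe as described: freeze a pretrained implicit scene representation, then train a small head on features sampled at 3D points, supervised by labels lifted from a handful of 2D masks. The random features and labels below are stand-ins for those quantities.

```python
# Sketch: train a small semantic head on frozen implicit-representation features.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes = 256, 13
# Stand-ins: features queried from a pretrained implicit scene representation,
# with labels lifted from a few tens of 2D segmentation masks.
point_feats = torch.randn(4096, feat_dim)
point_labels = torch.randint(0, num_classes, (4096,))

head = nn.Linear(feat_dim, num_classes)   # a linear probe is often enough
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for step in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(head(point_feats), point_labels)
    loss.backward()
    opt.step()
print(float(loss))
```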
arXiv Detail & Related papers (2020-03-28T00:43:17Z)
- Depth Based Semantic Scene Completion with Position Importance Aware
Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features at multiple stages using fine-grained depth information, which is beneficial for recovering key details such as object boundaries and scene corners.
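
The position-importance idea can be sketched as a per-voxel weight on the completion loss. How the weight map is derived (e.g., emphasizing object boundaries and scene corners) follows the paper, so it is treated as a given input here.

```python
# Sketch of a position-importance-aware voxel loss (weight map assumed given).
import torch
import torch.nn.functional as F

def position_aware_loss(logits, target, weight_map):
    """logits: (B, C, D, H, W); target: (B, D, H, W) class ids;
    weight_map: (B, D, H, W), larger at important positions (edges, corners)."""
    per_voxel = F.cross_entropy(logits, target, reduction="none")  # (B, D, H, W)
    return (weight_map * per_voxel).sum() / weight_map.sum().clamp_min(1e-6)

logits = torch.randn(2, 12, 16, 16, 16, requires_grad=True)
target = torch.randint(0, 12, (2, 16, 16, 16))
weights = torch.rand(2, 16, 16, 16)
print(position_aware_loss(logits, target, weights))
```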
arXiv Detail & Related papers (2020-01-29T07:05:52Z)