Semantic Scene Completion with Cleaner Self
- URL: http://arxiv.org/abs/2303.09977v1
- Date: Fri, 17 Mar 2023 13:50:18 GMT
- Title: Semantic Scene Completion with Cleaner Self
- Authors: Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, and Qianru Sun
- Abstract summary: Semantic Scene Completion (SSC) transforms a single-view depth and/or RGB image into a grid of 3D voxels, each of which is assigned a semantic label.
SSC is a well-known ill-posed problem, as the prediction model has to "imagine" what lies behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF).
We use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model.
As this model is noise-free, it is expected to focus more on the "imagination" of unseen voxels.
- Score: 93.99441599791275
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Semantic Scene Completion (SSC) transforms a single-view depth
and/or RGB image into a grid of 3D voxels, each of which is assigned a
predicted semantic label. SSC is a well-known ill-posed problem, as the
prediction model has to "imagine" what lies behind the visible surface, which
is usually represented by a Truncated Signed Distance Function (TSDF). Due to the sensory imperfection of
the depth camera, most existing methods based on the noisy TSDF estimated from
depth values suffer from 1) incomplete volumetric predictions and 2) confused
semantic labels. To address these issues, we use the ground-truth 3D voxels to generate a
perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model.
As the model is noise-free, it is expected to focus more on the "imagination"
of unseen voxels. Then, we propose to distill the intermediate "cleaner"
knowledge into another model with noisy TSDF input. In particular, we use the
3D occupancy feature and the semantic relations of the "cleaner self" to
supervise the counterparts of the "noisy self" to respectively address the
above two incorrect predictions. Experimental results validate that our method
improves the noisy counterparts by 3.1% IoU and 2.2% mIoU on the scene
completion and SSC metrics, and also achieves new state-of-the-art accuracy on the
popular NYU dataset.
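The two distillation signals described in the abstract (supervising the noisy student's 3D occupancy feature and its semantic relations with the "cleaner self" teacher's counterparts) can be sketched roughly as follows. All shapes, names, and loss forms here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def occupancy_distill_loss(f_clean, f_noisy):
    """MSE between the cleaner teacher's 3D occupancy feature and the
    noisy student's counterpart (both assumed shape C x D x H x W)."""
    return float(np.mean((f_clean - f_noisy) ** 2))

def semantic_relation_loss(z_clean, z_noisy, eps=1e-8):
    """KL divergence between pairwise voxel-affinity distributions
    derived from per-voxel class logits (shape: N voxels x K classes)."""
    def affinity(z):
        a = z @ z.T                                  # N x N similarity
        a = np.exp(a - a.max(axis=1, keepdims=True)) # row-wise softmax
        return a / a.sum(axis=1, keepdims=True)
    p, q = affinity(z_clean), affinity(z_noisy)
    return float(np.sum(p * np.log((p + eps) / (q + eps))) / len(p))

rng = np.random.default_rng(0)
f_t = rng.normal(size=(8, 4, 4, 4))  # teacher (clean TSDF-CAD branch)
f_s = rng.normal(size=(8, 4, 4, 4))  # student (noisy TSDF branch)
z_t = rng.normal(size=(16, 12))      # teacher voxel logits
z_s = rng.normal(size=(16, 12))      # student voxel logits
total = occupancy_distill_loss(f_t, f_s) + semantic_relation_loss(z_t, z_s)
```

In this reading, the occupancy term targets the incomplete-volume problem and the relation term targets the confused-label problem; in practice both losses would be added to the student's ordinary SSC objective.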
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition [32.99080359375706]
ClusteringSDF is a novel approach to achieve both segmentation and reconstruction in 3D via the neural implicit surface representation.
We introduce a highly efficient clustering mechanism for lifting 2D labels to 3D. Experimental results on challenging scenes from the ScanNet and Replica datasets show that ClusteringSDF achieves competitive performance.
arXiv Detail & Related papers (2024-03-21T17:59:16Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images [13.154808583020229]
We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images.
A 2D CNN extracts features from each image independently which are then back-projected and accumulated into a voxel volume.
A 3D CNN refines the accumulated features and predicts the TSDF values.
arXiv Detail & Related papers (2020-03-23T17:59:15Z)
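The Atlas-style lifting step described above (a 2D CNN extracts per-image features, which are back-projected into a voxel volume before a 3D CNN regresses TSDF values) can be sketched roughly as follows. The nearest-neighbour sampling and all names here are simplifying assumptions; the actual method accumulates features over many posed views.

```python
import numpy as np

def backproject(feat2d, K, cam_T_world, voxel_xyz):
    """Look up the 2D feature at the projection of each voxel center
    (nearest-neighbour sampling).
    feat2d:     C x H x W image features
    K:          3 x 3 pinhole intrinsics
    cam_T_world: 4 x 4 world-to-camera extrinsics
    voxel_xyz:  N x 3 voxel centers in world coordinates
    Returns an N x C feature array and a validity mask."""
    C, H, W = feat2d.shape
    pts = cam_T_world[:3, :3] @ voxel_xyz.T + cam_T_world[:3, 3:4]
    z = pts[2]
    uv = K @ pts
    u = np.round(uv[0] / np.maximum(z, 1e-6)).astype(int)
    v = np.round(uv[1] / np.maximum(z, 1e-6)).astype(int)
    # keep voxels in front of the camera that land inside the image
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((voxel_xyz.shape[0], C))
    out[valid] = feat2d[:, v[valid], u[valid]].T
    return out, valid

# toy example: one feature channel, 8x8 image, identity extrinsics
K = np.array([[4.0, 0.0, 4.0], [0.0, 4.0, 4.0], [0.0, 0.0, 1.0]])
feat = np.ones((1, 8, 8))
vox = np.array([[0.0, 0.0, 1.0],   # in front of the camera
                [0.0, 0.0, -1.0]]) # behind the camera
f, valid = backproject(feat, K, np.eye(4), vox)
```

Averaging such per-view volumes over all input frames gives the accumulated feature grid that the 3D CNN would then refine into TSDF values.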
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.