Point Cloud Scene Completion with Joint Color and Semantic Estimation
from Single RGB-D Image
- URL: http://arxiv.org/abs/2210.05891v1
- Date: Wed, 12 Oct 2022 03:08:24 GMT
- Title: Point Cloud Scene Completion with Joint Color and Semantic Estimation
from Single RGB-D Image
- Authors: Zhaoxuan Zhang, Xiaoguang Han, Bo Dong, Tong Li, Baocai Yin, Xin Yang
- Abstract summary: We present a deep reinforcement learning method of progressive view inpainting for colored semantic point cloud scene completion under volume guidance.
Our approach is end-to-end, consisting of three modules: 3D scene volume reconstruction, 2D RGB-D and segmentation image inpainting, and multi-view selection for completion.
- Score: 45.640943637433416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a deep reinforcement learning method of progressive view inpainting for colored semantic point cloud scene completion under volume guidance, achieving high-quality scene reconstruction from only a single RGB-D image with severe occlusion. Our approach is end-to-end and consists of three modules: 3D scene volume reconstruction, 2D RGB-D and segmentation image inpainting, and multi-view selection for completion. Given a single RGB-D image, our method first predicts its semantic segmentation map and passes it through the 3D volume branch to obtain a volumetric scene reconstruction, which guides the next view inpainting step that fills in the missing information; the third step projects the volume under the same view as the input, concatenates the two to complete the current-view RGB-D and segmentation maps, and integrates all RGB-D and segmentation maps into the point cloud. Since the occluded areas are unobserved, we resort to an A3C network that glances around and progressively picks the next best view for completing large holes, until the scene is adequately reconstructed while guaranteeing validity. All steps are learned jointly to achieve robust and consistent results. We perform qualitative and quantitative evaluations with extensive experiments on the 3D-FUTURE dataset, obtaining better results than the state of the art.
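To make the three-module pipeline above concrete, here is a minimal sketch of the volume-guided progressive loop: reconstruct a coarse volume, let a policy pick the next view, project the volume into that view to guide 2D RGB-D and segmentation inpainting, and back-project every completed view into one colored, labeled point cloud. The module interfaces (volume_net, inpaint_net, view_policy, render_volume), the (H, W, 4) RGB-D layout, and the stopping rule are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the volume-guided progressive completion loop described in
# the abstract. volume_net, inpaint_net, view_policy and render_volume are
# hypothetical injected callables, NOT the authors' actual interfaces; the
# RGB-D layout (H, W, 4) with depth in the last channel is also an assumption.
import numpy as np

def backproject(depth, rgb, seg, K, cam_to_world):
    """Lift one RGB-D view plus its segmentation into a colored, labeled point cloud."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    valid = z > 0
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world, rgb.reshape(-1, 3)[valid], seg.reshape(-1)[valid]

def complete_scene(rgbd, seg, K, volume_net, inpaint_net, view_policy,
                   render_volume, max_views=8):
    """Reconstruct a coarse volume, then repeatedly pick a next view, inpaint its
    RGB-D + segmentation under volume guidance, and merge all views into one cloud."""
    volume = volume_net(rgbd, seg)                      # coarse volumetric guide
    views = [np.eye(4)]                                 # input camera at the origin
    clouds = [backproject(rgbd[..., 3], rgbd[..., :3], seg, K, views[0])]
    for _ in range(max_views):
        pose = view_policy(volume, views)               # A3C agent in the paper; here a callable
        if pose is None:                                # policy decides the scene is complete
            break
        guide = render_volume(volume, pose, K)          # project the volume into the new view
        rgbd_v, seg_v = inpaint_net(guide)              # fill in RGB-D and labels for that view
        clouds.append(backproject(rgbd_v[..., 3], rgbd_v[..., :3], seg_v, K, pose))
        views.append(pose)
    pts, colors, labels = map(np.concatenate, zip(*clouds))
    return pts, colors, labels
```

In the paper the view selection is an A3C reinforcement-learning agent and all modules are trained jointly; the injected callables above only make the control flow explicit and runnable on its own.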
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without labels or training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction [2.7848140839111903]
UNeR3D sets a new standard for generating detailed 3D reconstructions solely from 2D views.
Our model significantly cuts down the training costs tied to supervised approaches.
UNeR3D ensures seamless color transitions, enhancing visual fidelity.
arXiv Detail & Related papers (2023-12-10T15:18:55Z)
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art performance in semantic scene completion on two large-scale benchmark datasets, Matterport3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline operating on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- Panoptic 3D Scene Reconstruction From a Single RGB Image [24.960786016915105]
Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as robotics, motion planning, and augmented reality.
Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction.
We demonstrate that this holistic view of joint scene reconstruction, semantic, and instance segmentation is beneficial over treating the tasks independently, thus outperforming alternative approaches.
arXiv Detail & Related papers (2021-11-03T18:06:38Z)
- Semantic Dense Reconstruction with Consistent Scene Segments [33.0310121044956]
A method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone.
A dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence.
arXiv Detail & Related papers (2021-09-30T03:01:17Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images has been an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD images, where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- RGBD-Net: Predicting color and depth images for novel views synthesis [46.233701784858184]
RGBD-Net is proposed to predict the depth map and the color images at the target pose in a multi-scale manner.
The results indicate that RGBD-Net generalizes well to previously unseen data.
arXiv Detail & Related papers (2020-11-29T16:42:53Z)
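As referenced in the $PC^2$ entry above, here is a rough sketch of what a projection-conditioned point cloud diffusion loop can look like: a DDPM-style reverse process over point coordinates where, at every denoising step, the current noisy points are projected into the input view and the image features gathered at those pixels condition the noise prediction. The denoiser, feature map, intrinsics, and noise schedule below are illustrative placeholders, not the $PC^2$ implementation.

```python
# Generic, illustrative sketch of projection-conditioned point cloud diffusion
# (in the spirit of the $PC^2$ summary above), NOT that paper's code.
# `denoiser` is a stand-in for a learned network eps(x_t, condition, t); the
# feature map, noise schedule and shapes are placeholder assumptions.
import numpy as np

def project(points, K):
    """Pinhole projection of (N, 3) camera-frame points to pixel coordinates."""
    uvw = (K @ points.T).T
    return uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)

def gather_features(feat_map, uv):
    """Nearest-neighbour lookup of per-pixel image features at projected points."""
    h, w, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]

def sample_point_cloud(denoiser, feat_map, K, n_points=2048, steps=100, seed=0):
    """Standard DDPM-style reverse process over point coordinates; the condition
    is re-computed at each step by projecting the current noisy points."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal((n_points, 3))               # start from pure noise
    for t in reversed(range(steps)):
        cond = gather_features(feat_map, project(x, K))  # projection conditioning
        eps = denoiser(x, cond, t)                       # predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x
```

The only piece specific to the projection-conditioned idea is the gather_features(project(...)) call inside the loop; everything else is a textbook DDPM sampler and runs as-is with a dummy denoiser such as lambda x, c, t: np.zeros_like(x).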