Point Cloud Scene Completion with Joint Color and Semantic Estimation
from Single RGB-D Image
- URL: http://arxiv.org/abs/2210.05891v1
- Date: Wed, 12 Oct 2022 03:08:24 GMT
- Title: Point Cloud Scene Completion with Joint Color and Semantic Estimation
from Single RGB-D Image
- Authors: Zhaoxuan Zhang, Xiaoguang Han, Bo Dong, Tong Li, Baocai Yin, Xin Yang
- Abstract summary: We present a deep reinforcement learning method of progressive view inpainting for colored semantic point cloud scene completion under volume guidance.
Our approach is end-to-end, consisting of three modules: 3D scene volume reconstruction, 2D RGB-D and segmentation image inpainting, and multi-view selection for completion.
- Score: 45.640943637433416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a deep reinforcement learning method of progressive view inpainting for colored semantic point cloud scene completion under volume guidance, achieving high-quality scene reconstruction from only a single RGB-D image with severe occlusion. Our approach is end-to-end and consists of three modules: 3D scene volume reconstruction, 2D RGB-D and segmentation image inpainting, and multi-view selection for completion. Given a single RGB-D image, our method first predicts its semantic segmentation map and passes it through the 3D volume branch to obtain a volumetric scene reconstruction, which guides the next view inpainting step that fills in the missing information; the third step projects the volume under the same view as the input, concatenates the two to complete the current-view RGB-D and segmentation maps, and integrates all RGB-D and segmentation maps into the point cloud. Since the occluded areas are unobserved, we resort to an A3C network that glances around and progressively picks the next best view for completing large holes, until the scene is adequately reconstructed while guaranteeing validity. All steps are learned jointly to achieve robust and consistent results. We perform qualitative and quantitative evaluations with extensive experiments on the 3D-FUTURE dataset, obtaining better results than the state of the art.
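To make the three-module pipeline above concrete, here is a minimal sketch of the volume-guided progressive loop: reconstruct a coarse volume, let a policy pick the next view, project the volume into that view to guide 2D RGB-D and segmentation inpainting, and back-project every completed view into one colored, labeled point cloud. The module interfaces (volume_net, inpaint_net, view_policy, render_volume), the (H, W, 4) RGB-D layout, and the stopping rule are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the volume-guided progressive completion loop described in
# the abstract. volume_net, inpaint_net, view_policy and render_volume are
# hypothetical injected callables, NOT the authors' actual interfaces; the
# RGB-D layout (H, W, 4) with depth in the last channel is also an assumption.
import numpy as np

def backproject(depth, rgb, seg, K, cam_to_world):
    """Lift one RGB-D view plus its segmentation into a colored, labeled point cloud."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    valid = z > 0
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world, rgb.reshape(-1, 3)[valid], seg.reshape(-1)[valid]

def complete_scene(rgbd, seg, K, volume_net, inpaint_net, view_policy,
                   render_volume, max_views=8):
    """Reconstruct a coarse volume, then repeatedly pick a next view, inpaint its
    RGB-D + segmentation under volume guidance, and merge all views into one cloud."""
    volume = volume_net(rgbd, seg)                      # coarse volumetric guide
    views = [np.eye(4)]                                 # input camera at the origin
    clouds = [backproject(rgbd[..., 3], rgbd[..., :3], seg, K, views[0])]
    for _ in range(max_views):
        pose = view_policy(volume, views)               # A3C agent in the paper; here a callable
        if pose is None:                                # policy decides the scene is complete
            break
        guide = render_volume(volume, pose, K)          # project the volume into the new view
        rgbd_v, seg_v = inpaint_net(guide)              # fill in RGB-D and labels for that view
        clouds.append(backproject(rgbd_v[..., 3], rgbd_v[..., :3], seg_v, K, pose))
        views.append(pose)
    pts, colors, labels = map(np.concatenate, zip(*clouds))
    return pts, colors, labels
```

In the paper the view selection is an A3C reinforcement-learning agent and all modules are trained jointly; the injected callables above only make the control flow explicit and runnable on its own.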
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without labels or training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction [2.7848140839111903]
UNeR3D sets a new standard for generating detailed 3D reconstructions solely from 2D views.
Our model significantly cuts down the training costs tied to supervised approaches.
UNeR3D ensures seamless color transitions, enhancing visual fidelity.
arXiv Detail & Related papers (2023-12-10T15:18:55Z)
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art performance in semantic scene completion on two large-scale benchmark datasets, Matterport3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline operating on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- Panoptic 3D Scene Reconstruction From a Single RGB Image [24.960786016915105]
Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as robotics, motion planning, and augmented reality.
Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction.
We demonstrate that this holistic view of joint scene reconstruction, semantic, and instance segmentation is beneficial over treating the tasks independently, thus outperforming alternative approaches.
arXiv Detail & Related papers (2021-11-03T18:06:38Z)
- Semantic Dense Reconstruction with Consistent Scene Segments [33.0310121044956]
A method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone.
A dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence.
arXiv Detail & Related papers (2021-09-30T03:01:17Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images has been an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD images, where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- RGBD-Net: Predicting color and depth images for novel views synthesis [46.233701784858184]
RGBD-Net is proposed to predict the depth map and the color images at the target pose in a multi-scale manner.
The results indicate that RGBD-Net generalizes well to previously unseen data.
arXiv Detail & Related papers (2020-11-29T16:42:53Z)
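As referenced in the $PC^2$ entry above, here is a rough sketch of what a projection-conditioned point cloud diffusion loop can look like: a DDPM-style reverse process over point coordinates where, at every denoising step, the current noisy points are projected into the input view and the image features gathered at those pixels condition the noise prediction. The denoiser, feature map, intrinsics, and noise schedule below are illustrative placeholders, not the $PC^2$ implementation.

```python
# Generic, illustrative sketch of projection-conditioned point cloud diffusion
# (in the spirit of the $PC^2$ summary above), NOT that paper's code.
# `denoiser` is a stand-in for a learned network eps(x_t, condition, t); the
# feature map, noise schedule and shapes are placeholder assumptions.
import numpy as np

def project(points, K):
    """Pinhole projection of (N, 3) camera-frame points to pixel coordinates."""
    uvw = (K @ points.T).T
    return uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)

def gather_features(feat_map, uv):
    """Nearest-neighbour lookup of per-pixel image features at projected points."""
    h, w, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]

def sample_point_cloud(denoiser, feat_map, K, n_points=2048, steps=100, seed=0):
    """Standard DDPM-style reverse process over point coordinates; the condition
    is re-computed at each step by projecting the current noisy points."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal((n_points, 3))               # start from pure noise
    for t in reversed(range(steps)):
        cond = gather_features(feat_map, project(x, K))  # projection conditioning
        eps = denoiser(x, cond, t)                       # predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x
```

The only piece specific to the projection-conditioned idea is the gather_features(project(...)) call inside the loop; everything else is a textbook DDPM sampler and runs as-is with a dummy denoiser such as lambda x, c, t: np.zeros_like(x).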