GenRC: Generative 3D Room Completion from Sparse Image Collections
- URL: http://arxiv.org/abs/2407.12939v3
- Date: Fri, 2 Aug 2024 03:33:17 GMT
- Title: GenRC: Generative 3D Room Completion from Sparse Image Collections
- Authors: Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun
- Abstract summary: GenRC is an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures.
E-Diffusion generates a view-consistent panoramic RGBD image which ensures global geometry and appearance consistency.
GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on ScanNet and ARKitScenes datasets.
- Score: 17.222652213723485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse RGBD scene completion is a challenging task, especially when consistent textures and geometries are required throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated, training-free pipeline that completes a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we use our proposed E-Diffusion to generate a view-consistent panoramic RGBD image, which ensures global geometry and appearance consistency. Furthermore, we maintain input-output stylistic consistency through textual inversion, replacing human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on the ScanNet and ARKitScenes datasets, even though it is not trained on these datasets and does not use predefined camera trajectories. Project page: https://minfenli.github.io/GenRC
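A concrete way to picture the geometric side of this pipeline is the lifting of a panoramic RGBD image into 3D points that can be fused with the mesh built from the sparse input views. The snippet below is a minimal sketch of that step, assuming an equirectangular panorama convention and metric depth; the function name, resolution, and depth values are illustrative and not taken from the paper.

```python
import numpy as np

def panorama_rgbd_to_points(rgb, depth):
    """Lift an equirectangular RGBD panorama to 3D points.

    rgb   : (H, W, 3) colors in [0, 1]
    depth : (H, W) metric depth along each viewing ray
    Returns (N, 3) points and (N, 3) colors in the panorama's camera frame.
    """
    H, W = depth.shape
    # Pixel centers -> spherical angles (equirectangular convention assumed).
    u = (np.arange(W) + 0.5) / W           # [0, 1) across the panorama
    v = (np.arange(H) + 0.5) / H
    lon = (u - 0.5) * 2.0 * np.pi          # longitude in [-pi, pi)
    lat = (0.5 - v) * np.pi                # latitude in [-pi/2, pi/2]
    lon, lat = np.meshgrid(lon, lat)       # both (H, W)

    # Unit viewing rays, then scale by per-pixel depth.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    points = dirs * depth[..., None]

    valid = depth > 0                      # drop pixels without depth
    return points[valid], rgb[valid]

# Toy usage: a synthetic panorama at a constant 2 m depth.
rgb = np.full((256, 512, 3), 0.5, dtype=np.float32)
depth = np.full((256, 512), 2.0, dtype=np.float32)
pts, cols = panorama_rgbd_to_points(rgb, depth)
print(pts.shape, cols.shape)  # (131072, 3) (131072, 3)
```

In a full pipeline, the returned points would then be transformed by the panorama's pose and meshed before being merged with the partial room mesh built from the sparse input views.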
Related papers
- Bridging 3D Gaussian and Mesh for Freeview Video Rendering [57.21847030980905]
GauMesh bridges 3D Gaussians and meshes for modeling and rendering dynamic scenes.
We show that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene.
arXiv Detail & Related papers (2024-03-18T04:01:26Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models [39.23531919945332]
In this work, we present a new solution that sequentially generates novel RGBD views along a camera trajectory.
Each rendered RGBD view is later back-projected as a partial surface and is supplemented into the intermediate mesh.
The use of intermediate mesh and camera projection helps solve the refractory problem of multi-view inconsistency.
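The back-projection step described here is standard pinhole geometry: each depth pixel is lifted to a camera-space point with the inverse intrinsics and then moved into world space with the camera-to-world pose, after which the resulting partial surface can be merged into the intermediate mesh. Below is a minimal sketch under those assumptions; the intrinsics and pose are illustrative placeholders.

```python
import numpy as np

def backproject_rgbd(depth, K, cam_to_world):
    """Lift a depth map to world-space points (pinhole camera model).

    depth        : (H, W) metric depth along the camera z-axis
    K            : (3, 3) intrinsics
    cam_to_world : (4, 4) camera-to-world transform
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)           # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                            # camera-space rays
    pts_cam = rays * depth[..., None]                          # scale by depth
    pts_cam_h = np.concatenate([pts_cam, np.ones_like(depth)[..., None]], axis=-1)
    pts_world = pts_cam_h @ cam_to_world.T                     # (H, W, 4)
    return pts_world[depth > 0][:, :3]                         # keep valid pixels

# Illustrative intrinsics and an identity pose.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 1.5)
pts = backproject_rgbd(depth, K, np.eye(4))
print(pts.shape)  # (307200, 3)
```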
arXiv Detail & Related papers (2022-12-12T15:50:00Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
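The geometry-completion idea can be illustrated with a small 3D encoder-decoder that fills in unobserved voxels of a feature grid. The paper operates on a sparse grid representation and pairs completion with a separate texture-inpainting network, so the dense toy network below is only an assumption-laden sketch of the completion step, not the method itself.

```python
import torch
import torch.nn as nn

class ToyGridCompletion(nn.Module):
    """Tiny dense 3D encoder-decoder: observed feature grid -> completed grid."""
    def __init__(self, channels: int = 8):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv3d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, channels, 4, stride=2, padding=1),
        )

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (B, C, D, H, W) with zeros in unobserved regions.
        return self.decode(self.encode(grid))

# A 32^3 grid with 8 feature channels; unobserved voxels left at zero.
net = ToyGridCompletion()
observed = torch.zeros(1, 8, 32, 32, 32)
completed = net(observed)
print(completed.shape)  # torch.Size([1, 8, 32, 32, 32])
```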
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames.
We aggregate PointNet features from primitive scene components by means of a graph neural network.
Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
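The aggregation step reads as standard neighborhood message passing: each scene segment carries a feature vector (produced by a PointNet-style encoder in the paper), and node features are updated from their graph neighbors. The sketch below shows one generic mean-aggregation round with random features standing in for PointNet outputs; it is not the paper's architecture.

```python
import torch
import torch.nn as nn

class MeanPassLayer(nn.Module):
    """One round of mean-aggregation message passing over a small graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features, adj: (N, N) 0/1 adjacency (no self-loops).
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = adj @ x / deg
        return torch.relu(self.update(torch.cat([x, neighbor_mean], dim=-1)))

# Four scene segments with 64-d features standing in for PointNet embeddings.
x = torch.randn(4, 64)
adj = torch.tensor([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float32)
layer = MeanPassLayer(64)
print(layer(x, adj).shape)  # torch.Size([4, 64])
```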
arXiv Detail & Related papers (2021-03-27T13:00:36Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
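The bottom-level fusion can be pictured as a similarity map between a sentence embedding and per-pixel visual features, normalized into a heatmap over the image. The sketch below illustrates that generic idea; the feature dimensions, scaling, and softmax normalization are illustrative choices rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def language_heatmap(visual_feats: torch.Tensor, lang_feat: torch.Tensor) -> torch.Tensor:
    """Score each spatial location against a sentence embedding.

    visual_feats : (C, H, W) per-pixel features from an RGBD backbone
    lang_feat    : (C,) sentence embedding
    Returns an (H, W) heatmap that sums to 1.
    """
    C, H, W = visual_feats.shape
    scores = torch.einsum('chw,c->hw', visual_feats, lang_feat) / C ** 0.5
    return F.softmax(scores.flatten(), dim=0).reshape(H, W)

heatmap = language_heatmap(torch.randn(256, 60, 80), torch.randn(256))
print(heatmap.shape, float(heatmap.sum()))  # torch.Size([60, 80]) ~1.0
```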
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- OSTeC: One-Shot Texture Completion [86.23018402732748]
We propose an unsupervised approach for one-shot 3D facial texture completion.
The proposed approach rotates an input image in 3D and fills in the unseen regions by reconstructing the rotated image with a 2D face generator.
We frontalize the target image by projecting the completed texture into the generator.
arXiv Detail & Related papers (2020-12-30T23:53:26Z)
- Procedural 3D Terrain Generation using Generative Adversarial Networks [0.0]
We use Generative Adversarial Networks (GANs) to yield realistic 3D environments based on the distribution of remotely sensed landscape images captured by satellites or drones.
We are able to construct 3D scenery with a plausible height distribution and colorization that reflect the remotely sensed landscapes provided during training.
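The generator side of such a setup can be sketched as a transposed-convolution network mapping a latent vector to a 4-channel output: one height channel plus RGB. The toy DCGAN-style generator below is a generic sketch under that assumption, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class TerrainGenerator(nn.Module):
    """Latent vector -> (height, R, G, B) map, DCGAN-style."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.ReLU(),   # 1 -> 4
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),           # 4 -> 8
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),            # 8 -> 16
            nn.ConvTranspose2d(32, 4, 4, 2, 1), nn.Tanh(),             # 16 -> 32
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z.view(z.size(0), -1, 1, 1))

gen = TerrainGenerator()
sample = gen(torch.randn(2, 128))
height, rgb = sample[:, :1], sample[:, 1:]
print(height.shape, rgb.shape)  # torch.Size([2, 1, 32, 32]) torch.Size([2, 3, 32, 32])
```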
arXiv Detail & Related papers (2020-10-13T14:15:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.