BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
- URL: http://arxiv.org/abs/2401.17053v4
- Date: Fri, 24 May 2024 03:56:20 GMT
- Title: BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
- Authors: Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji
- Abstract summary: BlockFusion is a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene.
A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements.
Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes.
- Score: 51.030773085422034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios.
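The hybrid neural field mentioned in the abstract (a tri-plane of geometry features followed by an MLP that decodes signed distance values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `bilinear_sample` and `query_sdf`, the summation of the three plane features, the plane resolution, and the two-layer MLP with random weights are all assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at coordinates u, v in [-1, 1]."""
    C, H, W = plane.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Index of the lower-left corner, clipped so the +1 neighbour stays in range.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    # Gather the four neighbouring feature vectors, each of shape (C, N).
    f00 = plane[:, y0, x0]
    f01 = plane[:, y0, x0 + 1]
    f10 = plane[:, y0 + 1, x0]
    f11 = plane[:, y0 + 1, x0 + 1]
    top = f00 * (1 - wx) + f01 * wx
    bottom = f10 * (1 - wx) + f11 * wx
    return (top * (1 - wy) + bottom * wy).T  # shape (N, C)

def query_sdf(planes, points, w1, b1, w2, b2):
    """Decode signed distance values at 3D points from three feature planes.

    Assumption: features from the XY, XZ and YZ planes are summed before the MLP.
    """
    plane_xy, plane_xz, plane_yz = planes
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    feat = (bilinear_sample(plane_xy, x, y)
            + bilinear_sample(plane_xz, x, z)
            + bilinear_sample(plane_yz, y, z))
    hidden = np.maximum(feat @ w1 + b1, 0.0)  # ReLU hidden layer (hypothetical size)
    return (hidden @ w2 + b2)[:, 0]           # one signed distance per point

# Toy usage with random planes and random MLP weights (illustrative values only).
rng = np.random.default_rng(0)
C, R = 4, 8
planes = tuple(rng.normal(size=(C, R, R)) for _ in range(3))
points = rng.uniform(-1.0, 1.0, size=(5, 3))
w1, b1 = rng.normal(size=(C, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
sdf = query_sdf(planes, points, w1, b1, w2, b2)
print(sdf.shape)
```

In a fitted block, the planes and MLP weights would come from per-block optimization rather than random initialization, and the surface would be the zero level set of the decoded values.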
Related papers
- XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation [72.12250272218792]
We propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D.
We integrate 3D global features as implicit conditions into the pre-trained 2D denoising UNet, enabling the generation of segmentation masks.
The generated 2D masks are employed to align mask-level 3D representations with the vision-language feature space, thereby augmenting the open vocabulary capability of 3D geometry embeddings.
arXiv Detail & Related papers (2024-11-20T12:02:12Z)
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- LT3SD: Latent Trees for 3D Scene Diffusion [71.91446143124648]
We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation.
We demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation.
arXiv Detail & Related papers (2024-09-12T16:55:51Z)
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
- Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane [51.69069723429115]
Frankenstein is a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass.
It simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part.
The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement in the room or avatar cloth re-targeting.
arXiv Detail & Related papers (2024-03-24T16:09:21Z)
- Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation [29.818827785812086]
Controllable generation of 3D assets is important for many practical applications like content creation in movies, games and engineering, as well as in AR/VR.
We present a suitable representation for 3D diffusion models to enable disentanglement by introducing a hybrid point cloud and neural radiance field approach.
arXiv Detail & Related papers (2023-12-21T18:46:27Z)
- Free-form 3D Scene Inpainting with Dual-stream GAN [20.186778638697696]
We present a novel task named free-form 3D scene inpainting.
Unlike scenes in previous 3D completion datasets, the proposed inpainting dataset contains large and diverse missing regions.
Our dual-stream generator, fusing both geometry and color information, produces distinct semantic boundaries.
To further enhance the details, our lightweight dual-stream discriminator regularizes the geometry and color edges of the predicted scenes to be realistic and sharp.
arXiv Detail & Related papers (2022-12-16T13:20:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.