Text2Immersion: Generative Immersive Scene with 3D Gaussians
- URL: http://arxiv.org/abs/2312.09242v1
- Date: Thu, 14 Dec 2023 18:58:47 GMT
- Title: Text2Immersion: Generative Immersive Scene with 3D Gaussians
- Authors: Hao Ouyang, Kathryn Heal, Stephen Lombardi, Tiancheng Sun
- Abstract summary: Text2Immersion is an elegant method for producing high-quality 3D immersive scenes from text prompts.
Our system surpasses other methods in rendering quality and diversity, advancing text-driven 3D scene generation.
- Score: 14.014016090679627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our pipeline begins by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. A refinement stage then interpolates and refines the Gaussian cloud to enhance the details of the generated scene. Distinct from prevalent methods that focus on single objects or indoor scenes, or that employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion has wide-ranging implications for applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, advancing text-driven 3D scene generation. We will make the source code publicly accessible at the project page.
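The abstract describes a two-stage pipeline: progressively lift diffusion-generated views into a Gaussian cloud via estimated depth, then refine that cloud. Below is a minimal Python sketch of that flow under stated assumptions: generate_view, estimate_depth, unproject, and the jitter-based refine are hypothetical stand-ins, not the authors' models or code.

```python
# Hedged sketch of the two-stage pipeline from the abstract.
# All helper functions are illustrative stand-ins (assumptions),
# NOT the paper's actual pre-trained models or implementation.
import numpy as np

def generate_view(prompt: str, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for a pre-trained text-to-image diffusion model."""
    return rng.random((64, 64, 3))  # dummy RGB image

def estimate_depth(image: np.ndarray) -> np.ndarray:
    """Stand-in for a monocular depth estimation model."""
    return image.mean(axis=-1) + 1.0  # dummy positive depth map

def unproject(image, depth, pose):
    """Lift pixels to 3D points under a toy pinhole model (focal ~ image size)."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([(xs - w / 2) / w * depth,
                    (ys - h / 2) / h * depth,
                    depth], axis=-1).reshape(-1, 3)
    return pts @ pose[:3, :3].T + pose[:3, 3], image.reshape(-1, 3)

def build_gaussian_cloud(prompt, poses, seed=0):
    """Stage 1: progressively accumulate points (future Gaussian means)."""
    rng = np.random.default_rng(seed)
    means, colors = [], []
    for pose in poses:
        img = generate_view(prompt, rng)   # 2D diffusion (stand-in)
        depth = estimate_depth(img)        # depth model (stand-in)
        pts, cols = unproject(img, depth, pose)
        means.append(pts)
        colors.append(cols)
    return np.concatenate(means), np.concatenate(colors)

def refine(means, colors, jitter=1e-3, seed=0):
    """Stage 2, very loosely: interpolate/densify the cloud.
    Duplicating points with small jitter is only a placeholder here."""
    rng = np.random.default_rng(seed)
    extra = means + rng.normal(scale=jitter, size=means.shape)
    return np.concatenate([means, extra]), np.concatenate([colors, colors])

poses = [np.eye(4)] * 3  # trivial camera trajectory for the sketch
means, colors = build_gaussian_cloud("a misty alpine lake at dawn", poses)
means, colors = refine(means, colors)
print(means.shape, colors.shape)
```

In the actual system the refinement stage would optimize full Gaussian parameters (covariances, opacities, colors) against rendered views; the point duplication above only marks where that step sits in the flow.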
Related papers
- HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions [31.342899807980654]
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry.
We introduce HoloDreamer, a framework that first generates high-definition panorama as a holistic initialization of the full 3D scene.
We then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes (a sketch of this panorama-unprojection initialization appears after this list).
arXiv Detail & Related papers (2024-07-21T14:52:51Z)
- DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions.
Our approach involves a 3D Gaussian Guide for scene representation, consisting of semantic primitives (objects) and their spatial transformations.
Progressive scale control is applied during local object generation, ensuring that objects of different sizes and densities adapt to the scene.
arXiv Detail & Related papers (2024-04-14T12:13:07Z)
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360° scene generation pipeline.
Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement.
Our method offers a globally consistent 3D scene within a 360° perspective.
arXiv Detail & Related papers (2024-04-10T10:46:59Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes [52.31402192831474]
Existing 3D scene generation models, however, limit the target scene to a specific domain.
We propose LucidDreamer, a domain-free scene generation pipeline.
LucidDreamer produces highly detailed Gaussian splats with no constraint on the domain of the target scene.
arXiv Detail & Related papers (2023-11-22T13:27:34Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language [31.691159120136064]
We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online-captured multi-modal visual data.
We present a novel method, dubbed WildRefer, for this task that fully utilizes the rich appearance information in images and the positional and geometric cues in point clouds.
Our datasets are significant for research on 3D visual grounding in the wild and have great potential to advance the development of autonomous driving and service robots.
arXiv Detail & Related papers (2023-04-12T06:48:26Z)
- Text-To-4D Dynamic Scene Generation [111.89517759596345]
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.
Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency.
The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment.
arXiv Detail & Related papers (2023-01-26T18:14:32Z)
- Static and Animated 3D Scene Generation from Free-form Text Descriptions [1.102914654802229]
We study a new pipeline that aims to generate static as well as animated 3D scenes from different types of free-form textual scene descriptions.
In the first stage, we encode the free-form text using an encoder-decoder neural architecture.
In the second stage, we generate a 3D scene based on the generated encoding.
arXiv Detail & Related papers (2020-10-04T11:31:21Z)
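Several entries above (HoloDreamer, DreamScene360) initialize the full scene from a single panorama before 3D-GS reconstruction. For reference, the sketch below shows the standard equirectangular unprojection that lifts a panorama plus per-pixel depth into a colored point cloud that could seed the Gaussians; the arrays, shapes, and function name are illustrative assumptions, not code from either paper.

```python
# Hedged sketch: equirectangular panorama + depth -> colored point cloud,
# the common first step behind panorama-initialized 3D-GS pipelines.
import numpy as np

def panorama_to_points(pano: np.ndarray, depth: np.ndarray):
    """pano: (H, W, 3) RGB panorama; depth: (H, W) per-pixel distance."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    lon = (xs / w) * 2 * np.pi - np.pi   # longitude in [-pi, pi)
    lat = np.pi / 2 - (ys / h) * np.pi   # latitude, +pi/2 (top) to -pi/2
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    pts = dirs * depth[..., None]        # scale unit rays by depth
    return pts.reshape(-1, 3), pano.reshape(-1, 3)

pano = np.random.rand(32, 64, 3)   # dummy panorama
depth = np.ones((32, 64)) * 2.0    # dummy depth map
pts, cols = panorama_to_points(pano, depth)
print(pts.shape, cols.shape)
```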