Related papers: FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting

URL: http://arxiv.org/abs/2405.05768v1
Date: Thu, 9 May 2024 13:44:16 GMT
Title: FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting
Authors: Yikun Ma, Dandan Zhan, Zhi Jin
Abstract summary: We propose FastScene, a framework for fast and higher-quality 3D scene generation. FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods.
Score: 15.648080938815879
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining the scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To obtain high-quality novel views, we introduce the Coarse View Synthesis (CVS) and Progressive Novel View Inpainting (PNVI) strategies, ensuring both scene consistency and view quality. Subsequently, we utilize Multi-View Projection (MVP) to form perspective views, and apply 3D Gaussian Splatting (3DGS) for scene reconstruction. Comprehensive experiments demonstrate FastScene surpasses other methods in both generation speed and quality with better scene consistency. Notably, guided only by a text prompt, FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods, making it a paradigm for user-friendly scene generation.
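The abstract notes that a panorama with estimated depth carries explicit geometric constraints. As a rough illustration only, and not the authors' implementation, the sketch below shows the standard equirectangular mapping between panorama pixels and 3D rays, which is the kind of geometry a panorama-to-perspective projection step would rely on. The function names and the axis convention (y up, z forward, longitude increasing left to right) are assumptions made for this example.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular panorama to a unit ray.

    Assumed convention (not taken from the paper): longitude spans
    [-pi, pi) from left to right, latitude spans [pi/2, -pi/2] from
    top to bottom, y points up and z points forward.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def backproject(u, v, depth, width, height):
    """Lift a panorama pixel with a radial depth value to a 3D point."""
    dx, dy, dz = equirect_pixel_to_direction(u, v, width, height)
    return (depth * dx, depth * dy, depth * dz)


def direction_to_equirect_pixel(direction, width, height):
    """Inverse mapping: project a 3D direction back to panorama pixel coordinates."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(y / norm)
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)
```

Back-projecting every pixel with its depth yields a point cloud of the room. Rendering a perspective view amounts to casting pinhole rays and looking up their panorama pixels with the inverse mapping.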

Related papers

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment [13.983092770961514]
ScenePainter is a new framework for semantically consistent 3D scene generation. Our framework overcomes the semantic drift issue and generates more consistent and immersive 3D view sequences.
arXiv Detail & Related papers (2025-07-25T08:21:12Z)
Video Perception Models for 3D Scene Synthesis [109.5543506037003]
VIPScene is a novel framework that exploits the encoded commonsense knowledge of the 3D physical world in video generation models. VIPScene seamlessly integrates video generation, feedforward 3D reconstruction, and open-vocabulary perception models to semantically and geometrically analyze each object in a scene.
arXiv Detail & Related papers (2025-06-25T16:40:17Z)
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory [55.73900731190389]
We introduce Surfel-Indexed View Memory (VMem), a mechanism that remembers past views by indexing them geometrically based on the 3D surface elements they have observed. VMem enables the efficient retrieval of the most relevant past views when generating new ones. We evaluate our approach on challenging long-term scene synthesis benchmarks and demonstrate superior performance compared to existing methods in maintaining scene coherence and camera control.
arXiv Detail & Related papers (2025-06-23T17:59:56Z)
WorldPrompter: Traversable Text-to-Scene Generation [18.405299478122693]
We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene.
arXiv Detail & Related papers (2025-04-02T18:04:32Z)
SceneCraft: Layout-Guided 3D Scene Generation [29.713491313796084]
SceneCraft is a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences. Our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality.
arXiv Detail & Related papers (2024-10-11T17:59:58Z)
SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360. Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation. Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics. LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions [31.342899807980654]
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. We introduce HoloDreamer, a framework that first generates high-definition panorama as a holistic initialization of the full 3D scene. We then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes.
arXiv Detail & Related papers (2024-07-21T14:52:51Z)
WonderWorld: Interactive 3D Scene Generation from a Single Image [38.83667648993784]
We present WonderWorld, a novel framework for interactive 3D scene generation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU.
arXiv Detail & Related papers (2024-06-13T17:59:10Z)
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting [75.7154104065613]
We introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process. We also introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry.
arXiv Detail & Related papers (2024-04-30T17:59:40Z)
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior of a 2D diffusion model and the global 3D information of the current scene. Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture [80.0643976406225]
We propose "RoomDreamer", which leverages powerful natural language to synthesize a new room with a different style. Our work addresses the challenge of synthesizing both geometry and texture aligned to the input scene structure and prompt simultaneously. To validate the proposed method, real indoor scenes scanned with smartphones are used for extensive experiments.
arXiv Detail & Related papers (2023-05-18T22:57:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.