WorldPrompter: Traversable Text-to-Scene Generation
- URL: http://arxiv.org/abs/2504.02045v1
- Date: Wed, 02 Apr 2025 18:04:32 GMT
- Title: WorldPrompter: Traversable Text-to-Scene Generation
- Authors: Zhaoyang Zhang, Yannick Hold-Geoffroy, Miloš Hašan, Chen Ziwen, Fujun Luan, Julie Dorsey, Yiwei Hu,
- Abstract summary: We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene.
- Score: 18.405299478122693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene-level 3D generation is a challenging research topic, with most existing methods generating only partial scenes and offering limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model achieves convincing view consistency across frames, enabling high-quality panoramic Gaussian splat reconstruction and facilitating traversal over an area of the scene. Qualitative and quantitative results also show that it outperforms state-of-the-art 360° video generators and 3D scene generation models.
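The abstract describes a two-stage pipeline: a text-conditioned 360° panoramic video generator followed by a fast feedforward Gaussian splat reconstructor. Below is a minimal sketch of that flow; all class and function names are hypothetical placeholders, not the authors' released API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PanoFrame:
    """Stand-in for one 360° equirectangular video frame."""
    pixels: bytes = b""

@dataclass
class SplatScene:
    """Stand-in for a set of 3D Gaussian splats a viewer can walk through."""
    gaussians: List[tuple] = field(default_factory=list)

def generate_panoramic_video(prompt: str, num_frames: int = 128) -> List[PanoFrame]:
    # Stage 1: a conditional 360° panoramic video generator produces a
    # 128-frame walkthrough of the scene described by the text prompt.
    return [PanoFrame() for _ in range(num_frames)]  # placeholder frames

def reconstruct_splats(frames: List[PanoFrame]) -> SplatScene:
    # Stage 2: a fast feedforward reconstructor lifts the view-consistent
    # panoramic frames into Gaussian splats, enabling free traversal.
    return SplatScene()  # placeholder reconstruction

def text_to_scene(prompt: str) -> SplatScene:
    return reconstruct_splats(generate_panoramic_video(prompt))

scene = text_to_scene("a sunlit courtyard with a stone fountain")
```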
Related papers
- Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos [64.10180665546237]
360° videos offer a more complete perspective of our surroundings.
Existing video models excel at producing standard videos, but their ability to generate full panoramic videos remains elusive.
We develop a high-quality data filtering pipeline to curate pairwise training data and improve the quality of 360° video generation.
Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective videos.
arXiv Detail & Related papers (2025-04-10T17:51:38Z)
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [16.14713604672497]
ReconX is a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by this condition, the video diffusion model then synthesizes video frames that preserve detail and exhibit a high degree of 3D consistency.
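This three-step recipe (point cloud → contextual encoding → conditioned diffusion) can be sketched as follows; the function names are illustrative assumptions, not ReconX's actual code.

```python
from typing import List, Sequence

def build_global_point_cloud(views: Sequence[bytes]) -> List[tuple]:
    # Aggregate 3D points unprojected from the sparse input views.
    return [(0.0, 0.0, 0.0)]  # placeholder point

def encode_structure_condition(points: List[tuple]) -> List[float]:
    # Encode the point cloud into a contextual embedding that serves as the
    # 3D structure condition for the diffusion model.
    return [0.0] * 8  # placeholder embedding

def diffuse_video(condition: List[float], num_frames: int) -> List[bytes]:
    # The video diffusion model synthesizes frames that stay consistent with
    # the encoded 3D structure while filling in unseen viewpoints.
    return [b""] * num_frames  # placeholder frames

def reconx_sketch(views: Sequence[bytes], num_frames: int = 25) -> List[bytes]:
    condition = encode_structure_condition(build_global_point_cloud(views))
    return diffuse_video(condition, num_frames)
```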
arXiv Detail & Related papers (2024-08-29T17:59:40Z)
- SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360.
Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation.
Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
- LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics. LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
- HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions [31.342899807980654]
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry.
We introduce HoloDreamer, a framework that first generates a high-definition panorama as a holistic initialization of the full 3D scene.
We then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes.
arXiv Detail & Related papers (2024-07-21T14:52:51Z)
- FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting [15.648080938815879]
We propose FastScene, a framework for fast and higher-quality 3D scene generation.
FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods.
arXiv Detail & Related papers (2024-05-09T13:44:16Z)
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360° scene generation pipeline.
Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement.
Our method offers a globally consistent 3D scene within a 360° perspective.
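The prompt self-refinement idea mentioned above can be illustrated with a short loop; generate_panorama, critique, and the stopping rule here are assumptions for illustration, not DreamScene360's actual procedure.

```python
def generate_panorama(prompt: str) -> bytes:
    return b""  # stand-in for one 2D diffusion model call

def critique(panorama: bytes, prompt: str) -> tuple[float, str]:
    # Stand-in for a critic (e.g., a vision-language model) that scores the
    # panorama against the prompt and proposes a revised prompt.
    return 1.0, prompt

def self_refine(prompt: str, rounds: int = 3, threshold: float = 0.9) -> bytes:
    panorama = generate_panorama(prompt)
    for _ in range(rounds):
        score, prompt = critique(panorama, prompt)
        if score >= threshold:
            break  # good enough; stop refining
        panorama = generate_panorama(prompt)
    return panorama
```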
arXiv Detail & Related papers (2024-04-10T10:46:59Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network that synthesizes new content at higher quality by exploiting the natural image prior of the 2D diffusion model together with the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes.
NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images.
Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
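As a rough illustration of how a per-point feature might be drawn from both a voxel grid and a bird's-eye-view (BEV) plane and then fused, here is a minimal sketch; the grid shapes, indexing convention, and function name are assumptions, not NeO 360's implementation.

```python
import numpy as np

def query_hybrid_features(point, voxel_grid, bev_grid, bound=1.0):
    """point: (x, y, z) in [-bound, bound]^3;
    voxel_grid: (C, D, H, W) feature volume; bev_grid: (C, H, W) top-down plane."""
    x, y, z = (np.asarray(point, dtype=float) / bound) * 0.5 + 0.5  # map to [0, 1]
    C, D, H, W = voxel_grid.shape
    # Nearest-neighbor lookups stand in for trilinear/bilinear interpolation.
    vox = voxel_grid[:, int(z * (D - 1)), int(y * (H - 1)), int(x * (W - 1))]
    bev = bev_grid[:, int(y * (bev_grid.shape[1] - 1)), int(x * (bev_grid.shape[2] - 1))]
    # Fused per-point feature that a NeRF-style decoder could consume.
    return np.concatenate([vox, bev])

feat = query_hybrid_features((0.1, -0.2, 0.3),
                             np.zeros((16, 8, 8, 8)),
                             np.zeros((16, 8, 8)))
```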
arXiv Detail & Related papers (2023-08-24T17:59:50Z)