Related papers: WorldExplorer: Towards Generating Fully Navigable 3D Scenes

URL: http://arxiv.org/abs/2506.01799v1
Date: Mon, 02 Jun 2025 15:41:31 GMT
Title: WorldExplorer: Towards Generating Fully Navigable 3D Scenes
Authors: Manuel-Andreas Schneider, Lukas Höllein, Matthias Nießner
Abstract summary: WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We generate multiple videos along short, pre-defined trajectories that explore the scene in depth. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
Score: 49.21733308718443
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating 3D worlds from text is a highly anticipated goal in computer vision. Existing works are limited by the degree of exploration they allow inside of a scene, i.e., they produce stretched-out and noisy artifacts when moving beyond central or panoramic perspectives. To this end, we propose WorldExplorer, a novel method based on autoregressive video trajectory generation, which builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We initialize our scenes by creating multi-view consistent images corresponding to a 360-degree panorama. Then, we expand it by leveraging video diffusion models in an iterative scene generation pipeline. Concretely, we generate multiple videos along short, pre-defined trajectories that explore the scene in depth, including motion around objects. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results, like moving into objects. Finally, we fuse all generated views into a unified 3D representation via 3D Gaussian Splatting optimization. Compared to prior approaches, WorldExplorer produces high-quality scenes that remain stable under large camera motion, enabling, for the first time, realistic and unrestricted exploration. We believe this marks a significant step toward generating immersive and truly explorable virtual 3D environments.
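
Illustration (not from the paper): the abstract says the scene memory conditions each new video on the "most relevant prior views" but does not say how relevance is measured. The sketch below assumes, purely for illustration, that relevance is camera-pose proximity: Euclidean distance between camera positions plus a weighted angle between viewing directions. The function name select_relevant_views, the memory layout, and the angle_weight parameter are all hypothetical and should not be read as the authors' implementation.

```python
import math

def select_relevant_views(memory, target_pos, target_dir, k=2, angle_weight=1.0):
    """Return the ids of the k stored views closest to the target camera.

    memory: list of (view_id, position, unit_view_direction) tuples.
    ASSUMPTION: relevance = position distance + angle_weight * view angle.
    The paper's actual relevance criterion is not given in the abstract.
    """
    def score(entry):
        _, pos, direction = entry
        dist = math.dist(pos, target_pos)
        # Clamp the dot product so acos stays within its domain.
        cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(direction, target_dir))))
        return dist + angle_weight * math.acos(cos_angle)

    # Lower score means more relevant; keep the k best.
    return [entry[0] for entry in sorted(memory, key=score)[:k]]

# Hypothetical scene memory with three previously generated views.
memory = [
    ("view_a", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ("view_b", (5.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ("view_c", (0.1, 0.0, 0.0), (1.0, 0.0, 0.0)),
]
selected = select_relevant_views(memory, (0.0, 0.0, 0.5), (0.0, 0.0, 1.0), k=2)
print(selected)
```

In this toy setup the nearby, similarly oriented view ranks first, the nearby but sideways-facing view second, and the distant view is dropped. A real system would likely also account for visibility or frustum overlap.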

Related papers

Matrix-3D: Omnidirectional Explorable 3D World Generation [20.568791715708134]
We propose Matrix-3D, a framework that utilizes a panoramic representation for wide-coverage omnidirectional 3D world generation. We first train a trajectory-guided panoramic video diffusion model that employs scene mesh renders as a condition. To lift the panorama scene video to a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction.
arXiv Detail & Related papers (2025-08-11T15:29:57Z)
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation [66.95956271144982]
We present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames.
arXiv Detail & Related papers (2025-06-04T17:59:04Z)
WorldPrompter: Traversable Text-to-Scene Generation [18.405299478122693]
We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene.
arXiv Detail & Related papers (2025-04-02T18:04:32Z)
From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models that have 3D understanding of objects. We introduce 360-1M, a 360° video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z)
SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360. Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation. Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
WonderWorld: Interactive 3D Scene Generation from a Single Image [38.83667648993784]
We present WonderWorld, a novel framework for interactive 3D scene generation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU.
arXiv Detail & Related papers (2024-06-13T17:59:10Z)
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos [21.93514516437402]
We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via novel view synthesis. Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks. We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study.
arXiv Detail & Related papers (2024-05-03T17:55:34Z)
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360° scene generation pipeline. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement. Our method offers a globally consistent 3D scene within a 360° perspective.
arXiv Detail & Related papers (2024-04-10T10:46:59Z)
SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
