WorldGen: From Text to Traversable and Interactive 3D Worlds
- URL: http://arxiv.org/abs/2511.16825v1
- Date: Thu, 20 Nov 2025 22:13:18 GMT
- Title: WorldGen: From Text to Traversable and Interactive 3D Worlds
- Authors: Dilin Wang, Hyunyoung Jung, Tom Monnier, Kihyuk Sohn, Chuhang Zou, Xiaoyu Xiang, Yu-Ying Yeh, Di Liu, Zixuan Huang, Thu Nguyen-Phuoc, Yuchen Fan, Sergiu Oprea, Ziyan Wang, Roman Shapovalov, Nikolaos Sarafianos, Thibault Groueix, Antoine Toisoul, Prithviraj Dhar, Xiao Chu, Minghao Chen, Geon Yeong Park, Mahima Gupta, Yassir Azziz, Rakesh Ranjan, Andrea Vedaldi
- Abstract summary: We introduce WorldGen, a system that enables the automatic creation of large-scale, interactive 3D worlds directly from text prompts. Our approach transforms natural language descriptions into fully textured environments that can be immediately explored or edited within standard game engines. This work represents a step towards accessible, generative world-building at scale, advancing the frontier of 3D generative AI for applications in gaming, simulation, and immersive social environments.
- Score: 87.95088818329403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce WorldGen, a system that enables the automatic creation of large-scale, interactive 3D worlds directly from text prompts. Our approach transforms natural language descriptions into traversable, fully textured environments that can be immediately explored or edited within standard game engines. By combining LLM-driven scene layout reasoning, procedural generation, diffusion-based 3D generation, and object-aware scene decomposition, WorldGen bridges the gap between creative intent and functional virtual spaces, allowing creators to design coherent, navigable worlds without manual modeling or specialized 3D expertise. The system is fully modular and supports fine-grained control over layout, scale, and style, producing worlds that are geometrically consistent, visually rich, and efficient to render in real time. This work represents a step towards accessible, generative world-building at scale, advancing the frontier of 3D generative AI for applications in gaming, simulation, and immersive social environments.
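The abstract describes a staged pipeline: LLM-driven scene layout reasoning produces a structured layout, which then drives per-object 3D generation and assembly into a scene. The following is a minimal illustrative sketch of that staging, not the authors' code; all function names, the JSON layout schema, and the toy layout itself are hypothetical stand-ins, and the diffusion-based mesh generation step is elided entirely.

```python
import json
from dataclasses import dataclass


@dataclass
class Placement:
    name: str        # object label proposed by the layout stage
    position: tuple  # (x, y, z) in world units
    scale: float


def layout_from_llm(prompt: str) -> list[Placement]:
    """Stand-in for LLM-driven scene layout reasoning: map a text prompt
    to a list of object placements. A real system would call an LLM and
    parse its structured output; here we return a fixed toy layout."""
    toy_layout = json.loads(
        '[{"name": "tree", "position": [4, 0, 2], "scale": 1.5},'
        ' {"name": "cabin", "position": [0, 0, 0], "scale": 1.0}]'
    )
    return [Placement(o["name"], tuple(o["position"]), o["scale"])
            for o in toy_layout]


def build_world(prompt: str) -> dict:
    """Assemble a minimal scene graph from the layout. Per-object
    diffusion-based mesh generation and texturing are elided."""
    scene = {"prompt": prompt, "objects": []}
    for p in layout_from_llm(prompt):
        scene["objects"].append(
            {"name": p.name, "position": p.position, "scale": p.scale}
        )
    return scene


world = build_world("a quiet forest clearing with a log cabin")
```

The point of the separation is the one the abstract emphasizes: layout reasoning and object generation are modular stages, so each can be controlled or replaced independently.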
Related papers
- Beyond Pixel Histories: World Models with Persistent 3D State [50.4601060508243]
PERSIST is a new paradigm of world model that simulates the evolution of a latent 3D scene. We show substantial improvements in spatial memory, 3D consistency, and long-horizon stability over existing methods.
arXiv Detail & Related papers (2026-03-03T19:58:31Z)
- WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
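Component (2) above, block inpainting for scene extension, can be sketched abstractly: new blocks are generated conditioned on already-generated neighbors, so the world grows outward without a fixed boundary. This is a toy sketch of that growth loop only, with hypothetical names; the paper's actual latent representations and inpainting model are not reproduced here, and block content is reduced to a single number for illustration.

```python
def neighbors(block, world):
    """Collect the contents of the 4-connected neighbors already in the world."""
    x, y = block
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [world[b] for b in candidates if b in world]


def generate_block(context):
    """Stand-in for context-aware inpainting: derive a new block's content
    from its generated neighbors (here, just their average)."""
    return sum(context) / len(context) if context else 0.0


def grow(world, frontier):
    """Extend the world block by block, conditioning each new block on
    whatever neighbors exist at the time it is generated."""
    for block in frontier:
        if block not in world:
            world[block] = generate_block(neighbors(block, world))
    return world


world = {(0, 0): 1.0}                  # seed block
world = grow(world, [(1, 0), (0, 1)])  # first ring of extension
world = grow(world, [(1, 1)])          # corner block sees two neighbors
```

The coarse-to-fine strategy of component (3) would run a loop like this at multiple resolutions, with the coarse pass fixing global layout before fine passes fill in detail.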
arXiv Detail & Related papers (2025-10-24T17:39:52Z)
- NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding [46.79724166827757]
We introduce NeoWorld, a framework for generating interactive 3D virtual worlds from a single input image. Inspired by the on-demand worldbuilding concept in the science fiction novel Simulacron-3 (1964), our system constructs expansive environments.
arXiv Detail & Related papers (2025-09-29T08:24:28Z)
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation [35.4193352348583]
We propose a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction. LatticeWorld achieves over a 90× increase in industrial production efficiency.
arXiv Detail & Related papers (2025-09-05T17:22:33Z)
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels [30.986527559921335]
HunyuanWorld 1.0 is a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity.
arXiv Detail & Related papers (2025-07-29T13:43:35Z)
- SynCity: Training-Free Generation of 3D Worlds [107.69875149880679]
We propose SynCity, a training- and optimization-free approach to generating 3D worlds from textual descriptions. We show how 3D and 2D generators can be combined to generate ever-expanding scenes.
arXiv Detail & Related papers (2025-03-20T17:59:40Z)
- GenEx: Generating an Explorable World [59.0666303068111]
We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination. GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image. GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
arXiv Detail & Related papers (2024-12-12T18:59:57Z)
- Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback [20.151147653552155]
Large Language Models (LLMs) have demonstrated impressive reasoning, conversational, and zero-shot generation abilities.
We propose a novel language-guided interactive 3D generation system, dubbed LI3D, that integrates LLMs as a 3D layout interpreter.
Our system also incorporates LLaVA, a large language and vision assistant, to provide visual feedback that improves the quality of the generated content.
arXiv Detail & Related papers (2023-05-25T07:43:39Z)
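The LI3D summary describes a propose-render-critique loop: an LLM interprets the request into a layout, a vision assistant critiques the rendered result, and the critique feeds back into the next layout proposal. The sketch below shows only the control flow of such a loop; it is hypothetical, the function names are invented, and the two stand-in functions trivially simulate what would really be LLM and LLaVA-style calls.

```python
def interpret_layout(request, feedback=None):
    """Stand-in for the LLM layout interpreter: propose a layout for the
    request, optionally revised in light of visual feedback."""
    n = 2 if feedback is None else 3
    return [{"name": f"object_{i}", "position": (i, 0, 0)} for i in range(n)]


def visual_feedback(layout):
    """Stand-in for a LLaVA-style critique of a rendering of the layout.
    Returns None when the critic is satisfied."""
    return "add more objects" if len(layout) < 3 else None


def generate(request, max_rounds=3):
    """Iterate propose -> critique -> revise until the critic is satisfied
    or the round budget is exhausted."""
    layout = interpret_layout(request)
    for _ in range(max_rounds):
        feedback = visual_feedback(layout)
        if feedback is None:
            break
        layout = interpret_layout(request, feedback)
    return layout
```

The design choice shared by LI3D and WorldGen-style systems is that the LLM never touches geometry directly; it only emits and revises a symbolic layout, which keeps the feedback loop cheap relative to regenerating 3D content.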
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.