Imagine a City: CityGenAgent for Procedural 3D City Generation
- URL: http://arxiv.org/abs/2602.05362v1
- Date: Thu, 05 Feb 2026 06:36:03 GMT
- Title: Imagine a City: CityGenAgent for Procedural 3D City Generation
- Authors: Zishan Liu, Zecong Tang, RuoCheng Wu, Xinzhe Zheng, Jingyu Hu, Ka-Hei Hui, Haoran Xie, Bo Dai, Zhengzhe Liu
- Abstract summary: We introduce CityGenAgent, a natural language-driven framework for hierarchical procedural generation of high-quality 3D cities. Our approach decomposes city generation into two interpretable components, Block Program and Building Program. Benefiting from the programs and the models' generalization, CityGenAgent supports natural language editing and manipulation.
- Score: 22.929582644377277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automated generation of interactive 3D cities is a critical challenge with broad applications in autonomous driving, virtual reality, and embodied intelligence. While recent advances in generative models and procedural techniques have improved the realism of city generation, existing methods often struggle with high-fidelity asset creation, controllability, and manipulation. In this work, we introduce CityGenAgent, a natural language-driven framework for hierarchical procedural generation of high-quality 3D cities. Our approach decomposes city generation into two interpretable components, Block Program and Building Program. To ensure structural correctness and semantic alignment, we adopt a two-stage learning strategy: (1) Supervised Fine-Tuning (SFT). We train BlockGen and BuildingGen to generate valid programs that adhere to schema constraints, including non-self-intersecting polygons and complete fields; (2) Reinforcement Learning (RL). We design Spatial Alignment Reward to enhance spatial reasoning ability and Visual Consistency Reward to bridge the gap between textual descriptions and the visual modality. Benefiting from the programs and the models' generalization, CityGenAgent supports natural language editing and manipulation. Comprehensive evaluations demonstrate superior semantic alignment, visual quality, and controllability compared to existing methods, establishing a robust foundation for scalable 3D city generation.
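One of the schema constraints named in the abstract — that generated block footprints must be non-self-intersecting polygons — can be checked mechanically. Below is a minimal sketch of such a validity check; the function names and the vertex-list representation are illustrative assumptions, not details taken from the paper.

```python
def _segments_cross(p1, p2, q1, q2):
    """Return True if segments p1-p2 and q1-q2 properly intersect."""
    def orient(a, b, c):
        # Sign of the cross product (b - a) x (c - a).
        v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (v > 0) - (v < 0)
    return (orient(p1, p2, q1) != orient(p1, p2, q2)
            and orient(q1, q2, p1) != orient(q1, q2, p2))

def is_simple_polygon(vertices):
    """True if the closed polygon has no two non-adjacent crossing edges."""
    n = len(vertices)
    if n < 3:
        return False
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Skip adjacent edges: they share a vertex by construction.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return False
    return True
```

A training-time validator along these lines would accept an axis-aligned square such as `[(0, 0), (1, 0), (1, 1), (0, 1)]` and reject the bowtie `[(0, 0), (1, 1), (1, 0), (0, 1)]`, whose two diagonal edges cross.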
Related papers
- MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts [37.22973657277324]
We introduce MajutsuCity, a natural language-driven and adaptive framework for synthesizing 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and operates through a four-stage pipeline. We develop a practical set of evaluation metrics, covering key dimensions such as structural consistency, scene complexity, material fidelity, and lighting atmosphere.
arXiv Detail & Related papers (2025-11-25T15:40:12Z)
- Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion [28.00050174055204]
Yo'City is a novel agentic framework that enables user-customized and infinitely expandable 3D city generation. To simulate continuous city evolution, Yo'City introduces a user-interactive, relationship-guided expansion mechanism.
arXiv Detail & Related papers (2025-11-24T04:02:48Z)
- Sat2RealCity: Geometry-Aware and Appearance-Controllable 3D Urban Generation from Satellite Imagery [12.88788681361607]
Sat2RealCity is a geometry-aware and appearance-controllable framework for 3D urban generation from real-world satellite imagery. We introduce the OSM-based spatial priors strategy to achieve interpretable geometric generation from spatial topology to building instances. We construct an MLLM-powered semantic-guided generation pipeline, bridging semantic interpretation and geometric reconstruction.
arXiv Detail & Related papers (2025-11-14T16:42:03Z)
- WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
arXiv Detail & Related papers (2025-10-24T17:39:52Z)
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding [65.60549881706959]
We introduce UniUGG, the first unified understanding and generation framework for 3D modalities. Our framework employs an LLM to comprehend and decode sentences and 3D representations. We propose a spatial decoder leveraging a latent diffusion model to generate high-quality 3D representations.
arXiv Detail & Related papers (2025-08-16T07:27:31Z)
- Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments. We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z)
- Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets [90.99212668875971]
Step1X-3D is an open framework addressing challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. We present a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module. Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods.
arXiv Detail & Related papers (2025-05-12T16:56:30Z)
- Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians [65.09942210464747]
Building asset creation is labor-intensive and requires specialized skills to develop design rules. Recent generative models for building creation often overlook these patterns, leading to low visual fidelity and limited scalability. By manipulating procedural code, we can streamline this process and generate an infinite variety of buildings.
arXiv Detail & Related papers (2024-12-10T16:45:32Z)
- CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [50.10101235281943]
Current generative methods fall short in either diversity, controllability, or fidelity. In this work, we resort to the procedural content generation (PCG) technique for high-fidelity generation. We develop a multi-agent framework to transform multi-modal instructions, including OSM, semantic maps, and satellite images, into executable programs. Our method, named CityX, demonstrates its superiority in creating diverse, controllable, and realistic 3D urban scenes.
arXiv Detail & Related papers (2024-07-24T18:05:13Z)
- CityCraft: A Real Crafter for 3D City Generation [25.7885801163556]
CityCraft is an innovative framework designed to enhance both the diversity and quality of urban scene generation.
Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts.
Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction.
arXiv Detail & Related papers (2024-06-07T14:49:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.