Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion
- URL: http://arxiv.org/abs/2511.18734v2
- Date: Fri, 28 Nov 2025 12:46:09 GMT
- Authors: Keyang Lu, Sifan Zhou, Hongbin Xu, Gang Xu, Zhifei Yang, Yikai Wang, Zhen Xiao, Jieyi Long, Ming Li,
- Abstract summary: Yo'City is a novel agentic framework that enables user-customized and infinitely expandable 3D city generation. To simulate continuous city evolution, Yo'City introduces a user-interactive, relationship-guided expansion mechanism.
- Score: 28.00050174055204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realistic 3D city generation is fundamental to a wide range of applications, including virtual reality and digital twins. However, most existing methods rely on training a single diffusion model, which limits their ability to generate personalized and boundless city-scale scenes. In this paper, we present Yo'City, a novel agentic framework that enables user-customized and infinitely expandable 3D city generation by leveraging the reasoning and compositional capabilities of off-the-shelf large models. Specifically, Yo'City first conceptualizes the city through a top-down planning strategy that defines a hierarchical "City-District-Grid" structure. The Global Planner determines the overall layout and potential functional districts, while the Local Designer further refines each district with detailed grid-level descriptions. Subsequently, the grid-level 3D generation is achieved through a "produce-refine-evaluate" isometric image synthesis loop, followed by image-to-3D generation. To simulate continuous city evolution, Yo'City further introduces a user-interactive, relationship-guided expansion mechanism, which performs scene-graph-based distance- and semantics-aware layout optimization, ensuring spatially coherent city growth. To comprehensively evaluate our method, we construct a diverse benchmark dataset and design six multi-dimensional metrics that assess generation quality from the perspectives of semantics, geometry, texture, and layout. Extensive experiments demonstrate that Yo'City consistently outperforms existing state-of-the-art methods across all evaluation aspects.
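The "produce-refine-evaluate" loop in the abstract can be sketched as a simple self-critic control flow. This is a minimal illustration only: the function names, score threshold, and round limit below are assumptions, not the authors' released code or API.

```python
# Hypothetical sketch of a "produce-refine-evaluate" self-critic loop.
# All callables are illustrative stand-ins for the paper's components.

def generate_grid_asset(prompt, produce, refine, evaluate,
                        max_rounds=3, threshold=0.8):
    """Iterate isometric image synthesis until a critic score passes."""
    image = produce(prompt)                        # initial synthesis
    for _ in range(max_rounds):
        score, feedback = evaluate(image, prompt)  # self-critic step
        if score >= threshold:
            break
        image = refine(image, feedback)            # revise using critique
    return image  # in the paper, this image would feed image-to-3D generation
```

In this reading, the critic both gates acceptance and supplies the textual feedback that drives the next refinement round.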
Related papers
- MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts [37.22973657277324]
We introduce MajutsuCity, a natural language-driven and adaptive framework for synthesizing 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and operates through a four-stage pipeline. We develop a practical set of evaluation metrics, covering key dimensions such as structural consistency, scene complexity, material fidelity, and lighting atmosphere.
arXiv Detail & Related papers (2025-11-25T15:40:12Z)
- WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
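The block-inpainting idea can be illustrated with a toy growth loop: each new block is generated conditioned on its already-generated neighbors. This is a schematic sketch in the spirit of the abstract, not WorldGrow's implementation; `inpaint_block` stands in for a learned model.

```python
# Illustrative sketch of block-wise unbounded scene growth. The callable
# `inpaint_block(pos, context)` is a hypothetical stand-in for a learned
# 3D inpainting model that receives neighboring blocks as context.

def neighbors(pos):
    x, y = pos
    return {"E": (x + 1, y), "W": (x - 1, y),
            "N": (x, y + 1), "S": (x, y - 1)}.items()

def grow_scene(steps, inpaint_block):
    """Expand a scene outward from a seed block at (0, 0)."""
    scene = {(0, 0): inpaint_block((0, 0), {})}      # unconditioned seed
    frontier = [(1, 0), (0, 1), (-1, 0), (0, -1)]    # simplified: one ring
    for pos in frontier[:steps]:
        # Condition each new block on its existing 4-neighborhood.
        context = {d: scene[n] for d, n in neighbors(pos) if n in scene}
        scene[pos] = inpaint_block(pos, context)
    return scene
```

A real system would maintain a growing frontier rather than a fixed ring, but the conditioning pattern is the same.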
arXiv Detail & Related papers (2025-10-24T17:39:52Z)
- Controllable Generation of Large-Scale 3D Urban Layouts with Semantic and Structural Guidance [7.298148118365382]
We present a controllable framework for large-scale 3D vector urban layout generation. By fusing geometric and semantic attributes, edge weights, and embedding building height in the graph, our method extends 2D layouts to realistic 3D structures.
arXiv Detail & Related papers (2025-09-28T11:08:17Z)
- SynCity: Training-Free Generation of 3D Worlds [107.69875149880679]
We propose SynCity, a training- and optimization-free approach to generating 3D worlds from textual descriptions. We show how 3D and 2D generators can be combined to generate ever-expanding scenes.
arXiv Detail & Related papers (2025-03-20T17:59:40Z)
- Compositional Generative Model of Unbounded 4D Cities [56.36624718397362]
We propose a compositional generative model specifically tailored for generating 4D cities. CityDreamer4D supports a range of downstream applications, such as instance editing, city stylization, and urban simulation.
arXiv Detail & Related papers (2025-01-15T17:59:56Z)
- CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [50.10101235281943]
Current generative methods fall short in either diversity, controllability, or fidelity. In this work, we resort to the procedural content generation (PCG) technique for high-fidelity generation. We develop a multi-agent framework to transform multi-modal instructions, including OSM, semantic maps, and satellite images, into executable programs. Our method, named CityX, demonstrates its superiority in creating diverse, controllable, and realistic 3D urban scenes.
arXiv Detail & Related papers (2024-07-24T18:05:13Z)
- COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation [1.5745692520785073]
We introduce a novel graph-based masked autoencoder (GMAE) for city-scale urban layout generation.
The method encodes attributed buildings, city blocks, communities and cities into a unified graph structure.
Our approach achieves good realism, semantic consistency, and correctness across the heterogeneous urban styles in 330 US cities.
arXiv Detail & Related papers (2024-07-16T00:49:53Z)
- CityCraft: A Real Crafter for 3D City Generation [25.7885801163556]
CityCraft is an innovative framework designed to enhance both the diversity and quality of urban scene generation.
Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts.
Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction.
arXiv Detail & Related papers (2024-06-07T14:49:00Z)
- Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior [43.14168074750301]
We introduce a compositional 3D layout representation into text-to-3D paradigm, serving as an additional prior.
It comprises a set of semantic primitives with simple geometric structures and explicit arrangement relationships.
We also present various scene editing demonstrations, showing the powers of steerable urban scene generation.
arXiv Detail & Related papers (2024-04-10T06:41:30Z)
- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
- CityGen: Infinite and Controllable City Layout Generation [23.01347015691264]
CityGen is an end-to-end framework for infinite, diverse, and controllable city layout generation. Our framework introduces an infinite expansion module to extend local layouts to city-scale layouts. We convert the 2D layout to 3D by synthesizing a height field, facilitating downstream applications.
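Lifting a 2D layout to 3D via a height field can be pictured as a per-cell extrusion: each occupied layout cell is filled up to its synthesized height. The sketch below is a minimal illustration under that assumption, not CityGen's actual conversion code.

```python
# Minimal sketch (not CityGen's code) of lifting a 2D layout to 3D:
# each occupied layout cell is extruded into a voxel column whose top
# is given by the synthesized height field.
import numpy as np

def extrude_layout(layout, height_field, voxel_h=1.0):
    """Return a boolean voxel grid; column (i, j) is filled up to its height."""
    assert layout.shape == height_field.shape
    max_h = int(np.ceil(height_field.max() / voxel_h)) or 1
    voxels = np.zeros(layout.shape + (max_h,), dtype=bool)
    levels = np.arange(max_h) * voxel_h  # z-coordinate of each voxel level
    # A voxel is filled where the layout marks a building and the level
    # lies below that column's height (broadcast over the z axis).
    voxels[...] = (layout[..., None] > 0) & (levels < height_field[..., None])
    return voxels
```

A production pipeline would emit meshes or textured assets rather than voxels, but the layout-plus-height-field decomposition is the same.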
arXiv Detail & Related papers (2023-12-03T21:16:37Z)
- Future Urban Scenes Generation Through Vehicles Synthesis [90.1731992199415]
We propose a deep learning pipeline to predict the visual future appearance of an urban scene.
We follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently.
We show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow.
arXiv Detail & Related papers (2020-07-01T08:40:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information (including all content) and is not responsible for any consequences of its use.