MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts
- URL: http://arxiv.org/abs/2511.20415v1
- Date: Tue, 25 Nov 2025 15:40:12 GMT
- Title: MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts
- Authors: Zilong Huang, Jun He, Xiaobin Huang, Ziyi Xiong, Yang Luo, Junyan Ye, Weijia Li, Yiping Chen, Ting Han,
- Abstract summary: We introduce MajutsuCity, a natural language-driven and adaptive framework for synthesizing 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and operates through a four-stage pipeline. We develop a practical set of evaluation metrics, covering key dimensions such as structural consistency, scene complexity, material fidelity, and lighting atmosphere.
- Score: 37.22973657277324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating realistic 3D cities is fundamental to world models, virtual reality, and game development, where an ideal urban scene must offer both stylistic diversity and fine-grained controllability. However, existing methods struggle to balance the creative flexibility offered by text-based generation with the object-level editability enabled by explicit structural representations. We introduce MajutsuCity, a natural language-driven and aesthetically adaptive framework for synthesizing structurally consistent and stylistically diverse 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and operates through a four-stage pipeline. To extend controllability beyond initial generation, we further integrate MajutsuAgent, an interactive language-grounded editing agent that supports five object-level operations. To support photorealistic and customizable scene synthesis, we also construct MajutsuDataset, a high-quality multimodal dataset containing 2D semantic layouts and height maps, diverse 3D building assets, and curated PBR materials and skyboxes, each accompanied by detailed annotations. Meanwhile, we develop a practical set of evaluation metrics covering key dimensions such as structural consistency, scene complexity, material fidelity, and lighting atmosphere. Extensive experiments demonstrate that MajutsuCity reduces layout FID by 83.7% compared with CityDreamer and by 20.1% over CityCraft. Our method ranks first across all AQS and RDR scores, outperforming existing methods by a clear margin. These results confirm MajutsuCity as a new state of the art in geometric fidelity, stylistic adaptability, and semantic controllability for 3D city generation. We hope our framework will inspire new avenues of research in 3D city generation. Our dataset and code will be released at https://github.com/LongHZ140516/MajutsuCity.
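To make the "city as a composition of controllable layouts, assets, and materials" idea concrete, here is a minimal Python sketch of such a four-stage pipeline. Everything below (the `CityScene` class, the stage functions, and their behavior) is an illustrative assumption, not MajutsuCity's released API.

```python
"""Minimal, hypothetical sketch of a four-stage layout/asset/material
pipeline in the spirit of the abstract. All names are assumptions."""
from dataclasses import dataclass, field

@dataclass
class CityScene:
    layout: list              # 2D semantic layout: class label per cell
    heights: list             # height map: floors per cell
    assets: dict = field(default_factory=dict)     # "r,c" cell -> asset id
    materials: dict = field(default_factory=dict)  # asset id -> PBR material
    skybox: str = "clear_dusk"

def generate_layout(prompt, size=4):
    # Stage 1 (stub): derive a semantic layout and height map from text.
    kind = "tower" if "modern" in prompt else "house"
    layout = [[kind] * size for _ in range(size)]
    heights = [[20 if kind == "tower" else 2] * size for _ in range(size)]
    return layout, heights

def select_assets(layout):
    # Stage 2 (stub): retrieve a 3D asset id for every layout cell.
    return {f"{r},{c}": f"{cell}_mesh"
            for r, row in enumerate(layout) for c, cell in enumerate(row)}

def assign_materials(prompt, assets):
    # Stage 3 (stub): pick PBR materials matching the requested style.
    style = "glass" if "modern" in prompt else "brick"
    return {asset_id: style for asset_id in set(assets.values())}

def generate_city(prompt):
    # Stage 4: assemble layout, assets, and materials into one scene object.
    layout, heights = generate_layout(prompt)
    assets = select_assets(layout)
    return CityScene(layout, heights, assets, assign_materials(prompt, assets))

scene = generate_city("a modern business district at dusk")
print(scene.assets["0,0"], scene.materials["tower_mesh"])  # tower_mesh glass
```

Keeping the scene as an explicit composition is what makes object-level edits of the kind MajutsuAgent performs straightforward: an edit only has to touch one entry in `assets` or `materials` rather than regenerate the whole scene.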
Related papers
- SceMoS: Scene-Aware 3D Human Motion Synthesis by Planning with Geometry-Grounded Tokens [89.05195827071582]
SceMoS is a scene-aware motion synthesis framework.
It disentangles global planning from local execution using lightweight 2D cues.
SceMoS achieves state-of-the-art motion realism and contact accuracy on the TRUMANS benchmark.
arXiv Detail & Related papers (2026-02-24T02:09:12Z) - Imagine a City: CityGenAgent for Procedural 3D City Generation [22.929582644377277]
We introduce CityGenAgent, a natural language-driven framework for hierarchical procedural generation of high-quality 3D cities.
Our approach decomposes city generation into two interpretable components: the Block Program and the Building Program.
Benefiting from these programs and the models' generalization, CityGenAgent supports natural language editing and manipulation.
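The Block Program / Building Program split lends itself to a simple hierarchical data model; the sketch below is a hypothetical illustration of that two-level decomposition and a toy natural-language edit, not CityGenAgent's actual program format.

```python
"""Hypothetical two-level program structure; every class and field here
is an illustrative assumption inferred from the abstract."""
from dataclasses import dataclass

@dataclass
class BuildingProgram:          # leaf-level procedural recipe for one building
    footprint: tuple            # (width, depth) in meters
    floors: int
    style: str

@dataclass
class BlockProgram:             # block-level program composing buildings
    name: str
    buildings: list

    def edit(self, instruction: str):
        # Toy natural-language edit: "taller" bumps every building's height.
        if "taller" in instruction:
            for b in self.buildings:
                b.floors += 5

block = BlockProgram("riverside", [BuildingProgram((20, 30), 8, "brutalist")])
block.edit("make the riverside block taller")
print(block.buildings[0].floors)  # 13
```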
arXiv Detail & Related papers (2026-02-05T06:36:03Z) - Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion [28.00050174055204]
Yo'City is a novel agentic framework that enables user-customized and infinitely expandable 3D city generation.
To simulate continuous city evolution, Yo'City introduces a user-interactive, relationship-guided expansion mechanism.
arXiv Detail & Related papers (2025-11-24T04:02:48Z) - WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance.
We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis.
Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
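The block-inpainting idea, generating each new scene block conditioned on already-generated neighbors, can be pictured with a toy growth loop. The `inpaint_block` stand-in below (averaging neighbor latents plus noise) is purely an assumption for illustration, not WorldGrow's trained inpainting model.

```python
"""Toy sketch of growing an unbounded scene block-by-block, conditioning
each new block on its already-generated neighbors."""
import random

def inpaint_block(neighbors):
    # Stand-in for context-aware block inpainting: the new block latent is
    # derived from (here: averaged over) already-generated neighbor latents.
    base = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return base + random.uniform(-0.1, 0.1)

def grow_world(steps=3):
    world = {(0, 0): random.random()}          # seed block latent
    frontier = [(0, 0)]
    for _ in range(steps):                     # expand ring by ring
        nxt = []
        for (x, y) in frontier:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (x + dx, y + dy)
                if cell in world:
                    continue
                ctx = [world[n] for n in
                       ((cell[0] + 1, cell[1]), (cell[0] - 1, cell[1]),
                        (cell[0], cell[1] + 1), (cell[0], cell[1] - 1))
                       if n in world]
                world[cell] = inpaint_block(ctx)
                nxt.append(cell)
        frontier = nxt
    return world

print(len(grow_world()))  # number of blocks generated after 3 expansion rings
```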
arXiv Detail & Related papers (2025-10-24T17:39:52Z) - Compositional Generative Model of Unbounded 4D Cities [56.36624718397362]
We propose a compositional generative model specifically tailored for generating 4D cities.
CityDreamer4D supports a range of downstream applications, such as instance editing, city stylization, and urban simulation.
arXiv Detail & Related papers (2025-01-15T17:59:56Z) - CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [50.10101235281943]
Current generative methods fall short in diversity, controllability, or fidelity.
In this work, we resort to procedural content generation (PCG) for high-fidelity generation.
We develop a multi-agent framework to transform multi-modal instructions, including OSM data, semantic maps, and satellite images, into executable programs.
Our method, named CityX, demonstrates its superiority in creating diverse, controllable, and realistic 3D urban scenes.
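A minimal way to picture the instruction-to-program step is a per-modality dispatch: each input kind is handled by its own agent, which emits a fragment of an executable PCG program. The agent functions and the emitted pseudo-program strings below are assumptions, not CityX's actual interface.

```python
"""Illustrative sketch of routing multi-modal inputs to specialized agents
that emit executable PCG program fragments. All names are hypothetical."""

def osm_agent(osm):        # roads/parcels from OpenStreetMap data
    return f"build_roads({osm['roads']})"

def semantic_agent(sem):   # land-use zoning from a semantic map
    return f"zone_parcels({sem['classes']})"

def satellite_agent(img):  # texture/height hints from satellite imagery
    return f"fit_heights('{img}')"

AGENTS = {"osm": osm_agent, "semantic": semantic_agent,
          "satellite": satellite_agent}

def compile_instructions(inputs):
    # Each modality goes to its own agent; the fragments are concatenated
    # into one executable program for the PCG backend.
    return [AGENTS[kind](payload) for kind, payload in inputs.items()]

program = compile_instructions({
    "osm": {"roads": 42},
    "semantic": {"classes": ["residential", "park"]},
    "satellite": "tile_007.png",
})
print("\n".join(program))
```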
arXiv Detail & Related papers (2024-07-24T18:05:13Z) - CityCraft: A Real Crafter for 3D City Generation [25.7885801163556]
CityCraft is an innovative framework designed to enhance both the diversity and quality of urban scene generation.
Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts.
Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction.
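The retrieval-and-placement stage can be sketched as a lookup from layout labels into a small asset catalog, plus a world-space transform per cell; a Blender import script would then consume these records. The catalog, file names, and selection rule below are hypothetical.

```python
"""Toy sketch of a retrieval-then-placement step: map layout labels to
catalog assets and record where each should go. Names are assumptions."""

CATALOG = {  # label -> available asset files (hypothetical)
    "residential": ["house_a.glb", "house_b.glb"],
    "commercial": ["shop_a.glb"],
    "park": ["tree_cluster.glb"],
}

def place_assets(layout, cell_size=10.0):
    placements = []
    for r, row in enumerate(layout):
        for c, label in enumerate(row):
            if label not in CATALOG:
                continue  # cell with no placeable asset, e.g. a road
            asset = CATALOG[label][(r + c) % len(CATALOG[label])]  # simple pick
            # Position at the cell center; Blender (or any DCC tool) would
            # import `asset` and apply this transform during scene assembly.
            placements.append((asset, (c * cell_size, r * cell_size, 0.0)))
    return placements

layout = [["residential", "road", "commercial"],
          ["park", "road", "residential"]]
for asset, pos in place_assets(layout):
    print(asset, pos)
```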
arXiv Detail & Related papers (2024-06-07T14:49:00Z) - Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior [43.14168074750301]
We introduce a compositional 3D layout representation into text-to-3D paradigm, serving as an additional prior.
It comprises a set of semantic primitives with simple geometric structures and explicit arrangement relationships.
We also present various scene editing demonstrations, showing the power of steerable urban scene generation.
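A compositional layout prior built from "semantic primitives with simple geometric structures and explicit arrangement relationships" might look like the sketch below; the `Primitive` fields and the `beside` relation are illustrative assumptions, not Urban Architect's actual representation.

```python
"""Hypothetical compositional layout prior: box-shaped semantic primitives
plus an explicit arrangement relation between them."""
from dataclasses import dataclass

@dataclass
class Primitive:
    label: str        # e.g. "building", "road"
    center: tuple     # 2D position (x, y)
    size: tuple       # simple box footprint (width, depth)

def beside(a, b, gap=2.0):
    # Explicit arrangement relation: `a` sits next to `b` within a small gap
    # along x, and overlaps `b`'s extent along y.
    dx = abs(a.center[0] - b.center[0]) - (a.size[0] + b.size[0]) / 2
    return 0 <= dx <= gap and abs(a.center[1] - b.center[1]) < b.size[1] / 2

road = Primitive("road", (0.0, 0.0), (4.0, 100.0))
house = Primitive("building", (5.0, 3.0), (6.0, 6.0))
print(beside(house, road))  # True: this arrangement satisfies the prior
```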
arXiv Detail & Related papers (2024-04-10T06:41:30Z) - Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
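A BEV-conditioned latent diffusion sampler can be caricatured in a few lines: start from noise and repeatedly denoise while injecting the BEV condition at every step. The `denoiser` below is a stand-in closed-form update, not UrbanDiffusion's trained network.

```python
"""Toy BEV-conditioned sampling loop; the denoiser is a stand-in that
pulls the latent toward the BEV map as the noise level decreases."""
import numpy as np

def denoiser(z, bev, t):
    # Stand-in for a trained network: nudge the latent toward the BEV
    # condition, more strongly at low noise levels (small t).
    return z + (bev - z) * (1.0 - t)

def sample(bev, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(bev.shape)          # start from pure noise
    for i in reversed(range(steps)):
        t = i / steps
        z = denoiser(z, bev, t)
        if i > 0:
            z += 0.1 * t * rng.standard_normal(bev.shape)  # residual noise
    return z

bev = np.zeros((8, 8)); bev[2:6, 2:6] = 1.0    # toy occupancy: one city block
latent = sample(bev)
print(float(np.abs(latent - bev).mean()))       # ~0: sample follows the BEV
```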
arXiv Detail & Related papers (2024-03-18T11:54:35Z) - CityDreamer: Compositional Generative Model of Unbounded 3D Cities [44.203932215464214]
CityDreamer is a compositional generative model designed specifically for unbounded 3D cities.
We adopt a bird's-eye-view scene representation and employ a volumetric renderer for both instance-oriented and stuff-oriented neural fields.
CityDreamer achieves state-of-the-art performance not only in generating realistic 3D cities but also in localized editing within the generated cities.
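The instance-oriented vs. stuff-oriented split can be illustrated by compositing two field functions over a shared BEV map: cells carrying an instance id render through a per-building field, and everything else falls back to the shared stuff field. Both "fields" below are stand-in functions, not the paper's neural fields.

```python
"""Toy compositing of instance-oriented and stuff-oriented fields over a
bird's-eye-view map. Field outputs are (density, appearance) stand-ins."""

def stuff_field(x, y):           # background: roads, ground, greenery
    return 0.2, "asphalt" if int(x + y) % 2 == 0 else "grass"

def building_field(x, y, inst):  # one field per building instance
    return 1.0, f"facade_{inst}"

def render_cell(x, y, bev_instance):
    # Instance cells render through their own field; all other cells fall
    # back to the shared stuff field before compositing.
    if bev_instance is not None:
        return building_field(x, y, bev_instance)
    return stuff_field(x, y)

bev = {(1, 1): "tower_03"}       # BEV map: instance id per occupied cell
for cell in [(0, 0), (1, 1)]:
    density, appearance = render_cell(*cell, bev.get(cell))
    print(cell, density, appearance)
```

Separating instances from "stuff" is also what enables the localized editing mentioned above: a single building's field can be swapped or restyled without disturbing the rest of the scene.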
arXiv Detail & Related papers (2023-09-01T17:57:02Z)