UrbanWorld: An Urban World Model for 3D City Generation
- URL: http://arxiv.org/abs/2407.11965v2
- Date: Tue, 22 Oct 2024 08:31:44 GMT
- Title: UrbanWorld: An Urban World Model for 3D City Generation
- Authors: Yu Shang, Yuming Lin, Yu Zheng, Hangyu Fan, Jingtao Ding, Jie Feng, Jiansheng Chen, Li Tian, Yong Li
- Abstract summary: UrbanWorld is a generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions.
We conduct extensive quantitative analysis on five visual metrics, demonstrating that UrbanWorld achieves SOTA generation realism.
We verify the interactive nature of these environments by showcasing agent perception and navigation within the created environments.
- Score: 21.21375372182025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cities, as the essential environment of human life, encompass diverse physical elements such as buildings, roads and vegetation, which continuously interact with dynamic entities like people and vehicles. Crafting realistic, interactive 3D urban environments is essential for nurturing AGI systems and constructing AI agents capable of perceiving, deciding, and acting like humans in real-world environments. However, creating high-fidelity 3D urban environments usually entails extensive manual labor from designers, involving intricate detailing and representation of complex urban elements, so accomplishing this automatically remains a longstanding challenge. To address this problem, we propose UrbanWorld, the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world under flexible control conditions. UrbanWorld incorporates four key stages in its generation pipeline: flexible 3D layout generation from OSM data or urban layouts with semantic and height maps, urban scene design with an Urban MLLM, controllable urban asset rendering via progressive 3D diffusion, and MLLM-assisted scene refinement. We conduct extensive quantitative analysis on five visual metrics, demonstrating that UrbanWorld achieves state-of-the-art generation realism. Next, we present qualitative results on the controllable generation capabilities of UrbanWorld using both textual and image-based prompts. Lastly, we verify the interactive nature of the created environments by showcasing agent perception and navigation within them. We contribute UrbanWorld as an open-source tool available at https://github.com/Urban-World/UrbanWorld.
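The four-stage pipeline described above can be pictured as a straight composition of functions. The Python sketch below is purely illustrative: every class, function, and field name is hypothetical (the actual interfaces live in the repository linked above), and each stage is a stub rather than a real implementation.

```python
# Illustrative sketch of UrbanWorld's four-stage pipeline. All names are
# hypothetical stand-ins; see https://github.com/Urban-World/UrbanWorld
# for the actual code.
from dataclasses import dataclass, field

@dataclass
class UrbanScene:
    layout: dict                      # untextured 3D layout (buildings, roads, ...)
    design_plan: str                  # scene description from the Urban MLLM
    assets: list = field(default_factory=list)  # rendered/textured assets

def generate_layout(osm_file: str) -> dict:
    """Stage 1: build an untextured 3D layout from OSM data
    (or from semantic and height maps)."""
    return {"source": osm_file, "buildings": [], "roads": []}

def design_scene(layout: dict, prompt: str) -> str:
    """Stage 2: ask the Urban MLLM for a scene design matching the prompt."""
    return f"design for '{prompt}' over {len(layout['buildings'])} buildings"

def render_assets(layout: dict, plan: str) -> list:
    """Stage 3: texture assets via progressive 3D diffusion (stubbed)."""
    return [{"asset": b, "texture": plan} for b in layout["buildings"]]

def refine_scene(scene: UrbanScene) -> UrbanScene:
    """Stage 4: MLLM-assisted refinement pass (identity here)."""
    return scene

def urbanworld_pipeline(osm_file: str, prompt: str) -> UrbanScene:
    layout = generate_layout(osm_file)
    plan = design_scene(layout, prompt)
    assets = render_assets(layout, plan)
    return refine_scene(UrbanScene(layout, plan, assets))

scene = urbanworld_pipeline("downtown.osm", "rainy cyberpunk district")
print(scene.design_plan)
```

The staging matters: the MLLM's design plan is produced before asset rendering, so the diffusion step can be conditioned on it, and the refinement stage sees the fully assembled scene.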
Related papers
- CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [55.737060358043536]
We propose a novel multi-modal controllable procedural content generation method, named CityX.
It enhances realistic, unbounded 3D city generation guided by multiple layout conditions, including OSM, semantic maps, and satellite images.
Through this effective framework, CityX shows the potential to build an innovative ecosystem for 3D scene generation.
arXiv Detail & Related papers (2024-07-24T18:05:13Z)
- 3D Question Answering for City Scene Understanding [12.433903847890322]
3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments.
We introduce a novel 3D MQA dataset named City-3DQA for city-level scene understanding.
A new benchmark is reported, and our proposed Sg-CityU achieves accuracies of 63.94% and 63.76% under the different settings of City-3DQA.
arXiv Detail & Related papers (2024-07-24T16:22:27Z)
- MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility [52.0930915607703]
Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans.
Micromobility enabled by AI for short-distance travel in public urban spaces is a crucial component of the future transportation system.
We present MetaUrban, a compositional simulation platform for AI-driven urban micromobility research.
arXiv Detail & Related papers (2024-07-11T17:56:49Z)
- CityCraft: A Real Crafter for 3D City Generation [25.7885801163556]
CityCraft is an innovative framework designed to enhance both the diversity and quality of urban scene generation.
Our approach comprises three key stages: first, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts.
Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction.
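Read as pseudocode, the three stages chain a layout sampler to an asset library and a placement step. The sketch below is a hypothetical stand-in, assuming a simple dictionary-based asset library; the actual system trains a DiT for stage one and drives Blender for scene construction.

```python
# Hypothetical stand-in for CityCraft's three stages; the real system trains a
# diffusion transformer (DiT) for layouts and drives Blender for placement.
import random

def sample_layout(seed: int) -> list:
    """Stage 1 stand-in: a DiT would denoise a 2D city layout; we fake one."""
    rng = random.Random(seed)
    return [{"class": rng.choice(["house", "tower", "park"]),
             "xy": (rng.uniform(0, 100), rng.uniform(0, 100))}
            for _ in range(5)]

def retrieve_asset(semantic_class: str, library: dict) -> str:
    """Stage 2: look up a 3D asset file matching a layout cell's class."""
    return library.get(semantic_class, "default.glb")

def place_assets(layout: list, library: dict) -> list:
    """Stage 3 stand-in: in practice this step would call Blender's bpy API."""
    return [(retrieve_asset(cell["class"], library), cell["xy"])
            for cell in layout]

library = {"house": "house_01.glb", "tower": "tower_03.glb", "park": "park_02.glb"}
print(place_assets(sample_layout(seed=0), library))
```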
arXiv Detail & Related papers (2024-06-07T14:49:00Z)
- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
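A toy version of such BEV-conditioned sampling can be written as a standard reverse-diffusion loop with the BEV map passed to the denoiser. Everything below is a stand-in, assuming a stub denoiser in place of the learned network and a plain array in place of the scene latent.

```python
# Toy BEV-conditioned reverse-diffusion loop in the spirit of UrbanDiffusion.
# The denoiser is a stub standing in for a learned network eps(z_t, t, BEV).
import numpy as np

def denoiser(z, t, bev):
    """Fake noise prediction; a real model would be a trained 3D network."""
    return 0.1 * z + 0.05 * bev.mean()

def sample_scene_latent(bev, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(bev.shape)            # start from pure noise
    betas = np.linspace(1e-4, 0.02, steps)
    alpha_bars = np.cumprod(1.0 - betas)
    for t in reversed(range(steps)):              # standard DDPM ancestral sampling
        eps = denoiser(z, t, bev)
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.standard_normal(bev.shape)
    return z                                      # a decoder would map this to 3D occupancy

bev_map = np.zeros((32, 32))                      # toy semantic BEV condition
print(sample_scene_latent(bev_map).shape)         # -> (32, 32)
```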
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
- Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment [32.53845672285722]
Urban environments, characterized by their complex, multi-layered networks, face significant challenges amid rapid urbanization.
Recent developments in big data, artificial intelligence, urban computing, and digital twins have laid the groundwork for sophisticated city modeling and simulation.
This paper proposes Urban Generative Intelligence (UGI), a novel foundational platform integrating Large Language Models (LLMs) into urban systems.
arXiv Detail & Related papers (2023-12-19T03:12:13Z)
- CityDreamer: Compositional Generative Model of Unbounded 3D Cities [44.203932215464214]
CityDreamer is a compositional generative model designed specifically for unbounded 3D cities.
We adopt a bird's-eye-view scene representation and employ a volumetric renderer for both instance-oriented and stuff-oriented neural fields.
CityDreamer achieves state-of-the-art performance not only in generating realistic 3D cities but also in localized editing within the generated cities.
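The volumetric rendering step amounts to the usual quadrature of densities and colors along each ray, applied here to both the instance and stuff fields. Below is a minimal NumPy sketch of that per-ray compositing, with toy values standing in for neural-field outputs.

```python
# Minimal per-ray volume-rendering quadrature of the kind a volumetric
# renderer evaluates over neural-field samples (values below are toy).
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite densities and colors sampled along a single ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # transmittance
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)                  # expected ray color

sigmas = np.array([0.2, 1.5, 3.0])                # densities from a neural field
colors = np.array([[0.9, 0.9, 1.0],
                   [0.5, 0.5, 0.5],
                   [0.2, 0.3, 0.2]])              # RGB at each sample
deltas = np.full(3, 0.1)                          # spacing between samples
print(render_ray(sigmas, colors, deltas))
```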
arXiv Detail & Related papers (2023-09-01T17:57:02Z)
- CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world.
We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from five subjects across nine scenes.
We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
- Future Urban Scenes Generation Through Vehicles Synthesis [90.1731992199415]
We propose a deep learning pipeline to predict the visual future appearance of an urban scene.
We follow a two-stage approach in which interpretable information is included in the loop and each actor is modelled independently.
We show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow.
arXiv Detail & Related papers (2020-07-01T08:40:16Z)