Related papers: UrbanWorld: An Urban World Model for 3D City Generation

UrbanWorld: An Urban World Model for 3D City Generation

URL: http://arxiv.org/abs/2407.11965v1
Date: Tue, 16 Jul 2024 17:59:29 GMT
Title: UrbanWorld: An Urban World Model for 3D City Generation
Authors: Yu Shang, Jiansheng Chen, Hangyu Fan, Jingtao Ding, Jie Feng, Yong Li,
Abstract summary: UrbanWorld is the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions. The crafted high-fidelity 3D urban environments enable realistic feedback and interactions for general AI and machine perceptual systems in simulations.
Score: 15.095017388300947
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cities, as the most fundamental environment of human life, encompass diverse physical elements such as buildings, roads and vegetation with complex interconnection. Crafting realistic, interactive 3D urban environments plays a crucial role in constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban environments usually entails extensive manual labor from designers, involving intricate detailing and accurate representation of complex urban features. Therefore, how to accomplish this in an automatical way remains a longstanding challenge. Toward this problem, we propose UrbanWorld, the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions. UrbanWorld incorporates four key stages in the automatical crafting pipeline: 3D layout generation from openly accessible OSM data, urban scene planning and designing with a powerful urban multimodal large language model (Urban MLLM), controllable urban asset rendering with advanced 3D diffusion techniques, and finally the MLLM-assisted scene refinement. The crafted high-fidelity 3D urban environments enable realistic feedback and interactions for general AI and machine perceptual systems in simulations. We are working on contributing UrbanWorld as an open-source and versatile platform for evaluating and improving AI abilities in perception, decision-making, and interaction in realistic urban environments.

Related papers

SynCity: Training-Free Generation of 3D Worlds [107.69875149880679]
We propose SynCity, a training- and optimization-free approach to generating 3D worlds from textual descriptions. We show how 3D and 2D generators can be combined to generate ever-expanding scenes.
arXiv Detail & Related papers (2025-03-20T17:59:40Z)
Compositional Generative Model of Unbounded 4D Cities [44.203932215464214]
We propose a compositional generative model specifically tailored for generating 4D cities. CityDreamer4D supports a range of downstream applications, such as instance editing, city stylization, and urban simulation.
arXiv Detail & Related papers (2025-01-15T17:59:56Z)
GenEx: Generating an Explorable World [59.0666303068111]
We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination. GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image. GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
arXiv Detail & Related papers (2024-12-12T18:59:57Z)
AerialGo: Walking-through City View Generation from Aerial Perspectives [48.53976414257845]
AerialGo is a framework that generates realistic walking-through city views from aerial images. By conditioning ground-view synthesis on accessible aerial data, AerialGo bypasses the privacy risks inherent in ground-level imagery. Experiments show that AerialGo significantly enhances ground-level realism and structural coherence.
arXiv Detail & Related papers (2024-11-29T08:14:07Z)
CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [55.737060358043536]
We propose a novel multi-modal controllable procedural content generation method, named CityX. It enhances realistic, unbounded 3D city generation guided by multiple layout conditions, including OSM, semantic maps, and satellite images. Through this effective framework, CityX shows the potential to build an innovative ecosystem for 3D scene generation.
arXiv Detail & Related papers (2024-07-24T18:05:13Z)
3D Question Answering for City Scene Understanding [12.433903847890322]
3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. We introduce a novel 3D MQA dataset named City-3DQA for city-level scene understanding. A new benchmark is reported and our proposed Sg-CityU achieves accuracy of 63.94 % and 63.76 % in different settings of City-3DQA.
arXiv Detail & Related papers (2024-07-24T16:22:27Z)
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility [52.0930915607703]
Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Micromobility enabled by AI for short-distance travel in public urban spaces plays a crucial component in the future transportation system. We present MetaUrban, a compositional simulation platform for the AI-driven urban micromobility research.
arXiv Detail & Related papers (2024-07-11T17:56:49Z)
CityCraft: A Real Crafter for 3D City Generation [25.7885801163556]
CityCraft is an innovative framework designed to enhance both the diversity and quality of urban scene generation. Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts. Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction.
arXiv Detail & Related papers (2024-06-07T14:49:00Z)
Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map. Our model learns the data distribution of scene-level structures within a latent space. After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment [32.53845672285722]
Urban environments, characterized by their complex, multi-layered networks, face significant challenges in the face of rapid urbanization. Recent developments in big data, artificial intelligence, urban computing, and digital twins have laid the groundwork for sophisticated city modeling and simulation. This paper proposes Urban Generative Intelligence (UGI), a novel foundational platform integrating Large Language Models (LLMs) into urban systems.
arXiv Detail & Related papers (2023-12-19T03:12:13Z)
CityDreamer: Compositional Generative Model of Unbounded 3D Cities [44.203932215464214]
CityDreamer is a compositional generative model designed specifically for unbounded 3D cities. We adopt the bird's eye view scene representation and employ a volumetric render for both instance-oriented and stuff-oriented neural fields. CityDreamer achieves state-of-the-art performance not only in generating realistic 3D cities but also in localized editing within the generated cities.
arXiv Detail & Related papers (2023-09-01T17:57:02Z)
CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world. We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from 5 subjects across nine scenes. We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
Future Urban Scenes Generation Through Vehicles Synthesis [90.1731992199415]
We propose a deep learning pipeline to predict the visual future appearance of an urban scene. We follow a two stages approach, where interpretable information is included in the loop and each actor is modelled independently. We show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow.
arXiv Detail & Related papers (2020-07-01T08:40:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.