Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior
- URL: http://arxiv.org/abs/2404.06780v1
- Date: Wed, 10 Apr 2024 06:41:30 GMT
- Title: Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior
- Authors: Fan Lu, Kwan-Yee Lin, Yan Xu, Hongsheng Li, Guang Chen, Changjun Jiang
- Abstract summary: We introduce a compositional 3D layout representation into the text-to-3D paradigm, serving as an additional prior.
It comprises a set of semantic primitives with simple geometric structures and explicit arrangement relationships.
We also present various scene editing demonstrations, showing the power of steerable urban scene generation.
- Score: 43.14168074750301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D generation has achieved remarkable success via large-scale text-to-image diffusion models. Nevertheless, there is no paradigm for scaling the methodology up to urban scale. Urban scenes, characterized by numerous elements, intricate arrangement relationships, and vast scale, present a formidable barrier to interpreting ambiguous textual descriptions for effective model optimization. In this work, we overcome these limitations by introducing a compositional 3D layout representation into the text-to-3D paradigm, serving as an additional prior. It comprises a set of semantic primitives with simple geometric structures and explicit arrangement relationships, complementing textual descriptions and enabling steerable generation. Building on this, we propose two modifications: (1) We introduce Layout-Guided Variational Score Distillation to address inadequacies in model optimization. It conditions the score distillation sampling process on the geometric and semantic constraints of the 3D layout. (2) To handle the unbounded nature of urban scenes, we represent the 3D scene with a Scalable Hash Grid structure that incrementally adapts to the growing scale of urban scenes. Extensive experiments substantiate the capability of our framework to scale text-to-3D generation to large-scale urban scenes covering over 1000 m of driving distance for the first time. We also present various scene editing demonstrations, showing the power of steerable urban scene generation. Website: https://urbanarchitect.github.io.
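To make the layout prior concrete, the sketch below shows one plausible way to represent such a compositional 3D layout in code: a list of semantic primitives, each pairing a class label with a simple geometric proxy (an oriented box here) placed explicitly in a shared world frame. The field names, class labels, and box parameterization are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class SemanticPrimitive:
    """One layout element: a semantic class attached to a simple geometric proxy.

    The oriented-box parameterization is an assumption for illustration; the
    paper only states that primitives have simple geometric structures and
    explicit arrangement relationships.
    """
    semantic_class: str        # e.g. "road", "building", "tree", "car"
    center: np.ndarray         # (3,) position in the shared world frame, meters
    size: np.ndarray           # (3,) box extents, meters
    yaw: float = 0.0           # heading about the up axis, radians

@dataclass
class SceneLayout:
    """A compositional 3D layout: a collection of semantic primitives."""
    primitives: List[SemanticPrimitive] = field(default_factory=list)

    def add(self, primitive: SemanticPrimitive) -> None:
        self.primitives.append(primitive)

    def of_class(self, semantic_class: str) -> List[SemanticPrimitive]:
        """Query primitives by class, e.g. to edit every building at once."""
        return [p for p in self.primitives if p.semantic_class == semantic_class]

# Usage: a toy street segment with one road and two flanking buildings.
layout = SceneLayout()
layout.add(SemanticPrimitive("road", np.array([0.0, 0.0, 0.1]), np.array([100.0, 8.0, 0.2])))
layout.add(SemanticPrimitive("building", np.array([10.0, 12.0, 10.0]), np.array([30.0, 15.0, 20.0])))
layout.add(SemanticPrimitive("building", np.array([40.0, -12.0, 8.0]), np.array([25.0, 12.0, 16.0])))
```

Under a representation like this, the geometric and semantic constraints mentioned in the abstract could be obtained by rendering the primitives into per-view condition maps for the score distillation step, and scene editing reduces to inserting, removing, or moving primitives; this is a hedged reading of the abstract, not a description of the released implementation.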
Related papers
- CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [55.737060358043536]
We propose a novel multi-modal controllable procedural content generation method, named CityX.
It enhances realistic, unbounded 3D city generation guided by multiple layout conditions, including OSM, semantic maps, and satellite images.
Through this effective framework, CityX shows the potential to build an innovative ecosystem for 3D scene generation.
arXiv Detail & Related papers (2024-07-24T18:05:13Z)
- COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation [1.5745692520785073]
We introduce a novel graph-based masked autoencoder (GMAE) for city-scale urban layout generation.
The method encodes attributed buildings, city blocks, communities and cities into a unified graph structure.
Our approach achieves good realism, semantic consistency, and correctness across the heterogeneous urban styles in 330 US cities.
arXiv Detail & Related papers (2024-07-16T00:49:53Z)
- CityCraft: A Real Crafter for 3D City Generation [25.7885801163556]
CityCraft is an innovative framework designed to enhance both the diversity and quality of urban scene generation.
Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts.
Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction.
arXiv Detail & Related papers (2024-06-07T14:49:00Z)
- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
- Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach first generates texture colors at the point level for a given geometry using a 3D diffusion model; these colors are then transformed into a scene representation in a feed-forward manner.
Experiments on two city-scale datasets show that our model is proficient in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- CityDreamer: Compositional Generative Model of Unbounded 3D Cities [44.203932215464214]
CityDreamer is a compositional generative model designed specifically for unbounded 3D cities.
We adopt the bird's-eye-view scene representation and employ a volumetric renderer for both instance-oriented and stuff-oriented neural fields.
CityDreamer achieves state-of-the-art performance not only in generating realistic 3D cities but also in localized editing within the generated cities.
arXiv Detail & Related papers (2023-09-01T17:57:02Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We devise a new geometry-based strategy to embed depth information with a low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks (a minimal sketch of the SSC input/output setting follows this entry).
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
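For readers less familiar with the Semantic Scene Completion setting summarized above, the following is a minimal shape-level sketch of the task's inputs and outputs. The resolutions and class count are typical of NYU-style SSC benchmarks and are assumptions, not values from that paper.

```python
import numpy as np

# Typical single-view SSC setting (benchmark-style shapes, assumed for illustration).
H, W = 480, 640          # resolution of the single-view depth observation
X, Y, Z = 60, 36, 60     # low-resolution output voxel grid
NUM_CLASSES = 12         # semantic classes, including an "empty" class

depth = np.zeros((H, W), dtype=np.float32)       # single-view input (depth map)

# An SSC model predicts a completed semantic volume: every voxel, including
# those occluded in the input view, receives a class label.
logits = np.zeros((NUM_CLASSES, X, Y, Z), dtype=np.float32)  # placeholder model output
semantic_volume = logits.argmax(axis=0)          # (X, Y, Z) integer labels
occupancy = semantic_volume != 0                 # label 0 treated as "empty" here
```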