BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models
- URL: http://arxiv.org/abs/2511.06337v1
- Date: Sun, 09 Nov 2025 11:40:34 GMT
- Title: BuildingWorld: A Structured 3D Building Dataset for Urban Foundation Models
- Authors: Shangfeng Huang, Ruisheng Wang, Xin Wang,
- Abstract summary: BuildingWorld is a globally representative dataset for urban-scale foundation modeling and analysis. It provides about five million LOD2 building models collected from diverse sources, accompanied by real and simulated airborne LiDAR point clouds. Cyber City is a virtual city model to enable the generation of unlimited training data with customized and structurally diverse point cloud distributions.
- Score: 17.325315633493624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As digital twins become central to the transformation of modern cities, accurate and structured 3D building models emerge as a key enabler of high-fidelity, updatable urban representations. These models underpin diverse applications including energy modeling, urban planning, autonomous navigation, and real-time reasoning. Despite recent advances in 3D urban modeling, most learning-based models are trained on building datasets with limited architectural diversity, which significantly undermines their generalizability across heterogeneous urban environments. To address this limitation, we present BuildingWorld, a comprehensive and structured 3D building dataset designed to bridge the gap in stylistic diversity. It encompasses buildings from geographically and architecturally diverse regions -- including North America, Europe, Asia, Africa, and Oceania -- offering a globally representative dataset for urban-scale foundation modeling and analysis. Specifically, BuildingWorld provides about five million LOD2 building models collected from diverse sources, accompanied by real and simulated airborne LiDAR point clouds. This enables comprehensive research on 3D building reconstruction, detection and segmentation. Cyber City, a virtual city model, is introduced to enable the generation of unlimited training data with customized and structurally diverse point cloud distributions. Furthermore, we provide standardized evaluation metrics tailored for building reconstruction, aiming to facilitate the training, evaluation, and comparison of large-scale vision models and foundation models in structured 3D urban environments.
Related papers
- SAM 3D for 3D Object Reconstruction from Remote Sensing Images [3.893451853752809]
This paper presents the first systematic evaluation of SAM 3D, a general-purpose image-to-3D foundation model. Experimental results demonstrate that SAM 3D produces more coherent roof geometry and sharper boundaries compared to TRELLIS.
arXiv Detail & Related papers (2025-12-27T03:47:39Z)
- Sat2RealCity: Geometry-Aware and Appearance-Controllable 3D Urban Generation from Satellite Imagery [12.88788681361607]
Sat2RealCity is a geometry-aware and appearance-controllable framework for 3D urban generation from real-world satellite imagery. We introduce the OSM-based spatial priors strategy to achieve interpretable geometric generation from spatial topology to building instances. We construct an MLLM-powered semantic-guided generation pipeline, bridging semantic interpretation and geometric reconstruction.
arXiv Detail & Related papers (2025-11-14T16:42:03Z)
- MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts [50.37005070020306]
MoRE is a dense 3D visual foundation model based on a Mixture-of-Experts (MoE) architecture. MoRE incorporates a confidence-based depth refinement module that stabilizes and refines geometric estimation. It integrates dense semantic features with globally aligned 3D backbone representations for high-fidelity surface normal prediction.
arXiv Detail & Related papers (2025-10-31T06:54:27Z)
- SYNBUILD-3D: A large, multi-modal, and semantically rich synthetic dataset of 3D building models at Level of Detail 4 [1.3166179099143722]
We introduce SYNBUILD-3D, a large, diverse, and multi-modal dataset of over 6.2 million synthetic 3D residential buildings at Level of Detail (LoD) 4. In the dataset, each building is represented through three distinct modalities: a semantically enriched 3D wireframe graph at LoD 4 (Modality I), the corresponding floor plan images (Modality II), and a LiDAR-like roof point cloud (Modality III). The semantic annotations for each building wireframe are derived from the corresponding floor plan images and include information on rooms, doors, and windows.
arXiv Detail & Related papers (2025-08-28T19:11:01Z)
- Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion [18.943643720564996]
Sat2City is a novel framework that synergizes the representational capacity of sparse voxel grids with latent diffusion models. We introduce a dataset of synthesized large-scale 3D cities paired with satellite-view height maps. Our framework generates detailed 3D structures from a single satellite image, achieving superior fidelity compared to existing city generation models.
arXiv Detail & Related papers (2025-07-06T14:30:08Z)
- TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset [90.97440987655084]
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations boasting 32 data subsets over roughly 100,000 $m^2$ and currently 767 GB of data.
arXiv Detail & Related papers (2025-05-12T09:48:32Z)
- Aether: Geometric-Aware Unified World Modeling [49.33579903601599]
Aether is a unified framework that enables geometry-aware reasoning in world models. Our framework achieves zero-shot generalization in both action following and reconstruction tasks. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling.
arXiv Detail & Related papers (2025-03-24T17:59:51Z)
- AerialGo: Walking-through City View Generation from Aerial Perspectives [48.53976414257845]
AerialGo is a framework that generates realistic walking-through city views from aerial images. By conditioning ground-view synthesis on accessible aerial data, AerialGo bypasses the privacy risks inherent in ground-level imagery. Experiments show that AerialGo significantly enhances ground-level realism and structural coherence.
arXiv Detail & Related papers (2024-11-29T08:14:07Z)
- 3D-VLA: A 3D Vision-Language-Action Generative World Model [68.0388311799959]
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.
We propose 3D-VLA by introducing a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action.
Our experiments on held-in datasets demonstrate that 3D-VLA significantly improves the reasoning, multimodal generation, and planning capabilities in embodied environments.
arXiv Detail & Related papers (2024-03-14T17:58:41Z)
- Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds [4.38301148531795]
Existing datasets for 3D modeling mainly focus on common objects such as furniture or cars.
We present an urban-scale dataset consisting of more than 160 thousand buildings along with corresponding point clouds, mesh and wire-frame models, covering 16 cities in Estonia over about 998 km².
Experimental results indicate that Building3D poses challenges of high intra-class variance, data imbalance and large-scale noise.
arXiv Detail & Related papers (2023-07-21T21:38:57Z)
- UrbanBIS: a Large-scale Benchmark for Fine-grained Urban Building Instance Segmentation [50.52615875873055]
UrbanBIS comprises six real urban scenes, with 2.5 billion points, covering a vast area of 10.78 square kilometers.
UrbanBIS provides semantic-level annotations on a rich set of urban objects, including buildings, vehicles, vegetation, roads, and bridges.
UrbanBIS is the first 3D dataset that introduces fine-grained building sub-categories.
arXiv Detail & Related papers (2023-05-04T08:01:38Z)
- Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery [20.001807614214922]
Building 3D reconstruction from remote sensing images has a wide range of applications in smart cities, photogrammetry and other fields.
We propose an efficient DSM estimation-driven reconstruction framework (Building3D) to reconstruct 3D building models from the input single-view remote sensing image.
Our Building3D is rooted in the SFFDE network for building elevation prediction, synchronized with a building extraction network for building masks, and then sequentially performs point cloud reconstruction, surface reconstruction (or CityGML model reconstruction).
arXiv Detail & Related papers (2023-01-11T17:20:30Z)
- BuildingNet: Learning to Label 3D Buildings [19.641000866952815]
BuildingNet comprises (a) large-scale 3D building models whose exteriors are consistently labeled and (b) a neural network that labels buildings by analyzing structural relations of their geometric primitives.
The dataset covers categories such as houses, churches, skyscrapers, town halls and castles.
arXiv Detail & Related papers (2021-10-11T01:45:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.