GBA-UBF : A Large-Scale and Fine-Grained Building Function Classification Dataset in the Greater Bay Area
- URL: http://arxiv.org/abs/2510.08921v1
- Date: Fri, 10 Oct 2025 02:09:16 GMT
- Title: GBA-UBF : A Large-Scale and Fine-Grained Building Function Classification Dataset in the Greater Bay Area
- Authors: Chunsong Chen, Yichen Hou, Huan Chen, Junlin Li, Rong Fu, Qiushen Lai, Yiping Chen, Ting Han
- Abstract summary: Rapid urbanization in the Guangdong-Hong Kong-Macao Greater Bay Area has created urgent demand for high-resolution, building-level functional data. We present a large-scale, fine-grained dataset that assigns one of five functional categories to nearly four million buildings.
- Score: 16.25561105858139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid urbanization in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) has created urgent demand for high-resolution, building-level functional data to support sustainable spatial planning. Existing land use datasets suffer from coarse granularity and have difficulty capturing intra-block heterogeneity. To this end, we present the Greater Bay Area Urban Building Function Dataset (GBA-UBF), a large-scale, fine-grained dataset that assigns one of five functional categories to nearly four million buildings across six core GBA cities. We propose a Multi-level Building Function Optimization (ML-BFO) method that integrates Points of Interest (POI) records and building footprints through a three-stage pipeline: (1) candidate label generation using spatial overlay with proximity weighting, (2) iterative refinement based on neighborhood label autocorrelation, and (3) function-related correction informed by high-level POI buffers. To quantitatively validate the results, we design the Building Function Matching Index (BFMI), which jointly measures categorical consistency and distributional similarity against POI-derived probability heatmaps. Comparative experiments demonstrate that GBA-UBF achieves significantly higher accuracy, with a BFMI of 0.58 that markedly exceeds that of the baseline dataset, and that it aligns better with urban activity patterns. Field validation further confirms the dataset's semantic reliability and practical interpretability. The GBA-UBF dataset establishes a reproducible framework for building-level functional classification, bridging the gap between coarse land use maps and fine-grained urban analytics. The dataset is accessible at https://github.com/chenchs0629/GBA-UBF, and the data will undergo continuous improvement and updates based on feedback from the community.
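The first two stages of the pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ML-BFO implementation: the function names, the inverse-distance weight 1 / (1 + d), the 50-unit search radius, the majority-vote fill rule, and the category names are all assumptions chosen for illustration, and stage (3) is omitted.

```python
import math
from collections import defaultdict


def candidate_label(building_xy, pois, radius=50.0):
    """Stage 1 sketch: proximity-weighted POI vote for one building.

    pois is a list of (x, y, category) tuples in the same planar units as
    building_xy. The weight 1 / (1 + d) and the radius are assumptions.
    Returns (label, share of total weight), or (None, 0.0) if no POI is near.
    """
    bx, by = building_xy
    votes = defaultdict(float)
    for px, py, category in pois:
        d = math.hypot(px - bx, py - by)
        if d <= radius:
            votes[category] += 1.0 / (1.0 + d)  # closer POIs count more
    if not votes:
        return None, 0.0
    label = max(votes, key=votes.get)
    return label, votes[label] / sum(votes.values())


def refine_labels(labels, neighbors, max_iter=10):
    """Stage 2 sketch: fill unlabeled buildings from labeled neighbors.

    labels maps building id -> category or None; neighbors maps building id
    -> iterable of adjacent building ids. Each pass assigns the majority
    label among already-labeled neighbors and repeats until nothing changes.
    """
    labels = dict(labels)  # work on a copy; the input is left untouched
    for _ in range(max_iter):
        changed = False
        for b, lab in list(labels.items()):
            if lab is not None:
                continue
            counts = defaultdict(int)
            for n in neighbors.get(b, ()):
                if labels.get(n) is not None:
                    counts[labels[n]] += 1
            if counts:
                labels[b] = max(counts, key=counts.get)
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy data: three POIs and four buildings.
pois = [(0, 0, "commercial"), (10, 0, "residential"), (100, 100, "industrial")]
label, share = candidate_label((1, 0), pois)

initial = {"a": "residential", "b": None, "c": "residential", "d": None}
adjacency = {"b": ["a", "c"], "d": ["b"]}
refined = refine_labels(initial, adjacency)
```

In this toy setup the building at (1, 0) takes the label of the nearest POI, and the two unlabeled buildings inherit the label of their labeled neighbors. A production version would presumably operate on footprint polygons with a spatial index rather than on point coordinates.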
Related papers
- Glocal Information Bottleneck for Time Series Imputation [70.41814118117311]
Time Series Imputation aims to recover missing values in temporal data. Existing models typically optimize the point-wise reconstruction loss, focusing on recovering numerical values (local information). We propose a new training paradigm, Glocal Information Bottleneck (Glocal-IB).
arXiv Detail & Related papers (2025-10-06T15:24:44Z) - PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data [0.5735035463793009]
PlaceFM captures place representations through a training-free, clustering-based approach. PlaceFM summarizes the entire point of interest graph constructed from U.S. Foursquare data. PlaceFM produces general-purpose region embeddings while automatically identifying places of interest. PlaceFM achieves up to a 100x speedup in generating region-level representations on large-scale POI graphs.
arXiv Detail & Related papers (2025-06-25T15:10:31Z) - Enriching Location Representation with Detailed Semantic Information [0.6554326244334866]
CaLLiPer+ is an extension of the CaLLiPer model that integrates Point-of-Interest (POI) names alongside categorical labels. We evaluate its effectiveness on two downstream tasks, land use classification and socioeconomic status distribution mapping.
arXiv Detail & Related papers (2025-06-03T11:06:51Z) - Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning [18.432786227782803]
We propose a geometry-aware semi-supervised framework for fine-grained building function recognition.
We use geometric relationships among multi-source data to enhance pseudo-label accuracy in semi-supervised learning.
Our proposed framework exhibits superior performance in fine-grained functional recognition of buildings.
arXiv Detail & Related papers (2024-08-18T12:48:48Z) - FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes [0.0]
We present an ultra-large-scale aerial Lidar dataset made of 100,000 dense point clouds with high-quality labels for 7 semantic classes.
We describe the data collection, annotation, and curation process of the dataset.
We provide baseline semantic segmentation results using a state-of-the-art 3D point cloud classification model.
arXiv Detail & Related papers (2024-05-07T19:37:22Z) - Trust your Good Friends: Source-free Domain Adaptation by Reciprocal
Neighborhood Clustering [50.46892302138662]
We address the source-free domain adaptation problem, where the source pretrained model is adapted to the target domain in the absence of source data.
Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters.
We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood.
arXiv Detail & Related papers (2023-09-01T15:31:18Z) - Semi-supervised Learning from Street-View Images and OpenStreetMap for
Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OpenStreetMap data.
The proposed method yields a clear performance boost in estimating building heights, with a Mean Absolute Error (MAE) of around 2.1 meters.
The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z) - LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z) - Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings [44.4879068879732]
This paper presents a complete pipeline for resolving ambiguities during the data association.
Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements.
We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany.
arXiv Detail & Related papers (2022-07-28T12:29:39Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points.
Our dataset consists of large areas from three UK cities, covering about 7.6 km² of the city landscape.
We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z)