CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
- URL: http://arxiv.org/abs/2508.09470v1
- Date: Wed, 13 Aug 2025 03:55:56 GMT
- Title: CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
- Authors: Jialei Xu, Zizhuang Wei, Weikang You, Linyun Li, Weijian Sun,
- Abstract summary: CitySeg is a foundation model for city-scale point cloud semantic segmentation.<n>It incorporates text modality to achieve open vocabulary segmentation and zero-shot inference.<n>For the first time, CitySeg enables zero-shot generalization in city-scale point cloud scenarios.
- Score: 3.195397940217441
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation of city-scale point clouds is a critical technology for Unmanned Aerial Vehicle (UAV) perception systems, enabling the classification of 3D points without relying on any visual information to achieve comprehensive 3D understanding. However, existing models are frequently constrained by the limited scale of 3D data and the domain gap between datasets, which lead to reduced generalization capability. To address these challenges, we propose CitySeg, a foundation model for city-scale point cloud semantic segmentation that incorporates text modality to achieve open vocabulary segmentation and zero-shot inference. Specifically, in order to mitigate the issue of non-uniform data distribution across multiple domains, we customize the data preprocessing rules, and propose a local-global cross-attention network to enhance the perception capabilities of point networks in UAV scenarios. To resolve semantic label discrepancies across datasets, we introduce a hierarchical classification strategy. A hierarchical graph established according to the data annotation rules consolidates the data labels, and the graph encoder is used to model the hierarchical relationships between categories. In addition, we propose a two-stage training strategy and employ hinge loss to increase the feature separability of subcategories. Experimental results demonstrate that the proposed CitySeg achieves state-of-the-art (SOTA) performance on nine closed-set benchmarks, significantly outperforming existing approaches. Moreover, for the first time, CitySeg enables zero-shot generalization in city-scale point cloud scenarios without relying on visual information.
Related papers
- OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds [23.982606719607702]
We present OpenUrban3D, the first 3D open-vocabulary semantic segmentation framework for large-scale urban scenes.<n>Our approach generates robust semantic features directly from raw point clouds through multi-view, multi-granularity rendering, mask-level vision-language feature extraction, and sample-balanced fusion.<n>This design enables zero-shot segmentation for arbitrary text queries while capturing both semantic richness and geometric priors.
arXiv Detail & Related papers (2025-09-13T15:03:28Z) - Graph-Guided Dual-Level Augmentation for 3D Scene Segmentation [21.553363236403822]
3D point cloud segmentation aims to assign semantic labels to individual points in a scene for fine-grained spatial understanding.<n>Existing methods typically adopt data augmentation to alleviate the burden of large-scale annotation.<n>We propose a graph-guided data augmentation framework with dual-level constraints for realistic 3D scene synthesis.
arXiv Detail & Related papers (2025-07-30T13:25:36Z) - A Data-efficient Framework for Robotics Large-scale LiDAR Scene Parsing [10.497309421830671]
Existing state-of-the-art 3D point clouds understanding methods only perform well in a fully supervised manner.
This work presents a general and simple framework to tackle point clouds understanding when labels are limited.
arXiv Detail & Related papers (2023-12-03T02:38:51Z) - Dual Adaptive Transformations for Weakly Supervised Point Cloud
Segmentation [78.6612285236938]
We propose a novel DAT (textbfDual textbfAdaptive textbfTransformations) model for weakly supervised point cloud segmentation.
We evaluate our proposed DAT model with two popular backbones on the large-scale S3DIS and ScanNet-V2 datasets.
arXiv Detail & Related papers (2022-07-19T05:43:14Z) - SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z) - Semantic Segmentation for Point Cloud Scenes via Dilated Graph Feature
Aggregation and Pyramid Decoders [15.860648472852597]
We propose a graph convolutional network DGFA-Net rooted in dilated graph feature aggregation (DGFA)
Experiments on S3DIS, ShapeNetPart and Toronto-3D show that DGFA-Net outperforms the baseline approach.
arXiv Detail & Related papers (2022-04-11T08:41:01Z) - 3D Spatial Recognition without Spatially Labeled 3D [127.6254240158249]
We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition.
We show that WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time.
arXiv Detail & Related papers (2021-05-13T17:58:07Z) - PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object
Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z) - Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset,
Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points.
Our dataset consists of large areas from three UK cities, covering about 7.6 km2 of the city landscape.
We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z) - Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical
Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
The dataset has been point-wisely annotated with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z) - Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point
Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.