Open-world Point Cloud Semantic Segmentation: A Human-in-the-loop Framework
- URL: http://arxiv.org/abs/2508.04962v1
- Date: Thu, 07 Aug 2025 01:20:41 GMT
- Title: Open-world Point Cloud Semantic Segmentation: A Human-in-the-loop Framework
- Authors: Peng Zhang, Songru Yang, Jinsheng Sun, Weiqing Li, Zhiyong Su,
- Abstract summary: Open-world point cloud semantic segmentation (OW-Seg) aims to predict point labels of both base and novel classes in real-world scenarios.<n>We propose HOW-Seg, the first human-in-the-loop framework for OW-Seg.<n>By leveraging sparse human annotations as guidance, HOW-Seg enables prototype-based segmentation for both base and novel classes.
- Score: 8.451270206964534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-world point cloud semantic segmentation (OW-Seg) aims to predict point labels of both base and novel classes in real-world scenarios. However, existing methods rely on resource-intensive offline incremental learning or densely annotated support data, limiting their practicality. To address these limitations, we propose HOW-Seg, the first human-in-the-loop framework for OW-Seg. Specifically, we construct class prototypes, the fundamental segmentation units, directly on the query data, avoiding the prototype bias caused by intra-class distribution shifts between the support and query data. By leveraging sparse human annotations as guidance, HOW-Seg enables prototype-based segmentation for both base and novel classes. Considering the lack of granularity of initial prototypes, we introduce a hierarchical prototype disambiguation mechanism to refine ambiguous prototypes, which correspond to annotations of different classes. To further enrich contextual awareness, we employ a dense conditional random field (CRF) upon the refined prototypes to optimize their label assignments. Through iterative human feedback, HOW-Seg dynamically improves its predictions, achieving high-quality segmentation for both base and novel classes. Experiments demonstrate that with sparse annotations (e.g., one-novel-class-one-click), HOW-Seg matches or surpasses the state-of-the-art generalized few-shot segmentation (GFS-Seg) method under the 5-shot setting. When using advanced backbones (e.g., Stratified Transformer) and denser annotations (e.g., 10 clicks per sub-scene), HOW-Seg achieves 85.27% mIoU on S3DIS and 66.37% mIoU on ScanNetv2, significantly outperforming alternatives.
Related papers
- Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation [23.250178208474928]
This paper introduces a novel method, namely subspace prototype guidance (textbfSPG) to guide the training of segmentation network.
The proposed method significantly improves the segmentation performance and surpasses the state-of-the-art method.
arXiv Detail & Related papers (2024-08-20T04:31:46Z) - Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery [23.359450657842686]
Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data.
We propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes.
Our method surpasses the nearest competitor by a significant margin of 9.7% within the Stanford Cars dataset.
arXiv Detail & Related papers (2024-04-13T12:41:40Z) - Training-Free Semantic Segmentation via LLM-Supervision [37.9007813884699]
This paper introduces a new approach to text-supervised semantic segmentation using supervision by a large language model (LLM)
Our method starts from an LLM to generate a detailed set of subclasses for more accurate class representation.
We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels.
arXiv Detail & Related papers (2024-03-31T14:37:25Z) - Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS)
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z) - Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided
Enhancement [30.017448714419455]
This paper proposes a novel approach to improve point cloud few-shot segmentation (PC-FSS) models.
Unlike existing PC-FSS methods that directly utilize categorical information from support prototypes to recognize novel classes in query samples, our method identifies two critical aspects that substantially enhance model performance.
arXiv Detail & Related papers (2023-08-06T18:07:45Z) - Weakly Supervised 3D Point Cloud Segmentation via Multi-Prototype
Learning [37.76664203157892]
A fundamental challenge here lies in the large intra-class variations of local geometric structure, resulting in subclasses within a semantic class.
We leverage this intuition and opt for maintaining an individual classifier for each subclass.
Our hypothesis is also verified given the consistent discovery of semantic subclasses at no cost of additional annotations.
arXiv Detail & Related papers (2022-05-06T11:07:36Z) - BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy
for Source-free Domain Adaptation [74.93176783541332]
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data.
To make up for the absence of source data, most existing methods introduced feature prototype based pseudo-labeling strategies.
We propose a general class-Balanced Multicentric Dynamic prototype strategy for the SFDA task.
arXiv Detail & Related papers (2022-04-06T13:23:02Z) - APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic
Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z) - Dual Prototypical Contrastive Learning for Few-shot Semantic
Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to encourage the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z) - Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.