Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
- URL: http://arxiv.org/abs/2503.16282v1
- Date: Thu, 20 Mar 2025 16:10:33 GMT
- Title: Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
- Authors: Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, Serge Belongie,
- Abstract summary: Generalized few-shot 3D point cloud segmentation (GFS-PCS) adapts models to new classes with few support samples while retaining base class segmentation.<n>We introduce a GFS-PCS framework that synergizes dense but noisy pseudo-labels with precise yet sparse few-shot samples to maximize the strengths of both.
- Score: 25.666715057529935
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generalized few-shot 3D point cloud segmentation (GFS-PCS) adapts models to new classes with few support samples while retaining base class segmentation. Existing GFS-PCS methods enhance prototypes via interacting with support or query features but remain limited by sparse knowledge from few-shot samples. Meanwhile, 3D vision-language models (3D VLMs), generalizing across open-world novel classes, contain rich but noisy novel class knowledge. In this work, we introduce a GFS-PCS framework that synergizes dense but noisy pseudo-labels from 3D VLMs with precise yet sparse few-shot samples to maximize the strengths of both, named GFS-VL. Specifically, we present a prototype-guided pseudo-label selection to filter low-quality regions, followed by an adaptive infilling strategy that combines knowledge from pseudo-label contexts and few-shot samples to adaptively label the filtered, unlabeled areas. Additionally, we design a novel-base mix strategy to embed few-shot samples into training scenes, preserving essential context for improved novel class learning. Moreover, recognizing the limited diversity in current GFS-PCS benchmarks, we introduce two challenging benchmarks with diverse novel classes for comprehensive generalization evaluation. Experiments validate the effectiveness of our framework across models and datasets. Our approach and benchmarks provide a solid foundation for advancing GFS-PCS in the real world. The code is at https://github.com/ZhaochongAn/GFS-VL
Related papers
- From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning [13.282416396765392]
We introduce the first generalized cross-domain few-shot (GCFS) task in 3D object detection.<n>Our solution integrates multi-modal fusion and contrastive-enhanced prototype learning within one framework.<n>To effectively capture domain-specific representations for each class from limited target data, we propose a contrastive-enhanced prototype learning.
arXiv Detail & Related papers (2025-03-08T17:05:21Z) - LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification.<n> LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items.<n>Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS)
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z) - Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment [55.11291053011696]
This work presents a framework for dealing with 3D scene understanding when the labeled scenes are quite limited.
To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy.
In the limited reconstruction case, our proposed approach, termed WS3D++, ranks 1st on the large-scale ScanNet benchmark.
arXiv Detail & Related papers (2023-12-01T15:47:04Z) - Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection [37.48935478836176]
Prototypical VoteNet is a few-shot 3D point cloud object detection approach.
It incorporates two new modules: Prototypical Vote Module (PVM) and Prototypical Head Module (PHM)
arXiv Detail & Related papers (2022-10-11T16:25:38Z) - Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic (NCDSS)
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z) - Segmenting 3D Hybrid Scenes via Zero-Shot Learning [13.161136148641813]
This work is to tackle the problem of point cloud semantic segmentation for 3D hybrid scenes under the framework of zero-shot learning.
We propose a network to synthesize point features for various classes of objects by leveraging the semantic features of both seen and unseen object classes, called PFNet.
The proposed PFNet employs a GAN architecture to synthesize point features, where the semantic relationship between seen-class and unseen-class features is consolidated by adapting a new semantic regularizer.
We introduce two benchmarks for algorithmic evaluation by re-organizing the public S3DIS and ScanNet datasets under six different data splits.
arXiv Detail & Related papers (2021-07-01T13:21:49Z) - Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and
Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.