Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation
- URL: http://arxiv.org/abs/2305.14335v1
- Date: Tue, 23 May 2023 17:58:05 GMT
- Title: Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation
- Authors: Shuting He, Xudong Jiang, Wei Jiang, Henghui Ding
- Abstract summary: We address the challenging task of few-shot and zero-shot 3D point cloud semantic segmentation.
Our proposed method surpasses state-of-the-art algorithms by a considerable 7.90% and 14.82% under the 2-way 1-shot setting on S3DIS and ScanNet benchmarks, respectively.
- Score: 30.18333233940194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we address the challenging task of few-shot and zero-shot 3D
point cloud semantic segmentation. The success of few-shot semantic
segmentation in 2D computer vision is mainly driven by pre-training on
large-scale datasets like ImageNet. A feature extractor pre-trained on
large-scale 2D datasets greatly helps 2D few-shot learning. However, the
development of 3D deep learning is hindered by the limited volume and instance
modality of datasets due to the significant cost of 3D data collection and
annotation. This results in less representative features and large intra-class
feature variation for few-shot 3D point cloud segmentation. As a consequence,
directly extending existing popular prototypical methods of 2D few-shot
classification/segmentation to 3D point cloud segmentation does not work as
well as it does in the 2D domain. To address this issue, we propose a Query-Guided Prototype
Adaption (QGPA) module to adapt prototypes from the support point cloud feature
space to the query point cloud feature space. With such prototype adaption, we
greatly alleviate the issue of large intra-class feature variation in point
clouds and significantly improve the performance of few-shot 3D segmentation.
Besides, to enhance the representation of prototypes, we introduce a
Self-Reconstruction (SR) module that enables each prototype to reconstruct the
support mask as well as possible. Moreover, we further consider zero-shot 3D
point cloud semantic segmentation where there is no support sample. To this
end, we introduce category words as semantic information and propose a
semantic-visual projection model to bridge the semantic and visual spaces. Our
proposed method surpasses state-of-the-art algorithms by a considerable 7.90%
and 14.82% under the 2-way 1-shot setting on S3DIS and ScanNet benchmarks,
respectively. Code is available at https://github.com/heshuting555/PAP-FZS3D.
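
To make the pipeline above concrete, here is a minimal PyTorch sketch of prototype-based few-shot point cloud segmentation with a query-guided prototype adaption step. It is a sketch under stated assumptions, not the released implementation: masked average pooling for prototypes, a single-head cross-attention update for QGPA, the cosine-similarity classifier, the self-reconstruction loss form, and the word-embedding projection for the zero-shot case are all illustrative choices; consult the linked repository for the authors' code.

```python
# Minimal sketch, NOT the authors' implementation: prototype extraction via
# masked average pooling, query-guided prototype adaption modeled here as a
# single-head cross-attention (an assumption), cosine-similarity segmentation,
# and an assumed cross-entropy-style self-reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


def masked_average_prototype(feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """feats: (N, C) per-point support features, mask: (N,) binary class mask."""
    denom = mask.sum().clamp(min=1.0)
    return (feats * mask.unsqueeze(-1)).sum(0, keepdim=True) / denom  # (1, C)


class QueryGuidedPrototypeAdaption(nn.Module):
    """Pull support prototypes toward the query scene's feature distribution
    (assumed formulation: prototypes attend to query point features)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, prototypes: torch.Tensor, query_feats: torch.Tensor) -> torch.Tensor:
        # prototypes: (K, C), query_feats: (N, C)
        attn = torch.softmax(
            self.q_proj(prototypes) @ self.k_proj(query_feats).t() * self.scale, dim=-1
        )  # (K, N)
        # Residual update keeps class semantics while adapting to the query scene.
        return prototypes + attn @ self.v_proj(query_feats)  # (K, C)


def segment_by_cosine(feats: torch.Tensor, prototypes: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Per-point class probabilities from cosine similarity to each prototype."""
    sims = F.normalize(feats, dim=-1) @ F.normalize(prototypes, dim=-1).t()  # (N, K)
    return torch.softmax(sims / tau, dim=-1)


def self_reconstruction_loss(support_feats, prototypes, support_labels):
    """Encourage prototypes to reconstruct the support mask they were pooled
    from (the exact loss used by the SR module is an assumption here)."""
    probs = segment_by_cosine(support_feats, prototypes)
    return F.nll_loss(torch.log(probs + 1e-8), support_labels)


if __name__ == "__main__":
    C, N = 64, 2048                                   # feature dim / points per cloud
    support_feats, query_feats = torch.randn(N, C), torch.randn(N, C)
    support_mask = (torch.rand(N) > 0.5).float()      # 1-shot binary support mask

    fg = masked_average_prototype(support_feats, support_mask)
    bg = masked_average_prototype(support_feats, 1.0 - support_mask)
    prototypes = torch.cat([bg, fg], dim=0)           # (2, C): background, foreground

    adapted = QueryGuidedPrototypeAdaption(C)(prototypes, query_feats)
    query_probs = segment_by_cosine(query_feats, adapted)          # (N, 2)
    sr_loss = self_reconstruction_loss(support_feats, prototypes, support_mask.long())

    # Zero-shot variant (assumption): category word embeddings projected into the
    # visual feature space stand in for support-derived prototypes.
    word_embeddings = torch.randn(2, 300)             # e.g. word vectors for 2 categories
    semantic_prototypes = nn.Linear(300, C)(word_embeddings)       # (2, C)

    print(query_probs.shape, float(sr_loss), semantic_prototypes.shape)
```

In the zero-shot setting described in the abstract, the projected word embeddings would replace the support-derived prototypes as the reference the cosine classifier compares query points against; that bridging is the role of the semantic-visual projection model.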
Related papers
- Robust 3D Point Clouds Classification based on Declarative Defenders [18.51700931775295]
3D point clouds are unstructured and sparse, while 2D images are structured and dense.
In this paper, we explore three distinct algorithms for mapping 3D point clouds into 2D images.
The proposed approaches demonstrate superior accuracy and robustness against adversarial attacks.
arXiv Detail & Related papers (2024-10-13T01:32:38Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
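
As a rough illustration of the voting-based pseudo-label fusion mentioned in the entry above: once several 2D models' semantic masks have been projected onto the 3D points, a per-point majority vote yields the fused pseudo label. The handling of unlabeled points and the tie-breaking rule below are assumptions for illustration, not details taken from that paper.

```python
# Hypothetical per-point majority voting over pseudo labels from several 2D
# models; the projection of 2D masks onto 3D points is assumed to have already
# happened, and unlabeled points are marked with ignore_index.
import numpy as np


def fuse_point_labels_by_voting(per_model_labels: np.ndarray, ignore_index: int = -1) -> np.ndarray:
    """per_model_labels: (M, N) integer labels, one row per 2D model.
    Returns (N,) fused pseudo labels; ties go to the smallest label id."""
    num_classes = int(per_model_labels.max()) + 1
    fused = np.full(per_model_labels.shape[1], ignore_index, dtype=np.int64)
    for i in range(per_model_labels.shape[1]):
        votes = per_model_labels[:, i]
        votes = votes[votes != ignore_index]          # drop models that left this point unlabeled
        if votes.size:
            fused[i] = np.bincount(votes, minlength=num_classes).argmax()
    return fused


# Three models labeling five points; the last point is only labeled by one model.
labels = np.array([[0, 1, 2, 2, -1],
                   [0, 1, 2, 3, -1],
                   [1, 1, 2, 3,  0]])
print(fuse_point_labels_by_voting(labels))  # [0 1 2 3 0]
```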
- 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision [36.282611420496416]
We propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation.
The decoder implements 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion.
Experiments show that it performs favorably against existing weakly supervised point cloud segmentation methods.
arXiv Detail & Related papers (2023-10-19T15:12:44Z)
- Variational Relational Point Completion Network for Robust 3D Classification [59.80993960827833]
Existing point cloud completion methods tend to generate global shape skeletons and hence lack fine local details.
This paper proposes a variational framework, Variational Relational point Completion Network (VRCNet), with two appealing properties.
VRCNet shows great generalizability and robustness on real-world point cloud scans.
arXiv Detail & Related papers (2023-04-18T17:03:20Z)
- Few-Shot 3D Point Cloud Semantic Segmentation via Stratified Class-Specific Attention Based Transformer Network [22.9434434107516]
We develop a new multi-layer transformer network for few-shot point cloud semantic segmentation.
Our method achieves new state-of-the-art performance over existing few-shot 3D point cloud segmentation models, with 15% less inference time.
arXiv Detail & Related papers (2023-03-28T00:27:54Z)
- PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models [56.324516906160234]
Generalizable 3D part segmentation is important but challenging in vision and robotics.
This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP.
We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm.
arXiv Detail & Related papers (2022-12-03T06:59:01Z)
- Interactive Object Segmentation in 3D Point Clouds [27.88495480980352]
We present an interactive 3D object segmentation method in which the user interacts directly with the 3D point cloud.
Our model does not require training data from the target domain.
It performs well on several other datasets with different data characteristics as well as different object classes.
arXiv Detail & Related papers (2022-04-14T18:31:59Z)
- Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding [80.04281842702294]
We introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several view-points.
This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation.
We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space.
arXiv Detail & Related papers (2021-11-30T13:08:19Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named a point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.