ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud
- URL: http://arxiv.org/abs/2404.19639v1
- Date: Tue, 30 Apr 2024 15:42:45 GMT
- Title: ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud
- Authors: Jiayi Han, Zidi Cao, Weibo Zheng, Xiangguo Zhou, Xiangjian He, Yuanfang Zhang, Daisen Wei,
- Abstract summary: We propose an unsupervised model adaptation approach to enhance the point cloud encoder for the extremely sparse point clouds.
We propose a novel fused-cross attention layer that expands the pre-trained self-attention layer with additional learnable tokens and attention blocks.
We also propose a complementary learning-based self-distillation schema that encourages the modified features to be pulled apart from the irrelevant text embeddings.
- Score: 7.066196862701362
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In recent years, zero-shot learning has attracted the attention of many researchers due to its flexibility and generality. Many approaches have been proposed to achieve zero-shot classification of point clouds for 3D object understanding, following the schema of CLIP. However, in the real world, point clouds can be extremely sparse, which dramatically limits the effectiveness of 3D point cloud encoders and results in misalignment between point cloud features and text embeddings. To adapt point cloud encoders to extremely sparse point clouds without re-running the pre-training procedure, which can be time-consuming and expensive, in this work we propose an unsupervised model adaptation approach to enhance the point cloud encoder for extremely sparse point clouds. We propose a novel fused-cross attention layer that expands the pre-trained self-attention layer with additional learnable tokens and attention blocks, which effectively modifies the point cloud features while maintaining the alignment between point cloud features and text embeddings. We also propose a complementary learning-based self-distillation schema that encourages the modified features to be pulled apart from the irrelevant text embeddings without overfitting the feature space to the observed text embeddings. Extensive experiments demonstrate that the proposed approach effectively increases the zero-shot capability on extremely sparse point clouds and outperforms other state-of-the-art model adaptation approaches.
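The abstract says zero-shot point cloud classification follows the CLIP schema: a point cloud feature is compared with the text embedding of every candidate class, and the most similar class is predicted. The sketch below illustrates only that scoring step with plain Python lists, assuming the encoder outputs are already computed. The function names, the temperature of 0.01 and the toy 3-d embeddings are illustrative assumptions, not ESP-Zero's code. It also shows where the misalignment described in the abstract matters: if sparsity shifts the point cloud feature, the argmax over cosine similarities can change.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(point_feat: Sequence[float],
                  text_embeds: Sequence[Sequence[float]],
                  temperature: float = 0.01) -> List[float]:
    """Cosine similarity between one point cloud feature and each class
    text embedding, divided by a temperature (0.01 is an assumed value)."""
    p = l2_normalize(point_feat)
    logits = []
    for t in text_embeds:
        t_unit = l2_normalize(t)
        logits.append(sum(a * b for a, b in zip(p, t_unit)) / temperature)
    return logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_predict(point_feat: Sequence[float],
                      text_embeds: Sequence[Sequence[float]],
                      temperature: float = 0.01) -> int:
    """Index of the class whose text embedding is most similar to the feature."""
    logits = cosine_logits(point_feat, text_embeds, temperature)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy usage with hypothetical 3-d embeddings for "chair", "table", "lamp".
class_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(zero_shot_predict([0.2, 0.9, 0.1], class_embeds))  # → 1 ("table")
```

Normalizing both sides makes the score depend only on direction, so the text embeddings act as fixed class prototypes and no classifier head is trained.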
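The abstract does not specify how the fused-cross attention layer combines the pre-trained self-attention with the added learnable tokens and attention blocks. One plausible reading, sketched here purely as an assumption, keeps the frozen self-attention branch unchanged and adds a second attention branch in which the point tokens attend to a small set of learnable tokens, scaled by a gate. With the gate at zero, the layer reproduces the pre-trained output exactly, which is one way to preserve the feature/text alignment at the start of adaptation. `fused_cross_attention`, `gate`, the toy tokens and the omission of Q/K/V projections are hypothetical simplifications.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention.
    Q/K/V projection matrices are omitted for brevity (a simplification)."""
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fused_cross_attention(x: Matrix, learnable_tokens: Matrix, gate: float) -> Matrix:
    """Hypothetical fused layer: frozen self-attention over the point tokens
    plus a gated branch in which the point tokens attend to learnable tokens."""
    frozen = attention(x, x, x)                               # pre-trained path, kept as-is
    extra = attention(x, learnable_tokens, learnable_tokens)  # new, trainable path
    return [[f + gate * e for f, e in zip(f_row, e_row)]
            for f_row, e_row in zip(frozen, extra)]


# Hypothetical 2-d point tokens and two learnable tokens.
points = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
tokens = [[2.0, -1.0], [-1.0, 2.0]]
adapted = fused_cross_attention(points, tokens, gate=0.1)
```

An alternative design would concatenate the learnable tokens to the keys and values inside a single softmax (prefix-style); the abstract does not say which variant, if either, the paper uses.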
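The complementary learning-based self-distillation is described only as pulling the adapted features away from irrelevant text embeddings without overfitting to the observed ones. A minimal sketch under explicit assumptions: a frozen teacher's class probabilities select the k least likely classes as "complementary" labels, and the student is penalized with the standard complementary-label loss, the mean of -log(1 - p_c) over those classes. The selection rule, the value of k and all function names are hypothetical; the abstract does not give them.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def complementary_classes(teacher_probs: Sequence[float], k: int) -> List[int]:
    """Hypothetical selection rule: the k classes the frozen teacher rates
    least likely are treated as 'irrelevant' (complementary) labels."""
    return sorted(range(len(teacher_probs)), key=lambda i: teacher_probs[i])[:k]


def complementary_loss(student_logits: Sequence[float],
                       comp: Sequence[int],
                       eps: float = 1e-12) -> float:
    """Complementary-label loss: mean of -log(1 - p_c) over classes c in comp.
    It only pushes probability away from the listed classes; it never
    forces the student toward one specific class."""
    p = softmax(student_logits)
    return -sum(math.log(max(1.0 - p[c], eps)) for c in comp) / len(comp)


teacher = [0.7, 0.2, 0.05, 0.05]  # hypothetical teacher probabilities over 4 classes
comp = complementary_classes(teacher, k=2)
print(comp)  # → [2, 3]
print(complementary_loss([5.0, 1.0, -5.0, -5.0], comp))
```

Because the loss never names a target class, it avoids committing the adapted feature space to any single observed text embedding, which is the property the abstract emphasizes.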
Related papers
- Zero-shot Point Cloud Completion Via 2D Priors [52.72867922938023]
3D point cloud completion is designed to recover complete shapes from partially observed point clouds.
We propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories.
arXiv (2024-04-10T08:02:17Z)
- PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds [18.840000859663153]
We propose PRED, a novel image-assisted pre-training framework for outdoor point clouds.
The main ingredient of our framework is a Birds-Eye-View (BEV) feature map conditioned semantic rendering.
We further enhance our model's performance by incorporating point-wise masking with a high mask ratio.
arXiv (2023-11-08T07:26:09Z)
- SDFReg: Learning Signed Distance Functions for Point Cloud Registration [8.465771798353904]
We propose a novel point cloud registration framework for imperfect point clouds.
We replace the problem of rigid registration between point clouds with a registration problem between the point cloud and the neural implicit function.
Our method showcases remarkable robustness in the face of challenges such as noise, incompleteness, and density changes of point clouds.
arXiv (2023-04-18T12:14:20Z)
- EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv (2022-12-08T06:27:11Z)
- Data Augmentation-free Unsupervised Learning for 3D Point Cloud Understanding [61.30276576646909]
We propose an augmentation-free unsupervised approach for point clouds to learn transferable point-level features via soft clustering, named SoftClu.
We exploit the affiliation of points to their clusters as a proxy to enable self-training through a pseudo-label prediction task.
arXiv (2022-10-06T10:18:16Z)
- Towards Robust 3D Object Recognition with Dense-to-Sparse Deep Domain Adaptation [5.763876449960417]
Three-dimensional (3D) object recognition is crucial for intelligent autonomous agents.
Most state-of-the-art approaches rely on relatively dense point clouds, and their performance drops significantly for sparse point clouds.
Unsupervised domain adaptation makes it possible to minimise the discrepancy between dense and sparse point clouds.
arXiv (2022-05-07T13:42:43Z)
- Learning a Structured Latent Space for Unsupervised Point Cloud Completion [48.79411151132766]
We propose a novel framework that learns a unified and structured latent space encoding both partial and complete point clouds.
Our proposed method consistently outperforms state-of-the-art unsupervised methods on both synthetic ShapeNet and real-world KITTI, ScanNet, and Matterport3D datasets.
arXiv (2022-03-29T13:58:44Z)
- PointAttN: You Only Need Attention for Point Cloud Completion [89.88766317412052]
Point cloud completion refers to completing 3D shapes from partial 3D point clouds.
We propose a novel neural network that processes point clouds in a per-point manner to eliminate kNNs.
The proposed framework, namely PointAttN, is simple, neat and effective, which can precisely capture the structural information of 3D shapes.
arXiv (2022-03-16T09:20:01Z)
- SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering [21.563862632172363]
We propose a self-supervised point cloud upsampling network (SSPU-Net) to generate dense point clouds without using ground truth.
To achieve this, we exploit the consistency between the input sparse point cloud and the generated dense point cloud in terms of both shapes and rendered images.
arXiv (2021-08-01T13:26:01Z)
- Pseudo-LiDAR Point Cloud Interpolation Based on 3D Motion Representation and Spatial Supervision [68.35777836993212]
We propose a Pseudo-LiDAR point cloud network to generate temporally and spatially high-quality point cloud sequences.
By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship.
arXiv (2020-06-20T03:11:04Z)