Triple Point Masking
- URL: http://arxiv.org/abs/2409.17547v2
- Date: Tue, 15 Oct 2024 04:00:03 GMT
- Title: Triple Point Masking
- Authors: Jiaming Liu, Linghe Kong, Yue Wu, Maoguo Gong, Hao Li, Qiguang Miao, Wenping Ma, Can Qin
- Abstract summary: Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
- Score: 49.39218611030084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation. In this paper, we introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training masked autoencoders to achieve multi-mask learning for 3D point clouds. Specifically, we augment the baselines with two additional mask choices (i.e., a medium mask and a low mask), since our core insight is that the recovery process of an object can manifest in diverse ways. Previous high-masking schemes focus on capturing the global representation but lack fine-grained recovery capability, so the resulting pre-trained weights tend to play a limited role during fine-tuning. With the support of the proposed TPM, existing methods can exhibit more flexible and accurate completion capabilities, enabling the autoencoder in the pre-training stage to consider multiple representations of a single 3D object. In addition, an SVM-guided weight selection module is proposed to initialize the encoder of downstream networks with the optimal pre-trained weights during the fine-tuning stage, maximizing linear accuracy and facilitating the acquisition of intricate representations of new objects. Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks. Our code and models are available at https://github.com/liujia99/TPM.
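The abstract describes two mechanisms: masking each point cloud at three ratios (high, medium, low) so the masked autoencoder sees several reconstruction views of the same object, and an SVM-guided selection step that picks the pre-trained encoder weights with the best linear accuracy. The sketch below is a minimal illustration of both ideas, assuming NumPy point clouds and scikit-learn for the linear SVM; the function names, mask ratios, and helper structure are illustrative assumptions, not the authors' released implementation (see https://github.com/liujia99/TPM for that).

```python
# Minimal sketch (assumed names and ratios) of the two ideas from the abstract:
# (1) masking one point cloud at three ratios, (2) SVM-guided weight selection.
import numpy as np


def mask_point_cloud(points: np.ndarray, ratio: float, rng: np.random.Generator):
    """Randomly hide `ratio` of the points; return (visible, masked) subsets."""
    n_masked = int(points.shape[0] * ratio)
    perm = rng.permutation(points.shape[0])
    return points[perm[n_masked:]], points[perm[:n_masked]]


def triple_masks(points: np.ndarray, ratios=(0.9, 0.6, 0.3), seed=0):
    """Three masked views (high / medium / low) for multi-mask pre-training."""
    rng = np.random.default_rng(seed)
    return [(r, *mask_point_cloud(points, r, rng)) for r in ratios]


def select_best_weights(feature_sets, labels, checkpoint_names):
    """SVM-guided selection sketch: score each candidate encoder's frozen
    features with a linear SVM and keep the best-scoring checkpoint."""
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    scores = [cross_val_score(LinearSVC(max_iter=5000), feats, labels).mean()
              for feats in feature_sets]
    return checkpoint_names[int(np.argmax(scores))]


if __name__ == "__main__":
    cloud = np.random.default_rng(1).normal(size=(1024, 3))  # toy point cloud
    for ratio, visible, masked in triple_masks(cloud):
        print(f"mask ratio {ratio}: {visible.shape[0]} visible, {masked.shape[0]} masked")
```

In a real pipeline the candidate checkpoints would come from the pre-training runs under each mask setting, and the feature sets from passing a labeled probe set through each frozen encoder; the helper then returns the checkpoint whose features yield the highest linear-SVM accuracy.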
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z) - Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation [47.08813064337934]
This paper presents MaskField, which enables efficient 3D open-vocabulary segmentation with neural fields from a novel perspective.
MaskField decomposes the distillation of mask and semantic features from foundation models by formulating a mask feature field and queries.
Our experiments show that MaskField not only surpasses prior state-of-the-art methods but also achieves remarkably fast convergence.
arXiv Detail & Related papers (2024-07-01T12:07:26Z) - MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis [1.19658449368018]
This paper develops, for the first time, a rotation-invariant self-supervised pretraining framework for practical 3D point set analysis.
The proposed algorithm, called MaskLRF, learns rotation-invariant and highly generalizable latent features via masked autoencoding of 3D points.
Experiments confirm that MaskLRF achieves new state-of-the-art accuracies in analyzing 3D point sets having inconsistent orientations.
arXiv Detail & Related papers (2024-03-01T00:42:49Z) - Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders [52.66195794216989]
We propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE) to learn compact 3D representations.
Point-FEMAE consists of a global branch and a local branch to capture latent semantic features.
Our method significantly improves the pre-training efficiency compared to cross-modal alternatives.
arXiv Detail & Related papers (2023-12-17T14:17:05Z) - M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders [19.68592678093725]
Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds.
M$^3$CS is proposed to equip the model with these abilities.
arXiv Detail & Related papers (2023-09-23T02:19:21Z) - Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z) - PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance imposes a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)