Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
- URL: http://arxiv.org/abs/2508.08910v1
- Date: Tue, 12 Aug 2025 12:58:44 GMT
- Title: Masked Clustering Prediction for Unsupervised Point Cloud Pre-training
- Authors: Bin Ren, Xiaoshui Huang, Mengyuan Liu, Hong Liu, Fabio Poiesi, Nicu Sebe, Guofeng Mei,
- Abstract summary: MaskClu is a novel unsupervised pre-training method for ViTs on 3D point clouds.<n>It integrates masked point modeling with clustering-based learning.
- Score: 61.11226004056774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision transformers (ViTs) have recently been widely applied to 3D point cloud understanding, with masked autoencoding as the predominant pre-training paradigm. However, the challenge of learning dense and informative semantic features from point clouds via standard ViTs remains underexplored. We propose MaskClu, a novel unsupervised pre-training method for ViTs on 3D point clouds that integrates masked point modeling with clustering-based learning. MaskClu is designed to reconstruct both cluster assignments and cluster centers from masked point clouds, thus encouraging the model to capture dense semantic information. Additionally, we introduce a global contrastive learning mechanism that enhances instance-level feature learning by contrasting different masked views of the same point cloud. By jointly optimizing these complementary objectives, i.e., dense semantic reconstruction, and instance-level contrastive learning. MaskClu enables ViTs to learn richer and more semantically meaningful representations from 3D point clouds. We validate the effectiveness of our method via multiple 3D tasks, including part segmentation, semantic segmentation, object detection, and classification, where MaskClu sets new competitive results. The code and models will be released at:https://github.com/Amazingren/maskclu.
Related papers
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - Clustering based Point Cloud Representation Learning for 3D Analysis [80.88995099442374]
We propose a clustering based supervised learning scheme for point cloud analysis.
Unlike current de-facto, scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space.
Our algorithm shows notable improvements on famous point cloud segmentation datasets.
arXiv Detail & Related papers (2023-07-27T03:42:12Z) - CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud
Semantic Segmentation [60.0893353960514]
We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations.
We propose a Contextual Point Cloud Modeling ( CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method.
arXiv Detail & Related papers (2023-07-19T04:41:18Z) - Self-supervised adversarial masking for 3D point cloud representation
learning [0.38233569758620056]
We introduce PointCAM, a novel adversarial method for learning a masking function for point clouds.
Compared to previous techniques, we postulate applying an auxiliary network that learns how to select masks instead of choosing them randomly.
Our results show that the learned masking function achieves state-of-the-art or competitive performance on various downstream tasks.
arXiv Detail & Related papers (2023-07-11T15:11:06Z) - Data Augmentation-free Unsupervised Learning for 3D Point Cloud
Understanding [61.30276576646909]
We propose an augmentation-free unsupervised approach for point clouds to learn transferable point-level features via soft clustering, named SoftClu.
We exploit the affiliation of points to their clusters as a proxy to enable self-training through a pseudo-label prediction task.
arXiv Detail & Related papers (2022-10-06T10:18:16Z) - Masked Autoencoders in 3D Point Cloud Representation Learning [7.617783375837524]
We propose masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D)
We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches.
Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks.
arXiv Detail & Related papers (2022-07-04T16:13:27Z) - Masked Discrimination for Self-Supervised Learning on Point Clouds [27.652157544218234]
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Standard backbones like PointNet are unable to properly handle the training versus testing distribution mismatch introduced by masking during training.
We bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds.
arXiv Detail & Related papers (2022-03-21T17:57:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.