Point-McBert: A Multi-choice Self-supervised Framework for Point Cloud Pre-training
- URL: http://arxiv.org/abs/2207.13226v1
- Date: Wed, 27 Jul 2022 00:34:33 GMT
- Title: Point-McBert: A Multi-choice Self-supervised Framework for Point Cloud Pre-training
- Authors: Kexue Fu, Mingzhi Yuan, Manning Wang
- Abstract summary: We propose Point-McBert, a pre-training framework with eased and refined supervision signals.
Specifically, we ease the previous single-choice constraint on patches, and provide multi-choice token ids for each patch as supervision.
Our method achieves 94.1% accuracy on ModelNet40, 84.28% accuracy on the hardest setting of ScanObjectNN and new state-of-the-art performance on few-shot learning.
- Score: 6.037383467521294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked language modeling (MLM) has become one of the most successful self-supervised pre-training tasks. Inspired by its success, Point-Bert, a pioneering work on point clouds, proposed masked point modeling (MPM) to pre-train point cloud transformers on large-scale unannotated datasets. Despite its strong performance, we find that the inherent difference between language and point clouds tends to cause ambiguous tokenization for point clouds: unlike language, there is no gold standard for point cloud tokenization. Although Point-Bert introduces a discrete Variational AutoEncoder (dVAE) as a tokenizer to allocate token ids to local patches, this tokenizer tends to generate ambiguous token ids. We find that the imperfect tokenizer may assign different token ids to semantically similar patches and the same token id to semantically dissimilar patches. To tackle this problem, we propose Point-McBert, a pre-training framework with eased and refined supervision signals. Specifically, we ease the previous single-choice constraint on patches and provide multi-choice token ids for each patch as supervision. Moreover, we utilize the high-level semantics learned by the transformer to further refine the supervision signals. Extensive experiments on point cloud classification, few-shot classification and part segmentation tasks demonstrate the superiority of our method, e.g., the pre-trained transformer achieves 94.1% accuracy on ModelNet40, 84.28% accuracy on the hardest setting of ScanObjectNN, and new state-of-the-art performance on few-shot learning. We also demonstrate that our method not only improves the performance of Point-Bert on all downstream tasks, but also incurs almost no extra computational overhead.
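As a rough illustration of the "multi-choice" supervision described above, the sketch below (PyTorch; not the authors' released code) keeps the top-k token probabilities produced by the dVAE tokenizer as a soft target for each patch and trains the masked patches with a soft cross-entropy. The value of k, the tensor shapes and all function names are illustrative assumptions.

```python
# Hypothetical sketch of multi-choice token supervision: instead of forcing
# each masked patch to match a single dVAE token id (single-choice), keep the
# top-k tokenizer probabilities as a soft target. All names are illustrative.
import torch
import torch.nn.functional as F

def multi_choice_targets(tokenizer_logits: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Turn dVAE tokenizer logits (B, P, V) into soft multi-choice targets.

    B = batch size, P = patches per cloud, V = token vocabulary size.
    Only the k most likely token ids per patch keep probability mass.
    """
    probs = tokenizer_logits.softmax(dim=-1)
    topk_vals, topk_idx = probs.topk(k, dim=-1)
    targets = torch.zeros_like(probs).scatter_(-1, topk_idx, topk_vals)
    return targets / targets.sum(dim=-1, keepdim=True)  # renormalize over kept ids

def masked_patch_loss(pred_logits: torch.Tensor,
                      tokenizer_logits: torch.Tensor,
                      mask: torch.Tensor,
                      k: int = 4) -> torch.Tensor:
    """Soft cross-entropy between the transformer's token predictions and the
    multi-choice targets, evaluated only on masked patches (mask: (B, P) bool)."""
    targets = multi_choice_targets(tokenizer_logits, k)
    log_probs = pred_logits.log_softmax(dim=-1)
    per_patch = -(targets * log_probs).sum(dim=-1)  # (B, P)
    return per_patch[mask].mean()
```

In the single-choice limit (k = 1) this reduces to the usual hard cross-entropy against the tokenizer's single most likely token id, which is the constraint the abstract says is being eased.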
Related papers
- Point Cloud Pre-training with Diffusion Models [62.12279263217138]
We propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif).
PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection.
arXiv Detail & Related papers (2023-11-25T08:10:05Z)
- FreePoint: Unsupervised Point Cloud Instance Segmentation [72.64540130803687]
We propose FreePoint to explore the underexplored problem of unsupervised class-agnostic instance segmentation on point clouds.
We represent point features by combining coordinates, colors, and self-supervised deep features.
Based on the point features, we segment point clouds into coarse instance masks as pseudo labels, which are used to train a point cloud instance segmentation model.
arXiv Detail & Related papers (2023-05-11T16:56:26Z)
- PointPatchMix: Point Cloud Mixing with Patch Scoring [58.58535918705736]
We propose PointPatchMix, which mixes point clouds at the patch level and generates content-based targets for mixed point clouds.
Our approach preserves local features at the patch level, while the patch scoring module assigns targets based on the content-based significance score from a pre-trained teacher model.
With Point-MAE as our baseline, our model surpasses previous methods by a significant margin, achieving 86.3% accuracy on ScanObjectNN and 94.1% accuracy on ModelNet40.
arXiv Detail & Related papers (2023-03-12T14:49:42Z)
- Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud [69.36717778451667]
Existing methods for large-scale point cloud semantic segmentation require expensive, tedious and error-prone manual point-wise annotations.
We propose an effective weakly supervised method containing two components to solve the problem.
The experimental results show a large gain over existing weakly supervised methods and results comparable to fully supervised methods.
arXiv Detail & Related papers (2022-12-09T09:42:26Z)
- EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv Detail & Related papers (2022-12-08T06:27:11Z)
- Masked Autoencoders in 3D Point Cloud Representation Learning [7.617783375837524]
We propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D).
We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches.
Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks.
arXiv Detail & Related papers (2022-07-04T16:13:27Z)
- POS-BERT: Point Cloud One-Stage BERT Pre-Training [34.30767607646814]
We propose POS-BERT, a one-stage BERT pre-training method for point clouds.
Unlike Point-BERT, it does not require a tokenizer that is extra-trained and then frozen.
POS-BERT achieves state-of-the-art classification accuracy, exceeding Point-BERT by 3.5%.
arXiv Detail & Related papers (2022-04-03T04:49:39Z)
- Masked Discrimination for Self-Supervised Learning on Point Clouds [27.652157544218234]
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Standard backbones like PointNet are unable to properly handle the training versus testing distribution mismatch introduced by masking during training.
We bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds.
arXiv Detail & Related papers (2022-03-21T17:57:34Z)
- Masked Autoencoders for Point Cloud Self-supervised Learning [27.894216954216716]
We propose a neat scheme of masked autoencoders for point cloud self-supervised learning.
We divide the input point cloud into irregular point patches and randomly mask them at a high ratio (a minimal sketch of this patch-and-mask step follows this list).
A standard Transformer based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from unmasked point patches.
arXiv Detail & Related papers (2022-03-13T09:23:39Z)
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [104.82953953453503]
We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds.
Experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers.
arXiv Detail & Related papers (2021-11-29T18:59:03Z)
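Several of the entries above (MAE3D, Point-MAE, Point-BERT) share a common preprocessing step: group the point cloud into local patches and hide a large fraction of them during pre-training. The sketch below is a minimal illustration of that step using farthest point sampling plus k-nearest neighbours; the patch count, patch size and 60% mask ratio are placeholder assumptions, not values taken from any of the papers listed here.

```python
# Minimal sketch (not any paper's released code) of patch grouping and
# high-ratio random masking for masked point modeling / masked autoencoding.
import torch

def split_into_patches(points: torch.Tensor, num_patches: int = 64, k: int = 32):
    """points: (N, 3) -> centers (num_patches, 3), patches (num_patches, k, 3)."""
    # Farthest point sampling (naive O(N * num_patches) version).
    n = points.shape[0]
    centers_idx = torch.zeros(num_patches, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    centers_idx[0] = torch.randint(n, (1,)).item()
    for i in range(1, num_patches):
        dist = torch.minimum(dist, (points - points[centers_idx[i - 1]]).pow(2).sum(-1))
        centers_idx[i] = dist.argmax()
    centers = points[centers_idx]                                    # (P, 3)
    # k-nearest neighbours around each center form one local patch.
    d2 = (points[None, :, :] - centers[:, None, :]).pow(2).sum(-1)   # (P, N)
    knn_idx = d2.topk(k, largest=False).indices                      # (P, k)
    return centers, points[knn_idx]

def random_patch_mask(num_patches: int = 64, ratio: float = 0.6) -> torch.Tensor:
    """Boolean mask over patches; True = masked (hidden from the encoder)."""
    num_masked = int(num_patches * ratio)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask
```

In the actual pipelines the masked patches are then either reconstructed (MAE-style) or matched to tokenizer ids (BERT-style); the sketch only covers the grouping and masking step.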