Masked Autoencoders for Point Cloud Self-supervised Learning
- URL: http://arxiv.org/abs/2203.06604v1
- Date: Sun, 13 Mar 2022 09:23:39 GMT
- Title: Masked Autoencoders for Point Cloud Self-supervised Learning
- Authors: Yatian Pang, Wenxiao Wang, Francis E.H. Tay, Wei Liu, Yonghong Tian,
Li Yuan
- Abstract summary: We propose a neat scheme of masked autoencoders for point cloud self-supervised learning.
We divide the input point cloud into irregular point patches and randomly mask them at a high ratio.
A standard Transformer-based autoencoder, with an asymmetric design and a shifting-mask-tokens operation, learns high-level latent features from unmasked point patches.
- Score: 27.894216954216716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a promising scheme of self-supervised learning, masked autoencoding has
significantly advanced natural language processing and computer vision.
Inspired by this, we propose a neat scheme of masked autoencoders for point
cloud self-supervised learning, addressing the challenges posed by the
properties of point clouds, including leakage of location information and
uneven information density. Concretely, we divide the input point cloud into
irregular point patches and randomly mask them at a high ratio. Then a
standard Transformer-based autoencoder, with an asymmetric design and a
shifting-mask-tokens operation, learns high-level latent features from the
unmasked point patches, aiming to reconstruct the masked point patches. Extensive experiments
show that our approach is efficient during pre-training and generalizes well on
various downstream tasks. Specifically, our pre-trained models achieve 84.52%
accuracy on ScanObjectNN and 94.04% accuracy on ModelNet40, outperforming all
other self-supervised learning methods. We show that with our scheme, a simple
architecture based entirely on standard Transformers can surpass dedicated
Transformer models trained with supervised learning. Our approach also advances
state-of-the-art accuracies by 1.5%-2.3% on few-shot object classification.
Furthermore, our work demonstrates the feasibility of applying unified
architectures from language and images to point clouds.
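To make the scheme concrete, here is a minimal PyTorch sketch of the patching-and-masking step: farthest point sampling picks patch centers, k-NN gathers irregular point patches around them, and a high ratio of patches is masked at random. The helper names, patch counts, and the 0.6 mask ratio are illustrative assumptions, not the authors' released code.

```python
import torch

def farthest_point_sample(xyz: torch.Tensor, n_centers: int) -> torch.Tensor:
    """Iteratively pick n_centers well-spread points from xyz of shape (B, N, 3)."""
    B, N, _ = xyz.shape
    centers = torch.zeros(B, n_centers, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.randint(0, N, (B,), device=xyz.device)
    batch = torch.arange(B, device=xyz.device)
    for i in range(n_centers):
        centers[:, i] = farthest
        d = ((xyz - xyz[batch, farthest].unsqueeze(1)) ** 2).sum(-1)
        dist = torch.minimum(dist, d)   # distance to the nearest chosen center
        farthest = dist.argmax(-1)      # next center: farthest remaining point
    return centers

def make_point_patches(xyz, n_patches=64, patch_size=32):
    """Divide a cloud into irregular patches: FPS centers + k-NN neighborhoods."""
    B = xyz.shape[0]
    batch = torch.arange(B, device=xyz.device).unsqueeze(1)
    center_idx = farthest_point_sample(xyz, n_patches)
    centers = xyz[batch, center_idx]                              # (B, G, 3)
    knn_idx = torch.cdist(centers, xyz).topk(patch_size, largest=False).indices
    patches = xyz[batch.unsqueeze(-1), knn_idx]                   # (B, G, k, 3)
    # express each patch in local coordinates relative to its center
    return patches - centers.unsqueeze(2), centers

def random_patch_mask(B, n_patches, mask_ratio=0.6, device="cpu"):
    """Mask a fixed, high ratio of patches per cloud; True marks a masked patch."""
    n_mask = int(n_patches * mask_ratio)
    idx = torch.rand(B, n_patches, device=device).topk(n_mask, dim=1).indices
    mask = torch.zeros(B, n_patches, dtype=torch.bool, device=device)
    return mask.scatter(1, idx, True)
```

The asymmetric part can be sketched the same way: only the visible patch tokens pass through the full encoder, while learnable mask tokens are shifted to the input of a lightweight decoder that regresses the points of the masked patches. Depths, widths, and head counts below are hypothetical defaults, not values taken from the paper.

```python
import torch
import torch.nn as nn

class AsymmetricPointMAE(nn.Module):
    """Sketch: the encoder sees only visible patches; mask tokens join at the decoder."""
    def __init__(self, dim=384, enc_depth=12, dec_depth=4, heads=6, k=32):
        super().__init__()
        def stack(depth):
            layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
            return nn.TransformerEncoder(layer, depth)
        self.encoder, self.decoder = stack(enc_depth), stack(dec_depth)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, k * 3)  # regress k xyz points per masked patch
        self.k = k

    def forward(self, tokens, pos, mask):
        # tokens, pos: (B, G, dim); mask: (B, G) bool with equal True count per row
        B, G, D = tokens.shape
        n_mask = int(mask[0].sum())
        vis = (tokens + pos)[~mask].view(B, G - n_mask, D)
        latent = self.encoder(vis)          # heavy encoder, visible patches only
        # shift mask tokens to the decoder, carrying only positional information
        mask_tok = self.mask_token.expand(B, n_mask, D) + pos[mask].view(B, n_mask, D)
        dec = self.decoder(torch.cat([latent, mask_tok], dim=1))
        return self.head(dec[:, -n_mask:]).view(B, n_mask, self.k, 3)
```

Since the target is a point set, a Chamfer-style set distance between the predicted and ground-truth points of each masked patch is a natural reconstruction loss; keeping mask tokens out of the encoder is what makes pre-training at a high mask ratio cheap.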
Related papers
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- SeRP: Self-Supervised Representation Learning Using Perturbed Point Clouds [6.29475963948119]
SeRP consists of an encoder-decoder architecture that takes perturbed or corrupted point clouds as inputs.
We have used Transformers and PointNet-based Autoencoders.
arXiv Detail & Related papers (2022-09-13T15:22:36Z)
- SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
This paper proposes SdAE, a self-distillated masked autoencoder network.
With only 300 epochs of pre-training, a vanilla ViT-Base model achieves 84.1% fine-tuning accuracy on ImageNet-1k classification.
arXiv Detail & Related papers (2022-07-31T15:07:25Z)
- Masked Autoencoders in 3D Point Cloud Representation Learning [7.617783375837524]
We propose masked autoencoders for 3D point cloud representation learning (abbreviated as MAE3D).
We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches.
Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks.
arXiv Detail & Related papers (2022-07-04T16:13:27Z)
- Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds [2.8544513613730205]
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and recently, point clouds.
We propose VoxelMAE, a simple masked autoencoding pretraining scheme designed for voxel representations.
Our method improves 3D object detection performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset.
arXiv Detail & Related papers (2022-07-01T16:31:45Z)
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
- Masked Discrimination for Self-Supervised Learning on Point Clouds [27.652157544218234]
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Standard backbones like PointNet are unable to properly handle the training versus testing distribution mismatch introduced by masking during training.
We bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds.
arXiv Detail & Related papers (2022-03-21T17:57:34Z)
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [104.82953953453503]
We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds.
Experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers.
arXiv Detail & Related papers (2021-11-29T18:59:03Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. Coupling an asymmetric encoder-decoder design with a high masking ratio enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- MST: Masked Self-Supervised Transformer for Visual Representation [52.099722121603506]
Transformers have been widely used for self-supervised pre-training in Natural Language Processing (NLP).
We present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an image.
MST achieves 76.9% Top-1 accuracy with DeiT-S under linear evaluation, using only 300 epochs of pre-training.
arXiv Detail & Related papers (2021-06-10T11:05:18Z)