Masked Autoencoders for Self-Supervised Learning on Automotive Point
Clouds
- URL: http://arxiv.org/abs/2207.00531v1
- Date: Fri, 1 Jul 2022 16:31:45 GMT
- Title: Masked Autoencoders for Self-Supervised Learning on Automotive Point
Clouds
- Authors: Georg Hess, Johan Jaxing, Elias Svensson, David Hagerman, Christoffer
Petersson, Lennart Svensson
- Abstract summary: Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and recently, point clouds.
We propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations.
Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset.
- Score: 2.8544513613730205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked autoencoding has become a successful pre-training paradigm for
Transformer models for text, images, and recently, point clouds. Raw automotive
datasets are suitable candidates for self-supervised pre-training as they are
generally cheap to collect compared to annotations for tasks like 3D object
detection (OD). However, development of masked autoencoders for point clouds
has focused solely on synthetic and indoor data. Consequently, existing methods
have tailored their representations and models toward point clouds which are
small, dense, and homogeneous in point density. In this work, we study masked
autoencoding for automotive point clouds, which are sparse and whose point
density can vary drastically among objects in the same
scene. To this end, we propose Voxel-MAE, a simple masked autoencoding
pre-training scheme designed for voxel representations. We pre-train the
backbone of a Transformer-based 3D object detector to reconstruct masked voxels
and to distinguish between empty and non-empty voxels. Our method improves the
3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes
dataset. Compared to existing self-supervised methods for automotive data,
Voxel-MAE displays up to $2\times$ performance increase. Further, we show that
by pre-training with Voxel-MAE, we require only 40% of the annotated data to
outperform a randomly initialized equivalent. Code will be released.
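To make the two pre-training targets concrete, below is a minimal sketch of the input preparation the abstract implies: voxelize a LiDAR sweep, hide a large fraction of the non-empty voxels, and keep both a point-reconstruction target and a binary empty/non-empty occupancy target. The grid extents, voxel size, and 0.7 mask ratio are illustrative assumptions, not values from the paper.
```python
# Hypothetical sketch of Voxel-MAE-style input preparation (not the authors'
# code): voxelize a LiDAR sweep, then mask a high fraction of the non-empty
# voxels. The masked voxels' points are the reconstruction target; empty vs.
# non-empty cells give the binary occupancy target mentioned in the abstract.
import numpy as np

def voxelize(points, voxel_size=0.5, pc_range=(-50, -50, -5, 50, 50, 3)):
    """Group (N, 3) points into voxels; returns {voxel index: (k, 3) points}."""
    mins, maxs = np.array(pc_range[:3]), np.array(pc_range[3:])
    pts = points[np.all((points >= mins) & (points < maxs), axis=1)]
    idx = np.floor((pts - mins) / voxel_size).astype(np.int64)
    voxels = {}
    for key, p in zip(map(tuple, idx), pts):
        voxels.setdefault(key, []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}

def mask_voxels(voxels, mask_ratio=0.7, seed=0):
    """Split non-empty voxels into visible (encoder input) and masked (targets)."""
    keys = list(voxels)
    order = np.random.default_rng(seed).permutation(len(keys))
    n_masked = int(mask_ratio * len(keys))
    masked = {keys[i]: voxels[keys[i]] for i in order[:n_masked]}
    visible = {keys[i]: voxels[keys[i]] for i in order[n_masked:]}
    return visible, masked

rng = np.random.default_rng(1)
sweep = np.column_stack([rng.uniform(-40, 40, 20000),
                         rng.uniform(-40, 40, 20000),
                         rng.uniform(-3, 2, 20000)])
visible, masked = mask_voxels(voxelize(sweep))
# The encoder sees only `visible`; the decoder reconstructs the points in
# `masked` and classifies grid cells as empty vs. non-empty.
print(len(visible), "visible voxels,", len(masked), "masked voxels")
```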
Related papers
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
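As a rough illustration of what a BEV-guided mask might mean in practice, the sketch below decides masking per bird's eye view cell, so all points in a masked pillar are hidden together; the cell size and mask ratio are our assumptions, not values from BEV-MAE.
```python
# Hypothetical illustration of BEV-guided masking (our reading of the
# summary, not BEV-MAE's implementation): choose masked regions on the
# 2D bird's eye view grid, then hide every point whose (x, y) cell is masked.
import numpy as np

def bev_guided_mask(points, cell=2.0, mask_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    cells = np.floor(points[:, :2] / cell).astype(np.int64)   # (N, 2) BEV cell ids
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    drop = rng.random(len(uniq)) < mask_ratio                 # mask whole cells
    hidden = drop[inverse.reshape(-1)]                        # broadcast to points
    return points[~hidden], points[hidden]

visible, hidden = bev_guided_mask(np.random.default_rng(1).normal(0, 20, (5000, 3)))
print(visible.shape, hidden.shape)
```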
- SeRP: Self-Supervised Representation Learning Using Perturbed Point Clouds [6.29475963948119]
SeRP consists of an encoder-decoder architecture that takes perturbed or corrupted point clouds as input.
We have used Transformers and PointNet-based Autoencoders.
arXiv Detail & Related papers (2022-09-13T15:22:36Z)
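A minimal sketch of SeRP's corrupt-and-reconstruct objective as we read the summary; the Gaussian jitter and the Chamfer distance loss are illustrative assumptions, not details from the paper.
```python
# Hypothetical SeRP-style objective: perturb the input cloud and train an
# encoder-decoder to recover the clean cloud. Jitter noise and the Chamfer
# loss below are assumptions made for illustration.
import torch

def jitter(points, sigma=0.01):
    """Corrupt a (B, N, 3) batch of point clouds with Gaussian noise."""
    return points + sigma * torch.randn_like(points)

def chamfer(a, b):
    """Symmetric Chamfer distance between (B, N, 3) and (B, M, 3) clouds."""
    d = torch.cdist(a, b)                              # (B, N, M) pairwise dists
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

clean = torch.randn(4, 1024, 3)
corrupted = jitter(clean)
# In training: loss = chamfer(decoder(encoder(corrupted)), clean)
print(float(chamfer(corrupted, clean)))
```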
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on MSR-Action3D.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
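The pseudo-labeling half of a semi-supervised setup like MAPLE's can be sketched as below: predictions on unlabeled clips become training targets only above a confidence threshold. The 0.9 threshold and plain cross-entropy are our assumptions; MAPLE's actual recipe also involves the masked autoencoder itself.
```python
# Hypothetical pseudo-labeling step for semi-supervised training (our
# illustration, not MAPLE's code): keep a model prediction on an unlabeled
# sample as a training target only if it is confident enough.
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits, threshold=0.9):
    """logits: (B, num_classes) predictions on unlabeled action clips."""
    probs = logits.softmax(dim=1)
    conf, labels = probs.max(dim=1)
    keep = conf >= threshold               # trust only confident predictions
    if not keep.any():
        return logits.new_zeros(())
    return F.cross_entropy(logits[keep], labels[keep])

print(float(pseudo_label_loss(5 * torch.randn(8, 20))))
```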
- Masked Autoencoders in 3D Point Cloud Representation Learning [7.617783375837524]
We propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D).
We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches.
Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks.
arXiv Detail & Related papers (2022-07-04T16:13:27Z)
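The split-and-mask step MAE3D describes can be sketched as below: pick patch centers with farthest point sampling, group k nearest neighbors into patches, and mask a portion. The patch count, k, and mask ratio are illustrative assumptions.
```python
# Hypothetical patch construction for MAE3D-style masking (our sketch):
# farthest point sampling picks centers, kNN grouping forms patches, and a
# random subset of patches is masked before patch embedding.
import numpy as np

def farthest_point_sampling(points, n_centers, seed=0):
    """Greedy FPS: each new center is the point farthest from those chosen."""
    chosen = [int(np.random.default_rng(seed).integers(len(points)))]
    dist = np.full(len(points), np.inf)
    for _ in range(n_centers - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return points[chosen]

def make_patches(points, n_patches=64, k=32):
    centers = farthest_point_sampling(points, n_patches)
    d = np.linalg.norm(points[None] - centers[:, None], axis=2)   # (P, N)
    return points[np.argsort(d, axis=1)[:, :k]]                   # (P, k, 3)

patches = make_patches(np.random.default_rng(1).normal(size=(2048, 3)))
masked = np.random.default_rng(2).random(len(patches)) < 0.6      # mask 60%
print(patches.shape, int(masked.sum()), "of", len(patches), "patches masked")
```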
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
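As a rough sketch of the multi-scale idea: build a coarse-to-fine pyramid of the cloud and keep the mask consistent across scales, so a region hidden at a coarse scale stays hidden at finer ones. The scale sizes, 0.8 ratio, and nearest-neighbor propagation are our assumptions, not Point-M2AE internals.
```python
# Hypothetical multi-scale masking in the spirit of Point-M2AE (our reading
# of the summary): draw the mask at the coarsest scale and propagate it to
# finer scales via nearest coarse neighbor, keeping hidden regions aligned.
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.normal(size=(2048, 3))
scales = [cloud[rng.permutation(2048)[:n]] for n in (512, 128, 32)]  # fine->coarse

coarse = scales[-1]
coarse_mask = rng.random(len(coarse)) < 0.8        # mask 80% of coarse points
for pts in scales[:-1]:
    # a fine point is hidden iff its nearest coarse point is hidden
    nearest = np.linalg.norm(pts[:, None] - coarse[None], axis=2).argmin(axis=1)
    fine_mask = coarse_mask[nearest]
    print(len(pts), "points,", int(fine_mask.sum()), "masked")
```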
- Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
arXiv Detail & Related papers (2022-03-26T14:06:29Z)
- Masked Autoencoders for Point Cloud Self-supervised Learning [27.894216954216716]
We propose a neat scheme of masked autoencoders for point cloud self-supervised learning.
We divide the input point cloud into irregular point patches and randomly mask them at a high ratio.
A standard Transformer based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from unmasked point patches.
arXiv Detail & Related papers (2022-03-13T09:23:39Z)
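The asymmetric design and the shifting of mask tokens can be sketched as follows: the encoder processes only visible patch tokens, and learnable mask tokens enter at the lighter decoder. The dimensions and depths below are arbitrary choices for illustration.
```python
# Hypothetical asymmetric masked autoencoder (our sketch of the scheme the
# summary describes): the encoder never sees mask tokens; they are appended
# only at the small decoder, which predicts the masked patches.
import torch
import torch.nn as nn

dim, n_tokens, n_masked = 128, 64, 48
layer = lambda: nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer(), num_layers=6)   # deep, visible-only
decoder = nn.TransformerEncoder(layer(), num_layers=2)   # shallow, sees masks
mask_token = nn.Parameter(torch.zeros(1, 1, dim))

visible = torch.randn(1, n_tokens - n_masked, dim)   # embedded visible patches
latent = encoder(visible)                            # no mask tokens here
masks = mask_token.expand(1, n_masked, dim)          # shifted to the decoder
pred = decoder(torch.cat([latent, masks], dim=1))[:, -n_masked:]
print(pred.shape)                                    # torch.Size([1, 48, 128])
```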
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [104.82953953453503]
We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds.
Experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers.
arXiv Detail & Related papers (2021-11-29T18:59:03Z)
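The masked point modeling objective can be sketched as below: a pre-trained tokenizer (a dVAE in Point-BERT) assigns each patch a discrete token id, and the transformer is trained to predict the ids of masked patches. The vocabulary size, mask ratio, and stand-in tensors are our assumptions.
```python
# Hypothetical BERT-style objective on point patches (our sketch): predict
# the discrete token ids of masked patches with cross-entropy. The random
# tensors stand in for the dVAE tokenizer and the transformer head.
import torch
import torch.nn.functional as F

vocab, n_patches = 8192, 64
token_ids = torch.randint(vocab, (1, n_patches))   # stand-in tokenizer output
masked = torch.rand(1, n_patches) < 0.4            # which patches are hidden

logits = torch.randn(1, n_patches, vocab)          # stand-in prediction head
loss = F.cross_entropy(logits[masked], token_ids[masked])
print(float(loss))                                 # ~log(8192) for random logits
```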
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
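The coarse-then-refine pattern the summary describes can be sketched as below; all shapes, heads, and the score threshold are illustrative assumptions, not InfoFocus internals.
```python
# Hypothetical two-stage detection skeleton (our illustration of the
# summary, not InfoFocus code): stage 1 proposes coarse boxes from a BEV
# feature map; stage 2 would refine them with features pooled per proposal.
import torch

bev_features = torch.randn(1, 64, 200, 200)          # backbone BEV feature map
scores = torch.sigmoid(torch.randn(1, 1, 200, 200))  # stand-in objectness head
boxes = torch.randn(1, 7, 200, 200)                  # stand-in box head

keep = scores.flatten() > 0.9                        # coarse stage-1 proposals
proposals = boxes.flatten(2).squeeze(0).T[keep]      # (num_proposals, 7)
print(proposals.shape)                               # x, y, z, w, l, h, yaw
# Stage 2 would pool bev_features around each proposal and regress residuals.
```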