Masked Autoencoders for Self-Supervised Learning on Automotive Point
Clouds
- URL: http://arxiv.org/abs/2207.00531v1
- Date: Fri, 1 Jul 2022 16:31:45 GMT
- Title: Masked Autoencoders for Self-Supervised Learning on Automotive Point
Clouds
- Authors: Georg Hess, Johan Jaxing, Elias Svensson, David Hagerman, Christoffer
Petersson, Lennart Svensson
- Abstract summary: Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and recently, point clouds.
We propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations.
Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset.
- Score: 2.8544513613730205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked autoencoding has become a successful pre-training paradigm for
Transformer models for text, images, and recently, point clouds. Raw automotive
datasets are suitable candidates for self-supervised pre-training as they are
generally cheap to collect compared to annotations for tasks like 3D object
detection (OD). However, development of masked autoencoders for point clouds
has focused solely on synthetic and indoor data. Consequently, existing methods
have tailored their representations and models toward point clouds which are
small, dense, and homogeneous in point density. In this work, we study masked
autoencoding for automotive point clouds, which are sparse and whose point
density can vary drastically among objects in the same
scene. To this end, we propose Voxel-MAE, a simple masked autoencoding
pre-training scheme designed for voxel representations. We pre-train the
backbone of a Transformer-based 3D object detector to reconstruct masked voxels
and to distinguish between empty and non-empty voxels. Our method improves the
3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes
dataset. Compared to existing self-supervised methods for automotive data,
Voxel-MAE displays up to $2\times$ performance increase. Further, we show that
by pre-training with Voxel-MAE, we require only 40% of the annotated data to
outperform a randomly initialized equivalent. Code will be released.
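To make the two pre-training targets concrete, below is a minimal sketch of the input preparation the abstract implies: voxelize a LiDAR sweep, hide a large fraction of the non-empty voxels, and keep both a point-reconstruction target and a binary empty/non-empty occupancy target. The grid extents, voxel size, and 0.7 mask ratio are illustrative assumptions, not values from the paper.
```python
# Hypothetical sketch of Voxel-MAE-style input preparation (not the authors'
# code): voxelize a LiDAR sweep, then mask a high fraction of the non-empty
# voxels. The masked voxels' points are the reconstruction target; empty vs.
# non-empty cells give the binary occupancy target mentioned in the abstract.
import numpy as np

def voxelize(points, voxel_size=0.5, pc_range=(-50, -50, -5, 50, 50, 3)):
    """Group (N, 3) points into voxels; returns {voxel index: (k, 3) points}."""
    mins, maxs = np.array(pc_range[:3]), np.array(pc_range[3:])
    pts = points[np.all((points >= mins) & (points < maxs), axis=1)]
    idx = np.floor((pts - mins) / voxel_size).astype(np.int64)
    voxels = {}
    for key, p in zip(map(tuple, idx), pts):
        voxels.setdefault(key, []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}

def mask_voxels(voxels, mask_ratio=0.7, seed=0):
    """Split non-empty voxels into visible (encoder input) and masked (targets)."""
    keys = list(voxels)
    order = np.random.default_rng(seed).permutation(len(keys))
    n_masked = int(mask_ratio * len(keys))
    masked = {keys[i]: voxels[keys[i]] for i in order[:n_masked]}
    visible = {keys[i]: voxels[keys[i]] for i in order[n_masked:]}
    return visible, masked

rng = np.random.default_rng(1)
sweep = np.column_stack([rng.uniform(-40, 40, 20000),
                         rng.uniform(-40, 40, 20000),
                         rng.uniform(-3, 2, 20000)])
visible, masked = mask_voxels(voxelize(sweep))
# The encoder sees only `visible`; the decoder reconstructs the points in
# `masked` and classifies grid cells as empty vs. non-empty.
print(len(visible), "visible voxels,", len(masked), "masked voxels")
```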
Related papers
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
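As a rough illustration of what a BEV-guided mask might mean in practice, the sketch below decides masking per bird's eye view cell, so all points in a masked pillar are hidden together; the cell size and mask ratio are our assumptions, not values from BEV-MAE.
```python
# Hypothetical illustration of BEV-guided masking (our reading of the
# summary, not BEV-MAE's implementation): choose masked regions on the
# 2D bird's eye view grid, then hide every point whose (x, y) cell is masked.
import numpy as np

def bev_guided_mask(points, cell=2.0, mask_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    cells = np.floor(points[:, :2] / cell).astype(np.int64)   # (N, 2) BEV cell ids
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    drop = rng.random(len(uniq)) < mask_ratio                 # mask whole cells
    hidden = drop[inverse.reshape(-1)]                        # broadcast to points
    return points[~hidden], points[hidden]

visible, hidden = bev_guided_mask(np.random.default_rng(1).normal(0, 20, (5000, 3)))
print(visible.shape, hidden.shape)
```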
- SeRP: Self-Supervised Representation Learning Using Perturbed Point Clouds [6.29475963948119]
SeRP consists of an encoder-decoder architecture that takes perturbed or corrupted point clouds as input.
We have used Transformers and PointNet-based Autoencoders.
arXiv Detail & Related papers (2022-09-13T15:22:36Z)
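A minimal sketch of SeRP's corrupt-and-reconstruct objective as we read the summary; the Gaussian jitter and the Chamfer distance loss are illustrative assumptions, not details from the paper.
```python
# Hypothetical SeRP-style objective: perturb the input cloud and train an
# encoder-decoder to recover the clean cloud. Jitter noise and the Chamfer
# loss below are assumptions made for illustration.
import torch

def jitter(points, sigma=0.01):
    """Corrupt a (B, N, 3) batch of point clouds with Gaussian noise."""
    return points + sigma * torch.randn_like(points)

def chamfer(a, b):
    """Symmetric Chamfer distance between (B, N, 3) and (B, M, 3) clouds."""
    d = torch.cdist(a, b)                              # (B, N, M) pairwise dists
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

clean = torch.randn(4, 1024, 3)
corrupted = jitter(clean)
# In training: loss = chamfer(decoder(encoder(corrupted)), clean)
print(float(chamfer(corrupted, clean)))
```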
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on MSR-Action3D.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
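The pseudo-labeling half of a semi-supervised setup like MAPLE's can be sketched as below: predictions on unlabeled clips become training targets only above a confidence threshold. The 0.9 threshold and plain cross-entropy are our assumptions; MAPLE's actual recipe also involves the masked autoencoder itself.
```python
# Hypothetical pseudo-labeling step for semi-supervised training (our
# illustration, not MAPLE's code): keep a model prediction on an unlabeled
# sample as a training target only if it is confident enough.
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits, threshold=0.9):
    """logits: (B, num_classes) predictions on unlabeled action clips."""
    probs = logits.softmax(dim=1)
    conf, labels = probs.max(dim=1)
    keep = conf >= threshold               # trust only confident predictions
    if not keep.any():
        return logits.new_zeros(())
    return F.cross_entropy(logits[keep], labels[keep])

print(float(pseudo_label_loss(5 * torch.randn(8, 20))))
```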
- Masked Autoencoders in 3D Point Cloud Representation Learning [7.617783375837524]
We propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D).
We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches.
Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks.
arXiv Detail & Related papers (2022-07-04T16:13:27Z)
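The split-and-mask step MAE3D describes can be sketched as below: pick patch centers with farthest point sampling, group k nearest neighbors into patches, and mask a portion. The patch count, k, and mask ratio are illustrative assumptions.
```python
# Hypothetical patch construction for MAE3D-style masking (our sketch):
# farthest point sampling picks centers, kNN grouping forms patches, and a
# random subset of patches is masked before patch embedding.
import numpy as np

def farthest_point_sampling(points, n_centers, seed=0):
    """Greedy FPS: each new center is the point farthest from those chosen."""
    chosen = [int(np.random.default_rng(seed).integers(len(points)))]
    dist = np.full(len(points), np.inf)
    for _ in range(n_centers - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return points[chosen]

def make_patches(points, n_patches=64, k=32):
    centers = farthest_point_sampling(points, n_patches)
    d = np.linalg.norm(points[None] - centers[:, None], axis=2)   # (P, N)
    return points[np.argsort(d, axis=1)[:, :k]]                   # (P, k, 3)

patches = make_patches(np.random.default_rng(1).normal(size=(2048, 3)))
masked = np.random.default_rng(2).random(len(patches)) < 0.6      # mask 60%
print(patches.shape, int(masked.sum()), "of", len(patches), "patches masked")
```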
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
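As a rough sketch of the multi-scale idea: build a coarse-to-fine pyramid of the cloud and keep the mask consistent across scales, so a region hidden at a coarse scale stays hidden at finer ones. The scale sizes, 0.8 ratio, and nearest-neighbor propagation are our assumptions, not Point-M2AE internals.
```python
# Hypothetical multi-scale masking in the spirit of Point-M2AE (our reading
# of the summary): draw the mask at the coarsest scale and propagate it to
# finer scales via nearest coarse neighbor, keeping hidden regions aligned.
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.normal(size=(2048, 3))
scales = [cloud[rng.permutation(2048)[:n]] for n in (512, 128, 32)]  # fine->coarse

coarse = scales[-1]
coarse_mask = rng.random(len(coarse)) < 0.8        # mask 80% of coarse points
for pts in scales[:-1]:
    # a fine point is hidden iff its nearest coarse point is hidden
    nearest = np.linalg.norm(pts[:, None] - coarse[None], axis=2).argmin(axis=1)
    fine_mask = coarse_mask[nearest]
    print(len(pts), "points,", int(fine_mask.sum()), "masked")
```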
- Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
arXiv Detail & Related papers (2022-03-26T14:06:29Z)
- Masked Autoencoders for Point Cloud Self-supervised Learning [27.894216954216716]
We propose a neat scheme of masked autoencoders for point cloud self-supervised learning.
We divide the input point cloud into irregular point patches and randomly mask them at a high ratio.
A standard Transformer based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from unmasked point patches.
arXiv Detail & Related papers (2022-03-13T09:23:39Z)
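The asymmetric design and the shifting of mask tokens can be sketched as follows: the encoder processes only visible patch tokens, and learnable mask tokens enter at the lighter decoder. The dimensions and depths below are arbitrary choices for illustration.
```python
# Hypothetical asymmetric masked autoencoder (our sketch of the scheme the
# summary describes): the encoder never sees mask tokens; they are appended
# only at the small decoder, which predicts the masked patches.
import torch
import torch.nn as nn

dim, n_tokens, n_masked = 128, 64, 48
layer = lambda: nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer(), num_layers=6)   # deep, visible-only
decoder = nn.TransformerEncoder(layer(), num_layers=2)   # shallow, sees masks
mask_token = nn.Parameter(torch.zeros(1, 1, dim))

visible = torch.randn(1, n_tokens - n_masked, dim)   # embedded visible patches
latent = encoder(visible)                            # no mask tokens here
masks = mask_token.expand(1, n_masked, dim)          # shifted to the decoder
pred = decoder(torch.cat([latent, masks], dim=1))[:, -n_masked:]
print(pred.shape)                                    # torch.Size([1, 48, 128])
```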
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [104.82953953453503]
We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds.
Experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers.
arXiv Detail & Related papers (2021-11-29T18:59:03Z)
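The masked point modeling objective can be sketched as below: a pre-trained tokenizer (a dVAE in Point-BERT) assigns each patch a discrete token id, and the transformer is trained to predict the ids of masked patches. The vocabulary size, mask ratio, and stand-in tensors are our assumptions.
```python
# Hypothetical BERT-style objective on point patches (our sketch): predict
# the discrete token ids of masked patches with cross-entropy. The random
# tensors stand in for the dVAE tokenizer and the transformer head.
import torch
import torch.nn.functional as F

vocab, n_patches = 8192, 64
token_ids = torch.randint(vocab, (1, n_patches))   # stand-in tokenizer output
masked = torch.rand(1, n_patches) < 0.4            # which patches are hidden

logits = torch.randn(1, n_patches, vocab)          # stand-in prediction head
loss = F.cross_entropy(logits[masked], token_ids[masked])
print(float(loss))                                 # ~log(8192) for random logits
```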
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
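The coarse-then-refine pattern the summary describes can be sketched as below; all shapes, heads, and the score threshold are illustrative assumptions, not InfoFocus internals.
```python
# Hypothetical two-stage detection skeleton (our illustration of the
# summary, not InfoFocus code): stage 1 proposes coarse boxes from a BEV
# feature map; stage 2 would refine them with features pooled per proposal.
import torch

bev_features = torch.randn(1, 64, 200, 200)          # backbone BEV feature map
scores = torch.sigmoid(torch.randn(1, 1, 200, 200))  # stand-in objectness head
boxes = torch.randn(1, 7, 200, 200)                  # stand-in box head

keep = scores.flatten() > 0.9                        # coarse stage-1 proposals
proposals = boxes.flatten(2).squeeze(0).T[keep]      # (num_proposals, 7)
print(proposals.shape)                               # x, y, z, w, l, h, yaw
# Stage 2 would pool bev_features around each proposal and regress residuals.
```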